This Jupyter notebook assumes that the R kernel for Jupyter (IRkernel) has been installed; see
https://irkernel.github.io/installation/

# R code for logistic regression analysis of bar-profile types and B/P bulges

## Requirements

This notebook is meant to be run within the full **barprofiles_paper** repository, including the associated data files.

## Setup

Set `basedir` to the directory containing the (text) data files. Keep the trailing slash, since the file paths below are built with `paste(..., sep="")`:

In [1]:
basedir <- "/Users/erwin/Documents/Working/Papers/Paper-BarProfiles/public/data/"

In [2]:
getwd()

## Bar profiles: Presence of P+Sh profile

### Full sample: Logistic regression for single variable: stellar mass, Hubble type, gas fraction

Logistic regression for the fraction of barred spirals with a Peak+Shoulders (P+Sh) bar profile as a function of stellar mass $\log (M_{\star} / M_{\odot})$, Hubble type $T$, and neutral gas mass fraction $f_{\rm gas} = M_{\rm HI} / M_{\star}$, for the combined sample of 181 galaxies.
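
For reference, each single-variable fit below has the standard logistic form. The symbols $P$, $x$, $\alpha$, and $\beta$ are generic notation introduced here for illustration, not taken from the notebook: $P$ is the probability that a galaxy has a P+Sh profile, $x$ is the predictor, and $\alpha$ and $\beta$ correspond to the `(Intercept)` and slope rows of each `glm` summary.

$$
P(x) = \frac{1}{1 + e^{-(\alpha + \beta x)}}
\quad\Longleftrightarrow\quad
\ln \frac{P(x)}{1 - P(x)} = \alpha + \beta x
$$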

Load data into table.

In [3]:
ff1 <- paste(basedir, "PSh_profile-vs-stuff.dat", sep="")
theTable_profs1 <- read.table(ff1, header=TRUE)

Standard linear logistic regression: P+Sh fraction versus log of stellar mass, Hubble type, and log of gas mass fraction (one predictor per fit)

In [4]:
thefit1a = glm(PSh_profile_both ~ logMstar, family = binomial, data = theTable_profs1)
thefit1b = glm(PSh_profile_both ~ t_leda, family = binomial, data = theTable_profs1)
thefit1c = glm(PSh_profile_both ~ logfgas, family = binomial, data = theTable_profs1)
summary(thefit1a)
summary(thefit1b)
summary(thefit1c)


Call:
glm(formula = PSh_profile_both ~ logMstar, family = binomial, 
    data = theTable_profs1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1046  -0.3592  -0.1079   0.3346   2.6234  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -50.7609     7.9287  -6.402 1.53e-10 ***
logMstar      5.0024     0.7836   6.384 1.73e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 225.52  on 180  degrees of freedom
Residual deviance: 103.34  on 179  degrees of freedom
AIC: 107.34

Number of Fisher Scoring iterations: 6



Call:
glm(formula = PSh_profile_both ~ t_leda, family = binomial, data = theTable_profs1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3603  -0.5992  -0.2311   0.4956   2.6967  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   2.9505     0.5453   5.411 6.26e-08 ***
t_leda       -0.7628     0.1121  -6.806 1.00e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 225.52  on 180  degrees of freedom
Residual deviance: 136.28  on 179  degrees of freedom
AIC: 140.28

Number of Fisher Scoring iterations: 5



Call:
glm(formula = PSh_profile_both ~ logfgas, family = binomial, 
    data = theTable_profs1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2640  -0.6476  -0.3581   0.6762   1.8716  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.8329     0.4051  -6.992 2.70e-12 ***
logfgas      -2.4910     0.4030  -6.180 6.39e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 225.52  on 180  degrees of freedom
Residual deviance: 159.47  on 179  degrees of freedom
AIC: 163.47

Number of Fisher Scoring iterations: 5
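
As a rough guide to reading these coefficients, the sketch below converts the `logMstar` fit into a fitted probability curve and a 50% transition mass. It hard-codes the two coefficients printed for `thefit1a` above so that it runs without the data files; the names `b0`, `b1`, `logMstar_50`, and `logMstar_grid` are illustrative and not part of the notebook. With the fitted object available, `coef(thefit1a)` and `predict(thefit1a, newdata, type = "response")` should give the same information at full precision.

```r
# Coefficients copied from the summary(thefit1a) output above
b0 <- -50.7609   # (Intercept)
b1 <- 5.0024     # slope for logMstar

# Stellar mass where the fitted P+Sh probability is 0.5:
# logit(P) = b0 + b1 * x = 0  =>  x = -b0 / b1
logMstar_50 <- -b0 / b1

# Fitted probabilities on an illustrative grid of log stellar masses
logMstar_grid <- c(9.5, 10.0, 10.5, 11.0)
p_grid <- plogis(b0 + b1 * logMstar_grid)

print(round(logMstar_50, 2))
print(round(p_grid, 3))
```

Under these coefficients the transition falls at $\log (M_{\star} / M_{\odot}) \approx 10.15$.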


### B/P-detection subsample: Logistic regression for single variable: stellar mass, rotation velocity, Hubble type, gas fraction

Logistic regression for the fraction of barred spirals with a Peak+Shoulders (P+Sh) bar profile in the **B/P-detection subsample** (131 galaxies) as a function of stellar mass $\log (M_{\star} / M_{\odot})$, gas rotation velocity ($V_{\rm rot}$), Hubble type $T$, and neutral gas mass fraction $f_{\rm gas} = M_{\rm HI} / M_{\star}$.

Load data into table.

In [5]:
ff2 <- paste(basedir, "PSh_profile-vs-stuff_modinc.dat", sep="")
theTable_profs2 <- read.table(ff2, header=TRUE)

Standard linear logistic regression, now also including regression versus gas rotation velocity

In [6]:
thefit2a = glm(PSh_profile_both ~ logVrot, family = binomial, data = theTable_profs2)
thefit2b = glm(PSh_profile_both ~ logMstar, family = binomial, data = theTable_profs2)
thefit2c = glm(PSh_profile_both ~ t_leda, family = binomial, data = theTable_profs2)
thefit2d = glm(PSh_profile_both ~ logfgas, family = binomial, data = theTable_profs2)
summary(thefit2a)
summary(thefit2b)
summary(thefit2c)
summary(thefit2d)


Call:
glm(formula = PSh_profile_both ~ logVrot, family = binomial, 
    data = theTable_profs2)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.78008  -0.32007  -0.09048   0.11931   2.87568  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -43.028      8.293  -5.188 2.12e-07 ***
logVrot       20.355      3.953   5.150 2.61e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 157.782  on 130  degrees of freedom
Residual deviance:  62.398  on 129  degrees of freedom
AIC: 66.398

Number of Fisher Scoring iterations: 7



Call:
glm(formula = PSh_profile_both ~ logMstar, family = binomial, 
    data = theTable_profs2)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.94581  -0.27808  -0.08522   0.21571   2.79211  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -55.966     10.888   -5.14 2.74e-07 ***
logMstar       5.503      1.073    5.13 2.89e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 157.782  on 130  degrees of freedom
Residual deviance:  65.275  on 129  degrees of freedom
AIC: 69.275

Number of Fisher Scoring iterations: 7



Call:
glm(formula = PSh_profile_both ~ t_leda, family = binomial, data = theTable_profs2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2758  -0.6310  -0.2694   0.5325   2.5824  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   2.7215     0.6332   4.298 1.72e-05 ***
t_leda       -0.7000     0.1234  -5.673 1.40e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 157.78  on 130  degrees of freedom
Residual deviance: 102.20  on 129  degrees of freedom
AIC: 106.2

Number of Fisher Scoring iterations: 5



Call:
glm(formula = PSh_profile_both ~ logfgas, family = binomial, 
    data = theTable_profs2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1321  -0.6611  -0.3862   0.6991   1.8407  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.6407     0.4419  -5.975 2.30e-09 ***
logfgas      -2.2517     0.4460  -5.048 4.45e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 157.78  on 130  degrees of freedom
Residual deviance: 117.51  on 129  degrees of freedom
AIC: 121.51

Number of Fisher Scoring iterations: 5
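
Because the four fits above use the same 131 galaxies, their AIC values can be compared directly. The following is a minimal sketch that ranks them using the AIC values printed above; the names `aic` and `delta_aic` are illustrative. With the fitted objects in memory, `AIC(thefit2a, thefit2b, thefit2c, thefit2d)` should return the same numbers.

```r
# AIC values copied from the four summaries above (theTable_profs2)
aic <- c(logVrot = 66.398, logMstar = 69.275, t_leda = 106.2, logfgas = 121.51)

# Difference relative to the lowest-AIC (preferred) model, sorted
delta_aic <- sort(aic - min(aic))
print(delta_aic)
```

Lower AIC indicates a better trade-off between fit quality and model complexity; since every model here has the same number of parameters, the ranking follows the residual deviances.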


### Full sample (galaxies with $g - r$ only): Logistic regression for single variable: stellar mass, Hubble type, gas fraction, $g - r$

In [11]:
ff4 <- paste(basedir, "PSh_profile-vs-stuff_gmr_modinc.dat", sep="")
theTable_profs4 <- read.table(ff4, header=TRUE)

Standard linear logistic regression: P+Sh fraction versus each variable in turn (one predictor per fit)

In [12]:
thefit4a = glm(PSh_profile_both ~ logMstar, family = binomial, data = theTable_profs4)
thefit4b = glm(PSh_profile_both ~ t_leda, family = binomial, data = theTable_profs4)
thefit4c = glm(PSh_profile_both ~ logfgas, family = binomial, data = theTable_profs4)
thefit4d = glm(PSh_profile_both ~ gmr_sga_tc, family = binomial, data = theTable_profs4)
summary(thefit4a)
summary(thefit4b)
summary(thefit4c)
summary(thefit4d)



Call:
glm(formula = PSh_profile_both ~ logMstar, family = binomial, 
    data = theTable_profs4)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9174  -0.2953  -0.0891   0.2156   2.7293  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -53.540     10.931  -4.898 9.68e-07 ***
logMstar       5.265      1.077   4.887 1.02e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 141.029  on 116  degrees of freedom
Residual deviance:  58.399  on 115  degrees of freedom
AIC: 62.399

Number of Fisher Scoring iterations: 7



Call:
glm(formula = PSh_profile_both ~ t_leda, family = binomial, data = theTable_profs4)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3723  -0.6015  -0.2437   0.4866   2.6575  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   2.9780     0.7073   4.211 2.55e-05 ***
t_leda       -0.7534     0.1392  -5.412 6.23e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 141.03  on 116  degrees of freedom
Residual deviance:  86.33  on 115  degrees of freedom
AIC: 90.33

Number of Fisher Scoring iterations: 5



Call:
glm(formula = PSh_profile_both ~ logfgas, family = binomial, 
    data = theTable_profs4)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2189  -0.6322  -0.3495   0.5624   1.8518  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.8087     0.4969  -5.652 1.59e-08 ***
logfgas      -2.5316     0.5207  -4.862 1.16e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 141.03  on 116  degrees of freedom
Residual deviance: 100.55  on 115  degrees of freedom
AIC: 104.55

Number of Fisher Scoring iterations: 5



Call:
glm(formula = PSh_profile_both ~ gmr_sga_tc, family = binomial, 
    data = theTable_profs4)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0745  -0.4315  -0.2124   0.3706   2.4222  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -12.155      2.086  -5.826 5.67e-09 ***
gmr_sga_tc    19.791      3.513   5.634 1.76e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 141.029  on 116  degrees of freedom
Residual deviance:  70.155  on 115  degrees of freedom
AIC: 74.155

Number of Fisher Scoring iterations: 6
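
The Wald $z$ tests in the summaries can be cross-checked with a likelihood-ratio test, which compares the null and residual deviances. The sketch below applies it to the deviances printed above for `theTable_profs4`, assuming the usual chi-squared approximation with one degree of freedom (116 minus 115). The variable names are illustrative; with a fitted object, `anova(thefit4a, test = "Chisq")` performs the equivalent test.

```r
# Deviances copied from the four summaries above (theTable_profs4)
null_dev  <- 141.029
resid_dev <- c(logMstar = 58.399, t_leda = 86.33,
               logfgas = 100.55, gmr_sga_tc = 70.155)

# Likelihood-ratio statistic: drop in deviance from adding one predictor
lr_stat <- null_dev - resid_dev

# Upper-tail chi-squared probability with 1 degree of freedom
p_lr <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(round(lr_stat, 2))
print(signif(p_lr, 3))
```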


### B/P-detection subsample (galaxies with $g - r$ and $A_{\rm 2, max}$ only): Logistic regression for single variable: stellar mass, Hubble type, gas fraction, $g - r$, $A_{\rm 2, max}$

In [8]:
ff3 <- paste(basedir, "PSh_profile-vs-stuff_gmr+a2max.dat", sep="")
theTable_profs3 <- read.table(ff3, header=TRUE)

Standard linear logistic regression: P+Sh fraction versus each variable in turn (one predictor per fit)

In [10]:
thefit3a = glm(PSh_profile_both ~ logMstar, family = binomial, data = theTable_profs3)
thefit3b = glm(PSh_profile_both ~ t_leda, family = binomial, data = theTable_profs3)
thefit3c = glm(PSh_profile_both ~ logfgas, family = binomial, data = theTable_profs3)
thefit3d = glm(PSh_profile_both ~ gmr_sga_tc, family = binomial, data = theTable_profs3)
thefit3e = glm(PSh_profile_both ~ A2_max, family = binomial, data = theTable_profs3)
summary(thefit3a)
summary(thefit3b)
summary(thefit3c)
summary(thefit3d)
summary(thefit3e)


Call:
glm(formula = PSh_profile_both ~ logMstar, family = binomial, 
    data = theTable_profs3)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1551  -0.4323  -0.1130   0.5149   2.4715  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -47.9261     8.6856  -5.518 3.43e-08 ***
logMstar      4.7454     0.8594   5.522 3.35e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 166.408  on 123  degrees of freedom
Residual deviance:  81.744  on 122  degrees of freedom
AIC: 85.744

Number of Fisher Scoring iterations: 6



Call:
glm(formula = PSh_profile_both ~ t_leda, family = binomial, data = theTable_profs3)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.6035  -0.6149  -0.1916   0.5876   2.7706  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   3.6140     0.7278   4.966 6.85e-07 ***
t_leda       -0.8640     0.1542  -5.604 2.09e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 166.408  on 123  degrees of freedom
Residual deviance:  98.178  on 122  degrees of freedom
AIC: 102.18

Number of Fisher Scoring iterations: 5



Call:
glm(formula = PSh_profile_both ~ logfgas, family = binomial, 
    data = theTable_profs3)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4102  -0.7231  -0.3617   0.7287   1.7626  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.6999     0.4971  -5.432 5.58e-08 ***
logfgas      -2.7109     0.5274  -5.140 2.75e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 166.41  on 123  degrees of freedom
Residual deviance: 117.72  on 122  degrees of freedom
AIC: 121.72

Number of Fisher Scoring iterations: 5



Call:
glm(formula = PSh_profile_both ~ gmr_sga_tc, family = binomial, 
    data = theTable_profs3)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4564  -0.4712  -0.1941   0.3618   2.3410  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -13.336      2.248  -5.933 2.98e-09 ***
gmr_sga_tc    22.748      3.907   5.823 5.78e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 166.408  on 123  degrees of freedom
Residual deviance:  82.182  on 122  degrees of freedom
AIC: 86.182

Number of Fisher Scoring iterations: 6



Call:
glm(formula = PSh_profile_both ~ A2_max, family = binomial, data = theTable_profs3)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9367  -0.8952  -0.6950   1.1486   1.8570  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.1712     0.5053  -4.297 1.73e-05 ***
A2_max        4.0714     1.0930   3.725 0.000195 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 166.41  on 123  degrees of freedom
Residual deviance: 148.66  on 122  degrees of freedom
AIC: 152.66

Number of Fisher Scoring iterations: 4
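
The repeated `glm` calls in this notebook could also be written as a loop over predictor names. The sketch below shows one way to do that with `reformulate()`. To stay self-contained it runs on a **simulated** data frame (`sim`), not on the paper's data: the values are random, and the generating coefficients (-50 and 5) are arbitrary choices for illustration. Only the column names are borrowed from the notebook.

```r
set.seed(1)

# Simulated stand-in for a data table (NOT the real galaxy data)
n <- 200
sim <- data.frame(
  logMstar = runif(n, 9, 11.5),
  t_leda   = runif(n, 0, 9)
)
# Binary outcome drawn from a logistic model in logMstar only
sim$PSh_profile_both <- rbinom(n, 1, plogis(-50 + 5 * sim$logMstar))

# Fit one single-predictor logistic regression per variable
predictors <- c("logMstar", "t_leda")
fits <- lapply(predictors, function(p) {
  glm(reformulate(p, response = "PSh_profile_both"),
      family = binomial, data = sim)
})
names(fits) <- predictors

# Collect the slope and AIC of each fit into one table
comparison <- data.frame(
  predictor = predictors,
  slope     = sapply(fits, function(f) coef(f)[[2]]),
  AIC       = sapply(fits, AIC)
)
print(comparison)
```

To apply this to the real tables, one would replace `sim` with, for example, `theTable_profs3` and extend `predictors` to the columns used above.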
