This Jupyter notebook assumes that the R kernel for Jupyter (IRkernel) has been installed; see
https://irkernel.github.io/installation/

It also requires the survey package: 
https://cran.r-project.org/package=survey

## Preliminary Setup

In [1]:
library(survey)

Loading required package: grid
Loading required package: Matrix
Loading required package: survival

Attaching package: ‘survey’

The following object is masked from ‘package:graphics’:

    dotchart



Set the following so that it points to the directory with the (text) data files:

In [2]:
basedir <- "./data/"

## Weighted logistic regression for Sample 1: log(M_star) alone

Logistic regression for the fraction of galaxies with bars as a function of stellar mass $\log (M_{\star} / M_{\odot})$, using S4G galaxies in Sample 1 (spirals at $D \leq 25$ Mpc) with stellar
masses between $\log (M_{\star} / M_{\odot}) = 8.5$ and 11, with $V/V_{\rm max}$ weighting to account for the S4G angular diameter limit.
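
As a rough illustration of what $V/V_{\rm max}$ weighting does, the sketch below computes weights for a sample limited by angular diameter. It assumes a simple Euclidean volume. The function `vvmax_weight`, its arguments, and all input values are hypothetical. It is not the procedure used to produce the `weight` column in the data files, which is read in precomputed.

```r
# Sketch of V/Vmax weights for an angular-diameter-limited sample.
# All inputs are made-up illustrative values, not S4G measurements.
vvmax_weight <- function(dist_mpc, theta_arcmin, theta_lim_arcmin, dist_lim_mpc) {
  # Distance at which the galaxy's angular diameter would shrink to the limit
  d_max <- dist_mpc * theta_arcmin / theta_lim_arcmin
  # The galaxy cannot be counted beyond the sample's own distance cut
  d_max <- pmin(d_max, dist_lim_mpc)
  # Weight = (total sample volume) / (volume in which the galaxy is detectable)
  (dist_lim_mpc / d_max)^3
}

w <- vvmax_weight(dist_mpc = c(10, 20, 20),
                  theta_arcmin = c(5.0, 1.5, 1.1),
                  theta_lim_arcmin = 1.0,
                  dist_lim_mpc = 25)
print(w)
```

Galaxies that would pass the diameter limit anywhere in the volume get weight 1; those that would drop out before the distance cut are up-weighted.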

Load data into table and then Survey-package design object

In [3]:
ff <- paste(basedir, "barpresence_vs_logmstar_for_R_w25_m8.5-11.txt", sep="")
logmstarBarWTable <- read.table(ff, header=TRUE)
logmstarBarWDesign <- svydesign(ids=~0, data=logmstarBarWTable, weights=~weight)
length(logmstarBarWTable$bar)
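
The code above implies that the data file is a whitespace-delimited text table with a header row and at least the columns `bar`, `logmstar`, and `weight`. A quick way to check a file before building the design object is sketched below. The temporary file and its three rows are invented for illustration, and the check assumes `bar` is coded 0/1.

```r
# Write a tiny made-up table in the layout read.table(header=TRUE) expects
tmp <- tempfile(fileext = ".txt")
writeLines(c("bar logmstar weight",
             "1 10.2 1.00",
             "0 9.1 2.35",
             "1 9.8 1.10"), tmp)

tab <- read.table(tmp, header = TRUE)

# Columns used by the formulas and the weights argument in this notebook
needed <- c("bar", "logmstar", "weight")
missing_cols <- setdiff(needed, names(tab))
if (length(missing_cols) > 0) {
  stop("missing columns: ", paste(missing_cols, collapse = ", "))
}

# Assumed conventions: bar coded 0/1, weights strictly positive
ok <- all(tab$bar %in% c(0, 1)) && all(tab$weight > 0)
print(ok)
```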

Standard linear logistic regression: bar fraction versus log of stellar mass

In [4]:
logMstarWFit1 <- svyglm(bar ~ logmstar, design=logmstarBarWDesign, family=quasibinomial)
summary(logMstarWFit1)


Call:
svyglm(formula = bar ~ logmstar, design = logmstarBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstarBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.4487     1.8675  -0.776    0.438
logmstar      0.1934     0.1897   1.019    0.308

(Dispersion parameter for quasibinomial family taken to be 1.00235)

Number of Fisher Scoring iterations: 4
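
The coefficients above are on the logit scale, so turning them into a bar fraction requires the inverse-logit (logistic) function. A minimal sketch using the rounded estimates printed above; the grid of masses is an arbitrary choice:

```r
# Rounded coefficients copied from the summary output above
b0 <- -1.4487
b1 <- 0.1934

# Arbitrary grid of log stellar masses within the fitted range
logmstar_grid <- seq(8.5, 11, by = 0.5)

# plogis() is the inverse logit: 1 / (1 + exp(-x))
fbar_linear <- plogis(b0 + b1 * logmstar_grid)
print(round(fbar_linear, 3))
```

With the fitted object available, `predict(logMstarWFit1, newdata=..., type="response")` should give the same curve without the rounding.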


Quadratic logistic regression: bar fraction versus log of stellar mass plus its square

In [5]:
logMstarWFit2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=logmstarBarWDesign, family=quasibinomial)
summary(logMstarWFit2)


Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2), design = logmstarBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstarBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -83.7853    25.6579  -3.265 0.001160 ** 
logmstar       17.3692     5.2634   3.300 0.001028 ** 
I(logmstar^2)  -0.8911     0.2690  -3.313 0.000984 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.00151)

Number of Fisher Scoring iterations: 4
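
Because the coefficient of the squared term is negative, the fitted bar fraction rises to a maximum and then declines. The sketch below locates that maximum from the rounded coefficients printed above. The large terms nearly cancel, so the peak fraction computed from rounded values is only approximate.

```r
# Rounded coefficients copied from the summary output above
a  <- -83.7853
b  <- 17.3692
c2 <- -0.8911

# Vertex of the parabola a + b*x + c2*x^2 (a maximum, since c2 < 0)
logmstar_peak <- -b / (2 * c2)

# Bar fraction at the peak, via the inverse logit
fbar_peak <- plogis(a + b * logmstar_peak + c2 * logmstar_peak^2)

print(c(logmstar_peak = logmstar_peak, fbar_peak = fbar_peak))
```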


### Comparison of AIC values

In [6]:
AIC(logMstarWFit1)
AIC(logMstarWFit2)

In [7]:
747.73 - 762.586

#### Summary

Since the quadratic fit has $\Delta$AIC $= -14.9$ relative to the linear fit, it is clearly preferred.
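
One common way to read a $\Delta$AIC is through the relative likelihood $\exp(-\Delta_i / 2)$ and the corresponding Akaike weights. The sketch below applies that standard formula to the two AIC values used in the subtraction above. It is an interpretive aid, not part of the original analysis.

```r
# AIC values as used in the subtraction above
aic <- c(linear = 762.586, quadratic = 747.73)

# Difference of each model from the best (lowest-AIC) model
delta <- aic - min(aic)

# Relative likelihood of each model and normalized Akaike weights
rel_lik <- exp(-delta / 2)
akaike_w <- rel_lik / sum(rel_lik)

print(round(delta, 2))
print(signif(akaike_w, 3))
```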

## Weighted logistic regression for Sample 1: f(bar) vs log(M_star) and g-r

Same as previous section, but now we do logistic regression versus both stellar mass and $g - r$ color, using a subsample
of Sample 1 galaxies with color data.

In [8]:
ff <- paste(basedir, "barpresence_vs_logmstar-gmr_for_R_w25.txt", sep="")
logmstargmrBarWTable <- read.table(ff, header=TRUE)
gmrBarWDesign <- svydesign(ids=~0, data=logmstargmrBarWTable, weights=~weight)
length(logmstargmrBarWTable$bar)

### Linear fit of $f_{\rm bar}$ vs just $g - r$

In [9]:
gmrWFit_gmr <- svyglm(bar ~ gmr, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_gmr)


### Fit vs just logMstar for same sample: linear, then quadratic

In [10]:
# same sample, vs logmstar (linear) only
gmrWFit_logmstar <- svyglm(bar ~ logmstar, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_logmstar)


In [11]:
# same sample, vs logmstar (quadratic) only
gmrWFit_logmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_logmstar2)


### Finally, fit vs logMstar (quadratic) *and* g-r

In [12]:
gmrWFit_gmrlogmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2) + gmr, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_gmrlogmstar2)


### Comparison of AIC values

In [13]:
AIC(gmrWFit_gmr)
AIC(gmrWFit_logmstar)
AIC(gmrWFit_logmstar2)
AIC(gmrWFit_gmrlogmstar2)

#### Summary

The best fit from an AIC standpoint is the quadratic logMstar fit (*without* $g - r$); note that its AIC is actually *lower*
than the AIC for the quadratic logMstar + $g - r$ fit.
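
An added predictor can raise AIC because of the parameter penalty: for nested fits, one extra coefficient lowers AIC only if it reduces the deviance by more than 2. The sketch below demonstrates this with ordinary `glm` on simulated data; the variables `x`, `z`, and `y` and the generating coefficients are invented. As far as I understand, the AIC that the survey package reports for `svyglm` fits uses a design-adjusted penalty, so this is an analogy rather than a reproduction of the numbers above.

```r
set.seed(1)
n <- 500
x <- runif(n, 8.5, 11)          # stand-in for a mass-like predictor
z <- rnorm(n)                   # irrelevant predictor, unrelated to y
eta <- -84 + 17.4 * x - 0.89 * x^2
y <- rbinom(n, 1, plogis(eta))  # simulated 0/1 outcome

fit_small <- glm(y ~ x + I(x^2), family = binomial)
fit_big   <- glm(y ~ x + I(x^2) + z, family = binomial)

# For 0/1 data: AIC = deviance + 2 * (number of coefficients)
dev_drop  <- deviance(fit_small) - deviance(fit_big)
delta_aic <- AIC(fit_big) - AIC(fit_small)

print(c(dev_drop = dev_drop, delta_aic = delta_aic))
```

Here `delta_aic` equals `2 - dev_drop`, so the larger model is preferred only when the extra term buys a deviance reduction greater than 2.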

## Weighted logistic regression for Sample 1: f(bar) vs log(M_star) and log(f_gas)

Same as previous section, but now we do logistic regression versus both log of stellar mass and log of gas mass ratio $f_{\rm gas} = M_{\rm HI} / M_{\star}$, using a subsample
of Sample 1 galaxies with H I data.
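
For reference, the log of the gas mass ratio is just the difference of the log masses. A minimal sketch, assuming base-10 logarithms and using made-up masses (the data file already supplies a `logfgas` column, so this step is not needed to run the notebook):

```r
# Made-up log10 masses in solar units, for illustration only
log_mstar <- c(9.0, 10.0, 10.8)
log_mhi   <- c(9.2, 9.5, 9.3)

# log10(f_gas) = log10(M_HI / M_star) = log10(M_HI) - log10(M_star)
logfgas <- log_mhi - log_mstar
print(logfgas)
```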

In [14]:
# data file is read from basedir (set in In [2])
ff <- paste(basedir, "barpresence_vs_logmstar-logfgas_for_R_w25.txt", sep="")
logMstarfgasBarWTable <- read.table(ff, header=TRUE)
logMstarfgasBarWDesign <- svydesign(ids=~0, data=logMstarfgasBarWTable, weights=~weight)
length(logMstarfgasBarWTable$bar)

### Fit vs just log(f_gas)

In [15]:
logMstarlogfgasWFit_fgas <- svyglm(bar ~ logfgas, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_fgas)


### Fit vs just logMstar: linear, then quadratic

In [16]:
logMstarlogfgasWFit_logmstar <- svyglm(bar ~ logmstar, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_logmstar)


In [17]:
logMstarlogfgasWFit_logmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_logmstar2)


### Finally, fit vs logMstar (quadratic) *and* log(f_gas)

In [18]:
logMstarlogfgasWFit_fgaslogmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2) + logfgas, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_fgaslogmstar2)


### Comparison of AIC values

In [19]:
AIC(logMstarlogfgasWFit_fgas)
AIC(logMstarlogfgasWFit_logmstar)
AIC(logMstarlogfgasWFit_logmstar2)
AIC(logMstarlogfgasWFit_fgaslogmstar2)

#### Summary

The quadratic fit using logMstar (without log f_gas) is clearly the best model.