# Binomial Regression and Logistic Regression
Check how binomial regression on aggregated counts relates to logistic regression on individual 0/1 outcomes.

In [4]:
```python
import pandas as pd
import numpy as np

dat = pd.read_csv('/Users/okada/myWork/R/kubobook_2012/glmm/data_trunc.csv')
dat.head(10)
```

| | N | y | x | id |
|---|---|---|---|---|
| 0 | 8 | 0 | 2 | 1 |
| 1 | 8 | 2 | 3 | 21 |
| 2 | 8 | 6 | 4 | 41 |
| 3 | 8 | 7 | 5 | 61 |
| 4 | 8 | 1 | 6 | 81 |
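
The CSV path above is specific to the author's machine. If that file is not available, a minimal sketch that builds an equivalent DataFrame by hand from the five rows displayed above may help. It is a hypothetical stand-in, not the original file, and it assumes these five rows are the whole dataset (which `dat.head(10)` returning only five rows suggests).

```python
import pandas as pd

# Hypothetical stand-in for data_trunc.csv: the five rows shown in the table,
# typed in by hand so the rest of the notebook can run without the file.
dat = pd.DataFrame({
    'N':  [8, 8, 8, 8, 8],       # trials per group
    'y':  [0, 2, 6, 7, 1],       # successes per group
    'x':  [2, 3, 4, 5, 6],       # covariate
    'id': [1, 21, 41, 61, 81],
})
print(dat.head(10))
```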


In [7]:
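```python
# Number of failures per group: trials minus successes
dat['f'] = dat.N - dat.y
dat.head(10)
```

| | N | y | x | id | f |
|---|---|---|---|---|---|
| 0 | 8 | 0 | 2 | 1 | 8 |
| 1 | 8 | 2 | 3 | 21 | 6 |
| 2 | 8 | 6 | 4 | 41 | 2 |
| 3 | 8 | 7 | 5 | 61 | 1 |
| 4 | 8 | 1 | 6 | 81 | 7 |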


In [6]:
```python
import statsmodels.api as sm
import statsmodels.formula.api as smf
```

In [56]:
```python
# Two-column response: successes (y) and failures (f) for each group
formula_binom = 'y + f ~ x'
glm_binom = smf.glm(formula=formula_binom, data=dat, family=sm.families.Binomial())
res_binom = glm_binom.fit()
res_binom.summary()
```

| Dep. Variable: | ['y', 'f'] | No. Observations: | 5 |
|---|---|---|---|
| Model: | GLM | Df Residuals: | 3 |
| Model Family: | Binomial | Df Model: | 1 |
| Link Function: | logit | Scale: | 1.0000 |
| Method: | IRLS | Log-Likelihood: | -14.796 |
| Date: | Sun, 14 Jul 2019 | Deviance: | 21.186 |
| Time: | 11:38:02 | Pearson chi2: | 18.5 |
| No. Iterations: | 4 | Covariance Type: | nonrobust |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -1.9522 | 1.060 | -1.842 | 0.065 | -4.029 | 0.125 |
| x | 0.3796 | 0.242 | 1.565 | 0.118 | -0.096 | 0.855 |


In [57]:
```python
res_binom.predict(dat[['x']])
```

```
0    0.232707
1    0.307139
2    0.393178
3    0.486399
4    0.580577
dtype: float64
```

In [61]:
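```python
# Inverse logit of the linear predictor, using the rounded coefficients from the summary
1/(1+np.exp(-(-1.9522+0.3796*np.array([2,3,4,5,6]))))
```

```
array([ 0.23272282,  0.30716645,  0.39321929,  0.48645332,  0.58063971])
```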
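
The summary reports `Method: IRLS`. As a cross-check of both the coefficients and the manual inverse-logit predictions, here is a minimal NumPy sketch of IRLS for a binomial GLM with a logit link, run directly on the aggregated counts from the table (values typed in by hand). It illustrates the algorithm under these assumptions; it is not statsmodels' internal code.

```python
import numpy as np

# Aggregated data from the table: y successes out of N trials at each x.
x = np.array([2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 2, 6, 7, 1], dtype=float)
N = np.full(5, 8.0)

X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept and x
beta = np.zeros(2)                         # start at intercept = slope = 0

# IRLS for a binomial GLM with logit link. Because the logit is the canonical
# link, each IRLS step is exactly a Newton-Raphson step on the log-likelihood.
for _ in range(50):
    eta = X @ beta                          # linear predictor
    p = 1.0 / (1.0 + np.exp(-eta))          # inverse logit
    w = N * p * (1.0 - p)                   # working weights
    z = eta + (y - N * p) / w               # working response
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    step_size = np.max(np.abs(beta_new - beta))
    beta = beta_new
    if step_size < 1e-10:
        break

p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities per group
print(beta)
print(p_hat)
```

The converged `beta` should agree with the statsmodels estimates (about -1.9522 and 0.3796) up to rounding, and `p_hat` with `res_binom.predict`.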

The estimated intercept and coefficient are -1.95 and 0.38. Next, we try to reproduce these estimates with logistic regression. This requires reshaping the data: instead of aggregated counts (y successes out of N trials for each x), we need one row per trial with a 0/1 label.
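
Why the reshaping should not change the estimates: for group $i$ with $N_i$ trials, $y_i$ successes and covariate $x_i$, let $y_{ij} \in \{0, 1\}$ be the label of trial $j$, so that $\sum_{j} y_{ij} = y_i$. The binomial log-likelihood and the Bernoulli (logistic) log-likelihood of the expanded data are

$$
\log L_{\text{binom}}(\beta_0, \beta_1) = \sum_i \left[ \log\binom{N_i}{y_i} + y_i \log p_i + (N_i - y_i) \log(1 - p_i) \right],
\qquad p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}
$$

$$
\log L_{\text{Bern}}(\beta_0, \beta_1) = \sum_i \sum_{j=1}^{N_i} \left[ y_{ij} \log p_i + (1 - y_{ij}) \log(1 - p_i) \right]
= \sum_i \left[ y_i \log p_i + (N_i - y_i) \log(1 - p_i) \right]
$$

so $\log L_{\text{binom}} = \log L_{\text{Bern}} + \sum_i \log\binom{N_i}{y_i}$. The extra term does not depend on $(\beta_0, \beta_1)$, so both likelihoods have the same maximizer. Any difference between the two fits must therefore come from the fitting procedure, not from the data format.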

In [10]:
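```python
dat.head(10)
```

| | N | y | x | id | f |
|---|---|---|---|---|---|
| 0 | 8 | 0 | 2 | 1 | 8 |
| 1 | 8 | 2 | 3 | 21 | 6 |
| 2 | 8 | 6 | 4 | 41 | 2 |
| 3 | 8 | 7 | 5 | 61 | 1 |
| 4 | 8 | 1 | 6 | 81 | 7 |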


In [66]:
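```python
# Expand the aggregated counts into one row per trial (8 trials per x).
# For each x, the first y trials are labelled 1 and the remaining N - y are 0.
y = [0,0,0,0,0,0,0,0,   # x = 2: 0 of 8
     1,1,0,0,0,0,0,0,   # x = 3: 2 of 8
     1,1,1,1,1,1,0,0,   # x = 4: 6 of 8
     1,1,1,1,1,1,1,0,   # x = 5: 7 of 8
     1,0,0,0,0,0,0,0]   # x = 6: 1 of 8
x = np.array([[2]*8, [3]*8, [4]*8, [5]*8, [6]*8]).reshape(-1)
dat1 = pd.DataFrame({'y': y, 'x': x})
dat1.to_csv('/Users/okada/myWork/R/kubobook_2012/glmm/data_trunc_non_aggregate.csv', index=False)
dat1.head(10)
```

| | x | y |
|---|---|---|
| 0 | 2 | 0 |
| 1 | 2 | 0 |
| 2 | 2 | 0 |
| 3 | 2 | 0 |
| 4 | 2 | 0 |
| 5 | 2 | 0 |
| 6 | 2 | 0 |
| 7 | 2 | 0 |
| 8 | 3 | 1 |
| 9 | 3 | 1 |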
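
Typing the 0/1 vector by hand works for five groups but is error-prone for larger data. Here is a minimal sketch of the same expansion done programmatically with `np.repeat`, assuming the aggregated columns are laid out as in `dat`; the names `x_agg`, `y_agg`, `N_agg` and `dat1_generic` are hypothetical. Within each x it places the y ones before the N - y zeros, which reproduces the hand-typed ordering.

```python
import numpy as np
import pandas as pd

# Aggregated counts (same values as `dat`): y successes out of N trials at each x.
x_agg = np.array([2, 3, 4, 5, 6])
y_agg = np.array([0, 2, 6, 7, 1])
N_agg = np.array([8, 8, 8, 8, 8])

# One row per trial: repeat each x value N times ...
x_long = np.repeat(x_agg, N_agg)
# ... and, within each group, emit y ones followed by N - y zeros.
y_long = np.concatenate([
    np.repeat([1, 0], [y_i, n_i - y_i]) for y_i, n_i in zip(y_agg, N_agg)
])

dat1_generic = pd.DataFrame({'y': y_long, 'x': x_long})
print(dat1_generic.head(10))
```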


In [45]:
```python
dat1[['x']].shape
```

```
(40, 1)
```

In [47]:
```python
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(dat1[['x']], dat1['y'])
```

```
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
```

In [49]:
```python
lr.intercept_
```

```
array([-0.94407303])
```

In [50]:
```python
lr.coef_
```

```
array([[ 0.15806531]])
```

In [55]:
```python
lr.predict_proba(dat1[['x']])[:,1]
```

```
array([ 0.34797724,  0.34797724,  0.34797724,  0.34797724,  0.34797724,
        0.34797724,  0.34797724,  0.34797724,  0.38464533,  0.38464533,
        0.38464533,  0.38464533,  0.38464533,  0.38464533,  0.38464533,
        0.38464533,  0.42267256,  0.42267256,  0.42267256,  0.42267256,
        0.42267256,  0.42267256,  0.42267256,  0.42267256,  0.46163891,
        0.46163891,  0.46163891,  0.46163891,  0.46163891,  0.46163891,
        0.46163891,  0.46163891,  0.5010797 ,  0.5010797 ,  0.5010797 ,
        0.5010797 ,  0.5010797 ,  0.5010797 ,  0.5010797 ,  0.5010797 ])
```

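The sklearn estimates (intercept -0.944, coefficient 0.158) differ from the binomial GLM estimates (-1.952, 0.380), but the reshaped data is not the reason. `LogisticRegression` applies an L2 penalty by default (`penalty='l2'`, `C=1.0`), and with `solver='liblinear'` the intercept is penalized as well, so both estimates are shrunk toward zero. Fitted without the penalty, logistic regression on the 0/1 data maximizes the same likelihood as the binomial GLM and should return the same estimates.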
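
A minimal NumPy sketch makes the role of the penalty visible, using a hand-written Newton-Raphson solver rather than sklearn. `fit_logistic` and `lam` are hypothetical names. The penalized objective assumes liblinear's formulation: the intercept is treated as an ordinary, penalized feature (`intercept_scaling=1`), and minimizing `0.5 * ||beta||^2 + C * NLL` is equivalent to using `lam = 1 / C` below. With `lam=0` the fit on the 40 rows should recover the binomial GLM estimates (about -1.9522 and 0.3796). With `lam=1` it should land near the sklearn output above (-0.944 and 0.158).

```python
import numpy as np

# The 0/1 data of dat1: each x in 2..6 repeated 8 times, with 0, 2, 6, 7, 1 successes.
x = np.repeat([2.0, 3.0, 4.0, 5.0, 6.0], 8)
y = np.concatenate([np.repeat([1.0, 0.0], [k, 8 - k]) for k in [0, 2, 6, 7, 1]])
X = np.column_stack([np.ones_like(x), x])  # intercept column + x


def fit_logistic(X, y, lam=0.0, n_iter=100, tol=1e-10):
    """Newton-Raphson minimizing  NLL(beta) + 0.5 * lam * ||beta||^2.

    The penalty includes the intercept, as liblinear does (assumption above).
    lam = 0 gives plain maximum likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))           # fitted probabilities
        grad = X.T @ (p - y) + lam * beta                # gradient of the objective
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + lam * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


beta_mle = fit_logistic(X, y, lam=0.0)    # unpenalized logistic regression
beta_ridge = fit_logistic(X, y, lam=1.0)  # L2 with lam = 1 / C, C = 1.0
print("unpenalized:", beta_mle)
print("L2, lam=1:  ", beta_ridge)
```

In recent scikit-learn releases, the default solver is `lbfgs`, which does not penalize the intercept, so penalized results there can differ from the `liblinear` numbers above. Passing `penalty=None` (available from scikit-learn 1.2) removes the penalty entirely.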