### Generalized Linear Model (GLM)

GLM was an effort by John Nelder and Robert Wedderburn to unify commonly used statistical models, such as linear, logistic, and Poisson regression.

| Family | Description |
|---|---|
| Binomial | Target variable is a binary response |
| Poisson | Target variable is a count of occurrences |
| Gaussian | Target variable is a continuous number |
| Gamma | Target variable is positive and continuous, such as the waiting time between Poisson-distributed events |
| InverseGaussian | Target variable is positive and continuous, with a tail that decreases more slowly than that of the normal distribution; the name reflects the inverse relationship between the time required to cover a unit distance and the distance covered in unit time |
| NegativeBinomial | Target variable is the number of successes in a sequence of trials before a specified number of failures occurs |
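
The families listed above all plug into the same structure, which is what lets one framework cover them. As a sketch in standard GLM notation (the symbols $\eta_i$, $\mu_i$, $g$ and $\beta_j$ are introduced here and are not from the original text), each model combines a linear predictor, a link function, and a distribution family for the target:

$$
\eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}, \qquad g(\mu_i) = \eta_i, \qquad \mu_i = E[y_i \mid x_i]
$$

With a Gaussian family and the identity link $g(\mu) = \mu$ this is ordinary linear regression; with a Binomial family and the logit link $g(\mu) = \log\frac{\mu}{1-\mu}$ it is logistic regression; with a Poisson family and the log link $g(\mu) = \log\mu$ it is Poisson regression.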

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# importing linear regression function
import sklearn.linear_model as lm

# function to calculate r-squared, MAE, RMSE
from sklearn.metrics import r2_score , mean_absolute_error, mean_squared_error

%matplotlib inline

### Load data
Let's consider a use case where we have to predict a student's test grade based on hours studied.

In this case the outcome to be predicted is a continuous number, so the Gaussian family applies.

In [2]:
# Load data

df = pd.read_csv('Data/Grade_Set_1.csv')
print(df)

   Hours_Studied  Test_Grade
0              2          57
1              3          66
2              4          73
3              5          76
4              6          79
5              7          81
6              8          90
7              9          96
8             10         100
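
In case `Data/Grade_Set_1.csv` is not at hand, the frame printed above is small enough to rebuild inline. The sketch below copies the nine rows from that output; the column names are the ones shown there, and nothing else about the file is assumed.

```python
import pandas as pd

# Values copied from the printed frame above; the CSV file itself is not needed.
df = pd.DataFrame({
    "Hours_Studied": [2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Test_Grade": [57, 66, 73, 76, 79, 81, 90, 96, 100],
})
print(df)
```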


In [3]:
print('####### Linear Regression Model ########')
# Create linear regression object
lr = lm.LinearRegression()

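# .values gives a NumPy array; recent pandas versions no longer allow
# multi-dimensional indexing such as [:, np.newaxis] directly on a Series
x = df.Hours_Studied.values[:, np.newaxis]  # independent variable, shape (n, 1)
y = df.Test_Grade.values                    # dependent variable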

# Train the model using the training sets
lr.fit(x, y)

print("Intercept: ", lr.intercept_)
print("Coefficient: ", lr.coef_)

print('\n####### Generalized Linear Model ########')
import statsmodels.api as sm

# To be able to run GLM, we'll have to add the intercept constant to x variable
x = sm.add_constant(x, prepend=False)

# Instantiate a gaussian family model with the default link function.
model = sm.GLM(y, x, family = sm.families.Gaussian())
model = model.fit()
print(model.summary())

####### Linear Regression Model ########
Intercept:  49.6777777778
Coefficient:  [ 5.01666667]

####### Generalized Linear Model ########
                 Generalized Linear Model Regression Results                  
Dep. Variable:                      y   No. Observations:                    9
Model:                            GLM   Df Residuals:                        7
Model Family:                Gaussian   Df Model:                            1
Link Function:               identity   Scale:                    5.3626984127
Method:                          IRLS   Log-Likelihood:                -19.197
Date:                Sun, 25 Dec 2016   Deviance:                       37.539
Time:                        21:27:42   Pearson chi2:                     37.5
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------
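
For a Gaussian family with the identity link, maximizing the likelihood is the same as minimizing the sum of squared residuals, so the GLM estimate should coincide with ordinary least squares. As a rough cross-check that uses only NumPy, the sketch below solves the least-squares problem directly on the nine data points printed earlier. The variable names (`hours`, `grade`, `X`, `beta`) are chosen here for illustration.

```python
import numpy as np

# Data copied from the printed frame (Hours_Studied, Test_Grade).
hours = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
grade = np.array([57, 66, 73, 76, 79, 81, 90, 96, 100], dtype=float)

# Design matrix with the slope column first and the constant column last,
# the same layout that add_constant(..., prepend=False) produces.
X = np.column_stack([hours, np.ones_like(hours)])

# Least-squares solution of X @ beta ~= grade.
beta, *_ = np.linalg.lstsq(X, grade, rcond=None)
slope, intercept = beta

print("Slope:", round(slope, 4))
print("Intercept:", round(intercept, 4))
```

The slope and intercept agree with the `Coefficient` and `Intercept` values printed for the linear regression model above.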

Note that the coefficients are the same for both linear regression and GLM. However, GLM can be used for other distributions, such as binomial or Poisson, just by changing the family parameter.