## Generalized Linear Models
Based on: https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/glm.ipynb

In [None]:
import numpy as np
import statsmodels.api as sm
from scipy import stats
from matplotlib import pyplot as plt

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

### GLM: Binomial response data

#### Load data

This example uses the Star98 dataset, taken with permission from Jeff Gill (2000), Generalized linear models: A unified approach. Codebook information can be obtained by typing:

In [None]:
print(sm.datasets.star98.NOTE)

Load the data and add a constant to the exogenous (independent) variables:

In [None]:
data = sm.datasets.star98.load()
data.exog = sm.add_constant(data.exog, prepend=False)
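
To make the constant step concrete, here is a minimal plain-Python sketch of what appending a constant column amounts to. The helper `append_constant` and the numbers are hypothetical, chosen only for illustration; this is not the statsmodels implementation.

```python
# Hypothetical 3x2 design matrix (illustrative values, not Star98 data).
exog = [
    [0.5, 1.2],
    [1.5, 0.7],
    [2.5, 3.1],
]


def append_constant(rows):
    """Return a copy of rows with a trailing 1.0 in each row (intercept column)."""
    return [list(row) + [1.0] for row in rows]


exog_with_const = append_constant(exog)
for row in exog_with_const:
    print(row)
```

With `prepend=False`, the column of ones goes last rather than first, which is why the sketch appends to the end of each row.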

The dependent variable is an N by 2 array of counts (success: NABOVE, failure: NBELOW). The model uses all the other (independent) variables to predict the proportion of successes, NABOVE / (NABOVE + NBELOW). Here are the first 5 rows of the dependent variable:

In [None]:
# np.asarray keeps the slice working whether load() returns arrays or pandas objects
print(np.asarray(data.endog)[:5, :])
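
As a rough illustration of how a two-column (success, failure) response relates to a proportion, the sketch below computes the number of trials and the observed success rate for each row. The counts and the helper `observed_proportions` are hypothetical and are not taken from Star98.

```python
# Hypothetical (successes, failures) counts; not taken from Star98.
endog = [
    (40, 10),
    (15, 35),
    (25, 25),
]


def observed_proportions(counts):
    """Return (trials, proportion of successes) for each (success, failure) pair."""
    result = []
    for successes, failures in counts:
        trials = successes + failures
        result.append((trials, successes / trials))
    return result


for trials, proportion in observed_proportions(endog):
    print(trials, proportion)
```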

The independent variables include all the other variables described in the codebook above, as well as the interaction terms. Let's examine the first 2 rows:

In [None]:
# np.asarray keeps the slice working whether exog is an array or a pandas object
print(np.asarray(data.exog)[:2, :])

#### Fit and summary

Let's fit a Generalized Linear Model and examine the results. We'll compare it to the results we get from OLS models.

In [None]:
glm_binom = sm.GLM(data.endog, data.exog, family=sm.families.Binomial())
res = glm_binom.fit()
print(res.summary())

There are 2 dependent (endogenous) variables, the outcomes: y1 and y2. There are 20 independent (exogenous) variables, the inputs: x1 through x20. The summary reports a coefficient for each independent variable, along with its standard error, z statistic, p-value, and confidence interval. The const coefficient plays the role of the y-intercept in linear regression, here on the scale of the link function.
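
To see how coefficients translate into a predicted probability, here is a small sketch that assumes the logit link (the default for the Binomial family). The coefficient values, inputs, and helper names are hypothetical; they are not results from the fitted model.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_probability(coefs, const, x):
    """Probability of success for one observation under a logit link."""
    eta = const + sum(b * xi for b, xi in zip(coefs, x))
    return inverse_logit(eta)


# Hypothetical coefficients and inputs, chosen only for illustration.
coefs = [0.8, -0.5]
const = 0.1
x = [1.0, 2.0]

p = predict_probability(coefs, const, x)
print(round(p, 4))
```

A positive coefficient raises the linear predictor and therefore the predicted probability; a negative one lowers it. The effect on the probability itself is nonlinear because of the link.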

Just as we have done before, let's examine a residual plot.

In [None]:
fig, ax = plt.subplots()
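endog = np.asarray(data.endog)  # works for both array and pandas inputs
y = endog[:, 0] / endog.sum(axis=1)  # observed proportion of successes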

ax.scatter(y, res.resid_pearson)
ax.hlines(0, 0, 1)
ax.set_xlim(0, 1)
ax.set_title('Residual Plot')
ax.set_ylabel('Pearson Residuals')
ax.set_xlabel('y values')
plt.show()
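
For reference, the textbook Pearson residual for a binomial count divides the gap between observed and expected successes by the binomial standard deviation. The sketch below uses that standard formula with hypothetical counts and fitted probabilities; it is meant to illustrate the quantity on the vertical axis, not to reproduce the statsmodels internals.

```python
import math


def pearson_residual(successes, trials, mu):
    """Pearson residual for a binomial count with fitted success probability mu."""
    expected = trials * mu
    std_dev = math.sqrt(trials * mu * (1.0 - mu))
    return (successes - expected) / std_dev


# Hypothetical observations: (successes, trials, fitted probability).
observations = [
    (40, 50, 0.7),
    (15, 50, 0.4),
    (25, 50, 0.5),
]

residuals = [pearson_residual(s, n, mu) for s, n, mu in observations]
for r in residuals:
    print(round(r, 3))
```

If the model fits well, residuals on this scale should scatter around zero with roughly constant spread and no visible trend.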

Interpret this residual plot. Do you see what you would normally want to see in a residual plot?