# Generalized Additive Models (GAMs)

### **Linear and Nonlinear Models**

##### **Linear Models (ARIMA and VAR)**
- Make strong assumptions about the relationships between dependent and independent variables
- But they are easily interpretable

##### **Non-linear models**
- Reduce (or eliminate) these assumptions
- But this is often done at the cost of interpretability

### **Non-Linear Modeling**
Non-linear models can be written generally as
- $y = g(x) + \epsilon$

where $g(\cdot)$ can be any function
- Tremendous flexibility
- Usually difficult to interpret

### **Generalized Additive Models**
GAMs give us much of the flexibility of non-linear models without giving up interpretability.
- Each regressor's effect on the dependent variable is modeled as its own function.
- Since the model is additive, interpretation is straightforward, and the effect of each regressor can be isolated
    - Features are additively separable, so the effect of one regressor does not depend on the value of any other
    - Each function corresponds to one and only one exogenous regressor

In a GAM, $y$ is the sum of one function per regressor, plus an error term
$$
y = \sum^n_{i=1} f_i(x_i) + \epsilon
$$
For two regressors, this could be expressed as
$$
y = f_1(x_1) + f_2(x_2) + \epsilon
$$
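To make additive separability concrete, here is a minimal sketch. The component functions `f1` and `f2` are hypothetical choices made only for illustration; they are not estimated from any data. The sketch checks that moving $x_1$ changes the mean of $y$ by the same amount no matter where $x_2$ is held.

```python
import numpy as np

# Hypothetical component functions, chosen only for illustration
def f1(x1):
    return np.sin(x1)

def f2(x2):
    return 0.5 * x2 ** 2

def additive_mean(x1, x2):
    """Noise-free part of the additive model y = f1(x1) + f2(x2) + eps."""
    return f1(x1) + f2(x2)

x1_grid = np.linspace(0.0, 3.0, 7)

# Change in the mean of y as x1 moves along the grid, at two different x2 levels
effect_low = additive_mean(x1_grid, -1.0) - additive_mean(x1_grid[0], -1.0)
effect_high = additive_mean(x1_grid, 2.0) - additive_mean(x1_grid[0], 2.0)

# The two effect curves coincide: the effect of x1 does not depend on x2
print(np.allclose(effect_low, effect_high))
```

With an interaction term such as $x_1 x_2$ in the model, the two curves would differ, which is exactly what the additive structure rules out.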

### **Non-linearity and Smoothness**
- With GAMs, take care not to overfit your model
- The true test comes when we fit a model and use it to make predictions out-of-sample
- In sample, we can never do worse by applying a more complex functional form
- Out of sample, excess complexity can ruin our predictions
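The in-sample versus out-of-sample distinction can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated (a sine curve plus noise), and polynomials of increasing degree stand in for "more complex functional forms"; neither comes from the examples in these notes. Because the polynomial models are nested, in-sample SSE cannot rise as the degree grows, while out-of-sample SSE carries no such guarantee.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed data-generating function, for illustration only
    return np.sin(2 * np.pi * x)

# Simulated training and hold-out samples
x_train = rng.uniform(0, 1, 40)
y_train = true_f(x_train) + rng.normal(0, 0.3, 40)
x_test = rng.uniform(0, 1, 40)
y_test = true_f(x_test) + rng.normal(0, 0.3, 40)

def sse(y, y_hat):
    """Sum of squared errors."""
    return float(np.sum((y - y_hat) ** 2))

degrees = (1, 3, 5, 9)
train_sse, test_sse = [], []
for degree in degrees:
    # Least-squares polynomial fit on the training sample only
    model = Polynomial.fit(x_train, y_train, degree)
    train_sse.append(sse(y_train, model(x_train)))
    test_sse.append(sse(y_test, model(x_test)))
    print(f"degree {degree}: in-sample SSE = {train_sse[-1]:.2f}, "
          f"out-of-sample SSE = {test_sse[-1]:.2f}")
```

Comparing the two columns across degrees shows whether the extra flexibility is paying off on data the model has not seen.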

### GAM Fitting Procedure
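If we want to fit an additive model, we need a loss function that we can optimize. For one regressor, the model is
$$
y = a + f(x) + \epsilon
$$
and we choose $a$ and $f$ to minimize the sum of squared errors (SSE) over the $N$ observations, indexed by $j$
$$
SSE = \sum^N_{j=1} \left(y_j - a - f(x_j)\right)^2
$$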

### Choosing GAM Smoothness
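In addition to minimizing the SSE term, we need to include a term that regulates how smooth our function is, penalizing the model for "less smooth" (more wiggly) functional forms. A penalty weight controls the trade-off between fit and smoothness. In pyGAM this weight is the `lam` parameter, and `gridsearch` tries several values of it.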
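One common way to write such a penalized loss (an assumption here; libraries differ in exactly how they measure roughness) is

$$
\sum^N_{j=1} \left(y_j - a - f(x_j)\right)^2 + \lambda \int f''(x)^2 \, dx
$$

where $\lambda \geq 0$ is the penalty weight. The sketch below implements a discrete analogue rather than pyGAM's own procedure: the fitted values at equally spaced $x$ are chosen to minimize SSE plus $\lambda$ times the sum of squared second differences. The function names `penalized_smooth` and `roughness` are hypothetical.

```python
import numpy as np

def penalized_smooth(y, lam):
    """Minimize sum((y - f)**2) + lam * sum(second differences of f, squared)
    over fitted values f at equally spaced, ordered x values."""
    n = len(y)
    # (n-2) x n second-difference matrix: (D @ f)[k] = f[k+2] - 2*f[k+1] + f[k]
    D = np.diff(np.eye(n), n=2, axis=0)
    # First-order condition of the penalized problem: (I + lam * D'D) f = y
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

def roughness(f):
    """Sum of squared second differences, the discrete roughness measure."""
    return float(np.sum(np.diff(f, n=2) ** 2))

# Simulated data, for illustration only
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

lams = (0.0, 1.0, 100.0)
sse_by_lam, rough_by_lam = [], []
for lam in lams:
    f_hat = penalized_smooth(y, lam)
    sse_by_lam.append(float(np.sum((y - f_hat) ** 2)))
    rough_by_lam.append(roughness(f_hat))
    print(f"lam = {lam}: SSE = {sse_by_lam[-1]:.3f}, roughness = {rough_by_lam[-1]:.3f}")
```

With `lam = 0` the fit reproduces the data exactly. Raising `lam` trades a larger SSE for a smoother curve, which is the trade-off a search over penalty weights is meant to resolve.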

## **GAM in Python**

In [63]:
from pygam import LinearGAM, s, f, l
import pandas as pd
import patsy as pt
import numpy as np
from plotly import tools
import plotly.offline as py
import plotly.graph_objs as go

In [65]:
data = pd.read_csv('https://github.com/dustywhite7/Econ8310/raw/master/DataSets/HappinessWorld.csv')
eqn = "happiness ~ -1 + freedom + family + year + economy + health + trust"
y, x = pt.dmatrices(eqn, data=data)

# Initialize and fit model
gam = LinearGAM(s(0) + l(1) + s(2) + s(3) + s(4) + s(5))
gam.gridsearch(np.asarray(x), y)


LinearGAM(callbacks=[Deviance(), Diffs()], fit_intercept=True, 
   max_iter=100, scale=None, 
   terms=s(0) + l(1) + s(2) + s(3) + s(4) + s(5) + intercept, 
   tol=0.0001, verbose=False)

In [66]:
titles = ['freedom', 'family', 'year', 'economy', 'health', 'trust']

# plotly.tools.make_subplots is deprecated; import the replacement instead
from plotly.subplots import make_subplots

# Create the subplots in a 2x3 grid
fig = make_subplots(rows=2, cols=3, subplot_titles=titles)
# Dictate the size of the figure, title, etc.
fig['layout'].update(height=800, width=1000, title='pyGAM', showlegend=False)

# Loop over the titles, and create the corresponding figures
for i, title in enumerate(titles):
    # Create the grid over which to estimate the effect of the i-th term
    XX = gam.generate_X_grid(term=i)
    # Calculate the partial dependence and its 95% confidence interval on that grid
    # This is the expected effect on the dependent variable for a given value of x
    pdep, confi = gam.partial_dependence(term=i, X=XX, width=.95)

    # Create the effect and confidence interval traces (there are 3 total)
    trace = go.Scatter(x=XX[:, i], y=pdep, mode='lines', name='Effect')
    ci1 = go.Scatter(x=XX[:, i], y=confi[:, 0], line=dict(dash='dash', color='grey'), name='95% CI')
    ci2 = go.Scatter(x=XX[:, i], y=confi[:, 1], line=dict(dash='dash', color='grey'), name='95% CI')

    # Terms 0-2 go in row 1 and terms 3-5 in row 2 (plotly rows and columns start at 1)
    row, col = divmod(i, 3)
    for tr in (trace, ci1, ci2):
        fig.add_trace(tr, row=row + 1, col=col + 1)

# Plot the figure
py.iplot(fig)



### **GAM with Chicago Bus Ride Data**

In [67]:
data = pd.read_csv('https://github.com/dustywhite7/Econ8310/raw/master/DataSets/chicagoBusRiders.csv')
# Copy the subset so that new columns can be added without a SettingWithCopyWarning
route3 = data[data['route'] == '3'][['date', 'rides']].copy()
route3['date'] = pd.to_datetime(route3['date'])
# day_of_week runs from 0 (Monday) to 6 (Sunday)
route3['weekday'] = route3['date'].dt.day_of_week
route3['day_year'] = route3['date'].dt.day_of_year
route3['year'] = route3['date'].dt.year

eqn = 'rides ~ -1 + year + weekday + day_year'
y, x = pt.dmatrices(eqn, data=route3)

In [68]:
gam = LinearGAM(s(0) + s(1) + s(2))
gam = gam.gridsearch(np.asarray(x), y)


In [69]:
titles = ['year', 'weekday', 'day_year']

# plotly.tools.make_subplots is deprecated; import the replacement instead
from plotly.subplots import make_subplots

# Create the subplots in a single-row grid
fig = make_subplots(rows=1, cols=3, subplot_titles=titles)
# Dictate the size of the figure, title, etc.
fig['layout'].update(height=400, width=1000, title='pyGAM', showlegend=False)

# Loop over the titles, and create the corresponding figures
for i, title in enumerate(titles):
    # Create the grid over which to estimate the effect of the i-th term
    XX = gam.generate_X_grid(term=i)
    # Calculate the partial dependence and its 95% confidence interval on that grid
    # This is the expected effect on the dependent variable for a given value of x
    pdep, confi = gam.partial_dependence(term=i, X=XX, width=.95)

    # Create the effect and confidence interval traces (there are 3 total)
    trace = go.Scatter(x=XX[:, i], y=pdep, mode='lines', name='Effect')
    ci1 = go.Scatter(x=XX[:, i], y=confi[:, 0], line=dict(dash='dash', color='grey'), name='95% CI')
    ci2 = go.Scatter(x=XX[:, i], y=confi[:, 1], line=dict(dash='dash', color='grey'), name='95% CI')

    # All three terms share row 1 (plotly rows and columns start at 1)
    for tr in (trace, ci1, ci2):
        fig.add_trace(tr, row=1, col=i + 1)

# Plot the figure
py.iplot(fig)



### **Using the Model to Make a Forecast**

In [70]:
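# Each row is [year, weekday, day_year], matching the column order of the model formula
# weekday was built with dt.day_of_week, so the training data only contain 0 (Monday) through 6 (Sunday);
# the third row uses weekday 7, so its prediction extrapolates outside the observed range
new = [
    [2010, 6, 256],
    [2015, 2, 157],
    [2013, 7, 361],
    [2016, 4, 12]
]
gam.predict(new)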

array([11924.70476988, 19612.09653813,   325.75895394, 17507.52236945])
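Typing forecast rows by hand makes it easy to enter a weekday that does not match the date. As a minimal sketch, the hypothetical helper `date_to_features` below builds a row in the same `[year, weekday, day_year]` order from a calendar date, using only the standard library. `date.weekday()` uses the same Monday = 0 to Sunday = 6 convention as pandas' `dt.day_of_week`.

```python
from datetime import date

def date_to_features(d):
    """Return [year, weekday, day_year] for a datetime.date.

    weekday is 0 (Monday) through 6 (Sunday); day_year starts at 1 on January 1.
    """
    return [d.year, d.weekday(), d.timetuple().tm_yday]

# Example dates chosen for illustration
new_rows = [date_to_features(d) for d in (date(2016, 1, 12), date(2015, 6, 6))]
print(new_rows)  # [[2016, 1, 12], [2015, 5, 157]]
```

The resulting `new_rows` list has the same shape as `new` above, so it could be passed to `gam.predict` in the same way.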