# Generalized Additive Models (GAMs)

### **Linear and Nonlinear Models**

##### **Linear Models (ARIMA and VAR)**
- Make strong assumptions about the relationships between dependent and independent variables
- But they are easily interpretable

##### **Non-linear models**
- Reduce (or eliminate) these assumptions
- But this is often done at the cost of interpretability

### **Non-Linear Modeling**
Non-linear models can be written generally as
- $y = g(x) + \epsilon$

where $g(\cdot)$ can be any function
- Tremendous flexibility
- Usually difficult to interpret

### **Generalized Additive Models**
GAMs give us much of the flexibility of non-linear models without the difficulty of interpretation.
- Each regressor's effect on the dependent variable is modeled as its own function.
- Since the model is additive, interpretation is straightforward, and each regressor's effect can be isolated
    - Features are additively separable, so the effect of one feature (the slope in its direction) does not depend on the values of the other features
    - Each function corresponds to one and only one exogenous regressor
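The separability claim can be checked numerically. In the sketch below, `f1` and `f2` are hypothetical component functions chosen only for illustration: moving $x_1$ from 0 to 1 changes the additive model's prediction by the same amount whatever the value of $x_2$, while in a model with an interaction between the two regressors the same move has an effect that depends on $x_2$.

```python
import math

# Hypothetical component functions, chosen only for illustration
def f1(x1):
    return math.sin(x1)

def f2(x2):
    return x2 ** 2

def additive(x1, x2):
    # GAM-style: each regressor enters through its own function
    return f1(x1) + f2(x2)

def interaction(x1, x2):
    # Not additive: the effect of x1 is scaled by x2
    return math.sin(x1) * x2 ** 2

def effect_of_x1(model, x1_from, x1_to, x2):
    """Change in the prediction when x1 moves, holding x2 fixed."""
    return model(x1_to, x2) - model(x1_from, x2)

for x2 in (0.0, 1.0, 2.0):
    print(f"x2={x2}: additive {effect_of_x1(additive, 0.0, 1.0, x2):.4f}, "
          f"interaction {effect_of_x1(interaction, 0.0, 1.0, x2):.4f}")
```

The additive column stays at $\sin(1) \approx 0.8415$ for every $x_2$; the interaction column grows with $x_2^2$, so its "slope in the $x_1$ direction" cannot be read off without knowing $x_2$.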

In a GAM, $y$ is equal to a sum of functions, one per regressor, plus an error term
$$
y = \sum^n_{i=1} f_i(x_i) + \epsilon
$$
For two regressors, this could be expressed as
$$
y = f_1(x_1) + f_2(x_2) + \epsilon
$$
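Backfitting is one classic way to estimate the two functions separately, sketched below on simulated data: cycle through the regressors, each time fitting $f_i$ to the partial residual $y - a - \sum_{j \ne i} f_j(x_j)$. The "true" functions, the cubic-polynomial smoother, and the iteration count are assumptions made for illustration. This is not how pyGAM fits models (it fits all terms jointly as one penalized regression); it is shown only to make the additive structure concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(-3, 3, n)
x2 = rng.uniform(-2, 2, n)

# Hypothetical "true" components, used only to simulate data
true_f1 = np.sin(x1)
true_f2 = x2 ** 2
y = 1.0 + true_f1 + true_f2 + rng.normal(0, 0.2, n)

def smooth(x, r, degree=3):
    """Stand-in smoother: least-squares cubic polynomial of r on x."""
    coefs = np.polyfit(x, r, degree)
    return np.polyval(coefs, x)

# Backfitting: repeatedly refit each f_i to its partial residuals
a = y.mean()
f1_hat = np.zeros(n)
f2_hat = np.zeros(n)
for _ in range(20):
    f1_hat = smooth(x1, y - a - f2_hat)
    f1_hat -= f1_hat.mean()   # centre each f_i so the intercept a is identifiable
    f2_hat = smooth(x2, y - a - f1_hat)
    f2_hat -= f2_hat.mean()

print("corr(f1_hat, sin(x1)):", np.corrcoef(f1_hat, true_f1)[0, 1])
print("corr(f2_hat, x2^2):  ", np.corrcoef(f2_hat, true_f2)[0, 1])
```

Each estimated component can then be plotted against its own regressor, which is exactly the per-regressor interpretation the additive form buys us.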

### **Non-linearity and Smoothness**
- With GAMs, take care not to overfit your model
- The real test comes when we fit a model and use it to make predictions out of sample
- In sample, we can never do worse by using a more flexible functional form that nests the simpler one
- Out of sample, excess complexity can ruin our predictions
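A quick way to see this, sketched with simulated data rather than the happiness data: fit polynomials of increasing degree to 30 training points drawn from $\sin(2\pi x)$ plus noise, then score each fit on 200 fresh points. Polynomials stand in here for any increasingly flexible $f$. Because each higher-degree polynomial nests the lower ones, the in-sample SSE can only fall (up to rounding); the out-of-sample SSE usually stops improving and may rise once the polynomial starts fitting noise.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    # Hypothetical data-generating process used only for this illustration
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = simulate(30)
x_test, y_test = simulate(200)

def sse(y, y_hat):
    return float(np.sum((y - y_hat) ** 2))

results = []
for degree in range(1, 11):
    coefs = np.polyfit(x_train, y_train, degree)   # fit on training data only
    results.append((degree,
                    sse(y_train, np.polyval(coefs, x_train)),
                    sse(y_test, np.polyval(coefs, x_test))))

for degree, train, test in results:
    print(f"degree {degree:2d}: in-sample SSE {train:7.3f}  out-of-sample SSE {test:9.3f}")
```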

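### **GAM Fitting Procedure**
If we want to fit an additive model, we need to create a loss function that we can optimize. For one regressor, the model is
$$
y = a + f(x) + \epsilon
$$
and we choose $a$ and $f$ to minimize the sum of squared errors (SSE) over the $N$ observations:
$$
\min_{a,\,f} \sum_{i=1}^{N} \big(y_i - a - f(x_i)\big)^2
$$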
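A minimal sketch of this optimization, assuming $f$ is built from a fixed set of basis functions: here hypothetical hinge functions $\max(0, x - \kappa)$ at evenly spaced knots $\kappa$, not the B-splines pyGAM uses. Once $f$ is a weighted sum of known columns, minimizing the SSE over $a$ and the weights is ordinary least squares. The data are simulated from $y = 2 + \sin(x) + \epsilon$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 200))
y = 2.0 + np.sin(x) + rng.normal(0, 0.3, x.size)

def hinge_basis(x, knots):
    """Columns x and max(0, x - k) for each knot: a piecewise-linear basis for f."""
    cols = [x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

knots = np.linspace(0, 10, 12)[1:-1]   # 10 interior knots (an arbitrary choice)
X = np.column_stack([np.ones_like(x), hinge_basis(x, knots)])  # first column carries a

# Minimizing SSE over (a, basis weights) is an ordinary least-squares problem
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

sse_fit = float(np.sum((y - fitted) ** 2))
sse_const = float(np.sum((y - y.mean()) ** 2))
print(f"SSE, intercept only: {sse_const:.2f}")
print(f"SSE, a + f(x):       {sse_fit:.2f}")
```

Adding more knots makes $f$ more flexible and can only lower this in-sample SSE, which is why the fit needs the smoothness control discussed next.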

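### **Choosing GAM Smoothness**
In addition to minimizing the SSE term, we need to include a term that regulates how smooth our function is, penalizing the model for "less smooth" functional forms. A smoothing parameter $\lambda$ sets the trade-off: larger values of $\lambda$ force a smoother $f$ at the cost of a higher in-sample SSE.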
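One way to make the penalty concrete, sketched under our own assumptions: represent $f$ with a piecewise-linear hinge basis, $f(x) = b\,x + \sum_{k=1}^{K} \beta_k (x - \kappa_k)_+$, where each $\beta_k$ is the change in slope at knot $\kappa_k$, and penalize the squared slope changes:
$$
\min_{a,\,b,\,\beta} \sum_{i=1}^{N} \big(y_i - a - f(x_i)\big)^2 + \lambda \sum_{k=1}^{K} \beta_k^2
$$
The knots, basis, and penalty are illustrative choices and the data are simulated; pyGAM's `lam` plays the same role, but its basis and penalty differ in detail.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 200))
y = 2.0 + np.sin(x) + rng.normal(0, 0.3, x.size)

knots = np.linspace(0, 10, 22)[1:-1]                 # 20 interior knots
X = np.column_stack([np.ones_like(x), x] +
                    [np.maximum(0.0, x - k) for k in knots])

# Penalize only the hinge weights (slope changes), not a or the linear term b
P = np.diag([0.0, 0.0] + [1.0] * len(knots))

def penalized_fit(lam):
    """Solve (X'X + lam*P) coef = X'y, the minimizer of SSE + lam * sum(beta_k^2)."""
    coef = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
    sse = float(np.sum((y - X @ coef) ** 2))
    roughness = float(np.sum(coef[2:] ** 2))        # the penalized quantity
    return sse, roughness

for lam in [0.01, 1.0, 100.0, 10000.0]:
    sse, rough = penalized_fit(lam)
    print(f"lambda={lam:>8}: SSE={sse:8.2f}  roughness={rough:.4f}")
```

As $\lambda$ grows, the SSE cannot fall and the roughness cannot rise: the penalty trades in-sample fit for smoothness, which is exactly the lever we need against overfitting.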

```python
# Model fitting
from pygam import LinearGAM, s, f, l
import pandas as pd
import patsy as pt
import numpy as np
# Plotting
from plotly import tools
import plotly.offline as py
import plotly.graph_objs as go

# World Happiness data used throughout these notes
data = pd.read_csv('https://github.com/dustywhite7/Econ8310/raw/master/DataSets/HappinessWorld.csv')
```

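```python
eqn = "happiness ~ -1 + freedom + family + year + economy + health + trust"
# "-1" drops patsy's intercept column; LinearGAM fits its own intercept
y, x = pt.dmatrices(eqn, data=data)

# Initialize and fit model: one smooth term s(i) per column of x
gam = LinearGAM(s(0) + s(1) + s(2) + s(3) + s(4) + s(5))
# Search over smoothing penalties and keep the best-scoring fit
gam = gam.gridsearch(x, y)
```

With NumPy 1.24 or later, this cell can fail with `AttributeError: module 'numpy' has no attribute 'int'`. NumPy removed the deprecated `np.int` alias in version 1.24, and older pyGAM releases still use it; upgrading pyGAM or installing `numpy<1.24` avoids the error.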
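pyGAM's `gridsearch` refits the model over a grid of smoothing penalties (`lam`) and keeps the best-scoring fit; for `LinearGAM`, whose error variance is estimated from the data, the default score is generalized cross-validation (GCV). The numpy-only sketch below illustrates that idea on simulated data with a hypothetical hinge-basis penalized fit; it is not pyGAM's internal code. GCV is $N \cdot \text{SSE} / (N - \text{edf})^2$, where the effective degrees of freedom (edf) is the trace of the hat matrix, and the grid is an 11-point log-spaced grid similar to pyGAM's default.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 200))
y = 2.0 + np.sin(x) + rng.normal(0, 0.3, x.size)

# Hypothetical penalized hinge-basis smoother (penalty on slope changes only)
knots = np.linspace(0, 10, 22)[1:-1]
X = np.column_stack([np.ones_like(x), x] +
                    [np.maximum(0.0, x - k) for k in knots])
P = np.diag([0.0, 0.0] + [1.0] * len(knots))
n = len(y)

def gcv_score(lam):
    """GCV = n * SSE / (n - edf)^2, with edf = trace of the hat matrix."""
    A = np.linalg.solve(X.T @ X + lam * P, X.T)   # maps y to coefficients
    H = X @ A                                     # hat matrix: y_hat = H @ y
    resid = y - H @ y
    edf = float(np.trace(H))
    return n * float(resid @ resid) / (n - edf) ** 2, edf

grid = np.logspace(-3, 3, 11)
scores = [gcv_score(lam) for lam in grid]
best = int(np.argmin([score for score, _ in scores]))

for lam, (score, edf) in zip(grid, scores):
    print(f"lam={lam:9.3g}  GCV={score:.4f}  edf={edf:5.1f}")
print("best lam:", grid[best])
```

Small `lam` values spend many effective degrees of freedom and risk overfitting; large ones shrink the fit toward a straight line (edf near 2). GCV picks a point between the two without needing a separate test set.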

```python
x
```

DesignMatrix with shape (470, 6)
  freedom   family  year  economy   health    trust
  0.66557  1.34951  2015  1.39651  0.94143  0.41978
  0.62877  1.40223  2015  1.30232  0.94784  0.14145
  0.64938  1.36058  2015  1.32548  0.87464  0.48357
  0.66973  1.33095  2015  1.45900  0.88521  0.36503
  0.63297  1.32261  2015  1.32629  0.90563  0.32957
  0.64169  1.31826  2015  1.29025  0.88911  0.41372
  0.61576  1.28017  2015  1.32944  0.89284  0.31814
  0.65980  1.28907  2015  1.33171  0.91087  0.43844
  0.63938  1.31967  2015  1.25018  0.90837  0.42922
  0.65124  1.30923  2015  1.33358  0.93156  0.35637
  0.41319  1.22393  2015  1.22857  0.91387  0.07785
  0.63376  1.23788  2015  0.95578  0.86027  0.10583
  0.62433  1.29704  2015  1.33723  0.89042  0.18676
  0.48181  0.91451  2015  1.02054  0.81444  0.21312
  0.54604  1.24711  2015  1.39451  0.86179  0.15890
  0.49049  1.23287  2015  0.98124  0.69702  0.17521
  0.61583  1.21963  2015  1.56391  0.91894  0.37798
  0.61777  1.36948  2015  1.335