# Basics of Generalized Additive Models


#### Resources 
* [Generalized Additive Models in Python](https://pygam.readthedocs.io)
* [GAM: The Predictive Modeling Silver Bullet - Kim Larsen](https://multithreaded.stitchfix.com/blog/2015/07/30/gam/)
* [Regularization](https://multithreaded.stitchfix.com/assets/files/gam.pdf)
* [Medium - Getting Started with Generalized Additive Models in Python](https://codeburst.io/pygam-getting-started-with-generalized-additive-models-in-python-457df5b4705f)


### Import Libraries and Load Breast Cancer Dataset

In [4]:
import pandas as pd
from pygam import LogisticGAM
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
# Keep six of the "mean" features so the model stays small
features = ['mean radius', 'mean texture', 'mean perimeter',
            'mean area', 'mean smoothness', 'mean compactness']
df = pd.DataFrame(data.data, columns=data.feature_names)[features]
target_df = pd.Series(data.target)  # 0 = malignant, 1 = benign
df.describe()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness |
|---|---|---|---|---|---|---|
| count | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 |
| mean | 14.127292 | 19.289649 | 91.969033 | 654.889104 | 0.09636 | 0.104341 |
| std | 3.524049 | 4.301036 | 24.298981 | 351.914129 | 0.014064 | 0.052813 |
| min | 6.981 | 9.71 | 43.79 | 143.5 | 0.05263 | 0.01938 |
| 25% | 11.7 | 16.17 | 75.17 | 420.3 | 0.08637 | 0.06492 |
| 50% | 13.37 | 18.84 | 86.24 | 551.1 | 0.09587 | 0.09263 |
| 75% | 15.78 | 21.8 | 104.1 | 782.7 | 0.1053 | 0.1304 |
| max | 28.11 | 39.28 | 188.5 | 2501.0 | 0.1634 | 0.3454 |


The full breast cancer dataset has:
- 569 observations
- 30 features, of which this notebook uses the six summarized above
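These counts can be confirmed by comparing the shape of the full feature matrix with the six-column subset used in this notebook. This is a minimal sketch; `full_df` and `subset` are names introduced only for this illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Full dataset: every feature shipped with scikit-learn's copy
full_df = pd.DataFrame(data.data, columns=data.feature_names)
print(full_df.shape)  # (569, 30)

# Subset used in this notebook: six of the "mean" columns
subset = full_df[['mean radius', 'mean texture', 'mean perimeter',
                  'mean area', 'mean smoothness', 'mean compactness']]
print(subset.shape)  # (569, 6)
```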


## Logistic GAM
Let's build the model.

See the [LogisticGAM API reference](https://pygam.readthedocs.io/en/latest/api/logisticgam.html) for the available parameters.

In [7]:
X = df[['mean radius', 'mean texture', 'mean perimeter', 'mean area',
        'mean smoothness', 'mean compactness']]
y = target_df

# fit model
gam = LogisticGAM().fit(X, y)
gam

LogisticGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
   fit_intercept=True, max_iter=100, 
   terms=s(0) + s(1) + s(2) + s(3) + s(4) + s(5) + intercept, 
   tol=0.0001, verbose=False)
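
The `terms=s(0) + ... + s(5) + intercept` entry in the output above describes an additive model: one smooth spline term per feature plus an intercept, combined on the logit scale. As a sketch in standard GAM notation (the symbols $\beta_0$ and $f_j$ are introduced here for illustration and do not appear in the pyGAM output):

```latex
\operatorname{logit}\big(P(y = 1 \mid x)\big)
  = \log\frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)}
  = \beta_0 + \sum_{j=0}^{5} f_j(x_j)
```

Here $x_0, \dots, x_5$ are the six selected features, and each $f_j$ is the smooth function that pyGAM labels `s(j)`.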

Let's take a look at some of the summary statistics from the GAM classification model.

In [8]:
gam.summary()

LogisticGAM                                                                                               
Distribution:                      BinomialDist Effective DoF:                                     19.4476
Link Function:                        LogitLink Log Likelihood:                                   -54.0256
Number of Samples:                          569 AIC:                                              146.9464
                                                AICc:                                             148.5483
                                                UBRE:                                               2.2856
                                                Scale:                                                 1.0
                                                Pseudo R-Squared:                                   0.8562
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
s(0)                              [0.

 
Please do not make inferences based on these values! 

Collaborate on a solution, and stay up to date at: 
github.com/dswah/pyGAM/issues/163 
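
The AIC reported in the summary can be reproduced from two other values in the same table. Assuming the usual definition, AIC = -2 × log-likelihood + 2 × effective degrees of freedom, a quick arithmetic check looks like this (the variable names are illustrative):

```python
# Values copied from the gam.summary() output above
log_likelihood = -54.0256
effective_dof = 19.4476

# Standard AIC definition, using the effective DoF as the parameter count
aic = -2 * log_likelihood + 2 * effective_dof
print(round(aic, 4))  # 146.9464
```

The result matches the `AIC` line in the summary, which shows how model fit (log-likelihood) is traded off against model flexibility (effective DoF).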


In [9]:
gam.accuracy(X, y)

0.9560632688927944