<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#GA2M" data-toc-modified-id="GA2M-1">GA2M</a></span><ul class="toc-item"><li><span><a href="#Linear-Model-Problems-and-how-to-fix?" data-toc-modified-id="Linear-Model-Problems-and-how-to-fix?-1.1">Linear Model Problems and how to fix?</a></span></li><li><span><a href="#Fix-1:-Generalized-Linear-Model-(GLM)" data-toc-modified-id="Fix-1:-Generalized-Linear-Model-(GLM)-1.2">Fix 1: Generalized Linear Model (GLM)</a></span></li><li><span><a href="#Fix-2:-Add-interactive-terms-manually" data-toc-modified-id="Fix-2:-Add-interactive-terms-manually-1.3">Fix 2: Add interactive terms <em>manually</em></a></span></li><li><span><a href="#Fix-3:-Add-Non-Linear-effect" data-toc-modified-id="Fix-3:-Add-Non-Linear-effect-1.4">Fix 3: Add Non-Linear effect</a></span></li><li><span><a href="#Improve-of-GAM:-GA2M" data-toc-modified-id="Improve-of-GAM:-GA2M-1.5">Improve of GAM: GA2M</a></span></li><li><span><a href="#Reference" data-toc-modified-id="Reference-1.6">Reference</a></span></li></ul></li><li><span><a href="#When-to-use-GAM?" data-toc-modified-id="When-to-use-GAM?-2">When to use GAM?</a></span></li><li><span><a href="#When-to-use-GA2M?" data-toc-modified-id="When-to-use-GA2M?-3">When to use GA2M?</a></span></li><li><span><a href="#When-not-to-use-GA2M?" data-toc-modified-id="When-not-to-use-GA2M?-4">When not to use GA2M?</a></span></li><li><span><a href="#Demo:-interpret-library" data-toc-modified-id="Demo:-interpret-library-5">Demo: <code>interpret</code> library</a></span></li></ul></div>

# GA2M

## Linear Model Problems and how to fix?

$$
y=\beta_{0}+\beta_{1}x_{1}+\ldots+\beta_{p}x_{p}+\epsilon
$$

Assumptions of the linear model:

1. The target (given the features) is normally distributed.
2. Additivity: there are no feature interactions.
3. The relationship between the target and each feature is linear.

## Fix 1: Generalized Linear Model (GLM)

When the target is not normally distributed, for example a count that follows a Poisson distribution, we connect the linear predictor to the target's distribution through a link function. How?

- Take the expected target given the current feature values: $E_Y(y|x)$.
- Transform this expectation with a link function $g$ suited to the target distribution (e.g. $g=\log$ for Poisson): $g(E_Y(y|x))$.
- Model the transformed expectation as a linear function of the features.

This gives the GLM:

$$
g(E_Y(y|x))=\beta_{0}+\beta_{1}x_{1}+\ldots+\beta_{p}x_{p}
$$
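To make the link function concrete, here is a minimal sketch of a Poisson GLM with the log link, fit by iteratively reweighted least squares (IRLS). It uses only NumPy; the function name `fit_poisson_glm`, the synthetic data and the fixed iteration count are illustrative assumptions. In practice you would use a library estimator such as scikit-learn's `PoissonRegressor`.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=25):
    """Fit log(E[y|x]) = b0 + X @ b with iteratively reweighted least squares."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        eta = X1 @ beta            # linear predictor = g(E[y|x])
        mu = np.exp(eta)           # inverse of the log link: E[y|x]
        z = eta + (y - mu) / mu    # working response
        w = mu                     # IRLS weights for Poisson with the log link
        beta = np.linalg.solve(X1.T @ (w[:, None] * X1), X1.T @ (w * z))
    return beta


rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=5000)
y = rng.poisson(np.exp(0.5 + 0.8 * x))  # counts with log(E[y|x]) = 0.5 + 0.8x
beta = fit_poisson_glm(x[:, None], y)
print(beta)  # should be close to [0.5, 0.8]
```

The coefficients stay on the scale of $g$: here a one-unit increase in $x$ adds about 0.8 to $\log E_Y(y|x)$, i.e. multiplies the expected count by roughly $e^{0.8}$.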

## Fix 2: Add interactive terms *manually*

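There are several ways to add interactions:
- Cross features together: $x_1 x_2$, $x_1 x_2 x_3$.
- Add polynomial features such as $x_1^2$, $x_1 x_2$, $x_2^2$. The cross terms model interactions, while the pure powers model non-linear effects of a single feature.

Both approaches rely on heuristics and manual labor: we have to decide which terms to add.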
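A minimal NumPy sketch of crossing features by hand, on assumed synthetic data whose true model contains an $x_1 x_2$ term. Without the cross term, ordinary least squares cannot fit the data; with it, the coefficients are recovered exactly because there is no noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 1000))
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2  # true model has an x1*x2 interaction

# Without the cross term, the linear model cannot represent the interaction.
X_plain = np.column_stack([np.ones_like(x1), x1, x2])
# Manually crossing x1 and x2 adds the interaction as one more linear feature.
X_cross = np.column_stack([X_plain, x1 * x2])

coef_plain, *_ = np.linalg.lstsq(X_plain, y, rcond=None)
coef_cross, *_ = np.linalg.lstsq(X_cross, y, rcond=None)

mse_plain = np.mean((X_plain @ coef_plain - y) ** 2)
mse_cross = np.mean((X_cross @ coef_cross - y) ** 2)
print(coef_cross)  # approximately [1, 2, 3, 4]
print(mse_plain, mse_cross)
```

The catch is that we had to know that $x_1 x_2$ was the right term to add; with many features, the number of candidate crosses grows quickly.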


## Fix 3: Add Non-Linear effect

There are several ways:

- Transform each feature so that it has a linear relationship with the target, e.g. by applying log(), exp(), sqrt(), etc. If we are lucky, we get a nice linear relationship. The cost is interpretability: after applying sqrt() to feature A, its weight means that a one-unit increase in sqrt(A), not in A, changes the target by that weight.
- Bucketize the feature into bins, then one-hot encode the bins. Why does this work? Why is it not a good idea at all? (Left as an exercise.)
- Use Generalized Additive Models (GAM).
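A minimal sketch of the bucketing approach on assumed synthetic data ($y = \sin(x)$ plus noise): a straight line in $x$ misses the curve, while one-hot encoded bins let least squares fit a separate constant per bin. The 10 equal-width bins are an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=2000)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)  # non-linear target

# Baseline: a straight line in x.
X_lin = np.column_stack([np.ones_like(x), x])
coef_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
mse_lin = np.mean((X_lin @ coef_lin - y) ** 2)

# Bucketize x into 10 equal-width bins, then one-hot encode the bin index.
edges = np.linspace(0, 2 * np.pi, 11)[1:-1]  # 9 interior edges -> 10 bins
bins = np.digitize(x, edges)                  # bin index in 0..9
X_bin = np.eye(10)[bins]                      # one column per bin, no intercept needed
coef_bin, *_ = np.linalg.lstsq(X_bin, y, rcond=None)
mse_bin = np.mean((X_bin @ coef_bin - y) ** 2)

print(mse_lin, mse_bin)
```

Each entry of `coef_bin` ends up being the mean target inside its bin, so the fitted curve is a step function over $x$.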

Let's talk about GAM. The idea is simple: we replace each linear term $\beta_i x_i$ with a function $f_i(x_i)$, which learns the (possibly non-linear) pattern of feature $x_i$.

$$
g(E_Y(y|x))=\beta_0+f_1(x_{1})+f_2(x_{2})+\ldots+f_p(x_{p})
$$

- In a classic GAM, each $f_i$ is a spline. A spline is a smooth curve built by joining several polynomial pieces, or equivalently a weighted sum of spline basis functions.
- In the figure below, to estimate the non-linear relationship between the target and temperature, the spline is built from 4 basis functions.
- Each $f_i$ adds its contribution to the final prediction. That is where the name "additive" comes from.

<img src="https://christophm.github.io/interpretable-ml-book/images/splines-1.png" width="600" height="600">

GAM captures non-linear relationships and, through the link function, handles non-Normal targets just like a GLM. It is more complex than a linear model, but it is still a glass-box model: we can interpret each feature effect $f_i$ independently.

## Improve of GAM: GA2M

GA2M adds pairwise feature interactions to GAM.

<img src="https://blog.fiddler.ai/wp-content/uploads/2019/06/ga2m_eq-1200x188.png" width="400">
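Written in the same notation as the GAM equation, a GA2M adds pairwise terms $f_{ij}$ to the main effects, where $S$ is the set of selected feature pairs:

$$
g(E_Y(y|x))=\beta_0+\sum_{j=1}^{p}f_j(x_j)+\sum_{(i,j)\in S}f_{ij}(x_i,x_j)
$$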

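- Each interaction between a pair of features $x_i, x_j$ becomes another additive term $f_{ij}(x_i, x_j)$ in the GAM equation.
- Including every pair is impractical: with $p$ features there are $p(p-1)/2$ pairs.
- A better approach is to rank the candidate pairs and add only the top N interactions to the final model.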

The model is now more complicated, but it is still a glass box: each pairwise term can be inspected on its own, e.g. as a heatmap over $(x_i, x_j)$.
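How the top N pairs are chosen depends on the implementation. The sketch below is a simplified, hypothetical heuristic, not the procedure used by `interpret`: fit binned main effects by backfitting, then for each pair fit a piecewise-constant function on a 2D grid of bins to the residuals, and rank pairs by how much residual error that removes. All function names, the 8 quantile bins and the synthetic data are assumptions.

```python
import itertools

import numpy as np


def bin_index(x, n_bins):
    """Map each value to one of n_bins quantile bins (0 .. n_bins - 1)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")


def cell_means(codes, r, n_codes):
    """Piecewise-constant fit: replace each value of r by the mean of r in its cell."""
    sums = np.bincount(codes, weights=r, minlength=n_codes)
    counts = np.bincount(codes, minlength=n_codes)
    means = np.divide(sums, counts, out=np.zeros(n_codes), where=counts > 0)
    return means[codes]


def additive_residuals(codes, y, n_bins, n_rounds=20):
    """Backfit binned main effects y ~ b0 + sum_j f_j(x_j); return the residuals."""
    b0 = y.mean()
    contrib = np.zeros((len(y), len(codes)))
    for _ in range(n_rounds):
        for j, c in enumerate(codes):
            partial = y - b0 - contrib.sum(axis=1) + contrib[:, j]  # leave f_j out
            contrib[:, j] = cell_means(c, partial, n_bins)
    return y - b0 - contrib.sum(axis=1)


def rank_pairs(X, y, n_bins=8, top_n=3):
    """Rank feature pairs by how much a 2D binned fit reduces the additive model's error."""
    codes = [bin_index(X[:, j], n_bins) for j in range(X.shape[1])]
    resid = additive_residuals(codes, y, n_bins)
    base = np.sum(resid ** 2)
    scores = {}
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cell = codes[i] * n_bins + codes[j]  # one cell per (bin_i, bin_j) combination
        fitted = cell_means(cell, resid, n_bins * n_bins)
        scores[(i, j)] = base - np.sum((resid - fitted) ** 2)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5000, 4))
y = X[:, 0] + X[:, 1] ** 2 + 3 * X[:, 2] * X[:, 3] + rng.normal(scale=0.1, size=5000)
print(rank_pairs(X, y, top_n=2))  # the (2, 3) pair should rank first
```

Only the selected pairs would then be added as $f_{ij}$ terms, keeping the model small enough to inspect.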

## Reference

- [A gentle introduction to GA2Ms, a white box model](https://blog.fiddler.ai/2019/06/a-gentle-introduction-to-ga2ms-a-white-box-model/)
- [Interpretable Machine Learning, chapter 4.3.](https://christophm.github.io/interpretable-ml-book/)

# When to use GAM?

When we want a glass-box model and GAM is significantly more accurate than a linear model.

# When to use GA2M?

When we want a glass-box model and GA2M is significantly more accurate than GAM.

# When not to use GA2M?

When a black-box model is acceptable and it is far more accurate than GA2M.

# Demo: `interpret` library

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from tabkit import opendata
```

```python
Xtrain, Xtest, ytrain, ytest = opendata.ToyRegression().load()
```

```python
ytrain.mean(), ytrain.std()
```

```
(22.907915567282323, 9.429546081039085)
```

```python
from interpret import show
from interpret.data import Marginal

marginal = Marginal().explain_data(Xtrain, ytrain, name='Train Data')
show(marginal)
```

```python
# GA2M model
from interpret.glassbox import ExplainableBoostingRegressor

# Pick the top 3 pairwise interactions
# Set interactions=0 to get a plain GAM (main effects only)
ebm = ExplainableBoostingRegressor(random_state=0, interactions=3)
ebm.fit(Xtrain, ytrain)
```

```
ExplainableBoostingRegressor(feature_names=['CRIM', 'ZN', 'INDUS', 'CHAS',
                                            'NOX', 'RM', 'AGE', 'DIS', 'RAD',
                                            'TAX', 'PTRATIO', 'B', 'LSTAT',
                                            'RM x TAX', 'INDUS x LSTAT',
                                            'DIS x LSTAT'],
                             feature_types=['continuous', 'continuous',
                                            'continuous', 'categorical',
                                            'continuous', 'continuous',
                                            'continuous', 'continuous',
                                            'continuous', 'continuous',
                                            'continuous', 'continuous',
                                            'continuous', 'pairwise',
                                            'pairwise', 'pairwise'],
                             interactions=3, random_state=0)
```

```python
from interpret.perf import RegressionPerf

ebm_perf = RegressionPerf(ebm.predict).explain_perf(Xtest, ytest, name='EBM')
show(ebm_perf)
```

```python
ebm_global = ebm.explain_global(name='EBM')
show(ebm_global)
```

```python
ebm_local = ebm.explain_local(Xtest[:5], ytest[:5], name='EBM')
show(ebm_local)
```