#### Introduction to Statistical Learning, Lab 3.3

# Interactions & Non-linear Transformations

We often want to include interaction terms and non-linear transformations of the predictors in a model. The formula mini language used by statsmodels supports both directly in the formula string.


  - [statsmodels documentation](https://www.statsmodels.org/stable/)
  - [statsmodels formula interface](https://www.statsmodels.org/stable/example_formulas.html)
  - [the formula mini language](https://patsy.readthedocs.io/en/latest/formulas.html#the-formula-language)
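
As a minimal sketch of what a non-linear transformation means for the fit: a formula such as `y ~ x + I(x**2)` adds a column holding $x^2$ to the design matrix, and the model stays linear in its coefficients. The example below builds that matrix by hand with NumPy instead of going through the formula interface. The data, the variable names and the coefficient values are assumptions made up for illustration; they are not part of the lab.

```python
import numpy as np

# Synthetic, noise-free data: an assumption for illustration only
x = np.linspace(-2.0, 2.0, 50)
true_beta = np.array([1.0, -3.0, 0.5])  # intercept, x, x**2
y = true_beta[0] + true_beta[1] * x + true_beta[2] * x**2

# Design matrix with the columns intercept, x and x**2
X_quad = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares on the transformed columns
beta_hat, *_ = np.linalg.lstsq(X_quad, y, rcond=None)
print(beta_hat)
```

Because the data contain no noise, the estimated coefficients should agree with `true_beta` up to floating-point error.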

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
from islpy import datasets
sns.set()
%matplotlib inline

#### Data Set

We use the `Boston` data set to demonstrate multiple linear regression with an interaction term.

In [None]:
boston = datasets.Boston()
boston.head()

#### Model Specification & Fit

The `smf.ols()` function builds a statistical *model* prepared for fitting with *ordinary least squares* (OLS). This is the type of fit explained in detail in the lecture.

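The syntax for an interaction term is `y~x1:x2`. This adds a term corresponding to $x_1\times x_2$ to the model. It adds only the product; the predictors `x1` and `x2` themselves must be listed separately if they should appear in the model.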

There is a shorthand notation for including an interaction term and the predictors themselves: `y~x1*x2`. This is equivalent to `y~x1+x2+x1:x2`.
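
To make the expansion concrete, the following sketch builds by hand the columns that `y~x1*x2` stands for (intercept, `x1`, `x2` and the product `x1:x2`) and fits them with least squares using NumPy. The data are synthetic and noise-free, and `true_beta` is an arbitrary choice; both are assumptions for illustration, not values from the `Boston` data set.

```python
import numpy as np

# Synthetic data: an assumption for illustration, not the Boston data
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)

# Arbitrary coefficients: intercept, x1, x2, x1:x2
true_beta = np.array([1.0, 2.0, -0.5, 0.3])
y = true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2 + true_beta[3] * x1 * x2

# Columns that `y ~ x1*x2` expands to: intercept, x1, x2, x1:x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Ordinary least squares fit of all four coefficients
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

Dropping the last column of `X` would correspond to the formula `y~x1+x2`, and keeping only the first and last columns would correspond to `y~x1:x2`.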

As in the simple regression with one predictor, a constant term for the intercept is added automatically.

The formula `medv~lstat*age` means we are using `lstat`, `age` and the interaction term `lstat`$\times$`age` as our predictors and `medv` as our dependent variable:

$$ \mathrm{medv} = \beta_0 + \beta_1 \mathrm{lstat} + \beta_2 \mathrm{age} + \beta_3 \mathrm{lstat}\times\mathrm{age}$$
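
One consequence of the interaction term: differentiating the equation above with respect to `lstat` gives $\beta_1 + \beta_3\,\mathrm{age}$, so the slope of `medv` with respect to `lstat` is no longer a single number but depends on `age`. The sketch below evaluates that slope at a few ages. The helper `lstat_slope` and the coefficient values are hypothetical and chosen only for illustration; they are not fit results.

```python
def lstat_slope(beta_1, beta_3, age):
    """Slope of medv with respect to lstat at a given age: beta_1 + beta_3 * age."""
    return beta_1 + beta_3 * age


# Hypothetical coefficients, chosen only for illustration (not fit results)
beta_1, beta_3 = -1.0, 0.01

for age in (10, 50, 90):
    print(age, lstat_slope(beta_1, beta_3, age))
```

With $\beta_3 = 0$ the slope reduces to $\beta_1$ for every age, which is the model without interaction.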

In [None]:
model = smf.ols(formula='medv~lstat*age', data=boston)
model_fit = model.fit()

#### Fit Result Summary

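We can get a comprehensive summary using the `summary()` method. Now we get the results for all four $\beta$ coefficients: the intercept $\beta_0$, the two main effects $\beta_1$ (`lstat`) and $\beta_2$ (`age`), and the interaction $\beta_3$ (`lstat:age`).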

In [None]:
model_fit.summary()

#### A Fancy 3D Plot

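With two predictors, the fitted model is a surface over `lstat` and `age`. The interaction term twists that surface, so it is no longer a flat plane. It's so nice. :)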

In [None]:
import plots
fig, ax = plots.plot_linear_model_3D(model_fit, 'lstat', 'age', data=boston)
fig.suptitle(f'Boston Housing Data Set: {model.formula}')
plt.show()