# Sect 28: Extensions to Linear Models

- online-ds-ft-100719


## Objectives

**DISCUSSION:**
- Discuss interactions between variables
- Discuss polynomial regressions
- Discuss regularization techniques
    - Ridge Regression (L2 regularization)
    - Lasso Regression (L1 regularization)

**APPLICATION:**
- Lab Walkthrough (pick one):
    - [Sect 28: Ridge and Lasso Regression Lab](https://learn.co/tracks/data-science-career-v2/module-4-a-complete-data-science-project-using-multiple-regression/section-28-section-recap/ridge-and-lasso-regression-lab)
    
    - [Sect 28: Extensions to Linear Models Lab](https://learn.co/tracks/data-science-career-v2/module-4-a-complete-data-science-project-using-multiple-regression/section-28-section-recap/extensions-to-linear-models-lab)
    
- Alternative:
    - Walk through the feature selection lesson, but rewrite it so that the results for each method are collected into one table for us to review together.
    - [Sect 28: Feature Selection](https://learn.co/tracks/data-science-career-v2/module-4-a-complete-data-science-project-using-multiple-regression/section-28-section-recap/feature-selection-methods)

# Interactions

An interaction occurs when variables combine to **cause an effect** on another variable that is **not the sum of their parts**, i.e. the effect of one predictor depends on the value of another.

## Confounding factor

![](images/diet_interaction.png)

In [1]:
!pip install -U fsds_100719
from fsds_100719.imports import *

fsds_1007219  v0.5.13 loaded.  Read the docs: https://fsds.readthedocs.io/en/latest/ 


| Handle | Package | Description |
|---|---|---|
| dp | IPython.display | Display modules with helpful display and clearing commands. |
| fs | fsds_100719 | Custom data science bootcamp student package |
| mpl | matplotlib | Matplotlib's base OOP module with formatting artists |
| plt | matplotlib.pyplot | Matplotlib's matlab-like plotting module |
| np | numpy | scientific computing with Python |
| pd | pandas | High performance data structures and tools |
| sns | seaborn | High-level data visualization library based on matplotlib |


In [4]:
df = fs.datasets.load_autompg()
display(df.head(),df.dtypes)

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |


mpg             float64
cylinders         int64
displacement    float64
horsepower        int64
weight            int64
acceleration    float64
model year        int64
origin            int64
car name         object
dtype: object

## Example of slight interaction (mostly additive)

![](https://github.com/learn-co-students/dsc-2-24-03-interactions-online-ds-sp-000/raw/master/index_files/index_20_0.png)

## Example of interaction (definitely not additive)

![](https://github.com/learn-co-students/dsc-2-24-03-interactions-online-ds-sp-000/raw/master/index_files/index_31_0.png)
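To make the idea concrete, here is a minimal sketch on synthetic data (not the auto-mpg data loaded above). The variable names, the true coefficients, and the helper `fit_ols` are all assumptions made up for illustration. It fits one model with only additive terms and one that also includes the product `x1 * x2`, then compares their $R^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Synthetic target: the x1*x2 term makes the effect of x1 depend on x2
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2 + rng.normal(scale=0.5, size=n)

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Model 1: additive terms only. Model 2: adds the interaction column x1*x2.
coef_add, r2_add = fit_ols(np.column_stack([x1, x2]), y)
coef_int, r2_int = fit_ols(np.column_stack([x1, x2, x1 * x2]), y)

print(f"additive R^2:         {r2_add:.3f}")
print(f"with interaction R^2: {r2_int:.3f}")
print("interaction coefficient:", round(coef_int[3], 2))
```

The interaction is just one extra column in the design matrix, so the model is still fit with ordinary least squares.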

# Polynomial Regressions

Remember, we started with the (multiple) linear regression equation:

$$ \large \hat{y} = w_0 + w_1  x_1 + w_2  x_2 + ... + w_N  x_N $$
$$ \large \hat{y} = \sum_{n=0}^{N} w_n x_n, \quad x_0 = 1 $$


**Knowledge check:** Why is this "linear"?

## Making it more complex!

![](https://github.com/learn-co-students/dsc-2-24-05-polynomial-regression-online-ds-sp-000/raw/master/index_files/index_8_0.png)

Imagine making this (start with just one variable):

$$ \large \hat{y} = \beta_0 + \beta_1  x + \beta_2  x^2 + ... + \beta_N  x^N $$
$$ \large \hat{y} = \sum_{n=0}^{N} \beta_n x^n $$

![](https://github.com/learn-co-students/dsc-2-24-05-polynomial-regression-online-ds-sp-000/raw/master/index_files/index_23_0.png)

![](https://github.com/learn-co-students/dsc-2-24-05-polynomial-regression-online-ds-sp-000/raw/master/index_files/index_28_0.png)
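The single-variable polynomial above can be fit with ordinary least squares, because the model is still linear in the coefficients $\beta_n$. The sketch below assumes synthetic data with a known quadratic relationship (the true values 1, -2 and 0.5 are made up for illustration) and builds the columns $x^0, x^1, x^2$ with `np.vander`.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 100)
# Synthetic quadratic relationship plus a little noise
y = 1 - 2 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

degree = 2
# Columns are x^0, x^1, ..., x^degree
X_poly = np.vander(x, degree + 1, increasing=True)

# Ordinary least squares on the expanded columns
beta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print("estimated beta_0..beta_2:", np.round(beta, 2))
```

Raising `degree` only adds more columns; the fitting step does not change.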

More generally (multiple variables, with $M$ predictors each raised up to degree $N$):

$$ \large \hat{y} = \beta_{0,0} + \sum_{i=1}^{N} \beta_{1,i} x_1^i + \sum_{i=1}^{N} \beta_{2,i} x_2^i + ... + \sum_{i=1}^{N} \beta_{M,i} x_M^i$$ 

$$ \large \hat{y} = \beta_{0,0} + \sum_{j=1}^{M}\sum_{i=1}^{N} \beta_{j,i} x_j^i $$ 


## Discussion

There is no limit on the polynomial degree.

- What happens to $R^2$ as the model gets more complicated? Does it go higher or lower?
- How many polynomial terms (what degree) should we use?
- Are there any disadvantages to making it more complicated?
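One way to explore these questions is to compare $R^2$ on the training data with $R^2$ on held-out data as the degree grows. The sketch below is an illustration on synthetic data only: the sine-shaped target, the sample sizes, and the degrees 1, 3 and 9 are all assumptions. Training $R^2$ cannot decrease as terms are added, while held-out $R^2$ typically stops improving (or gets worse) once the model starts fitting noise.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

def make_data(n):
    """Draw n noisy samples from a sine curve (synthetic, for illustration)."""
    x = rng.uniform(-3, 3, size=n)
    y = np.sin(x) + rng.normal(scale=0.3, size=n)
    return x, y

def r_squared(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

x_train, y_train = make_data(30)
x_test, y_test = make_data(30)

results = {}
for degree in (1, 3, 9):
    # Polynomial.fit rescales x internally, which keeps higher degrees numerically stable
    p = Polynomial.fit(x_train, y_train, degree)
    train_r2 = r_squared(y_train, p(x_train))
    test_r2 = r_squared(y_test, p(x_test))
    results[degree] = (train_r2, test_r2)
    print(f"degree {degree}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```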

# Regularization techniques

We can "shrink down" the effects of predictor variables instead of deleting/zeroing them

## Cost Function Previously Used

$$ \large J = \sum_{i=1}^n(y_i - \hat{y}_i)^2 $$ 
$$ = \sum_{i=1}^n\left(y_i - \left(\sum_{j=1}^k m_jx_{ij} + b\right)\right)^2$$
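As a quick check of the notation, the cost can be computed directly with NumPy. The function name `rss` and the tiny data set below are made up for illustration: `X` holds the $x_{ij}$ values (rows are observations), `m` the slopes $m_j$, and `b` the intercept.

```python
import numpy as np

def rss(X, y, m, b):
    """Residual sum of squares: sum_i (y_i - (sum_j m_j * x_ij + b))^2."""
    y_hat = X @ m + b
    return float(np.sum((y - y_hat) ** 2))

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
m_true = np.array([2.0, -1.0])
b_true = 0.5
y = X @ m_true + b_true  # targets generated with no noise

cost_exact = rss(X, y, m_true, b_true)        # parameters that generated y
cost_shifted = rss(X, y, m_true, b_true + 1)  # intercept off by 1 on all 3 rows
print(cost_exact, cost_shifted)
```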

## Ridge Regression - L2 Norm Regularization

Define a penalty ***hyperparameter*** $\lambda$ that penalizes large coefficients $m$:

$$ \large J_{ridge}= \sum_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^k m_j^2  $$
$$ = \sum_{i=1}^n\left(y_i - \left(\sum_{j=1}^k m_jx_{ij} + b\right)\right)^2 + \lambda \sum_{j=1}^k m_j^2$$


- By adding the penalty term $\lambda \sum_j m_j^2$, ridge regression puts a constraint on the coefficients $m$. 
- Therefore, large coefficients increase the cost function and are penalized during optimization. 
    - This shrinks the coefficients and helps to reduce model complexity and the effects of multicollinearity.
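The shrinking effect can be sketched with the closed-form ridge solution $m = (X^TX + \lambda I)^{-1}X^Ty$ on centered data. Everything below is an illustrative assumption: the helper `ridge_fit`, the synthetic data, and the choice to leave the intercept $b$ unpenalized. In practice, scikit-learn's `Ridge` exposes $\lambda$ as its `alpha` parameter.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize sum_i (y_i - (x_i . m + b))^2 + lam * sum_j m_j^2.

    The intercept b is not penalized, so we center X and y, solve for m,
    and recover b afterwards.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    k = X.shape[1]
    m = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    b = y_mean - x_mean @ m
    return m, b

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = X @ np.array([5.0, -3.0, 0.0, 1.0]) + 2.0 + rng.normal(scale=1.0, size=100)

norms = []
for lam in (0.0, 10.0, 100.0, 1000.0):
    m, b = ridge_fit(X, y, lam)
    norms.append(np.linalg.norm(m))  # overall size of the coefficient vector
    print(f"lambda = {lam:>6}: m = {np.round(m, 2)}")
```

With `lam = 0` this reduces to ordinary least squares; larger values pull every coefficient toward zero without setting any of them exactly to zero.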

### Uses

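Used mostly to prevent overfitting. Because it keeps all features in the model (coefficients shrink but are not set to zero), it can be computationally expensive when there are many variables.

When predictors are correlated, ridge tends to spread the weight roughly evenly across their coefficients.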
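A small sketch of how ridge handles correlated predictors, using the extreme case of two identical columns (a made-up example). Ordinary least squares has no unique solution here, but the ridge penalty picks the solution that splits the true slope of 2 between the two copies.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200)
X = np.column_stack([x, x])  # two perfectly correlated (identical) predictors
y = 2 * x + rng.normal(scale=0.5, size=200)

lam = 1.0  # illustrative penalty strength
# Closed-form ridge solution on centered data (intercept left unpenalized)
Xc, yc = X - X.mean(axis=0), y - y.mean()
m = np.linalg.solve(Xc.T @ Xc + lam * np.eye(2), Xc.T @ yc)
print("ridge coefficients:", np.round(m, 3))
```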

## Lasso Regression - L1 Norm Regularization

"Least Absolute Shrinkage and Selection Operator"

$$ \large J_{lasso}= \sum_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^k \mid m_j \mid $$
$$ = \sum_{i=1}^n\left(y_i - \left(\sum_{j=1}^k m_jx_{ij} + b\right)\right)^2 + \lambda \sum_{j=1}^k \mid m_j \mid$$

### Uses

Useful because the L1 penalty can shrink coefficients to exactly zero: it performs estimation & selection at the same time (good for many variables) --> ***sparse solution***
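To see where the sparsity comes from, here is a minimal coordinate-descent sketch for the lasso cost written above. It is an illustration under stated assumptions, not a production solver: the helpers `soft_threshold` and `lasso_cd`, the synthetic data, and `lam = 100` are all made up for this example. Note that scikit-learn's `Lasso` scales its objective differently (it divides the squared-error term by $2n$), so its `alpha` is not numerically equal to this $\lambda$.

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho toward zero by t; anything within [-t, t] becomes exactly 0."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for sum_i (y_i - (x_i . m + b))^2 + lam * sum_j |m_j|."""
    Xc = X - X.mean(axis=0)  # centering handles the unpenalized intercept
    yc = y - y.mean()
    m = np.zeros(X.shape[1])
    z = (Xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Residual with feature j's current contribution added back in
            r_j = yc - Xc @ m + Xc[:, j] * m[j]
            rho = Xc[:, j] @ r_j
            m[j] = soft_threshold(rho, lam / 2) / z[j]
    b = y.mean() - X.mean(axis=0) @ m
    return m, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features actually affect y
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

m, b = lasso_cd(X, y, lam=100.0)
print("lasso coefficients:", np.round(m, 3))
```

The soft-threshold step is what sets weak coefficients to exactly zero, which is the selection behavior that ridge does not have.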

# AIC/BIC