## Codio Activity 8.4: Fitting Higher Order Polynomials

**Expected Time: 60 Minutes**

**Total Points: 20**

In this exercise, you will examine the effect of fitting more complex models to the automobile data.  Using the `Pipeline`, you will fit models of degree 1 through 10 and evaluate the mean squared error (MSE) of each model.  Using these results, you will generate a plot of model complexity vs. MSE.  

#### Index:

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)


In [1]:
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import plotly.express as px

In [2]:
auto = pd.read_csv("data/auto.csv")
auto.head()

| Unnamed: 0 | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | 1 | ford torino |


[Back to top](#Index:) 

### Problem 1

#### Loop of Models

**10 Points** 

Below, construct a loop over the degrees 1 through 10. For each iteration, fit a regression model of the corresponding degree using the `Pipeline`, with `horsepower` as the feature and `mpg` as the target.  Compute the mean squared error on the training data and append it to the list `mses` below.  Follow the comments for hints if you have trouble getting started.  Also, be sure to set `include_bias=False` in your transformer.
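
To see what the transformer contributes at each degree, here is a minimal sketch on a tiny hand-made array. The input values are illustrative assumptions and are not taken from the automobile data; only `PolynomialFeatures` and its `degree` and `include_bias` parameters come from the activity.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Illustrative input: two samples of a single feature (not from auto.csv)
x = np.array([[2.0], [3.0]])

# degree=3 without the bias column yields the columns x, x^2, x^3
without_bias = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)

# include_bias=True (the default) adds a leading column of ones
with_bias = PolynomialFeatures(degree=3, include_bias=True).fit_transform(x)

print(without_bias)
print(with_bias.shape)
```

Because `LinearRegression` fits its own intercept by default, the extra column of ones would be redundant, which is one reason to leave it out here.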

In [8]:
### GRADED

mses = []
for degree in range(1, 11):
    # create pipeline: polynomial expansion of the given degree, then least squares
    pipeline = Pipeline(
        [
            ("transform", PolynomialFeatures(degree=degree, include_bias=False)),
            ("regression", LinearRegression()),
        ]
    )
    # fit pipeline on horsepower (a one-column DataFrame) against mpg
    pipeline.fit(auto[["horsepower"]], auto["mpg"])

    # make predictions on the training data
    predictions = pipeline.predict(auto[["horsepower"]])
    # compute mse and append it to mses
    mses.append(float(mean_squared_error(auto["mpg"], predictions)))
    display([degree, mses[-1]])

# Answer check
print(len(mses))
print(np.round(mses, 2))

[1, 23.943662938603108]

[2, 18.98476890761722]

[3, 18.94498981448592]

[4, 18.876333244865013]

[5, 18.426968585984714]

[6, 18.240647061163287]

[7, 18.411730923842036]

[8, 18.56919171911313]

[9, 18.58333783702966]

[10, 18.474005605387042]

10
[23.94 18.98 18.94 18.88 18.43 18.24 18.41 18.57 18.58 18.47]


[Back to top](#Index:) 

### Problem 2

#### Minimum MSE

**10 Points**

Which model complexity had the smallest Mean Squared Error?  Assign your answer as an integer to `best_complexity` below.  

In [4]:
### GRADED

best_complexity = int(np.array(mses).argmin() + 1)

# Answer check
print(type(best_complexity))
print(best_complexity)

<class 'int'>
6
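
The `+ 1` in the answer above converts a zero-based list position into a polynomial degree. A small sketch with made-up error values (hypothetical, chosen only to make the arithmetic easy to follow) shows the conversion:

```python
import numpy as np

# Hypothetical errors for degrees 1, 2, 3 (illustrative values only)
example_mses = [5.0, 3.0, 4.0]

# argmin returns the zero-based position of the smallest value
zero_based_index = int(np.argmin(example_mses))

# degrees start at 1, so shift the position by one
best_degree = zero_based_index + 1

print(zero_based_index, best_degree)
```

Wrapping the result in `int` matters for the answer check, since `argmin` returns a NumPy integer rather than a built-in `int`.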


### Plotting the Results

Run the code below to generate a plot of model complexity vs. the mean squared error of your fitted pipelines.  Do you see the same behavior as in the videos -- a decrease in MSE as the complexity increases?

In [5]:
px.scatter(
    x=list(range(1, 11)),
    y=mses,
    labels={"x": "Degree", "y": "Mean Squared Error"},
    title="Complexity vs. MSE",
)
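
In exact arithmetic, least squares on nested feature sets cannot raise the training MSE, so the slight rise after degree 6 in the results above may come from numerical conditioning when raw `horsepower` values are raised to high powers. This is an assumption, not something the activity states. The sketch below shows one way to check it: standardize the feature before expanding it. It uses synthetic stand-in data and a hypothetical `scale` step that is not part of the graded solution.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data (NOT the automobile data): one feature on a
# horsepower-like range, a curved trend, and some noise
rng = np.random.default_rng(0)
X = np.linspace(40, 230, 100).reshape(-1, 1)
y = 40 - 0.3 * X.ravel() + 0.0007 * X.ravel() ** 2 + rng.normal(0, 2, size=100)

scaled_mses = []
for degree in range(1, 11):
    pipe = Pipeline(
        [
            ("scale", StandardScaler()),  # hypothetical extra step: keeps powers small
            ("transform", PolynomialFeatures(degree=degree, include_bias=False)),
            ("regression", LinearRegression()),
        ]
    )
    pipe.fit(X, y)
    # training MSE for this degree
    scaled_mses.append(float(mean_squared_error(y, pipe.predict(X))))

print(np.round(scaled_mses, 3))
```

Swapping `X` and `y` for `auto[["horsepower"]]` and `auto["mpg"]` would let you compare the scaled curve against the one plotted above.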