## Codio Exercise 8.2: Using `PolynomialFeatures`

**Estimated time: 60 minutes**

**Total Points: 20 Points**


This activity focuses on the scikit-learn transformer `PolynomialFeatures`. As seen in video 8.4, you can use this transformer to create the transformed DataFrame with appropriate column names by calling the `.get_feature_names_out()` method on the fitted transformer. You will build second- and third-degree polynomial features with `PolynomialFeatures`, including features built from two columns, and convert the results to pandas DataFrames.

## Index:

 - [Problem 1](#Problem-1)
 - [Problem 2](#Problem-2)
 - [Problem 3](#Problem-3)
 - [Problem 4](#Problem-4)

In [2]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

### The Data

This exercise again uses the automobile dataset. You will build the additional features from the `horsepower` column of the data.

In [4]:
auto = pd.read_csv('../data/auto.csv')

In [6]:
auto.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | 1 | ford torino |


[Back to top](#Index:) 

## Problem 1

### Creating Quadratic Features

**5 Points**

Complete the code below according to these instructions:

- Instantiate a `PolynomialFeatures()` transformer and assign it to the variable `pfeatures`.
- Transform the `horsepower` column by calling `fit_transform` on `pfeatures` with `auto[['horsepower']]` as the argument. Assign the result to `quad_features` below.



In [13]:
### GRADED

# YOUR CODE HERE
pfeatures = PolynomialFeatures()
quad_features = pfeatures.fit_transform(auto[['horsepower']])

# Answer check
print(type(quad_features))
print(quad_features[0])

<class 'numpy.ndarray'>
[1.00e+00 1.30e+02 1.69e+04]


[Back to top](#Index:) 

## Problem 2

### Creating the DataFrame

**5 Points**

Use the transformed array `quad_features` to create a DataFrame of the transformed data. As shown in the lectures, use the `get_feature_names_out()` method of your fitted transformer `pfeatures` from above to define the column names. Use `iloc[:, 1:]` to drop the bias term from the DataFrame so that only the columns `horsepower` and `horsepower^2` remain.

Assign your response as a DataFrame to the variable `poly_features_df` below.

In [17]:
### GRADED

# YOUR CODE HERE
poly_features_df = pd.DataFrame(quad_features, columns = pfeatures.get_feature_names_out())
poly_features_df = poly_features_df.iloc[:, 1:]

# Answer check
print(poly_features_df.shape)
poly_features_df.head()

(392, 2)


| | horsepower | horsepower^2 |
|---|---|---|
| 0 | 130.0 | 16900.0 |
| 1 | 165.0 | 27225.0 |
| 2 | 150.0 | 22500.0 |
| 3 | 150.0 | 22500.0 |
| 4 | 140.0 | 19600.0 |


[Back to top](#Index:) 

## Problem 3

### DataFrame with Cubic Features

**5 Points**

Complete the code below according to these instructions:

- Instantiate a `PolynomialFeatures()` transformer with `degree` equal to `3` and assign it to the variable `pfeatures`.
- Transform the `horsepower` column by calling `fit_transform` on `pfeatures` with `auto[['horsepower']]` as the argument. Assign the result to `cubic_features` below.
- Use the transformed array `cubic_features` to create a DataFrame of the transformed data. As shown in the lectures, use the `get_feature_names_out()` method of your fitted transformer `pfeatures` from above to define the column names. Use `iloc[:, 1:]` to drop the bias term from the DataFrame so that only the columns `horsepower`, `horsepower^2`, and `horsepower^3` remain.

Assign your response as a DataFrame to the variable `cubic_features_df` below.


In [21]:
### GRADED

# YOUR CODE HERE
pfeatures = PolynomialFeatures(degree=3)
cubic_features = pfeatures.fit_transform(auto[['horsepower']])

cubic_features_df = pd.DataFrame(cubic_features, columns = pfeatures.get_feature_names_out())
cubic_features_df = cubic_features_df.iloc[:, 1:]

# Answer check
print(type(cubic_features))
print(cubic_features_df.shape)
cubic_features_df.head()

<class 'numpy.ndarray'>
(392, 3)


| | horsepower | horsepower^2 | horsepower^3 |
|---|---|---|---|
| 0 | 130.0 | 16900.0 | 2197000.0 |
| 1 | 165.0 | 27225.0 | 4492125.0 |
| 2 | 150.0 | 22500.0 | 3375000.0 |
| 3 | 150.0 | 22500.0 | 3375000.0 |
| 4 | 140.0 | 19600.0 | 2744000.0 |


[Back to top](#Index:) 

## Problem 4

### Experimenting with Multiple Features

**5 Points**

Complete the code below according to these instructions:

- Instantiate a `PolynomialFeatures()` transformer with `degree` equal to `2` and assign it to the variable `pfeatures`.
- Transform the `horsepower` and `weight` columns by calling `fit_transform` on `pfeatures` with `auto[['horsepower', 'weight']]` as the argument. Assign the result to `two_features` below.
- Use the transformed array `two_features` to create a DataFrame of the transformed data. As shown in the lectures, use the `get_feature_names_out()` method of your fitted transformer `pfeatures` from above to define the column names. Use `iloc[:, 1:]` to drop the bias term from the DataFrame.

Assign your response as a DataFrame to the variable `two_feature_poly_df` below.
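With two input columns, a degree-2 expansion contains every product of the inputs with total degree at most 2, including the cross term. The pure-Python sketch below enumerates those terms to show where the column count comes from. The helper `polynomial_term_names` is hypothetical and written for illustration; it is not scikit-learn's implementation.

```python
from itertools import combinations_with_replacement
from math import comb


def polynomial_term_names(features, degree):
    """List every term of total degree <= `degree`, lowest degree first."""
    names = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(features, d):
            if not combo:
                names.append('1')  # the bias (degree-0) term
                continue
            parts = []
            for f in dict.fromkeys(combo):  # unique features, original order
                power = combo.count(f)
                parts.append(f if power == 1 else f"{f}^{power}")
            names.append(' '.join(parts))
    return names


terms = polynomial_term_names(['horsepower', 'weight'], 2)
print(terms)

# For n features and degree d there are C(n + d, d) terms, bias included.
print(len(terms), comb(2 + 2, 2))
```

Counting the bias, two features at degree 2 give six terms, so five columns remain once the bias is dropped.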

In [23]:
### GRADED

# YOUR CODE HERE
pfeatures = PolynomialFeatures(degree=2)
two_features = pfeatures.fit_transform(auto[['horsepower', 'weight']])

two_feature_poly_df = pd.DataFrame(two_features, columns = pfeatures.get_feature_names_out())
two_feature_poly_df = two_feature_poly_df.iloc[:, 1:]

# Answer check
print(two_feature_poly_df.shape)
two_feature_poly_df.head()

(392, 5)


| | horsepower | weight | horsepower^2 | horsepower weight | weight^2 |
|---|---|---|---|---|---|
| 0 | 130.0 | 3504.0 | 16900.0 | 455520.0 | 12278016.0 |
| 1 | 165.0 | 3693.0 | 27225.0 | 609345.0 | 13638249.0 |
| 2 | 150.0 | 3436.0 | 22500.0 | 515400.0 | 11806096.0 |
| 3 | 150.0 | 3433.0 | 22500.0 | 514950.0 | 11785489.0 |
| 4 | 140.0 | 3449.0 | 19600.0 | 482860.0 | 11895601.0 |


#### Summary

Now that you have the hang of using `PolynomialFeatures`, you will next combine the transformer with an estimator using scikit-learn's pipeline utilities. As demonstrated in the videos, a pipeline is a handy abstraction that combines the data transformations and the model in a single object. This is especially useful when making predictions on new data points.
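As a preview, here is a minimal sketch of such a pipeline on made-up data. The step names `pfeat` and `linreg`, the frame `toy`, and the choice of `LinearRegression` as the estimator are all assumptions made for illustration; the upcoming exercise may set things up differently.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data that follows an exact quadratic: y = 1 + 2x + 3x^2.
toy = pd.DataFrame({'x': [0.0, 1.0, 2.0, 3.0, 4.0]})
toy['y'] = 1 + 2 * toy['x'] + 3 * toy['x'] ** 2

# The pipeline expands the features and then fits the model in one object.
pipe = Pipeline([
    ('pfeat', PolynomialFeatures(degree=2, include_bias=False)),
    ('linreg', LinearRegression()),
])
pipe.fit(toy[['x']], toy['y'])

# New points go through the same transformation automatically.
new_points = pd.DataFrame({'x': [5.0]})
prediction = pipe.predict(new_points)
print(prediction)
```

Because the toy data is exactly quadratic, the prediction at `x = 5` should come out very close to 86. The bias column is omitted here because `LinearRegression` fits its own intercept by default.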