## Colab Activity 8.2: Using Polynomial Features

**Estimated time: 60 minutes**



This activity focuses on using the scikit-learn transformer `PolynomialFeatures`. As seen in video 8.4, you can turn the transformed array into a DataFrame with appropriate column names by calling the `.get_feature_names_out()` method on the fitted transformer. You will focus on building second, third, and fourth-degree polynomial models using `PolynomialFeatures`, and converting the results to pandas DataFrames.

## Index:

 - [Problem 1](#Problem-1)
 - [Problem 2](#Problem-2)
 - [Problem 3](#Problem-3)
 - [Problem 4](#Problem-4)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

### The Data

Again, the automobile dataset is used.  You will build the additional features using the `horsepower` column of the data.  

In [None]:
auto = pd.read_csv('drive/MyDrive/colab_activity8_2_POLYNOMIAL/auto.csv')

In [None]:
# Features and target from the full dataset
X = auto.loc[:, ['horsepower']]
y = auto['mpg']

# Small reproducible sample used as training data
sample = auto.sample(10, random_state=22)
X_train = sample.loc[:, ['horsepower']]
y_train = sample['mpg']

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

model_predictions = {f'degree_{i}': None for i in range(1, 11)}

print("Starting Dictionary of Predictions\n", model_predictions)

# for degrees 1, 2, 3, ..., 10
for i in range(1, 11):
    # create pipeline
    pipe = Pipeline([
        ('quad_features', PolynomialFeatures(degree=i, include_bias=False)),
        ('quad_model', LinearRegression())
    ])

    # fit pipeline on training data
    pipe.fit(X_train, y_train)

    # make predictions on all data
    preds = pipe.predict(X)

    # assign to model_predictions
    model_predictions[f'degree_{i}'] = preds

# Answer check
model_predictions['degree_1'][:10]

[Back to top](#Index:)

## Problem 1

### Creating Quadratic Features


Complete the code below according to these instructions:

- Instantiate a `PolynomialFeatures()` transformer (the default `degree` is `2`) and assign it to the variable `pfeatures`.
- Transform the `horsepower` column by calling `fit_transform` on `pfeatures` with the argument `auto[['horsepower']]`. Assign your result to `quad_features` below.



In [None]:
from sklearn.preprocessing import PolynomialFeatures
pfeatures = PolynomialFeatures(degree=2)
quad_features = pfeatures.fit_transform(auto[['horsepower']])

x_columns=pfeatures.get_feature_names_out()
print(x_columns)

# Answer check
print(quad_features)
print(type(quad_features))

['1' 'horsepower' 'horsepower^2']
[[1.0000e+00 1.3000e+02 1.6900e+04]
 [1.0000e+00 1.6500e+02 2.7225e+04]
 [1.0000e+00 1.5000e+02 2.2500e+04]
 ...
 [1.0000e+00 8.4000e+01 7.0560e+03]
 [1.0000e+00 7.9000e+01 6.2410e+03]
 [1.0000e+00 8.2000e+01 6.7240e+03]]
<class 'numpy.ndarray'>


[Back to top](#Index:)

## Problem 2

### Creating the DataFrame



Use the transformed array `quad_features` to create a DataFrame of the transformed data. As shown in the lectures, use the `get_feature_names_out()` method of your fitted transformer `pfeatures` from above to define the column names. Use `iloc[:, 1:]` to drop the bias term from the DataFrame so that you only have the columns `horsepower` and `horsepower^2`.

Assign your response as a DataFrame to the variable `poly_features_df` below.

In [None]:
poly_features_df = pd.DataFrame(quad_features, columns = x_columns)
poly_features_df = poly_features_df.iloc[:, 1:]

# Answer check
print(poly_features_df.shape)
poly_features_df.head()

(392, 2)


| | horsepower | horsepower^2 |
|---|---|---|
| 0 | 130.0 | 16900.0 |
| 1 | 165.0 | 27225.0 |
| 2 | 150.0 | 22500.0 |
| 3 | 150.0 | 22500.0 |
| 4 | 140.0 | 19600.0 |


[Back to top](#Index:)

## Problem 3

### DataFrame with Cubic Features



Complete the code below according to these instructions:

- Instantiate a `PolynomialFeatures()` transformer with `degree` equal to `3` and assign it to the variable `pfeatures`.
- Transform the `horsepower` column by calling `fit_transform` on `pfeatures` with the argument `auto[['horsepower']]`. Assign your result to `cubic_features` below.
- Use the transformed array `cubic_features` to create a DataFrame of the transformed data. As shown in the lectures, use the `get_feature_names_out()` method of your fitted transformer `pfeatures` to define the column names. Use `iloc[:, 1:]` to drop the bias term from the DataFrame so that you only have the columns `horsepower`, `horsepower^2`, and `horsepower^3`.

Assign your response as a DataFrame to the variable `cubic_features_df` below.


In [None]:
from sklearn.preprocessing import PolynomialFeatures
pfeatures = PolynomialFeatures(degree=3)

cubic_features = pfeatures.fit_transform(auto[['horsepower']])

cubic_features_df = pd.DataFrame(cubic_features, columns=pfeatures.get_feature_names_out())
cubic_features_df = cubic_features_df.iloc[:, 1:]

# Answer check
print(cubic_features_df.shape)
cubic_features_df.head()

(392, 3)


| | horsepower | horsepower^2 | horsepower^3 |
|---|---|---|---|
| 0 | 130.0 | 16900.0 | 2197000.0 |
| 1 | 165.0 | 27225.0 | 4492125.0 |
| 2 | 150.0 | 22500.0 | 3375000.0 |
| 3 | 150.0 | 22500.0 | 3375000.0 |
| 4 | 140.0 | 19600.0 | 2744000.0 |


[Back to top](#Index:)

## Problem 4

### Experimenting with Multiple Features



Complete the code below according to these instructions:

- Instantiate a `PolynomialFeatures()` transformer with `degree` equal to `2` and assign it to the variable `pfeatures`.
- Transform the `horsepower` and `weight` columns by calling `fit_transform` on `pfeatures` with the argument `auto[['horsepower', 'weight']]`. Assign your result to `two_features` below.
- Use the transformed array `two_features` to create a DataFrame of the transformed data. As shown in the lectures, use the `get_feature_names_out()` method of your fitted transformer `pfeatures` to define the column names. Use `iloc[:, 1:]` to drop the bias term from the DataFrame.

Assign your response as a DataFrame to the variable `two_feature_poly_df` below.

In [None]:
from sklearn.preprocessing import PolynomialFeatures
pfeatures = PolynomialFeatures(degree=2)
two_features = pfeatures.fit_transform(auto[['horsepower','weight']])

print(pfeatures.get_feature_names_out())
two_feature_poly_df = pd.DataFrame(two_features, columns=pfeatures.get_feature_names_out()).iloc[:, 1:]

# Answer check
print(two_feature_poly_df.shape)
two_feature_poly_df.head()

['1' 'horsepower' 'weight' 'horsepower^2' 'horsepower weight' 'weight^2']
(392, 5)


| | horsepower | weight | horsepower^2 | horsepower weight | weight^2 |
|---|---|---|---|---|---|
| 0 | 130.0 | 3504.0 | 16900.0 | 455520.0 | 12278016.0 |
| 1 | 165.0 | 3693.0 | 27225.0 | 609345.0 | 13638249.0 |
| 2 | 150.0 | 3436.0 | 22500.0 | 515400.0 | 11806096.0 |
| 3 | 150.0 | 3433.0 | 22500.0 | 514950.0 | 11785489.0 |
| 4 | 140.0 | 3449.0 | 19600.0 | 482860.0 | 11895601.0 |


In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [None]:
pipe = Pipeline([('sasi_trans', PolynomialFeatures(degree=3)),
                 ('sasi_model', LinearRegression())])

Now, to examine the coefficients, use the `.named_steps` attribute on the fitted `pipe` object to extract the regressor. Assign the model to `quad_reg` below.

Extract the coefficients from the model and assign these as an array to the variable `coefs`.
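
To see what `.named_steps` returns, here is a minimal sketch on synthetic, noise-free data generated from `y = 1 + 2x + 3x^2`. The data, the step names `'transform'` and `'regression'`, and the variable names are all assumptions made for illustration. With no noise, the fitted coefficients should match the generating ones up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data (illustrative only, not the automobile dataset)
x_demo = np.linspace(-2, 2, 20).reshape(-1, 1)
y_demo = 1.0 + 2.0 * x_demo.ravel() + 3.0 * x_demo.ravel() ** 2

demo_pipe = Pipeline([
    ('transform', PolynomialFeatures(degree=2, include_bias=False)),
    ('regression', LinearRegression()),
])
demo_pipe.fit(x_demo, y_demo)

# named_steps is indexed by step name; the values are the fitted estimators
demo_reg = demo_pipe.named_steps['regression']
demo_coefs = demo_reg.coef_  # one coefficient per polynomial feature
print(demo_reg.intercept_, demo_coefs)
```

Note that `named_steps` itself is a dictionary-like container, so the regressor has to be looked up by its step name before `.coef_` is available.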

In [None]:
import numpy as np
import pandas as pd
import warnings
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
warnings.filterwarnings("ignore")

In [None]:
# Quadratic pipeline; include_bias=False because LinearRegression fits its own intercept
pipe = Pipeline([
    ('sasi_transform', PolynomialFeatures(degree=2, include_bias=False)),
    ('sasi_regression', LinearRegression())
])

X = auto[["horsepower"]]
y = auto["mpg"]

# Fit on all data and compute the training mean squared error
pipe.fit(X, y)
quad_pipe_mse = mean_squared_error(y, pipe.predict(X))

quad_reg = pipe.named_steps['sasi_regression']  # regressor from pipeline
coefs = quad_reg.coef_  # coefficients of regressor