# **Lab: Engineering for ML**




## Exercise 3: Polynomial Regression

This time we will perform a polynomial transformation before training a Linear Regression model.


**Pre-requisites:**
- Create a GitHub account (https://github.com/join)
- Install git (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- Install pyenv (https://realpython.com/lessons/installing-pyenv/)
- Install poetry (https://python-poetry.org/docs/#installation)
- Windows users only: install Wget (https://eternallybored.org/misc/wget/)


The steps are:
1.   Create new Git branch
2.   Load the dataset
3.   Apply Polynomial Transformation
4.   Train Linear Regression model
5.   Push changes


## 1. Create new Git branch


**[1.1]** Create a new git branch called `adv_mla_1_poly`


In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git checkout -b adv_mla_1_poly

**[1.2]** Launch Jupyter Lab from your virtual environment

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
poetry run jupyter lab

**[1.3]** Navigate to the folder `notebooks` and create a new Jupyter notebook called `2_linear_poly.ipynb`

## 2. Load the dataset


**[2.1]** Run the magic commands to automatically reload modules

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
%load_ext autoreload
%autoreload 2

**[2.2]** Import the pandas and numpy packages, and `dump` from joblib

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
import pandas as pd
import numpy as np
from joblib import dump

**[2.3]** Load the saved sets from `data/processed`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
X_train = pd.read_csv('../data/processed/X_train.csv')
X_val   = pd.read_csv('../data/processed/X_val.csv'  )
X_test  = pd.read_csv('../data/processed/X_test.csv' )
y_train = pd.read_csv('../data/processed/y_train.csv')
y_val   = pd.read_csv('../data/processed/y_val.csv'  )
y_test  = pd.read_csv('../data/processed/y_test.csv' )

## 3. Apply Polynomial Transformation

**[3.1]** Import PolynomialFeatures from sklearn.preprocessing

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
from sklearn.preprocessing import PolynomialFeatures

**[3.2]** Instantiate a PolynomialFeatures with degree 2

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
poly = PolynomialFeatures(2)

**[3.3]** Fit the PolynomialFeatures and perform transformation on X_train

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
X_train = poly.fit_transform(X_train)

**[3.4]** Display the dimensions of X_train

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
X_train.shape
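To make sense of the new column count, here is a minimal sketch on a toy array (the values `2` and `3` and the names `X_toy`, `poly_toy` are made up for illustration and are not part of the lab data). With degree 2, each row is expanded into a bias term, the original features, their squares, and their pairwise products, so `n` input features become `comb(n + 2, 2)` output columns.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy input (hypothetical): one row with two features, a = 2 and b = 3
X_toy = np.array([[2.0, 3.0]])

poly_toy = PolynomialFeatures(2)
X_toy_poly = poly_toy.fit_transform(X_toy)

# Column order for two features: 1, a, b, a^2, a*b, b^2
print(X_toy_poly)

# Number of output columns for degree 2 with the bias column included
n_features = X_toy.shape[1]
expected_cols = comb(n_features + 2, 2)
print(X_toy_poly.shape[1], expected_cols)
```

The same formula can be used to check the second dimension of the transformed `X_train` against the number of columns in the original file.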

**[3.5]** Perform transformation on X_val and X_test with PolynomialFeatures

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
X_val = poly.transform(X_val)
X_test = poly.transform(X_test)

## 4. Train Linear Regression model

**[4.1]** Import the linear regression module from sklearn

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
from sklearn.linear_model import LinearRegression

**[4.2]** Instantiate the LinearRegression class into a variable called `reg`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
reg = LinearRegression()

**[4.3]** Fit the model with the prepared data

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
reg.fit(X_train, y_train)
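As a rough illustration of why the polynomial transformation helps, the sketch below fits a straight line and a degree-2 model to synthetic data. The data-generating formula `y = 1 + 2x + 3x^2` and all variable names are assumptions made up for this example; they are unrelated to the lab dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data (assumption for illustration): y = 1 + 2x + 3x^2
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + 3 * x[:, 0] ** 2

# A plain linear model on x alone cannot capture the curvature
linear_only = LinearRegression().fit(x, y)

# With degree-2 features the target is linear in the columns [1, x, x^2]
x_poly = PolynomialFeatures(2).fit_transform(x)
toy_reg = LinearRegression().fit(x_poly, y)

print(linear_only.score(x, y))     # R^2 of the straight-line fit
print(toy_reg.score(x_poly, y))    # R^2 of the polynomial fit
print(toy_reg.intercept_, toy_reg.coef_)
```

The model is still an ordinary linear regression; only its inputs have changed. On real data with noise, the improvement will be smaller and should be judged on the validation set rather than the training set.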

**[4.4]** Using `dump` from `joblib`, save the fitted model into the folder `models` as a file called `linear_poly_2.joblib`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
from joblib import dump

dump(reg,  '../models/linear_poly_2.joblib')
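A quick way to confirm that a saved model is usable is to load it back with `joblib.load` and compare predictions. The sketch below does this with a toy model in a temporary directory; `demo_reg`, `X_demo`, `y_demo`, and the file name `demo_model.joblib` are hypothetical stand-ins, not the lab's `reg` or its path.

```python
import tempfile
from pathlib import Path

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Toy model on made-up data, standing in for the lab's `reg`
X_demo = np.array([[0.0], [1.0], [2.0]])
y_demo = np.array([1.0, 3.0, 5.0])
demo_reg = LinearRegression().fit(X_demo, y_demo)

# Save to a temporary folder, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "demo_model.joblib"
    dump(demo_reg, model_path)
    reloaded = load(model_path)

# The reloaded model should give the same predictions as the original
same_preds = np.allclose(demo_reg.predict(X_demo), reloaded.predict(X_demo))
print(same_preds)
```

In the lab, the same check would apply to `'../models/linear_poly_2.joblib'`. Keep in mind that the saved model expects inputs already transformed by `poly`.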

**[4.5]** Save the predictions from this model for the training and validation sets into 2 variables called `y_train_preds` and `y_val_preds`


In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
y_train_preds = reg.predict(X_train)
y_val_preds = reg.predict(X_val)

**[4.6]** Import root_mean_squared_error and mean_absolute_error from sklearn.metrics, aliased as `rmse` and `mae`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
from sklearn.metrics import root_mean_squared_error as rmse
from sklearn.metrics import mean_absolute_error as mae
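For reference, both metrics can be reproduced by hand. The sketch below uses made-up values (`y_true`, `y_pred` are hypothetical) and compares a NumPy computation with scikit-learn. It takes the square root of `mean_squared_error` rather than calling `root_mean_squared_error`, because the latter was only added in scikit-learn 1.4; on older versions this is an equivalent fallback.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up targets and predictions: three exact hits and one miss of 4
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 8.0])

errors = y_true - y_pred

# MAE: average of the absolute errors
manual_mae = np.mean(np.abs(errors))

# RMSE: square root of the average squared error
manual_rmse = np.sqrt(np.mean(errors ** 2))

# Same quantities via scikit-learn
sk_mae = mean_absolute_error(y_true, y_pred)
sk_rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(manual_mae, manual_rmse)
print(sk_mae, sk_rmse)
```

Here the single large error gives an MAE of 1.0 but an RMSE of 2.0, which shows that RMSE penalises large errors more heavily than MAE.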

**[4.7]** Display the RMSE and MAE scores of this model on the training set

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
# scikit-learn metrics expect (y_true, y_pred)
print(rmse(y_train, y_train_preds))
print(mae(y_train, y_train_preds))

**[4.8]** Display the RMSE and MAE scores of this model on the validation set

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
# scikit-learn metrics expect (y_true, y_pred)
print(rmse(y_val, y_val_preds))
print(mae(y_val, y_val_preds))

**[4.9]** Display the RMSE and MAE scores of this model on the testing set

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
y_test_preds = reg.predict(X_test)
# scikit-learn metrics expect (y_true, y_pred)
print(rmse(y_test, y_test_preds))
print(mae(y_test, y_test_preds))

## 5. Push changes

**[5.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git add .

**[5.2]** Create a snapshot of your repository (a commit) and add a description

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git commit -m "linear regression with poly 2"

**[5.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git push -u origin adv_mla_1_poly

**[5.4]** Go to GitHub and merge your changes into the master/main branch

**[5.5]** Check out the master branch (or `main`, if that is your default branch)

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git checkout master

**[5.6]** Pull the latest updates

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git pull

**[5.7]** Stop Jupyter Lab