# Exercise: Polynomial Regression

Practice implementing polynomial regression in this exercise. The file data.csv contains data generated for one predictor feature ('Var_X') and one outcome feature ('Var_Y'), following a non-linear trend. Use sklearn's PolynomialFeatures class to expand the predictor feature column into multiple columns of polynomial features. Experiment with different polynomial degrees and use the Test Run button to see what fits best. When you think you have found the best-fitting degree, press the Submit button to check your work.
Perform the following steps:

1. Load in the data

    The data is in the file called 'data.csv'. Note that this data has a header line.
    Split the data into the predictor feature, X, and the outcome feature, y.
    X must be a 2-d array of 20 rows by 1 column. You might need NumPy's reshape function to accomplish this.

2. Create polynomial features

    Create an instance of sklearn's PolynomialFeatures class and assign it to the variable poly_feat. Pay attention to how to set the degree of features, since that will be how the exercise is evaluated.
    Create the polynomial features by using the PolynomialFeatures object's .fit_transform() method. The "fit" part of the method determines which output features are needed for the chosen degree and the number of input columns, and the "transform" part computes those features for the data passed to the method as an argument. Assign the new feature matrix to the X_poly variable.

3. Build a polynomial regression model

    Create a polynomial regression model by combining sklearn's LinearRegression class with the polynomial features. Assign the fitted model to poly_model.
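
The three steps above can be sketched end to end on made-up data. This is only an illustration under stated assumptions: the `toy` DataFrame is synthetic (an exact quadratic with no noise) and stands in for the real file, and the `toy_*` names are hypothetical so they do not clash with the variables the exercise asks for.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the exercise data (assumption: NOT the real file).
# The outcome follows y = 1 + 2x + 3x^2 exactly, with no noise.
x = np.linspace(-2, 2, 20)
toy = pd.DataFrame({"Var_X": x, "Var_Y": 1 + 2 * x + 3 * x**2})

# Step 1: predictor as a 2-d array (20 rows, 1 column), outcome as a 1-d array
X_toy = toy["Var_X"].values.reshape(-1, 1)
y_toy = toy["Var_Y"].values

# Step 2: expand the single column into polynomial feature columns
toy_feat = PolynomialFeatures(degree=2)
X_toy_poly = toy_feat.fit_transform(X_toy)

# Step 3: ordinary linear regression on the expanded features
toy_model = LinearRegression(fit_intercept=False).fit(X_toy_poly, y_toy)

print(X_toy.shape, X_toy_poly.shape)
print(np.round(toy_model.coef_, 3))
```

Because the toy data is an exact quadratic, the fitted coefficients should recover the constant, linear, and quadratic terms used to generate it. Real data with noise will not match this cleanly.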

In [20]:
# Add import statements
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

In [21]:
# Load the data
train_data = pd.read_csv('data_4.csv')
print(f"[INFO] train_data has a shape of: {train_data.shape} \n")
print(train_data.head())

# Assign the data to predictor and outcome variables
# Note: X must be a 2-d array of 20 rows by 1 column.
#  train_data['Var_X'].values returns a 1-dimensional array, and NumPy's
#  reshape(-1, 1) turns it into a 2-dimensional array with a single column.
#  The -1 is a placeholder that tells NumPy to infer the number of rows
#  from the size of the original array; the 1 specifies the number of columns.
#  This 2-dimensional layout is the format PolynomialFeatures expects.
X = train_data['Var_X'].values.reshape(-1, 1)

y = train_data['Var_Y'].values


[INFO] train_data has a shape of: (20, 2) 

     Var_X    Var_Y
0 -0.33532  6.66854
1  0.02160  3.86398
2 -1.19438  5.16161
3 -0.65046  8.43823
4 -0.28001  5.57201
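
As a small aside on the reshaping step, the sketch below shows the shapes involved on a three-row stand-in table. The `frame` DataFrame and its values are made up for illustration. Selecting the column with double brackets is an alternative that also yields a 2-d array, without calling reshape.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row table, used only to show the array shapes
frame = pd.DataFrame({"Var_X": [0.1, 0.2, 0.3], "Var_Y": [1.0, 2.0, 3.0]})

flat = frame["Var_X"].values          # 1-d array: one value per row
column = flat.reshape(-1, 1)          # 2-d array: rows inferred, 1 column
also_column = frame[["Var_X"]].values # double brackets keep the 2-d shape

print(flat.shape, column.shape, also_column.shape)
```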


In [22]:
# Create polynomial features
# Create a PolynomialFeatures object, then fit and transform the predictor feature
poly_feat = PolynomialFeatures(degree=4)
X_poly = poly_feat.fit_transform(X)
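
To see what the transformed matrix contains, here is a minimal sketch on a two-row input. `X_small` and `expanded` are hypothetical names and the values are chosen only so the powers are easy to read. With one input column and degree 4, each row becomes the constant 1 followed by x, x^2, x^3, and x^4.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up predictor values, already shaped as a single column
X_small = np.array([[2.0], [3.0]])

small_feat = PolynomialFeatures(degree=4)
expanded = small_feat.fit_transform(X_small)

# Each row holds [1, x, x^2, x^3, x^4]
print(expanded)
print(small_feat.n_output_features_)
```

The leading column of ones is the bias term that PolynomialFeatures adds by default (`include_bias=True`).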

In [23]:
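# Make and fit the polynomial regression model
# Create a LinearRegression object and fit it to the polynomial predictor features.
# PolynomialFeatures adds a constant column of ones by default, so
# fit_intercept=False avoids a redundant intercept; coef_[0] plays that role.
poly_model = LinearRegression(fit_intercept=False).fit(X_poly, y)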

print(f"Coefficients: {poly_model.coef_}")
print(f"Intercept: {poly_model.intercept_}")

Coefficients: [ 3.37563501 -6.28126025 -2.3787942   0.55307182  0.22699807]
Intercept: 0.0
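
The intercept prints as 0.0 because the model was fit with `fit_intercept=False`; the constant term instead shows up as the first coefficient, which belongs to the bias column. The sketch below, on synthetic data (an assumption, not the exercise file), compares that setup with the alternative of dropping the bias column and letting LinearRegression fit the intercept. Both variants should describe the same fitted curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data for illustration only: a noisy quadratic
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2, 2, size=(20, 1))
y_demo = 3 - 6 * X_demo[:, 0] - 2 * X_demo[:, 0] ** 2 + rng.normal(scale=0.1, size=20)

# Variant A: bias column included, intercept fitting disabled
XA = PolynomialFeatures(degree=2).fit_transform(X_demo)
model_a = LinearRegression(fit_intercept=False).fit(XA, y_demo)

# Variant B: no bias column, intercept fitted by LinearRegression
XB = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_demo)
model_b = LinearRegression().fit(XB, y_demo)

# The constant term appears as coef_[0] in A and as intercept_ in B
print(model_a.coef_[0], model_b.intercept_)
print(model_a.coef_[1:], model_b.coef_)
```

Either arrangement works. What matters is that the constant term is modeled exactly once, either by the bias column or by the fitted intercept.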
