# Polynomial Regression using Scikit Learn

The file data.csv contains data generated for one predictor feature ('Var_X') and one outcome feature ('Var_Y') that follow a non-linear trend. Use sklearn's PolynomialFeatures class to expand the predictor column into multiple columns of polynomial features, then fit a linear regression model on those columns.

### Import libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

### Load in the data

The data is in the file 'data.csv', which has a header line. Split the data into the predictor feature X and the outcome feature y. X must be a 2-D array of 20 rows by 1 column; you might need NumPy's reshape function to get that shape.


In [8]:
train_data = pd.read_csv("data.csv")
print(train_data["Var_X"], end='\n\n')
# reshape(-1, 1) turns the 1-D column into a 2-D array with a single column
X = train_data["Var_X"].values.reshape(-1, 1)
print(X)
y = train_data["Var_Y"].values
print(y)

0    -0.33532
1     0.02160
2    -1.19438
3    -0.65046
4    -0.28001
5     1.93258
6     1.22620
7     0.74727
8     3.32853
9     2.87457
10   -1.48662
11    0.37629
12    1.43918
13    0.24183
14   -2.79140
15    1.08176
16    2.81555
17    0.54924
18    2.36449
19   -1.01925
Name: Var_X, dtype: float64

[[-0.33532]
 [ 0.0216 ]
 [-1.19438]
 [-0.65046]
 [-0.28001]
 [ 1.93258]
 [ 1.2262 ]
 [ 0.74727]
 [ 3.32853]
 [ 2.87457]
 [-1.48662]
 [ 0.37629]
 [ 1.43918]
 [ 0.24183]
 [-2.7914 ]
 [ 1.08176]
 [ 2.81555]
 [ 0.54924]
 [ 2.36449]
 [-1.01925]]
[  6.66854   3.86398   5.16161   8.43823   5.57201 -11.1327   -5.31226
  -4.63725   3.8065   -6.06084   7.22328   2.38887  -7.13415   2.00412
   4.29794  -5.86553  -5.20711  -3.52863 -10.16202   5.31123]
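
The effect of `reshape(-1, 1)` is easier to see on a tiny array. The following is a minimal sketch with made-up values (not taken from data.csv); the names `x_flat` and `X_demo` are hypothetical and used only for illustration.

```python
import numpy as np

# Hypothetical 1-D predictor values, for illustration only
x_flat = np.array([-0.5, 0.0, 1.5, 2.0])

# -1 lets NumPy infer the number of rows; 1 fixes the number of columns
X_demo = x_flat.reshape(-1, 1)

print(x_flat.shape)  # (4,)
print(X_demo.shape)  # (4, 1)
```

scikit-learn estimators and transformers expect the feature matrix in this 2-D (samples by features) layout, which is why the single predictor column is reshaped before use.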


### Create polynomial features

Create an instance of sklearn's PolynomialFeatures class and assign it to the variable poly_feat. Pay attention to how you set the degree of the features, since that is how the exercise is evaluated.

Create the polynomial features with the PolynomialFeatures object's .fit_transform() method. The "fit" step works out which output features are needed for the given input and degree, and the "transform" step computes those features for the data passed to the method. Assign the new feature matrix to the variable X_poly.


In [10]:
# Expand the single column x into the columns [1, x, x^2, x^3, x^4]
poly_feat = PolynomialFeatures(degree=4)
X_poly = poly_feat.fit_transform(X)
print(X_poly)

[[ 1.00000000e+00 -3.35320000e-01  1.12439502e-01 -3.77032139e-02
   1.26426417e-02]
 [ 1.00000000e+00  2.16000000e-02  4.66560000e-04  1.00776960e-05
   2.17678234e-07]
 [ 1.00000000e+00 -1.19438000e+00  1.42654358e+00 -1.70383513e+00
   2.03502660e+00]
 [ 1.00000000e+00 -6.50460000e-01  4.23098212e-01 -2.75208463e-01
   1.79012097e-01]
 [ 1.00000000e+00 -2.80010000e-01  7.84056001e-02 -2.19543521e-02
   6.14743813e-03]
 [ 1.00000000e+00  1.93258000e+00  3.73486546e+00  7.21792628e+00
   1.39492200e+01]
 [ 1.00000000e+00  1.22620000e+00  1.50356644e+00  1.84367317e+00
   2.26071204e+00]
 [ 1.00000000e+00  7.47270000e-01  5.58412453e-01  4.17284874e-01
   3.11824468e-01]
 [ 1.00000000e+00  3.32853000e+00  1.10791120e+01  3.68771565e+01
   1.22746722e+02]
 [ 1.00000000e+00  2.87457000e+00  8.26315268e+00  2.37530108e+01
   6.82796923e+01]
 [ 1.00000000e+00 -1.48662000e+00  2.21003902e+00 -3.28548821e+00
   4.88427249e+00]
 [ 1.00000000e+00  3.76290000e-01  1.41594164e-01  5.32804680e-02
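
To make the column layout of the output easier to read, here is a small sketch that applies the same transformation to two hand-picked inputs. The values and the names `X_demo`, `poly_demo`, and `X_demo_poly` are assumptions for illustration and are not part of the exercise.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up predictor values, already in 2-D (n_samples, 1) form
X_demo = np.array([[2.0], [-3.0]])

# degree=4 with the default include_bias=True yields 5 columns per row
poly_demo = PolynomialFeatures(degree=4)
X_demo_poly = poly_demo.fit_transform(X_demo)

# Each row is [1, x, x^2, x^3, x^4]
print(X_demo_poly)
```

For x = 2 the row is 1, 2, 4, 8, 16, and for x = -3 it is 1, -3, 9, -27, 81. The same pattern explains each row of X_poly above: the second column is the original Var_X value and the later columns are its successive powers.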

### Build a polynomial regression model

Create a polynomial regression model by fitting sklearn's LinearRegression class on the polynomial features. Assign the fitted model to poly_model.


In [11]:
# Fit ordinary least squares on the expanded polynomial features
poly_model = LinearRegression()
poly_model = poly_model.fit(X_poly, y)
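
Once the model is fitted, new inputs have to pass through the same feature expansion before prediction. The sketch below shows that step on synthetic data: an exact cubic with no noise stands in for data.csv, and every name ending in `_demo`, as well as `X_new`, is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for data.csv (assumption): y = 1 - 2x + 0.5x^3, no noise
X_demo = np.linspace(-3, 3, 20).reshape(-1, 1)
x = X_demo.ravel()
y_demo = 1.0 - 2.0 * x + 0.5 * x ** 3

# Same two steps as in the exercise: expand features, then fit
poly_feat_demo = PolynomialFeatures(degree=4)
X_demo_poly = poly_feat_demo.fit_transform(X_demo)
poly_model_demo = LinearRegression().fit(X_demo_poly, y_demo)

# New inputs go through transform() on the already fitted transformer
X_new = np.array([[0.0], [1.0], [2.0]])
y_pred = poly_model_demo.predict(poly_feat_demo.transform(X_new))

# Close to [1.0, -0.5, 1.0], up to floating-point error
print(y_pred)
```

Because the synthetic target is itself a polynomial of degree at most 4, the fit reproduces it almost exactly. On the noisy data in data.csv the predictions would only approximate the trend.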