In [2]:
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv('Advertising.csv')

In [3]:
df.head()

|   | TV | radio | newspaper | sales |
|---|-------|------|------|------|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


In [4]:
X = df.drop('sales', axis=1)
y = df['sales']

In [5]:
from sklearn.preprocessing import PolynomialFeatures

In [6]:
polynomial_converter = PolynomialFeatures(degree=2, include_bias=False)

In [7]:
poly_features = polynomial_converter.fit_transform(X)

In [8]:
polynomial_converter.get_feature_names_out()

array(['TV', 'radio', 'newspaper', 'TV^2', 'TV radio', 'TV newspaper',
       'radio^2', 'radio newspaper', 'newspaper^2'], dtype=object)
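To see what the converter does to a single row, here is a minimal sketch that expands the first row from `df.head()` (TV=230.1, radio=37.8, newspaper=69.2) both by hand and with `PolynomialFeatures`. The manual list follows the same column order as `get_feature_names_out()` above; the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# First row of the advertising features: TV, radio, newspaper
row = np.array([[230.1, 37.8, 69.2]])
tv, radio, newspaper = row[0]

# Manual degree-2 expansion, in the same order as get_feature_names_out()
manual = np.array([
    tv, radio, newspaper,              # original features
    tv**2, tv * radio, tv * newspaper,  # terms involving TV
    radio**2, radio * newspaper,        # terms involving radio
    newspaper**2,                       # newspaper squared
])

converter = PolynomialFeatures(degree=2, include_bias=False)
expanded = converter.fit_transform(row)[0]

print(np.allclose(manual, expanded))  # True
```

The last three values (1428.84, 2615.76, 4788.64) match the last three columns of the first row of `poly_features` shown below.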

In [9]:
poly_features

array([[ 230.1 ,   37.8 ,   69.2 , ..., 1428.84, 2615.76, 4788.64],
       [  44.5 ,   39.3 ,   45.1 , ..., 1544.49, 1772.43, 2034.01],
       [  17.2 ,   45.9 ,   69.3 , ..., 2106.81, 3180.87, 4802.49],
       ...,
       [ 177.  ,    9.3 ,    6.4 , ...,   86.49,   59.52,   40.96],
       [ 283.6 ,   42.  ,   66.2 , ..., 1764.  , 2780.4 , 4382.44],
       [ 232.1 ,    8.6 ,    8.7 , ...,   73.96,   74.82,   75.69]])

In [10]:
poly_features.shape

(200, 9)
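The 9 columns come from counting every monomial of degree 1 or 2 in 3 variables: 3 linear terms, 3 squares and 3 pairwise products. As a sketch (the formula is standard combinatorics, not something stated in this notebook; `n_poly_features` is a hypothetical helper), the count with `include_bias=False` is C(n + d, d) - 1, which also shows how quickly the feature set grows with the degree:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features: int, degree: int) -> int:
    """Columns produced by PolynomialFeatures(include_bias=False):
    all monomials of total degree 1..degree (bias column excluded)."""
    return comb(n_features + degree, degree) - 1


print(n_poly_features(3, 2))  # 9, matches poly_features.shape[1]
print(n_poly_features(3, 3))  # 19

# Cross-check against scikit-learn on random data with 3 columns
X_demo = np.random.default_rng(0).random((5, 3))
for degree in (2, 3):
    converter = PolynomialFeatures(degree=degree, include_bias=False).fit(X_demo)
    assert converter.n_output_features_ == n_poly_features(3, degree)
```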

---

In [11]:
from sklearn.model_selection import train_test_split

To compare fairly against my previous linear regression model, I should use the same test size and the same random state as in the last project.

In [12]:
X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)
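Why the same settings matter: `train_test_split` picks rows based only on the number of samples, `test_size` and `random_state`, not on how many columns there are. This sketch uses synthetic stand-in data (not `Advertising.csv`) to show that splitting the raw features and the polynomial features with identical settings selects exactly the same rows:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the 200-row advertising data (3 features)
rng = np.random.default_rng(42)
X_raw = rng.random((200, 3))
y = rng.random(200)
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_raw)

# Split both versions with identical settings
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_raw, y, test_size=0.3, random_state=101)
Xp_train, Xp_test, yp_train, yp_test = train_test_split(
    X_poly, y, test_size=0.3, random_state=101)

# The first 3 polynomial columns are copies of the original features,
# so identical rows here mean identical splits
print(np.array_equal(Xr_test, Xp_test[:, :3]))  # True
print(np.array_equal(yr_test, yp_test))          # True
```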

In [13]:
from sklearn.linear_model import LinearRegression

In [14]:
model = LinearRegression() # with default values

In [15]:
model.fit(X_train, y_train)

# recall, this is now training a regression model on nine features instead of just the original three.

LinearRegression()

In [16]:
# it's time to evaluate its performance on the test set

test_predictions = model.predict(X_test)

In [17]:
model.coef_

array([ 5.17095811e-02,  1.30848864e-02,  1.20000085e-02, -1.10892474e-04,
        1.14212673e-03, -5.24100082e-05,  3.34919737e-05,  1.46380310e-04,
       -3.04715806e-05])

In [18]:
len(model.coef_)

9

In [19]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [20]:
MAE = mean_absolute_error(y_test, test_predictions)

In [21]:
MSE = mean_squared_error(y_test, test_predictions)

In [22]:
RMSE = np.sqrt(MSE)
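For reference, the three metrics are MAE = mean(|y - ŷ|), MSE = mean((y - ŷ)^2) and RMSE = sqrt(MSE). A small sketch with made-up numbers (not the advertising results) checks the hand-computed values against scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Small made-up example
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred          # [0.5, 0.0, -1.5, -1.0]
mae = np.mean(np.abs(errors))     # (0.5 + 0 + 1.5 + 1.0) / 4 = 0.75
mse = np.mean(errors ** 2)        # (0.25 + 0 + 2.25 + 1.0) / 4 = 0.875
rmse = np.sqrt(mse)

assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
print(mae, mse, rmse)
```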

In [23]:
MAE

0.48967980448037

In [24]:
RMSE # penalizes large errors more heavily: a few predictions that are far off raise RMSE much more than MAE

0.6646431757269196
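The comment above can be made concrete with two hypothetical error vectors that have the same total absolute error. MAE treats them identically, while RMSE doubles when the whole error is concentrated in a single point:

```python
import numpy as np

# Two hypothetical error vectors with the same total absolute error (4.0)
spread_out = np.array([1.0, 1.0, 1.0, 1.0])  # four moderate misses
one_big = np.array([0.0, 0.0, 0.0, 4.0])     # one large miss

for name, err in [("spread_out", spread_out), ("one_big", one_big)]:
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")
# spread_out: MAE=1.00, RMSE=1.00
# one_big: MAE=1.00, RMSE=2.00
```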

---

For the plain linear regression on the original three features, we had the following results:
<br> MAE: 1.213  RMSE: 1.516

---

So the polynomial regression model clearly performs much better than plain linear regression on this test set (MAE 0.490 vs 1.213, RMSE 0.665 vs 1.516).
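As a quick arithmetic check, here is the relative reduction in each error metric, using the numbers reported in this notebook (the linear-regression values are only given to three decimals):

```python
# Test-set errors reported in this notebook (test_size=0.3, random_state=101)
linear = {"MAE": 1.213, "RMSE": 1.516}
poly = {"MAE": 0.48967980448037, "RMSE": 0.6646431757269196}

for metric in ("MAE", "RMSE"):
    reduction = 1 - poly[metric] / linear[metric]
    print(f"{metric}: {linear[metric]:.3f} -> {poly[metric]:.3f} ({reduction:.0%} lower)")
```

Both errors drop by more than half: roughly 60% for MAE and 56% for RMSE.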

It's __important to note__ that the only way to fairly compare against your previous values is to perform the exact same train_test_split (same test_size and random_state) in this model as you did in the other one.

In [25]:
model.coef_

array([ 5.17095811e-02,  1.30848864e-02,  1.20000085e-02, -1.10892474e-04,
        1.14212673e-03, -5.24100082e-05,  3.34919737e-05,  1.46380310e-04,
       -3.04715806e-05])

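Some of the beta coefficients are almost 0.

In our first linear regression we essentially weren't considering newspaper, so it makes sense that now, with squared terms, the squared newspaper value (newspaper^2) on its own also gets a near-zero coefficient. Keep in mind, though, that the features are not scaled: squared and interaction terms take much larger values than the original features, so they naturally get smaller coefficients, and raw coefficient size alone is not a reliable measure of importance.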

---

__Again, the main thing to remember: when training the model, pass the polynomial features (poly_features) into train_test_split. DON'T pass in the original X values.__

---

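In polynomial regression, the fitted line only appears curved because the model is fitted to the polynomial feature set. The model itself is still linear in its coefficients; the curve shows up when its predictions are plotted against an original feature.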
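A minimal sketch of this point on a hypothetical noise-free curve y = 1 + 2x + 3x^2: an ordinary `LinearRegression` fitted on the expanded columns [x, x^2] recovers the curve's coefficients exactly, while a straight line fitted on x alone cannot follow it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free curve: y = 1 + 2x + 3x^2
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + 3 * x[:, 0] ** 2

# Plain linear regression on the expanded columns [x, x^2]
features = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(features, y)
print(model.intercept_, model.coef_)  # approximately 1.0 [2. 3.]

# A straight-line fit on x alone cannot follow the curve
line = LinearRegression().fit(x, y)
print(line.score(x, y), model.score(features, y))  # R^2: well below 1 vs ~1.0
```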

---

__A quick note__: later on, when we want to predict for a new data point (some sampling of TV, radio and newspaper), we will only have the three original features, but our regression model was trained on the nine polynomial features. So the new data point must first be transformed with the same polynomial converter before calling predict.
<br>In other words, later on we'll be __saving__ not just __the model__ but __also__ this __converter object__, since it's necessary for new incoming data points where we only have the three original features.
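A sketch of what saving and reusing both objects could look like, using `joblib` (installed alongside scikit-learn). The training data, the file names and the new data point are all hypothetical stand-ins:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Stand-in training data with 3 raw features (TV, radio, newspaper)
rng = np.random.default_rng(1)
X_raw = rng.uniform(0, 300, size=(200, 3))
y = rng.uniform(0, 30, size=200)

converter = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(converter.fit_transform(X_raw), y)

# Save both objects (file names are hypothetical)
out_dir = tempfile.mkdtemp()
joblib.dump(model, os.path.join(out_dir, "poly_model.joblib"))
joblib.dump(converter, os.path.join(out_dir, "poly_converter.joblib"))

# Later: reload both and score a new data point that has only 3 features
loaded_model = joblib.load(os.path.join(out_dir, "poly_model.joblib"))
loaded_converter = joblib.load(os.path.join(out_dir, "poly_converter.joblib"))

new_point = np.array([[149.0, 22.0, 12.0]])       # TV, radio, newspaper (made up)
new_poly = loaded_converter.transform(new_point)  # shape (1, 9)
print(loaded_model.predict(new_poly))
```

An alternative design is to wrap both steps in a single `sklearn.pipeline.Pipeline` (e.g. via `make_pipeline`), so only one object needs to be saved and the transformation can't be forgotten at prediction time.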

---

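One question I had: what exactly does adding x^2 do?
- It lets the model fit a curved (nonlinear) relationship between a feature and the target, which a single linear term cannot capture. Squaring also magnifies large values relative to small ones, and the interaction terms (e.g. TV radio) let the model capture the combined effect of two features.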

---