<h1><center> Regression analysis / Ridge regression / Lasso regression (R^2 evaluation) </center></h1>


In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

In [14]:
import pandas as pd
data = pd.read_csv("Advertising.csv")
print(data)

     Unnamed: 0     TV  radio  newspaper  sales
0             1  230.1   37.8       69.2   22.1
1             2   44.5   39.3       45.1   10.4
2             3   17.2   45.9       69.3    9.3
3             4  151.5   41.3       58.5   18.5
4             5  180.8   10.8       58.4   12.9
..          ...    ...    ...        ...    ...
195         196   38.2    3.7       13.8    7.6
196         197   94.2    4.9        8.1    9.7
197         198  177.0    9.3        6.4   12.8
198         199  283.6   42.0       66.2   25.5
199         200  232.1    8.6        8.7   13.4

[200 rows x 5 columns]


In [12]:
## Split data into train and test set

from sklearn.model_selection import train_test_split 
Xs = data.drop(['sales', 'Unnamed: 0'], axis=1)
y = data['sales'].values.reshape(-1,1)
Xs_train, Xs_test, y_train, y_test=train_test_split(Xs,y,test_size=.2,random_state=3)

### 1. Sales ~ TV

In [247]:
## Regressing Sales on Tv advertisement

tv_train = Xs_train['TV'].values.reshape(-1,1)
tv_test = Xs_test['TV'].values.reshape(-1,1)
y = y_train
reg1 = LinearRegression()
reg1.fit(tv_train, y_train)
print("The linear model is: Y = {:.5} + {:.5}*TV ".format(reg.intercept_[0], reg.coef_[0][0]))

The linear model is: Y = 3.0545 + 0.047353*TV 


### 2. Sales ~ Radio

In [248]:
## Regressing Sales on Radio advertisement

radio_train = Xs_train['radio'].values.reshape(-1,1)
radio_test = Xs_test['radio'].values.reshape(-1,1)
reg2 = LinearRegression()
reg2.fit(radio_train, y_train)
print("The linear model is: Y = {:.5} + {:.5}*Radio ".format(reg2.intercept_[0], reg2.coef_[0][0]))

The linear model is: Y = 9.3527 + 0.2014*Radio 


### 3. Sales ~ Newspaper

In [249]:
## Regressing Sales on Newspaper advertisement

npp_train = Xs_train['newspaper'].values.reshape(-1,1)
npp_test = Xs_test['newspaper'].values.reshape(-1,1)
reg3 = LinearRegression()
reg3.fit(npp_train, y_train)
print("The linear model is: Y = {:.5} + {:.5}*newspaper".format(reg3.intercept_[0], reg3.coef_[0][0]))

The linear model is: Y = 12.47 + 0.050066*newspaper


### 4. R2_scores 

In [250]:
## Test and train data R^2 from Sales~TV

from sklearn.metrics import r2_score
t1=r2_score(y_true=y_train,y_pred=reg1.predict(tv_train))
t2=r2_score(y_true=y_test,y_pred=reg1.predict(tv_test))
print("For Sales~TV, train data R2_scores =",round(t1,4),";","test data R2_scores =",round(t2,4))

For Sales~TV, train data R2_scores = 0.6265 ; test data R2_scores = 0.5444


In [251]:
## Test and train data R^2 from Sales~radio

t1=r2_score(y_true=y_train,y_pred=reg2.predict(radio_train))
t2=r2_score(y_true=y_test,y_pred=reg2.predict(radio_test))
print("For Sales~radio, train data R2_scores =",round(t1,4),";","test data R2_scores =",round(t2,4))

For Sales~radio, train data R2_scores = 0.3143 ; test data R2_scores = 0.4101


In [252]:
## Test and train data R^2 from Sales~newspaper

t1=r2_score(y_true=y_train,y_pred=reg3.predict(npp_train))
t2=r2_score(y_true=y_test,y_pred=reg3.predict(npp_test))
print("For Sales~newspaper, train data R2_scores =",round(t1,4),";","test data R2_scores =",round(t2,4))

For Sales~newspaper, train data R2_scores = 0.0431 ; test data R2_scores = 0.0892


From the training data, the amount of variability in Sales that is explained by TV advertisement, radio advertisement and newspaper advertisement are about 62.65%, 31.43% and 4.31% respectively. Thus, TV advertisement has the higest impact on Sales, followed by radio advertisement, whereas newspaper advertisement has the least impact on sales. A similar pattern can be seen from the test data where TV advertisement explains the highest amount of the variability in sales, with test R2 score of about 54.44% followed by radio advertisement with test R2 score of about 41.01% and then the least, newspaper advertisement with test R2 score of about 8.92%.

### 1. Multiple Linear Regression

In [253]:
## Regressing Sales on all predictors

regM = LinearRegression()
regM.fit(Xs_train, y_train)
print("The linear model is: Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper".format(regM.intercept_[0], regM.coef_[0][0], regM.coef_[0][1], regM.coef_[0][2]))

The linear model is: Y = 2.911 + 0.045529*TV + 0.18512*radio + 0.0011988*newspaper


### 2. Ridge Regression

In [254]:
## Ridge Regression of Sales on all predictors 

alphas = 10**np.linspace(10,-2,100)*0.5
ridge = Ridge(normalize = True)
r2_values = []

for a in alphas:
    ridge.set_params(alpha = a)
    ridge.fit(Xs_train, y_train)
    r2_values.append(r2_score(y_true=y_test,y_pred=ridge.predict(Xs_test)))

ridge.set_params(alpha = alphas[np.argmax(r2_score)])
ridge.fit(Xs_train, y_train)
ridge.coef_
print("The Ridge linear model is: Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper".format(ridge.intercept_[0], ridge.coef_[0][0], ridge.coef_[0][1], ridge.coef_[0][2]))



The Ridge linear model is: Y = 13.997 + 9.4585e-12*TV + 4.0281e-11*radio + 1.0013e-11*newspaper


### 3. Lasso Regression

In [255]:
## Lasso Regression of Sales on all predictors 

from sklearn.linear_model import Lasso
lasso = Lasso(max_iter = 10000, normalize = True)
r2_values_lasso = []

for a in alphas:
    lasso.set_params(alpha = a)
    lasso.fit(Xs_train, y_train)
    r2_values_lasso.append(r2_score(y_true=y_test,y_pred=lasso.predict(Xs_test)))
    
lasso.set_params(alpha = alphas[np.argmax(r2_values_lasso)])
lasso.fit(Xs_train, y_train)
lasso.coef_
print("The lasso linear model is: Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper".format(lasso.intercept_[0], lasso.coef_[0], lasso.coef_[1], lasso.coef_[2]))


The lasso linear model is: Y = 3.1283 + 0.044848*TV + 0.18168*radio + 0.0*newspaper


### 4. R2_scores

In [256]:
t=r2_score(y_true=y_test,y_pred=regM.predict(Xs_test))
print("The Multiple regression test data R2_score =",round(t,4))

The Multiple regression test data R2_score = 0.9138


In [257]:
print("The Ridge regression test data R2_score =",round(np.max(r2_values),4) )

The Ridge regression test data R2_score = 0.9134


In [258]:
print("The Lasso regression test data R2_score =",round(np.max(r2_values_lasso),4) )

The Lasso regression test data R2_score = 0.9127


The test data R2 scores from the multiple regression model and the two regularized models are approximately the same, about 91%. With respect to the coefficients, it can be seen that the coeffecient estimates in the ridge regression are relatively very small and the lasso regression makes the coefficient estimate of newspaper advertisement zero, indicating that newspaper advertisement is relatively less important as was seen in problem 2. One can consider the lasso regression model as a better model since it is more parsimonious even though its not so different from the other models.

Based on the results from problems 2 and 3 we can clearly see that, newspaper advertisement has the least importance on sales out of the 3 features. For TV and radio advertisement, the simple linear regressions in problem 2 suggest that TV advertisement is more important, however, the regularized models in probelm 3 seems to suggest that radio advertisement is more important since the coefficient of TV advertisement seems to be approaching zero faster.