# Predicting sales from advertising spend across channels

The target feature (Sales) is a continuous numerical variable, so this is a regression problem. Linear regression is used.


Features:
- **TV:** advertising dollars spent on TV for a single product in a given market (in thousands of dollars)
- **Radio:** advertising dollars spent on Radio (in thousands of dollars)
- **Newspaper:** advertising dollars spent on Newspaper (in thousands of dollars)

Target:
- **Sales:** sales of a single product in a given market (in thousands of items)



In [1]:
# Importing libs
import pandas as pd
import seaborn as sns
import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn import metrics

# allow plots to appear within the notebook
%matplotlib inline

In [2]:
dataset = pd.read_csv('Advertising.csv', index_col=0)
print(dataset.shape)
dataset.head()

(200, 4)


|   | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 5 | 180.8 | 10.8 | 58.4 | 12.9 |


In [3]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 200 entries, 1 to 200
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   TV         200 non-null    float64
 1   Radio      200 non-null    float64
 2   Newspaper  200 non-null    float64
 3   Sales      200 non-null    float64
dtypes: float64(4)
memory usage: 7.8 KB


## Linear regression

### Form of linear regression

$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n$

- $y$ is the response
- $\beta_0$ is the intercept
- $\beta_1$ is the coefficient for $x_1$ (the first feature)
- $\beta_n$ is the coefficient for $x_n$ (the nth feature)

In this case:

$y = \beta_0 + \beta_1 \times TV + \beta_2 \times Radio + \beta_3 \times Newspaper$

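The $\beta$ values are called the **model coefficients**. These values are "learned" during the model fitting step using the "least squares" criterion, which picks the coefficients that minimize the sum of squared differences between the observed and predicted responses. The fitted model can then be used to make predictions.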
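To make the "least squares" criterion concrete, here is a minimal sketch that fits an intercept and two coefficients with NumPy's `np.linalg.lstsq`. The data, the true coefficients (3, 0.5, 2) and the names `X_demo`, `y_demo` and `A` are made up for illustration; this is not the advertising data, and it is not necessarily how `LinearRegression` solves the problem internally.

```python
import numpy as np

# Synthetic data (hypothetical): y = 3 + 0.5*x1 + 2*x2, with no noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(20, 2))
y_demo = 3 + 0.5 * X_demo[:, 0] + 2 * X_demo[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])

# Least squares: choose beta to minimize the sum of squared residuals
# ||A @ beta - y_demo||^2
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
print(beta)  # approximately [3.0, 0.5, 2.0]
```

Because the synthetic response has no noise, the recovered coefficients match the ones used to generate it; with real data the fit only minimizes the squared error rather than reproducing the data exactly.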

In [4]:
# create the feature matrix X and the target vector y
feature_cols = ['TV', 'Radio', 'Newspaper']
X = dataset[feature_cols]
y = dataset['Sales']
X.shape, y.shape

((200, 3), (200,))

## Splitting X and y into training and testing sets

In [5]:
# default split is 75% for training and 25% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((150, 3), (150,), (50, 3), (50,))

## Linear regression model

In [6]:
linreg = LinearRegression()
linreg.fit(X_train, y_train)

LinearRegression()

### Interpreting model coefficients

In [7]:
# print the intercept and coefficients
print(linreg.intercept_)
print(linreg.coef_)

2.8769666223179353
[0.04656457 0.17915812 0.00345046]


In [8]:
# pair the feature names with the coefficients
list(zip(feature_cols, linreg.coef_))

[('TV', 0.04656456787415026),
 ('Radio', 0.1791581224508884),
 ('Newspaper', 0.0034504647111804065)]

$$y = 2.88 + 0.0466 \times TV + 0.179 \times Radio + 0.00345 \times Newspaper$$


### Making predictions

In [9]:
# make predictions on the testing set
y_pred = linreg.predict(X_test)

In [10]:
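# Predict Sales for one observation (row 4 of the dataset):
# TV=151.5, Radio=41.3, Newspaper=58.5; actual Sales=18.5
new_point = pd.DataFrame([[151.5, 41.3, 58.5]], columns=feature_cols)
actual_sales = 18.5
linreg.predict(new_point)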



array([17.5325813])
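The prediction above can be reproduced by hand from the fitted equation. The sketch below plugs the printed intercept and coefficients into $y = \beta_0 + \beta_1 \times TV + \beta_2 \times Radio + \beta_3 \times Newspaper$; the variable names are chosen here for illustration.

```python
import numpy as np

# Intercept and coefficients copied from the fitted model's printed output
intercept = 2.8769666223179353
coefs = np.array([0.04656456787415026, 0.1791581224508884, 0.0034504647111804065])

# Same observation as in the cell above: TV, Radio, Newspaper
x = np.array([151.5, 41.3, 58.5])

# intercept + sum of (coefficient * feature value)
manual_prediction = intercept + coefs @ x
print(round(manual_prediction, 4))  # 17.5326
```

The result agrees with `linreg.predict`, which confirms that the model is nothing more than this weighted sum.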

We need an **evaluation metric** in order to compare our predictions with the actual values!

## Model evaluation metrics for regression

Evaluation metrics for classification problems, such as **accuracy**, are not useful for regression problems. Instead, we need evaluation metrics designed for comparing continuous values.

**Three common evaluation metrics** for regression problems are defined below:

**Mean Absolute Error** (MAE) is the mean of the absolute value of the errors:

$$\frac 1n\sum_{i=1}^n|y_i-\hat{y}_i|$$


**Mean Squared Error** (MSE) is the mean of the squared errors:

$$\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2$$

**Root Mean Squared Error** (RMSE) is the square root of the mean of the squared errors:

$$\sqrt{\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2}$$

Comparing these metrics:

- **MAE** is the easiest to understand, because it is the average absolute error.
- **MSE** "punishes" larger errors more heavily than MAE, because each error is squared before averaging.
- **RMSE** also penalizes larger errors and, unlike MSE, is expressed in the same units as $y$, which makes it easier to interpret.
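As a quick worked example, the three formulas can be computed directly with NumPy. The values in `y_true` and `y_hat` are illustrative only and are not taken from the advertising data.

```python
import numpy as np

# Illustrative values only (hypothetical actual and predicted responses)
y_true = np.array([100, 50, 30, 20])
y_hat = np.array([90, 50, 50, 30])

errors = y_true - y_hat            # [10, 0, -20, -10]

mae = np.mean(np.abs(errors))      # (10 + 0 + 20 + 10) / 4 = 10.0
mse = np.mean(errors ** 2)         # (100 + 0 + 400 + 100) / 4 = 150.0
rmse = np.sqrt(mse)                # sqrt(150), about 12.25

print(mae, mse, rmse)
```

Note how the single error of 20 contributes 400 of the 600 total squared error, which is why MSE and RMSE react more strongly to large misses than MAE does.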

In [11]:
# calculate MAE 
print(metrics.mean_absolute_error(y_test, y_pred))

1.0668917082595208


In [12]:
# calculate MSE 
print(metrics.mean_squared_error(y_test, y_pred))

1.9730456202283364


In [13]:
# calculate RMSE
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

1.4046514230328948


## Feature selection to improve the RMSE

Does **Newspaper** "belong" in our model? In other words, does it improve the quality of our predictions?

Let's **remove it** from the model and check the RMSE!

In [14]:
# remove Newspaper from the features
feature_cols = ['TV', 'Radio']
X = dataset[feature_cols]
y = dataset['Sales']
# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# fit the model to the training data (learn the coefficients)
linreg = LinearRegression()
linreg.fit(X_train, y_train)

# make predictions on the testing set
y_pred = linreg.predict(X_test)

# compute the RMSE of our predictions
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

1.3879034699382882


The RMSE **decreased** from about 1.40 to about 1.39 when we removed Newspaper from the model. (Error is something we want to minimize, so **a lower number for RMSE is better**.) The difference is small and comes from a single train/test split, but it suggests that Newspaper adds little for predicting Sales, so we drop it from the model.
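The comparison above repeats the same split, fit and score steps for each feature set. A small helper can make that repeatable; the sketch below is one possible way to write it. The function name `holdout_rmse` and the synthetic DataFrame `demo` are hypothetical stand-ins for `dataset`, so the printed numbers say nothing about the advertising data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def holdout_rmse(df, feature_cols, target="Sales", random_state=1):
    """Fit a linear regression on a train split and return the test-split RMSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols], df[target], random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    return np.sqrt(mean_squared_error(y_test, model.predict(X_test)))


# Synthetic stand-in for the advertising data (hypothetical values):
# Sales depends on TV and Radio, while Newspaper is unrelated to Sales.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.uniform(0, 100, size=(200, 3)), columns=["TV", "Radio", "Newspaper"]
)
demo["Sales"] = 3 + 0.05 * demo["TV"] + 0.2 * demo["Radio"] + rng.normal(0, 1, size=200)

# Score each candidate feature set on the same train/test split
results = {
    "TV, Radio, Newspaper": holdout_rmse(demo, ["TV", "Radio", "Newspaper"]),
    "TV, Radio": holdout_rmse(demo, ["TV", "Radio"]),
}
for name, rmse in results.items():
    print(name, rmse)
```

Using the same `random_state` for every call keeps the train/test split identical across feature sets, so any difference in RMSE comes from the features rather than from the split.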

In [15]:
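# Predict Sales for the same observation using only TV and Radio
# (TV=151.5, Radio=41.3; actual Sales=18.5)
new_point = pd.DataFrame([[151.5, 41.3]], columns=feature_cols)
actual_sales = 18.5
linreg.predict(new_point)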



array([17.47020909])