# LINEAR REGRESSION

## Problem Statement
    
The dataset records __sales__ (in thousands of units) of a particular product as a __function__ of __advertising budgets__ (in thousands of dollars) for _TV, radio, and newspaper media_. Suppose that, in our role as __Data Scientist__, we are asked to address the following:

- Find a function that, given input budgets for TV, radio and newspaper, __predicts the output sales__.

- Which media __contribute__ to sales?

- Visualize the __relationship__ between the _features_ and the _response_ using scatter plots.

## Data Loading and Description

The advertising dataset captures the sales revenue generated with respect to advertisement spend across multiple channels: TV, radio and newspaper.
- TV        - Spend on TV Advertisements
- Radio     - Spend on radio Advertisements
- Newspaper - Spend on newspaper Advertisements
- Sales     - Sales revenue generated

#### Importing Packages

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')


from sklearn import metrics

import numpy as np

# allow plots to appear directly in the notebook
%matplotlib inline

#### Importing the Dataset

In [None]:
data = pd.read_csv('../input/advertising.csv/Advertising.csv',index_col=0)
data.head(10)

In [None]:
data['TV'].head(5)

In [None]:
data['radio'].head(5)

In [None]:
data['TV'].values[0]

In [None]:
data.info()

In [None]:
data['TV'].min(), data['TV'].max()

In [None]:
data.describe()

What are the **features**?
- TV: advertising dollars spent on TV for a single product in a given market (in thousands of dollars)
- Radio: advertising dollars spent on Radio
- Newspaper: advertising dollars spent on Newspaper

What is the **response**?
- Sales: sales of a single product in a given market (in thousands of units)

## EDA

### Univariate plots

In [None]:
data['TV'].plot(kind='hist');

In [None]:
data['radio'].plot(kind='hist');

In [None]:
data['newspaper'].plot(kind='hist');

In [None]:
data['sales'].plot(kind='hist');

In [None]:
data['newspaper'].plot(kind='box');

In [None]:
data['TV'].plot(kind='box');

In [None]:
data['radio'].plot(kind='box');

### Multivariate plots

In [None]:
data.head()

In [None]:
sns.scatterplot(x="TV", y="sales", data=data);

In [None]:
sns.scatterplot(x="radio", y="sales", data=data);

In [None]:
sns.scatterplot(x="newspaper", y="sales", data=data);

## Train-Test Split

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
data.head()

In [None]:
X_cols = ['TV','radio','newspaper']

X = data[X_cols]


In [None]:
X

In [None]:
y = data['sales']
y.head(10)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)

In [None]:
X_train.shape

In [None]:
y_train.shape

In [None]:
X_test.shape

In [None]:
y_test.shape

## Train

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
model = LinearRegression()

In [None]:
model.fit(X_train,y_train)

```
sales = a*TV + b*radio + c*newspaper + d
```
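
To make the equation above concrete, here is a minimal sketch of how the fitted `LinearRegression` attributes map onto `a`, `b`, `c` and `d`. It assumes a small synthetic dataset (`X_demo`, `y_demo`, with made-up coefficients) rather than the advertising data, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the advertising data (NOT the real dataset).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.uniform(0, 100, size=(50, 3)),
    columns=["TV", "radio", "newspaper"],
)

# Assumed "true" coefficients, chosen only for illustration (no noise added).
y_demo = 0.05 * X_demo["TV"] + 0.2 * X_demo["radio"] + 0.0 * X_demo["newspaper"] + 3.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# coef_ holds one slope per feature, in column order; intercept_ is the constant d.
a, b, c = demo_model.coef_
d = demo_model.intercept_

# Rebuilding the prediction by hand should match model.predict().
manual = a * X_demo["TV"] + b * X_demo["radio"] + c * X_demo["newspaper"] + d
matches_predict = np.allclose(manual, demo_model.predict(X_demo))

# Pair each coefficient with its feature name for easier reading.
coef_by_feature = dict(zip(X_demo.columns, demo_model.coef_))
```

On the real data, the same pairing of `X_cols` with `model.coef_` gives a first indication of which media contribute to sales, although coefficients alone do not establish a causal effect.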

In [None]:
X_train.head()

In [None]:
y_train.head()

In [None]:
model.predict(X_train)[:5]

## Evaluate on unseen data

In [None]:
y_pred = model.predict(X_test)

In [None]:
X_test.shape

In [None]:
y_pred

In [None]:
y_test

In [None]:
fig,ax = plt.subplots(figsize=(10,8))

# pass x and y as keywords; recent seaborn versions reject positional data arguments
sns.scatterplot(x=y_test, y=y_pred, ax=ax);

ax.set_xlabel("Actual Sales");

ax.set_ylabel("Predicted Sales");

In [None]:
from sklearn.metrics import mean_absolute_error

In [None]:
error = mean_absolute_error(y_true = y_test, y_pred = y_pred)

In [None]:
error
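
As a small worked example of what `mean_absolute_error` computes, the sketch below compares a manual calculation with the scikit-learn result. The arrays `y_true_demo` and `y_pred_demo` are made-up values, not model output from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up values for illustration only.
y_true_demo = np.array([10.0, 15.0, 20.0])
y_pred_demo = np.array([12.0, 14.0, 17.0])

# MAE is the average of the absolute differences: (2 + 1 + 3) / 3.
manual_mae = np.mean(np.abs(y_true_demo - y_pred_demo))
sklearn_mae = mean_absolute_error(y_true=y_true_demo, y_pred=y_pred_demo)
```

Because MAE is in the same units as the response, an error of 2.0 here would mean predictions are off by 2 (thousand units) on average.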

### Benchmark

In [None]:
mean_sales = y_train.mean()
mean_sales

In [None]:
y_test.shape

In [None]:
len(y_test)

In [None]:
mean_prediction = [mean_sales]* len(y_test)
mean_prediction

In [None]:
error_benchmark = mean_absolute_error(y_true = y_test, y_pred = mean_prediction)

error_benchmark
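
The mean-prediction benchmark can also be expressed with scikit-learn's `DummyRegressor`, which may be convenient when the baseline should share the same `fit`/`predict` interface as the model. This is a sketch on made-up numbers (`y_train_demo`, `y_test_demo`), not the notebook's data.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

# Made-up targets for illustration only.
y_train_demo = np.array([10.0, 12.0, 14.0, 20.0])  # mean is 14.0
y_test_demo = np.array([11.0, 17.0])

# DummyRegressor ignores the features, so placeholder columns are enough.
X_train_demo = np.zeros((len(y_train_demo), 1))
X_test_demo = np.zeros((len(y_test_demo), 1))

# strategy="mean" always predicts the mean of the training targets.
baseline = DummyRegressor(strategy="mean").fit(X_train_demo, y_train_demo)
baseline_mae = mean_absolute_error(y_test_demo, baseline.predict(X_test_demo))

# Same benchmark built by hand, as in the cells above.
manual_prediction = [y_train_demo.mean()] * len(y_test_demo)
manual_baseline_mae = mean_absolute_error(y_test_demo, manual_prediction)
```

A trained model is only useful if its error on the test set is clearly lower than this kind of baseline error.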

## Save and load model

In [None]:
import joblib

In [None]:
model_filename = "lr_model.joblib"

In [None]:
joblib.dump(model,model_filename)

In [None]:
ls

In [None]:
model_new = joblib.load(model_filename)

In [None]:
model_new

## Predict

In [None]:
model_new.predict(X_test)
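
One way to check that saving and loading preserved the model is to compare predictions before and after the round trip. The sketch below assumes a toy model and a temporary file (`demo_model.joblib` is a hypothetical name), so it does not touch `lr_model.joblib`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data for illustration only (follows y = 2x + 1).
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([3.0, 5.0, 7.0, 9.0])

original = LinearRegression().fit(X_demo, y_demo)

# Save to a temporary directory and load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")  # hypothetical filename
    joblib.dump(original, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions and coefficients.
same_predictions = np.allclose(original.predict(X_demo), restored.predict(X_demo))
same_coefficients = np.allclose(original.coef_, restored.coef_)
```

The same comparison could be applied here with `model`, `model_new` and `X_test`.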

In [None]:
model.coef_