#  Linear Regression

Linear regression is a statistical technique for modeling the linear relationship between a dependent variable and one or more independent variables.

This technique applies to supervised learning regression problems, where the goal is to predict a continuous variable.

## Modeling the linear relationship between TV ad spending and product sales

In this example, we will build a Simple Linear Regression model to study the linear relationship between the TV ad spending and product sales for a breakfast cereal.

<image src="tv-ad.gif"/>

## Simple Linear Regression

Simple Linear Regression is one of the simplest models in machine learning. It models the linear relationship between a single independent variable and a dependent variable.

In this example, there is one independent or input variable that represents the TV ad spending data and is denoted by X. Similarly, there is one dependent or output variable that represents the product sales data and is denoted by y. 

We want to build a linear relationship between these variables. This linear relationship can be modeled by a mathematical equation of the form:

$$ y = mx + b $$

**where:** 

$ x $ is the independent variable

$ y $ is the dependent variable

$ b $ is the intercept

$ m $ is the slope
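To make the roles of the slope and intercept concrete, here is a minimal sketch of the ordinary least squares estimates for a single feature. It assumes hypothetical toy data that lies exactly on the line y = 2x + 1, not the advertisement dataset used in this notebook.

```python
import numpy as np

# Hypothetical toy data (not the advertisement dataset): points on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Least squares estimates for one feature:
#   m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   b = mean(y) - m * mean(x)
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

print("m =", m)
print("b =", b)
```

Because the toy points have no noise, the estimates recover the slope 2 and intercept 1 used to generate them.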

### Step 1: Import the necessary packages

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

### Step 2: Analyze the dataset

Import the dataset into the dataframe with the standard read_csv() function of the pandas library and assign it to the df variable. Then, conduct exploratory data analysis to get a feel for the data.

In [None]:
df = pd.read_csv("advertisement.csv")
df.head()

In [None]:
df.info()

In [None]:
df.describe()

### Step 3: Visualize the data

In [None]:
df.plot(x='TV',y='Sales',kind='scatter')

### Step 4: Split the dataset into a training set and a testing set

In [None]:
X = df['TV'].values
y = df['Sales'].values

In [None]:
# Reshape X from a one dimensional array into a two dimensional array
X = X.reshape(-1,1)

# Reshape y from a one dimensional array into a two dimensional array
y = y.reshape(-1,1)

In [None]:
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [None]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
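The reshape and split steps can be tried in isolation. The sketch below assumes a small synthetic array as a stand-in for the `TV` and `Sales` columns. It shows that `reshape(-1, 1)` turns a one dimensional array into a single-column matrix, and that reusing the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the TV and Sales columns
X = np.arange(10, dtype=float)   # shape (10,)
y = 3.0 * X + 2.0

# scikit-learn estimators expect features shaped (n_samples, n_features)
X = X.reshape(-1, 1)             # shape (10, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Splitting again with the same random_state yields the same partition
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)

print(X.shape)
print(X_train.shape, X_test.shape)
print(np.array_equal(X_test, X_test_again))
```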

### Step 5: Create the Model

In [None]:
# Create an instance of the LinearRegression class
lr_model = LinearRegression()

# Train the model using training data sets
lr_model.fit(X_train,y_train)

# Predict on the test data
y_pred = lr_model.predict(X_test)

### Step 6: Get the slope and intercept

In [None]:
# Slope
m = lr_model.coef_[0]
m = m[0]

# Intercept
b = lr_model.intercept_[0]

print("b = ", b)
print("m = ", m)
print("y = ", m, "* x + ", b)
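As a sanity check, the fitted slope and intercept should reproduce the model's own predictions. The sketch below assumes hypothetical toy data and a one dimensional `y`. In that case `coef_` has shape `(1,)` and `intercept_` is a scalar, whereas the two dimensional `y` used in this notebook gives `coef_` of shape `(1, 1)` and `intercept_` of shape `(1,)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (not the advertisement dataset)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])   # 1-D target

model = LinearRegression().fit(X, y)
m = model.coef_[0]       # coef_ has shape (1,) because y is 1-D
b = model.intercept_     # scalar because y is 1-D

# Apply y = m * x + b by hand and compare with predict()
x_new = np.array([[5.0], [6.0]])
manual = m * x_new.ravel() + b
print(np.allclose(manual, model.predict(x_new)))
```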

### Step 7: Draw the Regression Line

In [None]:
plt.scatter(X, y, color = 'blue', label='Data Points')
plt.plot(X_test, y_pred, color = 'red', linewidth=3, label = 'Regression Line')
plt.title('Relationship between TV Ads and Product Sales')
plt.xlabel('TV Ads')
plt.ylabel('Sales')
plt.legend(loc=4)
plt.show()

### Step 8: Loss Function

- Compute the sum of squared errors (SSE), also called the residual sum of squares

In [None]:
# Reshape y_test from a two dimensional array back to a one dimensional array
y_test = y_test.reshape(-1)

# Reshape y_pred from a two dimensional array back to a one dimensional array
y_pred = y_pred.reshape(-1)

df1 = pd.DataFrame({'Actual_Sales': y_test, 'Predicted_Sales':y_pred})

In [None]:
df1.head()

In [None]:
# Sum the squared differences between actual and predicted sales
sum_of_squared_errors = np.sum(np.square(df1['Actual_Sales'] - df1['Predicted_Sales']))

print('Residual sum of squares:', sum_of_squared_errors)
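The sum of squared errors grows with the number of test samples, so it is often reported alongside the mean squared error (MSE) and its square root (RMSE). The sketch below assumes small hypothetical arrays of actual and predicted values. It computes these quantities by hand and compares the MSE with scikit-learn's `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

residuals = y_true - y_hat
sse = np.sum(residuals ** 2)    # sum of squared errors
mse = sse / len(y_true)         # mean squared error
rmse = np.sqrt(mse)             # root mean squared error, in the units of y

print("SSE :", sse)
print("MSE :", mse)
print("RMSE:", rmse)
print(np.isclose(mse, mean_squared_error(y_true, y_hat)))
```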

### Compute $R^2$

In [None]:
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
print(r2)
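For intuition, $R^2$ can also be computed directly as one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below assumes small hypothetical arrays rather than the notebook's test set and checks the manual value against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

ss_res = np.sum((y_true - y_hat) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual)
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))
```

A value close to 1 means the line explains most of the variance in the target, while a value near 0 means it does little better than always predicting the mean.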