# Exercises: Linear Regression

Before getting started with coding, read through this [article](https://towardsdatascience.com/simple-linear-regression-model-using-python-machine-learning-eab7924d18b4) and follow the instructions within the article to download the dataset.

Use this notebook to code along with the article.

## Getting Started

Import the libraries you need to start working with the dataset and make a dataframe out of the CSV in the dataset.

In [None]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Read the given CSV file, and view some sample records
advertising = pd.read_csv("Company_data.csv")
advertising

Check the `shape` attribute, then run `info()` and `describe()` to see what is going on with the dataset. Note that `shape` is an attribute, not a method, so it takes no parentheses.

In [None]:
advertising.shape

In [None]:
advertising.info()

In [None]:
advertising.describe()
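
To make the difference between the three inspection calls concrete, here is a minimal sketch on a small, made-up stand-in for the dataset. The `demo` dataframe and its values are assumptions for illustration only, not the real advertising data.

```python
import pandas as pd

# Hypothetical stand-in for the advertising data; the values are made up.
demo = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5],
    "Radio": [37.8, 39.3, 45.9, 41.3],
    "Newspaper": [69.2, 45.1, 69.3, 58.5],
    "Sales": [22.1, 10.4, 9.3, 18.5],
})

n_rows, n_cols = demo.shape   # attribute: a (rows, columns) tuple, no parentheses
demo.info()                   # method: prints dtypes and non-null counts
stats = demo.describe()       # method: returns summary statistics as a dataframe

print(n_rows, n_cols)
print(stats.loc["mean", "TV"])
```

Because `describe()` returns a dataframe, individual statistics can be pulled out with `.loc`, as in the last line.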

## Visualizing Data

Make the same pairplots as the author, importing any libraries you still need. A pairplot draws one scatter plot per pair of variables, so you can see at a glance which x-variables (`TV`, `Radio`, `Newspaper`) appear related to the y-variable (`Sales`).

In [None]:
# Pairplot
# Recent seaborn versions use `height` in place of the older `size` argument
sns.pairplot(advertising, x_vars=['TV', 'Radio', 'Newspaper'],
             y_vars='Sales', height=4, aspect=1, kind='scatter')
plt.show()

Try out the heatmap next!

In [None]:
# Heatmap
sns.heatmap(advertising.corr(), cmap="YlGnBu", annot=True)
plt.show()
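
The heatmap is a picture of the matrix returned by `corr()`, so it can help to look at that matrix directly. The sketch below uses synthetic data with a made-up linear relationship; the column names and coefficients are assumptions, not values from the article's dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, size=200)

# Synthetic data: Sales depends linearly on TV, Noise is unrelated to both.
demo = pd.DataFrame({
    "TV": tv,
    "Sales": 7.0 + 0.05 * tv + rng.normal(0, 2, size=200),
    "Noise": rng.normal(0, 1, size=200),
})

corr = demo.corr()   # pairwise Pearson correlations
print(corr.round(2))
```

Values near 1 or -1 mark a strong linear relationship, values near 0 a weak one, and the diagonal is always 1 because each column is perfectly correlated with itself.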

## Time for Linear Regression

Follow the four steps in the article to perform linear regression.

In [None]:
# Step 1 is to assign your x and y
X = advertising['TV']
y = advertising['Sales']

In [None]:
# Step 2 is to create your train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=100)
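
As a rough illustration of what the split does, the sketch below runs `train_test_split` twice on 200 made-up rows with the same arguments as the cell above. The arrays `X_demo` and `y_demo` are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(200)   # 200 made-up feature values
y_demo = 2 * X_demo       # made-up target, so each pair is easy to check

split_a = train_test_split(X_demo, y_demo, train_size=0.7, test_size=0.3, random_state=100)
split_b = train_test_split(X_demo, y_demo, train_size=0.7, test_size=0.3, random_state=100)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))
# Same random_state, same split
print(all(np.array_equal(a, b) for a, b in zip(split_a, split_b)))
```

Fixing `random_state` is what makes your split reproducible across runs, and the rows of `X` and `y` stay paired after shuffling.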

In [None]:
# Step 3 is to build the model. Remember to import any libraries you may need
import statsmodels.api as sm
X_train_sm = sm.add_constant(X_train)
lr = sm.OLS(y_train, X_train_sm).fit()
lr.params

In [None]:
# Step 4 is to perform residual analysis
lr.summary()

In [None]:
# Visualizing the regression line
plt.scatter(X_train, y_train)
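# Use the fitted intercept and slope (about 6.948 and 0.054 in the article)
plt.plot(X_train, lr.params['const'] + lr.params['TV'] * X_train, 'r')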
plt.show()

## Evaluate Your Model

Use your model to make some predictions on the test data.

In [None]:
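X_test_sm = sm.add_constant(X_test)   # add the intercept column to the test set
y_test_pred = lr.predict(X_test_sm)   # predict Sales for the test set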

In [None]:
from sklearn.metrics import r2_score
r_squared = r2_score(y_test, y_test_pred)
r_squared
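
R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes that by hand on four made-up values and compares the result with `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

Note the argument order in `r2_score`: actual values first, predictions second. Swapping them generally gives a different number.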

In [None]:
plt.scatter(X_test, y_test)
plt.plot(X_test, y_test_pred, 'r')
plt.show()

In [None]:
from sklearn.model_selection import train_test_split
X_train_lm, X_test_lm, y_train_lm, y_test_lm = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100)
X_train_lm.shape

In [None]:
X_train_lm = X_train_lm.values.reshape(-1,1)
X_test_lm = X_test_lm.values.reshape(-1,1)
print(X_train_lm)
print(X_test_lm)
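
The reshape is needed because scikit-learn estimators expect the features as a 2-D array of shape `(n_samples, n_features)`, while a single dataframe column is 1-D. A minimal sketch with made-up numbers:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0])   # 1-D: shape (3,)
x_2d = x.reshape(-1, 1)            # 2-D: 3 rows, 1 column; -1 lets NumPy infer the row count

print(x.shape, x_2d.shape)
```

The values are unchanged; only the layout differs, with one row per sample and one column per feature.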

In [None]:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train_lm, y_train_lm)

In [None]:
print(f"Intercept: {lm.intercept_}")
print(f"Slope: {lm.coef_}")
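
As a sanity check on what `LinearRegression` reports, the slope and intercept of a one-variable fit can also be computed in closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below compares the two on synthetic data with made-up coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.uniform(0, 300, size=100)
y = 7.0 + 0.05 * x + rng.normal(0, 2, size=100)   # made-up linear relationship

lm_demo = LinearRegression().fit(x.reshape(-1, 1), y)

# Closed-form simple linear regression
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

print(lm_demo.coef_[0], slope)
print(lm_demo.intercept_, intercept)
```

Note that `coef_` is an array with one entry per feature, which is why the slope in the cell above prints inside brackets.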

In [None]:
y_train_pred = lm.predict(X_train_lm)
y_test_pred = lm.predict(X_test_lm)

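print(r2_score(y_train_lm, y_train_pred))   # R-squared on the training set
print(r2_score(y_test_lm, y_test_pred))     # R-squared on the test set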

## Summarize Your Results

Make note of your answers to each of the following questions by editing the cell.

1. Did you get the same coefficients and p-values as the author?
2. Did you get the same R-squared values as the author?
3. Did you get the same F-statistic value and significance as the author?