# Basic Linear Regression in Python
Code adapted from [Towards Data Science](https://towardsdatascience.com/introduction-to-linear-regression-in-python-c12a072bedf0)

In [None]:
# Import the modules we want to use
import pandas
import numpy
import seaborn
from matplotlib import pyplot as plt

## Importing data

Data can be easily imported into dataframes from spreadsheets or databases.

In [None]:
# Import and display first five rows of advertising dataset
advert = pandas.read_csv('Advertising.csv')
advert = advert.rename(columns={'Unnamed: 0':'transaction'})

advert.head()
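
The same approach works for databases. As a minimal sketch, assuming a hypothetical in-memory SQLite table named `advertising` filled with made-up rows (neither the table nor the values come from the original dataset), `pandas.read_sql_query` loads a query result directly into a dataframe:

```python
import sqlite3
import pandas

# Hypothetical in-memory database standing in for a real data source
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE advertising (TV REAL, radio REAL, newspaper REAL, sales REAL)')
conn.executemany(
    'INSERT INTO advertising VALUES (?, ?, ?, ?)',
    [(100.0, 20.0, 30.0, 12.5), (50.0, 10.0, 5.0, 8.0)],  # made-up rows
)
conn.commit()

# Load the query result straight into a dataframe
from_db = pandas.read_sql_query('SELECT * FROM advertising', conn)
conn.close()

print(from_db.head())
```

For spreadsheets, `pandas.read_excel` plays the same role, although it needs an extra engine package such as `openpyxl` installed.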

Data can be reshaped, for example from wide format (one column per channel) to long format (one row per transaction and channel):

In [None]:
pivoted = advert.melt(id_vars=['transaction', 'sales']) # Make a table with one row per transaction and channel
pivoted = pivoted.rename(columns={'value':'spend', 'variable':'channel'}) # Rename columns to be more friendly

pivoted.head()
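
To see what `melt` does, it may help to run it on a tiny table. The sketch below uses a made-up two-row dataframe (the values are illustrative, not taken from `Advertising.csv`) and applies the same two steps as the cell above:

```python
import pandas

# Made-up wide table: one row per transaction, one column per channel
wide = pandas.DataFrame({
    'transaction': [1, 2],
    'TV': [100.0, 50.0],
    'radio': [20.0, 10.0],
    'sales': [12.5, 8.0],
})

# melt keeps the id_vars columns and stacks the rest into variable/value pairs
long = wide.melt(id_vars=['transaction', 'sales'])
long = long.rename(columns={'variable': 'channel', 'value': 'spend'})

print(long)
```

Each transaction now appears once per channel, which is the shape seaborn expects when colouring points by `channel`.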

## Taking a look at the data

We can quickly visualise the relationships in the data.

In [None]:
plt.figure(figsize=(12, 6))
ax = seaborn.scatterplot(data=pivoted, x='sales', y='spend', hue='channel')

## Modeling the data

Libraries such as scikit-learn offer a huge range of options for modelling data.

In [None]:
from sklearn.linear_model import LinearRegression

# Build linear regression model using TV and Radio as predictors
# Split data into predictors X and output y
predictors = ['TV', 'radio']
X = advert[predictors]
y = advert['sales']

# Initialise and fit model
lm = LinearRegression()
model = lm.fit(X, y)

In [None]:
print(f'alpha = {model.intercept_}')
print(f'betas = {model.coef_}')
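
The intercept and coefficients fully describe the fitted model: a prediction is the intercept plus the sum of each coefficient multiplied by its predictor. As a sketch under stated assumptions, the example below fits a model to made-up, noise-free data (the names `toy` and `toy_model` and all values are hypothetical) and checks that computing a prediction by hand gives the same answer as `predict`:

```python
import numpy
import pandas
from sklearn.linear_model import LinearRegression

# Made-up, noise-free data generated from sales = 3 + 0.05*TV + 0.2*radio
toy = pandas.DataFrame({
    'TV': [10.0, 50.0, 100.0, 200.0, 300.0],
    'radio': [5.0, 40.0, 10.0, 30.0, 20.0],
})
toy['sales'] = 3 + 0.05 * toy['TV'] + 0.2 * toy['radio']

toy_model = LinearRegression().fit(toy[['TV', 'radio']], toy['sales'])

# A prediction is the intercept plus the coefficient-weighted sum of the inputs
new_point = pandas.DataFrame([[300.0, 200.0]], columns=['TV', 'radio'])
by_hand = toy_model.intercept_ + numpy.dot(new_point.to_numpy()[0], toy_model.coef_)
by_model = toy_model.predict(new_point)[0]

print(by_hand, by_model)
```

Because the toy data contains no noise, the fitted intercept and coefficients should recover the values used to generate it, up to floating-point error.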

In [None]:
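# Predict sales for a combination of TV and radio advertising spend
# Use a dataframe with the same column names the model was fitted on,
# which avoids scikit-learn's feature-name warning
new_X = pandas.DataFrame([[300, 200]], columns=['TV', 'radio'])
print(model.predict(new_X))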

## Model Validation

scikit-learn has many functions to simplify the validation of models, such as generating training/test splits.

In [None]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(advert, test_size=0.2)
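
By default `train_test_split` shuffles the rows randomly, so each run produces a different split and slightly different results. As a sketch, assuming a made-up ten-row dataframe in place of the advertising data, passing `random_state` (the value 42 is arbitrary) makes the split repeatable:

```python
import pandas
from sklearn.model_selection import train_test_split

# Made-up frame with ten rows, standing in for the advertising data
frame = pandas.DataFrame({'x': range(10), 'y': range(10, 20)})

# random_state fixes the shuffle, so repeated calls give identical splits
train_a, test_a = train_test_split(frame, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(frame, test_size=0.2, random_state=42)

print(len(train_a), len(test_a))
print(test_a.index.equals(test_b.index))
```

Fixing the seed is useful when comparing models, since any difference in score then comes from the model rather than from the split.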

In [None]:
# Build linear regression model on the training data using all predictors
predictors = ['TV', 'radio', 'newspaper']
X = train[predictors]
y = train['sales']

# Initialise and fit model
lm = LinearRegression()
model = lm.fit(X, y)

In [None]:
# Use the model to predict sales for our test data
predictions = model.predict(test[predictors])

In [None]:
# Plot predicted sales vs the true value
plt.figure(figsize=(12, 6))
plt.scatter(test['sales'], predictions)
plt.xlabel("True Values")
plt.ylabel("Predictions")

In [None]:
# scikit-learn has built-in functions to generate quality metrics such as R^2
print(f'Model R^2: {model.score(test[predictors], test["sales"]):.2f}')
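
For a regression model, `score` returns R^2, the share of variance in the target that the model explains. The sketch below uses made-up true values and predictions (not results from the advertising model) to show how R^2 can be computed by hand and compared with `r2_score`, alongside the root mean squared error as a second metric:

```python
import numpy
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions, for illustration only
y_true = numpy.array([10.0, 12.0, 15.0, 20.0])
y_pred = numpy.array([11.0, 11.5, 14.0, 21.0])

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = numpy.sum((y_true - y_pred) ** 2)
ss_tot = numpy.sum((y_true - y_true.mean()) ** 2)
r2_by_hand = 1 - ss_res / ss_tot

# Root mean squared error is in the same units as the target
rmse = numpy.sqrt(mean_squared_error(y_true, y_pred))

print(r2_by_hand, r2_score(y_true, y_pred), rmse)
```

Reporting an error metric next to R^2 can be helpful, because R^2 alone does not say how large a typical prediction error is in sales units.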