# Basic Linear Regression in Python
Code adapted from [Towards Data Science](https://towardsdatascience.com/introduction-to-linear-regression-in-python-c12a072bedf0)

This is a Jupyter notebook. It lets you run Python code and displays the results inline.

The code in a notebook is divided into cells. Each cell can be run independently by clicking in it and then pressing the 'Run' button in the menu bar above, or by pressing _Shift-Enter_.

In [None]:
# In this cell we import the modules we want to use later on.
import pandas     # To work with tables
import numpy      # For maths
import seaborn    # For advanced plots
import cufflinks  # For interactive plots
from matplotlib import pyplot as plt

cufflinks.go_offline()

## Importing data

Data can be easily imported from spreadsheets or databases.

The `pandas` module provides common functions for working with tabular data in Python.

In [None]:
# Import a dataset of advertising spends
advert = pandas.read_csv('Advertising.csv', index_col=0)

# Display the first five rows of the data
advert.head()
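To see what `index_col=0` does in the import above, here is a minimal sketch that reads CSV text from memory instead of a file. The column names mirror the ones used later in this notebook, but the values in `csv_text` are made up for illustration and are not the real dataset.

```python
import io
import pandas

# Made-up CSV text standing in for a file on disk (an assumption, not the
# real Advertising.csv); the first, unnamed column holds row numbers.
csv_text = """,TV,radio,newspaper,sales
1,100.0,20.0,30.0,15.0
2,50.0,10.0,5.0,8.0
"""

# index_col=0 uses the first column as the row index rather than as data.
demo = pandas.read_csv(io.StringIO(csv_text), index_col=0)

print(demo)
```

Without `index_col=0`, the row numbers would show up as an extra data column alongside the spends.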

Data can be reshaped, for example from one column per channel (wide) to one row per transaction and channel (long):

In [None]:
# Make a table with one row per transaction and ad channel
pivoted = pandas.melt(advert.reset_index(), id_vars=['index', 'sales'])

# Rename columns to be more friendly
pivoted = pivoted.rename(columns={'index':'transaction', 'value':'spend', 'variable':'channel'})

pivoted.head()
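The reshaping step can be easier to follow on a tiny table. The sketch below applies the same `reset_index()` and `melt()` calls to a hypothetical two-row frame (the values are invented for illustration), so you can see each channel column being stacked into `variable`/`value` pairs.

```python
import pandas

# Hypothetical miniature of the advertising table: one row per transaction,
# one column per channel (values are made up).
wide = pandas.DataFrame(
    {'TV': [100.0, 50.0], 'radio': [20.0, 10.0], 'sales': [15.0, 8.0]},
    index=[1, 2],
)

# reset_index() turns the unnamed index into a column called 'index';
# melt() keeps the id_vars and stacks every other column into rows.
long_form = pandas.melt(wide.reset_index(), id_vars=['index', 'sales'])

print(long_form)
```

Each transaction now appears once per channel, which is the layout that `hue='channel'` style plots expect.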

## Taking a look at the data

We can quickly visualise the relationships in the data.

The `seaborn` module provides many common plots.

See [here](https://seaborn.pydata.org/examples/index.html) for a gallery of `seaborn` plots.

In [None]:
plt.figure(figsize=(12, 6))
ax = seaborn.scatterplot(data=pivoted, x='sales', y='spend', hue='channel')

In [None]:
plt.figure(figsize=(12, 6))
ax = seaborn.violinplot(data=pivoted, y='spend', x='channel')

Sometimes an interactive view of the data is more useful.

The `cufflinks` package provides an interactive view of a `pandas` dataframe.

More examples can be found [here](https://plot.ly/ipython-notebooks/cufflinks/).

In [None]:
advert.iplot(kind='scatter', x='sales', mode='markers')

In [None]:
advert.iplot(kind='box')

## Modeling the data

Modules such as `scikit-learn` offer a huge range of options to model data.

Here we build a simple linear regression model to predict sales as a function of advertising spend.

In [None]:
from sklearn.linear_model import LinearRegression

# Build linear regression model using TV and Radio as predictors
# Split data into predictors X and output Y
predictors = ['TV', 'radio']
X = advert[predictors]
y = advert['sales']

# Initialise and fit model
lm = LinearRegression()
model_TV_Radio = lm.fit(X, y)

In [None]:
# We can examine the model coefficients
print(f'alpha = {model_TV_Radio.intercept_}')
print(f'betas = {model_TV_Radio.coef_}')

In [None]:
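# And predict sales for a combination of TV and radio advertising spend.
# A DataFrame with the same column names as the training data avoids
# scikit-learn's feature-name warning.
new_X = pandas.DataFrame([[300, 200]], columns=['TV', 'radio'])
print(model_TV_Radio.predict(new_X))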
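A prediction from a linear model is just the intercept plus each coefficient multiplied by its predictor. The sketch below checks this on synthetic, noise-free data (an assumption made so the true coefficients are known; it is not the advertising dataset), with `demo_model` and the other names being illustrative.

```python
import numpy
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the advertising data (not the real dataset):
# sales = 3 + 0.05 * TV + 0.2 * radio, with no noise.
rng = numpy.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(50, 2))
y_demo = 3 + 0.05 * X_demo[:, 0] + 0.2 * X_demo[:, 1]

demo_model = LinearRegression().fit(X_demo, y_demo)

# A prediction is the intercept plus the dot product of the
# predictor values with the fitted coefficients.
new_point = numpy.array([[300, 200]])
by_hand = demo_model.intercept_ + new_point @ demo_model.coef_
by_model = demo_model.predict(new_point)

print(by_hand, by_model)
```

The two printed values should agree up to floating-point rounding.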

## Model validation

`sklearn` has many functions to simplify the validation of models, such as generating training/test splits.

In [None]:
from sklearn.model_selection import train_test_split

# Divide the data 80-20 into a training and test set
train, test = train_test_split(advert, test_size=0.2)
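By default `train_test_split` shuffles the rows randomly, so each run of the cell gives a different split. If you want repeatable results, one option is to pass `random_state`. The sketch below uses a hypothetical 10-row table and an arbitrary seed of 42 to show this.

```python
import pandas
from sklearn.model_selection import train_test_split

# Hypothetical 10-row table standing in for the advertising data.
demo = pandas.DataFrame({'TV': range(10), 'sales': range(10, 20)})

# Passing random_state fixes the shuffle, so the same rows land in
# the same split each time the cell is run.
train_a, test_a = train_test_split(demo, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(demo, test_size=0.2, random_state=42)

print(len(train_a), len(test_a))
print(test_a.index.equals(test_b.index))
```

The training and test sets never share rows, which is what makes the test score an honest estimate of performance on unseen data.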

In [None]:
# Build linear regression model on the training data using all predictors
predictors = ['TV', 'radio', 'newspaper']
X = train[predictors]
y = train['sales']

# Initialise and fit model
lm = LinearRegression()
model_All = lm.fit(X, y)

In [None]:
# Use the model to predict sales for our test data.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning.
test = test.assign(predictions=model_All.predict(test[predictors]))

In [None]:
plt.figure(figsize=(12, 6))
ax = seaborn.regplot(y='sales', x='predictions', data=test)

`sklearn` has built-in functions to generate quality metrics such as R².

In [None]:
r_squared = model_All.score(test[predictors], test["sales"])
print(f'Model R²: {r_squared:.2f}')
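For a regression model, `score` reports the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. As a rough sketch of what that means, the code below computes it by hand on synthetic data (made up for illustration, with a simple first-60/last-20 split rather than `train_test_split`) and compares it with `score`.

```python
import numpy
from sklearn.linear_model import LinearRegression

# Synthetic data with one predictor and some noise (not the real dataset).
rng = numpy.random.default_rng(1)
X_demo = rng.uniform(0, 100, size=(80, 1))
y_demo = 2 + 0.5 * X_demo[:, 0] + rng.normal(0, 5, size=80)

# Fit on the first 60 rows and hold out the last 20 for testing.
demo_model = LinearRegression().fit(X_demo[:60], y_demo[:60])
X_test, y_test = X_demo[60:], y_demo[60:]

# R² compares the model's squared errors with those of a baseline
# that always predicts the mean of the test targets.
predicted = demo_model.predict(X_test)
ss_res = numpy.sum((y_test - predicted) ** 2)
ss_tot = numpy.sum((y_test - y_test.mean()) ** 2)
r2_by_hand = 1 - ss_res / ss_tot

print(r2_by_hand, demo_model.score(X_test, y_test))
```

A value near 1 means the model explains most of the variation in the test data; a value at or below 0 means it does no better than predicting the mean.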

## Up for a challenge?

Build a model predicting sales as a function of only radio and newspaper spend, and compare it to the previous model.