
<a href="https://colab.research.google.com/github/kokchun/Maskininlarning-AI21/blob/main/Lectures/L0-Linear_regression.ipynb" target="_parent"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> &nbsp; for interacting with the code

---
# Lecture notes - Linear regression

---
These are the lecture notes on **linear regression**.

<p class = "alert alert-info" role="alert"><b>Note</b> that these lecture notes give only a brief introduction to linear regression. I encourage you to read further about the topic.</p>

Read more 

- [ISLRv2 pp 59-82](https://www.statlearning.com/)
- [Numpy polyfit](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html)
- [Seaborn regplot](https://seaborn.pydata.org/generated/seaborn.regplot.html)

---

## Case 

A company spends money on advertising across different media channels: TV, radio and newspaper.  

**Task:**

- suggest a marketing plan to increase the number of units sold

This task is broad and can be broken down into the following subquestions:
1. Is there a relationship between ad spending and sales?
2. How strong is the relationship between ad spending and sales?
3. Which media are associated with sales?
4. How large is the association between each medium and sales?
5. How accurately can we predict future sales?
6. Is the relationship linear?
7. Is there synergy (interaction) among the advertising media?

---
## Initial EDA - Exploratory Data Analysis

The dataset for this lecture comes from ISLR (An Introduction to Statistical Learning). The file used is [Advertising.csv](https://www.kaggle.com/ishaanv/ISLR-Auto).

Units: 
- TV, radio, newspaper - thousands of dollars
- Sales - thousands of units

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

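# "seaborn-white" was renamed to "seaborn-v0_8-white" in matplotlib 3.6; use the old name on older versions
plt.style.use("seaborn-v0_8-white")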

df = pd.read_csv("Data/Advertising.csv", index_col=0)
df.head()


In [None]:
df.info()

In [None]:
df.describe()

In [None]:
number_features = df.shape[1] - 1
fig, ax = plt.subplots(1, number_features, figsize=(8, 3), dpi=100)

for i, feature in enumerate(df.columns[:-1]):
    sns.scatterplot(data=df, x=feature, y="Sales", ax=ax[i])
    ax[i].set(xlabel="Spending", title=f"{feature} spending")

fig.tight_layout()


In [None]:
# plot pairwise relationships between all columns in df
# corner=True draws only the lower triangle, as the upper triangle mirrors it; this avoids redundant plots
grid = sns.pairplot(df, corner=True, height=2)


--- 
## Simple linear regression

The equation for linear regression with one predictor variable is
$y \approx \beta_0 + \beta_1X$, 

where $X$ is the predictor variable, $y$ is the response variable, $\beta_0$ is the intercept and $\beta_1$ is the slope. $\beta_0$ and $\beta_1$ are unknown parameters that need to be estimated from the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n,y_n)$.

Example: 

- $y$ - sales 
- $X$ - TV spending, since visual inspection of the scatterplots suggests that it follows a line more closely than the other features

We use the data points to compute the sample estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ with the least squares method, which is the most common approach. The prediction of $y$ is then $\hat{y} = \hat{\beta}_0+\hat{\beta}_1x$, a line that is as close as possible to the data points in the sense that it minimizes the sum of squared residuals.
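For one predictor, minimizing the residual sum of squares $\sum_i (y_i - \hat{y}_i)^2$ has a closed-form solution: $\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$. The following is a minimal sketch that checks these formulas against `np.polyfit`. The synthetic data and the helper name `least_squares_line` are assumptions for illustration and are not part of the Advertising dataset.

```python
import numpy as np


def least_squares_line(x, y):
    """Return (beta_0_hat, beta_1_hat) from the closed-form least squares formulas."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    beta_1_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    beta_0_hat = y_mean - beta_1_hat * x_mean
    return beta_0_hat, beta_1_hat


# synthetic data (assumption): a noisy line with intercept 7 and slope 0.05
rng = np.random.default_rng(42)
x = rng.uniform(0, 300, size=200)
y = 7 + 0.05 * x + rng.normal(0, 1, size=200)

beta_0_hat, beta_1_hat = least_squares_line(x, y)
slope_polyfit, intercept_polyfit = np.polyfit(x, y, deg=1)  # highest power first

print(f"closed form: intercept {beta_0_hat:.4f}, slope {beta_1_hat:.4f}")
print(f"np.polyfit:  intercept {intercept_polyfit:.4f}, slope {slope_polyfit:.4f}")
```

Both approaches should agree up to floating point error, since `np.polyfit` with `deg=1` solves the same least squares problem.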

In [None]:
X, y = df["TV"], df["Sales"]
# np.polyfit fits a polynomial of degree deg using least squares
beta_1, beta_0 = np.polyfit(X, y, deg=1)  # returns coefficients with the highest power first

# predicted y; note that beta_0 and beta_1 are really beta_0 hat and beta_1 hat, as they are sample estimates
y_hat = lambda x: beta_0 + beta_1 * x

print(f"Intercept beta_0 hat: {beta_0:.4f}")
# an extra $1000 of TV spending is associated with about 47.5 more sold units
print(f"Slope beta_1 hat: {beta_1:.4f}")

spend = np.linspace(0, 350)

fig, ax = plt.figure(figsize=(5, 3), dpi=100), plt.axes()

sns.scatterplot(data=df, x="TV", y="Sales")
sns.lineplot(x=spend, y=y_hat(spend), color="red")

ax.set(
    title="TV advertisement linear regression",
    xlabel="TV spending (thousands of dollars)",
    ylabel="Sales (thousands of units)",
);


In [None]:
# regression line using seaborn regplot
sns.regplot(x=X, y=y);

---
## Multiple linear regression

We used NumPy's `polyfit` to estimate $\beta_0, \beta_1$ from the TV feature alone. `polyfit` requires a 1D vector of x-values, but we want to use all the features (TV, radio, newspaper) for prediction, i.e. we want to use more explanatory variables. One way is to manually solve the normal equation $\hat{\bm{\beta}} = (\bm{X}^T\bm{X})^{-1}\bm{X}^T\bm{y}$, as we did in the [linear algebra course](https://github.com/kokchun/Linjar-algebra-21/blob/main/Lectures/Lec6-Matrix/Linear_regression.ipynb), but now with more features, which simply means more columns in $\bm{X}$,

where $\bm{X} = 	\begin{bmatrix} 
	1 & x_1^{(1)} & x_2^{(1)}& \ldots &x_n^{(1)} \\
	1 & x_1^{(2)} & x_2^{(2)}& \ldots &x_n^{(2)}\\
	\vdots & \vdots & \vdots & \ddots & \vdots\\
	1 & x_1^{(m)}& x_2^{(m)}& \ldots &x_n^{(m)}
\end{bmatrix}, \bm{y} = \begin{bmatrix} 
y_1 \\ y_2 \\ \vdots \\ y_m
\end{bmatrix}, \hat{\bm{\beta}} = \begin{bmatrix} 
\hat\beta_0 \\\hat\beta_1\\ \vdots \\ \hat\beta_n
\end{bmatrix}$

In our example we have $n = 3$ features, $m = 200$ samples, which gives us the regression coefficients $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$.

The fitted model is $\hat{y} = \hat\beta_0 + \hat\beta_1x_1 + \hat\beta_2x_2 + \hat\beta_3x_3$, which is a hyperplane rather than a line when there are several features. With this equation we can predict the sales for a new sample $i$ by $\hat{y}^{(i)} = \hat\beta_0 + \hat\beta_1x_1^{(i)} + \hat\beta_2x_2^{(i)} + \hat\beta_3x_3^{(i)}$.
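The normal equation and the prediction for a new sample can be sketched in NumPy as follows. This is a minimal sketch on synthetic data: the feature ranges, the coefficients in `true_beta` and all variable names are assumptions for illustration, standing in for TV, radio, newspaper and Sales. It solves $\bm{X}^T\bm{X}\hat{\bm{\beta}} = \bm{X}^T\bm{y}$ with `np.linalg.solve` instead of forming the inverse explicitly, which gives the same estimate and is usually numerically safer.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 3  # m samples, n features

# synthetic features and response (assumption, not the Advertising data)
features = rng.uniform(0, 100, size=(m, n))
true_beta = np.array([3.0, 0.05, 0.2, 0.0])  # intercept followed by n slopes
X = np.column_stack([np.ones(m), features])  # column of ones for the intercept
y = X @ true_beta + rng.normal(0, 1, size=m)

# normal equation: solve (X^T X) beta = X^T y rather than inverting X^T X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# cross-check against NumPy's least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# predict for a new sample; the leading 1 matches the intercept column
x_new = np.array([1.0, 50.0, 20.0, 10.0])
y_pred = x_new @ beta_hat

print("beta_hat:", np.round(beta_hat, 4))
print(f"prediction for new sample: {y_pred:.4f}")
```

With the real data, `features` would be replaced by the three spending columns of `df` and `y` by the Sales column.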


---

Kokchun Giang

[LinkedIn][linkedIn_kokchun]

[GitHub portfolio][github_portfolio]

[linkedIn_kokchun]: https://www.linkedin.com/in/kokchungiang/
[github_portfolio]: https://github.com/kokchun/Portfolio-Kokchun-Giang

---