In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

In [2]:
data_df = pd.read_csv("https://www.statlearning.com/s/Advertising.csv", index_col=0)

data_df.head()

Unnamed: 0,TV,radio,newspaper,sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9


This is a multiple linear regression problem with three independent variables (TV, radio and newspaper) and one dependent variable (sales). The dataset contains 200 samples, i.e. $n = 200$.

This multiple linear regression problem can be represented in matrix form as:

$$\mathbf{\hat{y}} = \mathbf{X} \boldsymbol{\beta}$$

$$\begin{bmatrix}
\hat{y_1} \\ 
\hat{y_2} \\
\vdots \\
\hat{y_{200}}
\end{bmatrix} =   \begin{bmatrix}
  1 & x_{1\ 1} & x_{1\ 2} & x_{1\ 3} \\
  1 & x_{2\ 1} & x_{2\ 2} & x_{2\ 3} \\
  \vdots  & \vdots  & \ddots & \vdots  \\
  1 & x_{200\ 1} & x_{200\ 2} & x_{200\ 3}
 \end{bmatrix} \times \begin{bmatrix}
\beta_0 \\ 
\beta_1 \\
\beta_2 \\
\beta_3
\end{bmatrix}$$
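
To make the matrix form concrete, here is a minimal sketch on a toy problem. The arrays `features` and `beta_toy` hold made-up values chosen only for illustration; they are not taken from the Advertising data and are not fitted coefficients.

```python
import numpy as np

# Hypothetical toy data: 3 samples, 3 features (values are made up).
features = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
])

# Prepend a column of ones so that beta_0 acts as the intercept.
X_toy = np.concatenate((np.ones((features.shape[0], 1)), features), axis=1)

# Hypothetical coefficients [beta_0, beta_1, beta_2, beta_3].
beta_toy = np.array([0.5, 1.0, -1.0, 2.0])

# Matrix form: one prediction per row of X_toy.
y_hat_toy = X_toy @ beta_toy

print(X_toy.shape)      # (3, 4)
print(y_hat_toy.shape)  # (3,)
```

With 200 samples the same product would map a $200 \times 4$ matrix and a length-4 coefficient vector to 200 predictions.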


The predicted output for the samples can be computed as:
\begin{align*}\hat{y_1} &= \beta_0x_{1\ 0}+ \beta_1x_{1\ 1} + \beta_2x_{1\ 2} + \beta_3 x_{1\ 3}\\
\hat{y_2} &= \beta_0x_{2\ 0}+ \beta_1x_{2\ 1} + \beta_2x_{2\ 2} + \beta_3 x_{2\ 3}\\
\hat{y_3} &= \beta_0x_{3\ 0}+ \beta_1x_{3\ 1} + \beta_2x_{3\ 2} + \beta_3 x_{3\ 3}\\
&\vdots\\
\hat{y_{200}} &= \beta_0x_{200\ 0}+ \beta_1x_{200\ 1} + \beta_2x_{200\ 2} + \beta_3 x_{200\ 3}
\end{align*}


Generalizing, the predicted output for the $i^{th}$ sample is:

$$\hat{y_i} = \beta_0x_{i\ 0}+ \beta_1x_{i\ 1} + \beta_2x_{i\ 2} + \beta_3 x_{i\ 3}$$
where $x_{i\ 0} = 1$ for all $i = 1, \dots, n$, so that $\beta_0$ acts as the intercept.
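
The per-sample formula and the matrix product describe the same computation. The sketch below checks this on a small example; `X_demo`, `beta_demo` and `predict_one` are hypothetical names with made-up values, used only to illustrate the equivalence.

```python
import numpy as np

# Hypothetical design matrix: the first column is x_{i 0} = 1 (values are made up).
X_demo = np.array([
    [1.0, 0.2, -1.0, 0.5],
    [1.0, 1.5, 0.3, -0.7],
])
beta_demo = np.array([2.0, 0.5, -1.0, 3.0])  # hypothetical [beta_0, ..., beta_3]


def predict_one(x_i, beta):
    """Sum beta_j * x_{i j} over j = 0..3 for a single sample."""
    return sum(b * x for b, x in zip(beta, x_i))


# Apply the per-sample formula row by row.
per_sample = np.array([predict_one(row, beta_demo) for row in X_demo])

# Compute all predictions at once with the matrix form.
vectorized = X_demo @ beta_demo

assert np.allclose(per_sample, vectorized)
```

The vectorized form is usually preferred in practice because it avoids the Python-level loop.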



In [12]:
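# Feature matrix (TV, radio, newspaper) and target vector (sales)
X = data_df.drop('sales', axis=1).to_numpy()
y = data_df['sales'].to_numpy()

# Standardize each feature column to zero mean and unit variance
scaler = StandardScaler()
scaled_X = scaler.fit_transform(X)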
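
For reference, with its default settings `StandardScaler` subtracts each column's mean and divides by its population standard deviation. The sketch below reproduces that transformation with plain NumPy on a made-up array `raw`; it is an illustration of the idea, not a replacement for the scaler used above.

```python
import numpy as np

# Hypothetical raw features on very different scales (values are made up).
raw = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 60.0],
])

col_mean = raw.mean(axis=0)
col_std = raw.std(axis=0)  # population standard deviation (ddof=0)

# Center each column, then rescale it to unit variance.
standardized = (raw - col_mean) / col_std

print(standardized.mean(axis=0))  # approximately 0 in every column
print(standardized.std(axis=0))   # approximately 1 in every column
```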

In [13]:
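# Prepend a column of ones (x_{i 0} = 1) so that beta_0 is the intercept.
# np.concatenate works on all NumPy versions; np.concat needs NumPy >= 2.0.
X = np.concatenate(
    (np.ones((scaled_X.shape[0], 1)), scaled_X),
    axis=1
)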

In [14]:
# n = number of samples, d = number of columns (intercept + 3 features)
n, d = X.shape