In [732]:
import numpy as np
import matplotlib.pyplot as plt

np.set_printoptions(precision=4, suppress=True)

# 1. Simple linear regression

In [733]:
age = np.array([5, 6, 7, 8, 9])
height = np.array([100, 105, 108, 112, 115])

Calculate $\bar{x}$ (Mean Age)  
Calculate $\bar{y}$ (Mean Height)

In [734]:
age.mean()
height.mean()

np.float64(7.0)

np.float64(108.0)

$x - \bar{x}$  
$y - \bar{y}$

In [735]:
age - age.mean()
height - height.mean()

array([-2., -1.,  0.,  1.,  2.])

array([-8., -3.,  0.,  4.,  7.])

$(x - \bar{x})^2$

In [736]:
(age - age.mean()) ** 2

array([4., 1., 0., 1., 4.])

$(x - \bar{x})(y - \bar{y})$

In [737]:
(age - age.mean()) * (height - height.mean())

array([16.,  3.,  0.,  4., 14.])

Sum the $(x - \bar{x})^2$ column. This is your Denominator ($S_{xx}$).

In [738]:
denominator = ((age - age.mean()) ** 2).sum()
denominator

np.float64(10.0)

Sum the $(x - \bar{x})(y - \bar{y})$ column. This is your Numerator ($S_{xy}$).

In [739]:
numerator = ((age - age.mean()) * (height - height.mean())).sum()
numerator

np.float64(37.0)

Calculate Slope $m = \frac{\text{Numerator}}{\text{Denominator}}$

In [740]:
m = numerator / denominator
m

np.float64(3.7)

Calculate Intercept $b = \bar{y} - m\bar{x}$

In [741]:
b = height.mean() - (m * age.mean())
b

np.float64(82.1)

Final Equation: $$\text{Height} = 3.7(\text{Age}) + 82.1$$

Prediction  
For Age 10: $y = 3.7(10) + 82.1 =$ $119.1 \text{ cm}$

In [742]:
x = 10
y = m * x + b
y

np.float64(119.1)
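As a sanity check, the hand-computed slope and intercept can be compared against NumPy's built-in least-squares polynomial fit. This is only a minimal sketch; the names `m_check`, `b_check`, and `y_check` are introduced here for illustration and are not part of the notebook above.

```python
import numpy as np

age = np.array([5, 6, 7, 8, 9])
height = np.array([100, 105, 108, 112, 115])

# Degree-1 least-squares fit; coefficients are returned highest power first
m_check, b_check = np.polyfit(age, height, deg=1)

# Prediction for a 10-year-old using the fitted line
y_check = m_check * 10 + b_check
print(m_check, b_check, y_check)
```

The values should agree with `m`, `b`, and the Age 10 prediction above, up to floating-point rounding.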

Let's do some more predictions

In [743]:
for x in age:
  y = m * x + b
  print(y)

100.6
104.3
108.0
111.69999999999999
115.4


How far off are we from the actual heights?

In [744]:
for x, y in zip(age, height):
  y_ = m * x + b
  print(y, y_)

100 100.6
105 104.3
108 108.0
112 111.69999999999999
115 115.4


There is no need to iterate over the samples manually.  
NumPy broadcasting applies the equation to the whole array at once 😏

In [745]:
height_pred = m * age + b
height_pred

array([100.6, 104.3, 108. , 111.7, 115.4])

Calculate residuals

In [746]:
height - height_pred

array([-0.6,  0.7,  0. ,  0.3, -0.4])
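A quick way to check a least-squares fit that includes an intercept: the residuals should sum to zero and should also be uncorrelated with the feature. The sketch below recomputes the line from the same formulas used earlier; `sum_res` and `sum_res_x` are illustrative names.

```python
import numpy as np

age = np.array([5, 6, 7, 8, 9])
height = np.array([100, 105, 108, 112, 115])

# Slope and intercept from the S_xy / S_xx formulas
x_dev = age - age.mean()
m = (x_dev * (height - height.mean())).sum() / (x_dev ** 2).sum()
b = height.mean() - m * age.mean()

residuals = height - (m * age + b)

# Both sums should be zero up to floating-point error
sum_res = residuals.sum()
sum_res_x = (residuals * age).sum()
print(np.isclose(sum_res, 0.0), np.isclose(sum_res_x, 0.0))
```

If either sum is clearly non-zero, the slope or intercept was probably computed incorrectly.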

Sum of Squared Residuals (SSR)

In [747]:
np.sum((height - height_pred) ** 2)

np.float64(1.1000000000000085)

Mean Squared Error (MSE)

In [748]:
np.mean((height - height_pred) ** 2)

np.float64(0.2200000000000017)

Root Mean Squared Error (RMSE)

In [749]:
np.sqrt(np.mean((height - height_pred) ** 2))

np.float64(0.46904157598234475)

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

Where:  
$SS_{res}$ (Residual Sum of Squares): $\sum (y_{true} - \hat{y})^2$, the error our model makes.  
$SS_{tot}$ (Total Sum of Squares): $\sum (y_{true} - \bar{y})^2$, the variation in the data itself.

In [750]:
ss_res = np.sum((height - height_pred) ** 2)
ss_res

ss_tot = np.sum((height - height.mean()) ** 2)
ss_tot

r2 = 1 - ss_res / ss_tot
r2

np.float64(1.1000000000000085)

np.float64(138.0)

np.float64(0.9920289855072463)
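For simple linear regression with one feature and an intercept, $R^2$ equals the square of the Pearson correlation between $x$ and $y$. The sketch below checks this on the same data; `r` and `r2_from_def` are names chosen here for illustration.

```python
import numpy as np

age = np.array([5, 6, 7, 8, 9])
height = np.array([100, 105, 108, 112, 115])

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(age, height)[0, 1]

# R^2 from its definition, using the least-squares line
x_dev = age - age.mean()
m = (x_dev * (height - height.mean())).sum() / (x_dev ** 2).sum()
b = height.mean() - m * age.mean()
height_pred = m * age + b
ss_res = np.sum((height - height_pred) ** 2)
ss_tot = np.sum((height - height.mean()) ** 2)
r2_from_def = 1 - ss_res / ss_tot

print(r ** 2, r2_from_def)
```

This shortcut does not carry over directly to models with several features.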

# 2. Multiple linear regression (with just 1 feature)

**`X` should be a matrix (two dimensions)**

1. Either expand dimension

In [751]:
X = np.expand_dims(age, axis=1)
X
X.shape  # 5 rows, 1 col

array([[5],
       [6],
       [7],
       [8],
       [9]])

(5, 1)

2. Or reshape (preferred)

In [752]:
X = age.reshape(-1, 1)
X
X.shape  # 5 rows, 1 col

array([[5],
       [6],
       [7],
       [8],
       [9]])

(5, 1)

In [753]:
bias_col = np.ones(len(age))
bias_col

array([1., 1., 1., 1., 1.])

Add the bias column at the beginning, so that `X` ends up as:  
Shape: $(5 \times 2)$  
Column 1: The Bias (all 1s).  
Column 2: The Age values (5, 6, 7, 8, 9).  

In [754]:
X = np.c_[bias_col, X]
X
X.shape

array([[1., 5.],
       [1., 6.],
       [1., 7.],
       [1., 8.],
       [1., 9.]])

(5, 2)
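The same design matrix can be built in one step from the 1-D `age` array. This sketch uses `np.column_stack` as an alternative to `np.c_`; the name `X_alt` is hypothetical.

```python
import numpy as np

age = np.array([5, 6, 7, 8, 9])

# column_stack turns each 1-D array into a column, so no reshape is needed
X_alt = np.column_stack((np.ones(len(age)), age))

print(X_alt.shape)
```

Both approaches give a float matrix with the bias column first; which one to use is mostly a matter of taste.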

Create $y$. These are just the Height values.  
Shape: $(5 \times 1)$

In [755]:
y = height.reshape(-1, 1)
y
y.shape

array([[100],
       [105],
       [108],
       [112],
       [115]])

(5, 1)

The Transpose ($X^T$)  

It should be shape $(2 \times 5)$.  

In [756]:
X.T
X.T.shape

array([[1., 1., 1., 1., 1.],
       [5., 6., 7., 8., 9.]])

(2, 5)

The Gram Matrix ($X^T X$)

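Holds the raw (uncentered) sums of squares and cross-products of the columns of $X$. These are the raw ingredients of the "spread" (variance) of your features and of how much they overlap with each other (covariance).

The Gram matrix is symmetric and will look like this:
$$\begin{bmatrix} n & \sum x \\ \sum x & \sum x^2 \end{bmatrix}$$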

In [757]:
X.T.shape, X.shape

((2, 5), (5, 2))

In [758]:
X.T  # Rows of this will be multiplied to -
X  # columns of this

array([[1., 1., 1., 1., 1.],
       [5., 6., 7., 8., 9.]])

array([[1., 5.],
       [1., 6.],
       [1., 7.],
       [1., 8.],
       [1., 9.]])

Matrix Multiplication

For matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, the elements of the product $C = AB$ are given by:$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$

Explanation:

$A$ is an $m \times n$ matrix (rows $\times$ columns).

$B$ is an $n \times p$ matrix.

The resulting matrix $C$ is $m \times p$.

To find the value at row $i$, column $j$ of the result, you perform a dot product of the $i$-th row of $A$ and the $j$-th column of $B$.

In [759]:
C = np.zeros(shape=(2, 2))

for i in range(2):
  for j in range(2):
    C[i, j] = np.dot(X.T[i], X[:, j])

C

array([[  5.,  35.],
       [ 35., 255.]])

Visualizing the dot product of rows and columns

In [760]:
# Display only: print each dot product term by term (nothing is stored)

for i in range(2):
  for j in range(2):
    row = X.T[i].astype(int)
    col = X[:, j].astype(int)
    dot_viz = ' + '.join([f'{a}*{b}' for a, b in zip(row, col)])
    dot_res = np.dot(row, col)
    print(f'at pos {i}{j}: {dot_viz} = {dot_res}')

at pos 00: 1*1 + 1*1 + 1*1 + 1*1 + 1*1 = 5
at pos 01: 1*5 + 1*6 + 1*7 + 1*8 + 1*9 = 35
at pos 10: 5*1 + 6*1 + 7*1 + 8*1 + 9*1 = 35
at pos 11: 5*5 + 6*6 + 7*7 + 8*8 + 9*9 = 255


In [761]:
gram_matrix = X.T @ X
gram_matrix
gram_matrix.shape

array([[  5.,  35.],
       [ 35., 255.]])

(2, 2)
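To connect the matrix product back to the formula, the Gram matrix can also be assembled directly from the count and the sums. A minimal sketch, with `gram_from_sums` as an illustrative name:

```python
import numpy as np

age = np.array([5, 6, 7, 8, 9])
n = len(age)

# [[n, sum(x)], [sum(x), sum(x^2)]]
gram_from_sums = np.array([
    [n,         age.sum()],
    [age.sum(), (age ** 2).sum()],
])

# Compare with the matrix product X^T X
X = np.column_stack((np.ones(n), age))
print(np.allclose(gram_from_sums, X.T @ X))
```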

The Moment Vector ($X^T y$)

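Holds the raw sum of each column of $X$ multiplied by the target. It measures the "alignment" between your features and the target variable, which is related to their covariance and correlation.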

Moment vector will look like:

$$\begin{bmatrix} \sum y \\ \sum (x \cdot y) \end{bmatrix}$$

In [762]:
X.T.shape, y.shape

((2, 5), (5, 1))

In [763]:
moment_vector = X.T @ y
moment_vector
moment_vector.shape

array([[ 540.],
       [3817.]])

(2, 1)
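The two entries of the moment vector can likewise be checked against plain sums. This is a small sketch; `moment_from_sums` is a name introduced here.

```python
import numpy as np

age = np.array([5, 6, 7, 8, 9])
height = np.array([100, 105, 108, 112, 115])

# [sum(y), sum(x * y)] as a column vector
moment_from_sums = np.array([
    [height.sum()],
    [(age * height).sum()],
])

# Compare with the matrix product X^T y
X = np.column_stack((np.ones(len(age)), age))
y = height.reshape(-1, 1)
print(np.allclose(moment_from_sums, X.T @ y))
```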

The Inverse ($(X^T X)^{-1}$)

In [764]:
gram_matrix_inv = np.linalg.inv(gram_matrix)
gram_matrix_inv
gram_matrix_inv.shape

array([[ 5.1, -0.7],
       [-0.7,  0.1]])

(2, 2)
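For a 2×2 matrix the inverse has a simple closed form: swap the diagonal entries, negate the off-diagonal ones, and divide by the determinant. The sketch below applies it to the Gram matrix from above; the variable names are illustrative.

```python
import numpy as np

gram_matrix = np.array([[5.0, 35.0],
                        [35.0, 255.0]])

a, b_ = gram_matrix[0]
c, d = gram_matrix[1]

# Determinant: 5 * 255 - 35 * 35 = 50
det = a * d - b_ * c

# Closed-form inverse of a 2x2 matrix
inv_manual = np.array([[d, -b_],
                       [-c, a]]) / det

print(np.allclose(inv_manual, np.linalg.inv(gram_matrix)))
```

A determinant of zero would mean the matrix has no inverse, which happens when the columns of `X` are linearly dependent.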

Finally: Solve for $\beta$

Multiply `The Inverse` by `The Moment Vector`.

In [765]:
gram_matrix_inv.shape, moment_vector.shape

((2, 2), (2, 1))

In [766]:
beta = gram_matrix_inv @ moment_vector
beta
beta.shape

array([[82.1],
       [ 3.7]])

(2, 1)
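Forming the inverse explicitly is fine for learning, but in practice it is usually preferable to solve the linear system $(X^T X)\beta = X^T y$ directly. A minimal sketch using `np.linalg.solve`, with `beta_solve` as an illustrative name:

```python
import numpy as np

age = np.array([5, 6, 7, 8, 9])
height = np.array([100, 105, 108, 112, 115])

X = np.column_stack((np.ones(len(age)), age))
y = height.reshape(-1, 1)

# Solve (X^T X) beta = X^T y without computing the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_solve)
```

The result should match `beta` above up to floating-point rounding.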

Once you have calculated the $\beta$ vector, you have "trained" your model.

$\beta_0$ (Intercept): $\mathbf{82.1}$

$\beta_1$ (Slope): $\mathbf{3.7}$

The Prediction Equation: $$\hat{y} = X_{test} \cdot \beta$$

Here we predict on the training matrix `X` itself.

In [767]:
X.shape, beta.shape

((5, 2), (2, 1))

In [768]:
height_pred = X @ beta
height_pred
height_pred.shape

array([[100.6],
       [104.3],
       [108. ],
       [111.7],
       [115.4]])

(5, 1)
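The same equation works for unseen samples, as long as the test matrix has the same layout (bias column first). The sketch below uses two hypothetical ages, 10 and 11, which lie outside the training range, so the results are extrapolations.

```python
import numpy as np

age = np.array([5, 6, 7, 8, 9])
height = np.array([100, 105, 108, 112, 115])

X = np.column_stack((np.ones(len(age)), age))
y = height.reshape(-1, 1)
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical new samples: bias column of 1s, then age
X_test = np.array([[1.0, 10.0],
                   [1.0, 11.0]])

y_test_pred = X_test @ beta
print(y_test_pred)
```

The first row reproduces the Age 10 prediction from section 1.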

Evaluation metrics

In [769]:
height_pred = height_pred.flatten()
height_pred
height_pred.shape

array([100.6, 104.3, 108. , 111.7, 115.4])

(5,)

In [770]:
mse = np.mean((height - height_pred)**2)
mse

np.float64(0.21999999999999945)

In [771]:
rmse = np.sqrt(np.mean((height - height_pred)**2))
rmse

np.float64(0.46904157598234236)

In [772]:
ss_res = np.sum((height - height_pred)**2)
ss_tot = np.sum((height - height.mean())**2)

r2 = 1 - ss_res / ss_tot
r2

np.float64(0.9920289855072464)

# 3. Multiple linear regression (with 2 features)

We are predicting Height ($y$) based on Age ($x_1$) and Weight ($x_2$).

In [773]:
data = np.array([
    [5,  20, 100],  # age(x1), weight(x2), height(y)
    [6,  30, 110],
    [8,  25, 115],
    [7,  40, 120],
    [4,  50, 105],
    [5,  70, 140],
])

Step 1 ($X$): Create the Design Matrix. Remember the Bias Trick (Column of 1s first).

In [774]:
bias_col = np.ones(len(data))
bias_col

array([1., 1., 1., 1., 1., 1.])

In [775]:
X_ = data[:, :-1]  # data without last (target) col
X_
X_.shape

array([[ 5, 20],
       [ 6, 30],
       [ 8, 25],
       [ 7, 40],
       [ 4, 50],
       [ 5, 70]])

(6, 2)

In [776]:
X = np.c_[bias_col, X_]
X
X.shape

array([[ 1.,  5., 20.],
       [ 1.,  6., 30.],
       [ 1.,  8., 25.],
       [ 1.,  7., 40.],
       [ 1.,  4., 50.],
       [ 1.,  5., 70.]])

(6, 3)

Step 2 ($y$): Create the Target Vector.

In [777]:
y = data[:, [-1]]  # data with just last col
y
y.shape

array([[100],
       [110],
       [115],
       [120],
       [105],
       [140]])

(6, 1)

The Gram Matrix ($X^T X$)

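Holds the raw (uncentered) sums of squares and cross-products of the columns of $X$. These are the raw ingredients of the "spread" (variance) of your features and of how much they overlap with each other (covariance).

The Gram matrix is symmetric and will look like this:

$$\begin{bmatrix} n & \sum x_1 & \sum x_2 \\ \sum x_1 & \sum x_1^2 & \sum x_1 x_2 \\ \sum x_2 & \sum x_1 x_2 & \sum x_2^2 \end{bmatrix}$$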

In [778]:
X.T.shape, X.shape

((3, 6), (6, 3))

In [779]:
gram_matrix = X.T @ X
gram_matrix
gram_matrix.shape

array([[    6.,    35.,   235.],
       [   35.,   215.,  1310.],
       [  235.,  1310., 10925.]])

(3, 3)

The Moment Vector ($X^T y$)

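Holds the raw sum of each column of $X$ multiplied by the target. It measures the "alignment" between your features and the target variable, which is related to their covariance and correlation.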

Moment vector will look like:

$$\begin{bmatrix} \sum y \\ \sum (x_1 \cdot y) \\ \sum (x_2 \cdot y) \end{bmatrix}$$

In [780]:
X.T.shape, y.shape

((3, 6), (6, 1))

In [781]:
moment_vector = X.T @ y
moment_vector
moment_vector.shape

array([[  690.],
       [ 4040.],
       [28025.]])

(3, 1)

The Inverse ($(X^T X)^{-1}$)

In [782]:
gram_matrix_inv = np.linalg.inv(gram_matrix)
gram_matrix_inv
gram_matrix_inv.shape

array([[ 7.0583, -0.8313, -0.0521],
       [-0.8313,  0.1152,  0.0041],
       [-0.0521,  0.0041,  0.0007]])

(3, 3)

Finally: Solve for $\beta$

Multiply `The Inverse` by `The Moment Vector`.

In [783]:
beta = gram_matrix_inv @ moment_vector
beta
beta.shape

array([[50.3834],
       [ 5.7989],
       [ 0.7861]])

(3, 1)
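As a cross-check, NumPy's least-squares solver should give the same coefficients without forming the Gram matrix or its inverse. This is a sketch only; `beta_inv` and `beta_lstsq` are names introduced here.

```python
import numpy as np

data = np.array([
    [5, 20, 100],  # age(x1), weight(x2), height(y)
    [6, 30, 110],
    [8, 25, 115],
    [7, 40, 120],
    [4, 50, 105],
    [5, 70, 140],
])

X = np.c_[np.ones(len(data)), data[:, :-1]]
y = data[:, [-1]]

# Normal-equation solution, as in the cells above
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Least-squares solver; rcond=None selects the current default cutoff
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_inv, beta_lstsq))
```

`rank` reports the number of linearly independent columns of `X`; a value below 3 here would signal redundant features.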

Once you have calculated the $\beta$ vector, you have "trained" your model.

- Intercept ($\beta_0$): $\mathbf{50.38}$

- Age Slope ($\beta_1$): $\mathbf{5.80}$

- Weight Slope ($\beta_2$): $\mathbf{0.79}$

The Final Equation:

$$\text{Height} = 50.38 + 5.80(\text{Age}) + 0.79(\text{Weight})$$

Interpretation:

- Base Height: A child with 0 age and 0 weight would theoretically be 50.38 cm. This is an extrapolation far outside the data, so it has no physical meaning.

- Age Factor: For every year older, with weight held constant, the predicted height increases by about 5.80 cm.

- Weight Factor: For every kg heavier, with age held constant, the predicted height increases by about 0.79 cm.
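The "held constant" reading of each slope can be checked numerically: change one feature by one unit, keep the other fixed, and the prediction moves by exactly that feature's coefficient. The helper `predict` and the sample values below are hypothetical.

```python
import numpy as np

data = np.array([
    [5, 20, 100],  # age(x1), weight(x2), height(y)
    [6, 30, 110],
    [8, 25, 115],
    [7, 40, 120],
    [4, 50, 105],
    [5, 70, 140],
])

X = np.c_[np.ones(len(data)), data[:, :-1]]
y = data[:, [-1]]
beta = np.linalg.solve(X.T @ X, X.T @ y).ravel()


def predict(age, weight):
    """Predicted height for a single (age, weight) pair."""
    return beta[0] + beta[1] * age + beta[2] * weight


# One year older, same weight: difference equals the age slope
diff_age = predict(7, 30) - predict(6, 30)

# One kg heavier, same age: difference equals the weight slope
diff_weight = predict(6, 31) - predict(6, 30)

print(diff_age, diff_weight)
```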

Let's make the predictions

$$\hat{y} = X_{test} \cdot \beta$$

In [784]:
X.shape, beta.shape

((6, 3), (3, 1))

In [785]:
height_pred = X @ beta
height_pred
height_pred.shape

array([[ 95.1004],
       [108.7605],
       [116.4278],
       [122.4205],
       [112.8848],
       [134.406 ]])

(6, 1)

In [786]:
height_pred = height_pred.flatten()
height_pred
height_pred.shape

array([ 95.1004, 108.7605, 116.4278, 122.4205, 112.8848, 134.406 ])

(6,)
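To finish this section the same way as the previous two, the evaluation metrics can be computed for the two-feature model. The sketch below rebuilds the model so that it runs on its own; the metric formulas are the ones used earlier.

```python
import numpy as np

data = np.array([
    [5, 20, 100],  # age(x1), weight(x2), height(y)
    [6, 30, 110],
    [8, 25, 115],
    [7, 40, 120],
    [4, 50, 105],
    [5, 70, 140],
])

X = np.c_[np.ones(len(data)), data[:, :-1]]
height = data[:, -1]

beta = np.linalg.solve(X.T @ X, X.T @ height)
height_pred = X @ beta

# Same metrics as in sections 1 and 2
mse = np.mean((height - height_pred) ** 2)
rmse = np.sqrt(mse)

ss_res = np.sum((height - height_pred) ** 2)
ss_tot = np.sum((height - height.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, rmse, r2)
```

These numbers are measured on the training data, so they say how well the plane fits these six points, not how well it would predict new children.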