### What is linear regression?

$ y = wx + b $

* <var>x</var>: input features
* <var>w</var>: weight (slope)
* <var>b</var>: bias (intercept)
* <var>y</var>: predicted value

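For example, consider points generated by:

$ y = x^2 + 1 $

This relationship is not linear, so a straight line can only approximate it.

Example dataset: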

| x | y |
|---|---|
| 1 | 2 |
| 2 | 5 |
| 3 | 10|

$
Y = \begin{pmatrix}
y_1 \\
y_2 \\
y_3 \\
\vdots \\
y_n
\end{pmatrix}
$

$
\begin{pmatrix}
x_1 \\
x_2 \\
x_3 \\
\vdots \\
x_n
\end{pmatrix}
\cdot
\begin{pmatrix}
w_1
\end{pmatrix} + b
= \begin{pmatrix}
x_1w_1 + b \\
x_2w_1 + b \\
x_3w_1 + b \\
\vdots \\
x_nw_1 + b \\
\end{pmatrix}
= Ŷ
$
that is
$
Ŷ = \begin{pmatrix}
ŷ_1 \\
ŷ_2 \\
ŷ_3 \\
\vdots \\
ŷ_n
\end{pmatrix}
$

$
errors = Ŷ - Y
$
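
The matrix form above maps directly onto NumPy. The following is a minimal sketch using the three-row example table; the values `w = 4` and `b = -2` are hypothetical, picked only to make the arithmetic easy to follow, and are not fitted parameters.

```python
import numpy as np

# The three example points from the table above, as column vectors
X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[2.0], [5.0], [10.0]])

# Hypothetical parameters (an assumption for illustration, not a fitted model)
w = np.array([[4.0]])
b = -2.0

Y_hat = X @ w + b                  # predictions: 2, 6, 10
errors = Y_hat - Y                 # errors: 0, 1, 0
mse = float(np.mean(errors ** 2))  # mean squared error: 1/3

print(Y_hat.ravel(), errors.ravel(), mse)
```

Only the middle point is missed, by 1, which gives a mean squared error of 1/3.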

##### Gradient descent
* $ Ŷ = XW + b $
* $ MSE = \frac{1}{n}\sum_{i=1}^{n} errors_i^2 $
* $ errors = Ŷ - Y = (XW + b) - Y $

$$ \frac{\partial MSE}{\partial W} = \frac{\partial MSE}{\partial errors} \cdot \frac{\partial errors}{\partial W} $$

$ \frac{\partial MSE}{\partial W} = \frac{2X^Terrors}{n}$
$\implies$ (dropping the constant factor 2, which is absorbed into the learning rate $\alpha$)
$ \frac{\partial MSE}{\partial W} \propto \frac{X^Terrors}{n}$

$$ \frac{\partial MSE}{\partial b} = \frac{\partial MSE}{\partial errors} \cdot \frac{\partial errors}{\partial b} $$

$ \frac{\partial MSE}{\partial b} = \frac{2\sum_{i} errors_i}{n}$
$\implies$
$ \frac{\partial MSE}{\partial b} \propto \frac{\sum_{i} errors_i}{n}$
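
With errors defined as predictions minus targets, the exact MSE gradients are 2·XᵀE/n for the weights and 2·ΣE/n for the bias. A finite-difference comparison is a cheap way to confirm both the sign and the scale. The sketch below assumes a single feature; the helper names `mse` and `analytic_gradients`, the starting values, and `eps` are all hypothetical choices made for this check.

```python
import numpy as np

def mse(X, Y, W, b):
    """Mean squared error of the linear model X @ W + b."""
    errors = X @ W + b - Y
    return float(np.mean(errors ** 2))

def analytic_gradients(X, Y, W, b):
    """Exact gradients of the MSE, keeping the factor 2."""
    n = X.shape[0]
    errors = X @ W + b - Y
    grad_W = 2.0 * (X.T @ errors) / n
    grad_b = 2.0 * float(np.sum(errors)) / n
    return grad_W, grad_b

X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[2.0], [5.0], [10.0]])
W = np.array([[0.5]])  # arbitrary starting weight (assumption)
b = 0.1                # arbitrary starting bias (assumption)
eps = 1e-6

grad_W, grad_b = analytic_gradients(X, Y, W, b)

# Central differences; W is 1x1 here, so adding eps perturbs the single weight
num_grad_W = (mse(X, Y, W + eps, b) - mse(X, Y, W - eps, b)) / (2 * eps)
num_grad_b = (mse(X, Y, W, b + eps) - mse(X, Y, W, b - eps)) / (2 * eps)

print(grad_W[0, 0], num_grad_W)
print(grad_b, num_grad_b)
```

Both gradients come out negative at this starting point, so a descent step increases the weight and the bias, which is the right direction for this data.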

##### Update weights and bias

$ W_{new} = W_{old} - \alpha\cdot \frac{\partial MSE}{\partial W} $

$ b_{new} = b_{old} - \alpha\cdot \frac{\partial MSE}{\partial b} $
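
To make the update rule concrete, here is one step worked out on the three example points. This is a sketch under stated assumptions: the starting values `W = 0` and `b = 0` and the learning rate `alpha = 0.01` are chosen for illustration, and the gradient omits the constant factor 2, matching the update used in `LinearRegression.fit`.

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[2.0], [5.0], [10.0]])
n = X.shape[0]

W_old = np.array([[0.0]])  # assumed starting weight
b_old = 0.0                # assumed starting bias
alpha = 0.01               # assumed learning rate

errors = X @ W_old + b_old - Y          # -2, -5, -10
grad_W = (X.T @ errors) / n             # (-2 - 10 - 30) / 3 = -14
grad_b = float(np.sum(errors)) / n      # -17 / 3

W_new = W_old - alpha * grad_W          # 0.14
b_new = b_old - alpha * grad_b          # about 0.0567

mse_before = float(np.mean(errors ** 2))
mse_after = float(np.mean((X @ W_new + b_new - Y) ** 2))
print(W_new[0, 0], b_new, mse_before, mse_after)
```

The loss drops after the step, which is the behaviour repeated over many iterations during training.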

In [89]:
import numpy as np
import math

In [58]:
class LinearRegression:
  """Linear regression trained with batch gradient descent."""

  def __init__(self, learning_rate=0.01, n_iterations=1000):
    self.learning_rate = learning_rate
    self.n_iterations = n_iterations
    self.weights = None
    self.bias = None

  def fit(self, X, y):
    # X has shape (n_samples, n_features); y is forced into a column vector
    # so that `y_predicted - y` does not broadcast to an (n, n) matrix
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n_samples, n_features = X.shape
    print(f"n_samples: {n_samples}, n_features: {n_features}")
    # small random initial weights, one row per feature
    self.weights = np.random.normal(0, 0.1, size=(n_features, 1))
    self.bias = 0.0

    for _ in range(self.n_iterations):
      # make predictions with the linear model: y = Xw + b
      y_predicted = np.dot(X, self.weights) + self.bias
      # calculate the errors (prediction minus target)
      errors = y_predicted - y

      # gradient step; the constant factor 2 of the MSE gradient
      # is absorbed into the learning rate
      self.weights -= self.learning_rate * (1 / n_samples) * np.dot(X.T, errors)
      self.bias -= self.learning_rate * (1 / n_samples) * np.sum(errors)

  def predict(self, X):
    return np.dot(X, self.weights) + self.bias

In [67]:
X = np.array([[1], [2], [3], [4], [5]])
Y = np.array([[2], [5], [10], [17], [26]])
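
As a reference point for this toy dataset, the least-squares line can also be computed directly, without iterating. The sketch below uses `np.linalg.lstsq` on a design matrix with an extra column of ones for the bias. It is a cross-check and not part of the gradient-descent class; the names `A`, `solution`, `slope`, and `intercept` are introduced here for illustration.

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
Y = np.array([[2.0], [5.0], [10.0], [17.0], [26.0]])

# Design matrix [x, 1]: the column of ones lets the solver fit the bias too
A = np.hstack([X, np.ones_like(X)])

# Solve min ||A @ [w, b] - Y||^2 in the least-squares sense
solution, *_ = np.linalg.lstsq(A, Y, rcond=None)
slope, intercept = solution.ravel()

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print((A @ solution).ravel())  # fitted values at x = 1..5
```

For these five points the best-fitting line has a slope of about 6 and an intercept of about -6. Because the MSE is convex in the weight and bias, gradient descent with a small enough learning rate and enough iterations should approach the same line.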

In [118]:
learning_rate = 0.01
n_iterations = 10000

linear_model = LinearRegression(learning_rate, n_iterations)
linear_model.fit(X, Y)
predictions = linear_model.predict(X) # for simplicity we predict on the training data itself (no held-out test set)

print(predictions)

n_samples: 1000, n_features: 1
[[ 1.19585103e+00]
 [ 4.11275970e+00]
 [ 4.14408944e+00]
 [ 8.50075380e-01]
 [ 3.14584751e+00]
 [ 2.10511498e+00]
 [ 5.59461596e+00]
 [ 2.36079197e+00]
 [ 4.48532847e+00]
 [ 3.74800917e+00]
 [ 1.93655119e+00]
 [ 1.06387436e+00]
 [ 1.61908768e+00]
 [ 3.49875310e+00]
 [ 2.93903546e-01]
 [-1.37317817e+00]
 [ 3.76106148e+00]
 [ 6.03830841e+00]
 [ 2.51747979e+00]
 [ 3.45964812e+00]
 [ 6.16800147e+00]
 [ 3.94628334e+00]
 [ 3.67503112e+00]
 [ 3.81296810e+00]
 [ 3.55738402e+00]
 [ 4.51812118e-01]
 [ 3.27680311e+00]
 [ 6.61124417e+00]
 [ 3.99583823e+00]
 [ 2.31312550e+00]
 [ 4.69349836e+00]
 [ 6.71908138e+00]
 [ 1.97315557e+00]
 [ 6.94811095e+00]
 [ 2.20226571e+00]
 [ 5.94928494e-01]
 [ 1.52033333e+00]
 [ 5.79908975e-01]
 [ 3.19974300e-02]
 [ 3.77895834e+00]
 [ 1.45699593e+00]
 [ 2.44062292e+00]
 [ 3.12972259e+00]
 [ 3.93312815e+00]
 [ 5.63039031e+00]
 [ 1.76861625e+00]
 [ 6.11565344e+00]
 [ 3.27833817e+00]
 [ 1.36340630e+00]
 [ 4.28452285e+00]
 [ 5.87404509e+00]


In [164]:
# Create a synthetic dataset: y = 2x + 3 plus Gaussian noise (std 0.1)
X = np.random.normal(loc=0.0, scale=1, size=(1000, 1)) # features
Y = X*2 + 3 + np.random.normal(0.0, 0.1, size=(1000, 1)) # labels

n = int(len(X) * 0.8)

x_train, x_test = X[:n], X[n:]
y_train, y_test = Y[:n], Y[n:]

print(x_train.shape, y_train.shape)
learning_rate = 0.01
n_iterations = 100000

linear_model = LinearRegression(learning_rate, n_iterations)
linear_model.fit(x_train, y_train)
predictions_test = linear_model.predict(x_test)
# print(f"predictions: {predictions_test[:10]},\n\n expecteds: {y_test[:10]}")

# Count a prediction as a hit when it is within 10% (relative) of the label
score_card = []
for predicted, expected in zip(predictions_test[:, 0], y_test[:, 0]):
  print(predicted, expected)
  score_card.append(1 if math.isclose(predicted, expected, rel_tol=0.1) else 0)

print(score_card)
acc = np.sum(score_card) / len(score_card) * 100 # share of hits, in percent
print(f"Accuracy: {acc:.2f}%")

(800, 1) (800, 1)
n_samples: 800, n_features: 1
1.9692073793247473 2.0730797008842266
-0.10949803937866065 -0.16276106941826674
3.0506130383855563 3.1378583997527123
6.009464385922634 5.979269906299979
6.553581528306498 6.737328215606199
3.8835452108516955 3.8941228282655422
7.178236030613846 7.128243653641487
2.0658774123870574 2.1455514799353703
2.6128610957570295 2.6346696550620634
2.3987044733995133 2.2865061339687505
3.4575478179846413 3.405804842129404
2.1014903683628807 2.1140888792853674
0.9132645915423474 0.9201268730307229
4.241698594687721 4.108810423132955
2.925704739160028 2.7681231720506014
3.9415011615566167 4.036454975764927
5.959667697478886 6.088075776194275
3.153115326424748 2.976650275196423
2.5254106982696243 2.524140148772272
0.6633924130327622 0.6005908224635355
4.9167484603886 4.914240621780497
1.809202612664962 1.701574286887628
3.3199754433236324 3.4139113010273645
1.540876216449485 1.365186364602161
2.621690076951178 2.564543764175769
1.5708341166338982 1.605
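
The tolerance-based accuracy above is sensitive to targets near zero, where a small absolute miss becomes a large relative one (for example the pair -0.109 vs -0.163 in the output). Regression models are more commonly scored with the mean squared error and the coefficient of determination R². The sketch below regenerates a dataset of the same form with a fixed seed. To stay self-contained it uses `np.polyfit` as a stand-in for the trained model, which is an assumption and not the gradient-descent class from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption, for reproducibility)
X = rng.normal(0.0, 1.0, size=(1000, 1))             # features
Y = 2 * X + 3 + rng.normal(0.0, 0.1, size=(1000, 1))  # labels

n = int(len(X) * 0.8)
x_train, x_test = X[:n], X[n:]
y_train, y_test = Y[:n], Y[n:]

# Stand-in for the trained model: closed-form least-squares line
slope, intercept = np.polyfit(x_train[:, 0], y_train[:, 0], 1)
predictions_test = slope * x_test + intercept

errors = predictions_test - y_test
mse = float(np.mean(errors ** 2))

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = float(np.sum(errors ** 2))
ss_tot = float(np.sum((y_test - y_test.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot

print(f"MSE: {mse:.4f}, R^2: {r2:.4f}")
```

With noise of standard deviation 0.1, a well-fitted model should reach a test MSE close to 0.01 and an R² close to 1. Predictions from `linear_model.predict(x_test)` can be scored the same way.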