# Linear Regression

Suppose some value $y$ is determined by $y = \bold{\theta^Tx} + \varepsilon$. The relationship is fixed
by some natural process, with $\varepsilon\sim N(0,r)$ being the combined effect of Gaussian noise sources.

In [None]:
import numpy as np

x = np.arange(1, 11).reshape(-1, 1)  # column vector of inputs 1..10
X = np.hstack([x, np.ones_like(x)])  # design matrix: one row [x_i, 1] per sample
y = 2 * x + 3 + np.random.normal(size=x.shape) / 2  # y = 2x + 3 + normal-noise

A hypothesis for $y$, given by $h_\theta(\bold{x}) = \bold{\theta^Tx}$, is defined by our current belief
about the parameter $\bold{\theta}$.

Thus a squared-error loss function (the MSE up to a constant factor) can be built (discussed in detail later):
$J(\bold{\theta}) = \frac{1}{2}\Vert y - h_\theta(\bold{x})\Vert_2^2$

Now, given a dataset $\langle\bold{X}, \bold{y}\rangle$ in which each row of $\bold{X}$ is one sample $\bold{x}_i^T$, the hypothesis and loss can be rewritten as

- $\tilde{\bold{y}} = h_\theta(\bold{X}) = \bold{X}\bold{\theta}$

- $J(\bold{\theta}) = \sum_i{\frac{1}{2} \Vert y_i - h_\theta(\bold{x}_i)\Vert_2^2}
                    = \frac{1}{2} \Vert \bold{y} - \tilde{\bold{y}} \Vert_2^2
                    = \frac{1}{2} (\bold{y} - \bold{X}\bold{\theta})^T(\bold{y} - \bold{X}\bold{\theta})$
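
The per-sample sum and the vectorized forms of the loss should give the same number. The sketch below checks this numerically, assuming each row of `X` is one sample (as in the notebook's code), so that the predictions are `X @ theta`. The seed and the value of `theta` are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, an assumption for reproducibility
x = np.arange(1, 11).reshape(-1, 1)
X = np.hstack([x, np.ones_like(x)])      # rows are samples: [x_i, 1]
y = 2 * x + 3 + rng.normal(size=x.shape) / 2
theta = np.array([[1.0], [0.5]])         # arbitrary illustrative parameter value

# Per-sample sum: 0.5 * sum_i (y_i - theta^T x_i)^2
loss_sum = sum(0.5 * (y[i, 0] - X[i] @ theta[:, 0]) ** 2 for i in range(len(y)))

# Vectorized forms: half the squared norm, and half the inner product of the residual
residual = y - X @ theta
loss_norm = 0.5 * np.linalg.norm(residual) ** 2
loss_quad = 0.5 * (residual.T @ residual).item()

print(np.isclose(loss_sum, loss_norm), np.isclose(loss_norm, loss_quad))
```

Note that the last form is a plain inner product of the residual with itself; no further norm is applied to it.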

In [None]:
params = np.zeros((2,1))  # our theta, holding our belief for now
hypothesis = lambda: X @ params
loss = lambda: (np.sum(np.square(hypothesis() - y))) / 2

In [None]:
grad = lambda: X.T @ (hypothesis() - y)  # gradient of the loss with respect to params
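
The gradient used here follows from differentiating the loss: with predictions $\bold{X}\bold{\theta}$ (rows of $\bold{X}$ as samples, as in the code), $J(\bold{\theta}) = \frac{1}{2}(\bold{X}\bold{\theta} - \bold{y})^T(\bold{X}\bold{\theta} - \bold{y})$ gives $\nabla_\theta J = \bold{X}^T(\bold{X}\bold{\theta} - \bold{y})$. One way to sanity-check such a formula is a central finite-difference comparison. The sketch below is self-contained; the helper names `loss_at` and `grad_at`, the seed, the test point `theta`, and the step `eps` are all illustrative assumptions and not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)                      # fixed seed, an assumption
x = np.arange(1, 11).reshape(-1, 1)
X = np.hstack([x, np.ones_like(x)]).astype(float)   # rows are samples: [x_i, 1]
y = 2 * x + 3 + rng.normal(size=x.shape) / 2

def loss_at(theta):
    """Half the sum of squared residuals (hypothetical helper)."""
    return 0.5 * np.sum((X @ theta - y) ** 2)

def grad_at(theta):
    """Analytic gradient X^T (X theta - y) (hypothetical helper)."""
    return X.T @ (X @ theta - y)

theta = np.array([[0.5], [-1.0]])   # arbitrary point at which to test
eps = 1e-6                          # finite-difference step, an arbitrary small value
numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j, 0] = eps
    # Central difference approximation of dJ/dtheta_j
    numeric[j, 0] = (loss_at(theta + step) - loss_at(theta - step)) / (2 * eps)

print(np.allclose(numeric, grad_at(theta), rtol=1e-5))
```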

In [None]:
learn_ratio = 0.003  # learning rate (step size)
for i in range(10000):
    params -= learn_ratio * grad()  # gradient descent step on the full dataset

print(f"params=\n{params}, loss={loss():.3}")

params=
[[2.00599422]
 [3.05789986]], loss=0.262
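
As a cross-check on the fitted parameters, the gradient descent result can be compared with a direct least-squares solve. The sketch below is self-contained and uses `np.linalg.lstsq` as the reference solver; this is a swapped-in comparison, not the method the notebook uses. The seed is an assumption, so the numbers will not match the output above exactly, and the names `theta_gd` and `theta_ls` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)                      # fixed seed, an assumption
x = np.arange(1, 11).reshape(-1, 1)
X = np.hstack([x, np.ones_like(x)]).astype(float)   # rows are samples: [x_i, 1]
y = 2 * x + 3 + rng.normal(size=x.shape) / 2

# Gradient descent with the same step size and iteration count as above
theta_gd = np.zeros((2, 1))
for _ in range(10000):
    theta_gd -= 0.003 * (X.T @ (X @ theta_gd - y))

# Reference: direct least-squares solution of X theta ~= y
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_gd.ravel(), theta_ls.ravel())
print(np.allclose(theta_gd, theta_ls, atol=1e-6))
```

Both approaches minimize the same squared-error loss, so on this small, well-conditioned problem they should agree closely once gradient descent has converged.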
