<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (Academy of Innovative Applications of Digital Technologies). Operational Programme Digital Poland 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Project co-financed by the European Union from the European Regional Development Fund
Operational Programme Digital Poland 2014-2020,
Priority Axis 3 "Digital competences of society", Measure 3.2 "Innovative solutions for digital activation"
Project title: "Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)"
    </center>



# Simple linear regression

In this exercise you will train a linear regression model via gradient descent in the simplest scenario, i.e. recreating an affine function.

The setup is as follows:
* we are given a set of pairs $(x, y)$, where $x$ represents the feature, and $y$ is the target,
* our hypothesis is $h(x) = ax + b$,
* we will use this dataset of pairs to estimate the values of $a$ and $b$,
* to do so we will minimize the loss function: $J(a,b) = \frac{1}{n}\sum_{i=1}^n (y_i - h(x_i))^2$,
* with the loss function in hand we can improve our guesses iteratively:
    * $a^{t+1} = a^t - \text{step_size} \cdot \frac{\partial J(a,b)}{\partial a}$,
    * $b^{t+1} = b^t - \text{step_size} \cdot \frac{\partial J(a,b)}{\partial b}$,
* we can end the process after some predefined number of epochs (or when the changes are no longer meaningful).
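
For reference, the partial derivatives used in the update rules above follow from applying the chain rule to $J$ with $h(x_i) = ax_i + b$:

$$\frac{\partial J(a,b)}{\partial a} = -\frac{2}{n}\sum_{i=1}^n x_i \, (y_i - h(x_i)), \qquad \frac{\partial J(a,b)}{\partial b} = -\frac{2}{n}\sum_{i=1}^n (y_i - h(x_i)).$$

Both expressions vanish when the residuals $y_i - h(x_i)$ average to zero and are uncorrelated with $x_i$, which is the condition the iteration drives towards.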

Let's start with creating the dataset.

In [1]:
import random

_a = 0.3
_b = 0.5

f = lambda x: _a * x + _b # ground truth
g = lambda x: f(x) + random.gauss(0, 0.02) # a noisy version of f

In [2]:
n = 50 # number of examples

xs = [random.random() for _ in range(n)] # features
ys = list(map(g, xs)) # targets

ts = list(map(f, xs)) # we don't get to see this

Our goal is to recreate $f$. However, as reality can be harsh (and usually is) we only get to observe $g$. We observe it as a list of pairs $(x,y) \in \text{zip}(xs, ys)$.

Let's plot the data. We will use the `plotly` library to make the plots interactive, which allows for easier inspection of data.

In [3]:
!pip install -q plotly==4.2.1

In [4]:
import plotly.graph_objects as go
import plotly.express as px

fig = px.scatter(x=xs, y=ys)
fig.show()

In [11]:
import numpy as np

def mse_loss(ys, ps):
    assert len(ys) == len(ps)

    ### YOUR CODE BEGINS HERE ###
    return np.mean((np.asarray(ys) - np.asarray(ps)) ** 2)
    ### YOUR CODE ENDS HERE ###

Please take a moment to (roughly) guess the output before executing the cell below.

In [12]:
mse_loss(ys, ts)

0.0004021922476704572
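
The value above is close to $0.02^2 = 0.0004$. This is expected: `ys` and `ts` differ only by the Gaussian noise added in `g`, so their mean squared difference estimates the noise variance. A minimal sketch that checks this on a larger sample (the sample size and the seed are arbitrary choices made here for illustration):

```python
import random

random.seed(0)  # arbitrary seed, used only for reproducibility

sigma = 0.02         # standard deviation of the noise used in g
n_samples = 100_000  # far more than the 50 examples above, to reduce sampling error

# g(x) - f(x) is just the noise term, so the MSE is the mean of squared noise draws.
noise = [random.gauss(0, sigma) for _ in range(n_samples)]
empirical_mse = sum(e ** 2 for e in noise) / n_samples

print(f'empirical MSE: {empirical_mse:.6f}, sigma^2: {sigma ** 2:.6f}')
```

With only 50 examples the estimate fluctuates more, which is why the value computed above is near, but not exactly, 0.0004.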

Let's now implement the algorithm.

Hint: To make sure that you correctly compute the gradients, you can compute them numerically and compare the results.
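
One way to follow the hint is a central finite-difference check. The sketch below is self-contained: it rebuilds a small synthetic dataset, and the helper names (`loss`, `analytic_gradient`, `numerical_gradient`) as well as the step `eps` are hypothetical choices that are not part of the exercise code.

```python
import random

random.seed(0)  # arbitrary seed, used only for reproducibility

# Small synthetic dataset in the spirit of the one created above.
xs_demo = [random.random() for _ in range(50)]
ys_demo = [0.3 * x + 0.5 + random.gauss(0, 0.02) for x in xs_demo]

def loss(a, b):
    """Mean squared error of h(x) = a * x + b on the demo data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs_demo, ys_demo)) / len(xs_demo)

def analytic_gradient(a, b):
    """Gradient of the MSE obtained by differentiating the loss by hand."""
    n = len(xs_demo)
    residuals = [(a * x + b) - y for x, y in zip(xs_demo, ys_demo)]
    g_a = 2 / n * sum(r * x for r, x in zip(residuals, xs_demo))
    g_b = 2 / n * sum(residuals)
    return g_a, g_b

def numerical_gradient(a, b, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    g_a = (loss(a + eps, b) - loss(a - eps, b)) / (2 * eps)
    g_b = (loss(a, b + eps) - loss(a, b - eps)) / (2 * eps)
    return g_a, g_b

exact = analytic_gradient(0.1, 0.2)
approx = numerical_gradient(0.1, 0.2)
print('analytic: ', exact)
print('numerical:', approx)
```

If the two results differ by more than a small rounding error, the hand-derived gradient most likely contains a sign or scaling mistake.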

In [13]:
a = 0. # our initial guess for _a
b = 0. # our initial guess for _b
lr = 0.5 # step size

n_epochs = 40 # number of passes over the training data

def predict(a, b, xs=xs):
    return [a * x + b for x in xs]

def evaluate(a, b, xs=xs, ys=ys):
    return mse_loss(ys, predict(a, b, xs))

def get_gradient(a, b, xs=xs, ys=ys):
    # Gradient of the MSE loss with respect to a and b.
    xs_arr = np.asarray(xs)
    residuals = a * xs_arr + b - np.asarray(ys)  # h(x_i) - y_i
    g_a = 2 * np.mean(residuals * xs_arr)
    g_b = 2 * np.mean(residuals)

    return [g_a, g_b]

losses = [evaluate(a, b)]

for i in range(n_epochs):
    #############################
    # TODO: Fill in the details #
    #############################
    ### YOUR CODE BEGINS HERE ###
    [g_a, g_b] = get_gradient(a, b)
    a = a - lr * g_a
    b = b - lr * g_b
    ### YOUR CODE ENDS HERE ###

    loss = evaluate(a, b)
    losses.append(loss)

    print(f'Iter: {i:>3} Loss: {loss:8.8f} a: {a:8.5f}, b: {b:8.5f}')

Iter:   0 Loss: 0.02864650 a:  0.34070, b:  0.65169
Iter:   1 Loss: 0.00223655 a:  0.25439, b:  0.48419
Iter:   2 Loss: 0.00050128 a:  0.27744, b:  0.52662
Iter:   3 Loss: 0.00038626 a:  0.27241, b:  0.51529
Iter:   4 Loss: 0.00037773 a:  0.27453, b:  0.51776
Iter:   5 Loss: 0.00037630 a:  0.27477, b:  0.51672
Iter:   6 Loss: 0.00037544 a:  0.27544, b:  0.51660
Iter:   7 Loss: 0.00037470 a:  0.27597, b:  0.51627
Iter:   8 Loss: 0.00037404 a:  0.27649, b:  0.51601
Iter:   9 Loss: 0.00037345 a:  0.27697, b:  0.51576
Iter:  10 Loss: 0.00037293 a:  0.27743, b:  0.51552
Iter:  11 Loss: 0.00037247 a:  0.27787, b:  0.51529
Iter:  12 Loss: 0.00037205 a:  0.27828, b:  0.51508
Iter:  13 Loss: 0.00037169 a:  0.27866, b:  0.51488
Iter:  14 Loss: 0.00037136 a:  0.27902, b:  0.51469
Iter:  15 Loss: 0.00037107 a:  0.27937, b:  0.51451
Iter:  16 Loss: 0.00037081 a:  0.27969, b:  0.51434
Iter:  17 Loss: 0.00037058 a:  0.28000, b:  0.51418
Iter:  18 Loss: 0.00037037 a:  0.28029, b:  0.51403
Iter:  19 Lo

In [14]:
fig = px.line(y=losses, labels={'y':'loss'})
fig.show()
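
The setup mentions stopping once the changes are no longer meaningful, instead of after a fixed number of epochs. A minimal sketch of such a criterion, assuming a hypothetical tolerance `tol` on the change of the loss between consecutive epochs (the data, the variable names and the threshold are illustrative and not part of the exercise):

```python
import random

random.seed(0)  # arbitrary seed, used only for reproducibility

xs_demo = [random.random() for _ in range(50)]
ys_demo = [0.3 * x + 0.5 + random.gauss(0, 0.02) for x in xs_demo]
n = len(xs_demo)

def loss(a, b):
    """Mean squared error of h(x) = a * x + b on the demo data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs_demo, ys_demo)) / n

a_hat, b_hat = 0.0, 0.0  # initial guesses
step_size = 0.5
tol = 1e-9               # assumed threshold on the change of the loss
max_epochs = 10_000      # safety cap in case the threshold is never reached

prev_loss = loss(a_hat, b_hat)
for epoch in range(max_epochs):
    residuals = [(a_hat * x + b_hat) - y for x, y in zip(xs_demo, ys_demo)]
    g_a = 2 / n * sum(r * x for r, x in zip(residuals, xs_demo))
    g_b = 2 / n * sum(residuals)
    a_hat -= step_size * g_a
    b_hat -= step_size * g_b

    current_loss = loss(a_hat, b_hat)
    if abs(prev_loss - current_loss) < tol:
        break  # the loss has stopped changing in a meaningful way
    prev_loss = current_loss

print(f'stopped after {epoch + 1} epochs, a: {a_hat:.5f}, b: {b_hat:.5f}')
```

A threshold on the loss is only one option; a threshold on the gradient norm or on the parameter change would serve the same purpose.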

Let's now visually assess how we do on the training data.

In [15]:
fig = px.scatter(x=xs, y=ys) # noisy training data
dense_x = np.linspace(np.min(xs), np.max(xs), 100)
fig.add_trace(go.Scatter(x=dense_x, y=predict(a, b, dense_x), name='linear fit', mode='lines'))
fig.add_trace(go.Scatter(x=xs, y=ts, name='y without noise', mode='markers'))

fig.show()

Let's check our implementation vs. the one in sklearn and numpy.


In [16]:
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array(xs).reshape((len(xs), 1))
regr = LinearRegression()
regr.fit(X, ys) # training

sk_a = float(regr.coef_[0]) # coef_ is a 1-element array; index it before converting
sk_b = regr.intercept_
sk_loss = mse_loss(ys, regr.predict(X))

print(f'Loss: {sk_loss:8.8f} a: {sk_a:8.5f}, b: {sk_b:8.5f}')

Loss: 0.00036872 a:  0.28508, b:  0.51153


In [17]:
z = np.polyfit(x=xs, y=ys, deg=1)
print(z)
p = np.poly1d(z) # named p so that the ground-truth f is not overwritten
print(p)

[0.28507793 0.51153357]
 
0.2851 x + 0.5115
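
Both fits above are ordinary least-squares fits, so they should agree with the closed-form minimizer of $J$, which for a single feature is $a = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $b = \bar{y} - a\bar{x}$. A minimal self-contained sketch on synthetic data (the dataset and variable names are illustrative, not those of the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility

x = rng.random(50)
y = 0.3 * x + 0.5 + rng.normal(0, 0.02, size=50)

# Closed-form ordinary least squares for a single feature.
x_mean, y_mean = x.mean(), y.mean()
a_closed = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b_closed = y_mean - a_closed * x_mean

# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
a_np, b_np = np.polyfit(x, y, deg=1)

print(f'closed form: a: {a_closed:.5f}, b: {b_closed:.5f}')
print(f'np.polyfit:  a: {a_np:.5f}, b: {b_np:.5f}')
```

Gradient descent approaches the same values; after a finite number of epochs it is simply not yet fully converged, which explains the small gap between the printed results.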

