Problem: Try to predict the following function:
| x | f(x) |
|---|------|
| 1 | 1.5  |
| 2 | 3    |
| 3 | 4.5  |
| 10| 15   |
|...| ...  |   

It's assumed that the function is linear.

In [125]:
# Given data.
values = {1: 1.5,
          2: 3,
          3: 4.5,
          10: 15}

In [126]:
# Only using numpy for random number generation.
import numpy as np

Let's construct a simple model: `y = a + bx`.  
Our goal is to adjust the parameters `a` and `b` in a such way
so that the model will yield data found in the problem description.  
Since at the start the values of the parameters are not known,
we'll just initialize them with random values between `0` and `1`. 

In [127]:
a = np.random.random()
b = np.random.random()

Next, simply try to feed the `x`'s into the model and see what happens.

In [128]:
def model(x, a, b):
    return a + b * x

In [129]:
for x, y in values.items():
    predicted = model(x, a, b)
    print(f"x = {x} | predicted = {predicted} | actual = {y}")

x = 1 | predicted = 0.4036528146603541 | actual = 1.5
x = 2 | predicted = 0.6304484165589826 | actual = 3
x = 3 | predicted = 0.8572440184576112 | actual = 4.5
x = 10 | predicted = 2.4448132317480114 | actual = 15


We need to compare the "quality" of the results. We do that by introducing the "error" parameter into the model.  
Consider the new model `y = a + bx + e`, where `e` is the error parameter.  
We can find the error of the prediction by rearranging the equation: `e = y - a - bx`.  

In [130]:
def error(x, y, a, b):
    return y - a - b * x

Let's find the error for each `x`

In [131]:
for x, y in values.items():
    predicted = model(x, a, b)
    err = error(x, y, a, b)
    print(f"x = {x} | predicted = {predicted} | actual = {y} | error = {err}")

x = 1 | predicted = 0.4036528146603541 | actual = 1.5 | error = 1.0963471853396458
x = 2 | predicted = 0.6304484165589826 | actual = 3 | error = 2.3695515834410172
x = 3 | predicted = 0.8572440184576112 | actual = 4.5 | error = 3.642755981542389
x = 10 | predicted = 2.4448132317480114 | actual = 15 | error = 12.555186768251989


The error value varies for each given `x`.  
What we really want is to measure the "quality" of the model's parameters.  
The thing is, our model actually looks like this: `y_i = a + bx_i + e`, where `i` is the i-th "row" in the data set.  
The error needs to represent the whole model, not individual results.  
We can do that by measuring the error using the MSE (Mean Squared Error) method.  
This way, `Model's Error = MSE = 1/n * sum(i = 1, n, e^2) = 1/n * sum(i = 1, n, (y_i - a - bx_i)^2)`.

In [132]:
mse = 0
for x, y in values.items():
    predicted = model(x, a, b)
    mse += (y - predicted)**2
    print(f"x = {x} | predicted = {predicted} | actual = {y}")

n = len(values)
mse /= n
print(f"Error: {mse}")

x = 1 | predicted = 0.4036528146603541 | actual = 1.5
x = 2 | predicted = 0.6304484165589826 | actual = 3
x = 3 | predicted = 0.8572440184576112 | actual = 4.5
x = 10 | predicted = 2.4448132317480114 | actual = 15
Error: 44.42978444603567


We now need to find a way to make the error go down.  
Derivative denotes the rate at which the function changes at the given point.  
By taking derivatives, we can figure out in which direction on the number line 
we need to go to decrease the error.  
Let's try using the limit definition of the derivative to get the rate of change for both parameters.  
First of all, we need to define the MSE function. 

In [133]:
def mse(a, b):
    sum = 0
    for x, y in values.items():
        sum += (y - model(x, a, b))**2
    return sum / n

Let's then compute partial derivatives for `a` and `b`.

In [134]:
err = mse(a, b)

change = 1e-10
deriv_a = (mse(a + change, b) - err) / change
deriv_b = (mse(a, b + change) - err) / change
print(f"Error = {err} | Derivative for a = {deriv_a} | Derivative for b = {deriv_b}")

Error = 44.42978444603567 | Derivative for a = -9.831921943259658 | Derivative for b = -71.15772859833669


In [135]:
a_adj, b_adj = a, b
learning_rate = 1e-3

for i in range(int(1e6)):
    err = mse(a_adj, b_adj)
    deriv_a = (mse(a_adj + change, b_adj) - err) / change
    deriv_b = (mse(a_adj, b_adj + change) - err) / change
    a_adj -= deriv_a * learning_rate
    b_adj -= deriv_b * learning_rate

print(f"Adjusted error = {mse(a_adj, b_adj)}")

Adjusted error = 1.2256092203110068e-19


In [136]:
print(a_adj, b_adj)

3.420181998180499e-10 1.4999999999019955


We predict that `a = 3.42e-10` and `b = 1.499`, which is almost correct.  
Let's move the train process into a function.

In [137]:
def train():
    a_adj, b_adj = a, b
    learning_rate = 1e-4
    train_count = int(1e6)

    for i in range(train_count):
        err = mse(a_adj, b_adj)
        deriv_a = (mse(a_adj + change, b_adj) - err) / change
        deriv_b = (mse(a_adj, b_adj + change) - err) / change
        a_adj -= deriv_a * learning_rate
        b_adj -= deriv_b * learning_rate

        if i % (train_count // 10 - 1) == 0:
            print(f"Error: {err}")

    return a_adj, b_adj

In [138]:
print(train())

Error: 44.42978444603567
Error: 1.8329820320695041e-09
Error: 6.586626172217763e-17
Error: 1.2328759496676343e-19
Error: 1.2265886813310832e-19
Error: 1.2265886813310832e-19
Error: 1.2265886813310832e-19
Error: 1.2265886813310832e-19
Error: 1.2265886813310832e-19
Error: 1.2265886813310832e-19
Error: 1.2265886813310832e-19
(3.421780719246051e-10, 1.4999999999019555)


Let's now try to change our original linear function.  
1\. `f(x) = -1.23x`

In [139]:
values = {1: -1.23,
          5: -6.15,
          -2: 2.46}
print(train())

Error: 16.455649167543953
Error: 2.1697338008843056e-14
Error: 1.8921025478213232e-20
Error: 1.8944914390007656e-20
Error: 1.8944914390007656e-20
Error: 1.8945022265355938e-20
Error: 1.8944914390007656e-20
Error: 1.8944914390007656e-20
Error: 1.8944914390007656e-20
Error: 1.8944914390007656e-20
Error: 1.8944914390007656e-20
(2.0149659727128603e-11, -1.2300000000526128)


2\. `f(x) = -2x + 5`

In [140]:
values = {1: 3,
          5: -5,
          -2: 9}
print(train())

Error: 33.156363362426944
Error: 4.850034158245847e-10
Error: 2.869766391302962e-20
Error: 1.8805762816629418e-20
Error: 1.8805762816629418e-20
Error: 1.8805762816629418e-20
Error: 1.8805762816629418e-20
Error: 1.8805762816629418e-20
Error: 1.8805762816629418e-20
Error: 1.8805762816629418e-20
Error: 1.8805762816629418e-20
(5.00000000001643, -2.000000000052043)


3\. `f(x) = 42`

In [141]:
values = {1: 42,
          5: 42,
          -2: 42}
print(train())

Error: 1293.296616605647
Error: 3.2073749073454766e-08
Error: 9.67196261041692e-19
Error: 1.879460350644769e-20
Error: 1.879460350644769e-20
Error: 1.879460350644769e-20
Error: 1.879460350644769e-20
Error: 1.879460350644769e-20
Error: 1.879460350644769e-20
Error: 1.879460350644769e-20
Error: 1.879460350644769e-20
(41.99999999999147, -4.886331328963345e-11)


4\. `f(x) = 0`

In [142]:
values = {1: 0,
          5: 0,
          -2: 0}
print(train())

Error: 0.48945156912041243
Error: 3.7960494549195224e-13
Error: 1.9162306954700235e-20
Error: 1.9003379196739e-20
Error: 1.9003378378383388e-20
Error: 1.900337837838041e-20
Error: 1.900337837838041e-20
Error: 1.900337837838041e-20
Error: 1.900337837838041e-20
Error: 1.900337837838041e-20
Error: 1.900337837838041e-20
(2.027027027028683e-11, -5.270270270270707e-11)


5\. `f(x) = 3.1415x - 2.7183`

In [143]:
values = {1: 0.4232,
          5: 12.9892,
          -2: -9.0013}
print(train())

Error: 53.12565950482776
Error: 2.016098671253905e-10
Error: 2.871108297902418e-20
Error: 1.9168263739925843e-20
Error: 1.9168263739925843e-20
Error: 1.9168263739925843e-20
Error: 1.9168263739925843e-20
Error: 1.9168263739925843e-20
Error: 1.9168263739925843e-20
Error: 1.9168263739925843e-20
Error: 1.9168263739925843e-20
(-2.718299999977689, 3.1414999999468773)
