In [10]:
import sys
!{sys.executable} -m pip install matplotlib
!{sys.executable} -m pip install pandas





In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Earlier, our training samples were just constants, with no input.  Now we are going to try a sample that has both an input and an output.

|x |y |
|--|--|
|2 |3 |

One major difference between this and the previous examples is that we can no longer directly guess the hypothesis, since the hypothesis depends on our input, $x$.

In linear regression, we create a linear function that calculates a value $f(x)$ for a given $x$.  To find $f(x)$, we are given sample values of $x$ and $y$.  We want $f(x)$ to be very close to $y$ for every $x$ in the sample.  The hope is that for values of $x$ not in the sample, $f(x)$ will also predict a value close to the correct $y$.

The equation is just like any linear equation, in the form of

$$
f(x) = mx + b
$$

But in linear-regression-speak, we write

$$
h_\theta(x) = \theta_0 + \theta_1x
$$

$h_\theta(x)$ is our hypothesis, which now depends on the input $x$ and on our guesses for the parameters $\theta_0$ and $\theta_1$.

(If we had more than one input, we would need one more $\theta$ for each additional input.)

Our job is to guess the correct $\theta_0$ and $\theta_1$.  There are two values to guess even though there is only one input.  That makes sense because even when we had no input, we still had to guess one value.  The number of values to guess is always one more than the number of inputs.

**But it's confusing to have to guess two values given only one input, so we are going to make our lives easier by inserting a dummy input of 1, so that our sample looks like this:**

|x0 |x1 |y  |
|---|---|---|
|1  |2  |3  |

That makes our function look more uniform:

$$
h_\theta(x) = \theta_0x_0 + \theta_1x_1
$$
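
As a minimal sketch of why the dummy input helps, the snippet below checks that the dot product of `[x0, x1]` with `[theta0, theta1]` gives the same number as `theta0 + theta1 * x`. The parameter values in `theta_example` are made up for illustration and are not results from this notebook.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
theta_example = np.array([0.5, 1.25])

x = 2                            # the original single input
x_with_dummy = np.array([1, x])  # prepend the dummy input x0 = 1

explicit_form = theta_example[0] + theta_example[1] * x  # theta0 + theta1 * x
uniform_form = x_with_dummy @ theta_example              # theta0 * x0 + theta1 * x1

print(explicit_form, uniform_form)
```

Both forms produce the same value, so the dummy input changes only how the formula is written, not what it computes.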

The steps of linear regression are

1. Pick some $\theta_{0}$ and $\theta_{1}$
2. Calculate $h_{\theta}(x)$ for every given sample of $x_0$ and $x_1$ (remember $x_0$ is always 1)
3. Compare the calculated value with the actual value.  Calculate the overall error level (known as the cost function $J$)
   using
   $$
     J = \frac 1 2 (h_\theta(x) - y)^2
   $$
   where
   - $x$ is the collective term for $x_0$ and $x_1$ (and possibly more if we had more inputs)
   - $y$ is the actual value of the training sample for a given $x$
4. Minimize the cost function $J$
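
To make steps 1 to 3 concrete, here is a small sketch that evaluates $J$ for a few hand-picked guesses of $\theta_0$ and $\theta_1$ on our single sample. The helper name `cost_for` and the list `candidates` are illustrative only and are not used elsewhere in this notebook.

```python
import numpy as np

x = np.array([1, 2])  # x0 (dummy input) and x1
y = 3                 # actual value from the training sample

def cost_for(theta):
    """Return J = 1/2 * (h_theta(x) - y)^2 for the single sample."""
    h = x @ theta
    return 0.5 * (h - y) ** 2

# Hand-picked guesses for (theta0, theta1), for illustration only
candidates = [np.array([0.0, 0.0]), np.array([0.5, 0.5]), np.array([1.0, 1.0])]
costs = [cost_for(theta) for theta in candidates]

for theta, j in zip(candidates, costs):
    print(theta, j)
```

The cost shrinks as the hypothesis gets closer to $y$, and it reaches 0 when the hypothesis equals $y$ exactly. Step 4 is about finding such a guess automatically instead of by hand.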

In [3]:
training_x = np.array([1, 2])
training_y = 3
theta = np.array([0, 0])

We need a hypothesis function that calculates $h_\theta(x)$ from $x$ and $\theta$.

In [4]:
def hypothesis_theta(training_x, theta):
    # @ is the dot product: multiply the corresponding elements of the two arrays, then sum the products
    return training_x @ theta

# this is 1 * 0 + 2 * 0
current_hypothesis = hypothesis_theta(training_x, theta)
current_hypothesis

0

In [5]:
def cost(hypothesis, training_y):
    return (hypothesis - training_y) ** 2 / 2
current_cost = cost(current_hypothesis, training_y)
current_cost

4.5

The slope is trickier this time.  There are actually two slopes in this example, one for $\theta_0$ and another for $\theta_1$.  The derivation is a bit more complicated than before, so we won't go through it in detail.

For $\theta_0$, the slope is $(h_\theta(x) - y)x_0$

For $\theta_1$, the slope is $(h_\theta(x) - y)x_1$

This makes sense if we look at the slope for $\theta_0$: since $x_0$ is always 1, the slope is $(h_\theta(x) - y)$, which is what we had in our previous example.
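
Even without the derivation, we can sanity-check the two slope formulas numerically. The sketch below nudges each $\theta$ by a tiny amount, measures how much the cost changes (a central finite difference), and compares that with the formulas. The names `cost_at`, `analytic`, `numeric`, and `eps` are illustrative assumptions, not part of the notebook's own code.

```python
import numpy as np

x = np.array([1.0, 2.0])      # x0 (dummy input) and x1
y = 3.0
theta = np.array([0.0, 0.0])  # same starting guess as in the notebook

def cost_at(theta):
    """J = 1/2 * (h_theta(x) - y)^2 for the single sample."""
    return 0.5 * (x @ theta - y) ** 2

# Slopes from the formulas in the text: (h - y) * x0 and (h - y) * x1
analytic = (x @ theta - y) * x

# Numerical slopes: nudge one theta at a time and watch the cost change
eps = 1e-6
numeric = np.zeros_like(theta)
for i in range(len(theta)):
    step = np.zeros_like(theta)
    step[i] = eps
    numeric[i] = (cost_at(theta + step) - cost_at(theta - step)) / (2 * eps)

print(analytic, numeric)
```

The two arrays should agree to several decimal places, which suggests the formulas really do describe how the cost changes with each $\theta$.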

In [6]:
def slope(hypothesis, training_y, training_x):
    return (hypothesis - training_y) * training_x

# slope for theta0 = (0 - 3) * 1
# slope for theta1 = (0 - 3) * 2
slope(current_hypothesis, training_y, training_x)

array([-3, -6])

In [7]:
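def gradient_descent(theta, hypothesis, training_y, learning_rate_alpha):
    # note: training_x is read from the notebook's global scope, not passed in
    # move theta against the slope, scaled by the learning rate
    return theta - learning_rate_alpha * slope(hypothesis, training_y, training_x)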
gradient_descent(theta, current_hypothesis, training_y, 0.1)

array([0.3, 0.6])

And we see our $\theta$ slowly increasing.  Our linear regression is now complicated enough that we can't immediately tell if our result is correct.  But let's go ahead and complete the regression:

In [8]:
def find_minimum():
    theta = np.array([0, 0])
    iteration = 0
    acceptable_slope = 0.00005
    learning_rate_alpha = 0.1
    current_hypothesis = hypothesis_theta(training_x, theta)
    current_slope = slope(current_hypothesis, training_y, training_x)
    current_cost = 0
    # stop once every slope is small; averaging the slopes could let opposite signs cancel out
    while np.max(np.abs(current_slope)) > acceptable_slope and iteration < 100:
        current_hypothesis = hypothesis_theta(training_x, theta)
        current_cost = cost(current_hypothesis, training_y)
        current_slope = slope(current_hypothesis, training_y, training_x)
        theta = gradient_descent(theta, current_hypothesis, training_y, learning_rate_alpha)
        print((theta, current_hypothesis, current_slope, current_cost))
        iteration += 1
    return (theta, current_hypothesis, current_slope, current_cost)

In [9]:
find_minimum()

(array([0.3, 0.6]), 0, array([-3, -6]), 4.5)
(array([0.45, 0.9 ]), 1.5000000000000002, array([-1.5, -3. ]), 1.1249999999999998)
(array([0.525, 1.05 ]), 2.2500000000000004, array([-0.75, -1.5 ]), 0.28124999999999967)
(array([0.5625, 1.125 ]), 2.625, array([-0.375, -0.75 ]), 0.0703125)
(array([0.58125, 1.1625 ]), 2.8125, array([-0.1875, -0.375 ]), 0.017578125)
(array([0.590625, 1.18125 ]), 2.90625, array([-0.09375, -0.1875 ]), 0.00439453125)
(array([0.5953125, 1.190625 ]), 2.9531250000000004, array([-0.046875, -0.09375 ]), 0.0010986328124999792)
(array([0.59765625, 1.1953125 ]), 2.9765625, array([-0.0234375, -0.046875 ]), 0.000274658203125)
(array([0.59882813, 1.19765625]), 2.98828125, array([-0.01171875, -0.0234375 ]), 6.866455078125e-05)
(array([0.59941406, 1.19882813]), 2.994140625, array([-0.00585938, -0.01171875]), 1.71661376953125e-05)
(array([0.59970703, 1.19941406]), 2.9970703125000004, array([-0.00292969, -0.00585937]), 4.291534423826824e-06)
(array([0.59985352, 1.19970703]), 2.

(array([0.59999771, 1.19999542]),
 2.9999771118164062,
 array([-2.28881836e-05, -4.57763672e-05]),
 2.6193447411060333e-10)

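We can verify the result: $0.6 \cdot 1 + 1.2 \cdot 2 = 3$, which matches $y$.  The values $\theta_0 \approx 0.6$ and $\theta_1 \approx 1.2$ may not be what we were expecting, but since we only have one data point, many different pairs fit it equally well: any $\theta_0$ and $\theta_1$ with $\theta_0 + 2\theta_1 = 3$ gives a cost of 0.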
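
A quick check of this, as a sketch: plug the final $\theta$ printed above back into the hypothesis, then try a few other hand-picked pairs. The list `other_fits` is made up for illustration; its entries were chosen so that $\theta_0 + 2\theta_1 = 3$.

```python
import numpy as np

x = np.array([1, 2])  # x0 (dummy input) and x1
y = 3

# Final theta as printed by find_minimum() above (rounded in the output)
theta_found = np.array([0.59999771, 1.19999542])
prediction = x @ theta_found
print(prediction)  # very close to 3

# Other hand-picked pairs that also satisfy theta0 + 2 * theta1 = 3
other_fits = [np.array([3.0, 0.0]), np.array([1.0, 1.0]), np.array([-1.0, 2.0])]
other_predictions = [x @ theta for theta in other_fits]
for theta, h in zip(other_fits, other_predictions):
    print(theta, h)
```

Every pair predicts 3 for our sample, so a single data point cannot tell these lines apart. More samples would be needed to pin down one answer.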