# Stochastic Gradient Descent and Back Propagation

## Libraries

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

## Example 1

Let's focus on a very simple function, for example:
$$ f(x) = x^{2} + 1 $$

The derivative of this function is:
$$ f'(x) = 2x $$

The extremum (a minimum) of the function $f(x)$ is at $x_{0}=0$, because the equation $f'(x)=0$ has the single solution $x=0$ and $f''(x)=2>0$. The value of the function at this point is:
$$ f(0) = 1 $$
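
As a quick sanity check, the analytic derivative can be compared with a numerical one. The sketch below is only an illustration: `numerical_derivative`, the step size `h` and the test points are assumptions introduced here, and `f` and `grad_f` are written out so that the block runs on its own.

```python
import numpy as np

def f(x):
    return x**2 + 1

def grad_f(x):
    return 2 * x

def numerical_derivative(func, x, h=1e-6):
    # Central difference approximation of the derivative (h is an assumed step size)
    return (func(x + h) - func(x - h)) / (2 * h)

# A few arbitrary test points (chosen for illustration only)
points = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])

approx = numerical_derivative(f, points)
exact = grad_f(points)

# Largest disagreement between the numerical and the analytic derivative
max_abs_error = np.max(np.abs(approx - exact))
print(max_abs_error)
```

The disagreement should be on the order of floating-point rounding, which supports $f'(x) = 2x$.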

But what if we have a more sophisticated function, for which we can calculate the derivative but finding the local/global minimum analytically is very difficult? Then we can search for the minimum numerically with Gradient Descent, the method on which Stochastic Gradient Descent is built. Let's try to write a function that does this.
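
The update behind this idea can be written down explicitly. As a sketch, let $\eta$ denote the learning rate (a symbol assumed here; the code below calls it `lr`) and $x_{k}$ the value after $k$ steps:

$$ x_{k+1} = x_{k} - \eta f'(x_{k}) $$

For $f(x) = x^{2} + 1$ this becomes

$$ x_{k+1} = (1 - 2\eta) x_{k} $$

so the iterates shrink toward the minimum at $x=0$ whenever $0 < \eta < 1$.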

In [3]:
def f(x): 
    return x**2 + 1

def grad_f(x):
    return 2*x

In [27]:
x = np.arange(-2, 2.05, 0.05)
y = f(x)
grad = grad_f(x)

sample_data = pd.DataFrame({'x':x, 'y':y, 'grad_f':grad})

In [28]:
sample_data.head()

Unnamed: 0,x,y,grad_f
0,-2.0,5.0,-4.0
1,-1.95,4.8025,-3.9
2,-1.9,4.61,-3.8
3,-1.85,4.4225,-3.7
4,-1.8,4.24,-3.6


In [39]:
from plotnine import ggplot, geom_line, aes, theme_bw, labs, geom_point

(
ggplot(sample_data, aes(x='x', y='y'))
    + geom_line(color='red')
    + geom_line(aes(y='grad_f'), color='blue')
    + theme_bw()
    + labs(x='x', y = 'y')
)


TypeError: frame_apply() got an unexpected keyword argument 'broadcast'

## Gradient Descent

In [29]:
x0 = 1.345
lr = 0.1
epochs = 30

def GD_ordinary_fun(f, grad_f, x0, lr, epochs):
    x = x0
    # Collect one row per step (DataFrame.append was removed in pandas 2.0)
    rows = [{'x': x, 'y': f(x), 'grad_f': grad_f(x)}]
    for i in range(epochs):
        # Gradient descent update: move against the gradient
        x = x - lr*grad_f(x)
        rows.append({'x': x, 'y': f(x), 'grad_f': grad_f(x)})
        print("Updated x value: {}. Updated f(x) value: {}".format(np.round(x, 8), np.round(f(x), 8)))
    return pd.DataFrame(rows)

def plot(data):
    base_plot = (
        ggplot(data, aes(x='x', y='y'))
        + geom_line(color='red')
        + geom_line(aes(y='grad_f'), color='blue')
        + theme_bw()
        + labs(x='x', y = 'y')
    )
    return base_plot

In [25]:
task1=GD_ordinary_fun(f, grad_f, x0, lr, epochs)

Updated x value: 1.076. Updated f(x) value: 2.157776
Updated x value: 0.8608. Updated f(x) value: 1.74097664
Updated x value: 0.68864. Updated f(x) value: 1.47422505
Updated x value: 0.550912. Updated f(x) value: 1.30350403
Updated x value: 0.4407296. Updated f(x) value: 1.19424258
Updated x value: 0.35258368. Updated f(x) value: 1.12431525
Updated x value: 0.28206694. Updated f(x) value: 1.07956176
Updated x value: 0.22565356. Updated f(x) value: 1.05091953
Updated x value: 0.18052284. Updated f(x) value: 1.0325885
Updated x value: 0.14441828. Updated f(x) value: 1.02085664
Updated x value: 0.11553462. Updated f(x) value: 1.01334825
Updated x value: 0.0924277. Updated f(x) value: 1.00854288
Updated x value: 0.07394216. Updated f(x) value: 1.00546744
Updated x value: 0.05915373. Updated f(x) value: 1.00349916
Updated x value: 0.04732298. Updated f(x) value: 1.00223946
Updated x value: 0.03785838. Updated f(x) value: 1.00143326
Updated x value: 0.03028671. Updated f(x) value: 1.00091728

In [40]:
(plot(sample_data)
 + geom_point(data=task1, color='black')  # plotnine expects `color`, not `col`
)

Above we solved a really easy problem: minimizing a function of one variable. What if the function has several variables? We can apply gradient descent to one of the most common problems of this kind, linear regression.

In [41]:
np.random.seed(666)

x = np.random.uniform(-3, 3, 50)
y = x + np.random.normal(3, 1.2, 50)
sample_data = pd.DataFrame({'x':x, 'y':y})

base_plot = (
        ggplot(sample_data, aes(x='x', y='y'))
        + geom_point(color='red')
        + theme_bw()
        + labs(x='x', y = 'y')
)
base_plot

TypeError: frame_apply() got an unexpected keyword argument 'broadcast'

Our aim is to find the optimal parameters $\beta_{0}$ and $\beta_{1}$ of the linear regression model $y = \beta_{0} + \beta_{1}x$. To check our solution we will calculate the MSE (Mean Squared Error) in each iteration. The best model is the one with the smallest MSE, so our target is to minimize it. Before calculating the optimal parameters ourselves, let's find them using the sklearn module.
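
For reference, the MSE mentioned above can be written out as follows, where $n$ is the number of observations and $(x_{i}, y_{i})$ is the $i$-th data point (notation assumed here):

$$ MSE(\beta_{0}, \beta_{1}) = \frac{1}{n}\sum_{i=1}^{n} \left( y_{i} - \beta_{0} - \beta_{1}x_{i} \right)^{2} $$

If $X$ is the matrix whose first column is all ones and whose second column holds the $x_{i}$, and $\beta = (\beta_{0}, \beta_{1})^{T}$, the same quantity reads

$$ MSE(\beta) = \frac{1}{n} \lVert y - X\beta \rVert^{2} $$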

In [42]:
np.array(sample_data['y'])

array([ 3.97651347,  5.01533271,  2.87733572,  2.74441023,  5.94193636,
        0.3969064 ,  1.96975775,  2.03015883,  0.36400405,  4.870172  ,
        1.2941515 ,  4.38813542,  2.29246011,  5.64997772,  1.21787975,
        2.74194091, -2.21197354,  0.79759877, -3.04812268,  5.55822951,
       -0.01258474,  3.23937061,  2.71884   ,  1.56409593,  4.50021086,
        6.47617277,  3.58660422,  2.70776945,  3.5391981 ,  2.64540722,
        5.24949144,  4.53302091,  4.18653391,  5.37117384,  3.35407702,
        4.2535031 ,  3.17773431,  1.26430537,  3.51064356,  1.24796351,
        0.90746568,  3.54821792, -0.17470737,  4.26885164,  0.44750296,
        1.61383577,  1.0346466 ,  5.15003961,  3.2261089 ,  2.28497144])

In [70]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error as mse

X = np.array(sample_data.x).reshape(-1, 1)
y = np.array(sample_data.y).reshape(-1, 1)
lm_model = LinearRegression(fit_intercept=True)
lm_model.fit(X, y)
y_pred = lm_model.predict(X)

In [45]:
print("y = {} + {}x".format(
    np.round(lm_model.intercept_[0],4), 
    np.round(lm_model.coef_[0][0], 4))
     )

y = 2.9291 + 0.8878x


In [46]:
print("Mean Squared Error = {}".format(np.round(mse(y, y_pred), 3)))

Mean Squared Error = 1.553


Now we will try to find a similar solution using our own gradient descent.

In [97]:
X = pd.DataFrame({'x0':1, 'x1':sample_data.x})
X.head()

Unnamed: 0,x0,x1
0,1,1.202623
1,1,2.06512
2,1,1.059086
3,1,1.367148
4,1,2.708748


In [99]:
X = np.array(X).reshape(-1, 2)
y = np.array(sample_data.y).reshape(-1, 1)
b=np.array([2.9291, 0.8878]).reshape(2, -1)

In [115]:
def MSE(beta, X, y):
    y_pred = np.dot(X, beta)
    diff = np.subtract(y, y_pred)
    result = np.round(np.mean(diff**2), 3)
    return result

In [118]:
print("MSE is equal {}".format(MSE(b, X, y)))

MSE is equal 1.553


In [155]:
def MSE_grad(beta, X, y):
    Xb = np.dot(X, beta)
    Xb_y = np.subtract(Xb, y)
    Xb_yX = np.dot(np.transpose(X), Xb_y)
    result = 2*Xb_yX/len(y)
    return result
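
The function above corresponds to the gradient $\nabla_{\beta} MSE = \frac{2}{n} X^{T}(X\beta - y)$. One way to gain confidence in such a formula is to compare it with finite differences. The following is a minimal sketch under stated assumptions: the names `mse`, `mse_grad` and `numerical_grad` are hypothetical, and the data is freshly generated synthetic data rather than the notebook's `sample_data`.

```python
import numpy as np

def mse(beta, X, y):
    # Mean squared error of the linear model X @ beta (no rounding here)
    residual = y - X @ beta
    return float(np.mean(residual**2))

def mse_grad(beta, X, y):
    # Analytic gradient: (2/n) * X^T (X beta - y)
    return 2 * X.T @ (X @ beta - y) / len(y)

def numerical_grad(beta, X, y, h=1e-6):
    # Central differences, one coordinate of beta at a time
    grad = np.zeros_like(beta, dtype=float)
    for j in range(beta.shape[0]):
        step = np.zeros_like(beta, dtype=float)
        step[j, 0] = h
        grad[j, 0] = (mse(beta + step, X, y) - mse(beta - step, X, y)) / (2 * h)
    return grad

# Synthetic data in the same spirit as the example (assumed, not the original sample)
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 50)
X = np.column_stack([np.ones_like(x), x])        # column of ones for the intercept
y = (x + rng.normal(3, 1.2, 50)).reshape(-1, 1)  # column vector of targets
beta = np.array([[5.0], [-0.2]])                 # arbitrary point, as a column vector

analytic = mse_grad(beta, X, y)
numeric = numerical_grad(beta, X, y)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
print(max_abs_diff)
```

Note that `beta` is kept as a column vector of shape `(2, 1)`. With a flat array of shape `(2,)`, subtracting the `(50, 1)` target would broadcast to a `(50, 50)` matrix and silently give a wrong result.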

In [None]:
beta00 = np.array([5, -0.2]).reshape(2, -1) # Start point, as a column vector like b
lr = 0.1 # Learning rate
epochs = 30 # Number of epochs

def GD_linear_regession(beta00, X, y, lr, epochs):
    