# Stochastic and batch gradient descent



For this exercise we will reuse the gradient descent code we wrote from scratch for simple linear regression:

$f(x) = \beta_1 \times x + \beta_0$

* Import the following libraries: 
  * NumPy
  * random

In [46]:
import numpy as np 
import random


In [47]:
class Model():
  def __init__(self):
    self.beta_1 = np.random.randn(1)
    self.beta_0 = np.random.randn(1)
  
  def __call__(self, x):
    return self.beta_1 * x + self.beta_0

In [48]:
from sklearn import datasets, linear_model

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
print(diabetes.DESCR)
diabetes_data = diabetes.data
y = diabetes.target

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - Age
      - Sex
      - Body mass index
      - Average blood pressure
      - S1
      - S2
      - S3
      - S4
      - S5
      - S6

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bra

* The `diabetes_data` array contains more features than we need. Keep only its third column (body mass index) and store it in a `diabetes_X` variable.

In [49]:
# Use only one feature
diabetes_X = diabetes_data[:,2]
diabetes_X[:5]

array([ 0.06169621, -0.05147406,  0.04445121, -0.01159501, -0.03638469])

In [50]:
def mse(y_true,y_pred):
  return np.mean((y_true - y_pred)**2)
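
For reference, here is this loss written out for the model $f(x) = \beta_1 \times x + \beta_0$, together with the two partial derivatives that gradient descent relies on. The symbols $n$ (number of observations), $x_i$, $y_i$ and $\hat{y}_i$ (prediction for observation $i$) are notation introduced here for illustration:

$$
\mathrm{MSE}(\beta_1, \beta_0) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \beta_1 x_i + \beta_0
$$

$$
\frac{\partial \mathrm{MSE}}{\partial \beta_1} = \frac{2}{n} \sum_{i=1}^{n} x_i \, (\hat{y}_i - y_i), \qquad
\frac{\partial \mathrm{MSE}}{\partial \beta_0} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)
$$

The sign flips from $(y_i - \hat{y}_i)$ to $(\hat{y}_i - y_i)$ because differentiating $-\hat{y}_i$ brings out a factor of $-1$.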

In [51]:
# Partial derivative of the MSE with respect to model.beta_1
def derivative_mse_beta_1(y_pred, y_true, x):
  # x @ (y_pred - y_true) is already the sum of x_i * (y_pred_i - y_true_i)
  return 2/len(y_pred) * (x @ (y_pred - y_true))

In [52]:
# Partial derivative of the MSE with respect to model.beta_0
def derivative_mse_beta_0(y_pred, y_true):
  return 2/len(y_pred)*(np.sum(y_pred - y_true))
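
One way to gain confidence in hand-written derivatives like these is to compare them with a finite-difference approximation of the loss. The sketch below is only an illustration: it redefines the functions so that it runs on its own, and the synthetic data, the test point `(beta_1, beta_0)`, the step `eps` and the helper `loss` are assumptions made here, not part of the exercise.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def derivative_mse_beta_1(y_pred, y_true, x):
    return 2 / len(y_pred) * np.sum(x * (y_pred - y_true))

def derivative_mse_beta_0(y_pred, y_true):
    return 2 / len(y_pred) * np.sum(y_pred - y_true)

# Synthetic data (assumption): a noisy line, unrelated to the diabetes dataset
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=50)

# Arbitrary point at which the derivatives are checked
beta_1, beta_0 = 0.5, -0.2
eps = 1e-5

def loss(b1, b0):
    # Hypothetical helper: MSE of the line b1 * x + b0 on the synthetic data
    return mse(y, b1 * x + b0)

# Central finite differences in each coefficient
numeric_1 = (loss(beta_1 + eps, beta_0) - loss(beta_1 - eps, beta_0)) / (2 * eps)
numeric_0 = (loss(beta_1, beta_0 + eps) - loss(beta_1, beta_0 - eps)) / (2 * eps)

# Analytic derivatives evaluated at the same point
y_pred = beta_1 * x + beta_0
analytic_1 = derivative_mse_beta_1(y_pred, y, x)
analytic_0 = derivative_mse_beta_0(y_pred, y)

print(np.isclose(numeric_1, analytic_1), np.isclose(numeric_0, analytic_0))
```

If either comparison fails, the usual suspects are a wrong sign or a missing factor of 2 in the analytic formula.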

In [53]:
# Define learning rate and a number of iterations 
lr = 0.1
epochs = 1000

We previously coded the gradient descent algorithm as follows. We only add two lines to record the value of the loss function at each epoch. Since classical gradient descent uses the whole dataset for every update, one epoch equals one adjustment of the coefficients:

In [65]:
%%time
loss_history = []
model = Model()
for epoch in range(epochs):
  # Calculate the loss function
  current_loss = mse(model(diabetes_X), y)
  loss_history.append(current_loss)

  # Update variables
  model.beta_1 -= lr * derivative_mse_beta_1(model(diabetes_X), y, diabetes_X)
  model.beta_0 -= lr * derivative_mse_beta_0(model(diabetes_X), y)

  # Show updated variables
  if epoch % 100 == 0 or epoch == epochs - 1:
    print("-------------------- Epoch {} --------------------".format(epoch))
    print("Current Loss: {}".format(current_loss))
    print("beta_1 = {}".format(model.beta_1))
    print("beta_0 = {}".format(model.beta_0))

-------------------- Epoch 0 --------------------
Current Loss: 28814.374698837895
beta_1 = [0.21867372]
beta_0 = [31.11491872]
-------------------- Epoch 100 --------------------
Current Loss: 5754.216301656195
beta_1 = [42.22169163]
beta_0 = [152.13348414]
-------------------- Epoch 200 --------------------
Current Loss: 5592.922234389624
beta_1 = [82.36606785]
beta_0 = [152.13348416]
-------------------- Epoch 300 --------------------
Current Loss: 5445.586927959727
beta_1 = [120.73404765]
beta_0 = [152.13348416]
-------------------- Epoch 400 --------------------
Current Loss: 5311.002358987856
beta_1 = [157.40423691]
beta_0 = [152.13348416]
-------------------- Epoch 500 --------------------
Current Loss: 5188.065049226583
beta_1 = [192.45176319]
beta_0 = [152.13348416]
-------------------- Epoch 600 --------------------
Current Loss: 5075.767017982761
beta_1 = [225.94842963]
beta_0 = [152.13348416]
-------------------- Epoch 700 --------------------
Current Loss: 4973.18751753878

The model took 54ms to train in total!

In [66]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(x=diabetes_X, y=y,
                    mode='markers',
                    name='target'))
fig.add_trace(go.Scatter(x=diabetes_X, y=model(diabetes_X),
                    mode='lines',
                    name='predictions'))
fig.update_layout(
    title="Target vs Predictions",
    xaxis_title="BMI",
    yaxis_title="Diabetes metric"
    )
fig.show()

## Stochastic gradient descent

Let's now implement stochastic gradient descent!
Reproduce the training loop, this time defining:
* `sample_size`: the number of observations randomly selected at each step
* `steps_per_epochs`: the number of steps needed for the model to train on roughly as many observations as the dataset contains (the integer part of the dataset size divided by `sample_size`)
* `stochastic_loss_history`: a list that stores the loss on the full dataset, recorded once per epoch
* `stochastic_loss_by_step_history`: a list that stores the loss on the current sample, recorded at each step
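
The sampling part of this loop can be sketched in isolation. The example below is a minimal illustration under stated assumptions: the placeholder array `x` stands in for `diabetes_X`, the seed is fixed only for repeatability, and `seen` and `sample_lengths` are bookkeeping variables introduced here. It shows that `random.sample` never repeats an index within one step, while nothing prevents the same observation from being drawn again in a later step of the same epoch.

```python
import random
import numpy as np

random.seed(0)  # fixed seed, chosen here only to make the sketch repeatable

n_observations = 442  # number of rows in the diabetes dataset
sample_size = 100
steps_per_epochs = n_observations // sample_size  # same value as int(442 / 100)

x = np.linspace(-0.1, 0.1, n_observations)  # placeholder feature values

seen = set()         # every index drawn during the epoch
sample_lengths = []  # number of distinct indices in each sample
for step in range(steps_per_epochs):
    # Draw sample_size distinct indices for this step
    index = random.sample(range(n_observations), sample_size)
    data_sample = x[index]  # NumPy accepts a list of indices
    sample_lengths.append(len(set(index)))
    seen.update(index)

print(steps_per_epochs)  # number of steps in one epoch
print(len(seen))         # distinct observations used during the epoch
```

Because each step draws independently, the number of distinct observations seen in an epoch is at most `steps_per_epochs * sample_size` and will usually be smaller.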

⚠️ Don't forget to add `%%time` at the beginning of the cell to measure how long the stochastic gradient descent took to run over 1000 epochs ⚠️ 

In [79]:
%%time
sample_size = 100
steps_per_epochs = int(len(diabetes_X) / sample_size)
stochastic_loss_history = []
stochastic_loss_by_step_history = []
model = Model()
for epoch in range(epochs):
  # Calculate epoch loss 
  current_loss = mse(model(diabetes_X), y)
  stochastic_loss_history.append(current_loss)
  for step in range(steps_per_epochs):
    # define  random sample :
    index = random.sample(range(len(diabetes_X)), sample_size)
    data_sample = diabetes_X[index]
    target_sample = y[index]

    # calculate step loss
    step_loss = mse(model(data_sample), target_sample)
    stochastic_loss_by_step_history.append(step_loss)

    # Update variables
    model.beta_1 -= lr * derivative_mse_beta_1(model(data_sample), target_sample, data_sample)
    model.beta_0 -= lr * derivative_mse_beta_0(model(data_sample), target_sample)

  # Show updated variables
  if epoch % 100 == 0 or epoch == epochs - 1:
    print("-------------------- Epoch {} --------------------".format(epoch))
    print("Current Loss: {}".format(current_loss))
    print("beta_1 = {}".format(model.beta_1))
    print("beta_0 = {}".format(model.beta_0))

-------------------- Epoch 0 --------------------
Current Loss: 29049.607337315636
beta_1 = [2.15887651]
beta_0 = [85.23351111]
-------------------- Epoch 100 --------------------
Current Loss: 5306.8593330727335
beta_1 = [159.74961033]
beta_0 = [152.4155563]
-------------------- Epoch 200 --------------------
Current Loss: 4880.252950904606
beta_1 = [290.33811812]
beta_0 = [151.55574243]
-------------------- Epoch 300 --------------------
Current Loss: 4579.643374018085
beta_1 = [398.34403349]
beta_0 = [150.97398673]
-------------------- Epoch 400 --------------------
Current Loss: 4374.225056741411
beta_1 = [487.85341872]
beta_0 = [153.13098766]
-------------------- Epoch 500 --------------------
Current Loss: 4234.948527401572
beta_1 = [564.8906317]
beta_0 = [152.20837821]
-------------------- Epoch 600 --------------------
Current Loss: 4120.871588975998
beta_1 = [630.85788871]
beta_0 = [150.599879]
-------------------- Epoch 700 --------------------
Current Loss: 4054.438616700631

Let's now compare the loss of classical gradient descent with the loss of stochastic gradient descent in a visualization.

In [86]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=[i for i in range(epochs)][10:], y=loss_history[10:],
              mode="markers+lines",
              name="gradient descent loss"))
fig.add_trace(go.Scatter(x=[i for i in range(epochs)][10:], y=stochastic_loss_history[10:],
              mode="markers+lines",
              name="stochastic gradient descent loss"))
fig.update_layout(
    title="Gradient descent vs. Stochastic gradient descent",
    xaxis_title="epochs",
    yaxis_title="loss"
    )
fig.show()

## Batch gradient descent

Now let's implement batch gradient descent, a variant often called mini-batch gradient descent. Here the dataset is shuffled once per epoch and split into batches that do not overlap. You will need:
* `batch_size`: the number of observations in each batch
* `steps_per_epochs`: the number of batches per epoch, that is, the number of steps needed for the model to train on roughly as many observations as the dataset contains
* `batch_loss_history`: a list that stores the loss on the full dataset, recorded once per epoch
* `batch_loss_by_step_history`: a list that stores the loss on the current batch, recorded at each step
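
The batching logic on its own can be sketched as follows. This is an illustration under assumptions rather than the exercise solution: the names `batches`, `used` and `left_out` are introduced here, and the seed is fixed only for repeatability. It shuffles the indices once with `random.sample`, slices them into consecutive batches, and counts how many observations the integer division leaves out of each epoch.

```python
import random

random.seed(0)  # fixed seed, chosen here only to make the sketch repeatable

n_observations = 442  # number of rows in the diabetes dataset
batch_size = 100
steps_per_epochs = n_observations // batch_size  # same value as int(442 / 100)

# Sampling all indices without replacement yields a random permutation
index = random.sample(range(n_observations), n_observations)

# Consecutive slices of the permutation are disjoint batches
batches = [
    index[step * batch_size:(step + 1) * batch_size]
    for step in range(steps_per_epochs)
]

used = set().union(*batches)            # indices that appear in some batch
left_out = n_observations - len(used)   # indices skipped during this epoch

print(len(batches), len(used), left_out)
```

Since the permutation is redrawn at every epoch, the observations that are left out change from one epoch to the next.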

⚠️ Don't forget to add `%%time` at the beginning of the cell to measure how long the batch gradient descent took to run over 1000 epochs ⚠️

In [83]:
%%time
batch_size = 100
steps_per_epochs = int(len(diabetes_X) / batch_size)
batch_loss_history = []
batch_loss_by_step_history = []
model = Model()
for epoch in range(epochs):
  # Calculate epoch loss 
  current_loss = mse(model(diabetes_X), y)
  batch_loss_history.append(current_loss)
  index = random.sample(range(len(diabetes_X)), len(diabetes_X))
  for step in range(steps_per_epochs):
    # define the indices of the current batch
    index_step = index[step*batch_size:(step+1)*batch_size]
    # select the observations of the current batch
    data_sample = diabetes_X[index_step]
    target_sample = y[index_step]

    # calculate step loss
    step_loss = mse(model(data_sample), target_sample)
    batch_loss_by_step_history.append(step_loss)

    # Update variables
    model.beta_1 -= lr * derivative_mse_beta_1(model(data_sample), target_sample, data_sample)
    model.beta_0 -= lr * derivative_mse_beta_0(model(data_sample), target_sample)

  # Show updated variables
  if epoch % 100 == 0 or epoch == epochs - 1:
    print("-------------------- Epoch {} --------------------".format(epoch))
    print("Current Loss: {}".format(current_loss))
    print("beta_1 = {}".format(model.beta_1))
    print("beta_0 = {}".format(model.beta_0))

-------------------- Epoch 0 --------------------
Current Loss: 28955.37420227999
beta_1 = [1.70369288]
beta_0 = [91.40113838]
-------------------- Epoch 100 --------------------
Current Loss: 5313.364371439321
beta_1 = [158.38783618]
beta_0 = [153.46522234]
-------------------- Epoch 200 --------------------
Current Loss: 4879.84119366171
beta_1 = [289.23768989]
beta_0 = [150.41919956]
-------------------- Epoch 300 --------------------
Current Loss: 4580.373238829905
beta_1 = [398.26787112]
beta_0 = [152.5457078]
-------------------- Epoch 400 --------------------
Current Loss: 4370.881307079865
beta_1 = [489.42778516]
beta_0 = [153.71591661]
-------------------- Epoch 500 --------------------
Current Loss: 4225.518536880122
beta_1 = [565.46695687]
beta_0 = [152.29058332]
-------------------- Epoch 600 --------------------
Current Loss: 4123.734939377367
beta_1 = [629.18935651]
beta_0 = [152.03893473]
-------------------- Epoch 700 --------------------
Current Loss: 4052.62830836078


Let's compare all three methods in a visualization :

In [85]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=[i for i in range(epochs)][10:], y=loss_history[10:],
              mode="markers+lines",
              name="gradient descent loss"))
fig.add_trace(go.Scatter(x=[i for i in range(epochs)][10:], y=stochastic_loss_history[10:],
              mode="markers+lines",
              name="stochastic gradient descent loss"))
fig.add_trace(go.Scatter(x=[i for i in range(epochs)][10:], y=batch_loss_history[10:],
              mode="markers+lines",
              name="batch gradient descent loss"))
fig.update_layout(
    title="Gradient descent vs. stochastic vs. batch gradient descent",
    xaxis_title="epochs",
    yaxis_title="loss"
    )
fig.show()

**We can conclude from the graphs that, for the same number of epochs, stochastic and batch gradient descent converge much faster than classical gradient descent: with 4 steps per epoch, they adjust the coefficients four times as often.**
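
The effect of the number of updates can be reproduced on a small synthetic problem. The sketch below is an illustration under assumptions, not a rerun of the notebook: the data are generated here (a noisy line with a feature on a scale similar to `diabetes_X`), the `train` helper is hypothetical, both gradients are computed before either coefficient is updated, and the coefficients start at zero instead of a random value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): small-scale feature, noisy linear target
n = 442
x = rng.normal(0.0, 0.05, size=n)
y = 900.0 * x + 150.0 + rng.normal(0.0, 10.0, size=n)

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def train(batch_size, epochs=200, lr=0.1):
    """Hypothetical helper: returns the final full-data loss and the update count."""
    beta_1, beta_0 = 0.0, 0.0
    steps_per_epoch = n // batch_size
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle once per epoch
        for step in range(steps_per_epoch):
            idx = order[step * batch_size:(step + 1) * batch_size]
            error = beta_1 * x[idx] + beta_0 - y[idx]
            grad_1 = 2 * np.mean(x[idx] * error)
            grad_0 = 2 * np.mean(error)
            beta_1 -= lr * grad_1
            beta_0 -= lr * grad_0
            updates += 1
    return mse(y, beta_1 * x + beta_0), updates

# batch_size=n gives one update per epoch, i.e. classical gradient descent
full_loss, full_updates = train(batch_size=n)
mini_loss, mini_updates = train(batch_size=100)

print(full_updates, mini_updates)
print(mini_loss < full_loss)
```

With the same learning rate and the same number of epochs, the run with batches of 100 performs four times as many updates, which is what lets it move `beta_1` further toward its target in this setting.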