
# Stochastic gradient descent

- Objective: For a given set of functions $\{f_i(\theta): i = 1, 2, \ldots, n\}$, find 
$$\theta^* = \arg\min_\theta \frac 1 n \sum_{i=1}^n f_i(\theta).$$

- Algorithm GD: Update the current $\theta$, with step size $\gamma > 0$, by
$$\theta' = \theta - \gamma \cdot \frac 1 n \sum_{i=1}^n \nabla f_i(\theta).$$

- Algorithm SGD: Update the current $\theta$ by
$$\theta' = \theta - \gamma \nabla f_i(\theta),$$
where $i$ is a random variable drawn uniformly from the set $\{1,2,\ldots, n\}$, independently at each update.
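
The SGD update above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `sgd`, its signature, the fixed step count `n_steps`, and the `seed` argument are all assumptions, and the two quadratic gradients are only an illustrative test case.

```python
import numpy as np

def sgd(list_grad_f, th0=0., gamma=0.01, n_steps=1000, seed=0):
  """Sketch of SGD on the average of the functions with gradients list_grad_f.

  All argument names and defaults here are hypothetical choices.
  """
  rng = np.random.default_rng(seed)
  th = float(th0)
  n = len(list_grad_f)
  for _ in range(n_steps):
    i = rng.integers(n)                    # uniform index in {0, ..., n-1}
    th = th - gamma * list_grad_f[i](th)   # step along one sampled gradient
  return th

# Illustrative 1-d case: f_1(th) = th**2 and f_2(th) = (th - 2)**2
grads = [lambda th: 2. * th, lambda th: 2. * (th - 2.)]
th_sgd = sgd(grads, th0=0., gamma=0.01, n_steps=1000, seed=0)
print(th_sgd)
```

Each step uses a single sampled gradient, whose expectation over $i$ equals the full GD direction $\frac 1 n \sum_{i=1}^n \nabla f_i(\theta)$. With a constant step size the iterate typically keeps fluctuating around the minimizer rather than settling on it, so the printed value depends on the seed.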

In [20]:
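# gradient descent
import numpy as np

def gd(val_g, grad_g, th0=0., eps=0.01):
  # val_g is accepted but not used by the update itself;
  # eps plays the role of the step size gamma in the GD update.
  th = np.array(th0)
  for _ in range(1000):  # fixed number of iterations, no stopping test
    th = th - eps * np.array(grad_g(th))
  return th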

# A test on 1-d functions with 2 samples

In [18]:
# Given value and gradient of the functions
def f1(th):
  return float(th)**2

def grad_f1(th):
  return 2.* float(th)

def f2(th):
  return (float(th)- 2.)**2

def grad_f2(th):
  return 2.* (float(th) - 2.)

list_val_f = [f1, f2]
list_grad_f = [grad_f1, grad_f2]

In [24]:
# gradient descent on the average of f1 and f2

def val_g(th):
  val = 0.
  for f in list_val_f:
    val += f(th)
  return val/len(list_val_f)

def grad_g(th):
  grad = 0.
  for grad_f in list_grad_f:
    grad += grad_f(th)
  return grad/len(list_grad_f)

gd(val_g, grad_g, th0=0., eps = 0.01)

0.9999999983170326
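
The printed value can be checked by hand. The following short derivation uses only the two test functions $f_1(\theta) = \theta^2$ and $f_2(\theta) = (\theta - 2)^2$, the step size $\gamma = 0.01$ (the `eps` argument), the start $\theta_0 = 0$, and the 1000 iterations hard-coded in `gd`; the iterate notation $\theta_k$ is introduced here for convenience. The averaged objective and its gradient are
$$g(\theta) = \frac 1 2 \left(\theta^2 + (\theta - 2)^2\right), \qquad \nabla g(\theta) = 2\theta - 2,$$
so the minimizer is $\theta^* = 1$. One GD step then contracts the error by a constant factor:
$$\theta' - 1 = \theta - \gamma (2\theta - 2) - 1 = (1 - 2\gamma)(\theta - 1).$$
Starting from $\theta_0 = 0$, this gives
$$\theta_k = 1 - (1 - 2\gamma)^k, \qquad \theta_{1000} = 1 - 0.98^{1000} \approx 1 - 1.68 \times 10^{-9},$$
which agrees with the output above.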