# Simple Gradient Descent 

Example by hand:<br>
Question: Find the local minimum of the function y = (x+5)², starting from the point x = 3.
![graph](https://miro.medium.com/max/800/1*5-56UEwcZHgzqIAtlnsLog.png)
Solution: We know the answer just by looking at the graph. y = (x+5)² reaches its minimum value when x = -5 (i.e., when x = -5, y = 0). Hence x = -5 is both the local and the global minimum of the function.
Now, let’s see how to obtain the same result numerically using gradient descent.<br>
Step 1: Initialize x = 3. Then, find the gradient of the function, dy/dx = 2*(x+5).



Step 2: Move in the direction of the negative of the gradient. The gradient points in the direction in which y increases fastest, so stepping against it decreases y. But how far should we move? That is set by the learning rate. Let us assume a learning rate of 0.01.<br>
Step 3: Let’s perform 2 iterations of gradient descent.<br>
![](https://miro.medium.com/max/746/1*YkU1u_Px_FprYjKL1xtUwg.png)
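Step 4: We can observe that the value of x is slowly decreasing and should converge to -5 (the local minimum). However, how many iterations should we perform? The cells In [2] and In [3] answer this by stopping once the change in x between two iterations falls below a chosen precision, or once a maximum number of iterations is reached. Note that they use a learning rate of 0.1, not the 0.01 used by hand above.<br>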
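
As a quick check of the hand calculation, the two iterations from Step 3 can be reproduced in a few lines. This is a minimal sketch that assumes the learning rate of 0.01 from Step 2; the name `history` is chosen here for illustration only.

```python
def df(x):
    """Derivative of y = (x + 5)**2."""
    return 2 * (x + 5)

learning_rate = 0.01  # value assumed in Step 2
x = 3.0               # starting point from Step 1
history = [x]         # illustrative name: every x visited so far

for _ in range(2):    # two iterations, as in Step 3
    x = x - learning_rate * df(x)  # step against the gradient
    history.append(x)

print(history)
```

Each step moves x by 0.01 times the gradient, so x goes from 3 to 2.84 and then to about 2.6832.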

In [2]:
def df(x):
    """Derivative of y = (x + 5)**2 with respect to x."""
    return 2 * (x + 5)

current_theta = 3          # start at x = 3
learning_rate = 0.1        # step size
maximum_iteration = 500    # maximum number of iterations
i = 0                      # iteration counter
value_difference = 1       # change in x between two iterations
precision = 0.000001       # stop once the change in x is this small

In [3]:
while value_difference > precision and i < maximum_iteration:
    old_theta = current_theta  # store the current x value
    current_theta = current_theta - learning_rate * df(old_theta)  # gradient descent step
    value_difference = abs(current_theta - old_theta)  # change in x
    i += 1  # iteration count
    print("Iteration", i, "\n value is", current_theta)  # print each iterate

print("The local minimum occurs at", current_theta)

Iteration 1 
 value is 1.4
Iteration 2 
 value is 0.11999999999999966
Iteration 3 
 value is -0.9040000000000001
Iteration 4 
 value is -1.7232000000000003
Iteration 5 
 value is -2.3785600000000002
Iteration 6 
 value is -2.902848
Iteration 7 
 value is -3.3222784
Iteration 8 
 value is -3.65782272
Iteration 9 
 value is -3.926258176
Iteration 10 
 value is -4.1410065408
Iteration 11 
 value is -4.312805232640001
Iteration 12 
 value is -4.450244186112
Iteration 13 
 value is -4.5601953488896
Iteration 14 
 value is -4.64815627911168
Iteration 15 
 value is -4.718525023289343
Iteration 16 
 value is -4.774820018631475
Iteration 17 
 value is -4.81985601490518
Iteration 18 
 value is -4.855884811924144
Iteration 19 
 value is -4.884707849539315
Iteration 20 
 value is -4.907766279631452
Iteration 21 
 value is -4.926213023705161
Iteration 22 
 value is -4.940970418964129
Iteration 23 
 value is -4.952776335171303
Iteration 24 
 value is -4.962221068137042
Iteration 25 
 value is -4.969
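
The printed values approach -5 geometrically. Substituting dy/dx = 2*(x+5) into the update gives x_new + 5 = (1 - 2*learning_rate) * (x + 5), so with a learning rate of 0.1 the distance to -5 shrinks by a factor of 0.8 per iteration, starting from a distance of 8. The sketch below checks this closed form against the iterative update; the name `closed_form` is illustrative.

```python
def df(x):
    """Derivative of y = (x + 5)**2."""
    return 2 * (x + 5)

learning_rate = 0.1
x = 3.0  # starting point, at distance 8 from the minimum at -5

for k in range(1, 26):
    x = x - learning_rate * df(x)  # iterative gradient descent step
    closed_form = -5 + 8 * (1 - 2 * learning_rate) ** k  # predicted x after k steps
    assert abs(x - closed_form) < 1e-9

print(x, closed_form)
```

The same factor explains why a smaller learning rate such as 0.01 needs many more iterations: the distance then shrinks by only 0.98 per step.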

# Solving Linear Equation using Gradient Descent

In [25]:
import numpy as np
import matplotlib.pyplot as plt 
%matplotlib inline

In [26]:
X = 2 * np.random.rand(10,1)  # 10 inputs drawn uniformly from [0, 2)
Y = 4 + 3 * X + np.random.randn(10,1)  # targets: y = 4 + 3x plus standard normal noise


In [43]:
def cal_cost(theta,X,Y):
    # half the mean squared error of the predictions X.dot(theta) against Y
    m=len(Y)  # number of samples
    prediction = X.dot(theta)
    cost = (1/(2*m)) * np.sum(np.square(prediction-Y))
    return cost

In [48]:
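def gradient_descent(X,Y,theta,learning_rate=0.01,iterations=5000):
    m=len(Y)  # number of samples
    cost_history = np.zeros(iterations)  # cost after each update
    theta_history = np.zeros((iterations,2))  # theta after each update (2 parameters)
    for it in range(iterations):
        prediction = np.dot(X,theta)
        # batch update: step against the gradient averaged over all m samples
        theta = theta-(1/m)*learning_rate*(X.T.dot((prediction - Y)))
        theta_history[it,:] = theta.T
        cost_history[it] = cal_cost(theta,X,Y)
    # return the theta that achieved the lowest recorded cost
    return theta_history[np.argmin(cost_history)]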
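
For reference, the cost computed by `cal_cost` and the update applied inside `gradient_descent` can be written as follows, with m the number of samples and X the input matrix including the column of ones. The symbols J and α (the learning rate) are notation introduced here, not names from the code.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( x_i^{\top}\theta - y_i \right)^2
          = \frac{1}{2m} \lVert X\theta - Y \rVert^2

\nabla_\theta J(\theta) = \frac{1}{m} X^{\top} \left( X\theta - Y \right)

\theta \leftarrow \theta - \frac{\alpha}{m} X^{\top} \left( X\theta - Y \right)
```

The last line matches the expression `theta-(1/m)*learning_rate*(X.T.dot((prediction - Y)))` term by term, since `prediction` is Xθ.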

In [54]:
lr = 0.01
n_iter =2000
theta = np.random.randn(2,1)
X_b = np.c_[np.ones((len(X),1)),X]
theta_optimum = gradient_descent(X_b,Y,theta,lr,n_iter)
print(theta_optimum)

[5.11240534 2.03253828]
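
One way to sanity-check a result like this is to compare it with the closed-form least-squares solution. The sketch below is self-contained and makes several assumptions that differ from the cells above: evenly spaced inputs, a seeded noise generator, a zero starting point, and a learning rate of 0.1 with 5000 iterations so that the run converges fully. It uses `np.linalg.lstsq` for the reference solution.

```python
import numpy as np

rng = np.random.default_rng(0)                 # seeded so the run is repeatable
X = np.linspace(0, 2, 10).reshape(-1, 1)       # assumed inputs, evenly spaced on [0, 2]
Y = 4 + 3 * X + rng.standard_normal((10, 1))   # same model as above: y = 4 + 3x + noise
X_b = np.c_[np.ones((len(X), 1)), X]           # add the bias column

theta = np.zeros((2, 1))                       # assumed starting point
learning_rate, iterations, m = 0.1, 5000, len(Y)
for _ in range(iterations):
    gradient = X_b.T.dot(X_b.dot(theta) - Y) / m  # gradient averaged over all samples
    theta = theta - learning_rate * gradient

theta_ls, *_ = np.linalg.lstsq(X_b, Y, rcond=None)  # closed-form reference
print(theta.ravel(), theta_ls.ravel())
```

If both vectors agree, any remaining gap to the true coefficients 4 and 3 comes from the noise in the 10 samples rather than from the optimizer.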


# Solving Linear Equation using Stochastic Gradient Descent

In [59]:
def stocashtic_gradient_descent(X,Y,theta,learning_rate=0.01,iterations = 10):
    m=len(Y)  # number of samples
    for it in range(iterations):
        for i in range(m):
            # pick one sample at random (with replacement)
            rand_ind = np.random.randint(0,m)
            X_i=X[rand_ind,:].reshape(1,X.shape[1])
            Y_i=Y[rand_ind].reshape(1,1)
            prediction = np.dot(X_i,theta)
            # update theta using the gradient of this single sample
            theta = theta - learning_rate * (X_i.T.dot((prediction-Y_i)))
    return theta

In [60]:
lr = 0.01
n_iter = 2000
theta = np.random.randn(2,1)
X_b = np.c_[np.ones((len(X),1)),X]
theta_optimum = stocashtic_gradient_descent(X_b,Y,theta,lr,n_iter)
print(theta_optimum)

[[5.13613295]
 [1.9933342 ]]
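
Because each update uses a single random sample, the path taken by stochastic gradient descent is noisy. A minimal sketch for watching this is to record the full-data cost after every pass over the data. The data, seed, starting point, pass count, and the helper name `cost` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                 # seeded for repeatability (assumption)
X = np.linspace(0, 2, 10).reshape(-1, 1)       # assumed inputs
Y = 4 + 3 * X + rng.standard_normal((10, 1))   # y = 4 + 3x + noise
X_b = np.c_[np.ones((len(X), 1)), X]           # add the bias column

def cost(theta):
    """Half the mean squared error over the full data set."""
    return float(np.sum((X_b.dot(theta) - Y) ** 2) / (2 * len(Y)))

theta = np.zeros((2, 1))                       # assumed starting point
learning_rate, epochs, m = 0.01, 50, len(Y)
cost_history = [cost(theta)]
for _ in range(epochs):
    for _ in range(m):
        i = rng.integers(0, m)                 # draw one sample with replacement
        x_i = X_b[i:i + 1]                     # shape (1, 2)
        y_i = Y[i:i + 1]                       # shape (1, 1)
        theta = theta - learning_rate * x_i.T.dot(x_i.dot(theta) - y_i)
    cost_history.append(cost(theta))

print(cost_history[0], cost_history[-1])
```

Plotting `cost_history` would show an overall downward trend, though individual passes need not lower the cost.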


# Solving Linear Equation using Mini-batch Gradient Descent

In [61]:
def minibatch_gradient_descent(X,Y,theta,learning_rate=0.01,iterations = 10, batch_size=20):
    m=len(Y)  # number of samples

    for it in range(iterations):
        # shuffle the samples so every pass uses different batches
        indices = np.random.permutation(m)
        X = X[indices]
        Y = Y[indices]
        for i in range(0,m,batch_size):
            X_i = X[i:i+batch_size]
            Y_i = Y[i:i+batch_size]

            prediction = np.dot(X_i,theta)
            # average the gradient over the samples in this batch
            theta = theta - (1/len(X_i)) * learning_rate * (X_i.T.dot((prediction - Y_i)))
    return theta

In [62]:
lr = 0.01
n_iter = 200
theta = np.random.randn(2,1)
X_b = np.c_[np.ones((len(X),1)),X]
theta_optimum = minibatch_gradient_descent(X_b,Y,theta,lr,n_iter)
print(theta_optimum)

[[4.31600142]
 [2.67943491]]
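
It helps to see how `range(0, m, batch_size)` carves the data into batches. With the 10 samples generated above and the default `batch_size=20`, the slice covers the whole data set, so each iteration is a single full-batch step. The helper `batch_sizes` below is hypothetical and exists only to illustrate the slicing.

```python
import numpy as np

def batch_sizes(m, batch_size):
    """Return the number of rows in each mini-batch for m samples."""
    data = np.arange(m)
    return [len(data[i:i + batch_size]) for i in range(0, m, batch_size)]

print(batch_sizes(10, 20))  # one batch holding all 10 samples
print(batch_sizes(10, 4))   # two full batches and a smaller last one
```

To get genuine mini-batches on this data set, `batch_size` would need to be smaller than 10.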
