## Simple Gradient Descent Example

**Question**: Find the local minimum of the function y = (x+5)^2, starting from the point x = 3.

<img src="Figures/GradientSimple_Fig1.png" style="height:250px">

**Analytic Solution:** Setting dy/dx = 2*(x+5) = 0 gives **x = -5**, where y = 0. Because y = (x+5)^2 is never negative, *x = -5* is both the local and the global minimum of the function.

We found the **analytic** solution above. Now let us find the same minimum **numerically** using gradient descent.

**STEP 1:** Initialize x = 3. Then, find the gradient of the function, dy/dx = 2*(x+5).

**STEP 2:** Move in the direction of the negative of the gradient.

**Note:** How far should we move? That is controlled by the **learning rate**. Here, we use a learning rate of 0.01.
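
To see why the learning rate matters, the sketch below runs the same update on y = (x+5)^2 with three different rates. Only 0.01 comes from this notebook; the rates 0.5 and 1.1, the step count of 20, and the helper name `run_updates` are assumptions chosen for illustration.

```python
def run_updates(learning_rate, x=3.0, steps=20):
    # Repeatedly apply x = x - learning_rate * dy/dx, with dy/dx = 2 * (x + 5)
    for _ in range(steps):
        x = x - learning_rate * 2 * (x + 5)
    return x

small = run_updates(0.01)  # moves slowly toward -5
half = run_updates(0.5)    # for this particular function, reaches -5 in one step
large = run_updates(1.1)   # overshoots on every step and moves away from -5

print(small, half, large)
```

For this function each update multiplies the distance to the minimum by (1 - 2 * learning_rate), so a rate above 1 makes that distance grow instead of shrink.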

**STEP 3:** Perform gradient descent in an iterative manner. Two iterations are shown below:

<img src="Figures/GD_algorithm.png" style="height:650px">
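
The arithmetic of the first two iterations can be reproduced in a few lines. This is a minimal sketch assuming the update rule x = x - learning_rate * dy/dx with the values from Steps 1 and 2; the names `x0`, `x1`, and `x2` are only illustrative.

```python
def df(x):
    # Gradient of y = (x + 5)^2
    return 2 * (x + 5)

learning_rate = 0.01
x0 = 3.0
x1 = x0 - learning_rate * df(x0)  # 3 - 0.01 * 16, approximately 2.84
x2 = x1 - learning_rate * df(x1)  # 2.84 - 0.01 * 15.68, approximately 2.6832
print(x1, x2)
```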

**STEP 4:** The x value starts to decrease and should converge to -5 (the local minimum). But how many iterations should we perform?

**Solution:** Let us set a precision variable in our algorithm and compare it with the difference between two consecutive “x” values. If the difference between the x values from two consecutive iterations is less than the precision we set, stop the algorithm.

Now, we will perform the above steps in Python.

### Simple Gradient Descent Example

In [None]:
def df(x):
    return 2*(x+5)

current_theta = 3  # Start at x=3
learning_rate = 0.01
maximum_iterations = 500 # max number of iterations
i = 0 #iteration counter

value_difference = 1 # difference in value between two iterations

precision = 0.000001

In [None]:
while value_difference > precision and i < maximum_iterations:
    old_theta = current_theta #Store current x value in old_theta
    current_theta = current_theta - learning_rate * df(current_theta) # Gradient descent step: move against the gradient
    value_difference = abs(current_theta - old_theta) #Change in x
    i = i+1 #iteration count
    #print("Iteration",i,"\nX value is",current_theta) #Print iterations
    
print("The local minimum occurs at", current_theta)
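
As a rough check of how the precision test interacts with the iteration cap, the sketch below estimates how many iterations the loop would need. It assumes the update inside the loop is x = x - learning_rate * df(x); the names `shrink`, `first_step`, `needed`, and `x_at_cap` are hypothetical and not part of the notebook.

```python
import math

start, minimum = 3.0, -5.0
rate, tol, cap = 0.01, 0.000001, 500

# For y = (x + 5)^2 each update multiplies the distance to the minimum by (1 - 2 * rate).
shrink = 1 - 2 * rate
# Size of the very first step: rate * |df(start)|
first_step = rate * 2 * abs(start - minimum)
# Smallest 0-based step index k whose step size first_step * shrink**k is <= tol
k = math.ceil(math.log(tol / first_step) / math.log(shrink))
needed = k + 1  # iterations the loop would run if there were no cap
# Value of x when the loop is stopped by the cap instead
x_at_cap = minimum + (start - minimum) * shrink ** cap

print(needed, x_at_cap)
```

Under these assumptions the precision test would need close to 600 iterations, so with a cap of 500 the loop is ended by the iteration limit, and the printed value sits slightly above -5 rather than exactly on it.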

## Comparing Analytical and Numerical Approaches to Linear Regression

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use(['ggplot'])

### Generate some Data to do the comparison on

Generate some data with:
\begin{equation} \theta_0 = 4 \end{equation}
\begin{equation} \theta_1 = 3 \end{equation}

Add some Gaussian noise to the data.

In [None]:
X = 2 * np.random.rand(1000,1)
y = 4 + 3 * X + np.random.randn(1000,1)
z = 4 + 3 * X
#print(X)
#print(y)

Let's plot our data to check the relation between X and y. The blue points are the noisy data, and the red points are the noise-free values.

In [None]:
plt.plot(X,y,'b.')
plt.plot(X,z,'r.')
plt.xlabel("$x$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
_ =plt.axis([0,2,0,15])

## Analytical Linear Regression (Normal Equation)

In [None]:
X_b = np.c_[np.ones((1000,1)),X]
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(theta_best)

This is close to our real thetas, 4 and 3. It cannot be exact because of the noise we introduced into the data.
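
The same estimate can also be obtained without forming an explicit matrix inverse. The sketch below is one possible cross-check, not part of the original notebook: it regenerates similar data with a seeded generator (the seed and the `_demo` names are assumptions) and compares the inverse-based formula with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable (an assumption)
X_demo = 2 * rng.random((1000, 1))
y_demo = 4 + 3 * X_demo + rng.standard_normal((1000, 1))
X_demo_b = np.c_[np.ones((1000, 1)), X_demo]  # prepend a column of ones for theta_0

# Normal equation with an explicit inverse, as in the cell above
theta_inv = np.linalg.inv(X_demo_b.T.dot(X_demo_b)).dot(X_demo_b.T).dot(y_demo)
# Least-squares solver, which avoids computing the inverse directly
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_demo_b, y_demo, rcond=None)

print(theta_inv.ravel(), theta_lstsq.ravel())
```

Both approaches should agree to within floating-point error on well-conditioned data like this; the solver is generally the safer choice when the columns of X are nearly collinear.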

In [None]:
X_new = np.array([[0],[2]])
X_new_b = np.c_[np.ones((2,1)),X_new]
print(X_new_b)
y_predict = X_new_b.dot(theta_best)
y_predict

Let's plot the prediction line using the calculated theta.

In [None]:
plt.plot(X_new,y_predict,'r-')
plt.plot(X,y,'b.')
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.axis([0,2,0,15])

## Solving Linear Equation using Gradient Descent

In [None]:
X = 2 * np.random.rand(10,1)
y = 4 + 3 * X + np.random.randn(10,1)

In [None]:
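def cal_cost(theta,X,y):
    """Return half the mean squared error of the predictions X.dot(theta) against y."""
    m = len(y) # number of samples
    predictions = X.dot(theta)
    cost = (1/(2*m)) * np.sum(np.square(predictions-y))
    return cost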

In [None]:
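def gradient_descent(X,y,theta,learning_rate=0.01,iterations=5000):
    """Run batch gradient descent.

    Returns the final theta, the cost after each iteration, and the
    theta from the iteration with the lowest cost.
    """
    m = len(y) # number of samples
    cost_history = np.zeros(iterations)
    theta_history = np.zeros((iterations,2)) # assumes two parameters (theta0, theta1)
    for it in range(iterations):
        prediction = np.dot(X,theta)
        # Gradient of the cost with respect to theta is (1/m) * X^T (prediction - y)
        theta = theta -(1/m)*learning_rate*( X.T.dot((prediction - y)))
        theta_history[it,:] =theta.T
        cost_history[it]  = cal_cost(theta,X,y)
    return theta, cost_history, theta_history[np.argmin(cost_history)]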
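
One way to gain confidence in the update above is to compare the analytic gradient, (1/m) * X^T (X.theta - y), with a finite-difference estimate of the cost. The sketch below is a self-contained check under that assumption; the helper names (`cost_chk`, `analytic_gradient`, `numerical_gradient`), the seed, and the `_chk` variables are hypothetical.

```python
import numpy as np

def cost_chk(theta, X, y):
    # Same cost as cal_cost: half the mean squared error
    m = len(y)
    return (1 / (2 * m)) * np.sum(np.square(X.dot(theta) - y))

def analytic_gradient(theta, X, y):
    # Gradient used in the batch update: (1/m) * X^T (X.theta - y)
    m = len(y)
    return (1 / m) * X.T.dot(X.dot(theta) - y)

def numerical_gradient(theta, X, y, eps=1e-6):
    # Central finite differences, one parameter at a time
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = eps
        grad[j, 0] = (cost_chk(theta + step, X, y) - cost_chk(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x_chk = 2 * rng.random((10, 1))
X_chk = np.c_[np.ones((10, 1)), x_chk]
y_chk = 4 + 3 * x_chk + rng.standard_normal((10, 1))
theta_chk = rng.standard_normal((2, 1))

max_gap = np.max(np.abs(analytic_gradient(theta_chk, X_chk, y_chk)
                        - numerical_gradient(theta_chk, X_chk, y_chk)))
print(max_gap)
```

A very small `max_gap` indicates that the formula in the update matches the slope of the cost function.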

In [None]:
lr =0.01
n_iter = 2000
theta = np.random.randn(2,1)
X_b = np.c_[np.ones((len(X),1)),X]
theta, cost_history, theta_optimum = gradient_descent(X_b,y,theta,lr,n_iter)

print('Theta0:          {:0.3f} \nTheta1:          {:0.3f}'.format(theta_optimum[0] , theta_optimum[1]))
print('Final cost (MSE/2):  {:0.3f}'.format(cost_history[-1]))

In [None]:
#####################
## Plot the values ##
#####################

theta = np.random.randn(2,1) # restart from a random theta so the plot shows the full descent
theta, cost_history,_ = gradient_descent(X_b,y,theta,lr,n_iter)
fig,ax = plt.subplots(figsize=(10,8))

ax.set_ylabel('J(Theta)',rotation=0)
ax.set_xlabel('Iterations')

_=ax.plot(range(n_iter),cost_history,'b.')

### Solving Linear Equation using Stochastic Gradient Descent

In [None]:
def stocashtic_gradient_descent(X,y,theta,learning_rate=0.01,iterations=10):
    
    m = len(y)
    cost_history = np.zeros(iterations)
    for it in range(iterations):
        cost = 0.0
        for i in range(m):
            rand_ind = np.random.randint(0,m)
            X_i = X[rand_ind,:].reshape(1,X.shape[1])
            y_i = y[rand_ind].reshape(1,1)
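            prediction = np.dot(X_i,theta) # prediction for the single sampled row
            theta = theta - learning_rate*(X_i.T.dot(prediction - y_i)) # update from that one sample's gradient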
            cost += cal_cost(theta,X_i,y_i)
        cost_history[it]  = cost
    return theta, cost_history

In [None]:
lr =0.01
n_iter = 2000

theta = np.random.randn(2,1)

X_b = np.c_[np.ones((len(X),1)),X]

theta_optimum,_ = stocashtic_gradient_descent(X_b,y,theta,lr,n_iter)

print('Theta0:          {:0.3f} \nTheta1:          {:0.3f}'.format(theta_optimum[0][0] , theta_optimum[1][0]))

In [None]:
#####################
## Plot the values ##
#####################

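theta = np.random.randn(2,1) # restart from a random theta
_, cost_history = stocashtic_gradient_descent(X_b,y,theta,lr,n_iter)
fig,ax = plt.subplots(figsize=(10,8))
ax.set_ylabel('J(Theta)',rotation=0)
ax.set_xlabel('Iterations')
_=ax.plot(range(n_iter),cost_history,'b.')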

### Solving Linear Equation using Mini-batch Gradient Descent

In [None]:
def minibatch_gradient_descent(X,y,theta,learning_rate=0.01,iterations=10,batch_size =20):
    pass
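
The stub above is left to be filled in. As a hedged illustration of the idea, the sketch below shows one possible mini-batch variant: shuffle the data once per pass, then update theta on consecutive slices of `batch_size` rows. The function name `minibatch_gd_sketch`, the `seed` parameter, the demo data, and the learning rate of 0.1 are all assumptions made for this example.

```python
import numpy as np

def minibatch_gd_sketch(X, y, theta, learning_rate=0.01, iterations=10, batch_size=20, seed=0):
    rng = np.random.default_rng(seed)  # seeded only to make the sketch repeatable
    m = len(y)
    cost_history = np.zeros(iterations)
    for it in range(iterations):
        order = rng.permutation(m)  # new random order for each pass over the data
        X_shuffled, y_shuffled = X[order], y[order]
        for start in range(0, m, batch_size):
            X_batch = X_shuffled[start:start + batch_size]
            y_batch = y_shuffled[start:start + batch_size]
            error = X_batch.dot(theta) - y_batch
            # Average gradient over the rows in this batch
            theta = theta - learning_rate * X_batch.T.dot(error) / len(y_batch)
        # Cost on the full data set after this pass (half the mean squared error)
        cost_history[it] = np.sum(np.square(X.dot(theta) - y)) / (2 * m)
    return theta, cost_history

# Demo data in the same form as the notebook: y = 4 + 3x + Gaussian noise
rng_demo = np.random.default_rng(1)
X_mb = 2 * rng_demo.random((200, 1))
y_mb = 4 + 3 * X_mb + rng_demo.standard_normal((200, 1))
X_mb_b = np.c_[np.ones((len(X_mb), 1)), X_mb]

theta_mb, cost_mb = minibatch_gd_sketch(X_mb_b, y_mb, np.zeros((2, 1)),
                                        learning_rate=0.1, iterations=200)
print(theta_mb.ravel(), cost_mb[-1])
```

With `batch_size` equal to the number of rows this reduces to batch gradient descent, and with `batch_size=1` it behaves like stochastic gradient descent without replacement within each pass.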