<h1 style="font-size:30px;">Gradient Descent Assignment</h1>

The previous section on Gradient Descent showed how to implement the gradient calculation and weight update for a single variable (`m`) using simple math operations. In this assignment, you will do the same for two variables.  

We will use the full form of a line, i.e., `y = mx + c`.  
You need to estimate the values of two variables `m` and `c`, using Stochastic Gradient Descent.


Tasks to implement for the 2 variables:   
1. Implement the gradient calculation step for the 2 variables.
2. Implement the weight update step for the 2 variables.

### Maximum Points: 30

<div>
    <table>
        <tr><td><h3>Section</h3></td> <td><h3>Problem</h3></td> <td><h3>Points</h3></td> </tr>
        <tr><td><h3>2.1</h3></td> <td><h3>Implement Gradients </h3></td> <td><h3>20</h3></td> </tr>
        <tr><td><h3>2.2</h3></td> <td><h3>Implement SGD</h3></td> <td><h3>10</h3></td> </tr>
    </table>
</div>


In [3]:
import os
import tensorflow as tf
import matplotlib.pyplot as plt

# For reproducibility
tf.random.set_seed(41)
os.environ["TF_DETERMINISTIC_OPS"] = "1"


%matplotlib inline
plt.style.use('ggplot')
plt.rcParams["figure.figsize"] = (15, 8)

## 1 Generate Sample Data

Here we generate some sample data based on a **linear model** in the presence of random noise. We will generate 1,000 data points for this experiment. The independent variable, `x`, has values uniformly distributed between -5 and 5. Values for `m` and `c` have been specified to create the data points for the dependent variable (`y`).  

In [2]:
# Generating y = mx + c + random noise.
num_data = 1000

# True values of m and c
m_line = 3.3
c_line = 5.3


# Input (Generate random data between [-5,5]).
x = tf.random.uniform([num_data], minval=-5, maxval=5)

# Output (Generate data assuming y = mx + c + noise).
y_label = m_line * x + c_line + tf.random.normal(x.shape).numpy()
y = m_line * x + c_line

# Plot the generated data points. 
plt.plot(x, y_label, '.', color='g', label="Data points")
plt.plot(x, y, color='b', label='y = mx + c', linewidth=3)
plt.ylabel('y')
plt.xlabel('x')
plt.legend()
plt.show()

The goal is to find the "unknown" parameters ($m$ and $c$) of the linear model below so that we can predict $y$, given some value of $x$.  

$$ y = mx + c $$  

We have a set of data points $(x_i, y_i)$. If the data were noise-free, every point would satisfy the equation above, i.e.,

$$ y_i = m x_i + c $$  

Since the data is not perfectly linear due to the added noise, we define the **error** (or **residual**) of each point as follows: 

$$ e_i = (y_i - m x_i -c) $$   


Next, we need to find the values of $m$ and $c$ that minimize the error above. Positive and negative errors are equally bad, so we minimize the square of the error, summed across all the data points.

$$ l_{sse} = \sum^n_{i=1}(y_i - m x_i - c)^2 $$


This form of the **loss function** is the sum of squared errors.
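
To make the definition concrete, here is a minimal plain-Python sketch (no TensorFlow) that evaluates the residuals and the sum of squared errors for three made-up points and one candidate line. The data and the helper name `sse_loss` are illustrative assumptions, not part of the assignment.

```python
def sse_loss(xs, ys, m, c):
    """Sum of squared residuals e_i = y_i - m*x_i - c (hypothetical helper)."""
    residuals = [y - m * x - c for x, y in zip(xs, ys)]
    return sum(e * e for e in residuals)

# Made-up data: the first two points lie exactly on y = 2x + 1,
# the third sits 0.5 above that line.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.5]

loss_value = sse_loss(xs, ys, m=2.0, c=1.0)
print(loss_value)
```

Only the third point contributes to the loss here, because the other two residuals are zero.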


## 2 Gradient Descent [30 Points]


We have already seen how the math works for `m` alone. The same approach applies when estimating both `m` and `c`.    
We write down the loss function and take its partial derivatives with respect to `m` and `c`. 

$$
\begin{align}
l &= \sum^n_{i=1}(y_i - m x_i - c)^2 \\
\frac{\partial l}{\partial m}  &= -2 \sum^n_{i=1} x_i(y_i - m x_i - c) \\
\frac{\partial l}{\partial c}  &= -2 \sum^n_{i=1} (y_i - m x_i - c) \\
\end{align}
$$
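
One way to gain confidence in these derivatives is to compare them with a numerical estimate. The sketch below, in plain Python with made-up data, evaluates the two formulas above and checks them against central finite differences of the loss. The function names and the step size `h` are assumptions chosen for illustration.

```python
def loss(xs, ys, m, c):
    # Sum of squared errors for the line y = m*x + c.
    return sum((y - m * x - c) ** 2 for x, y in zip(xs, ys))

def analytic_gradients(xs, ys, m, c):
    # Direct transcription of the two partial-derivative formulas.
    g_m = -2.0 * sum(x * (y - m * x - c) for x, y in zip(xs, ys))
    g_c = -2.0 * sum(y - m * x - c for x, y in zip(xs, ys))
    return g_m, g_c

# Made-up data and parameter values.
xs = [-1.0, 0.5, 2.0]
ys = [2.0, 7.0, 12.0]
m, c, h = 2.0, 3.0, 1e-5

g_m, g_c = analytic_gradients(xs, ys, m, c)

# Central differences: perturb one parameter at a time.
num_g_m = (loss(xs, ys, m + h, c) - loss(xs, ys, m - h, c)) / (2 * h)
num_g_c = (loss(xs, ys, m, c + h) - loss(xs, ys, m, c - h)) / (2 * h)

print(g_m, num_g_m)
print(g_c, num_g_c)
```

Because the loss is quadratic in `m` and `c`, the two estimates should agree up to floating-point rounding.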

To follow the slope of the loss surface, we move `m` and `c` in the direction of the negative gradient. However, we must control the size of each step so that we do not overshoot the minimum. For this, we use a parameter $\lambda$ called the `learning rate`.
$$
\begin{align}
m_k &= m_{k-1} - \lambda \frac{\partial l}{\partial m} \\
c_k &= c_{k-1} - \lambda \frac{\partial l}{\partial c} \\ 
\end{align}
$$

That is it! 

Let's implement this in code to see that it really works. 
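
Before filling in the TensorFlow functions, it may help to see the two update equations run end to end. The following is a minimal plain-Python sketch under simplifying assumptions: five hand-picked, noise-free points, and a full-batch gradient (every point is used at each step) rather than the random mini-batch used later in this notebook. The learning rate and step count are illustrative choices.

```python
# Noise-free points generated from the same true line used in this notebook.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
m_true, c_true = 3.3, 5.3
ys = [m_true * x + c_true for x in xs]

# Initial guesses and an assumed learning rate.
m, c = 2.0, 1.0
lr = 0.01

for step in range(200):
    residuals = [y - m * x - c for x, y in zip(xs, ys)]
    # Gradients of the sum of squared errors w.r.t. m and c.
    g_m = -2.0 * sum(x * e for x, e in zip(xs, residuals))
    g_c = -2.0 * sum(residuals)
    # Step against the gradient, scaled by the learning rate.
    m = m - lr * g_m
    c = c - lr * g_c

print(round(m, 4), round(c, 4))
```

With noisy data and mini-batches, the estimates would fluctuate around the true values instead of settling on them.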



<!-- <div class="alert alert-block alert-info">
    <b>1. Implement Gradients: 20 Points</b>
</div> -->

### 2.1 Implement Gradients [20 Points]

**Useful TensorFlow methods for this function:**

1. [reduce_sum](https://www.tensorflow.org/api_docs/python/tf/math/reduce_sum)
2. [gather](https://www.tensorflow.org/api_docs/python/tf/gather)

In [93]:
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams["figure.figsize"] = (15, 6)
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 14
block_plot = False
def plotting_1d(x, label):
    # Plot a 1D tensor against its index.
    plt.figure()
    plt.xlabel(label)
    plt.plot(x.numpy(), 'c--')
    plt.title('Generic 1D plot')
    plt.show(block=block_plot)

def plotting_2d(x, x_axis_label, y, y_axis_label):
    # Plot tensor y against tensor x.
    plt.figure()
    plt.xlabel(x_axis_label)
    plt.ylabel(y_axis_label)
    plt.plot(x.numpy(), y.numpy(), 'c--')
    plt.show(block=block_plot)

In [6]:
def gradient_wrt_m_and_c(inputs, labels, m, c, k):
    
    '''
    All arguments are defined in the training section of this notebook. 
    This function will be called from the training section.  
    So before completing this function go through the whole notebook.
    
    inputs (tf.Tensor): input (X)
    labels (tf.Tensor): label (Y)
    m (float): slope of the line
    c (float): vertical intercept of line
    k (tf.Tensor, dtype=int32): random indices of data points
    '''
    # gradient w.r.t. m is g_m
    # gradient w.r.t. c is g_c

    ###
    ### YOUR CODE HERE
    ###
    # Select the mini-batch rows once and reuse them for both gradients.
    x_batch = tf.gather(inputs, k)
    y_batch = tf.gather(labels, k)

    # Residuals for the sampled points: y_i - m * x_i - c
    residual = y_batch - m * x_batch - c

    g_m = -2.0 * tf.math.reduce_sum(x_batch * residual)
    g_c = -2.0 * tf.math.reduce_sum(residual)
    return g_m, g_c

**Test your code before submitting it using the below code cell.**

For the given input:
```
X = tf.convert_to_tensor([-0.0374,  2.6822, -4.1152])
Y = tf.convert_to_tensor([ 5.1765, 14.1513, -8.2802])
m = 2
c = 3
k = tf.convert_to_tensor([0, 2])
```
Output:
```
Gradient of m : -24.93
Gradient of c : 1.60
```
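
The expected values can also be reproduced by hand. The sketch below repeats the calculation in plain Python (no TensorFlow), using only the two sampled points at indices 0 and 2. It is meant as an independent arithmetic check, not as the required implementation.

```python
# Values copied from the test input above.
X = [-0.0374, 2.6822, -4.1152]
Y = [5.1765, 14.1513, -8.2802]
m, c = 2, 3
k = [0, 2]  # indices of the sampled points

# Residual y_i - m*x_i - c for each sampled point.
residuals = [Y[i] - m * X[i] - c for i in k]

g_m = -2.0 * sum(X[i] * r for i, r in zip(k, residuals))
g_c = -2.0 * sum(residuals)

print(f"Gradient of m : {g_m:.2f}")
print(f"Gradient of c : {g_c:.2f}")
```

The point at index 1 is never touched, which is what makes the gradient stochastic: it depends on which indices `k` happens to contain.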

In [7]:
X = tf.convert_to_tensor([-0.0374,  2.6822, -4.1152])
Y = tf.convert_to_tensor([ 5.1765, 14.1513, -8.2802])
m = 2
c = 3
k = tf.convert_to_tensor([0, 2])

gm, gc = gradient_wrt_m_and_c(X, Y, m, c, k)

print(f'Gradient of m : {gm:.2f}')
print(f'Gradient of c : {gc:.2f}')    

Gradient of m : -24.93
Gradient of c : 1.60


In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [6]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


### 2.2 Stochastic Gradient Descent (SGD) [10 Points]

In [None]:
def update_m_and_c(m, c, g_m, g_c, lr):
    '''
    All arguments are defined in the training section of this notebook. 
    This function will be called from the training section.  
    So before completing this function go through the whole notebook.
    
    g_m = gradient w.r.t. m
    g_c = gradient w.r.t. c
    lr = learning rate
    '''
    # Update m and c parameters.
    # Store the updated value of m in the updated_m variable.
    # Store the updated value of c in the updated_c variable.
    ###
    ### YOUR CODE HERE
    ###
    updated_m = m - lr * g_m
    updated_c = c - lr * g_c
    return updated_m, updated_c

**Test your code before submitting it using the below code cell.**

For the given input:
```
m = 2
c = 3
g_m = -24.93
g_c = 1.60
lr = 0.001
```
Output:
```
Updated m: 2.02
Updated c: 3.00
```
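
These outputs follow directly from the update rule with $\lambda = 0.001$. Written out as a quick check:

$$
\begin{align}
m_k &= 2 - 0.001 \times (-24.93) = 2.02493 \approx 2.02 \\
c_k &= 3 - 0.001 \times 1.60 = 2.9984 \approx 3.00 \\
\end{align}
$$

The intercept changes by only 0.0016, so it still prints as 3.00 at two decimal places.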

In [4]:
m = 2
c = 3
g_m = -24.93
g_c = 1.60
lr = 0.001
m, c = update_m_and_c(m, c, g_m, g_c, lr)

print('Updated m: {0:.2f}'.format(m))
print('Updated c: {0:.2f}'.format(c))

Updated m: 2.02
Updated c: 3.00


In [9]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [10]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


## 3  Training

In [11]:
# Stochastic Gradient Descent with Minibatch.

# Input 
X = x

# output label.
Y = y_label

num_iter = 1000
batch_size = 10

# Display updated values every 50 iterations.
display_count = 50

lr = 0.001
m = 2
c = 1
print()
loss = []

for i in range(0, num_iter):

    # Randomly select a mini-batch of training indices.
    # maxval is exclusive in tf.random.uniform, so len(Y) makes every index eligible.
    k = tf.random.uniform(shape=[batch_size], minval=0, maxval=len(Y), dtype=tf.int32)

    # Calculate gradient of m and c using a mini-batch.
    g_m, g_c = gradient_wrt_m_and_c(X, Y, m, c, k)
    
    # update m and c parameters.
    m, c = update_m_and_c(m, c, g_m, g_c, lr)
    
    # Calculate Error.
    e = Y - m * X - c
    
    # Compute Loss Function.
    current_loss = tf.math.reduce_sum(tf.math.multiply(e,e))
    loss.append(current_loss)

    if i % display_count==0:
        print('Iteration: {}, Loss: {}, updated m: {:.3f}, updated c: {:.3f}'.format(i, loss[i], m, c))
        y_pred = m * X + c
        # Plot the line corresponding to the learned m and c.
        plt.plot(x, y_label, '.', color='g')
        plt.plot(x, y, color='b', label='Line corresponding to m={0:.2f}, c={1:.2f}'.
                 format(m_line, c_line), linewidth=3)
        plt.plot(X, y_pred, color='r', label='Line corresponding to m_learned={0:.2f}, c_learned={1:.2f}'.
                 format(m, c), linewidth=3)
        plt.title("Iteration : {}".format(i))
        plt.legend()

        plt.ylabel('y')
        plt.xlabel('x')
        plt.show()
        

In [12]:
print('Loss after last batch: {}'.format(loss[-1]))
print('Learned "m" value: {}'.format(m))
print('Learned "c" value: {}'.format(c))

# Plot loss vs. iterations.
plt.figure()
plt.plot(range(len(loss)),loss)
plt.ylabel('loss')
plt.xlabel('iterations')
plt.show()

In [13]:
# Calculate the predicted y values using the learned m and c.
y_pred = m * X + c

In [14]:
# Plot the line corresponding to the learned m and c
plt.plot(x, y_label, '.', color='g', label='X and Y')
plt.plot(x, y, color='b', label='Line corresponding to m={0:.2f}, c={1:.2f}'.format(m_line, c_line), linewidth=3)
plt.plot(X, y_pred, color='r', label='Line corresponding to m_learned={0:.2f}, c_learned={1:.2f}'.format(m, c), linewidth=3)
plt.legend()

plt.ylabel('y')
plt.xlabel('x')
plt.show()