# Gradient Descent

In this exercise, you will create the necessary functions to go through the steps of a single Gradient Descent Epoch. You will then combine the functions and create a loop through the entire Gradient Descent procedure.

## 1. Data Exploration

We import the following dataset of ingredients with their mineral content for you:

In [1]:
import pandas as pd

data = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_ingredients_zinc_phosphorous.csv")
data.head()

👇 We can visualize a somewhat linear relationship between `phosphorus` and `zinc`.

Let's use Gradient Descent to find the line of best fit between them! 

In [2]:
import seaborn as sns

sns.scatterplot(data=data, x='zinc', y='phosphorus');

👇 Create two NumPy arrays (`np.ndarray`):
- `data_X` for zinc
- `data_Y` for phosphorus

In [4]:
# YOUR CODE HERE

In [5]:
assert (data_X.shape == (53,))
assert (data_Y.shape == (53,))

## 2. Code one Epoch

In this section of the exercise, you will define the key functions used to update the parameters during one epoch $\color {red}{(k)}$ of gradient descent. Recall the update formulas below:

$$
\beta_0^{\color {red}{(k+1)}} = \beta_0^{\color {red}{(k)}} - \eta \frac{\partial L}{\partial \beta_0}(\beta^{\color{red}{(k)}})
$$


$$
\beta_1^{\color {red}{(k+1)}} = \beta_1^{\color {red}{(k)}} - \eta \frac{\partial L}{\partial \beta_1}(\beta^{\color {red}{(k)}})
$$


### 2.1 Hypothesis Function

$$
\hat{y} =  a x + b
$$

👇 Define the hypothesis function of a Linear Regression. Let `a` be the slope and `b` the intercept.


In [6]:
def h(X,a,b):
    pass  # YOUR CODE HERE
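
As a minimal sketch of what such a hypothesis function might look like, here is a version applied to a small made-up array rather than to the ingredients dataset. The name `predict` and the toy values are illustrative assumptions, not the expected solution.

```python
import numpy as np

def predict(x, slope, intercept):
    """Return the line's prediction slope * x + intercept for every value in x."""
    return slope * x + intercept

# Toy inputs (hypothetical values, not taken from the dataset)
toy_x = np.array([0.0, 1.0, 2.0])
toy_pred = predict(toy_x, slope=2.0, intercept=1.0)
print(toy_pred)  # [1. 3. 5.]
```

Because NumPy broadcasts the scalar slope and intercept over the array, the same function works for a single value or a whole column.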

### 2.2 Loss Function

$$
Sum\ Squares\ Loss = \sum_{i=0}^n (y^{(i)} - \hat{y}^{(i)} )^2
$$

👇 Define the SSR loss function for the hypothesis function created above. Reuse the `h` you coded above.


In [7]:
import numpy as np

def loss(X,Y,a,b):
    pass  # YOUR CODE HERE
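
The sum-of-squares formula can be checked by hand on a tiny example. The sketch below uses made-up points and a hypothetical function name `ssr_loss`. It inlines the line equation to stay self-contained, whereas the exercise asks you to reuse `h`.

```python
import numpy as np

def ssr_loss(x, y, slope, intercept):
    """Sum of squared residuals between y and the line's predictions."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)

# Toy points lying exactly on y = 2x + 1 (hypothetical, not the dataset)
toy_x = np.array([0.0, 1.0, 2.0])
toy_y = np.array([1.0, 3.0, 5.0])

# With slope 1 and intercept 1 the residuals are 0, 1 and 2, so SSR = 0 + 1 + 4
print(ssr_loss(toy_x, toy_y, slope=1.0, intercept=1.0))  # 5.0
# The true line leaves no residual at all
print(ssr_loss(toy_x, toy_y, slope=2.0, intercept=1.0))  # 0.0
```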

❓ What would be the total loss computed on our entire ingredients dataset if:
- a = 1 
- b = 1

In [8]:
# YOUR CODE HERE

⚠️ You should be getting 63.86. If not, something is wrong with your function. Fix it before moving on!

### 2.3 Gradient

$$
\frac{d\ SSR}{d\ slope}= \sum_{i=0}^n -2  x^{(i)} (y^{(i)} - \hat{y}^{(i)} )
$$

$$
\frac{d\ SSR}{d\ intercept}= \sum_{i=0}^n -2(y^{(i)} - \hat{y}^{(i)} ) 
$$

👇 Define a function to compute the partial derivatives of the loss function with respect to the parameters `a` and `b` at a given point.


<details>
<summary>💡 Hint</summary>
Again, use the hypothesis function inside it to compute the predictions at the given points.
</details>
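
The two derivative formulas above follow from the chain rule. As a short worked step, consider a single data point $(x, y)$ with $\hat{y} = a x + b$ (this per-point notation is used only for illustration):

$$
\frac{\partial}{\partial a}(y - \hat{y})^2 = 2(y - \hat{y}) \cdot \frac{\partial (y - a x - b)}{\partial a} = -2x(y - \hat{y})
$$

$$
\frac{\partial}{\partial b}(y - \hat{y})^2 = 2(y - \hat{y}) \cdot \frac{\partial (y - a x - b)}{\partial b} = -2(y - \hat{y})
$$

Summing these per-point terms over the dataset gives the slope and intercept derivatives.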

In [9]:
def gradient(X,Y,a,b):
    pass  # YOUR CODE HERE
    return d_a, d_b
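
One way to gain confidence in a gradient function is to compare it with a numerical approximation. The sketch below does that on made-up points. The names `ssr_loss` and `ssr_gradient` and the toy data are assumptions for illustration, so its values will not match the dataset figures quoted in this notebook.

```python
import numpy as np

def ssr_loss(x, y, slope, intercept):
    """Sum of squared residuals, used here only to cross-check the gradient."""
    return np.sum((y - (slope * x + intercept)) ** 2)

def ssr_gradient(x, y, slope, intercept):
    """Partial derivatives of the SSR loss with respect to slope and intercept."""
    residuals = y - (slope * x + intercept)
    d_slope = np.sum(-2 * x * residuals)
    d_intercept = np.sum(-2 * residuals)
    return d_slope, d_intercept

# Toy points (hypothetical, not the dataset)
toy_x = np.array([0.0, 1.0, 2.0])
toy_y = np.array([1.0, 3.0, 5.0])

d_slope, d_intercept = ssr_gradient(toy_x, toy_y, slope=1.0, intercept=1.0)
print(d_slope, d_intercept)  # -10.0 -6.0

# Central finite differences should agree with the analytic derivatives
eps = 1e-6
num_d_slope = (ssr_loss(toy_x, toy_y, 1.0 + eps, 1.0)
               - ssr_loss(toy_x, toy_y, 1.0 - eps, 1.0)) / (2 * eps)
num_d_intercept = (ssr_loss(toy_x, toy_y, 1.0, 1.0 + eps)
                   - ssr_loss(toy_x, toy_y, 1.0, 1.0 - eps)) / (2 * eps)
```

Both derivatives are negative here, which means increasing the slope and the intercept would reduce the loss on this toy data.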

❓ Using your function, what would be the partial derivatives of each parameter if:
- a = 1
- b = 1

In [10]:
# YOUR CODE HERE

⚠️ You should be getting 48.45 and 115.17. If not, fix your function!

### 2.4 Step Sizes

$$
step\ size = gradient \cdot learning\ rate
$$

👇 Define a function that calculates the step sizes for each parameter (`a`, `b`) from their derivatives (`d_a`, `d_b`) and a `learning_rate` that defaults to 0.01.

In [11]:
def steps(d_a,d_b, learning_rate = 0.01):
    pass  # YOUR CODE HERE
    return (step_a, step_b)

❓ What would be the steps (`step_a`, `step_b`) to take for the derivatives computed above for (`a`,`b`) = (1,1)?

In [12]:
# YOUR CODE HERE

⚠️ The steps should be 0.48 for `a` and 1.15 for `b`.

### 2.5 Update parameters (a, b)

$$
updated\ parameter = old\ parameter\ value - step\ size
$$

👇 Define a function that computes the updated parameter values from the old parameter values and the step sizes.

In [13]:
def update_params(a, b, step_a, step_b):
    pass  # YOUR CODE HERE
    return a_new , b_new
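
The step-size and update formulas chain together as in the sketch below. The function names and the gradient values (-10 and -6) are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def step_sizes(d_slope, d_intercept, learning_rate=0.01):
    """Scale each partial derivative by the learning rate."""
    return d_slope * learning_rate, d_intercept * learning_rate

def apply_steps(slope, intercept, step_slope, step_intercept):
    """Move each parameter against its gradient by subtracting the step."""
    return slope - step_slope, intercept - step_intercept

# Hypothetical gradient values, not computed from the dataset
step_slope, step_intercept = step_sizes(-10.0, -6.0)
new_slope, new_intercept = apply_steps(1.0, 1.0, step_slope, step_intercept)

# The steps are about -0.1 and -0.06, so the parameters grow to about 1.1 and 1.06
print(new_slope, new_intercept)
```

A negative derivative yields a negative step, and subtracting it increases the parameter, which is the direction that lowers the loss.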

### 2.6 One full epoch

👇 Using the functions you just created, compute the updated parameters at the end of the first epoch, had you started with parameters:
- a = 1
- b = 1

In [15]:
# YOUR CODE HERE

⚠️ You should be getting the following values:
   - updated_a = 0.51
   - updated_b = -0.15

## 3. Gradient Descent

👇 Now that you have the necessary functions for a Gradient Descent, loop through epochs until convergence.

- Initialize parameters `a = 1` and `b = 1`
- Consider convergence to be reached after **100 epochs**
- Don't forget to start each new epoch with the updated parameters
- Append the values of the loss, `a`, and `b` at each epoch to lists called `loss_history`, `a_history` and `b_history`, respectively

In [16]:
# YOUR CODE HERE
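
The loop described above can be sketched on toy data whose best-fit line is known in advance. Everything here is an assumption for illustration: the points lie exactly on y = 2x + 1, the variable names are made up, and the learning rate of 0.05 was picked for this toy data rather than taken from the exercise. The sketch records the loss and parameters at the start of each epoch, which is one possible reading of the instructions.

```python
import numpy as np

# Toy points lying exactly on y = 2x + 1 (hypothetical, not the dataset)
toy_x = np.array([0.0, 1.0, 2.0])
toy_y = 2.0 * toy_x + 1.0

slope, intercept = 1.0, 1.0
learning_rate = 0.05  # assumption: chosen for this toy data only
loss_trace, slope_trace, intercept_trace = [], [], []

for epoch in range(100):
    # Evaluate the current line and record where we are
    residuals = toy_y - (slope * toy_x + intercept)
    loss_trace.append(np.sum(residuals ** 2))
    slope_trace.append(slope)
    intercept_trace.append(intercept)

    # Gradient of the SSR loss at the current parameters
    d_slope = np.sum(-2 * toy_x * residuals)
    d_intercept = np.sum(-2 * residuals)

    # Next epoch starts from the updated parameters
    slope -= learning_rate * d_slope
    intercept -= learning_rate * d_intercept

# Ends close to the true slope 2 and intercept 1
print(slope, intercept)
```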

❓ What are the parameter values `a_100` and `b_100` at the end of the 100 epochs?

In [17]:
# YOUR CODE HERE

In [18]:
# 🧪 Test your code
from nbresult import ChallengeResult
result = ChallengeResult('descent',
                         a_100=a_100,
                         b_100=b_100)
result.write()
print(result.check())

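## 4. Visual check

👇 Wrap this iterative approach into a function `gradient_descent()` that returns the final `a` and `b` together with `history`, a dictionary containing the
- `loss_history`
- `a_history`
- `b_history`

Note that the plotting cell in section 5 reads these entries as `history['loss']`, `history['a']` and `history['b']`.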

In [43]:
def gradient_descent(X, Y, a_init=1, b_init=1, learning_rate=0.001, n_epochs=100):
    pass  # YOUR CODE HERE
    return a_new, b_new, history

👇 Plot the line of best fit through Zinc and Phosphorus using the parameters of your Gradient Descent.

In [45]:
# YOUR CODE HERE

## 5. Visualize your descent

Our goal is to plot the loss function and the descent steps on a 2D surface using matplotlib [contourf](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.contourf.html).

👇 Start by creating the data we need for the plot:
- `range_a`: 100 values for `a`, equally spaced between -1 and 1
- `range_b`: 100 values for `b`, equally spaced between -1 and 1
- `Z`: a 2D array where each element `Z[j,i]` is equal to the value of the loss function at `a` = `range_a[i]` and `b` = `range_b[j]`
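
The indexing convention for the grid is easy to get backwards, so here is a minimal sketch of it on toy data. The names `grid_a`, `grid_b` and `Z_toy` and the three toy points are assumptions for illustration. Rows follow `b` and columns follow `a`, matching the `Z[j,i]` description above.

```python
import numpy as np

# Toy points (hypothetical, not the dataset)
toy_x = np.array([0.0, 1.0, 2.0])
toy_y = np.array([1.0, 3.0, 5.0])

grid_a = np.linspace(-1, 1, 100)  # candidate slopes
grid_b = np.linspace(-1, 1, 100)  # candidate intercepts

# Row index j follows grid_b, column index i follows grid_a
Z_toy = np.zeros((len(grid_b), len(grid_a)))
for j, b in enumerate(grid_b):
    for i, a in enumerate(grid_a):
        Z_toy[j, i] = np.sum((toy_y - (a * toy_x + b)) ** 2)

print(Z_toy.shape)  # (100, 100)
```

This row-for-`b`, column-for-`a` layout is the one `contourf(range_a, range_b, Z)` expects, since its first argument spans the columns of `Z` and its second spans the rows.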

In [73]:
# YOUR CODE HERE

In [74]:
# YOUR CODE HERE

👇 Now, plot in one single subplot:
- your loss function as a 2D surface using matplotlib [contourf](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.contourf.html)
- all historical (a,b) points as red dots to visualize your gradient descent!

Change your learning rate and observe its impact on the graph!

In [55]:
# YOUR CODE HERE

👇 [optional] What about 3D? Try out this [plot.ly - 3D surface plot](https://plotly.com/python/3d-surface-plots/) below.

In [56]:
import plotly.graph_objects as go

surface = go.Surface(x=range_a, y=range_b, z=Z)
scatter = go.Scatter3d(x=history['a'], y=history['b'], z=history['loss'], mode='markers')
fig = go.Figure(data=[surface, scatter])

#fig.update_layout(title='Loss Function', autosize=False, width=500, height=500)
fig.show()

👇 Plot the history of the `loss` values as a function of the number of `epochs`. Vary the `learning_rate` from 0.001 to 0.01 and make sure you understand the difference.

In [61]:
# YOUR CODE HERE

## 6. With Sklearn...

👇 Using Sklearn, train a Linear Regression model on the same data. Compare its parameters to the ones computed by your Gradient Descent.

In [71]:
# YOUR CODE HERE

They should be almost identical!

### 🏁 Congratulations! Please push your exercise when you are done.