# <center>Calculus and Gradient Descent

### Outcomes
- Calculate derivatives of polynomial and composite functions using the power, constant factor, addition, and chain rules
- Understand partial derivatives
- Understand and implement gradient descent to perform linear regression
- Gain more experience manipulating NumPy arrays

### Derivatives

In [None]:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5)   # 50 evenly spaced points from -5 to 5
y = x**2                 # NumPy squares every element at once
plt.plot(x, y)
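The plot above shows f(x) = x². Its derivative f'(x) is the slope of the tangent line at each point, defined as the limit of (f(x + h) - f(x)) / h as h shrinks to 0. A minimal sketch (the helper `numerical_derivative` is a name introduced here, not a library function) approximates that limit with a small central difference and compares it with the exact slope 2x:

```python
import numpy as np

def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return x ** 2

xs = np.array([-2.0, 0.0, 1.0, 3.0])
approx = numerical_derivative(f, xs)   # works element-wise on NumPy arrays
exact = 2 * xs                         # exact derivative of x**2
print(approx)
print(exact)
```

The two printed arrays agree up to floating-point rounding: the slope is negative left of 0, zero at the bottom of the parabola, and positive to the right.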

### Power rule
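The power rule says that for a constant n, the derivative of xⁿ is n·xⁿ⁻¹ (for example, x³ becomes 3x²). As a sanity check, the sketch below compares the rule with a central-difference approximation at x = 1.5; `numerical_derivative` and `power_rule` are illustrative helpers, not library functions.

```python
def numerical_derivative(f, x, h=1e-5):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def power_rule(n, x):
    # d/dx x**n = n * x**(n - 1)
    return n * x ** (n - 1)

x = 1.5
for n in [1, 2, 3, 5]:
    approx = numerical_derivative(lambda t: t ** n, x)
    print(n, power_rule(n, x), round(approx, 6))
```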

### Constant factor rule
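The constant factor rule: multiplying a function by a constant c multiplies its derivative by the same constant, d/dx[c·f(x)] = c·f'(x). For 7x³ the power rule gives 3x² for x³, so the derivative is 21x², which equals 84 at x = 2. A quick numerical check, using a hypothetical `numerical_derivative` helper:

```python
def numerical_derivative(f, x, h=1e-5):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

c = 7
g = lambda x: x ** 3          # g'(x) = 3x**2 by the power rule
cg = lambda x: c * g(x)       # 7x**3

x = 2.0
print(numerical_derivative(cg, x))   # approx. 84
print(c * 3 * x ** 2)                # c * g'(x) = 7 * 3 * 4 = 84
```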

### Addition rule
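The addition rule: the derivative of a sum is the sum of the derivatives, d/dx[f(x) + g(x)] = f'(x) + g'(x). Together with the fact that a constant has derivative 0, x² + 3x + 5 differentiates term by term to 2x + 3. A small check at x = 4 (the helpers are illustrative):

```python
def numerical_derivative(f, x, h=1e-5):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def h_func(x):
    return x ** 2 + 3 * x + 5    # sum of three terms

def h_prime(x):
    return 2 * x + 3             # term by term: 2x + 3 + 0

x = 4.0
print(numerical_derivative(h_func, x), h_prime(x))   # both approx. 11
```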

### Chain rule
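The chain rule handles a function nested inside another: d/dx f(g(x)) = f'(g(x))·g'(x). In words, differentiate the outer function while keeping the inner function as its argument, then multiply by the derivative of the inner function. For (2x + 1)³ the outer function is u³ and the inner one is 2x + 1, giving 3(2x + 1)²·2 = 6(2x + 1)², which is 54 at x = 1. A numerical check with the same kind of illustrative helper:

```python
def numerical_derivative(f, x, h=1e-5):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

inner = lambda x: 2 * x + 1             # g(x), with g'(x) = 2
outer = lambda u: u ** 3                # f(u), with f'(u) = 3u**2
composite = lambda x: outer(inner(x))   # (2x + 1)**3

def chain_rule_derivative(x):
    # f'(g(x)) * g'(x)
    return 3 * inner(x) ** 2 * 2

x = 1.0
print(numerical_derivative(composite, x), chain_rule_derivative(x))   # both approx. 54
```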

### Partial Derivatives
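A partial derivative differentiates a function of several variables with respect to one of them while treating the others as constants. For f(x, y) = x²y + 3y², holding y fixed gives ∂f/∂x = 2xy, and holding x fixed gives ∂f/∂y = x² + 6y; at (2, 1) these are 4 and 10. Gradient descent relies on exactly this idea, since the cost depends on both the slope and the intercept. The sketch below nudges one variable at a time (`partial_x` and `partial_y` are illustrative names):

```python
def partial_x(f, x, y, h=1e-5):
    # Change x slightly while holding y fixed (central difference)
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-5):
    # Change y slightly while holding x fixed (central difference)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def f(x, y):
    return x ** 2 * y + 3 * y ** 2

print(partial_x(f, 2.0, 1.0))   # approx. 2*x*y = 4
print(partial_y(f, 2.0, 1.0))   # approx. x**2 + 6*y = 10
```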

# Knowledge Check

\begin{align}
& f(x) = \frac{2}{3}x^9 \\ \\
& f(x) = 4x^4 - \frac{5}{3}x^3 + 2x^2 - 5x + 7 \\ \\
& f(x) = (3x^2 + 4x + 2)^3 \\ \\
& f(x, y) = 5x^3 + 4y^2 + 3x + 2y + 8
\end{align}

### Using derivatives to find minima and maxima
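At a minimum or maximum of a smooth function the tangent line is flat, so the derivative is zero there. Setting f'(x) = 0 and solving gives the candidate points, and the sign of the second derivative tells them apart: positive means a minimum, negative means a maximum. For the illustrative function f(x) = x² - 4x + 3, f'(x) = 2x - 4 = 0 gives x = 2, and f''(x) = 2 > 0, so x = 2 is a minimum with f(2) = -1. A short check against a brute-force grid search:

```python
import numpy as np

def f(x):
    return x ** 2 - 4 * x + 3

# Analytic: f'(x) = 2x - 4 = 0  ->  x = 2; f''(x) = 2 > 0, so x = 2 is a minimum
x_star = 4 / 2
print(x_star, f(x_star))        # 2.0 -1.0

# Sanity check on a grid: the smallest sampled value sits at (about) x = 2
xs = np.linspace(-5, 5, 1001)
print(xs[np.argmin(f(xs))])
```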

### <center>Cost function<br>
#### <center> A measure of error that needs to be minimized

<center> <img src='mse.jpg'>
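For a line ŷ = m·x + b fitted to N points, the mean squared error averages the squared vertical gaps between each observed y and the line: MSE = (1/N) Σ (y_i - ŷ_i)². A minimal NumPy version (the `mse` helper is a name introduced here), evaluated on the three toy points used later in this notebook for two candidate lines:

```python
import numpy as np

def mse(X, y, m, b):
    """Mean squared error of the line y_hat = m*X + b."""
    y_pred = m * X + b
    return np.mean((y - y_pred) ** 2)

X = np.array([1, 2, 3])
y = np.array([2, 3, 7])
print(mse(X, y, m=0, b=0))      # (2**2 + 3**2 + 7**2) / 3 = 62/3, about 20.667
print(mse(X, y, m=2.5, b=-1))   # residuals 0.5, -1, 0.5 -> 1.5 / 3 = 0.5
```

The second line, m = 2.5 and b = -1, is the least-squares line for these three points, so no other straight line gives an MSE below 0.5. Gradient descent is a way of reaching that minimum step by step instead of solving for it directly.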

### Gradient descent

<center><img src='grad_desc.png'>
<center><img src='gradient_desc.gif' height="500" width="500">
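Gradient descent needs the partial derivative of the cost with respect to each parameter. Writing the MSE as a function J(m, b) (J and the learning rate α are notation introduced here) and applying the chain rule to each squared term gives the two gradients that the code in this notebook computes as `D_m` and `D_b`, followed by the update that moves each parameter against its gradient:

\begin{align}
J(m, b) &= \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr)^2 \\ \\
\frac{\partial J}{\partial m} &= \frac{-2}{N}\sum_{i=1}^{N} x_i\bigl(y_i - (m x_i + b)\bigr) \\ \\
\frac{\partial J}{\partial b} &= \frac{-2}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr) \\ \\
m &\leftarrow m - \alpha\,\frac{\partial J}{\partial m}, \qquad b \leftarrow b - \alpha\,\frac{\partial J}{\partial b}
\end{align}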

### Step size and Learning Rate <br>
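The step size equals the learning rate multiplied by the derivative of the cost function at the current parameter value. Each update subtracts this step from the parameter (new value = old value - learning rate × derivative), so a steep slope produces a large step, and the steps shrink as the slope flattens near the minimum.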

<img src='learning_rate.png'>
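The effect of the learning rate is easiest to see in one dimension. The sketch below (the function `descend` and the target f(x) = (x - 3)² are illustrative choices, not part of the original material) runs 50 gradient descent steps from x = 0 with several learning rates:

```python
def descend(lr, steps=50, x0=0.0):
    """Gradient descent on f(x) = (x - 3)**2, whose minimum is at x = 3."""
    x = x0
    for _ in range(steps):
        slope = 2 * (x - 3)     # f'(x) at the current x
        step = lr * slope       # step size = learning rate * derivative
        x = x - step            # move against the slope
    return x

for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"lr={lr}: x after 50 steps = {descend(lr):.4f}")
```

For this function every update multiplies the distance to 3 by (1 - 2·lr). With lr = 0.01 x creeps toward 3 and is still well short after 50 steps, lr = 0.1 gets very close, lr = 0.9 overshoots back and forth but still converges, and lr = 1.1 overshoots further on every step and diverges.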

### Gradient Descent in Action

In [None]:
X = np.array([1,2,3])
y = np.array([2,3,7])
plt.scatter(X,y)

In [None]:
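from sklearn.metrics import mean_squared_error
m = 0
b = 0
lr = 0.1
N = len(X)  # number of data points (3); capital X, not the x from the first plot

y_pred = X*m+b  ## [0, 0, 0]
y_diff = y-y_pred ## [2, 3, 7]
D_m = (-2/N)*sum(X*y_diff) ## (-2/3)*(1*2 + 2*3 + 3*7) = -58/3
D_b = (-2/N)*sum(y_diff) ## (-2/3)*(2 + 3 + 7) = -8
m = m - lr*D_m  ## 0 - 0.1*(-58/3), about 1.933
b = b - lr*D_b  ## 0 - 0.1*(-8) = 0.8
print('Slope: ',m)
print('Intercept: ', b)
print('MSE: ',mean_squared_error(y, y_pred))  ## MSE of the predictions made before this update: 62/3, about 20.667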

In [None]:
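m = 0
b = 0
lr = 0.1
N = len(X)  # number of data points; capital X, not the x from the first plot
epochs = 100

for i in range(epochs):
    y_pred = X*m+b
    y_diff = y-y_pred
    D_m = (-2/N)*sum(X*y_diff)  # partial derivative of the MSE with respect to m
    D_b = (-2/N)*sum(y_diff)    # partial derivative of the MSE with respect to b
    m = m - lr*D_m
    b = b - lr*D_b
print('Slope: ',m)
print('Intercept: ', b)
print('MSE: ',mean_squared_error(y, X*m + b))  # MSE with the final m and b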

In [None]:
plt.scatter(X,y)
plt.plot(X, X*m + b)  # fitted line using the final m and b

In [None]:
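def gradient_descent(X, y, lr, epochs):
    """Fit y ≈ m*X + b by gradient descent on the mean squared error.

    Returns the slope m, the intercept b, and the predictions X*m + b
    made with the final parameters.
    """
    N = len(X)
    m = 0.0
    b = 0.0
    for _ in range(epochs):
        y_pred = X*m + b
        y_diff = y - y_pred
        D_m = (-2/N)*np.sum(X*y_diff)   # partial derivative of the MSE with respect to m
        D_b = (-2/N)*np.sum(y_diff)     # partial derivative of the MSE with respect to b
        m = m - lr*D_m
        b = b - lr*D_b
    return m, b, X*m + b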
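To watch gradient descent converge, a variant of `gradient_descent` can record the MSE at every epoch. The sketch below (the name `gradient_descent_with_history` is hypothetical) applies the same update rule to the three toy points and, as a cross-check, compares the result with NumPy's closed-form least-squares fit `np.polyfit(X, y, 1)`, which returns the slope and intercept of the best-fit line:

```python
import numpy as np

def gradient_descent_with_history(X, y, lr, epochs):
    """Same update rule as gradient_descent, but also records the MSE at every epoch."""
    N = len(X)
    m, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        y_pred = X * m + b
        y_diff = y - y_pred
        history.append(np.mean(y_diff ** 2))   # MSE before this epoch's update
        D_m = (-2 / N) * np.sum(X * y_diff)
        D_b = (-2 / N) * np.sum(y_diff)
        m -= lr * D_m
        b -= lr * D_b
    return m, b, history

X = np.array([1, 2, 3])
y = np.array([2, 3, 7])
m, b, history = gradient_descent_with_history(X, y, lr=0.1, epochs=2000)

# Closed-form least squares for comparison: for deg=1, polyfit returns [slope, intercept]
m_ls, b_ls = np.polyfit(X, y, 1)
print(m, b)
print(m_ls, b_ls)
print(history[0], history[-1])
```

The sketch uses 2000 epochs so the parameters settle on the least-squares line (slope 2.5, intercept -1, MSE 0.5). Rerunning it with fewer epochs and comparing `history[-1]` with 0.5 shows how far from the minimum a shorter run stops.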

### Another example

In [None]:
import pandas as pd
df = pd.read_csv('graduate_admission_data.csv')
df.head()

In [None]:
X = np.array(df['GRE Score'])
y = np.array(df['Chance of Admit '])  # the column name in this CSV ends with a trailing space

In [None]:
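# GRE scores are in the hundreds (the scale runs from 260 to 340), so with lr=0.1 each update
# overshoots and the parameters grow until they overflow: expect nan/inf values and RuntimeWarnings.
m,b,y_pred = gradient_descent(X,y,0.1,100)
print('Slope: ',m)
print('Intercept: ', b)
# sklearn's mean_squared_error rejects NaN/inf inputs, so compute the MSE directly here
print('MSE: ', np.mean((y - y_pred)**2))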

In [None]:
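from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Rescale every column (features and target) to the range [0, 1]
df[df.columns] = scaler.fit_transform(df[df.columns])
X = np.array(df['GRE Score'])
y = np.array(df['Chance of Admit '])  # y is scaled too, so the MSE below is in scaled units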
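With its default `feature_range=(0, 1)`, `MinMaxScaler` rescales each column independently as (x - min) / (max - min), so the smallest value maps to 0 and the largest to 1. This is why scaling helps here: the gradient for m multiplies the errors by the x values, so x values in the hundreds make each step with lr = 0.1 far too large, whereas inputs in [0, 1] keep the steps at a workable size. A plain NumPy sketch of the formula on hypothetical GRE-style scores (`min_max_scale` is a name introduced here):

```python
import numpy as np

def min_max_scale(col):
    """Rescale a 1-D array to [0, 1] using (x - min) / (max - min)."""
    return (col - col.min()) / (col.max() - col.min())

gre_like = np.array([300.0, 320.0, 340.0])   # hypothetical GRE-style scores
print(min_max_scale(gre_like))               # -> 0, 0.5, 1
```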

In [None]:
m,b,y_pred = gradient_descent(X,y,0.1,2000)
print('Slope: ',m)
print('Intercept: ', b)
print('MSE: ',mean_squared_error(y, y_pred))

In [None]:
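import statsmodels.api as sm
# Ordinary least squares on the same scaled data; add_constant adds the intercept column
result = sm.OLS(y, sm.add_constant(X)).fit()
# result.mse_resid divides the residual sum of squares by the residual degrees of freedom (n - 2 here),
# so use the plain mean of the squared residuals to compare with mean_squared_error above
print('MSE: ', np.mean(result.resid**2))
result.summary()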

## Activity
Building on the gradient descent function we wrote for simple linear regression, implement a function that performs multiple linear regression. <br><br>
Instead of taking in one feature (x), the function should be able to handle any number of features (x1, x2, x3...xi).

To do this, you will need to consider the following:
- <b>m</b> will now be an array instead of a scalar value, holding one slope for each feature.
- <b>y_pred</b> can be found by matrix multiplication of the <b>X</b> matrix and the <b>m</b> array.
- Similar to the calculation of <b>y_pred</b>, the derivatives with respect to each element of <b>m</b> can be calculated with matrix multiplication. <br><b>Note:</b> Remember that the inner dimensions of two matrices must match in order to multiply them, and that you can transpose a matrix (e.g. <b>X.T</b>) to get it into the right shape.
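The shape rules from the note can be checked on a small hypothetical array before writing the full function. With N samples and k features, X has shape (N, k) and m has shape (k,), so `X @ m` produces one prediction per sample. Multiplying X by an (N,)-shaped vector of errors fails, while the transpose `X.T` (shape (k, N)) lines the dimensions up and yields one value per feature:

```python
import numpy as np

# Hypothetical toy data: N = 4 samples, k = 3 features
X = np.arange(12, dtype=float).reshape(4, 3)   # shape (N, k)
m = np.ones(3)                                 # one slope per feature, shape (k,)
y = np.zeros(4)                                # shape (N,)

y_pred = X @ m            # (N, k) @ (k,) -> (N,): one prediction per sample
y_diff = y - y_pred       # (N,)

try:
    X @ y_diff            # (N, k) @ (N,): inner dimensions k and N do not match
except ValueError as err:
    print("shape mismatch:", err)

per_feature = X.T @ y_diff   # (k, N) @ (N,) -> (k,): one sum per feature
print(X.shape, m.shape, y_pred.shape, per_feature.shape)
```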

After completing your multiple gradient descent function, run it on all the features of the Graduate Admissions dataset. <br> <br>
Feel free to experiment with different values for the learning rate and number of epochs. <br> <br>
Then, compare your results to an OLS regression. <br> <br>
What seems to be the best combination of learning rate and number of epochs for this data?

In [None]:
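def multiple_gradient_descent(X,y,lr,epochs):
    # TODO: implement multiple linear regression with gradient descent, following the hints above
    raise NotImplementedError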