<h2 style="text-align: center;"><strong>Segment 3: Automatic Differentiation</strong></h2>

* AutoDiff with PyTorch and TensorFlow 2
* Machine Learning via Differentiation 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch 
import tensorflow as tf

## **AutoDiff with PyTorch and TensorFlow 2**

**PyTorch** and **TensorFlow** are the two most widely used deep learning libraries, and both have automatic differentiation built in.

Let's use them to calculate $dy/dx$ at $x = 5$ where: 

$$y = x^2$$

$$ \frac{dy}{dx} = 2x = 2(5) = 10 $$
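You can cross-check the derivative above without any autodiff library: a central finite difference should give a value very close to 10. This is a minimal sketch. The helper names and the step size `h = 1e-5` are illustrative choices and are not part of either library.

```python
def f(x):
    return x ** 2

def central_difference(func, x, h=1e-5):
    # Approximate df/dx with the symmetric difference quotient
    return (func(x + h) - func(x - h)) / (2 * h)

approx = central_difference(f, 5.0)
print(approx)  # very close to 10
```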

##### **AutoDiff with PyTorch**

In [None]:
x = torch.tensor(5.0)
x

*Contagiously track gradients through the forward pass: any tensor computed from `x` will also record the operations needed for backpropagation*

In [None]:
x.requires_grad_() 

In [None]:
y = x**2

*Use autodiff to compute $dy/dx$. PyTorch stores the result in `x.grad`*

In [None]:
y.backward()

In [None]:
x.grad
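PyTorch *adds* each new gradient to whatever is already stored in `.grad` and does not overwrite it. The sketch below shows a second backward pass doubling the stored value, which is why gradients must be reset between training iterations. It uses a hypothetical fresh tensor `x_demo` so the notebook's `x` is left untouched.

```python
import torch

x_demo = torch.tensor(5.0, requires_grad=True)

y_demo = x_demo ** 2
y_demo.backward()
first = x_demo.grad.item()        # dy/dx at x = 5

y_demo = x_demo ** 2              # rebuild the graph (it is freed after backward)
y_demo.backward()
accumulated = x_demo.grad.item()  # the new gradient is ADDED to the stored one

x_demo.grad.zero_()               # reset before any further backward pass
reset = x_demo.grad.item()
print(first, accumulated, reset)  # 10.0 20.0 0.0
```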

##### **AutoDiff with TensorFlow**

In [None]:
x = tf.Variable(5.0)

*Record the forward pass on a gradient tape*

In [None]:
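with tf.GradientTape() as t:
    t.watch(x)  # optional here: trainable tf.Variable objects are watched automatically; watch() is needed for tf.constant tensors
    y = x**2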

*Use autodiff to compute $dy/dx$*

In [None]:
t.gradient(y, x) 

---

## **Machine Learning via Differentiation**

*We use PyTorch’s automatic differentiation to fit a straight line to a very small dataset of just eight data points. This time, we solve the regression problem with autodiff instead of the* **Moore–Penrose pseudoinverse** *approach covered earlier in* **Phase 3, Course 3: Linear Algebra for Machine Learning.**

$x$ *represents the **dosage levels of a drug** administered to patients in a study on Alzheimer's disease.*

In [None]:
x = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7.])
x

$y$ *represents the patients' forgetfulness scores corresponding to each drug dosage.*

In [None]:
y = torch.tensor([1.86, 1.31, .62, .33, .09, -.67, -1.23, -1.37]) 
y

**Plot data points**

In [None]:
fig, ax = plt.subplots()
plt.title("Clinical Trial",fontweight="bold")
plt.xlabel("Drug dosage (mL)")
plt.ylabel("Forgetfulness")
ax.scatter(x, y)
plt.show()

*The target $y$ values come from a linear equation $y = mx + b$ with random, normally distributed noise added to simulate sampling variability. Because of the noise, no single line passes through every point. The best we can do is find the $m$ and $b$ that minimize the cost. The values $m = 0.9$ and $b = 0.1$ set below are not the true parameters. They are arbitrary initial guesses: note their positive slope, while the data clearly slope downward. We will use PyTorch’s automatic differentiation (autodiff) to move these guesses toward the best-fitting values.*
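As a reference point, you can compute the least-squares line in closed form with the Moore–Penrose pseudoinverse approach mentioned above. This sketch uses NumPy rather than PyTorch. `x_np`, `y_np`, and `X` are hypothetical names chosen so the notebook's tensors are not overwritten. Gradient descent on the MSE cost should move $m$ and $b$ toward these same values.

```python
import numpy as np

# Same eight observations as above
x_np = np.arange(8, dtype=float)
y_np = np.array([1.86, 1.31, .62, .33, .09, -.67, -1.23, -1.37])

# Design matrix: one column for the slope, one column of ones for the intercept
X = np.column_stack([x_np, np.ones_like(x_np)])

# Least-squares parameters via the Moore-Penrose pseudoinverse
m_ls, b_ls = np.linalg.pinv(X) @ y_np
print(f"m = {m_ls:.3f}, b = {b_ls:.3f}")  # m = -0.469, b = 1.760
```

The fitted slope is negative, so the initial guesses of $m = 0.9$ and $b = 0.1$ start far from the best fit.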

In [None]:
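import torch.nn as nn
m = nn.Parameter(torch.tensor(0.9))  # initial guess for the slope; nn.Parameter sets requires_grad=True
m

In [None]:
b = nn.Parameter(torch.tensor(0.1))  # initial guess for the intercept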
b

*We define a simple regression function that computes the predicted value $\hat{y}$ for a given input $x$ using the linear model $\hat{y} = mx + b$.*

In [None]:
def regression(m,x,b):
    return m*x + b

*Define a function that plots the noisy data together with the current regression line*

In [None]:
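def regression_plot(x, y, m, b):
    fig, ax = plt.subplots()

    ax.scatter(x, y)  # observed data points
    # Evaluate the current line at the left and right edges of the x-axis
    x_min, x_max = ax.get_xlim()
    y_min = regression(m, x_min, b).item()
    y_max = regression(m, x_max, b).item()
    ax.plot([x_min, x_max], [y_min, y_max])  # draw the regression line

    return ax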

In [None]:
regression_plot(x,y,m,b)

*Machine Learning*

**Step 1**: Forward pass

In [None]:
yhat = regression(m,x,b)
yhat

**Step 2**: Compare $\hat{y}$ with true $y$ to calculate cost $C$

> PyTorch has a built-in **MSELoss** class (`torch.nn.MSELoss`), but let's define the cost ourselves to see how it works. MSE cost is defined by: $$C = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i-y_i)^2 $$

In [None]:
def mse(yhat, y): 
    sigma = torch.sum((yhat - y)**2)
    return sigma/len(y)
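To check the hand-written cost, compare it against PyTorch's built-ins, `torch.nn.MSELoss` and `torch.nn.functional.mse_loss`. Both average the squared errors by default (`reduction='mean'`). The toy tensors below are made up purely for the comparison.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mse(yhat, y):
    # Mean of squared errors, as defined above
    return torch.sum((yhat - y) ** 2) / len(y)

yhat_demo = torch.tensor([1.0, 2.0, 3.0])
y_demo = torch.tensor([1.5, 2.0, 2.0])

ours = mse(yhat_demo, y_demo)
builtin_module = nn.MSELoss()(yhat_demo, y_demo)    # class-based API
builtin_functional = F.mse_loss(yhat_demo, y_demo)  # functional API
print(ours.item(), builtin_module.item(), builtin_functional.item())
```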

In [None]:
C = mse(yhat, y)
C

**Step 3**: Use Autodiff to calculate gradient of $C$ w.r.t. parameters

In [None]:
C.backward()

In [None]:
m.grad

In [None]:
b.grad
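For this model the gradients also have a closed form: $\frac{\partial C}{\partial m} = \frac{2}{n}\sum_{i=1}^n (\hat{y}_i - y_i)\,x_i$ and $\frac{\partial C}{\partial b} = \frac{2}{n}\sum_{i=1}^n (\hat{y}_i - y_i)$. The sketch below checks that autodiff reproduces them, using the same data and starting values. It uses hypothetical names (`x_d`, `m_d`, and so on) so the notebook's parameters are not disturbed.

```python
import torch

x_d = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7.])
y_d = torch.tensor([1.86, 1.31, .62, .33, .09, -.67, -1.23, -1.37])
m_d = torch.tensor(0.9, requires_grad=True)
b_d = torch.tensor(0.1, requires_grad=True)

# Forward pass and MSE cost, then autodiff
yhat_d = m_d * x_d + b_d
C_d = torch.mean((yhat_d - y_d) ** 2)
C_d.backward()

# Analytic gradients, computed without recording a graph
n = len(y_d)
with torch.no_grad():
    residual = yhat_d - y_d
    dC_dm = 2 / n * torch.sum(residual * x_d)
    dC_db = 2 / n * torch.sum(residual)

print(m_d.grad.item(), dC_dm.item())
print(b_d.grad.item(), dC_db.item())
```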

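**Step 4**: Gradient descent (move each parameter a small step, scaled by the learning rate `lr`, in the direction opposite to its gradient)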

In [None]:
optimizer = torch.optim.SGD([m, b], lr=0.01)

In [None]:
optimizer.step()
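With its default settings (no momentum, no weight decay), `torch.optim.SGD` applies the plain update $p \leftarrow p - \text{lr} \cdot \partial C / \partial p$ to every parameter. The sketch below checks this on a single made-up parameter `p` and a toy loss.

```python
import torch

p = torch.tensor(0.9, requires_grad=True)
loss = (3.0 * p - 1.0) ** 2          # toy loss; its gradient is 6 * (3p - 1)
loss.backward()

lr = 0.01
expected = (p - lr * p.grad).item()  # plain SGD rule computed by hand

opt = torch.optim.SGD([p], lr=lr)
opt.step()                           # updates p in place
print(p.item(), expected)
```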

*Let's confirm the parameters have been adjusted sensibly: both gradients were positive, so both $m$ and $b$ should have decreased*

In [None]:
m

In [None]:
b

In [None]:
regression_plot(x, y, m, b)

*We can repeat Steps 1 and 2 to confirm the cost has decreased*

In [None]:
C = mse(regression(m,x,b), y)
C

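*Put the 4 steps in a loop to minimize the cost iteratively. Because the data are noisy, the cost will not reach zero. Instead, it levels off at the smallest value a straight line can achieve:*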

In [None]:
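epochs = 1000
for epoch in range(epochs):

    optimizer.zero_grad()  # reset gradients accumulated in the previous iteration

    yhat = regression(m,x,b)  # Step 1: forward pass
    C = mse(yhat, y)  # Step 2: cost

    C.backward()  # Step 3: gradients via autodiff
    optimizer.step()  # Step 4: gradient descent update

    # Report progress every 100 epochs (and on the final epoch)
    if epoch % 100 == 0 or epoch == epochs - 1:
        print(f'Epoch {epoch}, cost {C.item():.3g}, m grad {m.grad.item():.3g}, b grad {b.grad.item():.3g}')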
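You can write the same four steps without `torch.optim` by updating the parameters by hand. This is a minimal sketch with hypothetical names (`x_t`, `m_t`, and so on). It runs 2,000 epochs, an arbitrary choice that is more than the loop above, so the slowly converging intercept has time to settle. The updates happen inside `torch.no_grad()` so they are not recorded in the graph, and the gradients are zeroed manually at the end of each iteration.

```python
import torch

x_t = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7.])
y_t = torch.tensor([1.86, 1.31, .62, .33, .09, -.67, -1.23, -1.37])
m_t = torch.tensor(0.9, requires_grad=True)
b_t = torch.tensor(0.1, requires_grad=True)
lr = 0.01

for epoch in range(2000):
    cost = torch.mean((m_t * x_t + b_t - y_t) ** 2)  # Steps 1-2: forward pass and MSE
    cost.backward()                                   # Step 3: autodiff
    with torch.no_grad():                             # Step 4: update without recording
        m_t -= lr * m_t.grad
        b_t -= lr * b_t.grad
    m_t.grad.zero_()                                  # avoid gradient accumulation
    b_t.grad.zero_()

print(f"m = {m_t.item():.3f}, b = {b_t.item():.3f}")
```

The learned values should agree with the closed-form least-squares fit to about three decimal places ($m \approx -0.469$, $b \approx 1.760$).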

In [None]:
regression_plot(x, y, m, b)

---