# Calculus 1 - Exercises

In [3]:
# import libraries
import torch
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

## 1.
Use PyTorch (or TensorFlow, if you like) to find the slope of y = $x^{2} + 2x + 2$ where x = 2.

### Autodiff with PyTorch

In [7]:
# declare the variable x as a scalar float tensor
x = torch.tensor(2.0)
x

tensor(2.)

In [9]:
x.requires_grad_() # track gradients through forward pass

tensor(2., requires_grad=True)

In [11]:
# let us declare the equation
y = x**2 + 2*x + 2

In [13]:
y.backward() # use autodiff (backward pass)

In [15]:
x.grad

tensor(6.)
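
The result can be checked by hand: the derivative is $\frac{dy}{dx} = 2x + 2$, which equals 6 at x = 2. As an independent sanity check, here is a minimal sketch in plain Python that approximates the same slope with a central finite difference. The helper name `slope_at` and the step size `h` are illustrative assumptions, not part of the original notebook.

```python
def f(x):
    """The curve from exercise 1: y = x^2 + 2x + 2."""
    return x**2 + 2*x + 2

def slope_at(func, x, h=1e-5):
    """Approximate the slope of func at x with a central difference (hypothetical helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

approx_slope = slope_at(f, 2.0)   # numerical estimate
exact_slope = 2 * 2.0 + 2         # analytic derivative 2x + 2 evaluated at x = 2

print(approx_slope, exact_slope)
```

Both values should agree with the `tensor(6.)` returned by autodiff, up to floating-point error in the numerical estimate.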

### Autodiff with TensorFlow

In [20]:
import tensorflow as tf

In [21]:
x_tf = tf.Variable(2.0)
x_tf

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>

In [24]:
with tf.GradientTape() as t:
    t.watch(x_tf) # track forward pass (optional here: a trainable tf.Variable is watched automatically)
    y_tf = x_tf**2 + 2*x_tf + 2

In [26]:
t.gradient(y_tf, x_tf) # use autodiff

<tf.Tensor: shape=(), dtype=float32, numpy=6.0>

## 2.

Use the _Regression in PyTorch_ notebook to simulate a new linear relationship between y and x, and then fit parameters _m_ and _b_.

In [31]:
m = torch.tensor([0.9]).requires_grad_()
m

tensor([0.9000], requires_grad=True)

In [33]:
b = torch.tensor([0.1]).requires_grad_()
b

tensor([0.1000], requires_grad=True)

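The simulated relationship here is y = $x^{2} + 2x + 2$ (plus noise). The $x^{2}$ term is fixed in the model, so only _m_ and _b_ are fitted; their true values are both 2.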

In [38]:
def regression(my_x, my_m, my_b):
    return my_x**2 + my_m*my_x + my_b

### Machine Learning

In four easy steps :)

**Step 1**: Forward pass

In [41]:
yhat = regression(x, m, b)
yhat

tensor([5.9000], grad_fn=<AddBackward0>)

**Step 2**: Compare $\hat{y}$ with true $y$ to calculate cost $C$

PyTorch provides an `MSELoss` class, but let's define the cost ourselves to see how it works. MSE cost is defined by: $$C = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i-y_i)^2 $$

In [45]:
y = x**2 + 2*x + 2 + torch.normal(mean=torch.zeros(8), std=0.2)

def mse(my_yhat, my_y):
    sigma = torch.sum((my_yhat - my_y)**2)
    return sigma/len(my_y)

In [47]:
C = mse(yhat, y)
C

tensor(16.5845, grad_fn=<DivBackward0>)

**Step 3**: Use autodiff to calculate the gradient of $C$ w.r.t. parameters

In [52]:
C.backward()

In [54]:
m.grad

tensor([-16.2755])

In [56]:
b.grad

tensor([-8.1378])
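
These gradients follow from the chain rule applied to the MSE cost: $\frac{\partial C}{\partial m} = \frac{2}{n}\sum_i (\hat{y}_i - y_i)\,x$ and $\frac{\partial C}{\partial b} = \frac{2}{n}\sum_i (\hat{y}_i - y_i)$. Because x = 2 here, the gradient for _m_ is twice the gradient for _b_, as the values above show. The sketch below reproduces the arithmetic in plain Python. It assumes noise-free targets (every y equal to 10) in place of the notebook's random noise, so its numbers are close to, but not identical to, the outputs above. All variable names are illustrative.

```python
x_val = 2.0                 # the single x value used in this notebook
m_val, b_val = 0.9, 0.1     # initial parameters, as above
n = 8

# Assumption: noise-free targets, y = x^2 + 2x + 2 = 10 for every sample
y_vals = [x_val**2 + 2*x_val + 2] * n

yhat_val = x_val**2 + m_val*x_val + b_val   # forward pass: 5.9

cost = sum((yhat_val - y_i)**2 for y_i in y_vals) / n
grad_m = sum(2 * (yhat_val - y_i) * x_val for y_i in y_vals) / n
grad_b = sum(2 * (yhat_val - y_i) for y_i in y_vals) / n

print(cost, grad_m, grad_b)  # approximately 16.81, -16.4, -8.2
```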

**Step 4**: Gradient descent

In [59]:
optimizer = torch.optim.SGD([m, b], lr=0.01)

In [61]:
optimizer.step()

Confirm parameters have been adjusted sensibly:

In [66]:
m

tensor([1.0628], requires_grad=True)

In [68]:
b

tensor([0.1814], requires_grad=True)
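
A single SGD step without momentum applies the update $\theta \leftarrow \theta - \text{lr} \cdot \nabla_\theta C$ to each parameter. As a quick check, the sketch below redoes that arithmetic in plain Python using the gradient values printed in Step 3. The variable names are illustrative.

```python
lr = 0.01

m_old, b_old = 0.9, 0.1
m_grad, b_grad = -16.2755, -8.1378   # gradients reported by autodiff in Step 3

# Gradient descent update: move each parameter against its gradient
m_new = m_old - lr * m_grad
b_new = b_old - lr * b_grad

print(round(m_new, 4), round(b_new, 4))  # 1.0628 0.1814
```

The results match the updated tensors shown above.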

We can repeat steps 1 and 2 to confirm cost has decreased:

In [71]:
C = mse(regression(x, m, b), y)
C

tensor(13.4389, grad_fn=<DivBackward0>)

Put the 4 steps in a loop to iteratively minimize cost toward zero:

In [78]:
y = x**2 + 2*x + 2 + torch.normal(mean=torch.zeros(8), std=0.2)

epochs = 1000

for epoch in range(epochs):

    optimizer.zero_grad() # reset gradients to zero; else they accumulate

    yhat = regression(x, m, b) # Step 1
    C = mse(yhat, y) # Step 2

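    # retain_graph=True is needed because y was computed from x, which tracks
    # gradients, so y's graph is reused (not rebuilt) on every epoch
    C.backward(retain_graph=True) # Step 3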
    optimizer.step() # Step 4

    print(f'Epoch {epoch}, cost {C.item():.3g}, m grad {m.grad.item():.3g}, b grad {b.grad.item():.3g}')

Epoch 0, cost 11.3, m grad -13.4, b grad -6.71
Epoch 1, cost 9.13, m grad -12.1, b grad -6.04
Epoch 2, cost 7.39, m grad -10.9, b grad -5.43
Epoch 3, cost 5.99, m grad -9.78, b grad -4.89
Epoch 4, cost 4.86, m grad -8.8, b grad -4.4
Epoch 5, cost 3.94, m grad -7.92, b grad -3.96
Epoch 6, cost 3.19, m grad -7.13, b grad -3.56
Epoch 7, cost 2.59, m grad -6.42, b grad -3.21
Epoch 8, cost 2.1, m grad -5.77, b grad -2.89
Epoch 9, cost 1.7, m grad -5.2, b grad -2.6
Epoch 10, cost 1.38, m grad -4.68, b grad -2.34
Epoch 11, cost 1.12, m grad -4.21, b grad -2.1
Epoch 12, cost 0.912, m grad -3.79, b grad -1.89
Epoch 13, cost 0.742, m grad -3.41, b grad -1.71
Epoch 14, cost 0.604, m grad -3.07, b grad -1.53
Epoch 15, cost 0.492, m grad -2.76, b grad -1.38
Epoch 16, cost 0.401, m grad -2.49, b grad -1.24
Epoch 17, cost 0.328, m grad -2.24, b grad -1.12
Epoch 18, cost 0.268, m grad -2.01, b grad -1.01
Epoch 19, cost 0.22, m grad -1.81, b grad -0.906
Epoch 20, cost 0.181, m grad -1.63, b grad -0.815

In [82]:
m.item()

2.5520095825195312

In [84]:
b.item()

0.926004946231842
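
Note that the fitted values satisfy $2m + b \approx 6$ (here $2 \times 2.552 + 0.926 \approx 6.03$) rather than recovering _m_ = 2 and _b_ = 2. With only one x value (x = 2), the model output $4 + 2m + b$ depends on the parameters only through $2m + b$, so the data cannot separate them. Below is a minimal sketch of how several distinct x values resolve this. It is written in plain Python with hand-derived gradients instead of PyTorch autodiff, and the x grid, the noise-free targets, and the epoch count are assumptions chosen for illustration.

```python
xs = [float(i) for i in range(8)]          # eight distinct x values (assumption)
ys = [xi**2 + 2*xi + 2 for xi in xs]       # noise-free targets (assumption)

m_fit, b_fit = 0.9, 0.1                    # same starting point as above
lr = 0.01
n = len(xs)

for epoch in range(5000):
    # Steps 1 and 2: forward pass and residuals that make up the MSE cost
    residuals = [(xi**2 + m_fit*xi + b_fit) - yi for xi, yi in zip(xs, ys)]
    # Step 3: gradients of the MSE cost, derived by hand
    grad_m = 2 * sum(r * xi for r, xi in zip(residuals, xs)) / n
    grad_b = 2 * sum(residuals) / n
    # Step 4: gradient descent update
    m_fit -= lr * grad_m
    b_fit -= lr * grad_b

print(round(m_fit, 3), round(b_fit, 3))
```

With distinct x values both parameters converge toward 2. In the PyTorch version, the analogous change would be to make x a tensor holding several values rather than the scalar 2.0.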

## 3.
Read about how _differentiable programming_, wherein computer programs can be differentiated, could become common soon (perhaps thanks to _quantum ML_; see pennylane.ai).