__content:__
1. [Autograd in PyTorch](#AutogradPyTorch)
2. [Using Autograd for Polynomial Regression](#AutogradPolynomialRegression)
3. [Using Autograd to Solve a Math Puzzle](#AutogradSolvePuzzle)

# <h2 style="color: blue;"> 1. Autograd in PyTorch <a id='AutogradPyTorch'></a></h2>

<font color='#808080'>
    
In PyTorch, you can create tensors as variables or constants and build an expression with
them. The expression is essentially a function of the variable tensors, so you can compute
its derivative, i.e., its gradient, with respect to them. You can create a __constant tensor__ as follows:

In [1]:
import torch
x = torch.tensor([1, 2, 3])
print(x)
print(x.shape)
print(x.dtype)

tensor([1, 2, 3])
torch.Size([3])
torch.int64


<font color='#808080'>

This tensor is not treated as a variable of a function, in the sense that differentiation
with respect to it is not supported. You can create tensors that work like a __variable__ with the extra option __requires_grad = True__:

In [2]:
x = torch.tensor([1., 2., 3.], requires_grad = True)
print(x)
print(x.shape)
print(x.dtype)

tensor([1., 2., 3.], requires_grad=True)
torch.Size([3])
torch.float32
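
As a hedged illustration of the difference between the two kinds of tensor, the sketch below checks the `requires_grad` flag and shows that an integer tensor cannot be marked as a variable. The names `const`, `var`, and `int_grad_allowed` are illustrative and do not appear in the original notebook.

```python
import torch

# A tensor created without requires_grad is treated as a constant
const = torch.tensor([1, 2, 3])
print(const.requires_grad)

# A floating-point tensor can be marked as a variable
var = torch.tensor([1., 2., 3.], requires_grad=True)
print(var.requires_grad)

# Integer tensors cannot require gradients; PyTorch raises a RuntimeError
try:
    torch.tensor([1, 2, 3], requires_grad=True)
    int_grad_allowed = True
except RuntimeError as err:
    int_grad_allowed = False
    print("RuntimeError:", err)
```

This is why the variable version uses floating-point values such as `1.` rather than `1`.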


In [3]:
x = torch.tensor(3.6, requires_grad = True)
y = x * x
y.backward()
print("x =", x)
print("y =", y)
print("x.grad =", x.grad)

x = tensor(3.6000, requires_grad=True)
y = tensor(12.9600, grad_fn=<MulBackward0>)
x.grad = tensor(7.2000)


<font color='#808080'>

The code above does the following: it defines a variable $x$ (with value $3.6$) and then computes
$y = x \times x$, i.e., $y = x ^ {2}$. Calling __y.backward()__ then asks for the derivative of $y$. Since $y$ obtained its value from $x$, the derivative $\frac {dy}{dx}$ is available at __x.grad__, in the form of a tensor, immediately after __y.backward()__ runs. Since $y = x^{2}$ means $y^{'} = 2x$, the output is $3.6 \times 2 = 7.2$.
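
The same mechanism extends to tensors with more than one element. As a minimal sketch (the names `v` and `s` are illustrative, not from the notebook), the sum of squares $s = \sum_i v_i^{2}$ has the gradient $\frac{\partial s}{\partial v_i} = 2 v_i$, which autograd should reproduce:

```python
import torch

v = torch.tensor([1., 2., 3.], requires_grad=True)
s = (v * v).sum()     # scalar: 1 + 4 + 9
s.backward()          # fills v.grad with ds/dv
print("s =", s.item())
print("v.grad =", v.grad)   # should equal 2 * v
```

Note that __backward()__ without arguments requires a scalar result, which is why the squares are summed before it is called.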

# <h2 style="color: blue;"> 2. Using Autograd for Polynomial Regression <a id='AutogradPolynomialRegression'></a></h2>

<font color='#808080'>

Let’s consider an example. You can build a polynomial $f(x) = x^{2} + 2x + 3$ in __NumPy__ as
follows:

In [4]:
import numpy as np
polynomial = np.poly1d([1, 2, 3])
print(polynomial)

   2
1 x + 2 x + 3


<font color='#808080'>

You may use the polynomial as a function, such as:
```python
print(polynomial(1.5))
```
This prints $8.25$, since $(1.5)^{2} + 2 \times (1.5) + 3 = 8.25$.
Now you can generate a number of samples from this function using __NumPy__. Both $X$ and $Y$ are NumPy arrays of shape $(20, 1)$, and they are related by $y = f(x)$ for the polynomial $f(x)$:


In [6]:
print(polynomial(1.5))
N = 20 # number of samples
# Generate random samples roughly between -10 and +10
X = np.random.randn(N,1) * 5
Y = polynomial(X)
print(Y)

8.25
[[  2.19214872]
 [ 13.96736855]
 [ 25.16547915]
 [  3.83473191]
 [ 35.09627477]
 [  2.03643278]
 [  3.56013023]
 [ 85.10614604]
 [  3.85798294]
 [118.68877768]
 [151.35571859]
 [  3.19670162]
 [  5.01118283]
 [  7.27846113]
 [  3.32798729]
 [  8.4853999 ]
 [ 13.79119342]
 [  8.55030693]
 [ 16.8655217 ]
 [  4.38718173]]
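
Because the samples are random, the printed values differ from run to run. The sketch below is one possible way to make the sampling reproducible and to check the stated shapes. It assumes a seeded generator from `np.random.default_rng`, which the original notebook does not use, and the names `rng`, `X_demo`, and `Y_demo` are illustrative.

```python
import numpy as np

polynomial = np.poly1d([1, 2, 3])
N = 20
rng = np.random.default_rng(seed=42)        # fixed seed: an assumption for reproducibility
X_demo = rng.standard_normal((N, 1)) * 5    # same distribution as np.random.randn(N, 1) * 5
Y_demo = polynomial(X_demo)                 # the polynomial is evaluated element-wise

print(X_demo.shape, Y_demo.shape)
# Each y should equal x^2 + 2x + 3 for the matching x
print(np.allclose(Y_demo, X_demo**2 + 2 * X_demo + 3))
```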


<font color='#808080'>

Now, assume you do not know what the polynomial is, except that it is quadratic, and you
want to recover its coefficients. Since a quadratic polynomial has the form $Ax^{2} + Bx + C$, you have three unknowns to find. You can find them using a gradient descent algorithm
that you implement yourself or an existing gradient descent optimizer. The following demonstrates how it
works:

In [7]:
# Assume samples X and Y are prepared elsewhere
XX = np.hstack([X*X, X, np.ones_like(X)])
w = torch.randn(3, 1, requires_grad = True) # the 3 coefficients
x = torch.tensor(XX, dtype = torch.float32) # input sample
y = torch.tensor(Y, dtype = torch.float32) # output sample
optimizer = torch.optim.NAdam([w], lr = 0.01)
print(w)
for _ in range(1000):
    y_pred = x @ w
    mse = torch.mean(torch.square(y - y_pred))
    optimizer.zero_grad()
    mse.backward()
    optimizer.step()
print(w)

tensor([[-1.1114],
        [-1.0434],
        [-0.5185]], requires_grad=True)
tensor([[1.0111],
        [1.9969],
        [2.0004]], requires_grad=True)


<font color='#808080'>

What the above code does is the following:
1. It creates a variable vector $w$ of $3$ values (the coefficients $A, B, C$).
2. It creates an array of shape $(N, 3)$ using __np.hstack()__, where $N$ is the number of samples in the array $X$. This array has 3 columns: the values of $x^{2}$, $x$, and $1$, respectively.
3. It builds the tensor $x$ from this array and the tensor $y$ from the NumPy array $Y$.
4. It uses a for loop to run gradient descent for 1,000 iterations. In each
iteration, it computes the matrix product $x w$ to find $Ax^2 + Bx + C$ for every sample and assigns it to the
variable $y_{pred}$. It then compares $y$ and $y_{pred}$ to find the mean squared error.
5. It derives the gradient of the error using the __backward()__ function and, based on this gradient, updates $w$ via the optimizer. The call to __optimizer.zero_grad()__ clears the gradient left over from the previous iteration before the new one is computed.
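
To make step 2 concrete, the following sketch checks that multiplying the $(N, 3)$ array by the true coefficients reproduces $Y$. It also computes a reference answer with `np.linalg.lstsq`, a closed-form least-squares solve rather than gradient descent, which can be useful for judging how close the optimizer got. The seed, `true_w`, and `w_ref` are assumptions introduced here for illustration.

```python
import numpy as np

polynomial = np.poly1d([1, 2, 3])
rng = np.random.default_rng(seed=0)          # seed chosen only for reproducibility
X = rng.standard_normal((20, 1)) * 5
Y = polynomial(X)

# Columns are x^2, x and 1, so each row times (A, B, C) gives Ax^2 + Bx + C
XX = np.hstack([X * X, X, np.ones_like(X)])
print(XX.shape)

true_w = np.array([[1.0], [2.0], [3.0]])
print(np.allclose(XX @ true_w, Y))           # the true coefficients reproduce Y

# Reference answer from a least-squares solve (not gradient descent)
w_ref, *_ = np.linalg.lstsq(XX, Y, rcond=None)
print(w_ref.ravel())
```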

In [8]:
polynomial = np.poly1d([1, 2, 3])
N = 20 # number of samples
# Generate random samples roughly between -10 and +10
X = np.random.randn(N,1) * 5
Y = polynomial(X)
# Prepare input as an array of shape (N,3)
XX = np.hstack([X*X, X, np.ones_like(X)])
# Prepare tensors
w = torch.randn(3, 1, requires_grad = True) # the 3 coefficients
x = torch.tensor(XX, dtype = torch.float32) # input sample
y = torch.tensor(Y, dtype = torch.float32) # output sample
optimizer = torch.optim.NAdam([w], lr = 0.01)
print(w)
# Run optimizer
for _ in range(1000):
    optimizer.zero_grad()
    y_pred = x @ w
    mse = torch.mean(torch.square(y - y_pred))
    mse.backward()
    optimizer.step()
print(w)

tensor([[-0.6290],
        [-0.1884],
        [-0.3679]], requires_grad=True)
tensor([[1.0173],
        [1.9345],
        [1.7159]], requires_grad=True)
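
The text mentions that you could also implement gradient descent yourself instead of using an optimizer. The following is a minimal sketch of that idea, not the notebook's method. It makes several assumptions: the inputs are 20 evenly spaced points in $[-2, 2]$ rather than random samples (so that a plain, constant step size stays stable), the step size `lr = 0.05` and the 2,000 iterations are hand-picked, and the tensors use `float64`.

```python
import numpy as np
import torch

polynomial = np.poly1d([1, 2, 3])
X = np.linspace(-2, 2, 20).reshape(-1, 1)    # small, fixed inputs (assumption)
Y = polynomial(X)
XX = np.hstack([X * X, X, np.ones_like(X)])

x = torch.tensor(XX, dtype=torch.float64)
y = torch.tensor(Y, dtype=torch.float64)
torch.manual_seed(0)
w = torch.randn(3, 1, dtype=torch.float64, requires_grad=True)

lr = 0.05                                    # hand-picked step size
for _ in range(2000):
    mse = torch.mean(torch.square(y - x @ w))
    mse.backward()                           # compute d(mse)/dw
    with torch.no_grad():                    # update w without tracking the update
        w -= lr * w.grad
    w.grad.zero_()                           # gradients accumulate, so reset them
print(w.detach().ravel())
```

With inputs as widely spread as in the cells above, the $x^{2}$ column becomes large and a constant step size would have to be much smaller, which is one reason an adaptive optimizer such as NAdam is convenient there.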


# <h2 style="color: blue;"> 3. Using Autograd to Solve a Math Puzzle <a id='AutogradSolvePuzzle'></a></h2>

<font color='#808080'>

You can use gradient descent to solve some math puzzles as well. For example, consider the following
problem:

$$
\begin{array}{ccccc}
   A & + & B & = & 9 \\
   + &   & - &   &   \\
   C & - & D & = & 1 \\
   = &   & = &   &   \\
   8 &   & 2 &   &
\end{array}
$$

In other words, find the values of $A,B,C,D$ such that:
$$
\begin{array}{rl}
  A + B & = 9 \\
  C - D & = 1 \\
  A + C & = 8 \\
  B - D & = 2
\end{array}
$$

This can also be solved using autograd, as follows:


In [10]:
import random
A = torch.tensor(random.random(), requires_grad = True)
B = torch.tensor(random.random(), requires_grad = True)
C = torch.tensor(random.random(), requires_grad = True)
D = torch.tensor(random.random(), requires_grad = True)
# Gradient descent loop
EPOCHS = 2000
optimizer = torch.optim.NAdam([A, B, C, D], lr = 0.01)
for _ in range(EPOCHS):
    y1 = A + B - 9
    y2 = C - D - 1
    y3 = A + C - 8
    y4 = B - D - 2
    sqerr = y1*y1 + y2*y2 + y3*y3 + y4*y4
    optimizer.zero_grad()
    sqerr.backward()
    optimizer.step()
print(A)
print(B)
print(C)
print(D)

tensor(4.5062, requires_grad=True)
tensor(4.4938, requires_grad=True)
tensor(3.4938, requires_grad=True)
tensor(2.4938, requires_grad=True)
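
As a hedged check of this result, the sketch below writes the four equations as a matrix system and substitutes the printed values. The names `M`, `b`, `found`, and `alt` are illustrative. It also suggests that the puzzle does not have a unique answer: the coefficient matrix has rank 3 for 4 unknowns, so gradient descent lands on one of many solutions, depending on the random starting point.

```python
import numpy as np

# Rows encode A + B, C - D, A + C, B - D in terms of (A, B, C, D)
M = np.array([[1, 1, 0,  0],
              [0, 0, 1, -1],
              [1, 0, 1,  0],
              [0, 1, 0, -1]], dtype=float)
b = np.array([9, 1, 8, 2], dtype=float)

# Values printed by the gradient descent run above
found = np.array([4.5062, 4.4938, 3.4938, 2.4938])
print(M @ found)                       # close to [9, 1, 8, 2]

# Rank 3 with 4 unknowns: a one-parameter family of solutions exists
print(np.linalg.matrix_rank(M))

# Another exact solution, this one with integer values
alt = np.array([5, 4, 3, 2], dtype=float)
print(np.allclose(M @ alt, b))
```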
