# Automatic Differentiation in PyTorch

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

## Linear Regression

Consider a regression task that has two-dimensional feature vectors as input.
The number of training instances is $N = 30$.
The true formula that describes the relationship between $x$ and $y$ is $y = 2x_1 - 1.5x_2 + 5$.
First, we make a toy dataset under these conditions.

In PyTorch, all data is represented as `torch.Tensor` objects.

In [None]:
N = 30

In [None]:
xs = torch.randn(N, 2)

In [None]:
xs

In [None]:
ys_true = xs @ torch.tensor([2.0, -1.5]) + 5.0

Notice that we used the **matrix multiplication operator** `@`, which was introduced in Python 3.5 (PEP 465).
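
As a small illustration of what `@` computes, the sketch below compares it with `torch.matmul` and with an explicit row-wise dot product. The matrix `X` and its values are hypothetical, chosen only so the result is easy to check by hand.

```python
import torch

# Hypothetical inputs, chosen only for illustration.
X = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])      # shape (3, 2)
w = torch.tensor([2.0, -1.5])      # shape (2,)

y_op = X @ w                        # matrix-vector product, shape (3,)
y_fn = torch.matmul(X, w)           # the same operation as a function call
y_manual = (X * w).sum(dim=1)       # row-wise dot products via broadcasting

print(y_op)                         # tensor([-1.,  0.,  1.])
print(torch.equal(y_op, y_fn))
```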

In [None]:
ys_true

In [None]:
ys_true = ys_true.view(-1, 1)

For compatibility with the loss functions we will see later, `ys_true` should be a $30 \times 1$ matrix rather than a plain $30$-dimensional vector.
The `view()` method reshapes the tensor.
If you pass `-1` for a dimension, `view()` infers its size from the total number of elements and the other dimensions (in this case, `30`).
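
A minimal sketch of how `-1` is inferred, using a hypothetical six-element tensor. It also shows why mismatched shapes are risky: subtracting a `(6,)` vector from a `(6, 1)` matrix broadcasts to `(6, 6)` instead of raising an error.

```python
import torch

v = torch.arange(6.0)        # shape (6,)
col = v.view(-1, 1)          # -1 is inferred as 6, giving shape (6, 1)
mat = v.view(2, -1)          # -1 is inferred as 3, giving shape (2, 3)
print(v.shape, col.shape, mat.shape)

# Broadcasting pitfall: (6, 1) minus (6,) silently becomes (6, 6).
diff = col - v
print(diff.shape)            # torch.Size([6, 6])
```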

In [None]:
ys_true

### Apply the linear transformation model

The important thing to notice is that instantiating `nn.Linear` also **instantiates the parameters w and b as nodes in a computational graph**. By default, they are initialized randomly.
In this task, we need a linear function that maps a two-dimensional vector to a one-dimensional vector (a scalar), so we create `Linear(2, 1)`.
https://pytorch.org/docs/master/nn.html#torch.nn.Linear

In [None]:
linear = nn.Linear(2, 1)

In [None]:
for param in linear.parameters():
    print(param.data)

You can compute the prediction of $y$ with the initial (untrained) parameters as follows.

Note that **this code implicitly creates a node on the computational graph**.
`ys_pred` is not just a variable that stores an actual value of y.
It is a variable that represents a node in the computational graph, which remembers that there are edges from x, w and b to this node.

In [None]:
ys_pred = linear(xs)

In [None]:
ys_pred
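
To make the "node in the computational graph" idea concrete, here is a small sketch on a separate, hypothetical layer and input (the names `layer` and `x` are not part of the notebook). It checks that `nn.Linear` computes $xW^\top + b$ and that the output carries a `grad_fn`, which is how the tensor remembers the operation that produced it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 1)          # weight has shape (1, 2), bias has shape (1,)
x = torch.randn(4, 2)            # plain input data, not a parameter

out = layer(x)
manual = x @ layer.weight.T + layer.bias   # the same affine map written by hand

print(torch.allclose(out, manual))
print(out.requires_grad)         # the output depends on trainable parameters
print(out.grad_fn)               # records the operation that created `out`
print(x.grad_fn)                 # None: `x` was not produced by a tracked operation
```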

### Apply a loss function

PyTorch also provides many loss functions.
One of them is the mean squared error (`MSELoss`), which is commonly used in regression tasks.
https://pytorch.org/docs/master/nn.html#torch.nn.MSELoss

In [None]:
mse = nn.MSELoss()
loss = mse(ys_pred, ys_true)

In [None]:
loss
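
With its default settings, `MSELoss` averages the squared differences over all elements. The sketch below checks this against a hand-written version on hypothetical values.

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets, both of shape (3, 1).
pred = torch.tensor([[1.0], [2.0], [4.0]])
target = torch.tensor([[1.0], [0.0], [1.0]])

loss_builtin = nn.MSELoss()(pred, target)
loss_manual = ((pred - target) ** 2).mean()   # (0 + 4 + 9) / 3

print(loss_builtin.item(), loss_manual.item())
```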

### Automatic differentiation

You can calculate **the gradient of the loss function with respect to the parameters in the computational graph** just by calling `backward()`, a method of the `Tensor` object.

In [None]:
loss.backward()

That's it!
You can inspect the gradient evaluated at the current parameter values, which is stored in each parameter's `grad` attribute, like this.

In [None]:
for param in linear.parameters():
    print(param.grad)
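
For a linear model with mean squared error, the gradient also has a closed form: with residuals $r_i = \hat{y}_i - y_i$, we have $\partial L / \partial w = \frac{2}{N}\sum_i r_i x_i$ and $\partial L / \partial b = \frac{2}{N}\sum_i r_i$. The sketch below, on hypothetical random data, compares these formulas with what `backward()` stores in `grad`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 2)            # hypothetical inputs
y = torch.randn(5, 1)            # hypothetical targets
layer = nn.Linear(2, 1)

loss = nn.MSELoss()(layer(x), y)
loss.backward()                  # fills layer.weight.grad and layer.bias.grad

with torch.no_grad():
    residual = layer(x) - y                       # shape (5, 1)
    grad_w = 2.0 / x.shape[0] * residual.T @ x    # shape (1, 2)
    grad_b = 2.0 * residual.mean(dim=0)           # shape (1,)

print(torch.allclose(layer.weight.grad, grad_w, atol=1e-6))
print(torch.allclose(layer.bias.grad, grad_b, atol=1e-6))
```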

## Optimization of Linear Regression

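We can apply the gradient descent algorithm to optimize the model parameters.
PyTorch also provides implementations of many numerical optimizers, including `SGD` (stochastic gradient descent, which generalizes gradient descent to updates computed on subsets of the data). Because we pass all of the data in every iteration here, it behaves like ordinary gradient descent.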
https://pytorch.org/docs/master/optim.html#torch.optim.SGD

In this code, we iterate `300` times to update the parameters. In each iteration, we:
1. compute the prediction of $y$ for all data as a node in the computational graph with `linear(xs)`
2. compute the loss as a node in the computational graph with `mse(ys_pred, ys_true)` (steps 1 and 2 make up the forward computation that backpropagation needs in order to calculate the gradient)
3. clear the gradients left over from the previous iteration by calling `optimizer.zero_grad()`
4. call `loss.backward()` to compute the new gradients
5. update the parameters by calling `optimizer.step()`
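
Step 3 matters because `backward()` adds to any gradient already stored in `grad` rather than overwriting it. The sketch below shows this accumulation on a hypothetical two-element parameter, with the reset done by hand instead of through an optimizer.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()        # gradient is 2 * w = [2., 4.]

(w ** 2).sum().backward()     # no reset in between
second = w.grad.clone()       # accumulated: [4., 8.]

w.grad.zero_()                # manual reset; in a training loop, optimizer.zero_grad() clears the gradients
(w ** 2).sum().backward()
third = w.grad.clone()        # back to [2., 4.]

print(first, second, third)
```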

In [None]:
linear = nn.Linear(2, 1)
mse = nn.MSELoss()
optimizer = optim.SGD(linear.parameters(), lr=0.1)
for epoch in range(300):
    ys_pred = linear(xs)
    loss = mse(ys_pred, ys_true)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

After training, the estimated parameters should be very close to the true values ($w = (2, -1.5)$, $b = 5$), because the toy data contains no noise.

In [None]:
for param in linear.parameters():
    print(param.data)
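
To see roughly what the optimizer does for us, the training loop can also be written by hand. This is a sketch under assumptions, not the internals of `optim.SGD`: it uses its own toy data with a different seed, starts from zero parameters (the names `w` and `b` are introduced here), and applies the plain update $\theta \leftarrow \theta - \eta \nabla_\theta L$ with $\eta = 0.1$.

```python
import torch

torch.manual_seed(0)
x = torch.randn(30, 2)
y = (x @ torch.tensor([2.0, -1.5]) + 5.0).view(-1, 1)   # same true formula as above

# Hand-made parameters instead of nn.Linear (illustrative names).
w = torch.zeros(2, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for epoch in range(300):
    loss = ((x @ w + b - y) ** 2).mean()   # forward pass: prediction and MSE
    loss.backward()                        # backward pass: fills w.grad and b.grad
    with torch.no_grad():                  # the update itself must not be tracked
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                         # reset for the next iteration
    b.grad.zero_()

print(w.view(-1), b)
```

The `torch.no_grad()` block is needed because the in-place update of a parameter that requires gradients would otherwise be recorded in the graph.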

## Logistic Regression

Consider a classification task that takes three-dimensional feature vectors $(x_1, x_2, x_3)$ as input.
The number of training instances is $N = 30$.
A data instance belonging to the positive class tends to have a larger value of $x_1$ and a smaller value of $x_2$.
A data instance belonging to the negative class tends to have a smaller value of $x_1$ and a larger value of $x_2$.
The third feature $x_3$ follows the same distribution in both classes, so it carries no class information.
First, we make a toy dataset under these conditions.

In [None]:
Np = 15
Nn = 15
N = Np + Nn
xps = torch.randn(Np, 3) + torch.tensor([2.0, -2.0, 0.0])
xns = torch.randn(Nn, 3) + torch.tensor([-2.0, 2.0, 0.0])
xs = torch.cat((xps, xns))

In [None]:
xs

In [None]:
ys_true = torch.cat((torch.ones(Np), torch.zeros(Nn)))

In [None]:
ys_true

In [None]:
ys_true = ys_true.view(-1, 1)

In [None]:
ys_true

### Model optimization

In this task, we need a linear function that maps a three-dimensional vector to a one-dimensional vector (a scalar), so we create `Linear(3, 1)`, which represents the linear model $\hat{y} = w_1x_1 + w_2x_2 + w_3x_3 + b$ with four parameters $w_1, w_2, w_3$ and $b$.
**These parameters are defined and stored inside the `linear` object as objects that represent nodes in a computational graph**.

In [None]:
linear = nn.Linear(3, 1)

In [None]:
for param in linear.parameters():
    print(param.data)
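
As a quick check of the parameter count, the sketch below lists the parameters of a separate `Linear(3, 1)` instance by name (the variable `layer` is introduced here for illustration). The three weights are stored together in one tensor of shape `(1, 3)`, and the bias in a tensor of shape `(1,)`.

```python
import torch.nn as nn

layer = nn.Linear(3, 1)

# named_parameters() yields (name, parameter) pairs: "weight" and "bias".
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

n_params = sum(p.numel() for p in layer.parameters())
print(n_params)   # 4
```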

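The loss function `BCEWithLogitsLoss` conveniently combines two steps: it first applies the sigmoid function to the model output (the logits) and then computes the binary cross-entropy, that is, the "negative log likelihood of Bernoulli distribution (NLLB)". Fusing the two steps is also numerically more stable than applying them separately. See https://pytorch.org/docs/master/nn.html#torch.nn.BCEWithLogitsLoss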
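
The sketch below checks this equivalence on hypothetical logits and targets: the fused loss should match both an explicit sigmoid followed by `BCELoss` and the hand-written formula $-\frac{1}{N}\sum_i \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$ with $p_i = \sigma(z_i)$.

```python
import torch
import torch.nn as nn

# Hypothetical logits z and binary targets y, both of shape (3, 1).
logits = torch.tensor([[2.0], [-1.0], [0.5]])
targets = torch.tensor([[1.0], [0.0], [1.0]])

fused = nn.BCEWithLogitsLoss()(logits, targets)

p = torch.sigmoid(logits)                      # probabilities in (0, 1)
two_step = nn.BCELoss()(p, targets)            # sigmoid first, then BCE
manual = -(targets * torch.log(p)
           + (1 - targets) * torch.log(1 - p)).mean()

print(fused.item(), two_step.item(), manual.item())
```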

In [None]:
bce_with_sigmoid = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(linear.parameters(), lr=0.1)
for epoch in range(300):
    zs = linear(xs)
    loss = bce_with_sigmoid(zs, ys_true)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [None]:
for param in linear.parameters():
    print(param.data)

We can obtain the predicted probability of the positive class for any data by applying the sigmoid function to the output of the trained logistic regression model, as follows.

In [None]:
ys_pred = torch.sigmoid(linear(xs))

In [None]:
ys_pred
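
The sigmoid output is a probability, so a final class label still requires a decision rule. A common choice, assumed here rather than taken from the notebook, is to threshold at `0.5`. The sketch uses hypothetical probabilities and labels to show the thresholding and an accuracy computation.

```python
import torch

# Hypothetical predicted probabilities and true labels, shape (4, 1).
probs = torch.tensor([[0.92], [0.10], [0.55], [0.40]])
labels = torch.tensor([[1.0], [0.0], [0.0], [0.0]])

pred_labels = (probs >= 0.5).float()                       # 1.0 if p >= 0.5, else 0.0
accuracy = (pred_labels == labels).float().mean().item()   # fraction of correct labels

print(pred_labels.view(-1))   # tensor([1., 0., 1., 0.])
print(accuracy)               # 0.75
```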