# Neural Networks II

### PyTorch Tensor objects for storing and updating model parameters


PyTorch stores and updates model parameters during training in tensors for which gradients are tracked. Such a tensor can be created by setting "requires_grad" to True when constructing it from user-specified initial values. Note that, as of now, only tensors of floating point and complex dtype can require gradients. The following code creates a scalar tensor and a vector tensor that both track gradients:

In [1]:
import torch

a = torch.tensor(3.14, requires_grad=True)
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True) 
print(a)
print(b)

tensor(3.1400, requires_grad=True)
tensor([1., 2., 3.], requires_grad=True)
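
The dtype restriction mentioned above can be checked directly. The following is a minimal sketch (the variable names are illustrative and not part of the original notebook): it tries to create an integer tensor that requires gradients, catches the resulting error, and then converts the tensor to a floating point dtype first.

```python
import torch

# A floating point tensor can track gradients.
x_float = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# An integer tensor cannot; PyTorch raises a RuntimeError when it is created.
try:
    torch.tensor([1, 2, 3], requires_grad=True)
    int_tensor_allowed = True
except RuntimeError as err:
    int_tensor_allowed = False
    print("RuntimeError:", err)

# Converting to a floating point dtype first makes the tensor eligible.
x_converted = torch.tensor([1, 2, 3]).float().requires_grad_()
print(x_converted.dtype, x_converted.requires_grad)
```

In practice, this means integer inputs such as class labels or token indices are left as they are, while the parameters to be optimized are kept in a floating point dtype.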


Notice that "requires_grad" is False by default. It can be switched to True in place by calling the "requires_grad_()" method.

In [2]:
w = torch.tensor([1.0, 2.0, 3.0])
print(w.requires_grad)
w.requires_grad_()
print(w.requires_grad)

False
True


## Computing gradients via automatic differentiation


### Computing the gradients of the loss with respect to variables
PyTorch supports automatic differentiation, which can be thought of as an implementation of the chain rule for computing gradients of nested functions. To compute these gradients, we can call the "backward" method on the loss tensor, which relies on the "torch.autograd" module. It computes the sum of gradients of the given tensor with respect to the leaf nodes (terminal nodes) in the computation graph and stores the result in the "grad" attribute of each leaf tensor.

In [3]:
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True) 

x = torch.tensor([1.4])
y = torch.tensor([2.1])


z = torch.add(torch.mul(w, x), b) # forward pass
 
loss = (y-z).pow(2).sum() # sum() is needed for multiple outputs; it is redundant for this single output
loss.backward()

print('dL/dw : ', w.grad)
print('dL/db : ', b.grad)

# verifying the computed gradient dL/dw
print(2 * x * ((w * x + b) - y))

dL/dw :  tensor(-0.5600)
dL/db :  tensor(-0.4000)
tensor([-0.5600], grad_fn=<MulBackward0>)
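
The notebook verifies only dL/dw. As a hedged cross-check that does not depend on PyTorch, the sketch below recomputes both gradients by hand from $L = (y - (wx + b))^2$ and compares them against central finite differences. The helper name `loss_fn` and the step size `eps` are assumptions made for this illustration.

```python
def loss_fn(w, b, x=1.4, y=2.1):
    """Squared error of the linear model z = w * x + b for one example."""
    z = w * x + b
    return (y - z) ** 2

w, b, x, y = 1.0, 0.5, 1.4, 2.1

# Analytic gradients from the chain rule.
residual = (w * x + b) - y
dL_dw = 2 * x * residual
dL_db = 2 * residual

# Central finite differences as an independent numerical check.
eps = 1e-6
fd_dw = (loss_fn(w + eps, b) - loss_fn(w - eps, b)) / (2 * eps)
fd_db = (loss_fn(w, b + eps) - loss_fn(w, b - eps)) / (2 * eps)

print(f"dL/dw analytic: {dL_dw:.4f}, finite difference: {fd_dw:.4f}")
print(f"dL/db analytic: {dL_db:.4f}, finite difference: {fd_db:.4f}")
```

Both values should agree with the gradients reported by "backward" above, up to floating point rounding.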


### Understanding automatic differentiation
Automatic differentiation is a set of computational techniques for computing gradients of arbitrary arithmetic operations. During this process, the gradient of a computation (expressed as a series of operations) is obtained by accumulating the gradients of the individual operations through repeated application of the chain rule. To better understand the concept behind automatic differentiation, let’s consider a series of nested computations, $y = f(g(h(x)))$, with input $x$ and output $y$. This can be broken into a series of steps:
\begin{align*}
u_0 &= x \\
u_1 &= h(x) \\
u_2 &= g(u_1) \\
u_3 &= f(u_2) = y
\end{align*}
Hence,
\begin{align*}
\frac{dy}{dx} = \frac{dy}{du_2} \times \frac{du_2}{du_1} \times \frac{du_1}{dx}
\end{align*}
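
The decomposition above can be sketched in plain Python. The concrete functions chosen here, $h(x) = x^2$, $g(u) = \sin(u)$, and $f(u) = e^u$, as well as the evaluation point, are assumptions made only for illustration. The product of the three local derivatives is compared against a finite-difference estimate of $dy/dx$.

```python
import math

# Illustrative choices for the nested functions (not from the original text).
def h(x): return x ** 2
def g(u): return math.sin(u)
def f(u): return math.exp(u)

# Their derivatives, written out by hand.
def dh(x): return 2 * x
def dg(u): return math.cos(u)
def df(u): return math.exp(u)

x = 0.7

# Forward pass: store the intermediate values u_1, u_2, u_3.
u1 = h(x)
u2 = g(u1)
y = f(u2)

# Chain rule: dy/dx = dy/du_2 * du_2/du_1 * du_1/dx.
dy_dx = df(u2) * dg(u1) * dh(x)

# Central finite difference as a numerical check.
eps = 1e-6
fd = (f(g(h(x + eps))) - f(g(h(x - eps)))) / (2 * eps)

print(f"chain rule: {dy_dx:.6f}, finite difference: {fd:.6f}")
```

Note that each local derivative is evaluated at the intermediate value saved during the forward pass, which is why automatic differentiation keeps track of the computation graph.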

### Adversarial examples
Gradients of the loss with respect to the input example are used to generate adversarial examples (or adversarial attacks). In computer vision, adversarial examples are inputs generated by adding small, imperceptible noise (or perturbations) to an input example, which causes a deep NN to misclassify it. Covering adversarial examples is beyond the scope of this book, but if you are interested, you can find the original paper by Christian Szegedy et al., "Intriguing properties of neural networks", at https://arxiv.org/pdf/1312.6199.pdf.
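
Only the first ingredient, the gradient of the loss with respect to the input, is sketched here. The example reuses the small linear model from earlier but places "requires_grad" on the input instead of the parameters. The step size `epsilon` and the sign-of-gradient perturbation are assumptions made for illustration; they are not taken from the paper cited above.

```python
import torch

# Fixed model parameters; no gradients are needed for them here.
w = torch.tensor(1.0)
b = torch.tensor(0.5)

# The input now tracks gradients instead of the parameters.
x = torch.tensor([1.4], requires_grad=True)
y = torch.tensor([2.1])

z = w * x + b
loss = (y - z).pow(2).sum()
loss.backward()
print('dL/dx : ', x.grad)

# Hypothetical perturbation: a small step in the direction that increases the loss.
epsilon = 0.01
x_perturbed = (x + epsilon * x.grad.sign()).detach()
loss_perturbed = (y - (w * x_perturbed + b)).pow(2).sum()
print('loss before:', loss.item(), 'loss after:', loss_perturbed.item())
```

For this toy model the perturbed input yields a slightly larger loss, which is the basic effect that adversarial attacks exploit at a much larger scale.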

## Simplifying implementations of common architectures via the "torch.nn" module
We have already seen some examples of building a feedforward NN model (for instance, a multilayer perceptron) and defining a sequence of layers using the "nn.Module" class. Before we take a deeper dive into "nn.Module", let’s briefly look at another approach for composing those layers via "nn.Sequential".

### Implementing models based on "nn.Sequential"
With nn.Sequential (https://pytorch.org/docs/master/generated/torch.nn.Sequential.html#sequential), the layers stored inside the model are connected in a cascaded way. In the following example, we will build a model with two densely (fully) connected layers:

In [5]:
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 32),
    nn.ReLU()
)

model

Sequential(
  (0): Linear(in_features=4, out_features=16, bias=True)
  (1): ReLU()
  (2): Linear(in_features=16, out_features=32, bias=True)
  (3): ReLU()
)
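
To see how the cascaded layers transform data, the following sketch passes a random batch through the same architecture. The batch size of 2 and the random input are assumptions made for illustration. Individual layers can be accessed by their index, as shown for the first layer.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 32),
    nn.ReLU()
)

# Hypothetical input: a batch of 2 examples with 4 features each.
batch = torch.randn(2, 4)
output = model(batch)
print(output.shape)  # one 32-dimensional output per example

# Layers are indexed in the order they were passed to nn.Sequential.
first_layer = model[0]
print(first_layer.weight.shape)  # (out_features, in_features)
```

Because the last layer is a ReLU, every entry of the output is nonnegative.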

### Choosing a loss function