<a href="https://colab.research.google.com/github/iliasprc/pytorch-tutorials/blob/master/chapter1/2_autograd_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [16]:
%matplotlib inline

Autograd: Automatic Differentiation
===================================

One of the main advantages of the PyTorch framework is the ``autograd`` package.
Let's briefly describe it here; in the following articles we will learn how to
train our first neural network.
The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that backpropagation is
defined by how your code runs, and that every single iteration can be
different.
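
As a minimal sketch of what define-by-run means in practice, the hypothetical helper ``repeated_double`` below (not part of PyTorch) uses an ordinary Python loop, so each call builds a graph of a different depth and yields a different gradient:

```python
import torch

def repeated_double(x, n_steps):
    # Hypothetical helper: the loop count is plain Python control flow,
    # so the recorded graph has as many multiplications as iterations ran.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

repeated_double(x, 2).backward()
grad_two_steps = x.grad.clone()    # gradient of sum(4 * x) is 4 per element

x.grad.zero_()                     # clear the accumulated gradient
repeated_double(x, 5).backward()
grad_five_steps = x.grad.clone()   # gradient of sum(32 * x) is 32 per element

print(grad_two_steps, grad_five_steps)
```

The graph is rebuilt from scratch on every forward pass, which is why the two calls can differ.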

Let's see this in more detail with some examples.

Tensor
--------

``torch.Tensor`` is the main class of the package. If you set its attribute
``.requires_grad`` to ``True``, autograd starts to track all operations on it. When
you finish your computation, you can call ``.backward()`` and have all the
gradients computed automatically. The gradient for this tensor will be
accumulated into its ``.grad`` attribute.
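
The word "accumulated" matters: a second backward pass adds to ``.grad`` rather than overwriting it. A small sketch, using an illustrative tensor ``w`` that is not part of the tutorial's own cells:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()       # gradient of sum(3 * w) is 3 per element

(w * 3).sum().backward()     # a second backward pass adds to .grad
second = w.grad.clone()      # now 6 per element

w.grad.zero_()               # reset before the next computation
print(first, second, w.grad)
```

This is why training loops usually zero the gradients before each backward pass.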

To stop a tensor from tracking its history, you can call ``.detach()``, which returns
a tensor that is detached from the computation history and prevents future computation
on it from being tracked. To prevent tracking history (and using memory), you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model, because the model may have trainable parameters with `requires_grad=True`
for which we don't need the gradients.

There's one more class that is very important for the autograd
implementation: ``Function``.

``Tensor`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each tensor has
a ``.grad_fn`` attribute that references the ``Function`` that created
it (except for Tensors created by the user, whose ``grad_fn`` is
``None``).
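
To make the role of ``Function`` more concrete, here is a minimal sketch of a user-defined one built on ``torch.autograd.Function``. The class name ``Square`` and its hand-written derivative are illustrative assumptions, not something this tutorial's cells define:

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example: y = x ** 2 with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)       # keep the input for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x     # chain rule: dL/dx = dL/dy * 2x

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = Square.apply(x)                    # custom Functions are called via .apply
y.sum().backward()

print(y.grad_fn)   # the backward node that autograd created for Square
print(x.grad)      # equals 2 * x
```

The built-in operations work the same way: each one records a backward node, and that node is what ``grad_fn`` points to.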

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single
element), you don't need to specify any arguments to ``backward()``.
However, if the tensor has more elements, you need to specify a ``gradient``
argument: a tensor of the same shape as the tensor you call ``backward()`` on.



In [17]:
import torch
torch.manual_seed(0)

<torch._C.Generator at 0x7efdc93edd80>

Then, create a tensor and set `requires_grad=True` to track computation history.



In [18]:
x = torch.ones(2, 2, requires_grad=True)
x

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Now, let's do an operation with the created tensor:

In [19]:

y = 2 * x
y

tensor([[2., 2.],
        [2., 2.]], grad_fn=<MulBackward0>)

``y`` was created as the result of a multiplication, so it has a ``grad_fn`` attribute that references the function that created it.
Here ``grad_fn`` is a ``MulBackward0`` object, which confirms that the operation was a multiplication:


In [20]:
y.grad_fn

<MulBackward0 at 0x7efd31680ac8>

You can do more operations on ``y`` and the tensors will still track the history
of those operations:


In [21]:
z = 3 * y * y
out = z.mean()
out

tensor(12., grad_fn=<MeanBackward0>)

``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. A tensor's flag is ``False`` unless it is set at creation, while the
method's own argument defaults to ``True`` if not given.


In [22]:
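# Use a separate tensor so that x keeps tracking for the backward pass below
w = torch.ones(2, 2, requires_grad=True)
w.requires_grad_(False)
w.requires_grad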

False

Gradients
---------
Let's backpropagate now.
Because ``out`` contains a single scalar, ``out.backward()`` is
equivalent to ``out.backward(torch.tensor(1.))``.



In [23]:
out.backward()

Now let's print the gradients $\frac{d(out)}{dx}$


In [24]:
x.grad

If ``x`` still requires gradients when ``backward()`` runs, ``x.grad`` is a matrix
filled with ``6``. Let's call the ``out`` *Tensor* “$o$”.
We have that $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(2x_i)^2 = 12x_i^2$ and $z_i\bigr\rvert_{x_i=1} = 12$.
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{1}{4} \cdot 24x_i = 6x_i$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = 6$.
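
The derivation can be checked numerically. The sketch below rebuilds the same computation from scratch (so it does not depend on the state of the earlier cells) and compares ``x.grad`` with the analytic result $6x_i$:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = 2 * x
z = 3 * y * y
out = z.mean()
out.backward()

expected = 6 * x.detach()              # analytic result: d(out)/dx_i = 6 * x_i
print(x.grad)
print(torch.allclose(x.grad, expected))
```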


In [25]:
x = torch.randn(3, requires_grad=True)

y = x * 2
# Keep doubling y until its norm reaches 1000 (detach() avoids tracking the check)
while y.detach().norm() < 1000:
    y = y * 2

y

tensor([  788.9900,  -150.2356, -1115.5402], grad_fn=<MulBackward0>)

Here ``y`` is no longer a scalar, so ``backward()`` needs a ``gradient`` argument of the same shape as ``y``; autograd then computes the vector-Jacobian product:

In [26]:

gradients = torch.randn([3], dtype=torch.float)
y.backward(gradients)
x.grad

tensor([ 291.0368, -555.2755, -716.0809])
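
To see what the ``gradient`` argument does, here is a small sketch with fixed values chosen for illustration. For ``y = 8 * x`` the Jacobian is 8 times the identity, so passing a vector ``v`` to ``backward()`` should leave ``8 * v`` in ``x.grad``:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 8                              # Jacobian dy/dx is 8 * I

v = torch.tensor([0.1, 1.0, 0.001])    # illustrative weights, one per output
y.backward(v)                          # computes J^T v and stores it in x.grad

print(x.grad)
print(torch.allclose(x.grad, 8 * v))
```

In other words, ``v`` plays the role of the gradient of some downstream scalar with respect to ``y``.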

If you no longer want autograd to track the history of operations on tensors
with ``requires_grad=True``, wrap the code block in
``with torch.no_grad():`` or use ``.detach()`` to get a tensor that is detached from the computation graph:



In [27]:
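x = torch.randn(3, requires_grad=True)
x.requires_grad               # True
(x ** 2).requires_grad        # True: the result tracks history

with torch.no_grad():
    (x ** 2).requires_grad    # False: tracking is disabled in this block

x = x.detach()                # new tensor, detached from the graph
(x ** 2).requires_grad        # False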

False
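
The typical use of ``torch.no_grad()`` is model evaluation. The sketch below assumes a throwaway ``torch.nn.Linear`` layer and random inputs purely for illustration; it shows that the same forward pass produces identical values with and without tracking, but only the tracked one carries a graph:

```python
import torch

model = torch.nn.Linear(3, 1)          # its parameters have requires_grad=True
inputs = torch.randn(4, 3)

train_out = model(inputs)              # tracked: the output has a grad_fn
with torch.no_grad():
    eval_out = model(inputs)           # not tracked: no graph is built

print(train_out.requires_grad, eval_out.requires_grad)
print(torch.allclose(train_out.detach(), eval_out))
```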

Now we are going to calculate the gradients of the equations shown in the following image.
All partial derivatives have been calculated using the chain rule and are also shown in the image.

![derivatives](https://github.com/iliasprc/pytorch-tutorials/chapter1/chapter1_autograd.png)

Now let's check with ``autograd`` whether we calculated all the derivatives correctly.

In [28]:

a = torch.tensor([1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)
c = torch.tensor([3.], requires_grad=True)

y = b*c
u = y+a
J = (u*u).sum()

J.backward()

# y and u are non-leaf tensors, so their .grad is None by default
for i in [y, u, a, b, c]:
    print(i.grad)


None
None
tensor([14.])
tensor([42.])
tensor([28.])
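
The two ``None`` values appear because ``y`` and ``u`` are intermediate (non-leaf) tensors, and autograd does not keep their gradients by default. If you do want them, one option is ``Tensor.retain_grad()``, sketched here on the same equations:

```python
import torch

a = torch.tensor([1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)
c = torch.tensor([3.], requires_grad=True)

y = b * c
u = y + a
y.retain_grad()    # ask autograd to keep .grad for these non-leaf tensors
u.retain_grad()

J = (u * u).sum()
J.backward()

print(y.grad, u.grad)   # dJ/dy = dJ/du = 2 * u
```

With ``u = 7`` both retained gradients equal ``14``, which matches ``a.grad`` since ``u`` depends on ``a`` and ``y`` with unit slope.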




Let's look at the computational graph in more detail with the following code:

In [29]:
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x * y
z.backward()
# Display the autograd-related attributes of each tensor
for i, name in zip([x, y, z], "xyz"):
    print(f"{name}\ndata: {i.data}\nrequires_grad: {i.requires_grad}\n\
grad: {i.grad}\ngrad_fn: {i.grad_fn}\nis_leaf: {i.is_leaf}\n")

x
data: 1.0
requires_grad: True
grad: 2.0
grad_fn: None
is_leaf: True

y
data: 2.0
requires_grad: True
grad: 1.0
grad_fn: None
is_leaf: True

z
data: 2.0
requires_grad: True
grad: None
grad_fn: <MulBackward0 object at 0x7efd3161d7f0>
is_leaf: False
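
The graph itself can also be inspected directly. As a rough sketch, each backward node exposes a ``next_functions`` attribute listing the nodes that receive its gradients; for ``z = x * y`` there should be one entry per input, and leaf tensors typically appear there as gradient-accumulation nodes:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x * y

print(type(z.grad_fn).__name__)
for fn, input_index in z.grad_fn.next_functions:
    # Each entry is the backward node that receives the gradient
    # for one input of the multiplication.
    print(type(fn).__name__, input_index)
```

Following ``next_functions`` recursively is one way to walk the whole graph from an output back to its leaves.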



  


## Graph Visualization with Tensorboard


Let's visualize the graph of a simple linear model.
We used TensorBoard to produce the computation graph shown in the following figure.

![](https://github.com/iliasprc/pytorch-tutorials/chapter1/images/autograd_chapter2_linear_graph.png)


In [30]:
from torch.utils.tensorboard import SummaryWriter

class Y(torch.nn.Module):
    """A single linear layer mapping 3 inputs to 3 outputs."""
    def __init__(self):
        super().__init__()
        self.y = torch.nn.Linear(3, 3)

    def forward(self, x):
        out = self.y(x)
        return out

x = torch.randn(3, requires_grad=True)
m = Y()

# Trace the model with a sample input and write the graph for TensorBoard
writer = SummaryWriter()
writer.add_graph(m, x)
writer.close()


In this article, we described how PyTorch's ``autograd`` tracks operations on tensors and computes gradients automatically.

**Read Later:**
For more information, read the documentation of ``autograd`` and ``Function`` at:
- https://pytorch.org/docs/autograd
- https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/

