<a href="https://colab.research.google.com/github/iliasprc/pytorch-tutorials/blob/master/chapter1/2_autograd_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


Autograd: Automatic Differentiation
===================================

Central to all neural networks in PyTorch is the ``autograd`` package.
Let’s first briefly describe it, and then we will learn how to train our
first neural network.


The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.

Let us see this in simpler terms with some examples.
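
As a first illustration of define-by-run, here is a minimal sketch in which the number of recorded operations depends on the input value. The helper name `forward` and the threshold `10` are assumptions made for this example only.

```python
import torch

def forward(x):
    # The number of doublings depends on the value of x, so the graph
    # that autograd records can differ from one call to the next.
    y = x
    steps = 0
    while y.abs() < 10:
        y = y * 2
        steps += 1
    return y, steps

x1 = torch.tensor(1.0, requires_grad=True)
y1, steps1 = forward(x1)
y1.backward()

x2 = torch.tensor(4.0, requires_grad=True)
y2, steps2 = forward(x2)
y2.backward()

print(steps1, x1.grad)  # 4 doublings, so dy/dx = 2**4
print(steps2, x2.grad)  # 2 doublings, so dy/dx = 2**2
```

Each call builds its own graph, and `backward()` differentiates exactly the operations that were executed in that call.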

Tensor
--------

``torch.Tensor`` is the main class of the package. If you set its attribute
``.requires_grad`` to ``True``, autograd starts to track all operations on it. When
you finish your computation you can call ``.backward()`` and have all the
gradients computed automatically. The gradient for this tensor will be
accumulated into its ``.grad`` attribute.
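
The word "accumulated" matters: gradients from successive backward passes are added to ``.grad`` rather than overwriting it. A small sketch, using a hypothetical scalar `w`:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w * 2).backward()
first = w.grad.clone()    # d(2w)/dw = 2

(w * 2).backward()
second = w.grad.clone()   # the new gradient is added: 2 + 2 = 4

w.grad.zero_()            # reset before the next backward pass
print(first, second, w.grad)
```

This is why training loops usually zero the gradients before each backward pass.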

To stop a tensor from tracking its history, you can call ``.detach()``. It
returns a tensor that is detached from the computation history, so future
computation on it is not tracked.
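
A short sketch of what ``.detach()`` does; the variable names are chosen for this example only:

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 3
d = y.detach()   # same values as y, but cut off from the graph

print(y.requires_grad, y.grad_fn is not None)  # y is still tracked
print(d.requires_grad, d.grad_fn)              # d is not

# Operations on the detached tensor are not tracked either.
e = d * 2
print(e.requires_grad)
```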

To prevent tracking history (and using memory), you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model, because the model may have trainable parameters with ``requires_grad=True``
even though we don't need their gradients during evaluation.

There’s one more class which is very important for the autograd
implementation: ``Function``.

``Tensor`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each tensor has
a ``.grad_fn`` attribute that references the ``Function`` that created
the ``Tensor`` (except for Tensors created by the user, for which
``grad_fn is None``).
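
To make the role of ``Function`` more concrete, here is a minimal sketch of a user-defined one built on ``torch.autograd.Function``. The class name `Square` is hypothetical; it is an illustration, not how PyTorch implements its built-in operations.

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example Function computing x**2."""

    @staticmethod
    def forward(ctx, x):
        # Save the input; it is needed to compute the gradient later.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Chain rule: upstream gradient times d(x^2)/dx = 2x.
        return grad_output * 2 * x

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = Square.apply(x)
y.sum().backward()

print(x.grad_fn)  # None: x was created by the user
print(y.grad_fn)  # references the backward node of Square
print(x.grad)
```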

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single
element), you don’t need to specify any arguments to ``backward()``.
If it has more elements, you need to specify a ``gradient``
argument that is a tensor of matching shape.



In [176]:
import torch
torch.manual_seed(0)

<torch._C.Generator at 0x7f5018d7ad80>

Create a tensor and set ``requires_grad=True`` to track computation with it:



In [177]:
x = torch.ones(2, 2, requires_grad=True)
x

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Then, do an operation with the created tensor:

In [178]:

y = x + x
y

tensor([[2., 2.],
        [2., 2.]], grad_fn=<AddBackward0>)

``y`` was created as the result of an operation, so it has a ``grad_fn`` attribute that references the function that produced it.



In [179]:
y.grad_fn

<AddBackward0 at 0x7f4f8a851978>

You can do more operations on ``y`` and the tensors will still track the history
of those operations:


In [180]:
z = y * y * 3
z = z.sum()
out = torch.log(z)
z
out

tensor(3.8712, grad_fn=<LogBackward>)

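``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. A newly created tensor has ``requires_grad=False`` unless you
ask otherwise, and calling ``.requires_grad_()`` without an argument sets the
flag to ``True``.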




Gradients
---------
Let's backpropagate now.
Because ``out`` contains a single scalar, ``out.backward()`` is
equivalent to ``out.backward(torch.tensor(1.))``.



In [181]:
out.backward()

Now let's print the gradients $\frac{d(out)}{dx}$


In [182]:
x.grad

tensor([[0.5000, 0.5000],
        [0.5000, 0.5000]])

You should have got a matrix of ``0.5``. Let’s call the ``out``
*Tensor* “$o$”.
We have that $o = \log z$ with $z = \sum_i z_i$,
$z_i = 3(x_i + x_i)^2 = 12x_i^2$ and $z\bigr\rvert_{x_i=1} = 48$.
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{1}{z}\frac{\partial z}{\partial x_i} = \frac{24x_i}{z}$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{24}{48} = 0.5$.
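
The value stored in ``x.grad`` can also be cross-checked numerically. The sketch below compares autograd with a central finite difference for one entry of ``x``. The helper `f`, the step `eps` and the use of `float64` are choices made for this check, not part of the cells above.

```python
import torch

def f(x):
    # Same computation as in the cells above.
    y = x + x
    z = (y * y * 3).sum()
    return torch.log(z)

x = torch.ones(2, 2, dtype=torch.float64, requires_grad=True)
f(x).backward()

# Central finite difference for the entry x[0, 0].
eps = 1e-6
bump = torch.zeros(2, 2, dtype=torch.float64)
bump[0, 0] = eps
with torch.no_grad():
    numeric = (f(x + bump) - f(x - bump)) / (2 * eps)

print(x.grad[0, 0].item(), numeric.item())
```

Both numbers should agree closely with the entries of the gradient matrix printed above.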


In [183]:
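a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
a.requires_grad          # False: tensors do not track history by default
a.requires_grad_(True)   # switch tracking on in-place
a.requires_grad          # True
b = (a * a).sum()
b.grad_fn                # b now has a grad_fn because a is tracked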



<SumBackward0 at 0x7f4f805774a8>

In [184]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

y

tensor([-1110.5509, -1432.1617,   413.0272], grad_fn=<MulBackward0>)

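Now ``y`` is no longer a scalar, so ``backward()`` needs a ``gradient`` argument of the same shape as ``y``; autograd then computes the corresponding vector-Jacobian product: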

In [185]:
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)
x.grad

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
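
Passing a tensor to ``backward()`` computes a vector-Jacobian product. The sketch below checks this against an explicit Jacobian from ``torch.autograd.functional.jacobian``. The function `f` with a fixed factor of `2 ** 10` is an assumed stand-in for the doubling loop above, and the input values are arbitrary.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Stand-in for the doubling loop: a fixed scale of 2**10.
    return x * 2 ** 10

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

y = f(x)
y.backward(v)               # accumulates J^T v into x.grad

J = jacobian(f, x.detach()) # full 3x3 Jacobian, here 1024 * identity
print(x.grad)
print(J.T @ v)
```

For large models the full Jacobian is rarely materialized; the vector-Jacobian product is what backpropagation needs.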

To stop autograd from tracking the history of operations on tensors
with ``requires_grad=True``, wrap the code block in
``with torch.no_grad():``



In [186]:
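print(x.requires_grad)         # True
print((x ** 2).requires_grad)  # True: the result is tracked

with torch.no_grad():
    # Inside the block, results are not tracked.
    print((x ** 2).requires_grad)  # False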

We will calculate the gradients of the following equations:

![derivatives](./images/chapter1_autograd.png)

In [187]:

a = torch.tensor([1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)
c = torch.tensor([3.], requires_grad=True)

y = b * c            # y = 6
u = y + a            # u = 7
J = (u * u).sum()    # J = u^2 = 49

J.backward()

# y and u are intermediate (non-leaf) tensors, so their .grad is None;
# only the leaf tensors a, b and c receive gradients.
for i in [y, u, a, b, c]:
    print(i.grad)


None
None
tensor([14.])
tensor([42.])
tensor([28.])
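
The two ``None`` values appear because ``y`` and ``u`` are intermediate tensors, and autograd keeps ``.grad`` only for leaf tensors by default. If those gradients are needed, one option is to call ``.retain_grad()`` on the intermediate tensors before the backward pass, as in this sketch:

```python
import torch

a = torch.tensor([1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)
c = torch.tensor([3.], requires_grad=True)

y = b * c
u = y + a
y.retain_grad()   # ask autograd to keep the gradient of this non-leaf tensor
u.retain_grad()

J = (u * u).sum()
J.backward()

print(u.grad)  # dJ/du = 2u
print(y.grad)  # dJ/dy = dJ/du, because u = y + a
```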




Let's see more details about the computational graph in the following code:

In [188]:
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x * y
z.backward()
# Display the autograd-related attributes of each tensor
for i, name in zip([x, y, z], "xyz"):
    print(f"{name}\ndata: {i.data}\nrequires_grad: {i.requires_grad}\n\
grad: {i.grad}\ngrad_fn: {i.grad_fn}\nis_leaf: {i.is_leaf}\n")

x
data: 1.0
requires_grad: True
grad: 2.0
grad_fn: None
is_leaf: True

y
data: 2.0
requires_grad: True
grad: 1.0
grad_fn: None
is_leaf: True

z
data: 2.0
requires_grad: True
grad: None
grad_fn: <MulBackward0 object at 0x7f4f63a6a470>
is_leaf: False
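
The graph itself can be inspected by following ``grad_fn.next_functions`` from the output back to the leaves. The recursive helper `walk` below is a hypothetical utility written for this sketch; the node names it prints are those reported by the installed PyTorch version.

```python
import torch

def walk(fn, depth=0, out=None):
    """Collect (depth, node name) pairs by following next_functions."""
    out = [] if out is None else out
    if fn is None:
        return out
    out.append((depth, type(fn).__name__))
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, out)
    return out

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x * y

nodes = walk(z.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)
```

The multiplication node sits at the root, and each leaf tensor appears as a node that accumulates its gradient into ``.grad``.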



  


You can do many crazy things with autograd!

Let's visualize the graph of a simple linear model. The following graph was produced with TensorBoard:

![](./images/autograd_chapter2_linear_graph.png)

In [189]:



class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.w = torch.nn.Parameter(torch.randn(1))

    def forward(self, x):

        out = (self.w * x)**2
        return out

class Y(torch.nn.Module):
    def __init__(self):
        super(Y, self).__init__()
        self.y = torch.nn.Linear(3,3)

    def forward(self, x):

        out = self.y(x)
        return out
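x = torch.randn(3, requires_grad=True)
#model = Y()
m = Model()

# To log the graph to TensorBoard (requires the tensorboard package):
# from torch.utils.tensorboard import SummaryWriter
# writer = SummaryWriter()
# writer.add_graph(m, x)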
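
Even without TensorBoard, the effect of the graph can be checked by running a backward pass through the model. The sketch below redefines ``Model`` so that it is self-contained. Summing the output into a scalar `loss` and the closed-form `expected` value are assumptions added for this check.

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(1))

    def forward(self, x):
        return (self.w * x) ** 2

torch.manual_seed(0)
m = Model()
x = torch.randn(3)

loss = m(x).sum()   # reduce to a scalar so backward() needs no argument
loss.backward()

# Analytically, d/dw sum((w * x)^2) = 2 * w * sum(x^2).
expected = 2 * m.w.detach() * (x ** 2).sum()
print(m.w.grad, expected)
```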


**Read Later:**
For more information, read the documentation of ``autograd`` and ``Function`` at
- https://pytorch.org/docs/autograd
- https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/

