
In [1]:
import torch

In [3]:
x = torch.arange(4.0, requires_grad=True)
x

tensor([0., 1., 2., 3.], requires_grad=True)

In [4]:
# The gradient is `None` until `backward` has been called
x.grad

$y = 2\,\mathbf{x}^\top \mathbf{x}$


In [5]:
y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)

**We can automatically calculate the gradient of `y` with respect to each component of `x` by calling `backward` for backpropagation and then printing `x.grad`.**

In [6]:
y.backward()
x.grad

tensor([ 0.,  4.,  8., 12.])

$\nabla_{\mathbf{x}}\, y = 4\mathbf{x}$
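
The gradient can also be checked by hand. Writing the dot product as a sum over the components of `x` gives:

$$
y = 2 \sum_{i} x_i^2
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x_i} = 4 x_i
$$

For `x = [0, 1, 2, 3]` this yields `[0, 4, 8, 12]`, which matches the `x.grad` printed above.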

In [7]:
x.grad == 4 * x

tensor([True, True, True, True])

In [8]:
all(x.grad == 4 * x)

True

**Now let us calculate another function of `x`.**

In [9]:
# PyTorch accumulates gradients by default, so clear the previous values first
x.grad.zero_()
y = x.sum()
y.backward()
x.grad

tensor([1., 1., 1., 1.])
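
The call to `x.grad.zero_()` matters because each `backward` call adds to whatever is already stored in `x.grad`. The following is a minimal sketch of that behaviour; the names `first`, `accumulated`, and `reset` are illustrative and not part of the original notebook.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# First backward pass: d(sum(x))/dx is a vector of ones
x.sum().backward()
first = x.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
x.sum().backward()
accumulated = x.grad.clone()

# Zeroing in place restores the expected result on the next pass
x.grad.zero_()
x.sum().backward()
reset = x.grad.clone()

print(first, accumulated, reset)
```

Here `accumulated` is twice `first`, while `reset` equals `first` again.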

## Backward for Non-Scalar Variables
Technically, when `y` is not a scalar, the most natural interpretation of the differentiation of a vector `y` with respect to a vector `x` is a matrix (the Jacobian). For higher-order and higher-dimensional `y` and `x`, the differentiation result could be a high-order tensor.

However, while these more exotic objects do show up in advanced machine learning (including in deep learning), more often when we call `backward` on a vector we are trying to calculate the derivatives of the loss function for each constituent of a batch of training examples. Here, our intent is not to calculate the differentiation matrix but rather the sum of the partial derivatives computed individually for each example in the batch.

In [10]:
# Invoking `backward` on a non-scalar requires passing in a `gradient` argument
# which specifies the gradient of the differentiated function w.r.t `self`.
# In our case, we simply want to sum the partial derivatives, so passing
# in a gradient of ones is appropriate
x.grad.zero_()
y = x * x
# Equivalent to `y.backward(torch.ones(len(x)))`
y.sum().backward()
x.grad

tensor([0., 2., 4., 6.])
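
To make the link between the full differentiation matrix and the summed gradient concrete, the sketch below builds the Jacobian of the element-wise square with `torch.autograd.functional.jacobian` and compares it with what `backward` returns when given a vector of ones. It is an illustration only; the names `J` and `vjp` are not from the original notebook.

```python
import torch
from torch.autograd.functional import jacobian

x = torch.arange(4.0, requires_grad=True)

# Full Jacobian of y = x * x: a 4x4 diagonal matrix with 2 * x on the diagonal
J = jacobian(lambda t: t * t, x.detach())

# `backward` with a `gradient` vector v computes the product v^T J
# without ever materialising J
y = x * x
y.backward(gradient=torch.ones_like(x))
vjp = x.grad.clone()

print(J)
print(vjp)
```

Multiplying a vector of ones by `J` sums each column of the Jacobian, which is why `vjp` equals `2 * x` here.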

## Computing the Gradient of Python Control Flow
One benefit of using automatic differentiation is that even if building the computational graph of a function requires passing through a maze of Python control flow (e.g., conditionals, loops, and arbitrary function calls), we can still calculate the gradient of the resulting variable. In the following snippet, note that the number of iterations of the `while` loop and the evaluation of the `if` statement both depend on the value of the input `a`.

In [13]:
def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

In [14]:
a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()

In [15]:
a.grad == d / a

tensor(True)
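
The check above succeeds because, for any fixed input, `f` only multiplies `a` by constants, so `f(a) = k * a` for some `k` that depends on the branch taken and the number of loop iterations. The gradient is therefore `k`, which is also `d / a`. The sketch below repeats the check for a few hand-picked inputs that exercise different paths; the specific values and the `results` list are assumptions chosen for illustration.

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

results = []
# 0.5 takes the loop and the `if` branch, -3.0 takes the loop and the
# `else` branch, 1200.0 skips the loop entirely
for value in (0.5, -3.0, 1200.0):
    a = torch.tensor(value, requires_grad=True)
    d = f(a)
    d.backward()
    results.append((value, a.grad.item(), (d / a).item()))

for value, grad, ratio in results:
    print(f"a={value}: grad={grad}, d/a={ratio}")
```

Note that `a = 0` is deliberately avoided: the `while` loop would never terminate, since doubling zero never reaches the threshold.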

In [18]:
a, a.grad, d

(tensor(0.4676, requires_grad=True),
 tensor(4096.),
 tensor(1915.1417, grad_fn=<MulBackward0>))