In [4]:
%matplotlib inline


Autograd: Automatic Differentiation
===================================

Central to all neural networks in PyTorch is the ``autograd`` package.
Let's take a brief look at it first, and then move on to training our
first neural network.


The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.

Let us see this in simpler terms with some examples.

Tensor
--------

``torch.Tensor`` is the central class of the package. If you set its attribute
``.requires_grad`` to ``True``, autograd starts to track all operations on it.
When you finish your computation you can call ``.backward()`` and have all the
gradients computed automatically. The gradient for this tensor will be
accumulated into its ``.grad`` attribute.
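
As a minimal sketch of what "accumulated" means here (the tensor name ``w`` is
illustrative and not part of the tutorial): calling ``.backward()`` a second
time adds the new gradient to the one already stored in ``.grad``, so the
attribute usually has to be reset between independent backward passes.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First backward pass: d(sum(2 * w)) / dw = 2 for every element.
(w * 2).sum().backward()
first = w.grad.clone()

# Second backward pass on a fresh graph: the new gradient is added to .grad.
(w * 2).sum().backward()
second = w.grad.clone()

# Reset the accumulated gradient in place before the next independent pass.
w.grad.zero_()
cleared = w.grad.clone()

print(first, second, cleared)
```

This is why training loops typically clear gradients once per iteration.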


To stop a tensor from tracking history, you can call ``.detach()``. It returns
a new tensor that is detached from the computation history, and future
computation on that tensor is not tracked.
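
A small sketch of ``.detach()`` in use, with illustrative tensor names
(``a`` to ``d`` are assumptions, not names from the tutorial). The original
tensor keeps its history; only the detached result and anything computed from
it are untracked.

```python
import torch

a = torch.ones(2, requires_grad=True)
b = a * 3          # tracked: b has a grad_fn
c = b.detach()     # same values as b, but cut off from the graph
d = c * 2          # operations on c are not tracked

print(b.requires_grad, c.requires_grad, d.requires_grad)
print(c.grad_fn)
```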

To prevent tracking history (and using memory), you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model, because the model may have trainable parameters with
``requires_grad=True`` whose gradients we don't need during evaluation.
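
The evaluation case might look like the following sketch. The
``torch.nn.Linear`` layer is only a hypothetical stand-in for a trained model,
and the random ``inputs`` are placeholder data.

```python
import torch

model = torch.nn.Linear(4, 2)   # hypothetical stand-in for a trained model
inputs = torch.randn(8, 4)      # placeholder batch of 8 samples

model.eval()                    # put layers such as dropout into inference mode
with torch.no_grad():
    predictions = model(inputs)

# The parameters still require gradients, but the output carries no history.
print(model.weight.requires_grad)
print(predictions.requires_grad, predictions.grad_fn)
```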


There’s one more class which is very important for the autograd
implementation: ``Function``.

``Tensor`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each tensor has
a ``.grad_fn`` attribute that references the ``Function`` that created
the ``Tensor`` (except for Tensors created by the user, whose
``grad_fn`` is ``None``).
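
One way to see this graph is to walk backwards from a result through
``grad_fn``. The sketch below relies on the ``next_functions`` attribute of
graph nodes, which is closer to an implementation detail than a stable API, and
the printed node names may differ between PyTorch versions.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)   # created by the user: no grad_fn
out = ((x + 2) * 3).mean()

# Follow the first input of each node back toward the leaf tensor.
node = out.grad_fn
names = []
while node is not None:
    names.append(type(node).__name__)
    node = node.next_functions[0][0] if node.next_functions else None

print(x.grad_fn)
print(" <- ".join(names))
```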

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single
element), you don’t need to specify any arguments to ``backward()``.
If it has more elements, you need to specify a ``gradient``
argument that is a tensor of matching shape.
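
The difference between the scalar and non-scalar case can be sketched as
follows. The weights passed as ``gradient`` are arbitrary values chosen for
illustration.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                       # non-scalar output with three elements

# Without a gradient argument, backward() on a non-scalar tensor fails.
try:
    y.backward()
    failed = False
except RuntimeError:
    failed = True

# A tensor with the same shape as y supplies one weight per output element.
y.backward(gradient=torch.tensor([1.0, 0.5, 0.0]))
print(failed, x.grad)
```

Each entry of ``x.grad`` is the corresponding weight multiplied by the
derivative of ``y`` with respect to that entry, which is 2 here.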



In [5]:
import torch

Create a tensor and set ``requires_grad=True`` to track computation with it



In [19]:
# Create a tensor with requires_grad=True so that operations on it are tracked

x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


Do a tensor operation:




In [20]:
# Tensor operation
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


``y`` was created as a result of an operation, so it has a ``grad_fn``.



In [21]:
# Print the function that created the tensor
print(y.grad_fn)

<AddBackward0 object at 0x0000015F3D8D45F8>


Do more operations on ``y``



In [22]:
z = y * y * 3
out = z.mean()

print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


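``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. A newly created Tensor has ``requires_grad=False`` unless you
request otherwise, and calling ``.requires_grad_()`` with no argument sets the
flag to ``True``.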



In [None]:
# For comparison: with requires_grad=False the result has no grad_fn
aa = torch.ones(2, 2, requires_grad=False)
a = aa + 2
print(a)

In [23]:
# Use .requires_grad_(True/False) to switch tracking on or off in place

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)

a.requires_grad_(True)
print(a.requires_grad)

b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x0000015F3D8C6DA0>


Gradients
---------
Let's backprop now.
Because ``out`` contains a single scalar, ``out.backward()`` is
equivalent to ``out.backward(torch.tensor(1.))``.



In [24]:
# out holds a single scalar, so out.backward() gives the same result as out.backward(torch.tensor(1.))

out.backward()

In [25]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should see a matrix filled with ``4.5``. Let’s call the ``out``
*Tensor* “$o$”.
We have that $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
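
The value $4.5$ can also be checked numerically without autograd. The sketch
below uses plain Python and a central finite difference; the helper names
``o`` and ``central_difference`` and the step size ``h`` are choices made for
this illustration.

```python
def o(xs):
    """o = mean(3 * (x_i + 2)^2), matching the definition of out above."""
    return sum(3 * (x + 2) ** 2 for x in xs) / len(xs)


def central_difference(f, xs, i, h=1e-6):
    """Approximate the partial derivative of f with respect to xs[i]."""
    up = list(xs)
    down = list(xs)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)


xs = [1.0, 1.0, 1.0, 1.0]   # the four entries of x, flattened
grads = [central_difference(o, xs, i) for i in range(len(xs))]
print([round(g, 4) for g in grads])
```

Each approximated partial derivative should agree with
$\frac{3}{2}(x_i+2)$ evaluated at $x_i=1$.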



Mathematically, if you have a vector valued function $\vec{y}=f(\vec{x})$,
then the gradient of $\vec{y}$ with respect to $\vec{x}$
is a Jacobian matrix:

\begin{align}J=\left(\begin{array}{ccc}
   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
   \vdots & \ddots & \vdots\\
   \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
   \end{array}\right)\end{align}

Generally speaking, ``torch.autograd`` is an engine for computing
vector-Jacobian products. That is, given any vector
$v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}$,
it computes the product $v^{T}\cdot J$. If $v$ happens to be
the gradient of a scalar function $l=g\left(\vec{y}\right)$,
that is,
$v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$,
then by the chain rule, the vector-Jacobian product would be the
gradient of $l$ with respect to $\vec{x}$:

\begin{align}J^{T}\cdot v=\left(\begin{array}{ccc}
   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
   \vdots & \ddots & \vdots\\
   \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
   \end{array}\right)\left(\begin{array}{c}
   \frac{\partial l}{\partial y_{1}}\\
   \vdots\\
   \frac{\partial l}{\partial y_{m}}
   \end{array}\right)=\left(\begin{array}{c}
   \frac{\partial l}{\partial x_{1}}\\
   \vdots\\
   \frac{\partial l}{\partial x_{n}}
   \end{array}\right)\end{align}

(Note that $v^{T}\cdot J$ gives a row vector which can be
treated as a column vector by taking $J^{T}\cdot v$.)

This property of the vector-Jacobian product makes it very
convenient to feed external gradients into a model that has
non-scalar output.
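
To make the relation between $J$ and the product $v^{T}\cdot J$ concrete, the
sketch below builds the full Jacobian of a small function explicitly and
compares it with what ``backward`` returns. The function ``f`` and the values
of ``x`` and ``v`` are hypothetical, and ``torch.autograd.functional.jacobian``
is assumed to be available (it is not present in very old PyTorch releases).

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # Hypothetical vector-valued function from R^3 to R^2.
    return torch.stack([x[0] * x[1], x[2] ** 2])


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.5, -1.0])

# Full Jacobian (2 x 3), built explicitly for comparison only.
J = jacobian(f, x.detach())

# Vector-Jacobian product computed by autograd without forming J.
y = f(x)
y.backward(v)

print(J)
print(x.grad, v @ J)
```

Building $J$ costs one backward pass per output element, while the
vector-Jacobian product needs a single pass.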



Now let's take a look at an example of a vector-Jacobian product:



In [34]:
# Example of a vector-Jacobian product when y is not a scalar

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.detach().norm() < 1000:
    y = y * 2

print(y)

tensor([-1154.3105,   195.3638,  -919.3580], grad_fn=<MulBackward0>)


Now in this case ``y`` is no longer a scalar. ``torch.autograd``
does not compute the full Jacobian directly here, but if we just
want the vector-Jacobian product, we simply pass the vector to
``backward`` as an argument:



In [35]:
# backward() does not compute the full Jacobian, but we only need the vector-Jacobian product, so we just pass the vector to backward

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
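
The printed gradient is $v$ scaled by a power of two: every pass through the
loop doubles ``y``, so $\vec{y}=2^{k}\vec{x}$ and the Jacobian is $2^{k}$ times
the identity matrix. A sketch with a fixed input (chosen here only for
reproducibility, in place of ``torch.randn``) makes the factor explicit; the
counter ``doublings`` is an addition for this illustration.

```python
import torch

x = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)   # fixed input

y = x * 2
doublings = 1
while y.detach().norm() < 1000:
    y = y * 2
    doublings += 1

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

# The Jacobian is (2 ** doublings) * I, so x.grad should equal that factor times v.
scale = 2.0 ** doublings
print(doublings, x.grad, scale * v)
```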


You can also stop autograd from tracking history on Tensors
with ``.requires_grad=True`` by wrapping the code block in
``with torch.no_grad():``



In [12]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


**Read Later:**

Documentation for ``autograd.Function`` is at
https://pytorch.org/docs/stable/autograd.html#function

