<a href="https://colab.research.google.com/github/pphos/pytorch_tutorial/blob/master/two_layer_net_autograd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch: Tensors and autograd
Implementing the backward pass by hand is not a big deal for a small two-layer network,
but it quickly becomes very difficult for large, complex networks.

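Thankfully, PyTorch provides an automatic differentiation package (autograd) that automates the computation of backward passes in neural networks.
When you use autograd, the forward pass of your network defines a computational graph.
Nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors.
Traversing this graph from the outputs back to the inputs makes it easy to compute gradients.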
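
The graph described above can be inspected directly. The following is a minimal sketch, not part of the tutorial's network: the tensors `a`, `b`, and `c` are illustrative names. It relies on the `grad_fn` attribute, which autograd sets on every Tensor produced by a tracked operation and leaves as `None` on Tensors created directly by the user.

```python
import torch

# A leaf Tensor created by the user; autograd tracks operations on it.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Each operation adds a node to the graph and records the function
# that produced the result in grad_fn.
b = a * 2
c = b.sum()

print(a.grad_fn)  # None, because a was not produced by an operation
print(b.grad_fn)  # the backward function for the multiplication
print(c.grad_fn)  # the backward function for the sum
```

Calling `c.backward()` would walk these recorded functions from `c` back to `a`.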

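This sounds complicated, but it is quite simple to use in practice.
Each Tensor represents a node in a computational graph.
If `x` is a Tensor with `x.requires_grad = True`,
then, after a backward pass, `x.grad` is another Tensor holding the gradient of some scalar value with respect to `x`.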
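
As a small worked example of `x.grad` (the values here are chosen only for illustration), take the scalar `s = sum(x_i^2)`, whose gradient with respect to `x` is `2 * x`:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# s is a scalar: 1^2 + 2^2 + 3^2 = 14
s = (x ** 2).sum()

# Populate x.grad with ds/dx = 2 * x
s.backward()

print(x.grad)  # tensor([2., 4., 6.])
```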

Here we use PyTorch Tensors and autograd to implement a two-layer network.
With autograd, we no longer need to implement the backward pass through the network by hand.




In [3]:
import torch

dtype = torch.float
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
# requires_grad defaults to False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y using operations on Tensors;
  # these are exactly the same operations we used to compute the forward pass
  # using Tensors, but we do not need to keep references to intermediate values
  # since we are not implementing the backward pass by hand.
  y_pred = x.mm(w1).clamp(min=0).mm(w2)

  # Compute and print loss using operations on Tensors.
  # Now loss is a zero-dimensional (scalar) Tensor, i.e. its shape is ().
  # loss.item() gets the Python scalar value held in the loss.
  loss = (y_pred - y).pow(2).sum()
  if t % 100 == 99:
    print(t, loss.item())

  # Use autograd to compute the backward pass. This call will compute the
  # gradient of loss with respect to all Tensors with requires_grad=True.
  # After this call w1.grad and w2.grad will be Tensors holding the gradient
  # of the loss with respect to w1 and w2 respectively
  loss.backward()

  # Manually update weights using gradient descent. Wrap in torch.no_grad()
  # because weights have requires_grad=True, but we don't need to track this
  # in autograd.
  # An alternative way is to operate on weight.data and weight.grad.data.
  # Recall that tensor.data gives a tensor that shares the storage with tensor,
  # but doesn't track history.
  # You can also use torch.optim.SGD to achieve this.
  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

    # Manually zero the gradients after updating weights
    w1.grad.zero_()
    w2.grad.zero_()

99 1054.7764892578125
199 13.261177062988281
299 0.24380940198898315
399 0.005378917790949345
499 0.0003115542058367282
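
The comments above mention `torch.optim.SGD` as an alternative to the manual update. The following is a minimal sketch of that variant, assuming the same network and hyperparameters. It runs on the CPU with a fixed seed for reproducibility, and the `losses` list is an illustrative addition that does not appear in the original cell.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so reruns are repeatable

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# Plain SGD (no momentum) performs the update w -= lr * w.grad.
optimizer = torch.optim.SGD([w1, w2], lr=1e-6)

losses = []  # illustrative: record the loss at every step
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    # Clear gradients from the previous step, since backward() accumulates.
    optimizer.zero_grad()
    loss.backward()
    # Apply the update; the optimizer handles the no_grad bookkeeping.
    optimizer.step()

print(losses[0], losses[-1])
```

Here `optimizer.zero_grad()` and `optimizer.step()` take the place of the `torch.no_grad()` block and the explicit `w1.grad.zero_()` / `w2.grad.zero_()` calls.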
