In [20]:
import numpy as np
import torch
%matplotlib inline

This tutorial introduces the fundamental concepts of PyTorch through self-contained examples.

At its core, PyTorch provides two main features:
- An n-dimensional Tensor, similar to a numpy array but able to run on GPUs
- Automatic differentiation for building and training neural networks

We will use a fully-connected ReLU network as our running example. 
The network will have a single hidden layer and will be trained with gradient descent to fit random data by minimizing the squared Euclidean distance between the network output and the true output.
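
In symbols, the model and objective can be written as follows. The names $W_1$ and $W_2$ (for `w1` and `w2` in the code), $\hat{y}$ (for `y_pred`), and $L$ are notation assumed here for illustration. There are no bias terms, matching the code in this tutorial, and the loss the code computes is the sum of squared differences.

$$
h = x W_1, \qquad \hat{y} = \max(h, 0)\, W_2, \qquad L = \sum_{i=1}^{N} \sum_{j=1}^{D_{out}} \left(\hat{y}_{ij} - y_{ij}\right)^2
$$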


## 1. Warm-up: numpy

- Numpy provides an n-dimensional array object and many functions for manipulating these arrays.
- It is a generic framework for scientific computing; it knows nothing about computation graphs, deep learning, or gradients.
- However, we can easily use numpy to fit a two-layer network to random data by manually implementing the forward and backward passes through the network with numpy operations:

In [21]:
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop: compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 36257585.86723656
1 33379428.55762464
2 34428803.06119087
3 32463640.277820274
4 25125743.8394095
5 15414618.370577587
6 8081114.030796533
7 4112245.7813627506
8 2298180.4564772155
9 1476755.2335939498
10 1070447.6294885646
11 838138.5978301254
12 685028.4066661634
13 573136.7953800916
14 486385.6835498083
15 416718.202435818
16 359489.8452623093
17 311799.93454804947
18 271696.0055473017
19 237833.09931187387
20 209008.42001760064
21 184369.86175201274
22 163142.96811882246
23 144758.55074908215
24 128795.70811793461
25 114922.84393663735
26 102866.14400655165
27 92282.00023479118
28 82953.97456836095
29 74719.41412281973
30 67442.84320266568
31 60977.69963288645
32 55220.734765240144
33 50082.83617556408
34 45489.99411772998
35 41372.65468317745
36 37677.55785619431
37 34354.811566548145
38 31361.981923591877
39 28662.24607461755
40 26222.141815304647
41 24014.307253856008
42 22012.861114401454
43 20197.63667632411
44 18548.047550726107
45 17047.68153075432
46 15681.152785891481
47

481 3.093346440305878e-07
482 2.940814585930383e-07
483 2.7958469187931606e-07
484 2.658057107904795e-07
485 2.5270949373624803e-07
486 2.402626665217084e-07
487 2.284312207276248e-07
488 2.1718515514773917e-07
489 2.0649592305381715e-07
490 1.9633530907856786e-07
491 1.8667763828832372e-07
492 1.7749686201122847e-07
493 1.6876988037209364e-07
494 1.604739822755514e-07
495 1.5258808451356005e-07
496 1.4509236017171335e-07
497 1.3796589760455108e-07
498 1.3119117779112094e-07
499 1.2475091904021305e-07
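
A hand-written backward pass like the one above is easy to get subtly wrong, so it can be worth comparing it against finite differences. The following is a minimal sketch, not part of the original tutorial: the helper names `forward_loss`, `manual_grads`, and `numerical_grad` are hypothetical, and the dimensions are shrunk (an assumption made only to keep the check fast).

```python
import numpy as np

rng = np.random.default_rng(0)
# Much smaller sizes than in the tutorial, chosen only so the check runs quickly.
N, D_in, H, D_out = 4, 5, 3, 2
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def forward_loss(w1, w2):
    """Same forward pass and loss as the training loop above."""
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

def manual_grads(w1, w2):
    """Same hand-derived backward pass as the training loop above."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return grad_w1, grad_w2

def numerical_grad(f, w, eps=1e-6):
    """Central-difference estimate of df/dw, perturbing w one entry at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = f()
        w[idx] = old - eps
        minus = f()
        w[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

grad_w1, grad_w2 = manual_grads(w1, w2)
num_w1 = numerical_grad(lambda: forward_loss(w1, w2), w1)
num_w2 = numerical_grad(lambda: forward_loss(w1, w2), w2)

# Largest absolute disagreement between analytic and numerical gradients.
err_w1 = np.abs(grad_w1 - num_w1).max()
err_w2 = np.abs(grad_w2 - num_w2).max()
print(err_w1, err_w2)
```

Both printed errors should be tiny relative to the gradient magnitudes. The check can misfire if a hidden activation sits almost exactly at zero, where ReLU is not differentiable, which is unlikely with random data.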



## 2. PyTorch: Tensors

- Numpy is a great framework, but it cannot use GPUs to accelerate its numerical computations.
- For modern deep neural networks, GPUs often provide speedups of 50x or greater, so numpy alone is not enough for modern deep learning.
- Here we introduce the most fundamental PyTorch concept: the __Tensor__. A Tensor:
    - is conceptually identical to a numpy array: it is an n-dimensional array, and PyTorch provides many functions for operating on Tensors.
    - can keep track of a computational graph and gradients, but is also useful as a generic tool for scientific computing.
    - can use GPUs to accelerate its numeric computations. To run a PyTorch Tensor on a GPU, you place it on a CUDA device, for example by passing `device=` when creating it, as the code below does.

- Here we use PyTorch Tensors to fit a two-layer network to random data. As in the numpy example above, we need to manually implement the forward and backward passes through the network.
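
As a brief aside on device placement before the full example, here is a minimal sketch of choosing a device at runtime and putting Tensors on it. The variable names `a`, `b`, `c`, and `c_np` are illustrative only, and the fallback to CPU when no GPU is visible is an addition not present in the original tutorial.

```python
import torch

# Use the first GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Option 1: create the Tensor directly on the chosen device.
a = torch.randn(3, 4, device=device, dtype=torch.float)

# Option 2: create it on the CPU, then move it with .to().
b = torch.ones(4, 2).to(device)

# Both operands of an operation must live on the same device.
c = a.mm(b)
print(c.shape, c.device)

# Converting to a numpy array requires the data to be on the CPU first.
c_np = c.cpu().numpy()
```

Creating Tensors directly on the target device avoids an extra host-to-device copy, which is why the full example passes `device=device` to every `torch.randn` call.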

In [23]:
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1) # mm : matrix multiplication
    h_relu = h.clamp(min=0) # clamp : limits each element of the tensor to the given min/max bounds
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop: compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 26291316.0
1 20046144.0
2 18446274.0
3 18364670.0
4 18040652.0
5 16543573.0
6 13686749.0
7 10233446.0
8 7004154.5
9 4558929.0
10 2909092.5
11 1883319.5
12 1262191.875
13 888165.0
14 657751.5
15 510526.125
16 411842.1875
17 342226.3125
18 290639.25
19 250713.625
20 218681.765625
21 192304.5625
22 170165.203125
23 151320.0625
24 135111.875
25 121026.046875
26 108702.3671875
27 97886.0703125
28 88338.7890625
29 79873.75
30 72349.7890625
31 65645.4140625
32 59653.9453125
33 54286.7421875
34 49469.9765625
35 45144.02734375
36 41246.74609375
37 37729.7734375
38 34552.765625
39 31676.376953125
40 29068.48828125
41 26700.82421875
42 24548.603515625
43 22588.978515625
44 20802.798828125
45 19175.6796875
46 17692.48828125
47 16336.908203125
48 15095.9833984375
49 13959.0947265625
50 12915.7509765625
51 11957.70703125
52 11077.53125
53 10267.7998046875
54 9522.421875
55 8835.83203125
56 8202.921875
57 7619.31787109375
58 7080.8154296875
59 6583.52734375
60 6123.86474609375
61 5698.791015625
62 