# Getting Started

## Deep Learning with PyTorch: A 60 Minute Blitz
(Note: a "blitz" is a lightning-fast campaign.)

Our goals in this part:

+ Understand PyTorch's Tensor library and neural networks at a high level
+ Train a small neural network to classify images

### What is PyTorch?
It is a Python-based scientific computing package targeted at two sets of audiences:

+ A replacement for NumPy that uses the power of GPUs
+ A deep learning research platform that provides maximum flexibility and speed

#### Getting started
1. Tensors

+ Tensors are similar to NumPy's ndarrays
+ But Tensors can also be used on a GPU to accelerate computing.

*Let's look at some examples now.*

In [1]:
# from __future__ import print_function
# Brings the print() function into Python 2, so that print can be called with parentheses there as well
import torch
from IPython.display import Latex
# Construct a 5x3 matrix, uninitialized:
x_empty = torch.empty(5,3)
print('x_empty=',x_empty)

# Construct a randomly initialized matrix:
x_rand = torch.rand(5,3)
print('x_rand=',x_rand)

# Construct a matrix filled with zeros and of dtype double (pass dtype=torch.long instead for 64-bit integers):
x_LongZeros = torch.zeros(5, 3, dtype = torch.double)
print('x_LongZeros=', x_LongZeros)

# Construct a tensor directly from special data:
x_data = torch.tensor([2018,9,22])
print('x_data=', x_data)

# Construct a tensor based on an existing tensor
# new_ones: new values, a new size and a new dtype
x_newVST = x_data.new_ones(9, 9, dtype= torch.double)
print(x_newVST)
# randn_like: same size as x_data, with new values and a new dtype
x_newVS = torch.randn_like(x_data, dtype= torch.double)
print(x_newVS)

# print the size (shape)
print(x_newVST.size())
print(x_newVST.shape)
print('Successful')

x_empty= tensor([[0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000]])
x_rand= tensor([[0.2634, 0.2545, 0.6092],
        [0.7748, 0.0908, 0.8941],
        [0.3047, 0.6577, 0.6483],
        [0.7937, 0.7073, 0.2769],
        [0.3876, 0.6029, 0.7799]])
x_LongZeros= tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64)
x_data= tensor([2018,    9,   22])
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=torch.float64)
tensor([-0.6584,  1.3335,

2. Operations
There are multiple syntaxes for operations.
In the following example, we will take a look at the add operation.

$\color{red}{ADD\; Operation}$

In [2]:
# addition
a = torch.rand(5,3)
b = torch.rand_like(a)
print('a=', a)
print('b=', b)

# syntax 1 
c = a + b
print('c=', c)

# syntax 2
c1 = torch.add(a,b)
print('c=',c1)

# syntax 3: providing an output tensor as argument
c2 = torch.rand_like(a)
torch.add(a, b, out = c2)
print('c=',c2)

# syntax 4: add a to b
b.add_(a)
print('add=',b)

"""
NOTE: Any operation that mutates a tensor in-place is post-fixed with an _.
For example: x.copy_(y), x.t_(), will change x.
"""

"""
You can use standard NumPy-like indexing with all bells and whistles!
"""
print('c[:,0]=',c[:,0])
print('c[:,1]=',c[:,1])
print('c[:,2]=',c[:,2])

a= tensor([[0.8121, 0.1098, 0.2129],
        [0.4936, 0.7722, 0.4620],
        [0.1978, 0.7144, 0.7590],
        [0.2558, 0.8495, 0.5203],
        [0.5908, 0.9509, 0.6481]])
b= tensor([[0.4574, 0.5889, 0.1933],
        [0.8983, 0.3399, 0.9876],
        [0.8989, 0.1065, 0.3730],
        [0.7859, 0.7841, 0.0518],
        [0.8448, 0.1018, 0.4269]])
c= tensor([[1.2695, 0.6987, 0.4062],
        [1.3919, 1.1121, 1.4497],
        [1.0967, 0.8209, 1.1320],
        [1.0417, 1.6336, 0.5721],
        [1.4356, 1.0527, 1.0750]])
c= tensor([[1.2695, 0.6987, 0.4062],
        [1.3919, 1.1121, 1.4497],
        [1.0967, 0.8209, 1.1320],
        [1.0417, 1.6336, 0.5721],
        [1.4356, 1.0527, 1.0750]])
c= tensor([[1.2695, 0.6987, 0.4062],
        [1.3919, 1.1121, 1.4497],
        [1.0967, 0.8209, 1.1320],
        [1.0417, 1.6336, 0.5721],
        [1.4356, 1.0527, 1.0750]])
add= tensor([[1.2695, 0.6987, 0.4062],
        [1.3919, 1.1121, 1.4497],
        [1.0967, 0.8209, 1.1320],
        [1.0417, 1.6336
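
The difference between the out-of-place `add` and the in-place `add_` can be checked directly. The following is a minimal sketch with small hand-picked tensors (the values and the name `b_before` are chosen here for illustration only):

```python
import torch

a = torch.ones(2, 3)
b = torch.full((2, 3), 2.0)

# Out-of-place: returns a new tensor and leaves b unchanged
c = b.add(a)

# In-place: the trailing underscore means b itself is overwritten
b_before = b.clone()
b.add_(a)

print(torch.equal(c, b))         # c and b now hold the same values
print(torch.equal(b, b_before))  # b no longer matches its old values
```

Keeping a `clone()` before an in-place call is a simple way to compare the old and new values while experimenting.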

$\color{red}{Resizing}$

If you want to resize/reshape a tensor, you can use its *view* method (torch.Tensor.view):

In [3]:
x = torch.randn(4, 4)
y = x.view(-1, 8)
print('x=',x)
print('x size=',x.size())
print('y size=',y.size())

"""
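Only one-element tensors can be converted to Python scalars with .item()
"""
x = torch.randn(1)
# Note: x.type without parentheses prints the bound method itself;
# call x.type() or read x.dtype to see the tensor's type
print(x, x.type)
print(x.item(), type(x.item()))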


x= tensor([[ 0.5465, -1.6677,  1.0034, -0.0125],
        [ 0.7528,  0.5030,  1.2933,  0.5500],
        [ 1.9160, -2.4091,  0.0134,  0.8555],
        [ 1.7463,  0.8464,  1.2658, -0.9102]])
x size= torch.Size([4, 4])
y size= torch.Size([2, 8])
tensor([1.2587]) <built-in method type of Tensor object at 0x000002511A5EFA68>
1.2586830854415894 <class 'float'>
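
Two details of `view` are easy to miss: a `-1` dimension is inferred from the remaining ones, and the result shares storage with the source tensor. A small sketch, assuming a 4x4 tensor filled with 0..15 so the effect is easy to see:

```python
import torch

x = torch.arange(16.0).view(4, 4)   # values 0..15 laid out as 4x4
y = x.view(-1, 8)                   # -1 is inferred: 16 / 8 = 2 rows
z = x.view(16)                      # flat view of the same data

# A view shares storage with its source, so writing through y shows up in x and z
y[0, 0] = 100.0

print(y.shape, z.shape)
print(x[0, 0].item(), z[0].item())
```

If an independent copy is needed, calling `clone()` on the view is one option.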


More operations are described at http://pytorch.org/docs/torch

3. NumPy Bridge

$\textbf{Converting a Torch Tensor to a NumPy Array}$

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.
(Note: "vice versa" means the other way around; "a breeze" means something very easy.)

The Torch Tensor and NumPy array will share their underlying memory locations (when the Tensor is on the CPU), and ${\color{red}{changing\;one\;will\;change\;the\;other}}$.


In [4]:
a = torch.ones(5)
print('a=',a)

b = a.numpy()
print('b=',b)

a.add_(1)
print('a=',a)
print('b=',b)

a= tensor([1., 1., 1., 1., 1.])
b= [1. 1. 1. 1. 1.]
a= tensor([2., 2., 2., 2., 2.])
b= [2. 2. 2. 2. 2.]


$\textbf{Converting a NumPy Array to a Torch Tensor}$

See how changing the NumPy array changes the Torch Tensor automatically:

In [5]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
"""
All the Tensors on the CPU except a CharTensor support converting to NumPy and back.
"""

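# out=a writes the result back into a; a plain np.add(a, 1) returns a new array and leaves a unchanged
np.add(a, 1, out=a)
print('a=',a)
print('b=',b)


a= [2. 2. 2. 2. 2.]
b= tensor([2., 2., 2., 2., 2.], dtype=torch.float64)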
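
Whether a tensor follows later changes to the NumPy array depends on how it was created. The sketch below contrasts `torch.from_numpy`, which shares memory, with `torch.tensor`, which copies the data (the names `shared` and `copied` are illustrative):

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data out of a

np.add(a, 1, out=a)            # in-place update of the NumPy array

print(shared)   # follows the change made to a
print(copied)   # keeps the original values
```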


4. CUDA Tensors

This part could not be run here, because this computer has no usable CUDA device.

In [6]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    print('yes')
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
else:
    print('No!')
    

No!
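
A common way to write code that runs with or without a GPU is to pick the device once and pass it along. This is only a sketch of that pattern, not part of the original cell; it falls back to the CPU when no CUDA device is found:

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(2, 2).to(device)
y = torch.ones_like(x)                 # created on the same device as x
z = (x + y).to("cpu", torch.double)    # move back to the CPU and change dtype in one call

print(device, z.dtype)
```

Because `ones_like` inherits the device of its argument, the addition never mixes CPU and GPU tensors.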


### Neural Networks
Neural networks can be constructed using the torch.nn package.

Now that you have had a glimpse of 'autograd', note that 'nn' depends on 'autograd' to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the 'output'.

For example, look at the network that classifies digit images in https://pytorch.org/tutorials/_images/mnist.png. It consists of Input, Convolution, Subsampling, Convolution, Subsampling and Full connection stages.

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:

    + Define the neural network that has some learnable parameters(or weights)
    + Iterate over a dataset of inputs
    + Process input through the network
    + Compute the loss
    + Propagate gradients back into the network's parameters
    + Update the weights of the network, typically using a simple update rule:

$$weight = weight - learning\_rate \times gradient$$
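
The procedure above can be tried on a toy problem before any real network is involved. The sketch below is a hypothetical example: it minimizes $(w - 3)^2$ for a single scalar weight, with a learning rate of 0.1 chosen for illustration:

```python
import torch

# Hypothetical toy problem: find the w that minimizes (w - 3)^2
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1

for step in range(50):
    loss = (w - 3.0) ** 2              # compute the loss
    loss.backward()                    # propagate the gradient back to w
    with torch.no_grad():
        w -= learning_rate * w.grad    # weight = weight - learning_rate * gradient
    w.grad.zero_()                     # clear the gradient before the next step

print(w.item())   # close to 3
```

Each step multiplies the distance to 3 by 0.8, so after 50 steps the weight is very close to the minimum.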

#### Define The Network
Let's define the network

In [7]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        # 16*5*5: input size; 120: output size
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        # Max pooling over a (2,2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)),2)
        # Flatten x for the fully connected layers, i.e. into a 2-D matrix of batch_size * number_of_features
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # The last layer does not need an activation
        x = self.fc3(x)
        return x
        
    def num_flat_features(self, x):
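        # all dimensions except the batch dimension
        # i.e. every dimension but the first; for example, if x is a 4*5*6 tensor, size is torch.Size([5, 6])
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        # In that example the result is 30;
        # for an input x of 4*5*6*7
        # the result would be 30*7=210,
        # i.e. the total number of features per sample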
        return num_features
    
net = Net()
print(net)
            
        

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


You just have to define the *forward* function, and the *backward* function (where gradients are computed) is automatically defined for you using *autograd*. You can use any of the Tensor operations in the *forward* function.

The learnable parameters of a model are returned by *net.parameters()*:


In [8]:
params = list(net.parameters())
print(len(params))
print(params[0].size())   # conv1's weight
print(params[1].size())
print(params[2].size())
print(params[3].size())
print(params[4].size())
print(params[5].size())
print(params[6].size())
print(params[7].size())
print(params[8].size())
print(params[9].size())

10
torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([16, 6, 5, 5])
torch.Size([16])
torch.Size([120, 400])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])
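
The ten sizes above are the weight and bias of each of the five layers. As a rough check, the sketch below rebuilds the same five layers in an `nn.ModuleList` (used here only for counting, it is not a replacement for `Net`) and sums the number of elements:

```python
import torch.nn as nn

# The same five layers as in Net, collected only to count their parameters
layers = nn.ModuleList([
    nn.Conv2d(1, 6, 5),
    nn.Conv2d(6, 16, 5),
    nn.Linear(16 * 5 * 5, 120),
    nn.Linear(120, 84),
    nn.Linear(84, 10),
])

for name, p in layers.named_parameters():
    print(name, tuple(p.shape), p.numel())

total = sum(p.numel() for p in layers.parameters())
print(total)   # 156 + 2416 + 48120 + 10164 + 850 = 61706
```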


Let's try a random $32\times 32$ input.

Note: The expected input size for this net (LeNet) is $32\times 32$. To use this net on the MNIST dataset, please resize the images from the dataset to $32\times 32$.
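
The $32\times 32$ requirement comes from the first fully connected layer, which expects $16 \times 5 \times 5 = 400$ features. The sketch below traces the shapes through the two convolution and pooling stages; `trace_shapes` is a helper written for this illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

def trace_shapes(side):
    """Return the tensor shape after each conv and pool stage for a side x side input."""
    x = torch.randn(1, 1, side, side)
    shapes = []
    for conv in (conv1, conv2):
        x = F.relu(conv(x))        # a 5x5 kernel without padding shrinks each side by 4
        shapes.append(tuple(x.shape))
        x = F.max_pool2d(x, 2)     # 2x2 pooling halves each side
        shapes.append(tuple(x.shape))
    return shapes

shapes_32 = trace_shapes(32)   # 32 -> 28 -> 14 -> 10 -> 5
shapes_28 = trace_shapes(28)   # 28 -> 24 -> 12 -> 8 -> 4

print(shapes_32)
print(shapes_28)
```

A $28\times 28$ image ends at $16 \times 4 \times 4 = 256$ features, which does not match the 400 inputs of `fc1`; that is why the images need to be resized.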

In [9]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[ 0.1585, -0.0513, -0.0055,  0.0208, -0.0938, -0.0586, -0.0479, -0.1212,
         -0.1156,  0.0494]], grad_fn=<ThAddmmBackward>)


Zero the gradient buffers of all parameters and backpropagate with random gradients:

In [10]:
net.zero_grad()
out.backward(torch.rand(1,10))

${\color{red}{Note}}$

*torch.nn* only supports mini-batches: the entire package expects inputs that are a mini-batch of samples, not a single sample.

For example, *nn.Conv2d* will take in a 4D Tensor of $nSamples\times nChannels \times Height \times Width$.

If you have a single sample, just use *input.unsqueeze(0)* to add a fake batch dimension. (Note: to "squeeze" is to press or compress; "unsqueeze" does the opposite and inserts a dimension of size 1.)
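
A short sketch of adding the batch dimension to one sample. The names `sample` and `batch` are illustrative. Recent PyTorch releases may also accept unbatched inputs for some layers, but adding the dimension explicitly should work across versions:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

sample = torch.randn(1, 32, 32)   # a single image: (channels, height, width)
batch = sample.unsqueeze(0)       # add a batch dimension in front: (1, 1, 32, 32)
out = conv(batch)

print(batch.shape, out.shape)
```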

Before proceeding further, let's recap all the classes you've seen so far.

${\color{blue}{Recap:}}$

+ torch.Tensor ---- A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
+ nn.Module ---- Neural network module. A convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
+ nn.Parameter ---- A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
+ autograd.Function ---- Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created a Tensor and encodes its history.

Note:
+ w.r.t. stands for "with respect to" (also "with regard to" or "with reference to").

${\color{blue}{At this point, we covered:}}$

+ Defining a neural network
+ Processing inputs and calling backward

${\color{blue}{Still Left:}}$

+ Computing the loss
+ Updating the weights of the network

#### Loss Function
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

There are several different $\color{red}{loss\; functions}$ under the *nn* package. A simple loss is *nn.MSELoss*, which computes the mean-squared error between the input and the target.

Take an example:

In [11]:
output = net(input)
# a dummy target, for example
# (note: a "dummy" is a stand-in, i.e. placeholder data)
target = torch.randn(10)
# make it the same shape as output
target = target.view(1, -1)
criterion = nn.MSELoss()
# (note: a "criterion" is a standard or rule to judge by)

loss = criterion(output, target)
print(loss)

tensor(1.5350, grad_fn=<MseLossBackward>)
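
To see what *nn.MSELoss* computes, it can be compared with the formula written out by hand. The sketch below uses small hand-picked values (not the network output above) and the default mean reduction:

```python
import torch
import torch.nn as nn

output = torch.tensor([[0.5, -1.0, 2.0]])
target = torch.tensor([[1.0, 0.0, 2.0]])

loss = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()   # (0.25 + 1.0 + 0.0) / 3

print(loss.item(), manual.item())
```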


Now, if you follow *loss* in the backward direction, using its *.grad_fn* attribute, you will see a graph of computations that looks like this:

    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d

          -> view -> linear -> relu -> linear -> relu -> linear
          -> MSELoss
          -> loss

So, when we call *loss.backward()*, the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that have *requires_grad = True* will have their *.grad* Tensor accumulated with the gradient.

For illustration, let us follow a few steps backward:

In [12]:
print(loss.grad_fn)           # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of the Linear node; which node this is depends on the PyTorch version

<MseLossBackward object at 0x000002511A5F48D0>
<ThAddmmBackward object at 0x000002511A5F4C50>
<ExpandBackward object at 0x000002511A5F48D0>
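
Instead of chaining `next_functions` by hand, the graph can be walked with a small recursive helper. This is only a sketch on a tiny stand-in model; `walk` is a helper written for this illustration, and the node names it prints differ between PyTorch versions:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 2)
x = torch.randn(1, 4)
loss = nn.MSELoss()(torch.relu(lin(x)), torch.zeros(1, 2))

def walk(fn, depth=0, max_depth=3):
    """Collect (depth, node name) pairs by following next_functions."""
    if fn is None or depth > max_depth:
        return []
    nodes = [(depth, type(fn).__name__)]
    for next_fn, _ in fn.next_functions:
        nodes += walk(next_fn, depth + 1, max_depth)
    return nodes

nodes = walk(loss.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)
```

Entries that are `None` in `next_functions` belong to inputs that do not require gradients (here, the target), so the helper skips them.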


#### Backprop
To backpropagate the error, all we have to do is call *loss.backward()*. You need to clear the existing gradients first, though, otherwise the new gradients will be accumulated into the existing ones.

Now we shall call *loss.backward()*, and have a look at conv1's bias gradients before and after the backward.

In [14]:
# zeros the gradient buffers of all parameters
net.zero_grad()

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0011, -0.0055, -0.0124,  0.0115,  0.0161, -0.0073])
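
The accumulation behaviour is easy to reproduce on a tiny tensor. In this sketch (values chosen for illustration) the gradient of `(w * 3).sum()` is 3 for every element, so calling backward twice without zeroing gives 6:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()          # 3 for each element

(w * 3).sum().backward()        # no zeroing in between
accumulated = w.grad.clone()    # 3 + 3 for each element

w.grad.zero_()                  # clear the buffer
(w * 3).sum().backward()
after_zero = w.grad.clone()     # back to 3 for each element

print(first, accumulated, after_zero)
```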


Now, we have seen how to use loss functions.

$\textbf{Read Later}$:

The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list with documentation is here.

$\textbf{The only thing left to learn is:}$

+ Updating the weights of the network


#### Update The Weights
The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

$$weight = weight - learning\_rate \times gradient$$

We can implement this using Python code:

In [16]:
learning_rate = 0.01
for f in net.parameters():
    # weight = weight - learning_rate * gradient, applied in place
    f.data.sub_(f.grad.data*learning_rate)
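
The same update can also be written without touching `.data`, by wrapping it in `torch.no_grad()`. The sketch below uses a small stand-in `nn.Linear` model rather than `Net`, so that it runs on its own:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)            # stand-in model for illustration
x = torch.randn(4, 3)
target = torch.randn(4, 1)

loss = nn.MSELoss()(model(x), target)
model.zero_grad()
loss.backward()

learning_rate = 0.01
before = [p.detach().clone() for p in model.parameters()]
with torch.no_grad():                     # keep the update out of the autograd graph
    for p in model.parameters():
        p -= learning_rate * p.grad       # weight = weight - learning_rate * gradient
```

Inside `torch.no_grad()` the in-place subtraction is not recorded by autograd, which is the purpose the `.data` access serves in the cell above.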
    

However, as you use neural networks, you will want to use various update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, *torch.optim*, that implements all these methods. Using it is very simple:

In [18]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()   # Does the update

$\color{red}{NOTE}$

Observe how gradient buffers had to be manually set to zero using *optimizer.zero_grad()*. This is because gradients are accumulated, as explained in the $\color{red}{Backprop}$ section.
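
Putting the four calls into an actual loop gives a complete, if tiny, training run. The sketch below is an assumption-laden toy: a stand-in `nn.Linear` model, random inputs and targets, and a learning rate of 0.05 chosen for illustration:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 2)            # stand-in model for illustration
inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

losses = []
for step in range(100):
    optimizer.zero_grad()                       # clear gradients from the previous step
    loss = criterion(model(inputs), targets)    # forward pass and loss
    loss.backward()                             # compute gradients
    optimizer.step()                            # apply the update rule
    losses.append(loss.item())

print(losses[0], losses[-1])   # the loss should go down over the run
```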

### Training A Classifier
This is it. You have seen how to define neural networks, compute loss and make updates to the weights of the network.