# PyTorch basics: the neural network package nn and the optimizer package optim
torch.nn is a modular interface designed specifically for neural networks. It is built on autograd and can be used to define and run neural networks.
Here we mainly introduce a few commonly used classes.

**Convention: for convenience, we import torch.nn under the alias nn. This chapter also uses other naming conventions besides nn.**

In [1]:
# First import the relevant packages
import torch
# Import torch.nn and give it the alias nn
import torch.nn as nn
# Print the version
torch.__version__

'1.6.0'

Besides the nn alias, we also import nn.functional. This package contains functions commonly used in neural networks that have no learnable parameters (such as ReLU, pooling, and Dropout). Such operations could also be registered in the constructor through their module counterparts (for example nn.ReLU), but that is not necessary, and here we do not recommend it: we simply call them inside forward.

By convention, we **alias nn.functional as a capital F**, which keeps calls short.

In [2]:
import torch.nn.functional as F
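To make the distinction concrete, here is a small sketch comparing the functional calls with their module counterparts nn.ReLU and nn.MaxPool2d. The input shape is an arbitrary choice that mirrors the conv1 output of the network below. Both forms give identical results, and the modules hold no learnable parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 6, 30, 30)  # arbitrary input, same shape as conv1's output below

# Functional form: plain function calls, nothing to create in the constructor
y_functional = F.max_pool2d(F.relu(x), (2, 2))

# Module form: the same operations wrapped as nn.Module objects
relu = nn.ReLU()
pool = nn.MaxPool2d((2, 2))  # stride defaults to the window size, as in F.max_pool2d
y_module = pool(relu(x))

print(torch.equal(y_functional, y_module))  # True
print(y_module.shape)                       # torch.Size([1, 6, 15, 15])
# Neither module owns any learnable parameters
print(len(list(relu.parameters())), len(list(pool.parameters())))  # 0 0
```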

## Define a network
PyTorch provides a ready-made base class for network models: as long as your model inherits from nn.Module and implements its forward method, PyTorch derives the backward pass automatically through autograd. Inside forward you can use any operation that Tensors support, as well as ordinary Python constructs such as if statements, for loops, print, and logging, written exactly as in standard Python.
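A minimal sketch of this flexibility, using a hypothetical LoopNet that is not part of this tutorial: its forward runs a Python for loop and picks an activation with an if on tensor values, and autograd still computes gradients through whichever path actually ran.

```python
import torch
import torch.nn as nn

class LoopNet(nn.Module):  # hypothetical example network, not the tutorial's Net
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x, n_steps):
        # Ordinary Python for loop: apply the same layer n_steps times
        for _ in range(n_steps):
            x = self.fc(x)
            # Ordinary Python if: choose the activation from the tensor's values
            if x.sum() > 0:
                x = torch.relu(x)
            else:
                x = torch.tanh(x)
        return x.sum()

torch.manual_seed(0)
loop_net = LoopNet()
loop_out = loop_net(torch.randn(2, 4), n_steps=3)
loop_out.backward()  # autograd follows the branches that actually executed
print(loop_net.fc.weight.grad.shape)  # torch.Size([4, 4])
```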

In [3]:
class Net(nn.Module):
    def __init__(self):
        # A subclass of nn.Module must call the parent class constructor in its own constructor
        super(Net, self).__init__()

        # Convolutional layer: '1' means the input image has a single channel, '6' is the number of output channels, '3' means a 3*3 kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        # Linear layer: 1350 input features, 10 output features
        self.fc1 = nn.Linear(1350, 10)  # How is 1350 obtained? It follows from the forward function below: 6 * 15 * 15 = 1350
    # Forward pass
    def forward(self, x):
        print(x.size())  # Result: [1, 1, 32, 32]
        # Convolution -> Activation -> Pooling
        x = self.conv1(x)  # By the convolution output-size formula the result is 30; the formula is explained in Chapter 2, Section 4 (convolutional neural networks)
        x = F.relu(x)
        print(x.size())  # Result: [1, 6, 30, 30]
        x = F.max_pool2d(x, (2, 2))  # 2x2 max pooling; the result is 15
        x = F.relu(x)  # Has no effect here: the values are already non-negative after the first ReLU
        print(x.size())  # Result: [1, 6, 15, 15]
        # Reshape; '-1' lets PyTorch infer that dimension automatically
        # This is a flattening operation: [1, 6, 15, 15] is flattened into [1, 1350]
        x = x.view(x.size()[0], -1)
        print(x.size())  # This 1350 is the input size of the fc1 layer
        x = self.fc1(x)
        return x

net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=1350, out_features=10, bias=True)
)
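The 1350 in fc1 (and the 30 and 15 in the comments) can be checked by hand. For convolution and pooling with dilation 1, each spatial dimension becomes floor((size + 2 * padding - kernel) / stride) + 1. In the sketch below, conv2d_out is a hypothetical helper name for that formula; the pooling step uses stride 2 because F.max_pool2d defaults the stride to the window size.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Hypothetical helper: output height/width of a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

h = conv2d_out(32, kernel=3)           # conv1: 3x3 kernel, stride 1, no padding -> 30
h = conv2d_out(h, kernel=2, stride=2)  # 2x2 max pooling, stride 2 -> 15
fc1_in = 6 * h * h                     # 6 channels * 15 * 15
print(h, fc1_in)  # 15 1350
```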


The learnable parameters of the network are returned by net.parameters().

In [4]:
for parameters in net.parameters():
    print(parameters)

Parameter containing:
tensor([[[[-0.1344,  0.1935,  0.0283],
          [ 0.1961,  0.2092,  0.2041],
          [ 0.2554, -0.1514, -0.1703]]],


        [[[-0.0207,  0.0106, -0.1412],
          [-0.3179,  0.0231, -0.0267],
          [ 0.0117, -0.1489, -0.0665]]],


        [[[ 0.2336, -0.2888,  0.0846],
          [-0.1015, -0.0979, -0.2875],
          [ 0.0362, -0.1150, -0.3182]]],


        [[[ 0.2797, -0.3027, -0.1414],
          [ 0.0131,  0.0606, -0.3207],
          [ 0.0614,  0.2541, -0.1477]]],


        [[[ 0.1471,  0.0704,  0.1482],
          [ 0.1755, -0.0004,  0.1870],
          [-0.2855,  0.1054, -0.3121]]],


        [[[-0.0927,  0.3014,  0.2150],
          [ 0.3258, -0.2874,  0.1559],
          [-0.1763, -0.1845, -0.2128]]]], requires_grad=True)
Parameter containing:
tensor([-0.3111, -0.2193,  0.3168,  0.1261,  0.2435, -0.1907],
       requires_grad=True)
Parameter containing:
tensor([[-0.0161, -0.0269,  0.0251,  ..., -0.0108,  0.0045, -0.0151],
        [-0.0146,  0.0211, -0

net.named_parameters() returns the learnable parameters together with their names.

In [5]:
for name, parameters in net.named_parameters():
    print(name,':',parameters.size())

conv1.weight : torch.Size([6, 1, 3, 3])
conv1.bias : torch.Size([6])
fc1.weight : torch.Size([10, 1350])
fc1.bias : torch.Size([10])
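From these shapes you can also count the learnable parameters. As a sketch, the snippet rebuilds the same two layers standalone (so it runs without Net) and sums numel() over their parameters.

```python
import torch.nn as nn

# Same hyperparameters as Net's two layers
conv1 = nn.Conv2d(1, 6, 3)
fc1 = nn.Linear(1350, 10)

n_conv = sum(p.numel() for p in conv1.parameters())  # 6*1*3*3 weights + 6 biases
n_fc = sum(p.numel() for p in fc1.parameters())      # 10*1350 weights + 10 biases
print(n_conv, n_fc, n_conv + n_fc)  # 60 13510 13570
```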


Both the input and the output of the forward function are Tensors.

In [6]:
input = torch.randn(1, 1, 32, 32)  # The 32 here matches the input size assumed in forward above
out = net(input)
out.size()

torch.Size([1, 1, 32, 32])
torch.Size([1, 6, 30, 30])
torch.Size([1, 6, 15, 15])
torch.Size([1, 1350])


torch.Size([1, 10])

In [7]:
input.size()

torch.Size([1, 1, 32, 32])

Before backpropagation, first clear (zero) the gradients of all parameters.

In [8]:
net.zero_grad()
out.backward(torch.ones(1,10))  # PyTorch performs backpropagation automatically; we only call this function. out is not a scalar, so backward needs a gradient tensor of the same shape
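Two details behind this cell, sketched on a small stand-in nn.Linear layer (an assumption, to keep the example quick): passing an all-ones gradient to backward on a non-scalar output is equivalent to calling backward on its sum, and gradients accumulate across backward calls, which is why they must be zeroed first.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 10)  # small stand-in for the tutorial's net
x = torch.randn(1, 3)

# backward() on the non-scalar output with an all-ones gradient...
layer.zero_grad()
layer(x).backward(torch.ones(1, 10))
grad_ones = layer.weight.grad.clone()

# ...matches backward() on the summed, scalar output
layer.zero_grad()
layer(x).sum().backward()
grad_sum = layer.weight.grad.clone()
print(torch.allclose(grad_ones, grad_sum))  # True

# Without zeroing, a second backward() adds to the stored gradient
layer(x).sum().backward()
print(torch.allclose(layer.weight.grad, 2 * grad_sum))  # True
```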

**Note**: torch.nn only supports mini-batches: its layers expect a batch of samples rather than a single sample, so input must always be fed in one batch at a time.

In other words, even a single sample has to be fed as a batch (of size 1), so every input gains one extra dimension. Compare with the input above: a single image for this network is 3-dimensional (channels, height, width), but we added one more dimension by hand when creating it, making it 4-dimensional; the leading 1 is the batch size.
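When you do start from a single sample, you can add the batch dimension yourself. A sketch using unsqueeze(0); indexing with None does the same.

```python
import torch

image = torch.randn(1, 32, 32)  # one single-channel 32x32 image: (channels, height, width)
batch = image.unsqueeze(0)      # add a leading batch dimension of size 1
print(batch.size())             # torch.Size([1, 1, 32, 32])

# Equivalent ways to add the batch dimension
print(torch.equal(batch, image[None]), torch.equal(batch, image.view(1, 1, 32, 32)))  # True True
```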

## Loss function
nn also provides the commonly used loss functions ready-made. Below we use nn.MSELoss to compute the mean squared error.

In [9]:
y = torch.arange(0,10).view(1,10).float()
criterion = nn.MSELoss()
loss = criterion(out, y)
# loss is a scalar, so we can call item() to get its value as a Python number
print(loss.item())

26.99104881286621
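With its default reduction='mean', nn.MSELoss averages the squared differences over all elements. A quick check against a manual computation, using a random stand-in for the network output (an assumption, so the snippet runs on its own):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fake_out = torch.randn(1, 10)                     # stand-in for the network output
target = torch.arange(0, 10).view(1, 10).float()  # same target as y above

mse = nn.MSELoss()(fake_out, target)              # default reduction='mean'
manual = ((fake_out - target) ** 2).mean()        # mean of squared differences over all 10 elements
print(torch.allclose(mse, manual))                # True
```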


## Optimizer
Once backpropagation has computed the gradients of all parameters, an optimization method is still needed to update the network's weights (its parameters). For example, stochastic gradient descent (SGD) uses the following update rule:

weight = weight - learning_rate * gradient
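To see that this is exactly what plain SGD does, the sketch below applies the rule by hand and compares the result with torch.optim.SGD. It uses a small hypothetical nn.Linear model with random data, and assumes SGD's default settings (no momentum, no weight decay).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)                               # hypothetical small model
x, target = torch.randn(8, 4), torch.randn(8, 1)      # random stand-in data
lr = 0.01

# One backward pass fills in model.weight.grad
nn.MSELoss()(model(x), target).backward()
expected = model.weight.detach() - lr * model.weight.grad  # weight - learning_rate * gradient

# Let torch.optim.SGD apply its update
torch.optim.SGD(model.parameters(), lr=lr).step()
print(torch.allclose(model.weight.detach(), expected))    # True
```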

torch.optim implements most of the common optimization methods, such as RMSprop, Adam, and SGD. Below we use SGD as a simple example.
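These optimizers share one interface: construct them with the parameters to update plus hyperparameters such as lr, then call zero_grad(), backward() on the loss, and step(). A sketch swapping three of them on a hypothetical nn.Linear model with random data; lr=0.01 is an arbitrary choice, not a recommendation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x, target = torch.randn(8, 4), torch.randn(8, 1)  # random stand-in data
criterion = nn.MSELoss()

updated = {}
for optimizer_cls in (torch.optim.SGD, torch.optim.Adam, torch.optim.RMSprop):
    model = nn.Linear(4, 1)  # a fresh hypothetical model for each optimizer
    before = model.weight.detach().clone()
    # Same constructor pattern: parameters to update plus hyperparameters
    optimizer = optimizer_cls(model.parameters(), lr=0.01)
    optimizer.zero_grad()
    criterion(model(x), target).backward()
    optimizer.step()  # same call whatever the update rule is
    updated[optimizer_cls.__name__] = not torch.equal(before, model.weight.detach())

print(updated)  # {'SGD': True, 'Adam': True, 'RMSprop': True}
```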

In [10]:
import torch.optim

In [11]:
out = net(input)  # Calling the network again prints the sizes of x from the forward function
criterion = nn.MSELoss()
loss = criterion(out, y)
# Create a new optimizer; SGD only needs the parameters to optimize and the learning rate
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
# Clear the gradients first (same effect as net.zero_grad())
optimizer.zero_grad()
loss.backward()

# Update the parameters
optimizer.step()

torch.Size([1, 1, 32, 32])
torch.Size([1, 6, 30, 30])
torch.Size([1, 6, 15, 15])
torch.Size([1, 1350])
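In practice these steps are repeated in a loop. A minimal sketch of a training loop around the same layers as Net, redefined here as a hypothetical QuietNet without the debug prints so the output stays readable, fitting one random input to the same target as y. The learning rate 0.001 is an assumption: a conservative value, smaller than the 0.01 above, not a tuned one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuietNet(nn.Module):
    """Hypothetical copy of Net's layers, with the print calls removed."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.fc1 = nn.Linear(1350, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = x.view(x.size(0), -1)  # flatten [batch, 6, 15, 15] to [batch, 1350]
        return self.fc1(x)

torch.manual_seed(0)
model = QuietNet()
x = torch.randn(1, 1, 32, 32)
target = torch.arange(0, 10).view(1, 10).float()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

losses = []
for step in range(20):
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss = criterion(model(x), target)    # forward pass and loss
    loss.backward()                       # backpropagation
    optimizer.step()                      # weight = weight - lr * gradient
    losses.append(loss.item())

print(losses[0], losses[-1])
```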


With this, we have used PyTorch to take data through one complete cycle of a neural network: forward pass, loss computation, backpropagation, and parameter update. The following chapter introduces the data loading and processing tools PyTorch provides, which make it convenient to prepare the data we need.

After reading this section, you may still be unsure how some of the parameters in the model (such as the 1350 input features of fc1) are calculated. This is explained in detail in Chapter 2, Section 4 (Convolutional Neural Networks), and the hands-on MNIST handwritten digit recognition code in Chapter 3, Section 2 contains detailed comments on it.