### Building Convolutional Neural Networks

    The inputs and layers in convolutional networks are somewhat different from traditional neural networks and need to be redesigned. The training module remains basically the same.

Convolutional neural networks (CNNs) are particularly effective on tasks where the input has a grid-like topology, such as image pixel data. They differ from traditional fully connected networks in that they combine convolutional layers, pooling layers, and fully connected layers. Despite these architectural differences, the training process, specifically the propagation and weight update steps, remains largely the same.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

### Conv - ReLU - pooling
ReLU stands for Rectified Linear Unit and it's one of the most commonly used activation functions in neural networks and deep learning models.

An activation function in a neural network defines how the weighted sum of the inputs is transformed into the output of a node or nodes in a layer of the network. In other words, it decides whether, and how strongly, a given neuron is activated based on that weighted sum.

ReLU is mathematically defined as f(x) = max(0, x). The function returns x if it is greater than 0, otherwise, it returns 0.
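
As a minimal sketch of the two ideas above, the snippet below computes the weighted sum for a single node and passes it through ReLU. It uses plain Python rather than PyTorch, and the helper names (`relu`, `node_output`) as well as the weights, bias, and inputs are made-up values chosen only for illustration.

```python
def relu(x: float) -> float:
    """Rectified Linear Unit: f(x) = max(0, x)."""
    return max(0.0, x)


def node_output(inputs, weights, bias):
    """Weighted sum of the inputs followed by the ReLU activation."""
    weighted_sum = sum(w * v for w, v in zip(weights, inputs)) + bias
    return relu(weighted_sum)


# Hypothetical values, not taken from the model in this notebook.
activated = [relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]]
print(activated)  # negative inputs are clipped to 0, positive inputs pass through

positive = node_output([1.0, 2.0], [0.5, 0.25], bias=0.0)    # weighted sum = 1.0
negative = node_output([1.0, 2.0], [-0.5, -0.25], bias=0.0)  # weighted sum = -1.0
print(positive, negative)
```

A node whose weighted sum is negative contributes nothing to the next layer, which is what "not activated" means in practice.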

### First, load the data

    Separately construct the training set and test set (validation set).

    Use DataLoader to iterate over the data.

In machine learning, working with a dataset starts with loading the data and splitting it into a training set and a *test* set (with an **optional** *validation* set). The training set is used to fit the model, while the test set (and/or validation set) is used to evaluate its performance. DataLoader is a PyTorch utility class that wraps a dataset in an iterable, which makes batch processing and shuffling of the data straightforward.
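
To make batching and shuffling concrete, here is a rough pure-Python sketch of what iterating over a dataset in batches looks like. It is not how PyTorch implements DataLoader; `iterate_batches` and `toy_dataset` are hypothetical names introduced only for this illustration.

```python
import math
import random


def iterate_batches(samples, batch_size, shuffle=True, seed=None):
    """Yield lists of at most `batch_size` samples, optionally in shuffled order."""
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]


toy_dataset = list(range(10))  # stand-in for 10 (image, label) pairs
batches = list(iterate_batches(toy_dataset, batch_size=4, seed=0))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # the last batch is smaller when the sizes do not divide evenly

# Number of batches per epoch for 60,000 training images with batch_size = 64
mnist_batches = math.ceil(60000 / 64)
print(mnist_batches)
```

Every sample appears exactly once per pass, and only the order changes between shuffled passes.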

In [2]:
# set the hyperparameters
input_size = 28  # each image is 28 x 28 pixels
num_classes = 10  # the number of classification categories
num_epochs = 10  # the number of training epochs
batch_size = 64  # the number of images processed per batch


# Load the training and test sets.
# This uses the MNIST dataset, also known as the handwritten digits dataset.
# It includes 60,000 training samples and 10,000 test samples.
# The dataset is downloaded and each image is converted to a tensor.
# TRAIN
train_dataset = datasets.MNIST(root='./data',  
                            train=True,   
                            transform=transforms.ToTensor(),  
                            download=True) 

# TEST
test_dataset = datasets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

# Build the data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

# Shuffling is not needed for evaluation; the accuracy is the same either way.
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=False)

### Building a Convolutional Network Module

    Generally, a convolutional layer, a ReLU layer, and a pooling layer are combined into one block.
    
    Be aware that the final result of the convolution is still a feature map. This map needs to be converted into a vector to perform classification or regression tasks.

In Convolutional Neural Networks (CNNs), convolutions, nonlinear activations (like ReLU), and pooling operations are typically used in conjunction to extract increasingly complex features from input data. 

After extracting features through these layers, the data still exists in a grid-like (image) format. Before connecting to a fully connected layer for classification or regression, the data must be flattened or transformed from a matrix into a vector.

In [3]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(         # input size (1, 28, 28)
            # Conv2d + ReLU + MaxPool2d
            # Channel counts matter: each layer's in_channels must equal the previous layer's out_channels.
            nn.Conv2d(
                in_channels=1,              # grayscale image
                out_channels=16,            # number of feature maps to produce
                kernel_size=5,              # convolution kernel size
                stride=1,                   # stride
                padding=2,                  # to keep the output the same size as the input, set padding=(kernel_size-1)/2 when stride=1
            ),                              # output feature maps: (16, 28, 28)
            nn.ReLU(),                      # ReLU layer
            nn.MaxPool2d(kernel_size=2),    # pooling over 2x2 regions, output: (16, 14, 14)
        )
        self.conv2 = nn.Sequential(         # input to the next block: (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),     # output (32, 14, 14)
            nn.ReLU(),                      # ReLU layer
            nn.MaxPool2d(2),                # output (32, 7, 7)
        )
        # Fully connected layer: 32 * 7 * 7 = 1568 input features
        self.out = nn.Linear(32 * 7 * 7, 10)   # fully connected layer producing the 10 class scores

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)           # flatten, giving (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output
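
The shapes noted in the comments above can be checked by hand with the standard output-size formula for convolution and pooling windows, out = floor((in + 2 * padding - kernel_size) / stride) + 1. The sketch below applies it to the layer settings used in the CNN class; `conv_output_size` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (square inputs)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 28                                                           # MNIST images are 28 x 28
after_conv1 = conv_output_size(size, kernel_size=5, stride=1, padding=2)
after_pool1 = conv_output_size(after_conv1, kernel_size=2, stride=2)  # MaxPool2d(2) strides by 2
after_conv2 = conv_output_size(after_pool1, kernel_size=5, stride=1, padding=2)
after_pool2 = conv_output_size(after_conv2, kernel_size=2, stride=2)

flattened = 32 * after_pool2 * after_pool2                          # 32 channels after conv2
print(after_conv1, after_pool1, after_conv2, after_pool2, flattened)
```

The final value is the number of input features the fully connected layer has to accept, which is why nn.Linear is built with 32 * 7 * 7.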

### Accuracy as the evaluation metric

Accuracy is the proportion of correct predictions among all predictions.

Although it is simple and easy to interpret, accuracy may not be suitable for every scenario, especially when the dataset is imbalanced.

Other metrics like precision, recall, F1-score, or even ROC AUC might be more appropriate depending on the specific problem and data distribution.
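
A small, made-up example shows why accuracy can mislead on imbalanced data. The sketch below scores a classifier that always predicts the majority class; the labels are invented for illustration, and `classification_metrics` is a hypothetical helper, not part of PyTorch or this notebook.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # guard against division by zero
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented data: 95 negatives, 5 positives, and a model that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The model looks strong by accuracy alone yet never finds a single positive sample, which recall and F1 expose immediately. MNIST is close to balanced across its ten digits, so accuracy is a reasonable choice for this notebook.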

In [6]:
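def accuracy(predictions, labels):
    # the index of the largest score in each row is the predicted class
    pred = predictions.detach().argmax(dim=1)
    # count how many predictions match the labels (returned as a 0-dim tensor)
    rights = pred.eq(labels.view_as(pred)).sum()
    # return (number correct, batch size) so batches can be summed later
    return rights, len(labels)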
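
Returning a (correct, total) pair per batch, rather than a percentage, lets the training loop add up results across batches of different sizes. The following is a plain-Python analogue of that idea with invented scores; `batch_accuracy` is a hypothetical stand-in for the tensor-based function above.

```python
def batch_accuracy(scores, labels):
    """Return (number correct, batch size) for one batch of per-class scores."""
    predicted = [row.index(max(row)) for row in scores]  # argmax of each row
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct, len(labels)


# Invented two-class scores for two batches of different sizes.
batch_1 = batch_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 0])
batch_2 = batch_accuracy([[0.3, 0.7], [0.6, 0.4], [0.2, 0.8]], [0, 0, 1])

results = [batch_1, batch_2]
total_correct = sum(correct for correct, _ in results)
total_seen = sum(count for _, count in results)
overall = 100.0 * total_correct / total_seen
print(batch_1, batch_2, overall)
```

Averaging per-batch percentages instead would give the smaller final batch the same weight as a full one.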

### Train the network model

In [7]:
# Instantiate the CNN as net.
net = CNN() 
# Define the loss function as CrossEntropyLoss, which is commonly used for multi-class classification.
criterion = nn.CrossEntropyLoss() 
# Define the optimizer as Adam (Adaptive Moment Estimation), a popular choice because it adapts the step size of each parameter during training.
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Run the training loop for the specified number of epochs:
for epoch in range(num_epochs):
    print('epoch:\t',epoch)
    # store the results for the current epoch
    train_rights = [] 
    
    for batch_idx, (data, target) in enumerate(train_loader):  # loop over every batch in the loader
        net.train()                             
        output = net(data) 
        loss = criterion(output, target) 
        optimizer.zero_grad() 
        loss.backward() 
        optimizer.step() 
        right = accuracy(output, target) 
        train_rights.append(right) 

        # Every 100 batches, switch to evaluation mode (net.eval()) and run the test loader to measure accuracy on unseen data.
        if batch_idx % 100 == 0: 
            
            net.eval() 
            val_rights = [] 
            
            with torch.no_grad():  # gradients are not needed for evaluation
                for (data, target) in test_loader:
                    output = net(data) 
                    right = accuracy(output, target) 
                    val_rights.append(right)
                
            # compute the accuracies
            train_r = (sum([tup[0] for tup in train_rights]), sum([tup[1] for tup in train_rights]))
            val_r = (sum([tup[0] for tup in val_rights]), sum([tup[1] for tup in val_rights]))

            print('Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}\tTrain accuracy: {:.2f}%\tTest accuracy: {:.2f}%'.format(
                epoch, batch_idx * batch_size, len(train_loader.dataset),
                100. * batch_idx / len(train_loader), 
                loss.item(), 
                100. * train_r[0].numpy() / train_r[1], 
                100. * val_r[0].numpy() / val_r[1]))

# 5m 39.9s

epoch:	 0
epoch:	 1
epoch:	 2
epoch:	 3
epoch:	 4
epoch:	 5
epoch:	 6
epoch:	 7
epoch:	 8
epoch:	 9


In [9]:
import torch
import torch.nn as nn
import torch.optim as optim

# The CNN class and the accuracy function are defined in the cells above

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 

# instantiate the model
net = CNN().to(device)

# loss function
criterion = nn.CrossEntropyLoss()

# optimizer
optimizer = optim.Adam(net.parameters(), lr=0.001)

# start training loop
for epoch in range(num_epochs):
    print('Epoch:', epoch)
    # store results for the current epoch
    train_rights = []

    for batch_idx, (data, target) in enumerate(train_loader):
        # transfer tensors to the selected device
        data, target = data.to(device), target.to(device)
        net.train()  # put the network in training mode
        output = net(data)  # forward pass
        loss = criterion(output, target)  # calculate loss
        optimizer.zero_grad()  # reset gradients
        loss.backward()  # backward pass
        optimizer.step()  # optimization step
        right = accuracy(output, target)  # calculate accuracy
        train_rights.append(right)

        # print progress and validation accuracy every 100 batches
        if batch_idx % 100 == 0:
            net.eval()  # put network in evaluation mode
            val_rights = []

            with torch.no_grad():  # gradients are not needed for evaluation
                # separate names keep the training batch `data` from being overwritten
                for val_data, val_target in test_loader:
                    val_data, val_target = val_data.to(device), val_target.to(device)
                    val_output = net(val_data)  # forward pass
                    right = accuracy(val_output, val_target)  # calculate accuracy
                    val_rights.append(right)

            # accuracy calculation
            train_r = (sum([tup[0] for tup in train_rights]), sum([tup[1] for tup in train_rights]))
            val_r = (sum([tup[0] for tup in val_rights]), sum([tup[1] for tup in val_rights]))

            print(f'Current Epoch: {epoch} [{batch_idx * batch_size}/{len(train_loader.dataset)} '
                  f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}\t'
                  f'Train Accuracy: {100. * train_r[0] / train_r[1]:.2f}%\t'
                  f'Test Accuracy: {100. * val_r[0] / val_r[1]:.2f}%')

# 2m 58.8s

Epoch: 0
Epoch: 1
Epoch: 2
Epoch: 3
Epoch: 4
Epoch: 5
Epoch: 6
Epoch: 7
Epoch: 8
Epoch: 9


Time cost, CPU : GPU, is roughly 2 : 1 (5m 39.9s vs. 2m 58.8s).
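
The ratio follows from the two wall-clock times recorded under the training cells above (5m 39.9s without a device transfer, 2m 58.8s on the selected device). A quick sketch of the arithmetic, where `to_seconds` is a hypothetical helper for parsing that notebook-style duration format:

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a string such as '5m 39.9s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?([\d.]+)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


cpu_seconds = to_seconds("5m 39.9s")
gpu_seconds = to_seconds("2m 58.8s")
ratio = cpu_seconds / gpu_seconds
print(f"CPU {cpu_seconds:.1f}s, GPU {gpu_seconds:.1f}s, ratio {ratio:.2f}")
```

These are single runs on one machine, so the figure is an indication rather than a benchmark.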

In [14]:
epoch = 1  # label for this single extra pass, also used in the progress line below
print('Epoch:', epoch)
# store results for the current epoch
train_rights = []

for batch_idx, (data, target) in enumerate(train_loader):
    # transfer tensors to the selected device
    data, target = data.to(device), target.to(device)
    net.train()  # put the network in training mode
    output = net(data)  # forward pass
    loss = criterion(output, target)  # calculate loss
    optimizer.zero_grad()  # reset gradients
    loss.backward()  # backward pass
    optimizer.step()  # optimization step
    right = accuracy(output, target)  # calculate accuracy
    train_rights.append(right)

    # print progress and validation accuracy every 300 batches
    if batch_idx % 300 == 0:
        net.eval()  # put network in evaluation mode
        val_rights = []

        # separate names keep the training batch `data` from being overwritten
        for val_data, val_target in test_loader:
            val_data, val_target = val_data.to(device), val_target.to(device)
            val_output = net(val_data)  # forward pass
            right = accuracy(val_output, val_target)  # calculate accuracy
            val_rights.append(right)

        # accuracy calculation: (correct, total) summed over all batches so far
        train_r = (sum([tup[0] for tup in train_rights]), sum([tup[1] for tup in train_rights]))
        val_r = (sum([tup[0] for tup in val_rights]), sum([tup[1] for tup in val_rights]))
        print(train_r)
        print(val_r)
        print(f'Current Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
              f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item()}\t'
              f'Train Accuracy: {100. * train_r[0] / train_r[1]:.2f}%\t'
              f'Test Accuracy: {100. * val_r[0] / val_r[1]:.2f}%')


Epoch: 1
(tensor(64, device='cuda:0'), 64)
(tensor(9914, device='cuda:0'), 10000)
(tensor(19255, device='cuda:0'), 19264)
(tensor(9915, device='cuda:0'), 10000)
(tensor(38425, device='cuda:0'), 38464)
(tensor(9904, device='cuda:0'), 10000)
(tensor(57581, device='cuda:0'), 57664)
(tensor(9896, device='cuda:0'), 10000)
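
The tuples printed above alternate between the running training counts and the test counts, each as (correct, total). As a quick reading aid, the sketch below copies those counts from the output and converts them to percentages; `to_percent` is a hypothetical helper.

```python
# (correct, total) pairs copied from the cell output above
train_counts = [(64, 64), (19255, 19264), (38425, 38464), (57581, 57664)]
test_counts = [(9914, 10000), (9915, 10000), (9904, 10000), (9896, 10000)]


def to_percent(correct, total):
    """Accuracy in percent for a (correct, total) pair."""
    return 100.0 * correct / total


for (tc, tn), (vc, vn) in zip(train_counts, test_counts):
    print(f"seen {tn:>5}: train {to_percent(tc, tn):.2f}%  test {to_percent(vc, vn):.2f}%")
```

Training accuracy sits slightly above test accuracy throughout this extra pass, which is expected for a network that has already been trained for ten epochs on the same training set.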
