# Convolutional Neural Networks
---
In this notebook, we train a **CNN** to classify images from the CIFAR-10 database.

The images in this database are small color images that fall into one of ten classes; 

### Test for CUDA

Since these are larger (32x32x3) images, it may prove useful to speed up your training time by using a GPU. CUDA is a parallel computing platform and CUDA Tensors are the same as typical Tensors, only they utilize GPU's for computation.

In [1]:
import torch
import numpy as np

# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')

CUDA is available!  Training on GPU ...


---
## Load the Data

If you're not familiar with the Cifar-10, you may find it useful to look at: http://www.cs.toronto.edu/~kriz/cifar.html . Or you could search it by yourself. 

If you can't download it online or it takes long time due to the internet, you may use the attachment file provided on the class website

#### TODO: Load the data

In [2]:
#############################################################################
# TODO: load the data   #
#############################################################################
import torchvision
import torchvision.transforms as transforms

batch_size=2048

transform=transforms.Compose(
    [transforms.Resize(224),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

full_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=transform)
train_length = int(full_dataset.__len__()*0.8)
val_length = full_dataset.__len__() - train_length
train_dataset, val_dataset = torch.utils.data.random_split(full_dataset, [train_length, val_length])
test_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=transform)

transform=transforms.Compose(
    [transforms.ToTensor(),
     transforms.Resize(224),
     transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

trainloader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=0)
valloader = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=0)
testloader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
#############################################################################
#                          END OF YOUR CODE               #
#############################################################################

Files already downloaded and verified
Files already downloaded and verified


---
## Define the Network Architecture

This time, you'll define a CNN architecture. You may use 
* [Convolutional layers](https://pytorch.org/docs/stable/nn.html#conv2d), which can be thought of as stack of filtered images.
* [Maxpooling layers](https://pytorch.org/docs/stable/nn.html#maxpool2d), which reduce the x-y size of an input, keeping only the most _active_ pixels from the previous layer.
* The usual Linear + Dropout layers to avoid overfitting and produce a 10-dim output.


#### Output volume for a convolutional layer

To compute the output size of a given convolutional layer we can perform the following calculation:
> We can compute the spatial size of the output volume as a function of the input volume size (W), the kernel/filter size (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. The correct formula for calculating how many neurons define the output_W is given by `(W−F+2P)/S+1`. 

For example for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output. With stride 2 we would get a 3x3 output.

#### TODO: Define a model with multiple convolutional layers and an output layer for image classification

In [3]:
#Define a CNN model

import torch.nn as nn
import torch.nn.functional as F

#使用ResNet模型，首先定义基本块
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.shortcut = nn.Sequential()
        # 经过处理后的x要与x的维度相同(尺寸和深度)
        # 如果不相同，需要添加卷积+BN来变换为同一维度
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

# define the CNN architecture
class CnnNet(nn.Module):
    def __init__(self):
        super(CnnNet, self).__init__()
        #############################################################################
        # TODO: define your own CNN network              #
        #############################################################################
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1_1 = BasicBlock(in_planes=64, planes=64, stride=1)
        self.layer1_2 = BasicBlock(in_planes=64, planes=64, stride=1)
        self.layer2_1 = BasicBlock(in_planes=64, planes=128, stride=2)
        self.layer2_2 = BasicBlock(in_planes=128, planes=128, stride=1)
        self.layer3_1 = BasicBlock(in_planes=128, planes=256, stride=2)
        self.layer3_2 = BasicBlock(in_planes=256, planes=256, stride=1)
        self.layer4_1 = BasicBlock(in_planes=256, planes=512, stride=2)
        self.layer4_2 = BasicBlock(in_planes=512, planes=512, stride=1)
        self.linear = nn.Linear(512, 10)
        #self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.maxpool(out)
        out = self.layer1_1(out)
        out = self.layer1_2(out)
        out = self.layer2_1(out)
        out = self.layer2_2(out)
        out = self.layer3_1(out)
        out = self.layer3_2(out)
        out = self.layer4_1(out)
        out = self.layer4_2(out)
        out = F.adaptive_avg_pool2d(out,output_size=1)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        #out = self.dropout(out)
        return out
        #############################################################################
        #                          END OF YOUR CODE               #
        #############################################################################

model = CnnNet()
print(model)

# move tensors to GPU if CUDA is available
if train_on_gpu:
    model.cuda()

CnnNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1_1): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (shortcut): Sequential()
  )
  (layer1_2): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (

### Specify [Loss Function](http://pytorch.org/docs/stable/nn.html#loss-functions) and [Optimizer](http://pytorch.org/docs/stable/optim.html)

Decide on a loss and optimization function that is best suited for this classification task. Pay attention to the value for **learning rate** as this value determines how your model converges to a small error.

#### TODO: Define the loss and optimizer 

In [4]:
#############################################################################
# TODO: define the loss and optimizer              #
#############################################################################
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(params=model.parameters(), lr=0.001)
#############################################################################
#                          END OF YOUR CODE               #
#############################################################################

---
## Train the Network

Remember to look at how the training and validation loss decreases over time and print them.

In [5]:
#############################################################################
# TODO: train and validation              #
#############################################################################
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.DataParallel(model)

for epoch in range(30):
    train_loss = 0.0
    train_acc = 0.0
    train_correct_count = 0
    model.train()
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs=inputs.to(device)
        labels=labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        train_correct_count += (torch.argmax(outputs,dim=1)==labels).sum().cpu().item()
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.cpu().item()
    train_loss=train_loss/(i+1)
    train_acc=train_correct_count/(i+1)/batch_size

    val_loss = 0.0
    val_acc = 0.0
    val_correct_count = 0
    model.eval()
    for i, data in enumerate(valloader, 0):
        inputs, labels = data
        inputs=inputs.to(device)
        labels=labels.to(device)

        outputs = model(inputs)
        val_correct_count += (torch.argmax(outputs,dim=1)==labels).sum().cpu().item()
        loss = criterion(outputs, labels)
        val_loss += loss.cpu().item()
    val_loss=val_loss/(i+1)
    val_acc=val_correct_count/(i+1)/batch_size

    print('Epoch %d|Train_loss:%.3f Eval_loss:%.3f Train_acc:%.3f Eval_acc:%.3f'%(epoch+1,train_loss,val_loss,train_acc,val_acc))
#############################################################################
#                          END OF YOUR CODE               #
#############################################################################

Epoch 1|Train_loss:2.023 Eval_loss:2.043 Train_acc:0.274 Eval_acc:0.255
Epoch 2|Train_loss:1.550 Eval_loss:1.775 Train_acc:0.415 Eval_acc:0.357
Epoch 3|Train_loss:1.309 Eval_loss:1.716 Train_acc:0.509 Eval_acc:0.415
Epoch 4|Train_loss:1.152 Eval_loss:1.340 Train_acc:0.566 Eval_acc:0.509
Epoch 5|Train_loss:1.011 Eval_loss:1.334 Train_acc:0.623 Eval_acc:0.517
Epoch 6|Train_loss:0.893 Eval_loss:1.102 Train_acc:0.664 Eval_acc:0.588
Epoch 7|Train_loss:0.798 Eval_loss:0.967 Train_acc:0.700 Eval_acc:0.647
Epoch 8|Train_loss:0.719 Eval_loss:0.852 Train_acc:0.728 Eval_acc:0.677
Epoch 9|Train_loss:0.636 Eval_loss:1.230 Train_acc:0.755 Eval_acc:0.618
Epoch 10|Train_loss:0.572 Eval_loss:0.772 Train_acc:0.780 Eval_acc:0.720
Epoch 11|Train_loss:0.496 Eval_loss:1.068 Train_acc:0.807 Eval_acc:0.654
Epoch 12|Train_loss:0.430 Eval_loss:0.992 Train_acc:0.832 Eval_acc:0.676
Epoch 13|Train_loss:0.375 Eval_loss:0.902 Train_acc:0.848 Eval_acc:0.695
Epoch 14|Train_loss:0.338 Eval_loss:1.091 Train_acc:0.862 Ev

---
## Test the Trained Network

Test your trained model on previously unseen data and print the test accuracy of each class and the whole! Try your best to get a better accuracy.

In [6]:
#############################################################################
# TODO: test the trained network             #
#############################################################################
class_count=np.zeros(10,dtype=int)
correct_count=np.zeros(10,dtype=int)
model.eval()
for i, data in enumerate(testloader, 0):
    inputs, labels = data
    inputs=inputs.to(device)
    outputs = torch.argmax(model(inputs),dim=1).cpu().item()
    class_count[labels]+=1
    if outputs==labels:
        correct_count[labels]+=1
for i in range(10):
    print('Test|Class{}('.format(i+1)+classes[i]+')-acc={}'.format(correct_count[i]/class_count[i]))
#此处展示的是各类别的查准率=正确识别数量/该类别样本总数
print('\nTest|Overall-acc={}'.format(np.sum(correct_count/10000)))
#############################################################################
#                          END OF YOUR CODE               #
#############################################################################

Test|Class1(plane)-acc=0.867
Test|Class2(car)-acc=0.951
Test|Class3(bird)-acc=0.816
Test|Class4(cat)-acc=0.72
Test|Class5(deer)-acc=0.831
Test|Class6(dog)-acc=0.751
Test|Class7(frog)-acc=0.896
Test|Class8(horse)-acc=0.859
Test|Class9(ship)-acc=0.926
Test|Class10(truck)-acc=0.889

Test|Overall-acc=0.8506


### Question: What are your model's weaknesses during your experiment and how might they be improved?
+ 观察最终各类别上的分类准确率，类别之间存在较大的差异。因此可以得知，不同类别的物体特征获取、识别的难度不尽相同。通过随机裁剪、翻转、缩放等方式进行数据增强可以获得模型在不同样本上更好的鲁棒性，有可能提升模型表现。
+ 较小规模的网络的表出能力有限而难以取得较好的结果，较大规模的网络训练代价又较大。因此选择合适规模的网络是必要的，事实上，神经网络的表出能力也是机器学习理论研究的前沿方向之一。同时，例如深度可分离卷积等方法也提供了神经轻量化的方法。
