# Level 5
-------------------
In this task, I take on the well-known Dogs vs. Cats competition.

## Part 1 Data Preprocessing
### Image Processing
The images in the dataset come in various sizes, so they need to be standardized. VGG expects a uniform 3×224×224 input, so every image is processed to that size. Use ``torchvision.transforms`` to resize the image and convert it to a ``torch.Tensor``.

In [None]:
TRANSFORM = transforms.Compose([transforms.Resize((256,256)),
                                transforms.RandomCrop((224,224)),
                                transforms.ToTensor(),
                               ])

Randomly cropping the training pictures introduces extra randomness. On the one hand it effectively enlarges the training data (each epoch sees a slightly different crop of every picture), and on the other hand it reduces overfitting.
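
To get a feel for how much variety the random crop adds, the following small sketch counts the distinct 224×224 windows that fit inside a 256×256 image. It uses only the two sizes from the transform above; the variable names are made up for illustration.

```python
# Count the distinct crop windows RandomCrop can choose from.
RESIZED = 256   # side length after transforms.Resize((256, 256))
CROP = 224      # side length after transforms.RandomCrop((224, 224))

# The top-left corner can sit at offsets 0..(RESIZED - CROP) on each axis.
offsets_per_axis = RESIZED - CROP + 1
# Horizontal and vertical offsets are chosen independently.
crop_positions = offsets_per_axis ** 2

print(offsets_per_axis, crop_positions)
```

With these sizes there are 33 offsets per axis, so each picture can yield 1089 different crops.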

Another very important preprocessing step is image normalization. Normalization is not simply dividing every pixel by 255 (``ToTensor`` already scales pixels to [0,1]); each channel is additionally shifted and scaled. PyTorch already implements this as a normalization transform whose parameters are the per-channel mean and standard deviation. Standing on the shoulders of giants, the parameters commonly used in image recognition are the ImageNet statistics below, which bring each channel to roughly zero mean and unit variance (values end up roughly in [−2.1, 2.6], not in [−1,1]):

In [None]:
transforms.Normalize(mean = [0.485, 0.456, 0.406],
                     std = [0.229, 0.224, 0.225])
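
As a quick check of what these parameters do, here is a minimal sketch in plain Python that applies the same per-channel formula, `(value - mean) / std`, to the smallest and largest pixel values that ``ToTensor`` can produce (0.0 and 1.0). The helper name `normalize` is only for illustration.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize(value, mean, std):
    """Same arithmetic as the normalization transform, for a single value."""
    return (value - mean) / std

# Range of each channel after normalization, given inputs in [0, 1].
ranges = [(normalize(0.0, m, s), normalize(1.0, m, s)) for m, s in zip(MEAN, STD)]
for channel, (low, high) in zip("RGB", ranges):
    print(f"{channel}: [{low:.3f}, {high:.3f}]")
```

The red channel, for example, ends up in about [−2.118, 2.249], so the normalized values are not confined to [−1,1].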

Therefore the training set images are preprocessed like this:

In [None]:
TRANSFORM = transforms.Compose([transforms.Resize((256,256)),
                                transforms.RandomCrop((224,224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.485,0.456,0.406),(0.229,0.224,0.225))
                               ])

The test set does not need random cropping; the original image is simply resized to 224×224:

In [None]:
TRANSFORM = transforms.Compose([transforms.Resize((224,224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.485,0.456,0.406),(0.229,0.224,0.225))
                               ])

### DataLoader
PyTorch's built-in ```Dataset``` mechanism takes care of data loading automatically. First, you need to define a subclass of ```torch.utils.data.Dataset```. There are two ways to do this:

One is **Dynamic Loading**: only the file path of each picture is kept in memory. When a picture is requested later, it is read from disk on the spot, preprocessed individually and returned. Training can therefore start right away, with no up-front preprocessing time.

The disadvantage is that over many epochs the total cost of repeatedly reading the pictures becomes large. Because training works on batches as the smallest unit, the memory overhead apart from the network parameters is controlled entirely by batch_size, which can be tuned to fit the available memory.

^_^ The poor and lower-middle peasant's option.

In [None]:
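class ImageDataset(data.Dataset):
    def __init__(self, image_list, label_list):
        self.data = image_list      # file paths only; pictures are read lazily
        self.label = label_list

    def __getitem__(self, index):
        # TRANSFORM is looked up at call time, so reassigning the global
        # (e.g. for the test set) changes the preprocessing used here
        with Image.open(self.data[index]) as img:
            tensor = TRANSFORM(img.convert("RGB"))
        label = torch.tensor([self.label[index]], dtype=torch.float32)
        return tensor.cuda(), label.cuda()

    def __len__(self):
        return len(self.data)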

The other is **Preloading**: the entire picture library is preprocessed and loaded into memory up front, so each picture can be returned directly afterwards. Reading the picture library takes about 40s at the beginning of the experiment, which is acceptable. After that, fetching pictures is very fast, which is a significant improvement. Note that the random crop is then applied only once, at load time, so every epoch sees the same crop.

But the memory overhead is huge. Even with numpy.ndarray storage it seems to require about 20GB of memory (I did not measure the exact number), and storing ``torch.tensor`` objects takes even more.

^_^ The big capitalist's option.

In [None]:
class ImageDataset(data.Dataset):
    def __init__(self, image_list, label_list):
        self.data = []
        self.label = []
        # preprocess every picture once and keep the tensors in memory
        for path, label in zip(image_list, label_list):
            with Image.open(path) as img:
                self.data.append(TRANSFORM(img.convert("RGB")))
            self.label.append(label)

    def __getitem__(self, index):
        label = torch.tensor([self.label[index]], dtype=torch.float32)
        return self.data[index].cuda(), label.cuda()

    def __len__(self):
        return len(self.data)
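
The memory trade-off between the two variants can be estimated with a little arithmetic. The sketch below is a back-of-the-envelope calculation, assuming every preprocessed picture is stored as a 3×224×224 array of 32-bit floats and ignoring any container overhead; the constant names are made up for illustration.

```python
BYTES_PER_FLOAT32 = 4
IMAGE_FLOATS = 3 * 224 * 224    # values in one preprocessed picture
BATCH_SIZE = 75                 # batch size used in this notebook
TRAIN_IMAGES = 25000            # 12500 cats + 12500 dogs
TEST_IMAGES = 12500

image_bytes = IMAGE_FLOATS * BYTES_PER_FLOAT32

# Dynamic loading: only one batch of pictures is held at a time.
batch_mib = BATCH_SIZE * image_bytes / 2**20
# Preloading: every training and test picture is held at once.
preload_gib = (TRAIN_IMAGES + TEST_IMAGES) * image_bytes / 2**30

print(f"one batch: {batch_mib:.1f} MiB")
print(f"all pictures preloaded: {preload_gib:.1f} GiB")
```

Under these assumptions one batch needs about 43 MiB, while preloading everything needs about 21 GiB, which is in line with the rough 20GB figure mentioned above.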

Load the data. The file lists are built in a fixed order and split by index, and a fixed random seed is set, so the training set and validation set are the same on every run.

In [None]:
def load():
    np.random.seed(998244353)
    torch.manual_seed(998244353)
    image_list = []
    label_list = []
    for i in range(ORIGIN_DATA_SIZE):
        image_list.append(INPUT_PATH+"/train/cat.{0}.jpg".format(i))
        label_list.append(0)
        image_list.append(INPUT_PATH+"/train/dog.{0}.jpg".format(i))
        label_list.append(1)
    n = int(ORIGIN_DATA_SIZE*2*RATIO)
    train_data = ImageDataset(image_list[:n],label_list[:n])
    validate_data = ImageDataset(image_list[n:],label_list[n:])
    image_list = []
    for i in range(TARGET_DATA_SIZE):
        image_list.append(INPUT_PATH+"/test/{0}.jpg".format(i+1))
    test_data = ImageDataset(image_list,[0]*TARGET_DATA_SIZE)
    np.random.seed()
    torch.seed()
    return train_data,validate_data,test_data
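
To see what the index-based split in ``load()`` produces, the following sketch rebuilds only the label list (no pictures are read) in the same cat/dog interleaved order and splits it with the same expression for ``n``. It is an illustration of the split sizes, not a replacement for ``load()``.

```python
ORIGIN_DATA_SIZE = 12500
RATIO = 0.99

# Same construction order as load(): cat 0, dog 0, cat 1, dog 1, ...
label_list = []
for i in range(ORIGIN_DATA_SIZE):
    label_list.append(0)   # cat.{i}.jpg
    label_list.append(1)   # dog.{i}.jpg

n = int(ORIGIN_DATA_SIZE * 2 * RATIO)
train_labels = label_list[:n]
validate_labels = label_list[n:]

dogs = sum(validate_labels)
cats = len(validate_labels) - dogs
print(len(train_labels), len(validate_labels), cats, dogs)
```

Because cats and dogs alternate in the list and ``n`` is even, the 250 validation pictures contain 125 cats and 125 dogs.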

## Part 3 Implementing VGG in PyTorch
### VGG Structure

The VGG model is described in [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/pdf/1409.1556.pdf "ref").

Next, a general VGG is implemented following the table of VGG configurations in the paper. It mainly relies on PyTorch's ``torch.nn.Sequential()`` and on ``add_module()``, which adds layers dynamically. Here is the basic structure:

In [None]:
class VGG(nn.Module):
    def __init__(self, name="11"):
        super(VGG, self).__init__()
        self.name = "VGG"+name
        self.conv = nn.Sequential()
        i = 1; p = 1

        # ... Different versions of VGG

        self.fc = nn.Sequential(
            nn.Linear(512*7*7,4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096,4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096,1000),
            nn.ReLU(),
            nn.Linear(1000,1),
            nn.Sigmoid()
        )

The fully connected part is the same for all VGG versions; only the part omitted in the middle differs. When you create a VGG, you pass a name chosen from ``["11", "11-LRN", "13", "16-1", "16", "19"]``. The default name, and any name not in the structure table, builds the VGG11 structure. Because ``add_module()`` needs a name for every layer, ``i`` and ``p`` are used to number the convolution layers and the pooling layers, respectively.
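
The labelling scheme with ``i`` and ``p`` can be sketched without PyTorch by generating only the layer names. The function below is a hypothetical helper (``conv_layer_names`` is not part of the notebook) that follows the same rules as the complete code: one extra convolution in the first two stages for the deeper versions, one or two extra convolutions in the last three stages, and a single LRN layer for "11-LRN".

```python
def conv_layer_names(name="11"):
    """Return the layer labels the i/p counters produce for one VGG version."""
    # Versions with a second convolution in stages 1 and 2.
    deep = 1 if name in ["13", "16-1", "16", "19"] else 0
    # Extra convolutions appended to each of stages 3, 4 and 5.
    extra = {"16": 1, "16-1": 1, "19": 2}.get(name, 0)
    convs_per_stage = [1 + deep, 1 + deep, 2 + extra, 2 + extra, 2 + extra]

    names = []
    i = 1   # label of the next convolution (shared with its ReLU)
    p = 1   # label of the next pooling layer
    for stage, count in enumerate(convs_per_stage):
        for _ in range(count):
            names.append("conv-{0}".format(i))
            names.append("ReLU-{0}".format(i))
            i += 1
        if stage == 0 and name == "11-LRN":
            names.append("LRN")
        names.append("MaxPooling-{0}".format(p))
        p += 1
    return names

for version in ["11", "11-LRN", "13", "16-1", "16", "19"]:
    layers = conv_layer_names(version)
    print(version, sum(layer.startswith("conv-") for layer in layers))
```

Any unknown name falls through to the VGG11 layout, matching the behaviour described above.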

### VGG Implementation
A convolutional layer is created as follows, with every parameter written out explicitly. Note that the number of channels must match between consecutive convolutional layers. To keep the spatial size of the picture unchanged, each 3 × 3 convolution uses ``padding = 1`` and each 1 × 1 convolution uses ```padding = 0```. Therefore a 3 × 224 × 224 picture is passed in directly, instead of a 3 × 227 × 227 picture.

In [None]:
self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=  3, out_channels= 64, kernel_size=3, stride=1, padding=1))
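
The effect of the padding choice follows from the standard output-size formula for convolution and pooling layers, `(size + 2 * padding - kernel_size) // stride + 1` (without dilation). The helper below, named ``conv_output_size`` for illustration, evaluates it for the three layer types used in this network.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

same_3x3 = conv_output_size(224, kernel_size=3, stride=1, padding=1)  # 3x3 conv
same_1x1 = conv_output_size(224, kernel_size=1, stride=1, padding=0)  # 1x1 conv
pooled = conv_output_size(224, kernel_size=2, stride=2)               # 2x2 max pooling

print(same_3x3, same_1x1, pooled)
```

Both convolutions keep the 224×224 size, and only the pooling layer halves it to 112×112.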

An activation layer is created as follows (ReLU, as in the standard VGG); every convolution layer is followed by one. A convolution and its activation can share the same label, so the label ```i``` is incremented only after the activation layer has been added.

In [2]:
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1

A pooling layer is created as follows. VGG uses only one type of pooling layer with one set of parameters: 2 × 2 max pooling with stride 2. Note the update of the pooling layer label ```p```.

In [None]:
self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1

An LRN (Local Response Normalization) layer is created as follows. The LRN layer is used only once, in VGG11-LRN. The main idea is to normalize the signals of adjacent channels in the middle of the network: when one neuron's signal is relatively large, the relative signal of its neighbouring neurons is reduced, which mimics the biological phenomenon in which an excited neuron suppresses its neighbours. Its main purpose is to reduce overfitting, especially in networks that use ReLU activation functions.

In [None]:
self.conv.add_module('LRN',nn.LocalResponseNorm(size=2))

The input has dimensions ```batch_size × 3 × 224 × 224```, and the output of the convolutional part has dimensions ```batch_size × 512 × 7 × 7```. Next comes the fully connected part, whose input is ```batch_size × 25088```. The feature maps therefore have to be flattened, so the forward propagation process is implemented as follows:

In [None]:
def forward(self, x):
    x = self.conv(x)
    x = x.view(x.shape[0], -1)
    x = self.fc(x)
    return x
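
The 25088 inputs of the first fully connected layer can be derived by following the feature-map shape through the five stages. This is a small bookkeeping sketch in plain Python; it assumes, as in the network above, that the convolutions keep the spatial size and each stage ends with one 2 × 2 max pooling.

```python
STAGE_CHANNELS = [64, 128, 256, 512, 512]   # output channels of the five stages

size = 224
shapes = []
for channels in STAGE_CHANNELS:
    size //= 2                       # 2x2 max pooling with stride 2 halves the size
    shapes.append((channels, size, size))

channels, height, width = shapes[-1]
flattened = channels * height * width   # length of each row after x.view(...)

print(shapes)
print(flattened)
```

The last shape is 512 × 7 × 7, which flattens to 25088 values per picture.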

Start building VGG!!!

In [None]:
net = VGG("19")

## Part 4 Parameter Adjustment
The RMSprop optimizer adapts the step size of each parameter automatically. Just call PyTorch's built-in RMSprop optimizer:

In [None]:
optimizer = optim.RMSprop(net.parameters(), lr=LR, alpha=0.9)
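
For intuition, here is a minimal sketch of one RMSprop update for a single scalar parameter, written in plain Python. It follows the usual formulation (a running average of squared gradients, with the step divided by its square root) and assumes no momentum and no centering; ``rmsprop_step`` is a made-up name, and the default values mirror ``LR`` and ``alpha`` from this notebook.

```python
import math

def rmsprop_step(w, grad, square_avg, lr=0.0001, alpha=0.9, eps=1e-8):
    """One RMSprop update for a single scalar parameter."""
    # Exponential moving average of the squared gradient.
    square_avg = alpha * square_avg + (1 - alpha) * grad * grad
    # The step is scaled by the root of that average, not by the raw gradient.
    w = w - lr * grad / (math.sqrt(square_avg) + eps)
    return w, square_avg

small_w, _ = rmsprop_step(1.0, grad=2.0, square_avg=0.0)
large_w, _ = rmsprop_step(1.0, grad=200.0, square_avg=0.0)
print(1.0 - small_w, 1.0 - large_w)
```

Both gradients lead to nearly the same step size, which is the sense in which the optimizer adjusts itself to the scale of each parameter's gradients.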

## Part 5 Training
Because of limited storage capacity (the saved parameters plus the optimizer state take more than 1 GB), checkpoints are saved only at intermediate stages: once every UPDATE epochs, with the validation error and validation loss recorded in the file name.

In [None]:
def train(optimizer):
    global net
    global EPOCH
    global BATCH_SIZE
    global train_data
    global validate_data
    global PROGRAM_START
    print('['+net.name+'] with optimizer ['+str(type(optimizer))+']:')
    train_loader = data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
    validate_loader = data.DataLoader(validate_data, batch_size=1, shuffle=False, num_workers=0)
    BATCH = len(train_loader)
    m = len(validate_loader)
    for epoch in range(EPOCH):
        EPOCH_START = time.time()
        print("\tEpoch #{0}/{1}:".format(epoch+1,EPOCH))
        net.train()   # enable Dropout for training
        for batch,(x,y) in enumerate(train_loader):
            optimizer.zero_grad()
            t = net(x)
            loss = LOSS_FUNC(t,y)
            print("\t\tBatch #{0}/{1}: ".format(batch+1,BATCH) + "Loss = %.6f"%float(loss))
            loss.backward()
            optimizer.step()
        net.eval()    # disable Dropout for validation
        with torch.no_grad():
            L = 0.
            E = 0.
            for batch,(x,y) in enumerate(validate_loader):
                t = net(x)
                L += float(LOSS_FUNC(t,y))
                E += float((float(t[0][0])>0.5)!=y)
            print("\t  Validation Loss = %.6f. Error Rate = %.3f%%"%(L/m,E*100/m))
            if((epoch+1)%UPDATE==0):
                torch.save(net.state_dict(),OUTPUT_PATH+"/{0}[{1}]".format(net.name,epoch+1)+"-L(%.6f)E(%.3f).pt"%(L/m,E*100/m))
                torch.save(optimizer.state_dict(),OUTPUT_PATH+"/{0}[{1}]".format(net.name,epoch+1)+"-L(%.6f)E(%.3f)-optimizer.pt"%(L/m,E*100/m))
        print("\t  Finish epoch #{0}".format(epoch+1)+" in %.4f s."%(time.time()-EPOCH_START)+" Total Time Cost = %.4f s."%(time.time()-PROGRAM_START))
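
The two validation numbers printed by ``train()`` can be reproduced on a handful of made-up predictions. The sketch below uses plain Python instead of ``nn.BCELoss``: it computes the binary cross-entropy per sample and counts a prediction as wrong when thresholding at 0.5 disagrees with the label. The helper names and the sample values are illustrative only, and the predictions are assumed to lie strictly between 0 and 1.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one probability p (0 < p < 1) and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def validate(predictions, labels):
    """Return (mean loss, error rate in percent), thresholding at 0.5."""
    m = len(labels)
    L = sum(bce(p, y) for p, y in zip(predictions, labels))
    E = sum(float((p > 0.5) != bool(y)) for p, y in zip(predictions, labels))
    return L / m, E * 100 / m

loss, error = validate([0.9, 0.2, 0.6, 0.5], [1, 0, 0, 1])
print("Validation Loss = %.6f. Error Rate = %.3f%%" % (loss, error))
```

Note that a prediction of exactly 0.5 is treated as "cat" (label 0), because the comparison is a strict ``> 0.5``.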

The labelled data set has 25,000 pictures, of which 99% (24,750) are used as the training set and the remaining 250 as the validation set. A checkpoint is saved every 5 epochs.
A larger batch_size is better, but given my lack of GPU and memory I can only set it to 75.

One trick is to control the program with a two-bit SWITCH: the value 1 stands for training and the value 2 for loading a previously trained model. With SWITCH = 1, a fresh model is trained and then used for prediction; with SWITCH = 2, a previously trained model is loaded and used for prediction; with SWITCH = 3, a previously trained model is loaded, trained further, and then used for prediction.

In [None]:
# Configurations
INPUT_PATH = ""
OUTPUT_PATH = ""
ORIGIN_DATA_SIZE = 12500
TARGET_DATA_SIZE = 12500
RATIO = 0.99
EPOCH = 15
BATCH_SIZE = 75
LOSS_FUNC = nn.BCELoss()
LR = 0.0001
SWITCH = 3
UPDATE = 5
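
The meaning of each SWITCH value can be spelled out with a tiny decoder. The function ``decode_switch`` is a hypothetical helper, not part of the notebook; it uses the same two tests as the complete code (``SWITCH//2==1`` for loading and ``SWITCH%2==1`` for training).

```python
def decode_switch(switch):
    """Split SWITCH into its two flags, using the same tests as the main script."""
    return {
        "load": switch // 2 == 1,    # load saved parameters before anything else
        "train": switch % 2 == 1,    # run train() before predicting
    }

for value in (1, 2, 3):
    print(value, decode_switch(value))
```

Prediction always runs at the end, so the switch only decides what happens before it.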

## Part 6 Complete Code

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
from PIL import Image
import pandas as pd
import numpy as np
import math
import time
import os
import gc

# Configurations
INPUT_PATH = ""
OUTPUT_PATH = ""
ORIGIN_DATA_SIZE = 12500
TARGET_DATA_SIZE = 12500
RATIO = 0.99
EPOCH = 15
BATCH_SIZE = 75
LOSS_FUNC = nn.BCELoss()
LR = 0.0001
SWITCH = 3
UPDATE = 5
TRANSFORM = transforms.Compose([transforms.Resize((256,256)),
                                transforms.RandomCrop((224,224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.485,0.456,0.406),(0.229,0.224,0.225))
                               ])
PARAMETERS = ""

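class ImageDataset(data.Dataset):
    def __init__(self, image_list, label_list):
        self.data = image_list      # file paths only; pictures are read lazily
        self.label = label_list

    def __getitem__(self, index):
        # TRANSFORM is looked up at call time, so reassigning the global
        # (e.g. for the test set) changes the preprocessing used here
        with Image.open(self.data[index]) as img:
            tensor = TRANSFORM(img.convert("RGB"))
        label = torch.tensor([self.label[index]], dtype=torch.float32)
        return tensor.cuda(), label.cuda()

    def __len__(self):
        return len(self.data)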

class VGG(nn.Module):
    def __init__(self, name="11"):
        super(VGG, self).__init__()
        self.name = "VGG"+name
        self.conv = nn.Sequential()
        i = 1; p = 1
        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=  3, out_channels= 64, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["13","16-1","16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels= 64, out_channels= 64, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["11-LRN"]:
            self.conv.add_module('LRN',nn.LocalResponseNorm(size=2))
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1   # 224 -> 112

        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels= 64, out_channels=128, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["13","16-1","16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1   # 112 -> 56

        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16-1"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1, padding=0))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1   # 56 -> 28

        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16-1"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=1, stride=1, padding=0))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1   # 28 -> 14

        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16-1"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=1, stride=1, padding=0))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1   # 14 -> 7

        self.fc = nn.Sequential(
            nn.Linear(512*7*7,4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096,4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096,1000),
            nn.ReLU(),
            nn.Linear(1000,1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.shape[0], -1)
        x = self.fc(x)
        return x

def train(optimizer):
    global net
    global EPOCH
    global BATCH_SIZE
    global train_data
    global validate_data
    global PROGRAM_START
    print('['+net.name+'] with optimizer ['+str(type(optimizer))+']:')
    train_loader = data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
    validate_loader = data.DataLoader(validate_data, batch_size=1, shuffle=False, num_workers=0)
    BATCH = len(train_loader)
    m = len(validate_loader)
    for epoch in range(EPOCH):
        EPOCH_START = time.time()
        print("\tEpoch #{0}/{1}:".format(epoch+1,EPOCH))
        net.train()   # enable Dropout for training
        for batch,(x,y) in enumerate(train_loader):
            optimizer.zero_grad()
            t = net(x)
            loss = LOSS_FUNC(t,y)
            print("\t\tBatch #{0}/{1}: ".format(batch+1,BATCH) + "Loss = %.6f"%float(loss))
            loss.backward()
            optimizer.step()
        net.eval()    # disable Dropout for validation
        with torch.no_grad():
            L = 0.
            E = 0.
            for batch,(x,y) in enumerate(validate_loader):
                t = net(x)
                L += float(LOSS_FUNC(t,y))
                E += float((float(t[0][0])>0.5)!=y)
            print("\t  Validation Loss = %.6f. Error Rate = %.3f%%"%(L/m,E*100/m))
            if((epoch+1)%UPDATE==0):
                torch.save(net.state_dict(),OUTPUT_PATH+"/{0}[{1}]".format(net.name,epoch+1)+"-L(%.6f)E(%.3f).pt"%(L/m,E*100/m))
                torch.save(optimizer.state_dict(),OUTPUT_PATH+"/{0}[{1}]".format(net.name,epoch+1)+"-L(%.6f)E(%.3f)-optimizer.pt"%(L/m,E*100/m))
        print("\t  Finish epoch #{0}".format(epoch+1)+" in %.4f s."%(time.time()-EPOCH_START)+" Total Time Cost = %.4f s."%(time.time()-PROGRAM_START))

def run(filename):
    global net
    global test_data
    prediction = []
    test_loader = data.DataLoader(test_data, batch_size=1, shuffle=False, num_workers=0)
    net.eval()   # disable Dropout for prediction
    with torch.no_grad():
        for i,(x,y) in enumerate(test_loader):
            t = net(x)
            prediction.append([i+1,float(t[0][0])])
    submission = pd.DataFrame(prediction)
    submission.columns = ['id','label']
    submission.to_csv(filename+".csv",index=False)

def load():
    np.random.seed(998244353)
    torch.manual_seed(998244353)
    image_list = []
    label_list = []
    for i in range(ORIGIN_DATA_SIZE):
        image_list.append(INPUT_PATH+"/train/cat.{0}.jpg".format(i))
        label_list.append(0)
        image_list.append(INPUT_PATH+"/train/dog.{0}.jpg".format(i))
        label_list.append(1)
    n = int(ORIGIN_DATA_SIZE*2*RATIO)
    train_data = ImageDataset(image_list[:n],label_list[:n])
    validate_data = ImageDataset(image_list[n:],label_list[n:])
    image_list = []
    for i in range(TARGET_DATA_SIZE):
        image_list.append(INPUT_PATH+"/test/{0}.jpg".format(i+1))
    test_data = ImageDataset(image_list,[0]*TARGET_DATA_SIZE)
    # np.random.seed()
    # torch.seed()
    return train_data,validate_data,test_data

print("*****Start")
PROGRAM_START = time.time()
train_data,validate_data,test_data = load()
print("Finish reading data in %.4f s."%(time.time()-PROGRAM_START))
net = VGG("19")
if SWITCH//2==1 and PARAMETERS!="":
    net.load_state_dict(torch.load(PARAMETERS+".pt"))
    print("Load Model ["+PARAMETERS+".pt] Success!")
net.cuda()
optimizer = optim.RMSprop(net.parameters(), lr=LR, alpha=0.9)
if SWITCH//2==1 and PARAMETERS!="":
    optimizer.load_state_dict(torch.load(PARAMETERS+"-optimizer.pt"))
    print("Load Optimizer ["+PARAMETERS+"-optimizer.pt] Success!")
LOSS_FUNC.cuda()
if SWITCH% 2==1:
    train(optimizer)
TRANSFORM = transforms.Compose([transforms.Resize((224,224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.485,0.456,0.406),(0.229,0.224,0.225))
                               ])
run(OUTPUT_PATH+"/{0}".format(net.name))
print("*****Finish")

## Part 7 Training Results
![level5-1](level5-1.png)
![level5-2](level5-2.png)

There was no GPU available and not enough memory, so at the beginning only 2,000 training samples were selected, which led to significant overfitting.

The biggest feature of the training curve is overfitting. The accuracy on the training set increases linearly, approaching 100%, while the accuracy on the validation set is between 70% and 72%. Similarly, the loss on the training set decreases linearly to 0, while the loss on the validation set tends to increase after 5 epochs.

Later I thought of using a pre-trained network and gradually adding data augmentation to solve this problem.

This is the first time I have used a Jupyter notebook to write deep learning code, and the more I wrote, the more it read like a README. Please bear with me. If the notebook is hard to follow, see ``model.py`` instead.

**To be updated...**