# Fellowship.ai Food-101 Challenge Report

Author: Shiyuan Duan

Date: 07/05/2020

## Part 1: Introduction
This report is my submission for the Fellowship.ai challenge. It demonstrates my knowledge of Python and deep learning, especially computer vision (feel free to read my other repos, which also demonstrate my interest in deep learning).
The goal of this project is to build an image classifier that reaches more than 85% accuracy on food images from 101 categories. In this report, I explain how I approached this challenging problem and walk through my thought process. I first attempted transfer learning but only reached an accuracy of 82%. I then reached an accuracy of 85.49% by fine-tuning the entire model. The code in this report is included for demonstration purposes only. Please refer to the other notebooks to see the outputs, as rerunning the entire pipeline is too computationally expensive.

## Part 2: First attempt with transfer learning (Only reached 82% accuracy)
For this project, I am using ResNet50 as suggested. ResNet50 is suitable for this problem because it is deep enough to extract complicated features, and its residual connections help mitigate vanishing and exploding gradients. PyTorch provides a ResNet50 pretrained on the 1000-class ImageNet dataset. Therefore, my first attempt is to adopt a transfer learning approach. I trained the model for 100 epochs and only reached an accuracy of 82%.

### Step 1: Reorganize the dataset
My first step is to reorganize the dataset by splitting the images into train and test folders according to train.txt and test.txt.

In [None]:
import os
import shutil

relative_path = './food-101/images/'

# Read the class names, skipping blank lines
with open('food-101/meta/labels.txt', 'r') as label_file:
    labels = label_file.read().split('\n')
labels = [x for x in labels if x != '']

# Create one train folder and one test folder per class
for label in labels:
    label = label.lower().replace(' ', '_')
    os.makedirs('./food-101/train/' + label, exist_ok=True)
    os.makedirs('./food-101/test/' + label, exist_ok=True)

# Move the images listed in train.txt into the train folders
with open('food-101/meta/train.txt', 'r') as train_file:
    train_img_dirs = train_file.read().split('\n')
train_img_dirs = [x + '.jpg' for x in train_img_dirs if x != '']

for img_dir in train_img_dirs:
    label = img_dir.split('/')[0]
    source = relative_path + img_dir
    dest = './food-101/train/' + label
    shutil.move(source, dest)

# Move the images listed in test.txt into the test folders
with open('food-101/meta/test.txt', 'r') as test_file:
    test_img_dirs = test_file.read().split('\n')
test_img_dirs = [x + '.jpg' for x in test_img_dirs if x != '']

for img_dir in test_img_dirs:
    label = img_dir.split('/')[0]
    source = relative_path + img_dir
    dest = './food-101/test/' + label
    shutil.move(source, dest)
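
The reorganization step can also be sketched as a small reusable function. This is a minimal sketch, not the code that was run for this report: the function name `plan_moves`, its arguments, and the example entries are hypothetical. It only computes the (source, destination folder) pairs, so the plan can be inspected before any file is moved.

```python
import os


def plan_moves(split_lines, images_root, dest_root):
    """Return (source_file, destination_dir) pairs for one split file.

    split_lines holds entries of the form 'label/image_id', as in
    train.txt or test.txt. Blank lines are ignored.
    """
    moves = []
    for line in split_lines:
        line = line.strip()
        if not line:
            continue
        label = line.split('/')[0]
        source = os.path.join(images_root, line + '.jpg')
        dest_dir = os.path.join(dest_root, label)
        moves.append((source, dest_dir))
    return moves


# Illustrative entries only; real entries come from the meta files
example_lines = ['apple_pie/0001', '', 'waffles/0002']
for source, dest_dir in plan_moves(example_lines, 'food-101/images', 'food-101/train'):
    print(source, '->', dest_dir)
```

Each pair can then be passed to `shutil.move` once the destination folder has been created with `os.makedirs(dest_dir, exist_ok=True)`.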

### Step 2: Load dataset and pretrained resnet50 to perform transfer learning

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import datasets, models, transforms

For data augmentation, I am using RandomResizedCrop, RandomHorizontalFlip, RandomVerticalFlip, RandomRotation, RandomAffine, and ColorJitter. The test images are only resized, center-cropped, and normalized.

In [None]:
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.RandomRotation(45),
        transforms.RandomAffine(45),
        transforms.ColorJitter(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

Define the datasets, the data loaders, and a few utility variables.

In [None]:
train_set = datasets.ImageFolder('./food-101/train', data_transforms['train'])
test_set = datasets.ImageFolder('./food-101/test', data_transforms['test'])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=True, num_workers=4)

class_names = train_set.classes
train_set_size = len(train_set)
test_set_size = len(test_set)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Download the pretrained ResNet50 and replace its fc layer.

In [None]:
resnet50 = models.resnet50(pretrained=True)
fc_input = resnet50.fc.in_features
resnet50.fc = nn.Sequential(
    nn.Linear(fc_input, 512),
    nn.ReLU(),
    nn.Linear(512, len(class_names))
)
# Freeze every layer except the new fc head: only parameters whose name contains 'fc' stay trainable
for name, param in resnet50.named_parameters():
    param.requires_grad = 'fc' in name

resnet50 = resnet50.to(device)
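
A quick way to confirm that the freeze behaves as intended is to count the trainable parameters. The sketch below is illustrative only: `TinyNet` and `count_parameters` are hypothetical names, and the tiny model stands in for ResNet50 so that the example runs without downloading pretrained weights.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for ResNet50: a 'backbone' followed by an 'fc' head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 3)

    def forward(self, x):
        return self.fc(self.backbone(x))


def count_parameters(model):
    """Return (trainable, total) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


model = TinyNet()
# Same rule as in the cell above: only parameters whose name contains 'fc' stay trainable
for name, param in model.named_parameters():
    param.requires_grad = 'fc' in name

trainable, total = count_parameters(model)
print('trainable: {} / total: {}'.format(trainable, total))
```

Applied to `resnet50`, the same helper should report a trainable count equal to the size of the new fc head alone.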

Define the hyperparameters, the optimizer, and the learning rate scheduler.

In [None]:
max_epoch = 20
criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(resnet50.parameters(), lr=0.01, momentum = 0.9)
exp_lr_scheduler = lr_scheduler.ReduceLROnPlateau(optimizer_ft, 'min', patience = 2)
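
Note that `ReduceLROnPlateau` only lowers the learning rate when its `step` method is called with the monitored metric. The training cell that follows never makes that call, so the learning rate stays at its initial value throughout. The sketch below shows how the scheduler is typically driven. It is an illustration under stated assumptions, not the code used for the reported results: `simulate_plateau` is a hypothetical helper, and the loss values are made up.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler


def simulate_plateau(losses, lr=0.01, patience=2):
    """Feed a sequence of validation losses to ReduceLROnPlateau and
    return the learning rate observed after each scheduler step."""
    params = [nn.Parameter(torch.zeros(1))]
    optimizer = optim.SGD(params, lr=lr, momentum=0.9)
    scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=patience)
    history = []
    for loss in losses:
        # Stands in for one epoch of training (no gradients here, so nothing changes)
        optimizer.step()
        # The scheduler must receive the monitored metric once per epoch
        scheduler.step(loss)
        history.append(optimizer.param_groups[0]['lr'])
    return history


# A loss that never improves: the rate is cut once the patience is exceeded
history = simulate_plateau([1.0, 1.0, 1.0, 1.0])
print(history)
```

In a real loop, the call would be `exp_lr_scheduler.step(test_loss)` at the end of each epoch, where `test_loss` is the average evaluation loss for that epoch. With the default `factor` of 0.1, each reduction divides the learning rate by ten.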

Training Process

In [None]:
for epoch in range(max_epoch):
    # Training pass
    resnet50.train()
    running_loss = 0.0
    running_corrects = 0
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)
        optimizer_ft.zero_grad()

        output = resnet50(images)
        _, preds = torch.max(output, 1)
        loss = criterion(output, labels)

        loss.backward()
        optimizer_ft.step()

        # Accumulate the summed loss and the number of correct predictions
        running_loss += loss.item() * images.size(0)
        running_corrects += torch.sum(preds == labels).item()

    # Evaluation pass on the test set (no gradients needed)
    resnet50.eval()
    test_running_loss = 0.0
    test_running_corrects = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)

            output = resnet50(images)
            _, preds = torch.max(output, 1)
            loss = criterion(output, labels)

            test_running_loss += loss.item() * images.size(0)
            test_running_corrects += torch.sum(preds == labels).item()

    # Per-epoch averages over the full train and test sets
    epoch_loss = running_loss / train_set_size
    epoch_acc = running_corrects / train_set_size
    test_acc = test_running_corrects / test_set_size
    print('epoch: {} Train_Loss: {:.4f} Train_Acc: {:.4f}'.format(epoch, epoch_loss, epoch_acc))
    print('test_acc: {:.4f}'.format(test_acc))

### Step 3: Result of transfer learning
Transfer learning only reached an accuracy of 82% before overfitting. To achieve a better result, I decided to fine-tune the entire model.

## Part 3: Second attempt: Fine-tuning
Transfer learning did not reach the desired accuracy. I suspect that the pre-trained feature extractor is not representative enough for the food-101 dataset. Therefore my second attempt is to fine-tune the model by training the entire pretrained model. 

Fine-tuning the model requires much more computational power. To avoid wasting time and computational power, I split the entire training pipeline into several checkpoints. After each checkpoint, I evaluate the model and make adjustments on data preprocessing and hyper-parameters. 
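
A checkpoint workflow of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `save_checkpoint` and `load_checkpoint` are hypothetical, and storing the optimizer state and epoch number alongside the weights is an extension of my own for illustration. The checkpoints used in this report hold only the model `state_dict`. A tiny linear model stands in for ResNet50 so the example runs quickly.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def save_checkpoint(path, model, optimizer, epoch):
    """Store everything needed to resume training in a single file."""
    torch.save({
        'epoch': epoch,
        'model_state': model.state_dict(),
        'optimizer_state': optimizer.state_dict(),
    }, path)


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place and return the stored epoch."""
    checkpoint = torch.load(path, map_location='cpu')
    model.load_state_dict(checkpoint['model_state'])
    optimizer.load_state_dict(checkpoint['optimizer_state'])
    return checkpoint['epoch']


# Tiny stand-in model so the sketch runs without downloading ResNet50
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'checkpoint.pt')
    save_checkpoint(path, model, optimizer, epoch=5)

    # A freshly initialized model picks up the saved weights
    restored = nn.Linear(4, 2)
    restored_optimizer = optim.SGD(restored.parameters(), lr=0.01, momentum=0.9)
    resumed_epoch = load_checkpoint(path, restored, restored_optimizer)

print('resumed from epoch', resumed_epoch)
```

Saving the optimizer state keeps the SGD momentum buffers across sessions, which matters when training is resumed rather than restarted with new hyperparameters.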

Initially, my idea was to fine-tune only the last layer, because I believed that the first layers capture only simple features such as lines and edges. With that approach I reached an accuracy of 84% before the model started to overfit. I then unfroze the entire model and increased the image resolution. Finally, I reached an accuracy of 85.49%; the result is shown in the final checkpoint.
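
Increasing the input resolution does not require any change to the architecture, because torchvision's ResNet ends its convolutional stack with an adaptive average pooling layer that reduces any feature map to a fixed size before the fc layer. The sketch below illustrates the idea with a hypothetical miniature network (not ResNet50 itself) that accepts both 224x224 and 448x448 inputs.

```python
import torch
import torch.nn as nn

# Hypothetical miniature network: strided convolutions followed by adaptive pooling
tiny = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),  # collapses any spatial size to 1x1
    nn.Flatten(),
    nn.Linear(16, 101),
)
tiny.eval()

with torch.no_grad():
    out_small = tiny(torch.zeros(1, 3, 224, 224))
    out_large = tiny(torch.zeros(1, 3, 448, 448))

# Both resolutions produce one score per class
print(tuple(out_small.shape), tuple(out_large.shape))
```

The trade-off is cost: doubling the side length quadruples the number of pixels, which is consistent with the smaller batch size used in the fine-tuning cell.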

The code below shows how I train the model at each checkpoint. The output shown after the last cell was obtained by training from checkpoint 5, and the result was saved as checkpoint 6.

In [None]:
import copy

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import datasets, models, transforms

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(448),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.RandomRotation(45),
        transforms.RandomAffine(45),
        transforms.ColorJitter(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(512),
        transforms.CenterCrop(448),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
train_set = datasets.ImageFolder('./food-101/train', data_transforms['train'])
test_set = datasets.ImageFolder('./food-101/test', data_transforms['test'])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32, shuffle=True, num_workers=2)


class_names = train_set.classes
train_set_size = len(train_set)
test_set_size = len(test_set)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


resnet50_cp5 = models.resnet50(pretrained = True)
resnet50_cp5.fc = nn.Sequential(
    nn.Linear(2048, 500),
    nn.ReLU(),
    nn.Linear(500, 101)
)
resnet50_cp5 = resnet50_cp5.to(device)


resnet50_cp5.load_state_dict(torch.load('PATH_TO_PREVIOUS_CHECK_POINT'))



In [6]:
# Last checkpoint cell
criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(resnet50_cp5.parameters(), lr=0.01, momentum=0.9)
exp_lr_scheduler = lr_scheduler.ReduceLROnPlateau(optimizer_ft, 'min', patience=5)

best_eval_acc = 0.0
best_eval_dict = copy.deepcopy(resnet50_cp5.state_dict())
# Keep training one epoch at a time until the test accuracy exceeds 85%
while best_eval_acc <= 0.85:
    # Training pass
    resnet50_cp5.train()
    running_loss = 0.0
    running_corrects = 0
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)
        optimizer_ft.zero_grad()

        output = resnet50_cp5(images)
        _, preds = torch.max(output, 1)
        loss = criterion(output, labels)

        loss.backward()
        optimizer_ft.step()

        running_loss += loss.item() * images.size(0)
        running_corrects += torch.sum(preds == labels).item()

    # Evaluation pass on the test set (no gradients needed)
    resnet50_cp5.eval()
    test_running_loss = 0.0
    test_running_corrects = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)

            output = resnet50_cp5(images)
            _, preds = torch.max(output, 1)
            loss = criterion(output, labels)

            test_running_loss += loss.item() * images.size(0)
            test_running_corrects += torch.sum(preds == labels).item()

    # Keep a copy of the weights whenever the test accuracy improves
    test_acc = test_running_corrects / test_set_size
    if test_acc > best_eval_acc:
        best_eval_acc = test_acc
        best_eval_dict = copy.deepcopy(resnet50_cp5.state_dict())

    epoch_loss = running_loss / train_set_size
    epoch_acc = running_corrects / train_set_size
    print('Loss: {:.4f} Acc: {:.4f}'.format(epoch_loss, epoch_acc))
    print('test: acc: {:.4f}, current best acc: {:.4f}'.format(test_acc, best_eval_acc))


Loss: 0.4609 Acc: 0.8739
test: acc: 0.8441, current best acc: 0.8441
Loss: 0.4547 Acc: 0.8751
test: acc: 0.8464, current best acc: 0.8464
Loss: 0.4442 Acc: 0.8791
test: acc: 0.8493, current best acc: 0.8493
Loss: 0.4449 Acc: 0.8791
test: acc: 0.8549, current best acc: 0.8549


## Conclusion
In this challenge, I successfully trained a ResNet50 that reaches 85.49% accuracy on the Food-101 dataset. The training pipeline is:

1. transfer learning for 30 epochs
2. fine-tune for 50 epochs with 224*224 image resolution (acc: 82.00%)
3. fine-tune for 33 epochs with 512*512 image resolution (acc: 85.49%)

Total: 113 epochs trained.
If more time and computational power were available, I would like to explore a better pipeline that reaches the same accuracy in fewer epochs.