# Deep Learning: Final Project
## Topic: Bee Reidentification
Name: Raj Chovatiya

Matriculation no: 31595 

## Introduction

Re-identification is the problem of matching an individual object detected in the current video frame or image against a set of objects of the same kind. So far it has mostly been studied for persons, because people offer many features that distinguish one from another, such as the color of their clothes. The task is harder for bees, because most bees look alike to the human eye. This project therefore tests whether a deep convolutional network can capture the features needed to tell apart different bees from the same hive.

Here, I use [open-reid](https://github.com/Cysu/open-reid), a library for person re-identification. It provides prebuilt training and evaluation functions and several loss functions. Its unified data interface makes it less error-prone to work with. The data is generated by a custom Python script that makes it compatible with open-reid. The data originally comes from [here](https://groups.oist.jp/bptu/honeybee-tracking-dataset).

I changed some files from the open-reid repo that caused errors at runtime. The modified versions can be imported separately.

In [1]:
from dataset import Dataset
from reid import models
from reid.dist_metric import DistanceMetric
from reid.loss import TripletLoss
# from reid.trainers import Trainer
from trainers import Trainer
# from reid.evaluators import Evaluator
from reid.utils.data import transforms as T
from reid.utils.data.preprocessor import Preprocessor
from reid.utils.data.sampler import RandomIdentitySampler
from reid.utils.logging import Logger
from reid.utils.serialization import load_checkpoint, save_checkpoint
from reid.evaluation_metrics import accuracy

import numpy as np
import sys
import os
import torch
from torch import nn
from torch.backends import cudnn
from torch.utils.data import DataLoader
from torch.autograd import Variable
torch.cuda.empty_cache()

## Data Generation

The original data, downloaded from the link above, contains the videos, the detected bounding boxes and the trajectory of each bee, split into five parts. The data generation script [datamodulemaker.py](file://datamodulemaker.py) then converts the data. The steps are:
- Extract the tar file
- Extract frames from the videos and save them as images with OpenCV
- Crop the images and generate one image for each individual bee in a frame
- Make meta.json and splits.json

As the profiler output in the figure shows, most of the time is spent in OpenCV's `imread`.
![](./datamodulemaker_profiler_stat.PNG)

### Efficient Data Generator

The given script was very inefficient because it loaded every image for one bee and then loaded the same images again for the next bee. So I changed the algorithm to load each image once and cut the crops for all bees from it. The steps (after extracting the frames from the video) are:
- Read the JSON files of the trajectories and store the data in a NumPy array.
- Load one image.
- Loop over bee_id and check whether that bee appears in the loaded frame.
- If the bee_id is present, crop the bounding box at the given coordinates and save it.
- Increase bee_id by 1.
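
The frame-first loop above can be sketched roughly as follows. This is a minimal illustration and not the actual code of the data generation script: the function name `crop_bees_from_frame`, the `(x, y, w, h)` box layout and the dictionary of boxes are assumptions made for the example, and a synthetic array stands in for a decoded frame.

```python
import numpy as np


def crop_bees_from_frame(frame, boxes_by_bee):
    """Crop every bee visible in one already-loaded frame.

    frame: H x W (or H x W x C) image array, loaded only once.
    boxes_by_bee: hypothetical dict {bee_id: (x, y, w, h)} for this frame.
    """
    height, width = frame.shape[:2]
    crops = {}
    for bee_id, (x, y, w, h) in sorted(boxes_by_bee.items()):
        # Clamp the box to the image so crops at the border stay valid.
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(width, x + w), min(height, y + h)
        if x1 > x0 and y1 > y0:  # skip boxes that lie fully outside the frame
            crops[bee_id] = frame[y0:y1, x0:x1].copy()
    return crops


# Synthetic 100 x 100 grayscale frame standing in for one decoded video frame.
frame = np.zeros((100, 100), dtype=np.uint8)
boxes = {0: (10, 20, 30, 30), 1: (90, 90, 30, 30), 2: (120, 120, 10, 10)}
crops = crop_bees_from_frame(frame, boxes)
for bee_id, crop in crops.items():
    print(bee_id, crop.shape)
```

In the real pipeline the frame would presumably come from a single `cv2.imread` call and each crop would be written with `cv2.imwrite`, so the expensive read happens once per frame instead of once per bee.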

The profile of the improved script is shown in the image below.
![](./datamodulemaker_optimized_profiler_stat.PNG)


### Remaining problems in the new algorithm:
- After S3 trajectory 1983, the bee_id runs ahead by 1 compared to the data generated by the original script. (I do not know whether this will be a problem.)
- After some trajectories in S4 the process slows down, but the overall time is still reduced.

## Pre-processing

Pre-processing the data includes:
- Resize the image to 90 x 90 pixels
- Randomly flip the image horizontally
- Convert the color image to grayscale
- Normalize the image with mean 0.5 and standard deviation 0.5
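
To make the effect of these steps concrete, here is a small NumPy sketch of the same pipeline. It is only an approximation of what the transforms do: the nearest-neighbour resize and the luma weights used for the grayscale conversion are assumptions chosen for the example, and `preprocess` is a hypothetical helper, not part of open-reid.

```python
import numpy as np


def preprocess(image, size=90, flip=False):
    """image: H x W x 3 uint8 RGB array. Returns a 1 x size x size float32 array."""
    h, w = image.shape[:2]
    # Nearest-neighbour resize via index lookup (a simple stand-in for the resize step).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    if flip:
        resized = resized[:, ::-1]  # horizontal flip
    scaled = resized.astype(np.float32) / 255.0  # pixel values now in [0, 1]
    # Weighted sum of R, G, B using the common luma weights (assumed here).
    gray = scaled @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    normalized = (gray - 0.5) / 0.5  # mean 0.5, std 0.5 maps [0, 1] to [-1, 1]
    return normalized[np.newaxis]


white = np.full((120, 60, 3), 255, dtype=np.uint8)
black = np.zeros((120, 60, 3), dtype=np.uint8)
out_white = preprocess(white)
out_black = preprocess(black)
print(out_white.shape, float(out_white.mean()), float(out_black.mean()))
```

A white image ends up close to 1 and a black image close to -1, which is the value range the network sees after normalization.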

In [2]:
def get_data(dataset, batch_size=256, num_instances=4,
             workers=0, combine_trainval=True):

    train_set = dataset.trainval if combine_trainval else dataset.train
    num_classes = (dataset.num_trainval_ids if combine_trainval
                   else dataset.num_train_ids)

    train_transformer = T.Compose([
        T.RectScale(90, 90),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Grayscale(num_output_channels=1),
        T.Normalize(mean=(0.5,), std=(0.5,)),
    ])

    test_transformer = T.Compose([
        T.RectScale(90, 90),
        T.ToTensor(),
        T.Grayscale(num_output_channels=1),
        T.Normalize(mean=(0.5,), std=(0.5,)),
    ])

    train_loader = DataLoader(
        Preprocessor(train_set, root=dataset.images_dir,
                     transform=train_transformer),
        batch_size=batch_size, num_workers=workers,
        shuffle=True, pin_memory=True, drop_last=True)

    val_loader = DataLoader(
        Preprocessor(dataset.val, root=dataset.images_dir,
                     transform=test_transformer),
        batch_size=batch_size, num_workers=workers,
        shuffle=False, pin_memory=True)

    test_loader = DataLoader(
        Preprocessor(list(set(dataset.query) | set(dataset.gallery)),
                     root=dataset.images_dir, transform=test_transformer),
        batch_size=batch_size, num_workers=workers,
        shuffle=False, pin_memory=True)

    return dataset, num_classes, train_loader, val_loader, test_loader

Helper functions to adjust the learning rate and to load a checkpoint when resuming training.

In [3]:
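def adjust_lr(epoch):
    # Hold the learning rate at 0.0002 up to epoch 100, then decay it exponentially
    # so that it shrinks by a factor of 0.001 over every 50 epochs.
    # Relies on the global `optimizer` that is created in a later cell.
    lr = 0.0002 if epoch <= 100 else 0.0002 * (0.001 ** ((epoch - 100) / 50.0))
    for g in optimizer.param_groups:
        g['lr'] = lr * g.get('lr_mult', 1)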

def load_checkpoint(fpath):
    if os.path.isfile(fpath):
        checkpoint = torch.load(fpath)
        print("=> Loaded checkpoint '{}'".format(fpath))
        return checkpoint
    else:
        raise ValueError("=> No checkpoint found at '{}'".format(fpath))
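
As a quick check of the schedule in `adjust_lr`, the same formula can be evaluated on its own. The function below is a standalone copy written for illustration; the name `scheduled_lr` and its keyword arguments are not part of the notebook.

```python
def scheduled_lr(epoch, base_lr=0.0002, hold_epochs=100,
                 decay_span=50.0, decay_factor=0.001):
    """Learning rate before the per-group lr_mult is applied."""
    if epoch <= hold_epochs:
        return base_lr
    return base_lr * decay_factor ** ((epoch - hold_epochs) / decay_span)


for epoch in (1, 100, 125, 135, 150):
    print(epoch, scheduled_lr(epoch))
```

The rate stays at 0.0002 through epoch 100 and has dropped by a factor of 0.001, to about 2e-7, by epoch 150. Parameter groups with `lr_mult` 0.1 receive one tenth of these values.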

In [4]:
seed = 1
np.random.seed(seed)
torch.manual_seed(seed)
cudnn.benchmark = True

In [5]:
bee_data = Dataset('C:/Users/rajch/DeepLearning/data/beeid_data/') # loads the data
bee_data.load()

Dataset dataset loaded
  subset   | # ids | # images
  ---------------------------
  train    |  2173 |   601313
  val      |   932 |   258933
  trainval |  3105 |   860246
  query    |    78 |    20508
  gallery  |   775 |   201285


In [6]:
dataset, num_classes, train_loader, val_loader, test_loader = get_data(bee_data)

## Network 

Here I use ResNet-50 as the base feature extractor, cross-entropy loss, and SGD as the optimizer. The ResNet from PyTorch only accepts 3-channel color images. Our images are grayscale, so I added a convolution layer that converts the 1-channel image to 3 channels.
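
One way such a channel adapter could look is sketched below. This is an assumption about the wiring, not the code of the modified model file: `GrayToRGBAdapter` is a hypothetical name, the 1 x 1 kernel is one possible choice, and a tiny stand-in backbone replaces ResNet-50 so the example runs quickly.

```python
import torch
from torch import nn


class GrayToRGBAdapter(nn.Module):
    """Hypothetical wrapper: expands 1-channel input to 3 channels, then calls a backbone."""

    def __init__(self, backbone):
        super().__init__()
        # 1x1 convolution: each output channel is a learned scaling of the gray channel.
        self.to_rgb = nn.Conv2d(1, 3, kernel_size=1, bias=False)
        self.backbone = backbone

    def forward(self, x):
        return self.backbone(self.to_rgb(x))


# Stand-in backbone that only accepts 3-channel input (used here instead of ResNet-50).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
adapter_model = GrayToRGBAdapter(backbone)
batch = torch.randn(4, 1, 90, 90)  # four grayscale 90 x 90 images
features = adapter_model(batch)
print(features.shape)
```

An alternative would be to replace the first convolution of the network with a 1-channel version, but an extra layer in front keeps the backbone itself unchanged.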

In [7]:
model = models.create('resnet50', num_features=128, dropout=0, num_classes=num_classes)
criterion = nn.CrossEntropyLoss().cuda()
# checkpoint_file = sorted(os.listdir('./checkpoint/'))[-1]
checkpoint_file = 'checkpoint_135.pth.tar' ## add checkpoint path to resume training
checkpoint = load_checkpoint(os.path.join('./checkpoint/', checkpoint_file))
model.load_state_dict(checkpoint['state_dict'])
model = nn.DataParallel(model).cuda()
start_epoch = checkpoint['epoch']

if hasattr(model.module, 'base'):
    base_param_ids = set(map(id, model.module.base.parameters()))
    new_params = [p for p in model.parameters() if
                    id(p) not in base_param_ids]
    param_groups = [
        {'params': model.module.base.parameters(), 'lr_mult': 0.1},
        {'params': new_params, 'lr_mult': 1.0}]
else:
    param_groups = model.parameters()
optimizer = torch.optim.SGD(param_groups, lr=0.0002,
                            momentum=0.9,
                            weight_decay=5e-4,
                                nesterov=True)
trainer = Trainer(model, criterion)

=> Loaded checkpoint './checkpoint/checkpoint_135.pth.tar'


## Training
Training takes a long time because the dataset is very large. I trained the network for 135 epochs, which took approximately 3 days with a batch size of 256. My laptop has 8 GB RAM, a Ryzen 7 4000-series CPU and an Nvidia RTX 2060 graphics card with 6 GB of memory. Under high GPU load the laptop sometimes crashes, so I could not save the training loss and accuracy history. Training started with a loss of around 8 and a precision of 0.00%. Fortunately, I have a screenshot of some training steps.
![](./training.png)
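
Since crashes wiped out the training history, one possible safeguard is to append every logged step to a CSV file and flush it to disk immediately. The sketch below uses only the standard library; `append_metrics` and the file name are hypothetical, and the trainer would have to call such a helper itself. The two example rows reuse values from the training log of this notebook.

```python
import csv
import os
import tempfile


def append_metrics(path, epoch, step, loss, precision):
    """Append one row of training metrics; write the header on first use."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["epoch", "step", "loss", "precision"])
        writer.writerow([epoch, step, loss, precision])
        handle.flush()
        os.fsync(handle.fileno())  # push the row to disk before training continues


log_path = os.path.join(tempfile.mkdtemp(), "train_log.csv")
append_metrics(log_path, 135, 1000, 2.923, 39.84)
append_metrics(log_path, 135, 2000, 3.122, 38.67)

with open(log_path, newline="") as handle:
    rows = list(csv.reader(handle))
print(rows)
```

Because each row is written and synced on its own, everything logged before a crash remains readable afterwards.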


In [8]:
for epoch in range(start_epoch, 150):
        adjust_lr(epoch)
        trainer.train(epoch, train_loader, optimizer, print_freq=1000)
        # if epoch < 20:
        #     continue
        # top1 = evaluator.evaluate(val_loader, dataset.val, dataset.val)

        # is_best = top1 > best_top1
        is_best = False
        # best_top1 = max(top1, best_top1)
        save_checkpoint({
            'state_dict': model.module.state_dict(),
            'epoch': epoch + 1,
        }, is_best, fpath=os.path.join('./checkpoint', 'checkpoint_'+ str(epoch+1) +'.pth.tar'))

Epoch: [135][1000/3360]	Time 0.533 (0.581)	Data 0.363 (0.400)	Loss 2.923 (2.879)	Prec 39.84% (43.79%)	
Epoch: [135][2000/3360]	Time 0.553 (0.567)	Data 0.370 (0.388)	Loss 3.122 (2.879)	Prec 38.67% (43.75%)	


KeyboardInterrupt: 

## Evaluation

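The open-reid evaluation could not be run because it loads all extracted features into CPU memory at once, so the model could not be tested with it. The network also sometimes produces negative outputs, which I did not expect; note that raw classifier outputs (logits) taken before the softmax are not probabilities and can be negative.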
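
With 20508 query images and 201285 gallery images, a full distance matrix would hold over four billion entries, which does not fit in 8 GB of RAM. A possible workaround is to compare the query features against the gallery one chunk at a time. The sketch below is a simplified illustration in NumPy, not the open-reid evaluator: `rank1_accuracy` is a hypothetical helper, it reports only rank-1 accuracy with Euclidean distance, and it ignores details such as camera filtering.

```python
import numpy as np


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids, chunk_size=1024):
    """Rank-1 accuracy with Euclidean distance, one chunk of queries at a time."""
    gallery_sq = (gallery_feats ** 2).sum(axis=1)
    correct = 0
    for start in range(0, len(query_feats), chunk_size):
        q = query_feats[start:start + chunk_size]
        # Squared distance ||q - g||^2 = ||q||^2 - 2 q.g + ||g||^2,
        # so only a (chunk, n_gallery) block is held in memory at once.
        dist = (q ** 2).sum(axis=1, keepdims=True) - 2.0 * q @ gallery_feats.T + gallery_sq
        nearest = dist.argmin(axis=1)
        correct += int((gallery_ids[nearest] == query_ids[start:start + chunk_size]).sum())
    return correct / len(query_feats)


# Toy features: three gallery identities and four queries, the last one mislabeled on purpose.
gallery_feats = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
gallery_ids = np.array([0, 1, 2])
query_feats = np.array([[0.5, 0.0], [9.0, 11.0], [19.0, 19.0], [11.0, 9.0]])
query_ids = np.array([0, 1, 2, 0])
score = rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids, chunk_size=2)
print(score)  # 0.75
```

The gallery features themselves still have to fit in memory; with 128-dimensional features that is far smaller than the full distance matrix.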

In [7]:
torch.cuda.empty_cache()
dataset, num_classes, _, val_loader, test_loader = get_data(bee_data, batch_size=32)
model = models.create('resnet50', num_features=128, dropout=0, num_classes=num_classes)
# evaluator = Evaluator(model)
metric = DistanceMetric(algorithm='euclidean')

checkpoint_file = 'checkpoint_126.pth.tar'
checkpoint = load_checkpoint(os.path.join('./checkpoint/', checkpoint_file))
model.load_state_dict(checkpoint['state_dict'])
model=nn.DataParallel(model).cuda()
loss = nn.CrossEntropyLoss().cuda()
def parse_data(inputs):
        imgs, _, pids, _ = inputs
        inputs = [Variable(imgs)]
        targets = Variable(pids.long().cuda())
        return inputs, targets

for i, inputs in enumerate(test_loader):
    inputs, targets = parse_data(inputs)
    output = model(*inputs)
    
    prec, = accuracy(output.data, targets.data)
    print('Precision {}'.format(prec), "Loss: {}".format(loss(output, targets)))

RuntimeError: CUDA error: device-side assert triggered
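
A device-side assert of this kind is commonly raised when a target label lies outside the range `[0, num_classes)` expected by the loss function. One possible cause here, which I have not verified, is that the classifier was built for the 3105 trainval identities while the query and gallery images carry other identity labels. A small check like the following could be run on the targets before computing the loss; `out_of_range_labels` is a hypothetical helper and the label values are made up for the example.

```python
def out_of_range_labels(labels, num_classes):
    """Return the sorted unique labels that a classifier with num_classes outputs cannot index."""
    return sorted({int(label) for label in labels if not 0 <= int(label) < num_classes})


num_trainval_ids = 3105  # number of trainval identities from the dataset summary
example_labels = [0, 17, 3104, 3105, 4000]  # made-up labels for illustration
bad = out_of_range_labels(example_labels, num_trainval_ids)
print(bad)  # [3105, 4000]
```

If such labels show up, cross-entropy on the test set is not meaningful, and comparing extracted features by distance would be the more suitable test.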