# Object Detection Pipeline using Remo

![remo_logo](assets/remo_normal.png)

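In this tutorial, we use Remo to speed up building a transfer learning pipeline for object detection: splitting and organising the data, inspecting images and annotations, and visually comparing model predictions against the original labels.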

In [1]:
import sys
%load_ext autoreload
%autoreload 2
# Specify the path to your local copy of the remo-python repository
# Linux / macOS example
local_path_to_repo = '/home/harsha/Documents/rediscovery/remo-python'
# Windows example
#local_path_to_repo = 'C:/Users/crows/Documents/GitHub/remo-python'

sys.path.insert(0, local_path_to_repo)

In [2]:
# Imports

from PIL import Image
import os
import glob
import random
import csv
random.seed(4)

import pandas as pd
import numpy as np
import tqdm

import torch
from torch.utils.data import DataLoader, Dataset

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
import torchvision.transforms as transforms


import remo
remo.set_viewer('jupyter')

## Adding Data to Remo
- The dataset used in this example is a subset of the [Open Images Dataset](https://storage.googleapis.com/openimages/web/index.html).

- The directory structure of the dataset is:

        ├── object_detection_dataset
            ├── images
                ├── image_1.jpg
                ├── image_2.jpg
                ├── ...
            ├── annotations
                ├── annotations.csv
                ├── model_predictions.csv


In [3]:
# The dataset will be extracted in a new folder
if not os.path.exists('object_detection_dataset.zip'):
    !wget https://s-3.s3-eu-west-1.amazonaws.com/object_detection_dataset.zip
    !unzip -qq object_detection_dataset.zip
else:
    print('Files already downloaded')

Files already downloaded


In [4]:
# The path to the folders
path_to_images =  './object_detection_dataset/images/'
path_to_annotations = './object_detection_dataset/annotations/'

annotations_file_path = os.path.join(path_to_annotations, 'annotations.csv')

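PyTorch detection models work with integer class labels, so we define a dictionary mapping each class name to an index. Inverting the same dictionary later lets us turn predicted indices back into readable class names.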

In [5]:
# Mapping between Class name and Index
cat_to_index = {'Wheel'        : 1, 
                'Car'          : 2,
                'Person'       : 3, 
                'Land vehicle' : 4, 
                'Human body'   : 5, 
                'Plant'        : 6, 
                'Tire'         : 7, 
                'Vehicle'      : 8, 
                'Vehicle registration plate' : 9}
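
Torchvision's detection models reserve label 0 for the background class, which is why the indices above start at 1 and why the model later needs one output more than there are object classes. A small stdlib sketch of the forward and inverse mappings (the names `index_to_cat` and `num_classes_with_background` are illustrative, not part of the notebook):

```python
# Class name -> integer label; 0 is left free for the background class
cat_to_index = {'Wheel': 1, 'Car': 2, 'Person': 3, 'Land vehicle': 4,
                'Human body': 5, 'Plant': 6, 'Tire': 7, 'Vehicle': 8,
                'Vehicle registration plate': 9}

# Inverse mapping, used to turn predicted integer labels back into names
index_to_cat = {index: name for name, index in cat_to_index.items()}

# One output per object class plus one for the background
num_classes_with_background = len(cat_to_index) + 1

print(num_classes_with_background)  # 10
print(index_to_cat[9])              # Vehicle registration plate
```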

### Train / test split

In Remo, we can use tags to organise our images.
Among other things, this allows us to generate train / test splits without the need to move image files around.

To do this, we pass a dictionary mapping each tag to the relevant image paths to the function ```remo.generate_image_tags()```, which writes the tags to a CSV file.
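
The exact file written by `remo.generate_image_tags()` is defined by Remo; later in this notebook the tags file is read as a CSV with `file_name` and `tag` columns. As a rough, stdlib-only illustration of that layout (the helper `write_tags_csv` is hypothetical, not Remo's implementation, and it assumes only the base file name is stored), a tags dictionary can be flattened into one row per image:

```python
import csv
import io
import os

def write_tags_csv(tags_dictionary, out_file):
    """Hypothetical helper: write one (file_name, tag) row per image.

    Assumes only the base file name is stored, so that it can be matched
    against the file_name column of the annotations CSV.
    """
    writer = csv.writer(out_file)
    writer.writerow(['file_name', 'tag'])
    for tag, paths in tags_dictionary.items():
        for path in paths:
            writer.writerow([os.path.basename(path), tag])

buffer = io.StringIO()
write_tags_csv({'train': ['/data/images/a.jpg', '/data/images/b.jpg'],
                'test': ['/data/images/c.jpg']}, buffer)
print(buffer.getvalue())
```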

In [6]:
# Sort first so that the seeded shuffle is reproducible across machines
im_list = sorted(os.path.abspath(i) for i in glob.glob(os.path.join(path_to_images, '**', '*.jpg'), recursive=True))
im_list = random.sample(im_list, len(im_list))  # shuffled copy of the list

# Defining the train / valid / test split (60% / 20% / 20%)
train_idx = round(len(im_list) * 0.6)
valid_idx = train_idx + round(len(im_list) * 0.2)

# Creating a dictionary with tags; the test split takes all remaining images,
# so no image is lost to rounding
tags_dict =  {'train' : im_list[0:train_idx], 
              'valid' : im_list[train_idx:valid_idx], 
              'test'  : im_list[valid_idx:]}

train_test_split_file_path = os.path.join(path_to_annotations, 'images_tags.csv') 
remo.generate_image_tags(tags_dictionary  = tags_dict, 
                         output_file_path = train_test_split_file_path, 
                         append_path = False)

'./object_detection_dataset/annotations/images_tags.csv'
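
Computing every split boundary with `round()` can silently drop images: with 7 images, `round(4.2) = 4` and `round(1.4) = 1`, so a test slice ending at `5 + round(1.4) = 6` leaves the last image in no split. A minimal sketch of a split that gives the remainder to the last subset (`split_list` is an illustrative name, not a Remo function):

```python
import random

def split_list(items, train_frac=0.6, valid_frac=0.2, seed=4):
    """Shuffle a copy of items and split it into train/valid/test lists.

    Train and valid sizes are rounded; every remaining item goes to test,
    so no item is lost to rounding.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train_end = round(len(shuffled) * train_frac)
    valid_end = train_end + round(len(shuffled) * valid_frac)
    return {'train': shuffled[:train_end],
            'valid': shuffled[train_end:valid_end],
            'test': shuffled[valid_end:]}

splits = split_list([f'image_{i}.jpg' for i in range(7)])
print({tag: len(files) for tag, files in splits.items()})  # {'train': 4, 'valid': 1, 'test': 2}
```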

### Create a dataset

To create a dataset we can use ```remo.create_dataset()```, specifying the paths to the images and annotations.

If required, the class encoding can be passed via a dictionary.

For a complete list of supported formats, you can [refer to the docs](https://remo.ai/docs/annotation-formats/).


In [7]:
# Upload the images and every file in the annotations folder.
# Remo skips annotation files whose format it does not recognise (here, model_predictions.csv).
object_detection_dataset =  remo.create_dataset(name = 'object_detection_dataset', 
                                                local_files = [path_to_images, path_to_annotations],
                                                annotation_task = 'Object Detection')

Acquiring data - completed                                                                           
Processing annotation files: 1 of 3 files
Processing data - completed
Data upload completed with some errors:
model_predictions.csv: Annotation format for Object detection not recognised for file 'model_predictions.csv'. Please check the documentation to see supported formats.


**Visualizing the dataset**

To view your data and labels using the Remo visual interface directly in the notebook, call the ```dataset.view()``` method.




In [8]:
object_detection_dataset.view()

Open http://localhost:8123/datasets/211


![dataset_view](assets/obj_dataset_view.png)

**Dataset Statistics**

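Remo removes the need to write boilerplate code for inspecting dataset properties such as annotation statistics.

These can be accessed either from code or through the visual interface; below, ```view_annotation_stats()``` opens them in the interface.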


In [9]:
object_detection_dataset.view_annotation_stats()

Open http://localhost:8123/annotation-detail/297/insights


![view_annotations_stats](assets/obj_view_annotations.png)
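
For a quick check in code without opening the interface, a few lines of the standard library can count annotations per class directly from a CSV with a `classes` column, as used by this dataset's `annotations.csv`. This is a hypothetical helper (`count_objects_per_class`), not part of Remo's API, and it only reproduces a simple per-class count, not the full statistics view:

```python
import csv
import io
from collections import Counter

def count_objects_per_class(csv_file):
    """Count bounding boxes per class in a CSV that has a 'classes' column."""
    return Counter(row['classes'] for row in csv.DictReader(csv_file))

sample = io.StringIO(
    'file_name,classes,xmin,ymin,xmax,ymax\n'
    'a.jpg,Car,10,20,110,80\n'
    'a.jpg,Wheel,15,60,40,85\n'
    'b.jpg,Car,5,5,50,40\n'
)
print(count_objects_per_class(sample))  # Counter({'Car': 2, 'Wheel': 1})
```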

## Feeding Data into PyTorch

Here we start working with PyTorch. To load the data, we define a custom PyTorch ```Dataset``` class, as is standard in PyTorch.

To adapt this to your own dataset, you need:
- **Path to Images:** the directory containing the image files.
- **Path to Tags:** path to the tags CSV file with the train / valid / test split. Format: file_name, tag
- **Path to Annotations:** path to the annotations CSV file. Format: file_name, classes, xmin, ymin, xmax, ymax
- **(Optional) Mapping:** a dictionary mapping class names to class indices. Format: {'class_name' : class_index}
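
Because an image with several objects appears on several rows of the annotations CSV, the dataset has to gather all rows that share a `file_name` before building a target. The class below does this with pandas; the same grouping in plain Python looks roughly like this (`group_boxes_by_image` is an illustrative helper):

```python
import csv
import io
from collections import defaultdict

def group_boxes_by_image(csv_file, mapping):
    """Collect [xmin, ymin, xmax, ymax] boxes and integer labels per file_name."""
    grouped = defaultdict(lambda: {'boxes': [], 'labels': []})
    for row in csv.DictReader(csv_file):
        entry = grouped[row['file_name']]
        entry['boxes'].append([float(row[k]) for k in ('xmin', 'ymin', 'xmax', 'ymax')])
        entry['labels'].append(mapping[row['classes']])
    return dict(grouped)

sample = io.StringIO(
    'file_name,classes,xmin,ymin,xmax,ymax\n'
    'a.jpg,Car,10,20,110,80\n'
    'a.jpg,Wheel,15,60,40,85\n'
)
print(group_boxes_by_image(sample, {'Wheel': 1, 'Car': 2}))
```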


In [10]:
class ObjectDetectionDataset(Dataset):

    def __init__(self, annotations, train_test_valid_split, image_dir, mapping = None, mode = 'train', transform = None):
        self.image_dir = image_dir
        self.mapping = mapping
        self.transform = transform
        self.mode = mode

        # One row per bounding box: file_name, classes, xmin, ymin, xmax, ymax
        self.data = pd.read_csv(annotations)

        # One row per image: file_name, tag ('train', 'valid' or 'test')
        split = pd.read_csv(train_test_valid_split).set_index('file_name')['tag']

        # Attach the tag of each image to all of its bounding boxes
        self.data['tag'] = self.data['file_name'].map(split)

        # Keep only the rows belonging to the requested split
        self.data = self.data[self.data['tag'] == self.mode].reset_index(drop=True)

        self.file_names = self.data['file_name'].unique()

    def __len__(self) -> int:
        return len(self.file_names)

    def __getitem__(self, index: int):
        file_name = self.file_names[index]
        records = self.data[self.data['file_name'] == file_name].reset_index(drop=True)

        # Convert to RGB so grayscale / RGBA images also yield 3 channels, then scale to [0, 1]
        image = np.array(Image.open(os.path.join(self.image_dir, file_name)).convert('RGB'), dtype=np.float32)
        image /= 255.0

        # ToTensor turns the float H x W x C array into a C x H x W tensor without rescaling it again
        if self.transform:
            image = self.transform(image)

        if self.mode == 'test':
            return image, file_name

        boxes = torch.as_tensor(records[['xmin', 'ymin', 'xmax', 'ymax']].values, dtype=torch.float32)
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        if self.mapping is not None:
            labels = torch.as_tensor(records['classes'].map(self.mapping).values, dtype=torch.int64)
        else:
            # Without a mapping, every object is treated as a single foreground class
            labels = torch.ones((records.shape[0],), dtype=torch.int64)

        target = {'boxes'    : boxes,
                  'labels'   : labels,
                  'image_id' : torch.tensor([index]),
                  'area'     : area,
                  'iscrowd'  : torch.zeros((records.shape[0],), dtype=torch.int64)}

        return image, target, file_name

def collate_fn(batch):
    # Detection images differ in size, so keep each batch as tuples instead of stacking
    return tuple(zip(*batch))
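
The `area` entry uses boxes in `[xmin, ymin, xmax, ymax]` pixel coordinates, so it is width times height. In training mode, torchvision's detection models reject boxes with non-positive width or height, so it can be worth checking the annotations before training. A stdlib sketch of both computations (the helper names `box_area` and `find_degenerate_boxes` are illustrative):

```python
def box_area(box):
    """Area of an [xmin, ymin, xmax, ymax] box."""
    xmin, ymin, xmax, ymax = box
    return (xmax - xmin) * (ymax - ymin)

def find_degenerate_boxes(boxes):
    """Return indices of boxes whose width or height is not strictly positive."""
    return [i for i, (xmin, ymin, xmax, ymax) in enumerate(boxes)
            if xmax <= xmin or ymax <= ymin]

boxes = [[10, 20, 110, 80], [50, 50, 50, 90]]
print([box_area(b) for b in boxes])   # [6000, 0]
print(find_degenerate_boxes(boxes))   # [1]
```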


The train and test datasets are instantiated and wrapped in ```DataLoader``` objects. A validation dataset can be created the same way with ```mode = 'valid'```.



In [11]:
tensor_transform = transforms.Compose([transforms.ToTensor()])

train_dataset = ObjectDetectionDataset(annotations = annotations_file_path,  
                                       train_test_valid_split = train_test_split_file_path,
                                       image_dir = path_to_images, 
                                       transform = tensor_transform,
                                       mapping = cat_to_index,
                                       mode = 'train')

test_dataset = ObjectDetectionDataset(annotations = annotations_file_path,  
                                       train_test_valid_split = train_test_split_file_path, 
                                       image_dir = path_to_images,
                                       transform = tensor_transform,
                                       mapping = cat_to_index,
                                       mode = 'test')


# Shuffle the training images every epoch; keep the test order fixed
train_data_loader = DataLoader(train_dataset, batch_size=1, shuffle=True, num_workers=0, collate_fn=collate_fn)
test_data_loader  = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)
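
Images in a detection dataset usually differ in size, so PyTorch's default collation (which stacks samples into one tensor) is not suitable here. `collate_fn` instead transposes a batch of `(image, target, file_name)` tuples into a tuple of images, a tuple of targets and a tuple of file names. The same `zip(*batch)` idiom on plain Python stand-in values:

```python
def collate_fn(batch):
    # [(img1, tgt1, name1), (img2, tgt2, name2)] -> ((img1, img2), (tgt1, tgt2), (name1, name2))
    return tuple(zip(*batch))

batch = [('image_a', {'labels': [2]}, 'a.jpg'),
         ('image_b', {'labels': [1, 3]}, 'b.jpg')]
images, targets, file_names = collate_fn(batch)
print(images)      # ('image_a', 'image_b')
print(file_names)  # ('a.jpg', 'b.jpg')
```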

## Training the Model

This tutorial uses torchvision's ```Faster R-CNN``` model with a ```ResNet-50``` FPN backbone, pre-trained on COCO. Its box predictor head is replaced so that it outputs the classes of our dataset.

To train the model, the following are specified:

- **Model:** the pre-trained model with the replaced box predictor head.
- **num_classes:** the number of object classes in your dataset plus one for the background (here 9 + 1 = 10).
- **Optimizer:** the optimizer used to train the network.
- **num_epochs:** the number of epochs to train the network for.

In [14]:
device      = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
num_classes = len(cat_to_index) + 1  # 9 object classes + background
loss_value  = 0.0
num_epochs  = 5

In [15]:
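# Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO
# (on torchvision >= 0.13 the equivalent call is fasterrcnn_resnet50_fpn(weights='DEFAULT'))
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the box predictor head so it outputs num_classes scores (including background)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

model.to(device)

# Only optimise the parameters that require gradients
params = [p for p in model.parameters() if p.requires_grad]

optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)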
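
For reference, with the default `dampening=0` and `nesterov=False`, the `torch.optim.SGD` documentation describes each step as: add `weight_decay * p` to the gradient, update a momentum buffer `b = momentum * b + g` (initialised to `g` on the first step), then move the parameter by `-lr * b`. A scalar, stdlib-only sketch of that rule with the hyperparameters used above (an illustration of the update, not PyTorch's implementation; `sgd_step` is a hypothetical name and the constant gradient is made up):

```python
def sgd_step(param, grad, buf, lr=0.005, momentum=0.9, weight_decay=0.0005):
    """One SGD step with momentum and weight decay for a single scalar parameter.

    Returns the updated parameter and momentum buffer; buf=None means first step.
    """
    grad = grad + weight_decay * param          # L2 weight decay folded into the gradient
    buf = grad if buf is None else momentum * buf + grad
    return param - lr * buf, buf

param, buf = 1.0, None
for step in range(3):
    param, buf = sgd_step(param, grad=0.2, buf=buf)
    print(step, round(param, 7))
```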

In [16]:
# The training loop trains the model for the total number of epochs
# (1 epoch = one complete pass over the entire training set)

model.train()
for epoch in range(num_epochs):

    for images, targets, image_ids in tqdm.tqdm(train_data_loader):

        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In training mode the model returns a dictionary of losses
        loss_dict = model(images, targets)

        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

    # loss_value holds the loss of the last batch of the epoch
    print('Training Loss : {:.5f}'.format(loss_value))

100%|██████████| 4/4 [00:32<00:00,  8.16s/it]
Training Loss : 0.26520
100%|██████████| 4/4 [00:33<00:00,  8.26s/it]
Training Loss : 0.17619
100%|██████████| 4/4 [00:33<00:00,  8.29s/it]
Training Loss : 0.28138
100%|██████████| 4/4 [00:33<00:00,  8.35s/it]
Training Loss : 0.17134
100%|██████████| 4/4 [00:33<00:00,  8.37s/it]
Training Loss : 0.15984
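
The printed value is the loss of the last batch in each epoch, which is noisy with so few batches. In training mode, torchvision's Faster R-CNN returns a dictionary of losses (`loss_classifier`, `loss_box_reg`, `loss_objectness`, `loss_rpn_box_reg`), which the loop sums. A small sketch of tracking the mean of that sum over an epoch instead, using made-up loss values purely for illustration (`epoch_mean_loss` is a hypothetical helper):

```python
def epoch_mean_loss(batch_loss_dicts):
    """Mean over batches of the summed loss components of each batch."""
    totals = [sum(loss_dict.values()) for loss_dict in batch_loss_dicts]
    return sum(totals) / len(totals)

# Illustrative values only, shaped like the loss dict returned in training mode
batches = [
    {'loss_classifier': 0.10, 'loss_box_reg': 0.08,
     'loss_objectness': 0.02, 'loss_rpn_box_reg': 0.01},
    {'loss_classifier': 0.06, 'loss_box_reg': 0.05,
     'loss_objectness': 0.01, 'loss_rpn_box_reg': 0.01},
]
print('Epoch loss : {:.5f}'.format(epoch_mean_loss(batches)))
```

In the training loop, the equivalent is to accumulate `losses.item()` for every batch and divide by the number of batches before printing.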




## Making Predictions

We run the trained model on the test set, keep the detections whose confidence score is at least ```detection_threshold```, and save them as a CSV file in the same format as the annotations.

In [18]:
# Mapping between predicted index and class name
mapping = {value : key for (key, value) in cat_to_index.items()}

detection_threshold = 0.5
results = []

model.eval()

with torch.no_grad():
    for images, image_ids in tqdm.tqdm(test_data_loader):

        images = list(image.to(device) for image in images)
        outputs = model(images)

        for i, image in enumerate(images):

            boxes  = outputs[i]['boxes'].cpu().numpy()
            scores = outputs[i]['scores'].cpu().numpy()
            labels = outputs[i]['labels'].cpu().numpy()

            # Apply the same score mask to boxes and labels so they stay paired
            keep   = scores >= detection_threshold
            boxes  = boxes[keep].astype(np.int32)
            labels = labels[keep]
            image_id = image_ids[i]

            for box, label in zip(boxes, labels):
                results.append({'file_name' : os.path.basename(image_id),
                                'classes'   : mapping[int(label)],
                                'xmin'      : box[0],
                                'ymin'      : box[1],
                                'xmax'      : box[2],
                                'ymax'      : box[3]})

model_predictions_path = os.path.join(path_to_annotations, 'model_predictions.csv')

# newline='' prevents blank rows between records on Windows
with open(model_predictions_path, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['file_name', 'classes', 'xmin', 'ymin', 'xmax', 'ymax'])
    writer.writeheader()
    writer.writerows(results)



100%|██████████| 1/1 [00:02<00:00,  2.71s/it]
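
Conceptually, the post-processing in the cell above keeps detections whose score reaches the threshold, keeps each box paired with its own label, translates integer labels back to class names and emits one CSV row per box. A stdlib sketch of that logic on plain lists (the function name `detections_to_rows` and the sample detections are illustrative; coordinates are truncated to integers, like `astype(np.int32)`):

```python
import csv
import io

def detections_to_rows(file_name, boxes, scores, labels, index_to_cat, threshold=0.5):
    """Turn one image's detections into CSV rows, keeping scores >= threshold."""
    rows = []
    for box, score, label in zip(boxes, scores, labels):
        if score < threshold:
            continue
        xmin, ymin, xmax, ymax = (int(v) for v in box)
        rows.append({'file_name': file_name, 'classes': index_to_cat[label],
                     'xmin': xmin, 'ymin': ymin, 'xmax': xmax, 'ymax': ymax})
    return rows

rows = detections_to_rows('a.jpg',
                          boxes=[[10.4, 20.7, 110.2, 80.9], [5.0, 5.0, 9.0, 9.0]],
                          scores=[0.91, 0.32],
                          labels=[2, 1],
                          index_to_cat={1: 'Wheel', 2: 'Car'})

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['file_name', 'classes', 'xmin', 'ymin', 'xmax', 'ymax'])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```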


## Visualizing Predictions

Using Remo, we can visually compare the model predictions against the original labels.

To do this, we create a new ```AnnotationSet``` and upload the predictions, together with the tags file, as CSV files.

In [23]:
predictions = object_detection_dataset.create_annotation_set(annotation_task='Object Detection', 
                                                             name = 'model_predictions',
                                                             paths_to_files = [train_test_split_file_path, model_predictions_path])

Progress 100% - 2/2 - elapsed 0:00:00.001000 - speed: 2000.00 img / s, ETA: 0:00:00
Acquiring data - completed                                                                           
Processing data - completed                                                                          
Data upload completed


In [24]:
object_detection_dataset.view()

Open http://localhost:8123/datasets/211


![visualize_predictions](assets/obj_visualize_results.png)