<a href="https://colab.research.google.com/github/saidineshpola/mesh-transformer-jax/blob/master/queryInst_mmdet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![AIcrowd-Logo](https://raw.githubusercontent.com/AIcrowd/AIcrowd/master/app/assets/images/misc/aicrowd-horizontal.png)

# 🍕 Food Recognition Benchmark


# Problem Statement
Detecting and segmenting various kinds of food in an image. For example, someone visits a new restaurant and is served a dish they have never seen before. Our deep learning model comes to the rescue by identifying which food it is, from among the classes it was trained on.

<img src="https://i.imgur.com/zS2Nbf0.png" width="300" />


# Dataset
We will be using data from the Food Recognition Challenge, a benchmark for image-based food recognition that has been running since 2020.


https://www.aicrowd.com/challenges/food-recognition-benchmark-2022#datasets

We have a total of **39k training images**, along with a **3k validation set** and a **4k public-testing set**. All images are RGB, and the annotations are in **MS-COCO format**.

<img src="https://lh5.googleusercontent.com/iySoTCAHFoEKxjvzELzCJKbZaTG2TzMcjuBxAlBVGupjkpE_XI1xNPnE71UIBthTu9_fZ4A1tz-ArABpI0DD2ZeF87qHPccRogEezd-UbhkQgZcQBYCE1HMeDusaKtj8ClCWjw-p">

<small>Reference: This notebook is based on the notebook created by [Shraddhaa Mohan](https://www.linkedin.com/in/shraddhaa-mohan-20a008185/) and [Rohit Midha](https://www.linkedin.com/in/rohitmidha/) for the previous iteration of the challenge. You can find the [original notebook here](https://colab.research.google.com/drive/1vKAQ9D3dgubbBc2jGYGQB0-lZXlT8hTh#scrollTo=Dha6_NXmIzB9).</small>

In this notebook, we will first analyze the Food Recognition Dataset and then train a Mask R-CNN model on it.

## The Challenge


*   Given Images of Food, we are asked to provide Instance Segmentation over the images for the food items.
*   The Training Data is provided in the COCO format, making it simpler to load with pre-available COCO data processors in popular libraries.
*   The test set provided in the public dataset is similar to the validation set, but has no annotations.
*   The test set used after submission is much larger and contains private images, on which every submission is evaluated.
*   Participants have to submit their trained model along with its trained weights. Immediately after submission, the AIcrowd grader picks up the submitted model and runs inference on the private test set using cloud GPUs.
*   This requires users to structure their repositories and follow a provided paradigm for submission.


## The Notebook
> *  Installation of MMDetection
> *  Training a simple model with MMDetection
> *  Local Evaluation/Quick Submission using MMDetection
> *  Active Submission using trained model


# GPU Check

Do a quick check if you have been allocated a GPU. 

If this command fails for you, please go to `Runtime` -> `Change Runtime Type` -> `Hardware Accelerator` -> `GPU`

In [1]:
!nvidia-smi

Wed Apr  6 10:11:40 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
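The same check can be done from Python, which is handy when a later cell should stop early if no GPU is attached. This is a minimal sketch: `gpu_summary` is a hypothetical helper, not part of this notebook, and it relies only on the `nvidia-smi` binary being on the `PATH` and on its `--query-gpu` and `--format` options.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the GPU names reported by nvidia-smi, or an empty list if unavailable."""
    # nvidia-smi is missing entirely on CPU-only runtimes
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False)
    # A non-zero exit code means the driver could not talk to a GPU
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No GPU detected; switch the runtime type to GPU.")
```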

# Setting our Workspace 💼

In this section we will download and extract the dataset, download the mmdetection repository, and import all the libraries we will be using.

In [2]:
# Login to AIcrowd
!pip install aicrowd-cli > /dev/null
#!aicrowd login

# Alternatively, log in with an API key
# Get your API key from https://www.aicrowd.com/participants/me
API_KEY = "<YOUR_API_KEY>"  # never commit a real key to a shared notebook
!aicrowd login --api-key $API_KEY

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.27.1 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
API Key valid
Gitlab oauth token invalid or absent.
It is highly recommended to simply run `aicrowd login` without passing the API Key.
Saved details successfully!


In [3]:
# List dataset for this challenge
!aicrowd dataset list -c food-recognition-benchmark-2022

# Download dataset
!aicrowd dataset download -c food-recognition-benchmark-2022 4 6 7

                          Datasets for challenge #962                           
┌───┬───────────────────────────────┬───────────────────────────────┬──────────┐
│ # │ Title                         │ Description                   │     Size │
├───┼───────────────────────────────┼───────────────────────────────┼──────────┤
│ 0 │ random_prediction.json        │ Random prediction for Quick   │  4.36 MB │
│   │                               │ Submission into Round 2       │          │
│ 1 │ [Round 1]                     │ [Public] Testing Dataset      │     197M │
│   │ public_test_release_2.0.tar.… │ (contains 3000 images and 498 │          │
│   │                               │ categories, without           │          │
│   │                               │ annotations)                  │          │
│ 2 │ [Round 1]                     │ Training Dat

In [4]:
!mkdir -p data/ data/train data/val data/test
!echo "Extracting test dataset" && tar -xvf public_test_release_2.1.tar.gz -C data/test  > /dev/null
!echo "Extracting val dataset" && tar -xvf public_validation_set_release_2.1.tar.gz -C data/val  > /dev/null
!echo "Extracting train dataset" && tar -xvf public_training_set_release_2.1.tar.gz -C data/train  > /dev/null
!rm -r *.gz

Extracting test dataset
Extracting val dataset
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.quarantine'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.lastuseddate#PS'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.quarantine'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.lastuseddate#PS'
Extracting train dataset


In [None]:
!du -sh /content/data/train/

3.5G	/content/data/train/


## Mount the Google Drive

In [5]:
#alternatively copy files from drive

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Installation

In [6]:

import torch
TORCH_VERSION = torch.__version__.split("+")[0]
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)

# We use torch 1.10.0 with CUDA 11.1, as preinstalled in this Colab version
!pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/$CUDA_VERSION/torch$TORCH_VERSION/index.html
# If there is no mmcv-full wheel matching the given torch + CUDA version, you need to install a different PyTorch version.
# Don't forget to restart the runtime

# Install mmdetection
!rm -rf mmdetection
#!git clone https://github.com/open-mmlab/mmdetection.git
!git clone https://github.com/hustvl/QueryInst.git mmdetection
%cd mmdetection

!pip install -e .

!pip install Pillow
!pip uninstall pycocotools -y
!pip install -q git+https://github.com/waleedka/coco.git#subdirectory=PythonAPI
%cd ..


torch:  1.10.0 ; cuda:  cu111
Looking in links: https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
Collecting mmcv-full==1.4.0
  Downloading https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/mmcv_full-1.4.0-cp37-cp37m-manylinux1_x86_64.whl (58.0 MB)
     |████████████████████████████████| 58.0 MB 98 kB/s 
Collecting addict
  Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)
Collecting yapf
  Downloading yapf-0.32.0-py2.py3-none-any.whl (190 kB)
     |████████████████████████████████| 190 kB 4.2 MB/s 
Installing collected packages: yapf, addict, mmcv-full
Successfully installed addict-2.4.0 mmcv-full-1.4.0 yapf-0.32.0
Cloning into 'mmdetection'...
remote: Enumerating objects: 1281, done.
remote: Counting objects: 100% (1281/1281), done.
remote: Compressing objects: 100% (832/832), done.
remote: Total 1281 (delta 474), reused 1235 (delta 447), pack-reused 0
Receiving objects: 100% (1281/1281), 7.46 MiB | 6.60 MiB/s, done.
Resolving deltas: 1
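The wheel index URL passed to `pip install -f` is assembled from the two parts of `torch.__version__`. The sketch below shows that string handling on its own, without importing torch. `mmcv_index_url` is a hypothetical helper, and the `cpu` fallback for versions without a `+cuXXX` suffix is an assumption of this sketch rather than something the notebook does.

```python
def mmcv_index_url(torch_version_string):
    """Build the mmcv-full wheel index URL for a torch version such as '1.10.0+cu111'."""
    torch_version, _, build_tag = torch_version_string.partition("+")
    # Without a '+tag' suffix there is no CUDA information; assume a CPU build here
    cuda_tag = build_tag or "cpu"
    return ("https://download.openmmlab.com/mmcv/dist/"
            f"{cuda_tag}/torch{torch_version}/index.html")


print(mmcv_index_url("1.10.0+cu111"))
```

For the torch build used in this notebook, the result matches the `Looking in links:` line in the output above.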

#### **Note:** Restart the runtime before continuing

To restart the runtime: `Runtime` > `Restart Runtime`

## Imports

In [None]:
#%cd /content/

import os
import sys
import time

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pylab as plt

# List the directories present under data/
for dirname, _, filenames in os.walk('data/'):
    print(dirname)

# Make the cloned repository importable
sys.path.append("mmdetection")
plt.rcParams["axes.grid"] = False

The `data` directory is laid out like this:

<img src="https://images.aicrowd.com/uploads/ckeditor/pictures/674/content_carbon__3_.png" width="50%">

## Reading Data

In [None]:
%cd ..
# For reading annotations file
import json
from pycocotools.coco import COCO

# Reading annotations.json
TRAIN_ANNOTATIONS_PATH = "data/train/annotations.json"
TRAIN_IMAGE_DIRECTIORY = "data/train/images/"

VAL_ANNOTATIONS_PATH = "data/val/annotations.json"
VAL_IMAGE_DIRECTIORY = "data/val/images/"

train_coco = COCO(TRAIN_ANNOTATIONS_PATH)

In [None]:
# Reading the annotation files
with open(TRAIN_ANNOTATIONS_PATH) as f:
  train_annotations_data = json.load(f)

with open(VAL_ANNOTATIONS_PATH) as f:
  val_annotations_data = json.load(f)
#train_annotations_data['annotations'][0]

## Data Format 🔍 

Our COCO data format looks like this:

```
"info": {...},
"categories": [...],
"images": [...],
"annotations": [...],
```

`categories` looks like this:
```
[
  {'id': 2578,
  'name': 'water',
  'name_readable': 'Water',
  'supercategory': 'food'},
  {'id': 1157,
  'name': 'pear',
  'name_readable': 'Pear',
  'supercategory': 'food'},
  ...
  {'id': 1190,
  'name': 'peach',
  'name_readable': 'Peach',
  'supercategory': 'food'}
]
```

`info` is empty.

`images` looks like this:

```
[
  {'file_name': '065537.jpg', 
  'height': 464, 
  'id': 65537, 
  'width': 464},
  {'file_name': '065539.jpg', 
  'height': 464, 
  'id': 65539, 
  'width': 464},
 ...
  {'file_name': '069900.jpg', 
  'height': 391, 
  'id': 69900, 
  'width': 392},
]
```
Each entry in `annotations` looks like this:

```
{'area': 44320.0,
 'bbox': [86.5, 127.49999999999999, 286.0, 170.0],
 'category_id': 2578,
 'id': 102434,
 'image_id': 65537,
 'iscrowd': 0,
 'segmentation': [[235.99999999999997,
   372.5,
   169.0,
   372.5,
   ...
   368.5,
   264.0,
   371.5]]}
```
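To make the relationships between these three lists concrete, here is a minimal sketch that links them using only the standard library. The `sample` dict is hand-written in the same layout as `annotations.json` and is not real data. Sorting the category ids is a choice made in this sketch for determinism. Because category ids are sparse (2578, 1157, ...), training code usually maps them to contiguous labels first.

```python
from collections import defaultdict

# A tiny hand-written sample in the same layout as annotations.json
sample = {
    "categories": [
        {"id": 2578, "name": "water"},
        {"id": 1157, "name": "pear"},
    ],
    "images": [
        {"id": 65537, "file_name": "065537.jpg", "height": 464, "width": 464},
    ],
    "annotations": [
        {"id": 102434, "image_id": 65537, "category_id": 2578,
         "bbox": [86.5, 127.5, 286.0, 170.0], "iscrowd": 0},
    ],
}

# Category ids are sparse, so map each one to a contiguous label index
cat_ids = sorted(c["id"] for c in sample["categories"])
cat2label = {cat_id: label for label, cat_id in enumerate(cat_ids)}
id2name = {c["id"]: c["name"] for c in sample["categories"]}

# Group annotations by the image they belong to
anns_by_image = defaultdict(list)
for ann in sample["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

for image in sample["images"]:
    for ann in anns_by_image[image["id"]]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        print(image["file_name"], id2name[ann["category_id"]],
              "label", cat2label[ann["category_id"]],
              "xyxy", [x, y, x + w, y + h])
```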


## Fixing the Data

In [7]:
#fix dataset
import numpy as np
import pandas as pd
import cv2
import json
from tqdm import tqdm

# Reading annotations.json
TRAIN_ANNOTATIONS_PATH = "data/train/annotations.json"
TRAIN_IMAGE_DIRECTIORY = "data/train/images/"

VAL_ANNOTATIONS_PATH = "data/val/annotations.json"
VAL_IMAGE_DIRECTIORY = "data/val/images/"

# train_coco = COCO(TRAIN_ANNOTATIONS_PATH)

# Reading the annotation files
with open(TRAIN_ANNOTATIONS_PATH) as f:
  train_annotations_data = json.load(f)

with open(VAL_ANNOTATIONS_PATH) as f:
  val_annotations_data = json.load(f)



# Take an annotations dict and an image directory, and return the annotations
# with each image's height and width corrected to match the file on disk
def fix_data(annotations, directiory, VERBOSE = False):
    for n, i in enumerate(tqdm(annotations['images'])):
        img = cv2.imread(directiory + i["file_name"])
        if img is None:
            # cv2.imread returns None for missing or unreadable files
            print("Could not read", i["file_name"])
            continue

        # img.shape is (height, width, channels)
        if img.shape[0] != i['height']:
            annotations['images'][n]['height'] = img.shape[0]
            if VERBOSE:
                print(i["file_name"])
                print(annotations['images'][n], img.shape)

        if img.shape[1] != i['width']:
            annotations['images'][n]['width'] = img.shape[1]
            if VERBOSE:
                print(i["file_name"])
                print(annotations['images'][n], img.shape)

    return annotations

train_annotations_data = fix_data(train_annotations_data, TRAIN_IMAGE_DIRECTIORY)

with open('data/train/new_ann.json', 'w') as f:
    json.dump(train_annotations_data, f)

val_annotations_data = fix_data(val_annotations_data, VAL_IMAGE_DIRECTIORY)

with open('data/val/new_ann.json', 'w') as f:
    json.dump(val_annotations_data, f)

100%|██████████| 54392/54392 [05:14<00:00, 172.81it/s]
100%|██████████| 946/946 [00:05<00:00, 179.87it/s]
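The recorded image size matters because the dataset class written later in this notebook drops any box that does not overlap the image. A minimal sketch of that same overlap test, applied as a sanity check, is shown here. `boxes_outside_image` is a hypothetical helper and the `sample` dict is hand-written, not real data.

```python
def boxes_outside_image(annotations):
    """Return ids of annotations whose bbox does not overlap its image at all."""
    sizes = {img["id"]: (img["width"], img["height"])
             for img in annotations["images"]}
    bad = []
    for ann in annotations["annotations"]:
        width, height = sizes[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        # Width and height of the part of the box that lies inside the image
        inter_w = max(0, min(x + w, width) - max(x, 0))
        inter_h = max(0, min(y + h, height) - max(y, 0))
        if inter_w * inter_h == 0:
            bad.append(ann["id"])
    return bad


sample = {
    "images": [{"id": 1, "width": 464, "height": 464}],
    "annotations": [
        {"id": 1, "image_id": 1, "bbox": [86.5, 127.5, 286.0, 170.0]},
        {"id": 2, "image_id": 1, "bbox": [500.0, 10.0, 20.0, 20.0]},
    ],
}
print(boxes_outside_image(sample))
```

Running it on the corrected `new_ann.json` contents instead of `sample` would show how many annotations the training pipeline is going to skip.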


## Setting up hyperparameters

Modify the model configuration and hyperparameters for our training:

* Load the configuration files and modify them for our dataset.
* Set the desired hyperparameters.
* Start training and logging.

In [None]:
# You can add more model configs like below.
MODELS_CONFIG = {
    'mask_rcnn_swin-s': {
        'config_file': 'configs/swin/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco.py'
    }
}

# Pick the model you want to use
selected_model = 'mask_rcnn_swin-s' # choose any config you want from MODELS_CONFIG

# Name of the config file.
config_file = MODELS_CONFIG[selected_model]['config_file']

config_fname = os.path.join('mmdetection', config_file)
assert os.path.isfile(config_fname), '`{}` does not exist'.format(config_fname)
config_fname

'mmdetection/configs/swin/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco.py'

### Edit config

We will edit the config to suit the food dataset. There are many parameters, beyond the ones changed below, that can be edited in the existing config file and might lead to a better score. We leave that up to you; feel free to explore the documentation for [mmdetection](https://github.com/open-mmlab/mmdetection/tree/master/docs).

**Note:** Instead of using regular expressions to edit the existing file, feel free to download the config file, edit it in the text editor of your choice, re-upload it, and point the variable `config_fname` at it.
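For readers who do take the regular-expression route, here is a minimal sketch of the substitution on an in-memory string, so its effect can be checked before any file is overwritten. The `config_text` fragment is hand-written and only stands in for a real config file, and `num_classes = 3` is a placeholder value.

```python
import re

# A hand-written fragment standing in for a real config file
config_text = """
roi_head=dict(
    bbox_head=dict(type='Shared2FCBBoxHead', num_classes=80,),
    mask_head=dict(type='FCNMaskHead', num_classes=80,))
"""

num_classes = 3  # placeholder; use the number of categories in your annotations

# Replace every `num_classes=<number>` occurrence; \d+ stops at the end of the number
updated, count = re.subn(r"num_classes=\d+",
                         "num_classes={}".format(num_classes), config_text)
print(count, "occurrence(s) replaced")
print(updated)
```

`re.subn` returns the number of replacements alongside the new text, which makes it easy to notice when a pattern silently matched nothing.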

In [None]:
# import re
# fname = config_fname
# with open(fname) as f:
#     s = f.read()

#     s = re.sub('num_classes=.*?,',
#                'num_classes={},'.format(len(classes_names)), s)

# with open(fname, 'w') as f:
#     f.write(s)
# #lets check if the changes have been updated
# !cat {config_fname}
# #print(len(classes_names))

In [None]:
# import re
# fname2 = 'mmdetection/configs/htc/htc_without_semantic_r50_fpn_1x_coco.py'

# with open(fname2) as f:
#     s = f.read()

#     s = re.sub('num_classes=.*?,',
#                'num_classes={},'.format(len(classes_names)), s)
# with open(fname2, 'w') as f:
#     f.write(s)    
   

In [None]:
import re
fname2 = 'mmdetection/configs/_base_/datasets/coco_instance.py'

with open(fname2) as f:
    s = f.read()
    s = re.sub("data_root = 'data/coco/'",
                "data_root = 'data/'", s)
    s = re.sub("annotations/instances_train2017.json",
                "train/new_ann.json", s)
    s = re.sub("annotations/instances_val2017.json",
                "val/new_ann.json", s)
    s = re.sub("train2017", "train/images", s)
    s = re.sub("val2017", "val/images", s)
    s = re.sub("workers_per_gpu=2","workers_per_gpu=0",s)
    s = re.sub("samples_per_gpu=2","samples_per_gpu=4",s)

with open(fname2, 'w') as f:
    f.write(s)

#to check if the changes have been updated
# !cat {fname2}


total_epochs = 22
fname = 'mmdetection/configs/_base_/schedules/schedule_1x.py'
with open(fname) as f:
    s = f.read()
    s = re.sub(r'max_epochs=\d+',
               'max_epochs={}'.format(total_epochs), s)
    # Lower the learning rate from the multi-GPU default, since we train on a single GPU
    s = re.sub(r"lr=0\.02", "lr=0.0001", s)
with open(fname, 'w') as f:
    f.write(s)
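As background for the learning-rate change: MMDetection's default schedules assume 8 GPUs with 2 images each (a total batch size of 16) and `lr=0.02`, and its documentation suggests scaling the rate linearly with the batch size. The sketch below only illustrates that rule of thumb. `scaled_lr` is a hypothetical helper, and the notebook itself sets a fixed `lr=0.0001` rather than a value derived this way.

```python
def scaled_lr(base_lr, num_gpus, samples_per_gpu, base_batch_size=16):
    """Scale a learning rate linearly with the total batch size."""
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


# One GPU with 2 images per step, then with the 4 images per step set above
print(scaled_lr(0.02, num_gpus=1, samples_per_gpu=2))
print(scaled_lr(0.02, num_gpus=1, samples_per_gpu=4))
```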

### Just Run this

In [None]:
%%writefile mmdetection/mmdet/datasets/coco.py

#@title Don't forget to run this cell, Modify coco dataset mmdet file (set classes list) { display-mode: "form" }
# Copyright (c) OpenMMLab. All rights reserved.
import contextlib
import io
import itertools
import logging
import os.path as osp
import tempfile
import warnings
from collections import OrderedDict

import mmcv
import numpy as np
from mmcv.utils import print_log
from terminaltables import AsciiTable

from mmdet.core import eval_recalls
from .api_wrappers import COCO, COCOeval
from .builder import DATASETS
from .custom import CustomDataset


@DATASETS.register_module()
class CocoDataset(CustomDataset):

    CLASSES = ('bread-wholemeal', 'jam', 'water', 'bread-sourdough', 'banana', 'soft-cheese', 'ham-raw', 'hard-cheese', 'cottage-cheese', 'bread-half-white', 'coffee-with-caffeine', 'fruit-salad', 'pancakes', 'tea', 'salmon-smoked', 'avocado', 'spring-onion-scallion', 'ristretto-with-caffeine', 'ham', 'egg', 'bacon-frying', 'chips-french-fries', 'juice-apple', 'chicken', 'tomato-raw', 'broccoli', 'shrimp-boiled', 'beetroot-steamed-without-addition-of-salt', 'carrot-raw', 'chickpeas', 'french-salad-dressing', 'pasta-hornli', 'sauce-cream', 'meat-balls', 'pasta', 'tomato-sauce', 'cheese', 'pear', 'cashew-nut', 'almonds', 'lentils', 'mixed-vegetables', 'peanut-butter', 'apple', 'blueberries', 'cucumber', 'cocoa-powder', 'greek-yaourt-yahourt-yogourt-ou-yoghourt', 'maple-syrup-concentrate', 'buckwheat-grain-peeled', 'butter', 'herbal-tea', 'mayonnaise', 'soup-vegetable', 'wine-red', 'wine-white', 'green-bean-steamed-without-addition-of-salt', 'sausage', 'pizza-margherita-baked', 'salami', 'mushroom', 'bread-meat-substitute-lettuce-sauce', 'tart', 'tea-verveine', 'rice', 'white-coffee-with-caffeine', 'linseeds', 'sunflower-seeds', 'ham-cooked', 'bell-pepper-red-raw', 'zucchini', 'green-asparagus', 'tartar-sauce', 'lye-pretzel-soft', 'cucumber-pickled', 'curry-vegetarian', 'yaourt-yahourt-yogourt-ou-yoghourt-natural', 'soup-of-lentils-dahl-dhal', 'soup-cream-of-vegetables', 'balsamic-vinegar', 'salmon', 'salt-cake-vegetables-filled', 'bacon', 'orange', 'pasta-noodles', 'cream', 'cake-chocolate', 'pasta-spaghetti', 'black-olives', 'parmesan', 'spaetzle', 'salad-lambs-ear', 'salad-leaf-salad-green', 'potatoes-steamed', 'white-cabbage', 'halloumi', 'beetroot-raw', 'bread-grain', 'applesauce-unsweetened-canned', 'cheese-for-raclette', 'mushrooms', 'bread-white', 'curds-natural-with-at-most-10-fidm', 'bagel-without-filling', 'quiche-with-cheese-baked-with-puff-pastry', 'soup-potato', 'bouillon-vegetable', 'beef-sirloin-steak', 'taboule-prepared-with-couscous', 'eggplant', 
'bread', 'turnover-with-meat-small-meat-pie-empanadas', 'mungbean-sprouts', 'mozzarella', 'pasta-penne', 'lasagne-vegetable-prepared', 'mandarine', 'kiwi', 'french-beans', 'tartar-meat', 'spring-roll-fried', 'pork-chop', 'caprese-salad-tomato-mozzarella', 'leaf-spinach', 'roll-of-half-white-or-white-flour-with-large-void', 'pasta-ravioli-stuffing', 'omelette-plain', 'tuna', 'dark-chocolate', 'sauce-savoury', 'dried-raisins', 'ice-tea', 'kaki', 'macaroon', 'smoothie', 'crepe-plain', 'chicken-nuggets', 'chili-con-carne-prepared', 'veggie-burger', 'cream-spinach', 'cod', 'chinese-cabbage', 'hamburger-bread-meat-ketchup', 'soup-pumpkin', 'sushi', 'chestnuts', 'coffee-decaffeinated', 'sauce-soya', 'balsamic-salad-dressing', 'pasta-twist', 'bolognaise-sauce', 'leek', 'fajita-bread-only', 'potato-gnocchi', 'beef-cut-into-stripes-only-meat', 'rice-noodles-vermicelli', 'tea-ginger', 'tea-green', 'bread-whole-wheat', 'onion', 'garlic', 'hummus', 'pizza-with-vegetables-baked', 'beer', 'glucose-drink-50g', 'chicken-wing', 'ratatouille', 'peanut', 'high-protein-pasta-made-of-lentils-peas', 'cauliflower', 'quiche-with-spinach-baked-with-cake-dough', 'green-olives', 'brazil-nut', 'eggplant-caviar', 'bread-pita', 'pasta-wholemeal', 'sauce-pesto', 'oil', 'couscous', 'sauce-roast', 'prosecco', 'crackers', 'bread-toast', 'shrimp-prawn-small', 'panna-cotta', 'romanesco', 'water-with-lemon-juice', 'espresso-with-caffeine', 'egg-scrambled-prepared', 'juice-orange', 'ice-cubes', 'braided-white-loaf', 'emmental-cheese', 'croissant-wholegrain', 'hazelnut-chocolate-spread-nutella-ovomaltine-caotina', 'tomme', 'water-mineral', 'hazelnut', 'bacon-raw', 'bread-nut', 'black-forest-tart', 'soup-miso', 'peach', 'figs', 'beef-filet', 'mustard-dijon', 'rice-basmati', 'mashed-potatoes-prepared-with-full-fat-milk-with-butter', 'dumplings', 'pumpkin', 'swiss-chard', 'red-cabbage', 'spinach-raw', 'naan-indien-bread', 'chicken-curry-cream-coconut-milk-curry-spices-paste', 'crunch-muesli', 'biscuits', 
'bread-french-white-flour', 'meatloaf', 'fresh-cheese', 'honey', 'vegetable-mix-peas-and-carrots', 'parsley', 'brownie', 'dairy-ice-cream', 'tea-black', 'carrot-cake', 'fish-fingers-breaded', 'salad-dressing', 'dried-meat', 'chicken-breast', 'mixed-salad-chopped-without-sauce', 'feta', 'praline', 'tea-peppermint', 'walnut', 'potato-salad-with-mayonnaise-yogurt-dressing', 'kebab-in-pita-bread', 'kolhrabi', 'alfa-sprouts', 'brussel-sprouts', 'bacon-cooking', 'gruyere', 'bulgur', 'grapes', 'pork-escalope', 'chocolate-egg-small', 'cappuccino', 'zucchini-stewed-without-addition-of-fat-without-addition-of-salt', 'crisp-bread-wasa', 'bread-black', 'perch-fillets-lake', 'rosti', 'mango', 'sandwich-ham-cheese-and-butter', 'muesli', 'spinach-steamed-without-addition-of-salt', 'fish', 'risotto-without-cheese-cooked', 'milk-chocolate-with-hazelnuts', 'cake-oblong', 'crisps', 'pork', 'pomegranate', 'sweet-corn-canned', 'flakes-oat', 'greek-salad', 'cantonese-fried-rice', 'sesame-seeds', 'bouillon', 'baked-potato', 'fennel', 'meat', 'bread-olive', 'croutons', 'philadelphia', 'mushroom-average-stewed-without-addition-of-fat-without-addition-of-salt', 'bell-pepper-red-stewed-without-addition-of-fat-without-addition-of-salt', 'white-chocolate', 'mixed-nuts', 'breadcrumbs-unspiced', 'fondue', 'sauce-mushroom', 'tea-spice', 'strawberries', 'tea-rooibos', 'pie-plum-baked-with-cake-dough', 'potatoes-au-gratin-dauphinois-prepared', 'capers', 'vegetables', 'bread-wholemeal-toast', 'red-radish', 'fruit-tart', 'beans-kidney', 'sauerkraut', 'mustard', 'country-fries', 'ketchup', 'pasta-linguini-parpadelle-tagliatelle', 'chicken-cut-into-stripes-only-meat', 'cookies', 'sun-dried-tomatoe', 'bread-ticino', 'semi-hard-cheese', 'margarine', 'porridge-prepared-with-partially-skimmed-milk', 'soya-drink-soy-milk', 'juice-multifruit', 'popcorn-salted', 'chocolate-filled', 'milk-chocolate', 'bread-fruit', 'mix-of-dried-fruits-and-nuts', 'corn', 'tete-de-moine', 'dates', 'pistachio', 'celery', 
'white-radish', 'oat-milk', 'cream-cheese', 'bread-rye', 'witloof-chicory', 'apple-crumble', 'goat-cheese-soft', 'grapefruit-pomelo', 'risotto-with-mushrooms-cooked', 'blue-mould-cheese', 'biscuit-with-butter', 'guacamole', 'pecan-nut', 'tofu', 'cordon-bleu-from-pork-schnitzel-fried', 'paprika-chips', 'quinoa', 'kefir-drink', 'm-m-s', 'salad-rocket', 'bread-spelt', 'pizza-with-ham-with-mushrooms-baked', 'fruit-coulis', 'plums', 'beef-minced-only-meat', 'pizza-with-ham-baked', 'pineapple', 'soup-tomato', 'cheddar', 'tea-fruit', 'rice-jasmin', 'seeds', 'focaccia', 'milk', 'coleslaw-chopped-without-sauce', 'pastry-flaky', 'curd', 'savoury-puff-pastry-stick', 'sweet-potato', 'chicken-leg', 'croissant', 'sour-cream', 'ham-turkey', 'processed-cheese', 'fruit-compotes', 'cheesecake', 'pasta-tortelloni-stuffing', 'sauce-cocktail', 'croissant-with-chocolate-filling', 'pumpkin-seeds', 'artichoke', 'champagne', 'grissini', 'sweets-candies', 'brie', 'wienerli-swiss-sausage', 'syrup-diluted-ready-to-drink', 'apple-pie', 'white-bread-with-butter-eggs-and-milk', 'savoury-puff-pastry', 'anchovies', 'tuna-in-oil-drained', 'lemon-pie', 'meat-terrine-pate', 'coriander', 'falafel-balls', 'berries', 'latte-macchiato-with-caffeine', 'faux-mage-cashew-vegan-chers', 'beans-white', 'sugar-melon', 'mixed-seeds', 'hamburger', 'hamburger-bun', 'oil-vinegar-salad-dressing', 'soya-yaourt-yahourt-yogourt-ou-yoghourt', 'chocolate-milk-chocolate-drink', 'celeriac', 'chocolate-mousse', 'cenovis-yeast-spread', 'thickened-cream-35', 'meringue', 'lamb-chop', 'shrimp-prawn-large', 'beef', 'lemon', 'croque-monsieur', 'chives', 'chocolate-cookies', 'birchermuesli-prepared-no-sugar-added', 'fish-crunchies-battered', 'muffin', 'savoy-cabbage-steamed-without-addition-of-salt', 'pine-nuts', 'chorizo', 'chia-grains', 'frying-sausage', 'french-pizza-from-alsace-baked', 'chocolate', 'cooked-sausage', 'grits-polenta-maize-flour', 'gummi-bears-fruit-jellies-jelly-babies-with-fruit-essence', 'wine-rose', 
'coca-cola', 'raspberries', 'roll-with-pieces-of-chocolate', 'goat-average-raw', 'lemon-cake', 'coconut-milk', 'rice-wild', 'gluten-free-bread', 'pearl-onions', 'buckwheat-pancake', 'bread-5-grain', 'light-beer', 'sugar-glazing', 'tzatziki', 'butter-herb', 'ham-croissant', 'corn-crisps', 'lentils-green-du-puy-du-berry', 'cocktail', 'rice-whole-grain', 'veal-sausage', 'cervelat', 'sorbet', 'aperitif-with-alcohol-aperol-spritz', 'dips', 'corn-flakes', 'peas', 'tiramisu', 'apricots', 'cake-marble', 'lamb', 'lasagne-meat-prepared', 'coca-cola-zero', 'cake-salted', 'dough-puff-pastry-shortcrust-bread-pizza-dough', 'rice-waffels', 'sekt', 'brioche', 'vegetable-au-gratin-baked', 'mango-dried', 'processed-meat-charcuterie', 'mousse', 'sauce-sweet-sour', 'basil', 'butter-spread-puree-almond', 'pie-apricot-baked-with-cake-dough', 'rusk-wholemeal', 'beef-roast', 'vanille-cream-cooked-custard-creme-dessert', 'pasta-in-conch-form', 'nuts', 'sauce-carbonara', 'fig-dried', 'pasta-in-butterfly-form-farfalle', 'minced-meat', 'carrot-steamed-without-addition-of-salt', 'ebly', 'damson-plum', 'shoots', 'bouquet-garni', 'coconut', 'banana-cake', 'waffle', 'apricot-dried', 'sauce-curry', 'watermelon-fresh', 'sauce-sweet-salted-asian', 'pork-roast', 'blackberry', 'smoked-cooked-sausage-of-pork-and-beef-meat-sausag', 'bean-seeds', 'italian-salad-dressing', 'white-asparagus', 'pie-rhubarb-baked-with-cake-dough', 'tomato-stewed-without-addition-of-fat-without-addition-of-salt', 'cherries', 'nectarine')

    def load_annotations(self, ann_file):
        """Load annotation from COCO style annotation file.

        Args:
            ann_file (str): Path of annotation file.

        Returns:
            list[dict]: Annotation info from COCO api.
        """

        self.coco = COCO('./data/train/new_ann.json')
        # The order of returned `cat_ids` will not
        # change with the order of the CLASSES
        self.cat_ids = self.coco.getCatIds()

        self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
        self.img_ids = self.coco.getImgIds()
        data_infos = []
        total_ann_ids = []
        for i in self.img_ids:
            info = self.coco.load_imgs([i])[0]
            info['filename'] = info['file_name']
            data_infos.append(info)
            ann_ids = self.coco.get_ann_ids(img_ids=[i])
            total_ann_ids.extend(ann_ids)
        assert len(set(total_ann_ids)) == len(
            total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!"
        return data_infos

    def get_ann_info(self, idx):
        """Get COCO annotation by index.

        Args:
            idx (int): Index of data.

        Returns:
            dict: Annotation info of specified index.
        """

        img_id = self.data_infos[idx]['id']
        ann_ids = self.coco.get_ann_ids(img_ids=[img_id])
        ann_info = self.coco.load_anns(ann_ids)
        return self._parse_ann_info(self.data_infos[idx], ann_info)

    def getCatIds(self, idx):
        """Get COCO category ids by index.

        Args:
            idx (int): Index of data.

        Returns:
            list[int]: All categories in the image of specified index.
        """

        img_id = self.data_infos[idx]['id']
        ann_ids = self.coco.get_ann_ids(img_ids=[img_id])
        ann_info = self.coco.load_anns(ann_ids)
        return [ann['category_id'] for ann in ann_info]

    def _filter_imgs(self, min_size=32):
        """Filter images too small or without ground truths."""
        valid_inds = []
        # obtain images that contain annotation
        ids_with_ann = set(_['image_id'] for _ in self.coco.anns.values())
        # obtain images that contain annotations of the required categories
        ids_in_cat = set()
        for i, class_id in enumerate(self.cat_ids):
            ids_in_cat |= set(self.coco.cat_img_map[class_id])
        # merge the image id sets of the two conditions and use the merged set
        # to filter out images if self.filter_empty_gt=True
        ids_in_cat &= ids_with_ann

        valid_img_ids = []
        for i, img_info in enumerate(self.data_infos):
            img_id = self.img_ids[i]
            if self.filter_empty_gt and img_id not in ids_in_cat:
                continue
            if min(img_info['width'], img_info['height']) >= min_size:
                valid_inds.append(i)
                valid_img_ids.append(img_id)
        self.img_ids = valid_img_ids
        return valid_inds

    def _parse_ann_info(self, img_info, ann_info):
        """Parse bbox and mask annotation.

        Args:
            ann_info (list[dict]): Annotation info of an image.
            with_mask (bool): Whether to parse mask annotations.

        Returns:
            dict: A dict containing the following keys: bboxes, bboxes_ignore,\
                labels, masks, seg_map. "masks" are raw annotations and not \
                decoded into binary masks.
        """
        gt_bboxes = []
        gt_labels = []
        gt_bboxes_ignore = []
        gt_masks_ann = []
        for i, ann in enumerate(ann_info):
            if ann.get('ignore', False):
                continue
            x1, y1, w, h = ann['bbox']
            inter_w = max(0, min(x1 + w, img_info['width']) - max(x1, 0))
            inter_h = max(0, min(y1 + h, img_info['height']) - max(y1, 0))
            if inter_w * inter_h == 0:
                continue
            if ann['area'] <= 0 or w < 1 or h < 1:
                continue
            if ann['category_id'] not in self.cat_ids:
                continue
            bbox = [x1, y1, x1 + w, y1 + h]
            if ann.get('iscrowd', False):
                gt_bboxes_ignore.append(bbox)
            else:
                gt_bboxes.append(bbox)
                gt_labels.append(self.cat2label[ann['category_id']])
                gt_masks_ann.append(ann.get('segmentation', None))

        if gt_bboxes:
            gt_bboxes = np.array(gt_bboxes, dtype=np.float32)
            gt_labels = np.array(gt_labels, dtype=np.int64)
        else:
            gt_bboxes = np.zeros((0, 4), dtype=np.float32)
            gt_labels = np.array([], dtype=np.int64)

        if gt_bboxes_ignore:
            gt_bboxes_ignore = np.array(gt_bboxes_ignore, dtype=np.float32)
        else:
            gt_bboxes_ignore = np.zeros((0, 4), dtype=np.float32)

        seg_map = img_info['filename'].replace('jpg', 'png')

        ann = dict(
            bboxes=gt_bboxes,
            labels=gt_labels,
            bboxes_ignore=gt_bboxes_ignore,
            masks=gt_masks_ann,
            seg_map=seg_map)

        return ann

    def xyxy2xywh(self, bbox):
        """Convert ``xyxy`` style bounding boxes to ``xywh`` style for COCO
        evaluation.

        Args:
            bbox (numpy.ndarray): The bounding boxes, shape (4, ), in
                ``xyxy`` order.

        Returns:
            list[float]: The converted bounding boxes, in ``xywh`` order.
        """

        _bbox = bbox.tolist()
        return [
            _bbox[0],
            _bbox[1],
            _bbox[2] - _bbox[0],
            _bbox[3] - _bbox[1],
        ]

    def _proposal2json(self, results):
        """Convert proposal results to COCO json style."""
        json_results = []
        for idx in range(len(self)):
            img_id = self.img_ids[idx]
            bboxes = results[idx]
            for i in range(bboxes.shape[0]):
                data = dict()
                data['image_id'] = img_id
                data['bbox'] = self.xyxy2xywh(bboxes[i])
                data['score'] = float(bboxes[i][4])
                data['category_id'] = 1
                json_results.append(data)
        return json_results

    def _det2json(self, results):
        """Convert detection results to COCO json style."""
        json_results = []
        for idx in range(len(self)):
            img_id = self.img_ids[idx]
            result = results[idx]
            for label in range(len(result)):
                bboxes = result[label]
                for i in range(bboxes.shape[0]):
                    data = dict()
                    data['image_id'] = img_id
                    data['bbox'] = self.xyxy2xywh(bboxes[i])
                    data['score'] = float(bboxes[i][4])
                    data['category_id'] = self.cat_ids[label]
                    json_results.append(data)
        return json_results

    def _segm2json(self, results):
        """Convert instance segmentation results to COCO json style."""
        bbox_json_results = []
        segm_json_results = []
        for idx in range(len(self)):
            img_id = self.img_ids[idx]
            det, seg = results[idx]
            for label in range(len(det)):
                # bbox results
                bboxes = det[label]
                for i in range(bboxes.shape[0]):
                    data = dict()
                    data['image_id'] = img_id
                    data['bbox'] = self.xyxy2xywh(bboxes[i])
                    data['score'] = float(bboxes[i][4])
                    data['category_id'] = self.cat_ids[label]
                    bbox_json_results.append(data)

                # segm results
                # some detectors use different scores for bbox and mask
                if isinstance(seg, tuple):
                    segms = seg[0][label]
                    mask_score = seg[1][label]
                else:
                    segms = seg[label]
                    mask_score = [bbox[4] for bbox in bboxes]
                for i in range(bboxes.shape[0]):
                    data = dict()
                    data['image_id'] = img_id
                    data['bbox'] = self.xyxy2xywh(bboxes[i])
                    data['score'] = float(mask_score[i])
                    data['category_id'] = self.cat_ids[label]
                    if isinstance(segms[i]['counts'], bytes):
                        segms[i]['counts'] = segms[i]['counts'].decode()
                    data['segmentation'] = segms[i]
                    segm_json_results.append(data)
        return bbox_json_results, segm_json_results

    def results2json(self, results, outfile_prefix):
        """Dump the detection results to a COCO style json file.

        There are 3 types of results: proposals, bbox predictions, mask
        predictions, and they have different data types. This method will
        automatically recognize the type, and dump them to json files.

        Args:
            results (list[list | tuple | ndarray]): Testing results of the
                dataset.
            outfile_prefix (str): The filename prefix of the json files. If the
                prefix is "somepath/xxx", the json files will be named
                "somepath/xxx.bbox.json", "somepath/xxx.segm.json",
                "somepath/xxx.proposal.json".

        Returns:
            dict[str: str]: Possible keys are "bbox", "segm", "proposal", and \
                values are corresponding filenames.
        """
        result_files = dict()
        if isinstance(results[0], list):
            json_results = self._det2json(results)
            result_files['bbox'] = f'{outfile_prefix}.bbox.json'
            result_files['proposal'] = f'{outfile_prefix}.bbox.json'
            mmcv.dump(json_results, result_files['bbox'])
        elif isinstance(results[0], tuple):
            json_results = self._segm2json(results)
            result_files['bbox'] = f'{outfile_prefix}.bbox.json'
            result_files['proposal'] = f'{outfile_prefix}.bbox.json'
            result_files['segm'] = f'{outfile_prefix}.segm.json'
            mmcv.dump(json_results[0], result_files['bbox'])
            mmcv.dump(json_results[1], result_files['segm'])
        elif isinstance(results[0], np.ndarray):
            json_results = self._proposal2json(results)
            result_files['proposal'] = f'{outfile_prefix}.proposal.json'
            mmcv.dump(json_results, result_files['proposal'])
        else:
            raise TypeError('invalid type of results')
        return result_files

    def fast_eval_recall(self, results, proposal_nums, iou_thrs, logger=None):
        gt_bboxes = []
        for i in range(len(self.img_ids)):
            ann_ids = self.coco.get_ann_ids(img_ids=self.img_ids[i])
            ann_info = self.coco.load_anns(ann_ids)
            if len(ann_info) == 0:
                gt_bboxes.append(np.zeros((0, 4)))
                continue
            bboxes = []
            for ann in ann_info:
                if ann.get('ignore', False) or ann['iscrowd']:
                    continue
                x1, y1, w, h = ann['bbox']
                bboxes.append([x1, y1, x1 + w, y1 + h])
            bboxes = np.array(bboxes, dtype=np.float32)
            if bboxes.shape[0] == 0:
                bboxes = np.zeros((0, 4))
            gt_bboxes.append(bboxes)

        recalls = eval_recalls(
            gt_bboxes, results, proposal_nums, iou_thrs, logger=logger)
        ar = recalls.mean(axis=1)
        return ar

    def format_results(self, results, jsonfile_prefix=None, **kwargs):
        """Format the results to json (standard format for COCO evaluation).

        Args:
            results (list[tuple | numpy.ndarray]): Testing results of the
                dataset.
            jsonfile_prefix (str | None): The prefix of json files. It includes
                the file path and the prefix of filename, e.g., "a/b/prefix".
                If not specified, a temp file will be created. Default: None.

        Returns:
            tuple: (result_files, tmp_dir), result_files is a dict containing \
                the json filepaths, tmp_dir is the temporary directory created \
                for saving json files when jsonfile_prefix is not specified.
        """
        assert isinstance(results, list), 'results must be a list'
        assert len(results) == len(self), (
            'The length of results is not equal to the dataset len: {} != {}'.
            format(len(results), len(self)))

        if jsonfile_prefix is None:
            tmp_dir = tempfile.TemporaryDirectory()
            jsonfile_prefix = osp.join(tmp_dir.name, 'results')
        else:
            tmp_dir = None
        result_files = self.results2json(results, jsonfile_prefix)
        return result_files, tmp_dir

    def evaluate(self,
                 results,
                 metric='bbox',
                 logger=None,
                 jsonfile_prefix=None,
                 classwise=False,
                 proposal_nums=(100, 300, 1000),
                 iou_thrs=None,
                 metric_items=None):
        """Evaluation in COCO protocol.

        Args:
            results (list[list | tuple]): Testing results of the dataset.
            metric (str | list[str]): Metrics to be evaluated. Options are
                'bbox', 'segm', 'proposal', 'proposal_fast'.
            logger (logging.Logger | str | None): Logger used for printing
                related information during evaluation. Default: None.
            jsonfile_prefix (str | None): The prefix of json files. It includes
                the file path and the prefix of filename, e.g., "a/b/prefix".
                If not specified, a temp file will be created. Default: None.
            classwise (bool): Whether to evaluate the AP for each class.
            proposal_nums (Sequence[int]): Proposal number used for evaluating
                recalls, such as recall@100, recall@1000.
                Default: (100, 300, 1000).
            iou_thrs (Sequence[float], optional): IoU threshold used for
                evaluating recalls/mAPs. If set to a list, the average of all
                IoUs will also be computed. If not specified, [0.50, 0.55,
                0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95] will be used.
                Default: None.
            metric_items (list[str] | str, optional): Metric items that will
                be returned. If not specified, ``['AR@100', 'AR@300',
                'AR@1000', 'AR_s@1000', 'AR_m@1000', 'AR_l@1000' ]`` will be
                used when ``metric=='proposal'``, ``['mAP', 'mAP_50', 'mAP_75',
                'mAP_s', 'mAP_m', 'mAP_l']`` will be used when
                ``metric=='bbox' or metric=='segm'``.

        Returns:
            dict[str, float]: COCO style evaluation metric.
        """

        metrics = metric if isinstance(metric, list) else [metric]
        allowed_metrics = ['bbox', 'segm', 'proposal', 'proposal_fast']
        for metric in metrics:
            if metric not in allowed_metrics:
                raise KeyError(f'metric {metric} is not supported')
        if iou_thrs is None:
            iou_thrs = np.linspace(
                .5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
        if metric_items is not None:
            if not isinstance(metric_items, list):
                metric_items = [metric_items]

        result_files, tmp_dir = self.format_results(results, jsonfile_prefix)

        eval_results = OrderedDict()
        cocoGt = self.coco
        for metric in metrics:
            msg = f'Evaluating {metric}...'
            if logger is None:
                msg = '\n' + msg
            print_log(msg, logger=logger)

            if metric == 'proposal_fast':
                ar = self.fast_eval_recall(
                    results, proposal_nums, iou_thrs, logger='silent')
                log_msg = []
                for i, num in enumerate(proposal_nums):
                    eval_results[f'AR@{num}'] = ar[i]
                    log_msg.append(f'\nAR@{num}\t{ar[i]:.4f}')
                log_msg = ''.join(log_msg)
                print_log(log_msg, logger=logger)
                continue

            iou_type = 'bbox' if metric == 'proposal' else metric
            if metric not in result_files:
                raise KeyError(f'{metric} is not in results')
            try:
                predictions = mmcv.load(result_files[metric])
                if iou_type == 'segm':
                    # Refer to https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/coco.py#L331  # noqa
                    # When evaluating mask AP, if the results contain bbox,
                    # cocoapi will use the box area instead of the mask area
                    # for calculating the instance area. Though the overall AP
                    # is not affected, this leads to different
                    # small/medium/large mask AP results.
                    for x in predictions:
                        x.pop('bbox')
                    warnings.simplefilter('once')
                    warnings.warn(
                        'The key "bbox" is deleted for more accurate mask AP '
                        'of small/medium/large instances since v2.12.0. This '
                        'does not change the overall mAP calculation.',
                        UserWarning)
                cocoDt = cocoGt.loadRes(predictions)
            except IndexError:
                print_log(
                    'The testing results of the whole dataset is empty.',
                    logger=logger,
                    level=logging.ERROR)
                break

            cocoEval = COCOeval(cocoGt, cocoDt, iou_type)
            cocoEval.params.catIds = self.cat_ids
            cocoEval.params.imgIds = self.img_ids
            cocoEval.params.maxDets = list(proposal_nums)
            cocoEval.params.iouThrs = iou_thrs
            # mapping of cocoEval.stats
            coco_metric_names = {
                'mAP': 0,
                'mAP_50': 1,
                'mAP_75': 2,
                'mAP_s': 3,
                'mAP_m': 4,
                'mAP_l': 5,
                'AR@100': 6,
                'AR@300': 7,
                'AR@1000': 8,
                'AR_s@1000': 9,
                'AR_m@1000': 10,
                'AR_l@1000': 11
            }
            if metric_items is not None:
                for metric_item in metric_items:
                    if metric_item not in coco_metric_names:
                        raise KeyError(
                            f'metric item {metric_item} is not supported')

            if metric == 'proposal':
                cocoEval.params.useCats = 0
                cocoEval.evaluate()
                cocoEval.accumulate()

                # Save coco summarize print information to logger
                redirect_string = io.StringIO()
                with contextlib.redirect_stdout(redirect_string):
                    cocoEval.summarize()
                print_log('\n' + redirect_string.getvalue(), logger=logger)

                if metric_items is None:
                    metric_items = [
                        'AR@100', 'AR@300', 'AR@1000', 'AR_s@1000',
                        'AR_m@1000', 'AR_l@1000'
                    ]

                for item in metric_items:
                    val = float(
                        f'{cocoEval.stats[coco_metric_names[item]]:.3f}')
                    eval_results[item] = val
            else:
                cocoEval.evaluate()
                cocoEval.accumulate()

                # Save coco summarize print information to logger
                redirect_string = io.StringIO()
                with contextlib.redirect_stdout(redirect_string):
                    cocoEval.summarize()
                print_log('\n' + redirect_string.getvalue(), logger=logger)

                if classwise:  # Compute per-category AP
                    # Compute per-category AP
                    # from https://github.com/facebookresearch/detectron2/
                    precisions = cocoEval.eval['precision']
                    # precision: (iou, recall, cls, area range, max dets)
                    assert len(self.cat_ids) == precisions.shape[2]

                    results_per_category = []
                    for idx, catId in enumerate(self.cat_ids):
                        # area range index 0: all area ranges
                        # max dets index -1: typically 100 per image
                        nm = self.coco.loadCats(catId)[0]
                        precision = precisions[:, :, idx, 0, -1]
                        precision = precision[precision > -1]
                        if precision.size:
                            ap = np.mean(precision)
                        else:
                            ap = float('nan')
                        results_per_category.append(
                            (f'{nm["name"]}', f'{float(ap):0.3f}'))

                    num_columns = min(6, len(results_per_category) * 2)
                    results_flatten = list(
                        itertools.chain(*results_per_category))
                    headers = ['category', 'AP'] * (num_columns // 2)
                    results_2d = itertools.zip_longest(*[
                        results_flatten[i::num_columns]
                        for i in range(num_columns)
                    ])
                    table_data = [headers]
                    table_data += [result for result in results_2d]
                    table = AsciiTable(table_data)
                    print_log('\n' + table.table, logger=logger)

                if metric_items is None:
                    metric_items = [
                        'mAP', 'mAP_50', 'mAP_75', 'mAP_s', 'mAP_m', 'mAP_l'
                    ]

                for metric_item in metric_items:
                    key = f'{metric}_{metric_item}'
                    val = float(
                        f'{cocoEval.stats[coco_metric_names[metric_item]]:.3f}'
                    )
                    eval_results[key] = val
                ap = cocoEval.stats[:6]
                eval_results[f'{metric}_mAP_copypaste'] = (
                    f'{ap[0]:.3f} {ap[1]:.3f} {ap[2]:.3f} {ap[3]:.3f} '
                    f'{ap[4]:.3f} {ap[5]:.3f}')
        if tmp_dir is not None:
            tmp_dir.cleanup()
        return eval_results


Overwriting mmdetection/mmdet/datasets/coco.py
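
The box handling in `_det2json` above can be illustrated in isolation. The following is a minimal sketch, not the MMDetection implementation: `xyxy_to_xywh` and `detections_to_records` are hypothetical helper names, and the image id and category ids are made-up values. It assumes each class's detections arrive as an `(n, 5)` array of `[x1, y1, x2, y2, score]`, which is what the loops in the file above index into.

```python
import numpy as np


def xyxy_to_xywh(bbox):
    """Convert [x1, y1, x2, y2, ...] to COCO-style [x, y, width, height]."""
    x1, y1, x2, y2 = (float(v) for v in bbox[:4])
    return [x1, y1, x2 - x1, y2 - y1]


def detections_to_records(per_class_boxes, image_id, cat_ids):
    """Flatten per-class (n, 5) arrays into a list of COCO-style result dicts."""
    records = []
    for label, boxes in enumerate(per_class_boxes):
        for box in boxes:
            records.append({
                'image_id': image_id,
                'bbox': xyxy_to_xywh(box),
                'score': float(box[4]),
                # The position in the class list is mapped back to the dataset's category id.
                'category_id': cat_ids[label],
            })
    return records


# Hypothetical inputs: two classes, with a single detection for the first class.
per_class_boxes = [
    np.array([[10.0, 20.0, 50.0, 80.0, 0.9]], dtype=np.float32),
    np.zeros((0, 5), dtype=np.float32),
]
records = detections_to_records(per_class_boxes, image_id=7, cat_ids=[1565, 2022])
print(records)
```

COCO stores boxes as `[x, y, width, height]`, so the box `[10, 20, 50, 80]` becomes `[10.0, 20.0, 40.0, 60.0]` in the resulting record.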


## Resume Experiment or Start a new training

In [None]:
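import re
# To continue an experiment from your last checkpoint, set RESUME to True and
# put the checkpoint path in model_path. Use the same architecture and
# parameters in your config as in the run that produced the checkpoint.
# The inner quotes in model_path are intentional: the value is pasted into the
# config file as Python source, so it must be a quoted string literal there.

RESUME = True
if RESUME:
    model_path = "'../input/epoch5/epoch_5.pth'"
    fname = 'mmdetection/configs/_base_/default_runtime.py'
    with open(fname) as f:
        s = f.read()

    # load_from loads only the model weights from the checkpoint.
    s = re.sub('load_from = None', 'load_from = {}'.format(model_path), s)
    # To also restore the optimizer state and epoch counter, set resume_from instead:
    #s = re.sub('resume_from = None', 'resume_from = {}'.format(model_path), s)

    s = re.sub(r'CLASSES = \(.*?\)', "EMPTY", s)

    with open(fname, 'w') as f:
        f.write(s)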
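
The substitution in the cell above can be tried on an in-memory string before touching the real config file. This is a minimal sketch under stated assumptions: `set_load_from` is a hypothetical helper, and `runtime_cfg` is a made-up stand-in for the contents of `default_runtime.py`. It uses `repr()` to add the quotes that the cell above writes by hand inside `model_path`.

```python
import re


def set_load_from(config_text, checkpoint_path):
    """Return config_text with `load_from = None` pointing at checkpoint_path."""
    # repr() wraps the path in quotes so it is a valid string literal
    # once written into the Python config file.
    replacement = 'load_from = {}'.format(repr(checkpoint_path))
    # A function replacement keeps re.sub from interpreting any
    # backslashes in the path as escape sequences.
    return re.sub(r'load_from = None', lambda match: replacement, config_text)


# Hypothetical stand-in for the text of default_runtime.py.
runtime_cfg = "log_level = 'INFO'\nload_from = None\nresume_from = None\n"
patched = set_load_from(runtime_cfg, '../input/epoch5/epoch_5.pth')
print(patched)
```

Only the `load_from` line changes; `resume_from = None` is left as it was, so training starts a fresh schedule from the loaded weights.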


## My_config

In [2]:
%%writefile query_large.py

dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
CLASSES = ['beetroot-steamed-without-addition-of-salt', 'bread_wholemeal', 'jam', 'water', 'bread', 'banana', 'soft_cheese', 'ham_raw', 'hard_cheese', 'cottage_cheese', 'coffee', 'fruit_mixed', 'pancake', 'tea', 'salmon_smoked', 'avocado', 'spring_onion_scallion', 'ristretto_with_caffeine', 'ham_n_s', 'egg', 'bacon', 'chips_french_fries', 'juice_apple', 'chicken', 'tomato', 'broccoli', 'shrimp_prawn', 'carrot', 'chickpeas', 'french_salad_dressing', 'pasta_hornli_ch', 'sauce_cream', 'pasta_n_s', 'tomato_sauce', 'cheese_n_s', 'pear', 'cashew_nut', 'almonds', 'lentil_n_s', 'mixed_vegetables', 'peanut_butter', 'apple', 'blueberries', 'cucumber', 'yogurt', 'butter', 'mayonnaise', 'soup', 'wine_red', 'wine_white', 'green_bean_steamed_without_addition_of_salt', 'sausage', 'pizza_margherita_baked', 'salami_ch', 'mushroom', 'tart_n_s', 'rice', 'white_coffee', 'sunflower_seeds', 'bell_pepper_red_raw', 'zucchini', 'asparagus', 'tartar_sauce', 'lye_pretzel_soft', 'cucumber_pickled_ch', 'curry_vegetarian', 'soup_of_lentils_dahl_dhal', 'salmon', 'salt_cake_ch_vegetables_filled', 'orange', 'pasta_noodles', 'cream_double_cream_heavy_cream_45', 'cake_chocolate', 'pasta_spaghetti', 'black_olives', 'parmesan', 'spaetzle', 'salad_lambs_ear', 'salad_leaf_salad_green', 'potato', 'white_cabbage', 'halloumi', 'beetroot_raw', 'bread_grain', 'applesauce', 'cheese_for_raclette_ch', 'bread_white', 'curds_natural', 'quiche', 'beef_n_s', 'taboule_prepared_with_couscous', 'aubergine_eggplant', 'mozzarella', 'pasta_penne', 'lasagne_vegetable_prepared', 'mandarine', 'kiwi', 'french_beans', 'spring_roll_fried', 'caprese_salad_tomato_mozzarella', 'leaf_spinach', 'roll_of_half_white_or_white_flour_with_large_void', 'omelette_with_flour_thick_crepe_plain', 'tuna', 'dark_chocolate', 'sauce_savoury_n_s', 'raisins_dried', 'ice_tea_on_black_tea_basis', 'kaki', 'smoothie', 'crepe_with_flour_plain', 'nuggets', 'chili_con_carne_prepared', 'veggie_burger', 'chinese_cabbage', 'hamburger', 'soup_pumpkin', 
'sushi', 'chestnuts_ch', 'sauce_soya', 'balsamic_salad_dressing', 'pasta_twist', 'bolognaise_sauce', 'leek', 'fajita_bread_only', 'potato_gnocchi', 'rice_noodles_vermicelli', 'bread_whole_wheat', 'onion', 'garlic', 'hummus', 'pizza_with_vegetables_baked', 'beer', 'glucose_drink_50g', 'ratatouille', 'peanut', 'cauliflower', 'green_olives', 'bread_pita', 'pasta_wholemeal', 'sauce_pesto', 'couscous', 'sauce', 'bread_toast', 'water_with_lemon_juice', 'espresso', 'egg_scrambled', 'juice_orange', 'braided_white_loaf_ch', 'emmental_cheese_ch', 'hazelnut_chocolate_spread_nutella_ovomaltine_caotina', 'tomme_ch', 'hazelnut', 'peach', 'figs', 'mashed_potatoes_prepared_with_full_fat_milk_with_butter', 'pumpkin', 'swiss_chard', 'red_cabbage_raw', 'spinach_raw', 'chicken_curry_cream_coconut_milk_curry_spices_paste', 'crunch_muesli', 'biscuit', 'meatloaf_ch', 'fresh_cheese_n_s', 'honey', 'vegetable_mix_peas_and_carrots', 'parsley', 'brownie', 'ice_cream_n_s', 'salad_dressing', 'dried_meat_n_s', 'chicken_breast', 'mixed_salad_chopped_without_sauce', 'feta', 'praline_n_s', 'walnut', 'potato_salad', 'kolhrabi', 'alfa_sprouts', 'brussel_sprouts', 'gruyere_ch', 'bulgur', 'grapes', 'chocolate_egg_small', 'cappuccino', 'crisp_bread', 'bread_black', 'rosti_n_s', 'mango', 'muesli_dry', 'spinach', 'fish_n_s', 'risotto', 'crisps_ch', 'pork_n_s', 'pomegranate', 'sweet_corn', 'flakes', 'greek_salad', 'sesame_seeds', 'bouillon', 'baked_potato', 'fennel', 'meat_n_s', 'croutons', 'bell_pepper_red_stewed', 'nuts', 'breadcrumbs_unspiced', 'fondue', 'sauce_mushroom', 'strawberries', 'pie_plum_baked_with_cake_dough', 'potatoes_au_gratin_dauphinois_prepared', 'capers', 'bread_wholemeal_toast', 'red_radish', 'fruit_tart', 'beans_kidney', 'sauerkraut', 'mustard', 'country_fries', 'ketchup', 'pasta_linguini_parpadelle_tagliatelle', 'chicken_cut_into_stripes_only_meat', 'cookies', 'sun_dried_tomatoe', 'bread_ticino_ch', 'semi_hard_cheese', 'porridge_prepared_with_partially_skimmed_milk', 'juice', 
'chocolate_milk', 'bread_fruit', 'corn', 'dates', 'pistachio', 'cream_cheese_n_s', 'bread_rye', 'witloof_chicory', 'goat_cheese_soft', 'grapefruit_pomelo', 'blue_mould_cheese', 'guacamole', 'tofu', 'cordon_bleu', 'quinoa', 'kefir_drink', 'salad_rocket', 'pizza_with_ham_with_mushrooms_baked', 'fruit_coulis', 'plums', 'pizza_with_ham_baked', 'pineapple', 'seeds_n_s', 'focaccia', 'mixed_milk_beverage', 'coleslaw_chopped_without_sauce', 'sweet_potato', 'chicken_leg', 'croissant', 'cheesecake', 'sauce_cocktail', 'croissant_with_chocolate_filling', 'pumpkin_seeds', 'artichoke', 'soft_drink_with_a_taste', 'apple_pie', 'white_bread_with_butter_eggs_and_milk', 'savoury_pastry_stick', 'tuna_in_oil_drained', 'meat_terrine_pate', 'falafel_balls', 'berries_n_s', 'latte_macchiato', 'sugar_melon_galia_honeydew_cantaloupe', 'mixed_seeds_n_s', 'oil_vinegar_salad_dressing', 'celeriac', 'chocolate_mousse', 'lemon', 'chocolate_cookies', 'birchermuesli_prepared_no_sugar_added', 'muffin', 'pine_nuts', 'french_pizza_from_alsace_baked', 'chocolate_n_s', 'grits_polenta_maize_flour', 'wine_rose', 'cola_based_drink', 'raspberries', 'roll_with_pieces_of_chocolate', 'cake_lemon', 'rice_wild', 'gluten_free_bread', 'pearl_onion', 'tzatziki', 'ham_croissant_ch', 'corn_crisps', 'lentils_green_du_puy_du_berry', 'rice_whole_grain', 'cervelat_ch', 'aperitif_with_alcohol_n_s_aperol_spritz', 'peas', 'tiramisu', 'apricots', 'lasagne_meat_prepared', 'brioche', 'vegetable_au_gratin_baked', 'basil', 'butter_spread_puree_almond', 'pie_apricot', 'rusk_wholemeal', 'pasta_in_conch_form', 'pasta_in_butterfly_form_farfalle', 'damson_plum', 'shoots_n_s', 'coconut', 'banana_cake', 'sauce_curry', 'watermelon_fresh', 'white_asparagus', 'cherries', 'nectarine']

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file='data/train/new_ann.json',
        img_prefix='data/train/images/',

        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='AutoAugment',
                policies=[[{
                    'type': 'Resize',
                    'img_scale': [(400, 1333), (1200, 1333)],
                    'multiscale_mode': 'range',
                    'keep_ratio': True
                }],
                          [{
                              'type': 'Resize',
                              'img_scale': [(400, 1333), (500, 1333),
                                            (600, 1333)],
                              'multiscale_mode': 'value',
                              'keep_ratio': True
                          }, {
                              'type': 'RandomCrop',
                              'crop_type': 'absolute_range',
                              'crop_size': (384, 600),
                              'allow_negative_crop': True
                          }, {
                              'type': 'Resize',
                              'img_scale': [(400, 1333), (1200, 1333)],
                              'multiscale_mode': 'range',
                              'override': True,
                              'keep_ratio': True
                          }]]),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ],classes=CLASSES),
    val=dict(
        type='CocoDataset',
        ann_file='data/val/new_ann.json',
        img_prefix='data/val/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],classes=CLASSES),
    test=dict(
        type='CocoDataset',
        ann_file='data/val/new_ann.json',
        img_prefix='data/val/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],classes=CLASSES))
evaluation = dict(metric=['segm'])
optimizer = dict(
    type='AdamW',
    lr=2.5e-05,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
# optimizer_config = dict(
#     grad_clip=dict(max_norm=1, norm_type=2),
#     type='DistOptimizerHook',
#     update_interval=1,
#     coalesce=True,
#     bucket_size_mb=-1,
#     use_fp16=False)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[12])
runner = dict(type='EpochBasedRunner', max_epochs=22)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = '/content/drive/MyDrive/queryinst_swin_large_patch4_window7_fpn_300_queries-832c5813.pth' #None
resume_from = None
workflow = [('train', 1)]
num_stages = 6
num_proposals = 300
model = dict(
    type='QueryInst',
    pretrained=None,
    backbone=dict(
        type='SwinTransformer',
        embed_dim=192,
        depths=[2, 2, 18, 2],
        num_heads=[6, 12, 24, 48],
        window_size=7,
        mlp_ratio=4.0,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.3,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),
    neck=dict(
        type='FPN',
        in_channels=[192, 384, 768, 1536],
        out_channels=256,
        start_level=0,
        add_extra_convs='on_input',
        num_outs=4),
    rpn_head=dict(
        type='EmbeddingRPNHead',
        num_proposals=300,
        proposal_feature_channel=256),
    roi_head=dict(
        type='QueryRoIHead',
        num_stages=6,
        stage_loss_weights=[1, 1, 1, 1, 1, 1],
        proposal_feature_channel=256,
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='DIIHead',
                num_classes=323,
                num_ffn_fcs=2,
                num_heads=8,
                num_cls_fcs=1,
                num_reg_fcs=3,
                feedforward_channels=2048,
                in_channels=256,
                dropout=0.0,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=7,
                    with_proj=True,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                loss_bbox=dict(type='L1Loss', loss_weight=5.0),
                loss_iou=dict(type='GIoULoss', loss_weight=2.0),
                loss_cls=dict(
                    type='FocalLoss',
                    use_sigmoid=True,
                    gamma=2.0,
                    alpha=0.25,
                    loss_weight=2.0),
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    clip_border=False,
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.5, 0.5, 1.0, 1.0])),
            dict(
                type='DIIHead',
                num_classes=323,
                num_ffn_fcs=2,
                num_heads=8,
                num_cls_fcs=1,
                num_reg_fcs=3,
                feedforward_channels=2048,
                in_channels=256,
                dropout=0.0,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=7,
                    with_proj=True,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                loss_bbox=dict(type='L1Loss', loss_weight=5.0),
                loss_iou=dict(type='GIoULoss', loss_weight=2.0),
                loss_cls=dict(
                    type='FocalLoss',
                    use_sigmoid=True,
                    gamma=2.0,
                    alpha=0.25,
                    loss_weight=2.0),
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    clip_border=False,
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.5, 0.5, 1.0, 1.0])),
            dict(
                type='DIIHead',
                num_classes=323,
                num_ffn_fcs=2,
                num_heads=8,
                num_cls_fcs=1,
                num_reg_fcs=3,
                feedforward_channels=2048,
                in_channels=256,
                dropout=0.0,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=7,
                    with_proj=True,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                loss_bbox=dict(type='L1Loss', loss_weight=5.0),
                loss_iou=dict(type='GIoULoss', loss_weight=2.0),
                loss_cls=dict(
                    type='FocalLoss',
                    use_sigmoid=True,
                    gamma=2.0,
                    alpha=0.25,
                    loss_weight=2.0),
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    clip_border=False,
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.5, 0.5, 1.0, 1.0])),
            dict(
                type='DIIHead',
                num_classes=323,
                num_ffn_fcs=2,
                num_heads=8,
                num_cls_fcs=1,
                num_reg_fcs=3,
                feedforward_channels=2048,
                in_channels=256,
                dropout=0.0,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=7,
                    with_proj=True,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                loss_bbox=dict(type='L1Loss', loss_weight=5.0),
                loss_iou=dict(type='GIoULoss', loss_weight=2.0),
                loss_cls=dict(
                    type='FocalLoss',
                    use_sigmoid=True,
                    gamma=2.0,
                    alpha=0.25,
                    loss_weight=2.0),
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    clip_border=False,
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.5, 0.5, 1.0, 1.0])),
            dict(
                type='DIIHead',
                num_classes=323,
                num_ffn_fcs=2,
                num_heads=8,
                num_cls_fcs=1,
                num_reg_fcs=3,
                feedforward_channels=2048,
                in_channels=256,
                dropout=0.0,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=7,
                    with_proj=True,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                loss_bbox=dict(type='L1Loss', loss_weight=5.0),
                loss_iou=dict(type='GIoULoss', loss_weight=2.0),
                loss_cls=dict(
                    type='FocalLoss',
                    use_sigmoid=True,
                    gamma=2.0,
                    alpha=0.25,
                    loss_weight=2.0),
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    clip_border=False,
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.5, 0.5, 1.0, 1.0])),
            dict(
                type='DIIHead',
                num_classes=323,
                num_ffn_fcs=2,
                num_heads=8,
                num_cls_fcs=1,
                num_reg_fcs=3,
                feedforward_channels=2048,
                in_channels=256,
                dropout=0.0,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=7,
                    with_proj=True,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                loss_bbox=dict(type='L1Loss', loss_weight=5.0),
                loss_iou=dict(type='GIoULoss', loss_weight=2.0),
                loss_cls=dict(
                    type='FocalLoss',
                    use_sigmoid=True,
                    gamma=2.0,
                    alpha=0.25,
                    loss_weight=2.0),
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    clip_border=False,
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.5, 0.5, 1.0, 1.0]))
        ],
        mask_head=[
            dict(
                type='DynamicMaskHead',
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=14,
                    with_proj=False,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                dropout=0.0,
                num_convs=4,
                num_classes=323,
                roi_feat_size=14,
                in_channels=256,
                conv_kernel_size=3,
                conv_out_channels=256,
                class_agnostic=False,
                norm_cfg=dict(type='BN'),
                upsample_cfg=dict(type='deconv', scale_factor=2),
                loss_dice=dict(type='DiceLoss', loss_weight=8.0)),
            dict(
                type='DynamicMaskHead',
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=14,
                    with_proj=False,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                dropout=0.0,
                num_convs=4,
                num_classes=323,
                roi_feat_size=14,
                in_channels=256,
                conv_kernel_size=3,
                conv_out_channels=256,
                class_agnostic=False,
                norm_cfg=dict(type='BN'),
                upsample_cfg=dict(type='deconv', scale_factor=2),
                loss_dice=dict(type='DiceLoss', loss_weight=8.0)),
            dict(
                type='DynamicMaskHead',
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=14,
                    with_proj=False,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                dropout=0.0,
                num_convs=4,
                num_classes=323,
                roi_feat_size=14,
                in_channels=256,
                conv_kernel_size=3,
                conv_out_channels=256,
                class_agnostic=False,
                norm_cfg=dict(type='BN'),
                upsample_cfg=dict(type='deconv', scale_factor=2),
                loss_dice=dict(type='DiceLoss', loss_weight=8.0)),
            dict(
                type='DynamicMaskHead',
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=14,
                    with_proj=False,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                dropout=0.0,
                num_convs=4,
                num_classes=323,
                roi_feat_size=14,
                in_channels=256,
                conv_kernel_size=3,
                conv_out_channels=256,
                class_agnostic=False,
                norm_cfg=dict(type='BN'),
                upsample_cfg=dict(type='deconv', scale_factor=2),
                loss_dice=dict(type='DiceLoss', loss_weight=8.0)),
            dict(
                type='DynamicMaskHead',
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=14,
                    with_proj=False,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                dropout=0.0,
                num_convs=4,
                num_classes=323,
                roi_feat_size=14,
                in_channels=256,
                conv_kernel_size=3,
                conv_out_channels=256,
                class_agnostic=False,
                norm_cfg=dict(type='BN'),
                upsample_cfg=dict(type='deconv', scale_factor=2),
                loss_dice=dict(type='DiceLoss', loss_weight=8.0)),
            dict(
                type='DynamicMaskHead',
                dynamic_conv_cfg=dict(
                    type='DynamicConv',
                    in_channels=256,
                    feat_channels=64,
                    out_channels=256,
                    input_feat_shape=14,
                    with_proj=False,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN')),
                dropout=0.0,
                num_convs=4,
                num_classes=323,
                roi_feat_size=14,
                in_channels=256,
                conv_kernel_size=3,
                conv_out_channels=256,
                class_agnostic=False,
                norm_cfg=dict(type='BN'),
                upsample_cfg=dict(type='deconv', scale_factor=2),
                loss_dice=dict(type='DiceLoss', loss_weight=8.0))
        ]),
    train_cfg=dict(
        rpn=None,
        rcnn=[
            dict(
                assigner=dict(
                    type='HungarianAssigner',
                    cls_cost=dict(type='FocalLossCost', weight=2.0),
                    reg_cost=dict(type='BBoxL1Cost', weight=5.0),
                    iou_cost=dict(type='IoUCost', iou_mode='giou',
                                  weight=2.0)),
                sampler=dict(type='PseudoSampler'),
                pos_weight=1,
                mask_size=28,
                debug=False),
            dict(
                assigner=dict(
                    type='HungarianAssigner',
                    cls_cost=dict(type='FocalLossCost', weight=2.0),
                    reg_cost=dict(type='BBoxL1Cost', weight=5.0),
                    iou_cost=dict(type='IoUCost', iou_mode='giou',
                                  weight=2.0)),
                sampler=dict(type='PseudoSampler'),
                pos_weight=1,
                mask_size=28,
                debug=False),
            dict(
                assigner=dict(
                    type='HungarianAssigner',
                    cls_cost=dict(type='FocalLossCost', weight=2.0),
                    reg_cost=dict(type='BBoxL1Cost', weight=5.0),
                    iou_cost=dict(type='IoUCost', iou_mode='giou',
                                  weight=2.0)),
                sampler=dict(type='PseudoSampler'),
                pos_weight=1,
                mask_size=28,
                debug=False),
            dict(
                assigner=dict(
                    type='HungarianAssigner',
                    cls_cost=dict(type='FocalLossCost', weight=2.0),
                    reg_cost=dict(type='BBoxL1Cost', weight=5.0),
                    iou_cost=dict(type='IoUCost', iou_mode='giou',
                                  weight=2.0)),
                sampler=dict(type='PseudoSampler'),
                pos_weight=1,
                mask_size=28,
                debug=False),
            dict(
                assigner=dict(
                    type='HungarianAssigner',
                    cls_cost=dict(type='FocalLossCost', weight=2.0),
                    reg_cost=dict(type='BBoxL1Cost', weight=5.0),
                    iou_cost=dict(type='IoUCost', iou_mode='giou',
                                  weight=2.0)),
                sampler=dict(type='PseudoSampler'),
                pos_weight=1,
                mask_size=28,
                debug=False),
            dict(
                assigner=dict(
                    type='HungarianAssigner',
                    cls_cost=dict(type='FocalLossCost', weight=2.0),
                    reg_cost=dict(type='BBoxL1Cost', weight=5.0),
                    iou_cost=dict(type='IoUCost', iou_mode='giou',
                                  weight=2.0)),
                sampler=dict(type='PseudoSampler'),
                pos_weight=1,
                mask_size=28,
                debug=False)
        ]),
    test_cfg=dict(rpn=None, rcnn=dict(max_per_img=300, mask_thr_binary=0.5)))
total_epochs = 22
min_values = (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
optimizer_config = dict(grad_clip=dict(max_norm=15, norm_type=2))
fp16 = dict(loss_scale=512.0)
work_dir = '/content/drive/MyDrive/log_mmdetQuery'
gpu_ids = range(0, 1)

Overwriting query_large.py
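
The six `DIIHead` entries and the six `DynamicMaskHead` entries in the config above are identical from stage to stage, so the lists can be generated instead of written out by hand. Below is a minimal sketch of that idea, assuming six stages as in the config. `NUM_STAGES`, `repeat_stage`, and the trimmed-down template are illustrative names that do not appear in the notebook, and only a few of the real keys are shown.

```python
import copy

NUM_STAGES = 6     # assumption: matches the six stage entries written out above
NUM_CLASSES = 323

# Trimmed-down stage template; the full config carries more keys
# (dynamic_conv_cfg, loss_iou, loss_cls, bbox_coder, ...).
bbox_head_template = dict(
    type='DIIHead',
    num_classes=NUM_CLASSES,
    num_heads=8,
    feedforward_channels=2048,
    in_channels=256,
    loss_bbox=dict(type='L1Loss', loss_weight=5.0),
)


def repeat_stage(template, num_stages):
    """Return num_stages independent deep copies of a stage config."""
    return [copy.deepcopy(template) for _ in range(num_stages)]


bbox_head = repeat_stage(bbox_head_template, NUM_STAGES)

# Each stage is its own object, so editing one does not touch the others.
bbox_head[0]['loss_bbox']['loss_weight'] = 1.0
print(len(bbox_head), bbox_head[1]['loss_bbox']['loss_weight'])
```

Deep copies are used here so that a later per-stage tweak (for example a different loss weight in one stage) cannot leak into the other stages through a shared nested dict.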


## My base config
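
The config in the following cell repeats `num_classes=323` in every box head, mask head, and `SeesawLoss`, and separately lists the category names in `CLASSES`. A mismatch between any of these is easy to introduce when editing by hand. Here is a small sketch of a consistency check, assuming the config is available as a nested Python dict. `collect_values` and the toy `cfg` are hypothetical helpers written for illustration; they are not MMDetection APIs.

```python
def collect_values(node, key):
    """Recursively gather every value stored under `key` in nested dicts/lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(collect_values(v, key))
    elif isinstance(node, (list, tuple)):
        for item in node:
            found.extend(collect_values(item, key))
    return found


# Toy stand-in for the real config: three classes instead of 323.
classes = ['water', 'bread', 'banana']
cfg = dict(
    roi_head=dict(
        bbox_head=[
            dict(type='ConvFCBBoxHead', num_classes=3,
                 loss_cls=dict(type='SeesawLoss', num_classes=3)),
        ],
        mask_head=[dict(type='HTCMaskHead', num_classes=3)],
    ))

num_classes_values = collect_values(cfg, 'num_classes')
mismatches = [v for v in num_classes_values if v != len(classes)]
print(len(num_classes_values), mismatches)
```

An empty `mismatches` list means every `num_classes` entry agrees with the length of the class list. The same check could be applied to the real `model` dict and `CLASSES` before training starts.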

In [None]:
%%writefile query_large.py

model = dict(
    type='HybridTaskCascade',
    pretrained=None,
    backbone=dict(
        type='CBSwinTransformer',
        embed_dim=128,
        depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32],
        window_size=7,
        mlp_ratio=4.0,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.3,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),
    neck=dict(
        type='CBFPN',
        in_channels=[128, 256, 512, 1024],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='HybridTaskCascadeRoIHead',
        interleaved=True,
        mask_info_flow=True,
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=323,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=True,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='BN', requires_grad=True),
                loss_cls=dict(
                    type='SeesawLoss',
                    p=0.8,
                    q=2.0,
                    num_classes=323,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=323,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=True,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='BN', requires_grad=True),
                loss_cls=dict(
                    type='SeesawLoss',
                    p=0.8,
                    q=2.0,
                    num_classes=323,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=323,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=True,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='BN', requires_grad=True),
                loss_cls=dict(
                    type='SeesawLoss',
                    p=0.8,
                    q=2.0,
                    num_classes=323,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
        ],
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=[
            dict(
                type='HTCMaskHead',
                with_conv_res=False,
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=323,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
            dict(
                type='HTCMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=323,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
            dict(
                type='HTCMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=323,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))
        ]),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_across_levels=False,
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.5,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=1000,
            mask_thr_binary=0.45)))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),  # with_seg=True disabled
    dict(
        type='Resize',
        img_scale=[(1200, 1100)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    #dict(type='SegRescale', scale_factor=0.125),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1200, 1100),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

CLASSES = ['beetroot-steamed-without-addition-of-salt', 'bread_wholemeal', 'jam', 'water', 'bread', 'banana', 'soft_cheese', 'ham_raw', 'hard_cheese', 'cottage_cheese', 'coffee', 'fruit_mixed', 'pancake', 'tea', 'salmon_smoked', 'avocado', 'spring_onion_scallion', 'ristretto_with_caffeine', 'ham_n_s', 'egg', 'bacon', 'chips_french_fries', 'juice_apple', 'chicken', 'tomato', 'broccoli', 'shrimp_prawn', 'carrot', 'chickpeas', 'french_salad_dressing', 'pasta_hornli_ch', 'sauce_cream', 'pasta_n_s', 'tomato_sauce', 'cheese_n_s', 'pear', 'cashew_nut', 'almonds', 'lentil_n_s', 'mixed_vegetables', 'peanut_butter', 'apple', 'blueberries', 'cucumber', 'yogurt', 'butter', 'mayonnaise', 'soup', 'wine_red', 'wine_white', 'green_bean_steamed_without_addition_of_salt', 'sausage', 'pizza_margherita_baked', 'salami_ch', 'mushroom', 'tart_n_s', 'rice', 'white_coffee', 'sunflower_seeds', 'bell_pepper_red_raw', 'zucchini', 'asparagus', 'tartar_sauce', 'lye_pretzel_soft', 'cucumber_pickled_ch', 'curry_vegetarian', 'soup_of_lentils_dahl_dhal', 'salmon', 'salt_cake_ch_vegetables_filled', 'orange', 'pasta_noodles', 'cream_double_cream_heavy_cream_45', 'cake_chocolate', 'pasta_spaghetti', 'black_olives', 'parmesan', 'spaetzle', 'salad_lambs_ear', 'salad_leaf_salad_green', 'potato', 'white_cabbage', 'halloumi', 'beetroot_raw', 'bread_grain', 'applesauce', 'cheese_for_raclette_ch', 'bread_white', 'curds_natural', 'quiche', 'beef_n_s', 'taboule_prepared_with_couscous', 'aubergine_eggplant', 'mozzarella', 'pasta_penne', 'lasagne_vegetable_prepared', 'mandarine', 'kiwi', 'french_beans', 'spring_roll_fried', 'caprese_salad_tomato_mozzarella', 'leaf_spinach', 'roll_of_half_white_or_white_flour_with_large_void', 'omelette_with_flour_thick_crepe_plain', 'tuna', 'dark_chocolate', 'sauce_savoury_n_s', 'raisins_dried', 'ice_tea_on_black_tea_basis', 'kaki', 'smoothie', 'crepe_with_flour_plain', 'nuggets', 'chili_con_carne_prepared', 'veggie_burger', 'chinese_cabbage', 'hamburger', 'soup_pumpkin', 
'sushi', 'chestnuts_ch', 'sauce_soya', 'balsamic_salad_dressing', 'pasta_twist', 'bolognaise_sauce', 'leek', 'fajita_bread_only', 'potato_gnocchi', 'rice_noodles_vermicelli', 'bread_whole_wheat', 'onion', 'garlic', 'hummus', 'pizza_with_vegetables_baked', 'beer', 'glucose_drink_50g', 'ratatouille', 'peanut', 'cauliflower', 'green_olives', 'bread_pita', 'pasta_wholemeal', 'sauce_pesto', 'couscous', 'sauce', 'bread_toast', 'water_with_lemon_juice', 'espresso', 'egg_scrambled', 'juice_orange', 'braided_white_loaf_ch', 'emmental_cheese_ch', 'hazelnut_chocolate_spread_nutella_ovomaltine_caotina', 'tomme_ch', 'hazelnut', 'peach', 'figs', 'mashed_potatoes_prepared_with_full_fat_milk_with_butter', 'pumpkin', 'swiss_chard', 'red_cabbage_raw', 'spinach_raw', 'chicken_curry_cream_coconut_milk_curry_spices_paste', 'crunch_muesli', 'biscuit', 'meatloaf_ch', 'fresh_cheese_n_s', 'honey', 'vegetable_mix_peas_and_carrots', 'parsley', 'brownie', 'ice_cream_n_s', 'salad_dressing', 'dried_meat_n_s', 'chicken_breast', 'mixed_salad_chopped_without_sauce', 'feta', 'praline_n_s', 'walnut', 'potato_salad', 'kolhrabi', 'alfa_sprouts', 'brussel_sprouts', 'gruyere_ch', 'bulgur', 'grapes', 'chocolate_egg_small', 'cappuccino', 'crisp_bread', 'bread_black', 'rosti_n_s', 'mango', 'muesli_dry', 'spinach', 'fish_n_s', 'risotto', 'crisps_ch', 'pork_n_s', 'pomegranate', 'sweet_corn', 'flakes', 'greek_salad', 'sesame_seeds', 'bouillon', 'baked_potato', 'fennel', 'meat_n_s', 'croutons', 'bell_pepper_red_stewed', 'nuts', 'breadcrumbs_unspiced', 'fondue', 'sauce_mushroom', 'strawberries', 'pie_plum_baked_with_cake_dough', 'potatoes_au_gratin_dauphinois_prepared', 'capers', 'bread_wholemeal_toast', 'red_radish', 'fruit_tart', 'beans_kidney', 'sauerkraut', 'mustard', 'country_fries', 'ketchup', 'pasta_linguini_parpadelle_tagliatelle', 'chicken_cut_into_stripes_only_meat', 'cookies', 'sun_dried_tomatoe', 'bread_ticino_ch', 'semi_hard_cheese', 'porridge_prepared_with_partially_skimmed_milk', 'juice', 
'chocolate_milk', 'bread_fruit', 'corn', 'dates', 'pistachio', 'cream_cheese_n_s', 'bread_rye', 'witloof_chicory', 'goat_cheese_soft', 'grapefruit_pomelo', 'blue_mould_cheese', 'guacamole', 'tofu', 'cordon_bleu', 'quinoa', 'kefir_drink', 'salad_rocket', 'pizza_with_ham_with_mushrooms_baked', 'fruit_coulis', 'plums', 'pizza_with_ham_baked', 'pineapple', 'seeds_n_s', 'focaccia', 'mixed_milk_beverage', 'coleslaw_chopped_without_sauce', 'sweet_potato', 'chicken_leg', 'croissant', 'cheesecake', 'sauce_cocktail', 'croissant_with_chocolate_filling', 'pumpkin_seeds', 'artichoke', 'soft_drink_with_a_taste', 'apple_pie', 'white_bread_with_butter_eggs_and_milk', 'savoury_pastry_stick', 'tuna_in_oil_drained', 'meat_terrine_pate', 'falafel_balls', 'berries_n_s', 'latte_macchiato', 'sugar_melon_galia_honeydew_cantaloupe', 'mixed_seeds_n_s', 'oil_vinegar_salad_dressing', 'celeriac', 'chocolate_mousse', 'lemon', 'chocolate_cookies', 'birchermuesli_prepared_no_sugar_added', 'muffin', 'pine_nuts', 'french_pizza_from_alsace_baked', 'chocolate_n_s', 'grits_polenta_maize_flour', 'wine_rose', 'cola_based_drink', 'raspberries', 'roll_with_pieces_of_chocolate', 'cake_lemon', 'rice_wild', 'gluten_free_bread', 'pearl_onion', 'tzatziki', 'ham_croissant_ch', 'corn_crisps', 'lentils_green_du_puy_du_berry', 'rice_whole_grain', 'cervelat_ch', 'aperitif_with_alcohol_n_s_aperol_spritz', 'peas', 'tiramisu', 'apricots', 'lasagne_meat_prepared', 'brioche', 'vegetable_au_gratin_baked', 'basil', 'butter_spread_puree_almond', 'pie_apricot', 'rusk_wholemeal', 'pasta_in_conch_form', 'pasta_in_butterfly_form_farfalle', 'damson_plum', 'shoots_n_s', 'coconut', 'banana_cake', 'sauce_curry', 'watermelon_fresh', 'white_asparagus', 'cherries', 'nectarine']


data = dict(
    samples_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file='data/train/new_ann.json',
        img_prefix='data/train/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='Resize', img_scale=[(1333, 640), (1333, 1000)]),
            dict(
                type='Albu',
                transforms=[dict(type='RandomRotate90', p=1)],
                bbox_params=dict(
                    type='BboxParams',
                    format='pascal_voc',
                    label_fields=['gt_labels'],
                    min_visibility=0.0,
                    filter_lost_elements=True),
                keymap=dict(img='image', gt_masks='masks', gt_bboxes='bboxes'),
                update_pad_shape=False,
                skip_img_without_anno=True),
            dict(type='RandomFlip', direction='vertical', flip_ratio=0.5),
            dict(type='RandomFlip', direction='horizontal', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ],
        classes=CLASSES,
        # seg_prefix='data/coco/stuffthingmaps/train2017/'
    ),
    val=dict(
        type='CocoDataset',
        ann_file='data/val/new_ann.json',
        img_prefix='data/val/images/',

        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=[(1333, 800), (1333, 1000)],
                flip=True,
                flip_direction=['horizontal', 'vertical'],
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=CLASSES
        ),
    test=dict(
       type='CocoDataset',
        ann_file='data/val/new_ann.json',
        img_prefix='data/val/images/',

        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=[(1333, 800), (1333, 1000)],
                flip=True,
                flip_direction=['horizontal', 'vertical'],
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=CLASSES))
evaluation = dict(interval=1, metric=['segm'])
optimizer = dict(
    type='AdamW',
    lr= 0.0001,  #5e-05,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
optimizer_config = dict(grad_clip=dict(max_norm=15, norm_type=2))
fp16 = dict(loss_scale=512.0)
# optimizer_config = dict(
#     grad_clip=None,
#     type='DistOptimizerHook',
#     update_interval=1,
#     coalesce=True,
#     bucket_size_mb=-1,
#     use_fp16=True)
# lr_config = dict(
#     policy='step',
#     warmup='linear',
#     warmup_iters=500,
#     warmup_ratio=0.001,
#     step=[16, 19])

lr_config = dict(
    policy='CosineAnnealing',
    by_epoch=False,
    warmup='linear',
    warmup_iters=720,
    warmup_ratio=0.001,
    min_lr=5e-06)
runner = dict(type='EpochBasedRunner', max_epochs=20)
#checkpoint_config = dict(interval=1)
checkpoint_config = dict(max_keep_ckpts=2, interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = '/content/htc_cbv2_swin_base22k_patch4_window7_mstrain_400-1400_giou_4conv1f_adamw_20e_coco.pth' #'https://github.com/CBNetwork/storage/releases/download/v1.0.0/htc_cbv2_swin_base22k_patch4_window7_mstrain_400-1400_giou_4conv1f_adamw_20e_coco.pth.zip' # None
resume_from = None
workflow = [('train', 1)]
#fp16 = None
work_dir = '/content/drive/MyDrive/log_mmdetCBNET'
gpu_ids = range(0, 1)


Overwriting cbnet_base.py
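
For intuition about the `lr_config` in the file above, the sketch below approximates how a cosine-annealing schedule with linear warmup behaves per iteration. It is a simplified re-implementation for illustration only, not mmcv's own hook, and `total_iters` is a hypothetical value because the real iteration count depends on the dataset size and batch size.

```python
import math


def lr_at(it, total_iters, base_lr=1e-4, min_lr=5e-6,
          warmup_iters=720, warmup_ratio=0.001):
    """Approximate learning rate at iteration `it`."""
    # Cosine annealing from base_lr down to min_lr over the whole run.
    progress = it / total_iters
    lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
    # Linear warmup scales the scheduled value during the first iterations.
    if it < warmup_iters:
        k = (1 - it / warmup_iters) * (1 - warmup_ratio)
        lr *= 1 - k
    return lr


total_iters = 20 * 39_000  # hypothetical: 20 epochs of ~39k images, batch size 1
for it in (0, 360, 720, total_iters // 2, total_iters):
    print(f"iter {it:>7}: lr = {lr_at(it, total_iters):.2e}")
```

The learning rate starts near `base_lr * warmup_ratio`, climbs to roughly `base_lr` once warmup ends, and then decays towards `min_lr`.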


In [8]:
!wget -O pre.pth 'https://drive.google.com/file/d/1tqkpaArF0a0WVEolsCC8yrvgoydY7_Ha/view?usp=sharing'

--2022-04-06 10:25:08--  https://drive.google.com/file/d/1tqkpaArF0a0WVEolsCC8yrvgoydY7_Ha/view?usp=sharing
Resolving drive.google.com (drive.google.com)... 142.250.157.139, 142.250.157.101, 142.250.157.100, ...
Connecting to drive.google.com (drive.google.com)|142.250.157.139|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘pre.pth’

pre.pth                 [ <=>                ]  64.87K  --.-KB/s    in 0.1s    

2022-04-06 10:25:09 (529 KB/s) - ‘pre.pth’ saved [66427]
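
The log above reports `Length: unspecified [text/html]` and a file of about 65 KB, which suggests that `wget` saved Google Drive's HTML viewer page rather than the checkpoint itself. A minimal sketch for checking a downloaded file before training on it; the helper name `looks_like_html` is hypothetical:

```python
def looks_like_html(path, probe_bytes=512):
    """Return True if the file starts like an HTML page instead of binary data."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Usage (assumes pre.pth exists in the working directory):
# if looks_like_html("pre.pth"):
#     print("pre.pth is an HTML page, not a checkpoint; download it again.")
```

If the check fires, a tool such as `gdown`, which works with the Drive file ID, is one common way to fetch the actual file.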



In [None]:
!unzip *.zip

# Training the Model 🚂

Finally training our model!

In [None]:
# Let's train the model
!pip uninstall pycocotools -y
!pip install mmpycocotools
!pip install --upgrade albumentations
!pip uninstall opencv-python-headless -y
!pip install opencv-python-headless==4.5.2.52
!python mmdetection/tools/train.py cbnet_base.py --work-dir '/content/drive/MyDrive/log_mmdetCBNET' #--no-validate

In [None]:
!python mmdetection/tools/train.py query_large.py --work-dir '/content/drive/MyDrive/log_mmdetQuery' #--no-validate

apex is not installed
apex is not installed
apex is not installed
apex is not installed
fatal: not a git repository (or any of the parent directories): .git
2022-04-06 11:18:22,782 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.13 (default, Mar 16 2022, 17:37:17) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla P100-PCIE-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architectu

In [None]:
!du -sh log_mmdet/epoch_1.pth
!ls log_mmdet
## loss at epoch3 1.0179 and s0.loss_mask=0.25

In [None]:
from IPython.display import FileLink
FileLink('log_mmdet/epoch_1.pth')

In [None]:
!pip install ipython

## Testing

In [None]:
# Let's get the latest checkpoint file
import os

work_dir = "/content/drive/MyDrive/log_mmdet"
checkpoint_file = os.path.join(work_dir, "latest.pth")
assert os.path.isfile(
    checkpoint_file), '`{}` does not exist'.format(checkpoint_file)
checkpoint_file = os.path.abspath(checkpoint_file)
checkpoint_file
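
If `latest.pth` is missing (for example when only numbered checkpoints were kept), one fallback is to pick the highest-numbered `epoch_N.pth`. A sketch, assuming the `epoch_N.pth` naming seen earlier in this notebook; `newest_epoch_checkpoint` is a hypothetical helper:

```python
import glob
import os
import re


def newest_epoch_checkpoint(work_dir):
    """Return the path of the highest-numbered epoch_N.pth in work_dir, or None."""
    best_epoch, best_path = -1, None
    for path in glob.glob(os.path.join(work_dir, "epoch_*.pth")):
        match = re.fullmatch(r"epoch_(\d+)\.pth", os.path.basename(path))
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

Comparing the epoch numbers as integers avoids the trap of lexical sorting, where `epoch_9.pth` would rank above `epoch_10.pth`.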

In [None]:
# Let's visualize some results
import time
import matplotlib
import matplotlib.pylab as plt
plt.rcParams["axes.grid"] = False

import mmcv
from mmcv.runner import load_checkpoint
import mmcv.visualization.image as mmcv_image


# Fix for Colab: route mmcv's imshow through matplotlib
def imshow(img, win_name='', wait_time=0):
    plt.figure(figsize=(50, 50))
    plt.imshow(img)


mmcv_image.imshow = imshow
from mmdet.models import build_detector
from mmdet.apis import inference_detector, show_result_pyplot, init_detector


score_thr = 0.1  # decrease the threshold if you feel like you are missing some predictions


# build the model from a config file and a checkpoint file
# (config_fname is not set in this cell; it must already be defined)
model = init_detector(config_fname, checkpoint_file)

# test a single image and show the results
img = '/content/data/val/images/008082.jpg'  # you can change this to any image you want!

result = inference_detector(model, img)
show_result_pyplot(model, img, result, score_thr=score_thr, title='result', wait_time=0)
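
To get a feel for what the score threshold does, the sketch below counts the detections per class that survive a given threshold. It assumes the per-class box layout `[x1, y1, x2, y2, score]` that the inference code later in this notebook relies on (it reads the score from index 4); the toy numbers are made up.

```python
def count_above_threshold(bbox_results, score_thr):
    """Count boxes per class whose score (5th value) exceeds score_thr."""
    return [sum(1 for box in boxes if box[4] > score_thr)
            for boxes in bbox_results]


# Toy input: two classes, boxes given as [x1, y1, x2, y2, score].
toy = [
    [[10, 10, 50, 50, 0.92], [12, 8, 48, 52, 0.07]],
    [[30, 40, 90, 120, 0.35]],
]
print(count_above_threshold(toy, 0.1))  # [1, 1]
print(count_above_threshold(toy, 0.5))  # [1, 0]
```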

## Create categories files for correct annotations during inference

In [None]:

import json, os
annotation_path = os.path.join("data", "train/annotations.json")
with open(annotation_path) as json_file:
    coco = json.load(json_file)

# Save only the category list; the inference script reads it back from classes.json
with open("classes.json", 'w') as f:
    json.dump(coco["categories"], f)
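
The order of the entries in `classes.json` matters: the inference code in this notebook maps the model's label index `i` to the `id` of the `i`-th category. A small sketch of that lookup with made-up category entries (the IDs and names are illustrative, not taken from the dataset):

```python
import json

# Illustrative categories in COCO style: each has an "id" and a "name".
categories = json.loads(
    '[{"id": 101, "name": "food_a"}, {"id": 205, "name": "food_b"}]'
)

cat_ids = [c["id"] for c in categories]        # label index -> category id
class_names = [c["name"] for c in categories]  # label index -> class name

label = 1  # index predicted by the model
print(cat_ids[label], class_names[label])  # 205 food_b
```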

## Copy the config file and trained model

In [None]:
#copy the trained model and config file to home directory
%cp /content/drive/MyDrive/log_mmdet/htc_without_semantic_r50_fpn_1x_coco.py /content/htc_without_semantic_r50_fpn_1x_coco.py
%cp /content/drive/MyDrive/log_mmdet/latest.pth /content/latest.pth


# Quick Submission 💪

## Inference on the public test set
*   loading the model config and setting up related paths
*   running inference and generating json file for submission



In [None]:
#@title Inference code from myfood exp repo, Used for quick submission
%%writefile inference_mmdet.py
'''
@Author: Gaurav Singhal
@Description: Standalone file for testing and evaluating
the models. It doesn't do any post-processing or ensembling.
'''

import argparse
import os
import warnings
import glob
import json
import mmcv
import torch
from mmcv import Config, DictAction
from mmcv.cnn import fuse_conv_bn
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import (get_dist_info, init_dist, load_checkpoint,
                         wrap_fp16_model)
from mmdet.apis import init_detector, inference_detector

from mmdet.apis import multi_gpu_test
from mmdet.datasets import (build_dataloader, build_dataset,
                            replace_ImageToTensor)
from mmdet.models import build_detector
# import aicrowd_helpers
import os.path as osp
import traceback
import pickle
import shutil
import tempfile
import time
import torch.distributed as dist
from mmcv.image import tensor2imgs
from mmdet.core import encode_mask_results

import uuid

# TEST_IMAGES_PATH = "/mnt/public/xxx/imrec/data/val/images"

def create_test_predictions(images_path):
    # The temporary file is removed once the returned handle is closed
    test_predictions_file = tempfile.NamedTemporaryFile(mode="w+", suffix=".json")

    annotations = {'categories': [], 'info': {}, 'images': []}
    for item in glob.glob(images_path+'/*.jpg'):
        image_dict = dict()
        img = mmcv.imread(item)
        height, width, _ = img.shape
        # Image ids come from the numeric file name, e.g. 008082.jpg -> 8082
        image_id = int(os.path.basename(item).split('.')[0])
        image_dict['id'] = image_id
        image_dict['file_name'] = os.path.basename(item)
        image_dict['width'] = width
        image_dict['height'] = height
        annotations['images'].append(image_dict)
    with open("classes.json") as f:
        annotations['categories'] = json.load(f)
    with open(test_predictions_file.name, 'w') as f:
        json.dump(annotations, f)

    return test_predictions_file

def single_gpu_test(model,
                    data_loader,
                    show=False,
                    out_dir=None,
                    show_score_thr=0.3):
    
    model.eval()
    results = []
    dataset = data_loader.dataset
    prog_bar = mmcv.ProgressBar(len(dataset))
    for i, data in enumerate(data_loader):
        # aicrowd_helpers.execution_progress({"image_ids" : [i]})
        with torch.no_grad():
            result = model(return_loss=False, rescale=True, **data)

        batch_size = len(result)
        if show or out_dir:
            if batch_size == 1 and isinstance(data['img'][0], torch.Tensor):
                img_tensor = data['img'][0]
            else:
                img_tensor = data['img'][0].data[0]
            img_metas = data['img_metas'][0].data[0]
            imgs = tensor2imgs(img_tensor, **img_metas[0]['img_norm_cfg'])
            assert len(imgs) == len(img_metas)

            for i, (img, img_meta) in enumerate(zip(imgs, img_metas)):
                h, w, _ = img_meta['img_shape']
                img_show = img[:h, :w, :]

                ori_h, ori_w = img_meta['ori_shape'][:-1]
                img_show = mmcv.imresize(img_show, (ori_w, ori_h))

                if out_dir:
                    out_file = osp.join(out_dir, img_meta['ori_filename'])
                else:
                    out_file = None

                model.module.show_result(
                    img_show,
                    result[i],
                    show=show,
                    out_file=out_file,
                    score_thr=show_score_thr)

        # Perform RLE encode for masks
        if isinstance(result[0], tuple):
            result = [(bbox_results, encode_mask_results(mask_results))
                      for bbox_results, mask_results in result]
        results.extend(result)

        for _ in range(batch_size):
            prog_bar.update()
    return results

def parse_args():
    parser = argparse.ArgumentParser(
        description='MMDet test (and eval) a model')
    parser.add_argument('--config', help='test config file path')
    parser.add_argument('--checkpoint', help='checkpoint file')
    parser.add_argument('--data', help='test data folder path')
    parser.add_argument('--out', help='output result file in pickle format')
    parser.add_argument(
        '--fuse-conv-bn',
        action='store_true',
        help='Whether to fuse conv and bn, this will slightly increase '
        'the inference speed')
    parser.add_argument(
        '--format-only',
        action='store_true',
        help='Format the output results without performing evaluation. It is '
        'useful when you want to format the result to a specific format and '
        'submit it to the test server')
    parser.add_argument(
        '--eval',
        type=str,
        nargs='+',
        help='evaluation metrics, which depends on the dataset, e.g., "bbox",'
        ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
    parser.add_argument('--show', action='store_true', help='show results')
    parser.add_argument(
        '--show-dir', help='directory where painted images will be saved')
    parser.add_argument(
        '--show-score-thr',
        type=float,
        default=0.3,
        help='score threshold (default: 0.3)')
    parser.add_argument(
        '--gpu-collect',
        action='store_true',
        help='whether to use gpu to collect results.')
    parser.add_argument(
        '--tmpdir',
        help='tmp directory used for collecting results from multiple '
        'workers, available when gpu-collect is not specified')
    parser.add_argument(
        '--cfg-options',
        nargs='+',
        action=DictAction,
        help='override some settings in the used config, the key-value pair '
        'in xxx=yyy format will be merged into config file.')
    parser.add_argument(
        '--options',
        nargs='+',
        action=DictAction,
        help='custom options for evaluation, the key-value pair in xxx=yyy '
        'format will be kwargs for dataset.evaluate() function (deprecate), '
        'change to --eval-options instead.')
    parser.add_argument(
        '--eval-options',
        nargs='+',
        action=DictAction,
        help='custom options for evaluation, the key-value pair in xxx=yyy '
        'format will be kwargs for dataset.evaluate() function')
    parser.add_argument(
        '--launcher',
        choices=['none', 'pytorch', 'slurm', 'mpi'],
        default='none',
        help='job launcher')
    parser.add_argument('--out_file', help='output result file')
    parser.add_argument('--local_rank', type=int, default=0)
    parser.add_argument('--type', type=str, choices=['val', 'test'], default='test')
    parser.add_argument('--reduce_ms', action='store_true',
        help='Whether to reduce the multi-scale aug')
    args = parser.parse_args()

    if 'LOCAL_RANK' not in os.environ:
        os.environ['LOCAL_RANK'] = str(args.local_rank)

    if args.options and args.eval_options:
        raise ValueError(
            '--options and --eval-options cannot be both '
            'specified, --options is deprecated in favor of --eval-options')
    if args.options:
        warnings.warn('--options is deprecated in favor of --eval-options')
        args.eval_options = args.options
    return args

def reduce_multiscale_TTA(cfg):
    '''
    Keep only 1st and last image sizes from Multi-Scale TTA
    
    @input
    cfg -> Configuration file
    '''

    scale = cfg.data.test.pipeline[1]['img_scale']
    if len(scale) > 2:
        new_scale = [scale[0], scale[-1]]
        cfg.data.test.pipeline[1]['img_scale'] = new_scale   
    return cfg

def main():
    ########################################################################
    # Register Prediction Start
    ########################################################################

    # aicrowd_helpers.execution_start()
    args = parse_args()
    data_folder = args.data
    # Create annotations if not already created
    test_predictions_file = create_test_predictions(data_folder)
    
    # Load annotations
    with open(test_predictions_file.name) as f:
        annotations = json.loads(f.read())

    assert args.out or args.eval or args.format_only or args.show \
        or args.show_dir, \
        ('Please specify at least one operation (save/eval/format/show the '
         'results / save the results) with the argument "--out", "--eval"'
         ', "--format-only", "--show" or "--show-dir"')

    if args.eval and args.format_only:
        raise ValueError('--eval and --format_only cannot be both specified')

    if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
        raise ValueError('The output file must be a pkl file.')

    cfg = Config.fromfile(args.config)
    if args.cfg_options is not None:
        cfg.merge_from_dict(args.cfg_options)
    
    JSONFILE_PREFIX="predictions_{}".format(str(uuid.uuid4())) 
    # import modules present in list of strings.
    if cfg.get('custom_imports', None):
        from mmcv.utils import import_modules_from_strings
        import_modules_from_strings(**cfg['custom_imports'])
    
    # set cudnn_benchmark
    if cfg.get('cudnn_benchmark', False):
        torch.backends.cudnn.benchmark = True
    
    cfg.data.samples_per_gpu = 1
    cfg.data.workers_per_gpu = 2
    cfg.model.pretrained = None
    cfg.data.test.test_mode = True
    cfg.data.test.ann_file = test_predictions_file.name
    cfg.data.test.img_prefix = data_folder

    if cfg.model.get('neck'):
        if isinstance(cfg.model.neck, list):
            for neck_cfg in cfg.model.neck:
                if neck_cfg.get('rfp_backbone'):
                    if neck_cfg.rfp_backbone.get('pretrained'):
                        neck_cfg.rfp_backbone.pretrained = None
        elif cfg.model.neck.get('rfp_backbone'):
            if cfg.model.neck.rfp_backbone.get('pretrained'):
                cfg.model.neck.rfp_backbone.pretrained = None

    # in case the test dataset is concatenated
    if isinstance(cfg.data.test, dict):
        cfg.data.test.test_mode = True
    elif isinstance(cfg.data.test, list):
        for ds_cfg in cfg.data.test:
            ds_cfg.test_mode = True

    cfg.data.test.ann_file = test_predictions_file.name
    cfg.data.test.img_prefix = data_folder
        
    # if args.reduce_ms:
    #     print("Reduce multi-scale TTA")
    #     cfg = reduce_multiscale_TTA(cfg)
    #     print(cfg.data.test.pipeline[1]['img_scale'])
        
    if args.launcher == 'none':
        distributed = False
    else:
        distributed = True
        init_dist(args.launcher, **cfg.dist_params)
    
    # build the dataloader
    samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1)
    if samples_per_gpu > 1:
        # Replace 'ImageToTensor' to 'DefaultFormatBundle'
        cfg.data.test.pipeline = replace_ImageToTensor(cfg.data.test.pipeline)
    dataset = build_dataset(cfg.data.test)
    print(dataset)
    dataset.cat_ids = [category["id"] for category in annotations["categories"]]
    data_loader = build_dataloader(
        dataset,
        samples_per_gpu=1,
        workers_per_gpu=2,
        dist=distributed,
        shuffle=False)

    # build the model and load checkpoint
    # model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.model.test_cfg)
    model = init_detector(args.config,args.checkpoint,device='cuda:0')

    fp16_cfg = cfg.get('fp16', None)
    if fp16_cfg is not None:
        wrap_fp16_model(model)
    # checkpoint = load_checkpoint(model, args.checkpoint, map_location='cuda')
    if args.fuse_conv_bn:
        model = fuse_conv_bn(model)

    model.CLASSES = [category['name'] for category in annotations['categories']]
    # if 'CLASSES' in checkpoint['meta']:
        # model.CLASSES = checkpoint['meta']['CLASSES']
    # else:
        # model.CLASSES = dataset.CLASSES

    if not distributed:
        model = MMDataParallel(model, device_ids=[0])
        outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
                                  args.show_score_thr)
    else:
        model = MMDistributedDataParallel(
            model.cuda(),
            device_ids=[torch.cuda.current_device()],
            broadcast_buffers=False)
        outputs = multi_gpu_test(model, data_loader, args.tmpdir,
                                 args.gpu_collect)

    rank, _ = get_dist_info()
    if rank == 0:
        if args.out:
            print(f'\nwriting results to {args.out}')
            mmcv.dump(outputs, args.out)
        kwargs = {} if args.eval_options is None else args.eval_options
        if args.format_only:
            dataset.format_results(outputs, **kwargs)
        if args.eval:
            eval_kwargs = cfg.get('evaluation', {}).copy()
            for key in ['interval', 'tmpdir', 'start', 'gpu_collect']:
                eval_kwargs.pop(key, None)
            eval_kwargs.update(dict(metric=args.eval, **kwargs))
            print(dataset.evaluate(outputs, **eval_kwargs))
    
    # consolidate_results(["predictions.segm.json"], 'test_predictions.json', args.out_file)
    ########################################################################
    # Register Prediction Complete
    ########################################################################
    # aicrowd_helpers.execution_success({
    #     "predictions_output_path" : args.out_file
    # })
    print("\nAICrowd register complete")
    # preds = []
    # with open("predictions.segm.json", "r") as pred_file:
    #     preds.extend(json.loads(pred_file.read()))
    # print(preds)
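    # NOTE: this step needs --eval-options "jsonfile_prefix=<prefix>" and --out_file;
    # it expects <prefix>.segm.json and <prefix>.bbox.json to have been written.
    JSONFILE_PREFIX = args.eval_options['jsonfile_prefix']
    shutil.move("{}.segm.json".format(JSONFILE_PREFIX), args.out_file)
    os.remove("{}.bbox.json".format(JSONFILE_PREFIX))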
        
if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        error = traceback.format_exc()
        print(error)
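
The `reduce_multiscale_TTA` helper in the script above keeps only the first and last test scales. The same idea on a plain dictionary, as a sketch that mimics the config layout (`pipeline[1]['img_scale']`) without needing mmcv; `reduce_scales` is a hypothetical name and the three scales are made-up values:

```python
def reduce_scales(pipeline):
    """Keep only the first and last entries of pipeline[1]['img_scale']."""
    scales = pipeline[1]["img_scale"]
    if len(scales) > 2:
        pipeline[1]["img_scale"] = [scales[0], scales[-1]]
    return pipeline


pipeline = [
    {"type": "LoadImageFromFile"},
    {"type": "MultiScaleFlipAug",
     "img_scale": [(1333, 800), (1333, 900), (1333, 1000)]},
]
print(reduce_scales(pipeline)[1]["img_scale"])  # [(1333, 800), (1333, 1000)]
```

With only two scales configured, as in the test pipeline of this notebook, the function leaves the list unchanged.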


In [None]:
#setting the paths for images and output file
test_images_dir="/content/data/test/images"
output_filepath="/content/predictions_mmdetection.json"

#path of trained model & config
model_path="/content/latest.pth"
config_file="/content/htc_without_semantic_r50_fpn_1x_coco.py"

In [None]:
!python inference_mmdet.py --config $config_file --checkpoint $model_path \
--data $test_images_dir \
--format-only --eval-options "jsonfile_prefix=preds" --out_file $output_filepath
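
Before uploading, it can help to sanity-check the generated file. The sketch below assumes each prediction is a dictionary with the keys that this notebook's own conversion code writes (`image_id`, `category_id`, `score`, `bbox`, `segmentation`); `check_predictions` is a hypothetical helper, not part of the AIcrowd tooling.

```python
import json

REQUIRED_KEYS = {"image_id", "category_id", "score", "bbox", "segmentation"}


def check_predictions(predictions):
    """Return a list of human-readable problems found in the predictions."""
    problems = []
    for index, pred in enumerate(predictions):
        missing = REQUIRED_KEYS - pred.keys()
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
            continue
        if not 0.0 <= pred["score"] <= 1.0:
            problems.append(f"entry {index}: score {pred['score']} outside [0, 1]")
        if len(pred["bbox"]) != 4:
            problems.append(f"entry {index}: bbox must have 4 values")
    return problems


# Usage (assumes the file written by the cell above exists):
# with open("/content/predictions_mmdetection.json") as f:
#     print(check_predictions(json.load(f)) or "no problems found")
```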

Now that the prediction file has been generated for the public test set, you can make a quick submission:
* Use the AIcrowd CLI `aicrowd submission create` command to make a quick submission.

**Alternatively:**
* Download the `predictions_mmdetection.json` file
* Visit the [create submission page](https://www.aicrowd.com/challenges/food-recognition-benchmark-2022/submissions/new)
* Upload the `predictions_mmdetection.json` file
* Voila!! You just made your first submission!


In [None]:
#use aicrowd CLI to make quick submission
!aicrowd submission create -c food-recognition-benchmark-2022 -f $output_filepath

# Active submission 🤩

Step 0 : Fork the baseline to make your own changes to it. Go to settings and make the repo private.


Step 1 : For first-time setup, set up SSH to log in to GitLab.

  0. Run the next cell to check if you already have SSH keys in your drive; if yes, skip this step.
  1. Run `ssh-keygen -t ecdsa -b 521`
  2. Run `cat ~/.ssh/id_ecdsa.pub` and copy the output
  3. Go to [Gitlab SSH Keys](https://gitlab.aicrowd.com/profile/keys), paste the output into the key field, and use whatever title you like.


Step 2: Clone your forked Repo & Add Models & Push Changes

  1. Run `git clone git@gitlab.aicrowd.com:[Your Username]/mmdetection-starter-food-2022.git`
  2. Put your model inside the models directory and then run `git-lfs track "*.pth"`
  3. Run `git add .` then `git commit -m " adding model"`
  4. Run `git push origin master`

Step 3. Create Submission

  1. Go to the repo and then tags and then New Tag. 
  2. In the tag name, you can use `submission_v1` (every time you make a new submission, just increase the number, e.g. `submission_v2`, `submission_v3`).
  3. A new issue will be created showing the progress. Enjoy!
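
Since every submission needs a fresh tag, the next tag name can be derived from the existing ones. A sketch, assuming the `submission_vN` naming suggested above; `next_submission_tag` is a hypothetical helper, and in practice the tag list would come from `git tag`:

```python
import re


def next_submission_tag(existing_tags):
    """Return 'submission_v<N+1>', where N is the highest existing version."""
    versions = []
    for tag in existing_tags:
        match = re.fullmatch(r"submission_v(\d+)", tag)
        if match:
            versions.append(int(match.group(1)))
    return f"submission_v{max(versions, default=0) + 1}"


print(next_submission_tag([]))                                  # submission_v1
print(next_submission_tag(["submission_v1", "submission_v2"]))  # submission_v3
```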




If you do not have SSH keys, check this [page](https://docs.gitlab.com/ee/ssh/index.html#generate-an-ssh-key-pair).

Add your SSH Keys to your GitLab account by following the instructions here

In [None]:
%%bash
SSH_PRIV_KEY=/content/drive/MyDrive/id_ecdsa
SSH_PUB_KEY=/content/drive/MyDrive/id_ecdsa.pub
if [ -f "$SSH_PRIV_KEY" ]; then
    echo "SSH Key found! ✅"
    mkdir -p /root/.ssh
    cp /content/drive/MyDrive/id_ecdsa ~/.ssh/id_ecdsa
    cp /content/drive/MyDrive/id_ecdsa.pub ~/.ssh/id_ecdsa.pub
    echo "SSH key successfully copied to local!"
else
    echo "SSH Key does not exist."
    ssh-keygen -t ecdsa -b521 -f ~/.ssh/id_ecdsa
    cat ~/.ssh/id_ecdsa.pub
    echo "❗️Please open https://gitlab.aicrowd.com/profile/keys and copy-paste the above text in the **key** textbox."
    cp  ~/.ssh/id_ecdsa /content/drive/MyDrive/id_ecdsa
    cp  ~/.ssh/id_ecdsa.pub /content/drive/MyDrive/id_ecdsa.pub
    echo "SSH key successfully created and copied to drive!"
fi

In [None]:
import IPython

html = "<b>Copy paste below SSH key in your GitLab account here (one time):</b><br/>"
html += '<a href="https://gitlab.aicrowd.com/-/profile/keys" target="_blank">https://gitlab.aicrowd.com/-/profile/keys</a><br><br>'

public_key = open("/content/drive/MyDrive/id_ecdsa.pub").read()
html += '<br/><textarea>'+public_key+'</textarea><button onclick="navigator.clipboard.writeText(\''+public_key.strip()+'\');this.innerHTML=\'Copied ✅\'">Click to copy</button>'
IPython.display.HTML(html)

Clone the gitlab starter repo and add submission files

In [None]:
# Set your AIcrowd username for the active submission.
# It is used to locate your repository and as the submitter's username.
username = "jerome_patel"
!echo -n {username} > author.txt

In [None]:
%%bash
username=$(cat author.txt)
echo "Username $username"

git config --global user.name "$username"
git config --global user.email "$username@noreply.gitlab.aicrowd.com"

touch ${HOME}/.ssh/known_hosts
ssh-keyscan -H gitlab.aicrowd.com >> ${HOME}/.ssh/known_hosts 2> /dev/null


apt install -qq -y jq git-lfs &> /dev/null

git lfs install
cd /content/

echo "Checking if the repository already exists, otherwise creating one"
export SUBMISSION_REPO="git@gitlab.aicrowd.com:$username/mmdetection-starter-food-2022.git"
echo "cloning the $SUBMISSION_REPO"
git clone $SUBMISSION_REPO mmdetection-starter-food-2022
ALREADYEXIST=$?

if [ $ALREADYEXIST -ne 0 ]; then
  echo "Project didn't exist, forking from upstream"
  git clone https://github.com/AIcrowd/food-recognition-benchmark-starter-kit.git mmdetection-starter-food-2022
fi

cd /content/mmdetection-starter-food-2022
git remote remove origin
git remote add origin "$SUBMISSION_REPO"

## To make active submission:
* Required files are `aicrowd.json, apt.txt, requirements.txt, predict.py` (already configured for mmdetection)
* **[IMP]** Copy the trained mmdetection model and its corresponding config file to the repo
* For inference, place these files: `predict_mmdetection.py mmdet_inference.py` (already present in the repo)
* Modify `requirements.txt` and `predict.py` for mmdetection
* **[IMP]** Modify `aicrowd.json` for your submission

**Note:** You only need to place your trained model and modify aicrowd.json to create your first easy submission. 

In [None]:
#@title Modify mmdet_inference.py (modify and run only if you want to change the inference)
%%writefile /content/mmdetection-starter-food-2022/utils/mmdet_inference.py

import mmcv
import numpy as np
import torch
from mmcv.ops import RoIPool
from mmcv.parallel import collate, scatter
from mmcv.runner import load_checkpoint

from mmdet.core import get_classes
from mmdet.datasets import replace_ImageToTensor
from mmdet.datasets.pipelines import Compose
from mmdet.models import build_detector
# import time


def inference(model, imgs):

    # start = time.process_time()
    imgs = [imgs]
    cfg = model.cfg
    device = 'cuda:0'
    if isinstance(imgs[0], np.ndarray):
        cfg = cfg.copy()
        # set loading pipeline type
        cfg.data.test.pipeline[0].type = 'LoadImageFromWebcam'

    cfg.data.test.pipeline = replace_ImageToTensor(cfg.data.test.pipeline)
    test_pipeline = Compose(cfg.data.test.pipeline)

    datas = []
    data = dict(img_info=dict(filename=imgs[0]), img_prefix=None)
    # build the data pipeline
    data = test_pipeline(data)
    datas.append(data)

    data = collate(datas, samples_per_gpu=len(imgs))
    # just get the actual data from DataContainer
    data['img_metas'] = [img_metas.data[0] for img_metas in data['img_metas']]
    data['img'] = [img.data[0] for img in data['img']]
    # scatter to specified GPU
    data = scatter(data, [device])[0]
    
    # forward the model
    with torch.no_grad():
        results = model(return_loss=False, rescale=True, **data)
    # your code here    
    # print(time.process_time() - start)
    return results[0]



In [None]:
#@title Modify predict_mmdetection.py (modify & run this only if you want to change inference code part)
%%writefile /content/mmdetection-starter-food-2022/predict_mmdetection.py

import os
import json
import glob
from PIL import Image
import importlib
import numpy as np
import cv2
import torch
import traceback
import pickle
import shutil
import tempfile
import time
import mmcv
import torch.distributed as dist
from mmcv.image import tensor2imgs
from mmdet.core import encode_mask_results
from mmcv import Config, DictAction
from mmcv.cnn import fuse_conv_bn
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import (get_dist_info, init_dist, load_checkpoint,
                         wrap_fp16_model)
from mmdet.apis import init_detector, inference_detector

from mmdet.apis import multi_gpu_test
from mmdet.datasets import (build_dataloader, build_dataset,
                            replace_ImageToTensor)
from mmdet.models import build_detector
import pycocotools.mask as mask_util

from utils.mmdet_inference import inference
from evaluator.food_challenge import FoodChallengePredictor


"""
Expected ENVIRONMENT Variables
* AICROWD_TEST_IMAGES_PATH : abs path to  folder containing all the test images
* AICROWD_PREDICTIONS_OUTPUT_PATH : path where you are supposed to write the output predictions.json
"""

class MMDetectionPredictor(FoodChallengePredictor):

    """
    PARTICIPANT_TODO:
    You can do any preprocessing required for your codebase here like loading up models into memory, etc.
    """
    def prediction_setup(self):
        # self.PADDING = 50
        # self.SEGMENTATION_LENGTH = 10
        # self.MAX_NUMBER_OF_ANNOTATIONS = 10

        #set the config parameters, including the architecture which was previously used
        self.cfg_name, self.checkpoint_name = self.get_mmdetection_config()
        self.cfg = Config.fromfile(self.cfg_name)
        # self.test_img_path = os.getenv("AICROWD_TEST_IMAGES_PATH", os.getcwd() + "/data/images/")
        self.test_predictions_file = self.create_test_predictions(self.test_data_path)

        if self.cfg.get('cudnn_benchmark', False):
            torch.backends.cudnn.benchmark = True
        self.cfg.data.samples_per_gpu = 1
        self.cfg.data.workers_per_gpu = 2
        self.cfg.model.pretrained = None
        self.cfg.data.test.test_mode = True
        self.cfg.data.test.ann_file = self.test_predictions_file.name
        self.cfg.data.test.img_prefix = self.test_data_path

        self.model = init_detector(self.cfg_name,self.checkpoint_name,device='cuda:0')

        fp16_cfg = self.cfg.get('fp16', None)
        if fp16_cfg is not None:
            wrap_fp16_model(self.model)

        # Load annotations
        with open(self.test_predictions_file.name) as f:
            self.annotations = json.loads(f.read())
        self.cat_ids = [category["id"] for category in self.annotations["categories"]]

        self.model.CLASSES = [category['name'] for category in self.annotations['categories']]

    """
    PARTICIPANT_TODO:
    During the evaluation, each image file path will be provided one by one.
    NOTE: If you need to load your model, please do so in the `prediction_setup` function.
    """
    def prediction(self, image_path):
        print("Generating for", image_path)
        # Run inference on the image at image_path
        result = inference(self.model, image_path)
        # RLE-encode the masks
        result = (result[0], encode_mask_results(result[1]))
        result = self.segm2jsonformat(result,image_path)
        return result

    def xyxy2xywh(self,bbox):
        _bbox = bbox.tolist()
        return [
            _bbox[0],
            _bbox[1],
            _bbox[2] - _bbox[0] + 1,
            _bbox[3] - _bbox[1] + 1,
        ]

    def segm2jsonformat(self, result, image_path):
        segm_json_results = []
        # The image id is the numeric stem of the file name
        img_id = int(os.path.basename(image_path).split(".")[0])
        det, seg = result
        for label in range(len(det)):
            bboxes = det[label]
            segms = seg[label]
            # Each detection row is [x1, y1, x2, y2, score]
            mask_score = [bbox[4] for bbox in bboxes]
            for i in range(len(bboxes)):
                data = dict()
                data['image_id'] = img_id
                data['bbox'] = self.xyxy2xywh(bboxes[i])
                data['score'] = float(mask_score[i])
                data['category_id'] = self.cat_ids[label]
                # RLE counts are bytes; decode them so the result is JSON-serializable
                if isinstance(segms[i]['counts'], bytes):
                    segms[i]['counts'] = segms[i]['counts'].decode()
                data['segmentation'] = segms[i]
                segm_json_results.append(data)
        return segm_json_results


    def create_test_predictions(self,images_path):
        test_predictions_file = tempfile.NamedTemporaryFile(mode="w+", suffix=".json")
        annotations = {'categories': [], 'info': {}, 'images': []}
        for item in glob.glob(images_path+'/*.jpg'):
            image_dict = dict()
            img = mmcv.imread(item)
            height, width, __ = img.shape
            # The image id is the numeric stem of the file name
            image_id = int(os.path.basename(item).split('.')[0])
            image_dict['image_id'] = image_id
            image_dict['file_name'] = os.path.basename(item)
            image_dict['width'] = width
            image_dict['height'] = height
            annotations['images'].append(image_dict)
        # Use context managers so both files are closed (and flushed) before they are read again
        with open("classes.json") as f:
            annotations['categories'] = json.load(f)
        with open(test_predictions_file.name, 'w') as f:
            json.dump(annotations, f)

        return test_predictions_file

    def get_mmdetection_config(self):
        with open('aicrowd.json') as f:
            content = json.load(f)
            config_fname = content['model_config_file']
            checkpoint_fname = content['model_path']
        # config = Config.fromfile(config_fname)
        return (config_fname, checkpoint_fname)


if __name__ == "__main__":
    submission = MMDetectionPredictor()
    submission.run()
    print("Successfully generated predictions!")
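
To make the output format concrete, here is a minimal standalone sketch of how one detection row becomes one COCO-style result record, mirroring `segm2jsonformat` and `xyxy2xywh` above. The file name, box values, category id and the `build_record` helper are hypothetical and only serve as an illustration; the `segmentation` field is left out for brevity.

```python
import os


def xyxy_to_xywh(bbox):
    """Convert [x1, y1, x2, y2, ...] to [x, y, width, height].

    The +1 mirrors the predictor above, which appears to treat
    x2/y2 as inclusive pixel coordinates.
    """
    x1, y1, x2, y2 = bbox[:4]
    return [x1, y1, x2 - x1 + 1, y2 - y1 + 1]


def build_record(image_path, detection, category_id):
    """Build one result dict from a [x1, y1, x2, y2, score] detection row."""
    # The image id is the numeric stem of the file name
    image_id = int(os.path.basename(image_path).split(".")[0])
    return {
        "image_id": image_id,
        "bbox": xyxy_to_xywh(detection),
        "score": float(detection[4]),
        "category_id": category_id,
    }


# Hypothetical inputs, for illustration only
record = build_record("/tmp/images/000123.jpg", [10.0, 20.0, 109.0, 219.0, 0.9], 7)
print(record)
```

Note that a file name that is not purely numeric would make the `int(...)` call fail, so this scheme relies on the test images being named by their ids.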


In [None]:
MODEL_ARCH = "htc_without_semantic_r50_fpn_1x_coco.py"
aicrowd_json = {
  "challenge_id" : "food-recognition-benchmark-2022",
  "authors" : ["pola_saidinesh"],
  "description" : "Food Recognition Benchmark 2022 Submission mmdetection",
  "license" : "MIT",
  "gpu": True,
  "debug": False,
  "model_path": "models/latest.pth",
  "model_type": "mmdetection",
  "model_config_file": "models/" + MODEL_ARCH
}
import json
with open('/content/mmdetection-starter-food-2022/aicrowd.json', 'w') as fp:
  fp.write(json.dumps(aicrowd_json, indent=4))
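
Before copying files, it can help to confirm that the JSON just written contains the two keys the predictor reads in `get_mmdetection_config` (`model_config_file` and `model_path`). The following is a minimal sketch; the `load_model_paths` helper and the temporary file used in the demo are assumptions for illustration and are not part of the starter kit.

```python
import json
import os
import tempfile


def load_model_paths(json_path):
    """Return (config_file, checkpoint) from an aicrowd.json-style file."""
    with open(json_path) as f:
        content = json.load(f)
    missing = [k for k in ("model_config_file", "model_path") if k not in content]
    if missing:
        raise KeyError(f"aicrowd.json is missing keys: {missing}")
    return content["model_config_file"], content["model_path"]


# Demo on a throwaway file so the sketch runs anywhere
demo = {
    "model_path": "models/latest.pth",
    "model_config_file": "models/htc_without_semantic_r50_fpn_1x_coco.py",
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "aicrowd.json")
    with open(demo_path, "w") as fp:
        json.dump(demo, fp, indent=4)
    config_file, checkpoint = load_model_paths(demo_path)

print(config_file, checkpoint)
```

In the notebook, the same helper could be pointed at `/content/mmdetection-starter-food-2022/aicrowd.json` instead of the demo file.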

### Copy required files (trained model, config, classes.json) to mmdetection repo

In [None]:
!mkdir -p /content/mmdetection-starter-food-2022/models
!cp /content/classes.json /content/mmdetection-starter-food-2022/utils/classes.json
!cp /content/latest.pth /content/mmdetection-starter-food-2022/models/latest.pth
!cp $MODEL_ARCH /content/mmdetection-starter-food-2022/models/$MODEL_ARCH
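
A missing checkpoint or config is an easy mistake to make at this step. As a hedged sketch, the hypothetical `find_missing` helper below lists which expected files are absent under a repo root. The relative paths are taken from the copy commands above, and the demo runs against a temporary directory rather than the real repo.

```python
import os
import tempfile


def find_missing(repo_root, relative_paths):
    """Return the relative paths that do not exist as files under repo_root."""
    return [p for p in relative_paths
            if not os.path.isfile(os.path.join(repo_root, p))]


MODEL_ARCH = "htc_without_semantic_r50_fpn_1x_coco.py"
expected = [
    "utils/classes.json",
    "models/latest.pth",
    "models/" + MODEL_ARCH,
]

# Demo: create only the first two files in a throwaway directory
with tempfile.TemporaryDirectory() as root:
    for rel in expected[:2]:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    missing = find_missing(root, expected)

print(missing)
```

Against the real repo, an empty list from `find_missing("/content/mmdetection-starter-food-2022", expected)` would indicate that all three files are in place.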

### Finally, push the repo to create an active submission

In [None]:
%%bash

## Set your unique tag for this submission (no spaces), example:
# export MSG="v1"
# export MSG="v2" ...
# or something more informative...
export MSG="mmdetection_submission_v0_1"

username=$(cat author.txt)
echo "Username $username"


cd /content/mmdetection-starter-food-2022
git lfs track "*.pth"
git add .gitattributes
git add --all
git commit -m "$MSG" || true

find . -type f -size +5M -exec git lfs migrate import --include={} &> /dev/null \;

git tag -am "submission_$MSG" "submission_$MSG"
git config lfs.https://gitlab.aicrowd.com/$username/mmdetection-starter-food-2022.git/info/lfs.locksverify false

git remote remove origin
git remote add origin git@gitlab.aicrowd.com:$username/mmdetection-starter-food-2022.git

git lfs push origin master
git push origin master
git push origin "submission_$MSG"

echo "Track your submission status here: https://gitlab.aicrowd.com/$username/mmdetection-starter-food-2022/issues"
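
Since the tag must not contain spaces, it can be checked before running the push cell. The sketch below uses a deliberately conservative rule (letters, digits, underscore and hyphen only); this rule and the `is_safe_tag` helper are assumptions for illustration. Git itself accepts a wider range of tag names.

```python
import re

# Conservative assumption: letters, digits, underscore and hyphen only.
# This is stricter than git requires, but it rules out spaces and
# shell-special characters.
_SAFE_TAG = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_tag(msg):
    """Return True if msg is non-empty and uses only the conservative character set."""
    return _SAFE_TAG.fullmatch(msg) is not None


print(is_safe_tag("mmdetection_submission_v0_1"))
print(is_safe_tag("my first submission"))
```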

## Local Evaluation for Active Submission Repo

In [None]:
%%bash
cd /content/mmdetection-starter-food-2022

export TEST_DATASET_PATH=../data/test/images
export RESULTS_DATASET_PATH=../data
./run.sh
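
Once the local run finishes, a quick summary of the predictions can reveal obvious problems such as an empty output or a single dominant class. The sketch below assumes the output is a JSON list of records with the `category_id` and `score` fields produced by `segm2jsonformat`. The `summarize` helper and the sample records are hypothetical, and the exact output file name depends on `run.sh`.

```python
from collections import Counter


def summarize(records):
    """Return (total, per-category counts, min score, max score)."""
    counts = Counter(r["category_id"] for r in records)
    scores = [r["score"] for r in records]
    lowest = min(scores) if scores else None
    highest = max(scores) if scores else None
    return len(records), dict(counts), lowest, highest


# Hypothetical records, for illustration only
sample = [
    {"image_id": 1, "category_id": 7, "score": 0.9},
    {"image_id": 1, "category_id": 7, "score": 0.4},
    {"image_id": 2, "category_id": 3, "score": 0.6},
]
total, per_category, min_score, max_score = summarize(sample)
print(total, per_category, min_score, max_score)
```

With real output, load the records with `json.load` from the file that `run.sh` writes and pass them to `summarize`.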