# Before You Start

1. Credentials for Silverpond's PyPI server. Contact your Customer Success team member if you don't have them.
2. A Highlighter API token. If you don't already have one, do the following:
  - Log in to Highlighter
  - Click on the User Icon 👤 and click your name in the dropdown menu
  - Click Request Access Token (at the bottom). The token remains valid until it is deleted
  - Save the token somewhere safe
3. A GPU for training. If you are in Google Colab, be sure to select a GPU runtime.
4. If you are in Google Colab, the Install Packages cell may ask you to restart the runtime when it completes. Click the button and **do not** re-run the cell.
5. The Install Packages cell takes roughly 5-10 minutes to run.

# This notebook

- Installs packages
- Exports data from Highlighter
- Inspects exported data
- Samples data into train and test splits
- Saves data to COCO format
- Configures mmdetection Faster-RCNN model for training
- Trains model
- Exports model to Open Neural Network Exchange (ONNX)
- Uses the exported ONNX model to perform inference on an image from your test set

# Helpful Links

- [mmdetection github](https://github.com/open-mmlab/mmdetection)
- [mmdetection documentation (v2.18.0)](https://mmdetection.readthedocs.io/en/v2.18.0/)
- [mmcv github](https://github.com/open-mmlab/mmcv)


# Install Packages


In [None]:
def i_am_running_in_colab():
    try:
        import google.colab
        return True
    except ImportError:
        return False
    
if i_am_running_in_colab():
    %env PYPI_USERNAME=rick_sanchez
    %env PYPI_PASSWORD=WubbaLubbaDubDub
    !git clone https://github.com/silverpond/highlighter-client-v2-notebooks.git
    !bash highlighter-client-v2-notebooks/colab-scripts/setup-train-mmdet.sh


# House Keeping

In [None]:
# Check Pytorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMDetection installation
import mmdet
print(mmdet.__version__)

# Check mmcv installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print(get_compiling_cuda_version())
print(get_compiler_version())

In [None]:
HL_WEB_GRAPHQL_API_TOKEN="<HIGHLIGHTER_API_TOKEN>"
HL_WEB_GRAPHQL_ENDPOINT="https://<ACCOUNT_NAME>.highlighter.ai/graphql"

dataset_id = 191

In [None]:
from highlighter_client.gql_client import HLClient

# Small helper function for displaying the DataFrames in the highlighter client
# dataset object
def display_ds(ds, count=10):
    display(ds.annotations_df.head(count))
    display(ds.images_df.head(count))

# Download data using Highlighter Client.

For a more detailed walkthrough of how to use the Highlighter client, see the [export-submissions](https://github.com/tall-josh/highlighter-client-v2-notebooks/blob/main/export-submissions.ipynb) notebook.
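
The `paginate` helper used in the next cell walks a paged GraphQL connection and yields records one at a time. As a rough mental model only, cursor-based pagination can be sketched as below. Everything here (`fetch_page`, `paginate_sketch`, the `nodes`/`end_cursor`/`has_next_page` keys, the page size) is hypothetical and is not the `highlighter_client` implementation.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical in-memory "server" holding 7 records.
RECORDS: List[int] = list(range(7))


def fetch_page(after: Optional[int], first: int = 3) -> Dict:
    """Return one page of records that follow the given cursor (hypothetical)."""
    start = 0 if after is None else after + 1
    nodes = RECORDS[start:start + first]
    end_cursor = start + len(nodes) - 1 if nodes else after
    return {
        "nodes": nodes,
        "end_cursor": end_cursor,
        "has_next_page": start + len(nodes) < len(RECORDS),
    }


def paginate_sketch(fetch: Callable[[Optional[int]], Dict]) -> Iterator[int]:
    """Yield every record by following cursors until the last page."""
    cursor = None
    while True:
        page = fetch(cursor)
        yield from page["nodes"]
        if not page["has_next_page"]:
            break
        cursor = page["end_cursor"]


all_records = list(paginate_sketch(fetch_page))
print(all_records)
```

Because the result is a generator, records are fetched lazily, which is why reading a large dataset can take a while.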


In [None]:
from highlighter_client.datasets import get_reader, get_writer
from highlighter_client.datasets.dataset import Dataset
from highlighter_client.base_models import DatasetSubmissionTypeConnection
from highlighter_client.paginate import paginate

ds = Dataset(
    reader=get_reader("highlighter_submissions")(),
    writer=get_writer("coco")(),
)

client = HLClient.from_credential(api_token=HL_WEB_GRAPHQL_API_TOKEN, endpoint_url=HL_WEB_GRAPHQL_ENDPOINT)

submissions_gen = paginate(
    client.datasetSubmissionConnection,
    DatasetSubmissionTypeConnection,
    datasetId=dataset_id,
)

print("This could take a minute")
ds.read(submissions_gen=submissions_gen)


In [None]:
display_ds(ds)

# Preprocessing

At this point you may wish to do some preprocessing, e.g.:

  - **remove unwanted classes**: You may wish to filter some annotations from your dataset
  - **split the data**: notice that the `split` column holds only a single value, *data*. We can apply a random split before saving to `coco` format.

To keep things general, this notebook simply splits the data into **train** and **test**.
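
Both preprocessing steps can be sketched on toy records using only the standard library. The class names, image ids, and the `unwanted` set are invented for illustration; the notebook itself applies the split with pandas on `ds.images_df`.

```python
import random

# Toy annotations as (image_id, class_name) pairs. All values are invented.
names = ["cat", "dog", "tree", "cat", "dog", "tree", "cat", "dog", "cat", "dog"]
annotations = list(enumerate(names))

# 1. Remove unwanted classes.
unwanted = {"tree"}
kept = [(image_id, name) for image_id, name in annotations if name not in unwanted]

# 2. Randomly assign roughly 80% of the remaining images to train, the rest to test.
train_frac = 0.8
image_ids = sorted({image_id for image_id, _ in kept})
rng = random.Random(42)  # fixed seed so the split is reproducible
n_test = round(len(image_ids) * (1 - train_frac))
test_ids = set(rng.sample(image_ids, n_test))
split = {
    image_id: ("test" if image_id in test_ids else "train")
    for image_id in image_ids
}

print(split)
```

Fixing the seed matters here: re-running the notebook should not move images between train and test.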




In [None]:
train_frac = 0.8
ds.images_df["split"] = "train"

test_ids = ds.images_df.sample(frac=1-train_frac, random_state=42).image_id
ds.images_df.loc[ds.images_df.image_id.isin(test_ids), "split"] = "test"
ds.images_df

In [None]:
from pathlib import Path

image_dir = Path("data/images")
annotations_dir = Path("data/annotations")

image_dir.mkdir(parents=True, exist_ok=True)
annotations_dir.mkdir(parents=True, exist_ok=True)

ds.write(annotations_dir=annotations_dir)

In [None]:
from highlighter_client.io import multithread_graphql_image_download

result = multithread_graphql_image_download(
    client,
    list(ds.images_df.image_id.values),
    image_dir,
)

# Check the JSON files exported correctly

We'll also get the number of categories in the training data. We will need it
when we configure the mmdet model for training.
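
For orientation, a COCO annotation file is a JSON object with `images`, `annotations`, and `categories` lists. The sketch below builds a tiny invented example and derives the same kind of values that are read from `train.json`. The contents are made up, and only a subset of the COCO fields is shown.

```python
import json

# A minimal, invented COCO-style document (bbox is [x, y, width, height]).
coco_text = json.dumps({
    "images": [
        {"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "b.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 2, "bbox": [10, 20, 100, 50]},
        {"id": 11, "image_id": 2, "category_id": 1, "bbox": [30, 40, 60, 60]},
        {"id": 12, "image_id": 2, "category_id": 2, "bbox": [5, 5, 20, 20]},
    ],
    "categories": [
        {"id": 1, "name": "cat"},
        {"id": 2, "name": "dog"},
    ],
})

data = json.loads(coco_text)

# Pull out the class names and counts needed to configure a detector.
class_names = [c["name"] for c in data["categories"]]
num_classes_sketch = len(class_names)

print(class_names, num_classes_sketch)
print(f"num_images: {len(data['images'])}")
print(f"num_annos: {len(data['annotations'])}")
```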


In [None]:
import json

with (annotations_dir/"train.json").open('r') as f:
    train_data = json.load(f)
    
# We'll use this later when configuring the mmdet frcnn model
categories = train_data["categories"]
sorted(categories, key = lambda i: i["id"])

num_classes = len(categories)

for c in categories:
    print(c)
    
CLASSES = [i["name"] for i in categories]

print(f"num_images: {len(train_data['images'])}")
print(f"num_annos: {len(train_data['annotations'])}")

# Configure MMDetection Model for training

For more info on how to configure mmdet models, see their docs. This is a good place to start: https://mmdetection.readthedocs.io/en/latest/tutorials/config.html
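
The configuration cell overrides a handful of fields in a stock Faster R-CNN config by merging a nested dict into it. As a rough illustration of that idea, here is a plain-Python recursive merge. `merge_nested` is a hypothetical helper, not mmcv's implementation, and the base values are placeholders.

```python
from copy import deepcopy


def merge_nested(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_nested(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Placeholder base config, loosely shaped like a detector config.
base_cfg = {
    "model": {"roi_head": {"bbox_head": {"num_classes": 80, "type": "Shared2FCBBoxHead"}}},
    "runner": {"type": "EpochBasedRunner", "max_epochs": 12},
}
overrides = {
    "model": {"roi_head": {"bbox_head": {"num_classes": 3}}},
    "runner": {"max_epochs": 2},
}

cfg_sketch = merge_nested(base_cfg, overrides)
print(cfg_sketch["model"]["roi_head"]["bbox_head"])
print(cfg_sketch["runner"])
```

Only the keys named in `overrides` change; sibling keys such as `type` are carried over from the base, which is why the override dict can stay small.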


In [None]:
from mmcv import Config
import mmcv
import os.path as osp  # needed below for osp.abspath

# Your checkpoints and configuration will be saved in this directory
work_dir = "work_dir"

# Keep it small for demo purposes. You're welcome to bump this up
# if you're down to party
num_epochs = 2

mmdet_config = dict(
    work_dir = work_dir,
    gpu_ids = [0],
    seed = 42,
    runner = dict(max_epochs=num_epochs),
    data = dict(
        train = dict(
            ann_file=str(annotations_dir / "train.json"),
            img_prefix=str(image_dir),
            classes=CLASSES,
        ),
        val = dict(
            ann_file=str(annotations_dir / "test.json"),
            img_prefix=str(image_dir),
            classes=CLASSES,
        ),
        test = dict(
            ann_file=str(annotations_dir / "test.json"),
            img_prefix=str(image_dir),
            classes=CLASSES,
        ),
    ),
    model = dict(
        roi_head = dict(
            bbox_head = dict(
                num_classes = num_classes,
            ),
        ),
    )
)
cfg = Config.fromfile("mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py")
cfg.merge_from_dict(mmdet_config)

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))

# Save config
cfg.dump(f"{work_dir}/model-config.py")

# Show the saved config
!cat $work_dir/model-config.py

In [None]:
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.apis import train_detector
import os.path as osp

# Build dataset
datasets = [build_dataset(cfg.data.train)]

# Build the detector
model = build_detector(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Add an attribute for visualization convenience
model.CLASSES = CLASSES


In [None]:
train_detector(model, datasets, cfg, distributed=False, validate=True)

# Export to Open Neural Network Exchange (ONNX Format)

You will need:

1. The image shape the model expects. You can get this from the model config under the `train_pipeline` field. It should look something like:

```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
                                   ☝☝☝☝☝☝☝☝☝
```
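
If you would rather read the value programmatically, something like the following works on a pipeline expressed as a list of dicts. The pipeline here is a hand-written stand-in for the one in your saved config, and `find_img_scale` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def find_img_scale(pipeline: List[dict]) -> Optional[Tuple[int, int]]:
    """Return img_scale from the first Resize step, or None if there is none."""
    for step in pipeline:
        if step.get("type") == "Resize" and "img_scale" in step:
            return tuple(step["img_scale"])
    return None


# Stand-in pipeline mirroring the snippet above.
pipeline = [
    dict(type="LoadImageFromFile"),
    dict(type="LoadAnnotations", with_bbox=True),
    dict(type="Resize", img_scale=(1333, 800), keep_ratio=True),
]

img_scale = find_img_scale(pipeline)
print(img_scale)
```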


In [None]:
# NOTE: shape is in the order HEIGHT WIDTH
# ALSO: This produces a lot of output. If the export
#       is successful you'll get a message:
#       Successfully exported ONNX model: work_dir/model.onnx

!python mmdetection/tools/deployment/pytorch2onnx.py \
  $work_dir/model-config.py \
  $work_dir/latest.pth \
  --output-file $work_dir/model.onnx \
  --shape 1333 800 \
  --show

# Perform Inference With ONNX Model

In [None]:
import numpy as np
from highlighter_client.io import write_image
import onnx
from mmdet.core.export import preprocess_example_input
from mmdet.core.export.model_wrappers import ONNXRuntimeDetector

# Select random image from test set
image_filename = ds.images_df[ds.images_df.split == "test"].filename.sample(n=1).values[0]
image_path = f"{image_dir}/{image_filename}"

# Define pre-processing steps 
image_shape = (1333, 800)  # <-- NOTE: This needs to be the shape defined when you exported the ONNX model
input_config = {'input_shape': (1,3,image_shape[0],image_shape[1]),
                'input_path': image_path,
                'normalize_cfg': cfg.img_norm_cfg,}

# Perform pre-processing
one_img, one_meta = preprocess_example_input(input_config)
img_list, img_meta_list = [one_img], [[one_meta]]
img_list = [_.cuda().contiguous() for _ in img_list]

# Instantiate Model
onnx_model_file = f"{work_dir}/model.onnx"
onnx_model = ONNXRuntimeDetector(onnx_model_file, 
                                 class_names=np.array(CLASSES), 
                                 device_id=0)

In [None]:
# Run Model
"""
Note the output shape: [num_images, num_detections, 5]

The first 4 elements of each inner list represent a bbox and the 5th
represents the confidence.

[
  [
    [x0,y0,x1,y1,conf],
    [x0,y0,x1,y1,conf],
    ...
  ]
]
"""

onnx_results = onnx_model(img_list, img_metas=img_meta_list, return_loss=False)[0]
onnx_results

In [None]:
show_img = one_meta['show_img']
score_thr = 0.0
output_inferences_file = "test_image_overlay.jpg"
onnx_model.show_result(
    show_img,
    onnx_results,
    score_thr=score_thr,
    show=True,
    win_name='ONNXRuntime',
    out_file=output_inferences_file,
)

In [None]:
from IPython.display import Image
Image(output_inferences_file)