# MMDetection
"Get Started" tutorial:
https://mmdetection.readthedocs.io/en/latest/get_started.html

### Setup
I use a pyenv-managed venv for development.
I don't use conda, and I didn't use MMLab's `mim` to install `mmdet`
or build it from source.

I decided to simply `pip install mmdet`.

Let's see how this goes.

#### Verify the installation
They provide some example files to run an inference demo.

These files are directly available if you installed from source:

```
python demo/image_demo.py demo/demo.jpg rtmdet_tiny_8xb32-300e_coco.py \
--weights rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth \
--device cpu
```


Otherwise, you are directed to use `mim` to download the config and checkpoint files, e.g.:

```
mim download mmdet --config rtmdet_tiny_8xb32-300e_coco --dest .
```

Since we used none of these methods for installation, 
let's just see what's available.
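One stdlib way to see what a pip-installed package ships, without any demo files, is to walk its submodules with `pkgutil`. Below is a minimal sketch. `list_submodules` is a hypothetical helper name of my own, not something mmdet provides.

```python
import importlib
import pkgutil


def list_submodules(package_name: str) -> list:
    """Return the sorted names of a package's direct submodules and subpackages."""
    package = importlib.import_module(package_name)
    # Plain (non-package) modules have no __path__, so there is nothing to walk.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(info.name for info in pkgutil.iter_modules(search_path))


if __name__ == "__main__":
    # With mmdet installed, list_submodules("mmdet") should roughly match the
    # PACKAGE CONTENTS section that help(mmdet) prints.
    print(list_submodules("json"))
```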

In [None]:
import mmdet
print(mmdet.__version__)


3.2.0


In [None]:
help(mmdet)


Help on package mmdet:

NAME
    mmdet - # Copyright (c) OpenMMLab. All rights reserved.

PACKAGE CONTENTS
    apis (package)
    datasets (package)
    engine (package)
    evaluation (package)
    models (package)
    registry
    structures (package)
    testing (package)
    utils (package)
    version
    visualization (package)

FUNCTIONS
    digit_version(version_str: str, length: int = 4)
        Convert a version string into a tuple of integers.
        
        This method is usually used for comparing two versions. For pre-release
        versions: alpha < beta < rc.
        
        Args:
            version_str (str): The version string.
            length (int): The maximum number of version levels. Defaults to 4.
        
        Returns:
            tuple[int]: The version info in digits (integers).

DATA
    __all__ = ['__version__', 'version_info', 'digit_version']
    version_info = (3, 2, 0)

VERSION
    3.2.0

FILE
    /home/evan-cushing/.pyenv/versions/3.11.6/envs

#### Installation, pt deux
Okay, with the default `pip` installation, mmdet was deeply borked.
It was completely unable to run, always failing with:

```
ModuleNotFoundError: No module named 'mmcv._ext'
```

The problem seems to be with `mmcv`.
You cannot "just install it" with `pip`.
You have to install the prebuilt package that matches your CUDA and PyTorch versions.
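`mmcv._ext` is the module that holds mmcv's compiled C++/CUDA ops. You can check whether the installed mmcv has it by calling `importlib.util.find_spec`, which avoids waiting for the import error to surface deep inside mmdet. Below is a minimal sketch. `has_submodule` is a hypothetical helper.

```python
import importlib.util


def has_submodule(package: str, submodule: str) -> bool:
    """Report whether `package.submodule` can be located, without importing the submodule.

    find_spec still imports the parent package in order to search its __path__.
    """
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ImportError:
        # The parent package is missing, is not a package, or fails to import.
        return False


if __name__ == "__main__":
    # False here corresponds to the "No module named 'mmcv._ext'" failure.
    print("mmcv compiled ops found:", has_submodule("mmcv", "_ext"))
```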

Refer to this:

https://mmcv.readthedocs.io/en/latest/get_started/installation.html#install-with-pip

Now:

```
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
```
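The `-f` index URL encodes the CUDA version and PyTorch's major.minor version. The mmcv installation page linked above uses `cu121` for CUDA 12.1, `torch2.1` for PyTorch 2.1.x, and `cpu` for CPU-only builds. Here is a small sketch that assembles the URL. `mmcv_index_url` is a hypothetical helper, and prebuilt wheels don't exist for every CUDA/PyTorch combination, so check the table on that page.

```python
from typing import Optional


def mmcv_index_url(torch_version: str, cuda_version: Optional[str] = None) -> str:
    """Build the OpenMMLab find-links URL for prebuilt mmcv wheels.

    torch_version: e.g. "2.1.0" or "2.1.0+cu121"; only major.minor is used.
    cuda_version: e.g. "12.1"; None selects the CPU-only index.
    """
    # Drop any local version suffix such as "+cu121", then keep major.minor.
    major, minor = torch_version.split("+")[0].split(".")[:2]
    device = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{device}/torch{major}.{minor}/index.html"
    )
```

With PyTorch already installed, `mmcv_index_url(torch.__version__, torch.version.cuda)` should reproduce the URL for your environment. `torch.version.cuda` is `None` on CPU-only builds.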

Then:

```
pip install mmengine
pip install mmdet mmpretrain
```

Now confirm `mmdet` is not fricking borked; in an interpreter:

```
from mmdet import apis
```

It should now "just work".

Sheesh.

In [None]:
from mmdet import apis, datasets, engine, evaluation, models, registry
from mmdet import structures, testing, utils, visualization


In [None]:
from mmdet.engine import hooks, optimizers, runner, schedulers


In [None]:
help(engine)


Help on package mmdet.engine in mmdet:

NAME
    mmdet.engine - # Copyright (c) OpenMMLab. All rights reserved.

PACKAGE CONTENTS
    hooks (package)
    optimizers (package)
    runner (package)
    schedulers (package)

FILE
    /home/evan-cushing/.pyenv/versions/3.11.6/envs/3116/lib/python3.11/site-packages/mmdet/engine/__init__.py




# DETR
It's DETR time now.

Let's see what it takes to get their DETR working.

We'll be referencing the following, and building out from there:
https://github.com/open-mmlab/mmdetection/tree/main/configs/detr

Let's start with their "official" (trained) DETR config:
`configs/detr/detr_r50_8xb2-150e_coco.py`

https://github.com/open-mmlab/mmdetection/blob/main/configs/detr/detr_r50_8xb2-150e_coco.py

Now, the question is, can we simply copy-paste the configs and work directly with mmdet imports?
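Part of the answer depends on how `_base_` inheritance works. When mmengine loads a config file, it merges the child's dicts into the base's recursively instead of replacing them. That is why the upstream DETR config can write `train_dataloader = dict(dataset=dict(pipeline=train_pipeline))` and still keep the batch size, dataset type, and annotation paths from `coco_detection.py`. If you paste both files into one Python namespace, you lose that merge, because a plain assignment replaces the whole dict.

The sketch below is a simplified stand-in for the merge. It models `_delete_=True`, which replaces a base dict outright. It ignores mmengine's list-key handling and its type checks. `merge_cfg` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def merge_cfg(base: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `child` into a copy of `base` (simplified `_base_` semantics)."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so popping _delete_ leaves `child` intact
            replace = value.pop("_delete_", False)
            if not replace and isinstance(merged.get(key), dict):
                merged[key] = merge_cfg(merged[key], value)
                continue
        # Non-dict values, and dicts marked _delete_=True, replace the base value.
        merged[key] = copy.deepcopy(value)
    return merged
```

For example, `merge_cfg(dict(batch_size=2, dataset=dict(type='CocoDataset', pipeline=['old'])), dict(dataset=dict(pipeline=['new'])))` keeps `batch_size` and `type` and swaps only the pipeline.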

In [None]:
# _base_/default_runtime.py
# https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/default_runtime.py

default_scope = 'mmdet'

default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=1),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='DetVisualizationHook'))

env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'),
)

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)

log_level = 'INFO'
load_from = None
resume = False


In [None]:
# _base_/datasets/coco_detection.py
# https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py

# dataset settings
dataset_type = 'CocoDataset'
#data_root = 'data/coco/'
data_root = '/home/evan-cushing/Data/coco/'

# Example to use different file client
# Method 1: simply set the data root and let the file I/O module
# automatically infer from prefix (not support LMDB and Memcache yet)

# data_root = 's3://openmmlab/datasets/detection/coco/'

# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6
# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/': 's3://openmmlab/datasets/detection/',
#         'data/': 's3://openmmlab/datasets/detection/'
#     }))
backend_args = None

train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackDetInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    # If you don't have a gt annotation, delete the pipeline
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=dict(type='AspectRatioBatchSampler'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=train_pipeline,
        backend_args=backend_args))
val_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args))
test_dataloader = val_dataloader

val_evaluator = dict(
    type='CocoMetric',
    ann_file=data_root + 'annotations/instances_val2017.json',
    metric='bbox',
    format_only=False,
    backend_args=backend_args)
test_evaluator = val_evaluator

# inference on test dataset and
# format the output results for submission.
# test_dataloader = dict(
#     batch_size=1,
#     num_workers=2,
#     persistent_workers=True,
#     drop_last=False,
#     sampler=dict(type='DefaultSampler', shuffle=False),
#     dataset=dict(
#         type=dataset_type,
#         data_root=data_root,
#         ann_file=data_root + 'annotations/image_info_test-dev2017.json',
#         data_prefix=dict(img='test2017/'),
#         test_mode=True,
#         pipeline=test_pipeline))
# test_evaluator = dict(
#     type='CocoMetric',
#     metric='bbox',
#     format_only=True,
#     ann_file=data_root + 'annotations/image_info_test-dev2017.json',
#     outfile_prefix='./work_dirs/coco_detection/test')
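With `keep_ratio=True`, `Resize(scale=(1333, 800))` does not stretch images to 1333x800. It picks a single scale factor so that the long edge fits within 1333 and the short edge within 800. The DETR scales such as `(480, 1333)` follow the same rule. The sketch below shows the arithmetic. I modeled it on my reading of mmcv's `rescale_size`, so treat the rounding detail as an assumption. `keep_ratio_size` is a hypothetical name.

```python
from typing import Tuple


def keep_ratio_size(size: Tuple[int, int], scale: Tuple[int, int]) -> Tuple[int, int]:
    """Return the (w, h) that an image of size (w, h) is resized to under keep_ratio.

    The larger number in `scale` bounds the long edge; the smaller bounds the short edge.
    """
    w, h = size
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(w, h), max_short / min(w, h))
    # Round each edge to the nearest pixel.
    return int(w * factor + 0.5), int(h * factor + 0.5)
```

A 640x480 image becomes `(1067, 800)` because the short-edge limit binds. A 2000x500 panorama becomes `(1333, 333)` because the long-edge limit binds.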


In [None]:
# configs/detr/detr_r50_8xb2-150e_coco.py
# https://github.com/open-mmlab/mmdetection/blob/main/configs/detr/detr_r50_8xb2-150e_coco.py

#_base_ = [
#    '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py'
#]

model = dict(
    type='DETR',
    num_queries=100,
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_size_divisor=1),
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(3, ),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='ChannelMapper',
        in_channels=[2048],
        kernel_size=1,
        out_channels=256,
        act_cfg=None,
        norm_cfg=None,
        num_outs=1),
    encoder=dict(  # DetrTransformerEncoder
        num_layers=6,
        layer_cfg=dict(  # DetrTransformerEncoderLayer
            self_attn_cfg=dict(  # MultiheadAttention
                embed_dims=256,
                num_heads=8,
                dropout=0.1,
                batch_first=True),
            ffn_cfg=dict(
                embed_dims=256,
                feedforward_channels=2048,
                num_fcs=2,
                ffn_drop=0.1,
                act_cfg=dict(type='ReLU', inplace=True)))),
    decoder=dict(  # DetrTransformerDecoder
        num_layers=6,
        layer_cfg=dict(  # DetrTransformerDecoderLayer
            self_attn_cfg=dict(  # MultiheadAttention
                embed_dims=256,
                num_heads=8,
                dropout=0.1,
                batch_first=True),
            cross_attn_cfg=dict(  # MultiheadAttention
                embed_dims=256,
                num_heads=8,
                dropout=0.1,
                batch_first=True),
            ffn_cfg=dict(
                embed_dims=256,
                feedforward_channels=2048,
                num_fcs=2,
                ffn_drop=0.1,
                act_cfg=dict(type='ReLU', inplace=True))),
        return_intermediate=True),
    positional_encoding=dict(num_feats=128, normalize=True),
    bbox_head=dict(
        type='DETRHead',
        num_classes=80,
        embed_dims=256,
        loss_cls=dict(
            type='CrossEntropyLoss',
            bg_cls_weight=0.1,
            use_sigmoid=False,
            loss_weight=1.0,
            class_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=5.0),
        loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='HungarianAssigner',
            match_costs=[
                dict(type='ClassificationCost', weight=1.),
                dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'),
                dict(type='IoUCost', iou_mode='giou', weight=2.0)
            ])),
    test_cfg=dict(max_per_img=100))

# train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
# from the default setting in mmdet.
train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='RandomChoice',
        transforms=[[
            dict(
                type='RandomChoiceResize',
                scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                        (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                        (736, 1333), (768, 1333), (800, 1333)],
                keep_ratio=True)
        ],
                    [
                        dict(
                            type='RandomChoiceResize',
                            scales=[(400, 1333), (500, 1333), (600, 1333)],
                            keep_ratio=True),
                        dict(
                            type='RandomCrop',
                            crop_type='absolute_range',
                            crop_size=(384, 600),
                            allow_negative_crop=True),
                        dict(
                            type='RandomChoiceResize',
                            scales=[(480, 1333), (512, 1333), (544, 1333),
                                    (576, 1333), (608, 1333), (640, 1333),
                                    (672, 1333), (704, 1333), (736, 1333),
                                    (768, 1333), (800, 1333)],
                            keep_ratio=True)
                    ]]),
    dict(type='PackDetInputs')
]
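# Upstream, this line relies on `_base_` inheritance, which merges it into the
# base train_dataloader. Here, assigning a new dict would discard the base
# settings (batch size, sampler, dataset type, annotation paths), so swap in
# only the new pipeline.
train_dataloader['dataset']['pipeline'] = train_pipeline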

# optimizer
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001),
    clip_grad=dict(max_norm=0.1, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0)}))

# learning policy
#max_epochs = 150
max_epochs = 2
train_cfg = dict(
    type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

param_scheduler = [
    dict(
        type='MultiStepLR',
        begin=0,
        end=max_epochs,
        by_epoch=True,
        milestones=[100],
        gamma=0.1)
]

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (8 GPUs) x (2 samples per GPU)
auto_scale_lr = dict(base_batch_size=16)
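`MultiStepLR` multiplies the learning rate by `gamma` each time training passes a milestone epoch. With the upstream `max_epochs = 150`, the single milestone at epoch 100 drops the LR from 1e-4 to 1e-5 for the last 50 epochs. With `max_epochs = 2` and `end=max_epochs`, the milestone is never reached, so this short run trains at a constant 1e-4. The small sketch below computes that schedule. `multistep_lr` is hypothetical, and it ignores the backbone's `lr_mult=0.1`.

```python
from bisect import bisect_right
from typing import Sequence


def multistep_lr(base_lr: float, epoch: int, milestones: Sequence[int], gamma: float = 0.1) -> float:
    """LR in effect during 0-based `epoch` under a MultiStepLR-style schedule."""
    # Count the milestones already passed; each one multiplies the LR by gamma.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)
```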


In [None]:
'''
default_scope
default_hooks
env_cfg
vis_backends
visualizer
log_processor
log_level
load_from
resume

dataset_type
data_root
backend_args
train_pipeline
test_pipeline
train_dataloader
val_dataloader
test_dataloader
val_evaluator
test_evaluator

model
train_pipeline
train_dataloader
optim_wrapper
max_epochs
train_cfg
val_cfg
test_cfg
param_scheduler
auto_scale_lr

----

dict(
    model=model,
    train_pipeline=train_pipeline,
    train_dataloader=train_dataloader,
    optim_wrapper=optim_wrapper,
    max_epochs=max_epochs,
    train_cfg=train_cfg,
    val_cfg=val_cfg,
    test_cfg=test_cfg,
    param_scheduler=param_scheduler,
    auto_scale_lr=auto_scale_lr,
    backend_args=backend_args,
    test_pipeline=test_pipeline,
    val_dataloader=val_dataloader,
    test_dataloader=test_dataloader,
    val_evaluator=val_evaluator,
    test_evaluator=test_evaluator,
    default_scope=default_scope,
    default_hooks=default_hooks,
    env_cfg=env_cfg,
    vis_backends=vis_backends,
    visualizer=visualizer,
    log_processor=log_processor,
    log_level=log_level,
    load_from=load_from,
    resume=resume,
)
'''



In [None]:
CFG = dict(
    model=model,
    train_pipeline=train_pipeline,
    train_dataloader=train_dataloader,
    optim_wrapper=optim_wrapper,
    max_epochs=max_epochs,
    train_cfg=train_cfg,
    val_cfg=val_cfg,
    test_cfg=test_cfg,
    param_scheduler=param_scheduler,
    auto_scale_lr=auto_scale_lr,
    backend_args=backend_args,
    test_pipeline=test_pipeline,
    val_dataloader=val_dataloader,
    test_dataloader=test_dataloader,
    val_evaluator=val_evaluator,
    test_evaluator=test_evaluator,
    default_scope=default_scope,
    default_hooks=default_hooks,
    env_cfg=env_cfg,
    vis_backends=vis_backends,
    visualizer=visualizer,
    log_processor=log_processor,
    log_level=log_level,
    load_from=load_from,
    resume=resume,
)
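Two things seem worth checking before handing `CFG` to mmengine.

First, in my reading of mmengine's `Runner.from_cfg`, `model` and `work_dir` are looked up with `cfg['...']` rather than `cfg.get(...)`. A config built from a plain dict has no filename from which `tools/train.py` could derive `./work_dirs/<name>`, so it needs `work_dir` set explicitly.

Second, the Runner appears to require each train/val/test group to be either fully specified or fully absent.

Below is a stdlib sketch of both checks. `check_runner_cfg` and the groupings are my assumptions, so verify them against your installed mmengine.

```python
from typing import Any, Dict, List

# Groups that, as I understand mmengine's Runner, must be all set or all unset.
RUNNER_GROUPS = {
    "train": ("train_dataloader", "train_cfg", "optim_wrapper"),
    "val": ("val_dataloader", "val_cfg", "val_evaluator"),
    "test": ("test_dataloader", "test_cfg", "test_evaluator"),
}
# Keys that Runner.from_cfg appears to index directly.
REQUIRED_KEYS = ("model", "work_dir")


def check_runner_cfg(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = [f"missing required key {key!r}" for key in REQUIRED_KEYS if cfg.get(key) is None]
    for group, keys in RUNNER_GROUPS.items():
        present = [key for key in keys if cfg.get(key) is not None]
        if present and len(present) != len(keys):
            missing = [key for key in keys if key not in present]
            problems.append(f"{group} group incomplete, missing {missing}")
    return problems
```

For this notebook's `CFG`, I'd expect the only complaint to be the missing `work_dir`. Setting something like `cfg.work_dir = './work_dirs/detr_r50_8xb2-150e_coco'` (the path `tools/train.py` would derive from the config filename) should satisfy it. For a config without `runner_type`, `tools/train.py` then builds the runner with `Runner.from_cfg(cfg)` and calls `runner.train()`.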


# Run it
Okay, after browsing the `mmdet` package and subpackages (particularly
`mmdet.apis` and `mmdet.engine`), it's clear that you need to write your own
code to drive training.

But how?

Well, take a look at mmdet's own training script:

https://github.com/open-mmlab/mmdetection/blob/main/tools/train.py

It builds an `mmengine` `Runner` from the config, so it's clear that we must now use `mmengine`!

## MMEngine

Let's first check out what we can get from the `mmengine` package,
then reference mmdet's `tools/train.py`.

Here's a snippet of `help(mmengine)`:

```
PACKAGE CONTENTS
* mmengine
  * _strategy
    - base
    - colossalai
    - deepspeed
    - distributed
    - fsdp
    - single_device
    - utils
  * analysis
    - complexity_analysis
    - jit_analysis
    - jit_handles
    - print_helper
  * config
    - config
    - lazy
    - utils
  * dataset
    - base_dataset
    - dataset_wrapper
    - sampler
    - utils
  * device
    - utils
  * dist
    - dist
    - utils
  * evaluator
    - evaluator
    - metric
    - utils
  * fileio
    * backends
      - base
      - http_backend
      - lmdb_backend
      - local_backend
      - memcached_backend
      - petrel_backend
      - registry_utils
    - file_client
    * handlers
      - base
      - json_handler
      - pickle_handler
      - registry_utils
      - yaml_handler
    - io
    - parse
  * hooks
    - checkpoint_hook
    - early_stopping_hook
    - ema_hook
    - empty_cache_hook
    - hook
    - iter_timer_hook
    - logger_hook
    - naive_visualization_hook
    - param_scheduler_hook
    - profiler_hook
    - runtime_info_hook
    - sampler_seed_hook
    - sync_buffer_hook
    - test_time_aug_hook
  * hub
    - hub
  * infer
    - infer
  * logging
    - history_buffer
    - logger
    - message_hub
  * model
    - averaged_model
    * base_model
      - base_model
      - data_preprocessor
    - base_module
    - efficient_conv_bn_eval
    - test_time_aug
    - utils
    - weight_init
    * wrappers
      - distributed
      - fully_sharded_distributed
      - seperate_distributed
      - utils
  * optim
    * optimizer
      - amp_optimizer_wrapper
      - apex_optimizer_wrapper
      - base
      - builder
      - default_constructor
      - optimizer_wrapper
      - optimizer_wrapper_dict
      - zero_optimizer
    * scheduler
      - lr_scheduler
      - momentum_scheduler
      - param_scheduler
  * registry
    - build_functions
    - default_scope
    - registry
    - root
    - utils
  * runner
    - _flexible_runner
    - activation_checkpointing
    - amp
    - base_loop
    - checkpoint
    - log_processor
    - loops
    - priority
    - runner
    - utils
  * structures
    - base_data_element
    - instance_data
    - label_data
    - pixel_data
  * testing
    * _internal
      - distributed
    - compare
    - runner_test_case
  * utils
    * dl_utils
      - collect_env
      - hub
      - misc
      - parrots_wrapper
      - setup_env
      - time_counter
      - torch_ops
      - trace
      - visualize
    - manager
    - misc
    - package_utils
    - path
    - progressbar
    - progressbar_rich
    - timer
    - version_utils
  - version
  * visualization
    - utils
    - vis_backend
    - visualizer

DATA
    DATASETS
    DATA_SAMPLERS
    EVALUATOR
    FUNCTIONS
    HOOKS
    INFERENCERS
    LOOPS
    METRICS
    MODELS
    MODEL_WRAPPERS
    OPTIMIZERS
    OPTIM_WRAPPERS
    OPTIM_WRAPPER_CONSTRUCTORS
    PARAM_SCHEDULERS
    RUNNERS
    RUNNER_CONSTRUCTORS
    STRATEGIES
    TASK_UTILS
    VISBACKENDS
    VISUALIZERS
    WEIGHT_INITIALIZERS
```

In [None]:
# Import mmengine's root registries (as listed under DATA in help(mmengine) above).
from mmengine import (
    DATASETS, DATA_SAMPLERS, EVALUATOR, FUNCTIONS, HOOKS,
    INFERENCERS, LOOPS, METRICS, MODELS, MODEL_WRAPPERS, OPTIMIZERS,
    OPTIM_WRAPPERS, OPTIM_WRAPPER_CONSTRUCTORS, PARAM_SCHEDULERS, RUNNERS,
    RUNNER_CONSTRUCTORS, STRATEGIES, TASK_UTILS, VISBACKENDS, VISUALIZERS,
    WEIGHT_INITIALIZERS
)
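These are mmengine's root registries. mmdet registers its own modules (`DETR`, `CocoDataset`, `DetLocalVisualizer`, ...) in child registries under `mmdet.registry`. `default_scope = 'mmdet'` roughly tells mmengine to resolve `type` names there.

The core idea fits in a few lines: map a config's `type` string to a registered class, then pass the remaining keys as constructor arguments. This is a toy illustration, not mmengine's `Registry`, which adds scopes, parent/child lookup, and lazy imports. Every name in it is hypothetical and deliberately avoids shadowing the imported registries.

```python
from typing import Any, Callable, Dict, Type


class ToyRegistry:
    """Map type names to classes and build objects from config dicts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, Type] = {}

    def register_module(self) -> Callable[[Type], Type]:
        """Return a class decorator that records the class under its own name."""
        def decorator(cls: Type) -> Type:
            if cls.__name__ in self._modules:
                raise KeyError(f"{cls.__name__} is already registered in {self.name}")
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        """Instantiate cfg['type'] with the remaining keys as keyword arguments."""
        args = dict(cfg)  # copy so the caller's config is left untouched
        type_name = args.pop("type")
        if type_name not in self._modules:
            raise KeyError(f"{type_name} is not registered in {self.name}")
        return self._modules[type_name](**args)


TOY_MODELS = ToyRegistry("toy_models")


@TOY_MODELS.register_module()
class ToyDetector:
    def __init__(self, num_queries: int = 100) -> None:
        self.num_queries = num_queries


if __name__ == "__main__":
    model = TOY_MODELS.build(dict(type="ToyDetector", num_queries=100))
    print(type(model).__name__, model.num_queries)
```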


In [None]:
import os

from mmengine.config import Config, DictAction
from mmengine.registry import RUNNERS
from mmengine.runner import Runner

from mmdet.utils import setup_cache_size_limit_of_dynamo


In [None]:
cfg = Config(CFG)


In [None]:
cfg


Config (path: None): {'model': {'type': 'DETR', 'num_queries': 100, 'data_preprocessor': {'type': 'DetDataPreprocessor', 'mean': [123.675, 116.28, 103.53], 'std': [58.395, 57.12, 57.375], 'bgr_to_rgb': True, 'pad_size_divisor': 1}, 'backbone': {'type': 'ResNet', 'depth': 50, 'num_stages': 4, 'out_indices': (3,), 'frozen_stages': 1, 'norm_cfg': {'type': 'BN', 'requires_grad': False}, 'norm_eval': True, 'style': 'pytorch', 'init_cfg': {'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'}}, 'neck': {'type': 'ChannelMapper', 'in_channels': [2048], 'kernel_size': 1, 'out_channels': 256, 'act_cfg': None, 'norm_cfg': None, 'num_outs': 1}, 'encoder': {'num_layers': 6, 'layer_cfg': {'self_attn_cfg': {'embed_dims': 256, 'num_heads': 8, 'dropout': 0.1, 'batch_first': True}, 'ffn_cfg': {'embed_dims': 256, 'feedforward_channels': 2048, 'num_fcs': 2, 'ffn_drop': 0.1, 'act_cfg': {'type': 'ReLU', 'inplace': True}}}}, 'decoder': {'num_layers': 6, 'layer_cfg': {'self_attn_cfg': {'embed_dims': 2