All model configurations and weights here are fine-tuned from models available in the [Detectron2 Model Zoo](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md).

The models are trained on 512x512 px RGB image patches with a spatial resolution between 3.9 cm and 4.3 cm.
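As a rough guide to the ground area one patch covers, the arithmetic can be sketched as below. This assumes the stated resolution is the ground sampling distance per pixel; `patch_side_m` is a hypothetical helper name used only for illustration.

```python
def patch_side_m(patch_size_px, gsd_cm):
    """Side length of a square patch in meters, given pixel size in cm (assumed GSD)."""
    return patch_size_px * gsd_cm / 100


for gsd_cm in (3.9, 4.3):
    print(f"{gsd_cm} cm/px: {patch_side_m(512, gsd_cm):.1f} m per side")
```

Under that assumption, each 512x512 px patch spans roughly 20 to 22 meters per side.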

## Running models for image patches

For individual image patches, the models are fairly straightforward to run. 

```python
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.data import build_detection_test_loader
import cv2

cfg = get_cfg()
cfg.merge_from_file('<path_to_model_config>')
cfg.OUTPUT_DIR = '<path_to_output>'
cfg.MODEL.WEIGHTS = '<path_to_weights>'
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # score threshold for detections
predictor = DefaultPredictor(cfg)

img = cv2.imread('<path_to_image_patch>') # cv2 reads images in BGR channel order
outputs = predictor(img)
```
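The predictor returns a dict whose `"instances"` entry holds the detections, with the standard Detectron2 fields `pred_boxes`, `scores`, `pred_classes` and `pred_masks`. A minimal sketch for turning these into plain Python records follows. `summarize_instances` is a hypothetical helper, not part of Detectron2 or drone_detector, and the optional `class_names` list is an assumption that must match the category order the model was trained with.

```python
def summarize_instances(instances, class_names=None):
    """Convert Detectron2-style instances into a list of plain dicts.

    `class_names` is an optional, user-supplied list mapping class index
    to label (an assumption, not read from the model).
    """
    instances = instances.to("cpu")  # move tensors off the GPU first
    boxes = instances.pred_boxes.tensor.tolist()  # one [x1, y1, x2, y2] per detection
    scores = instances.scores.tolist()
    classes = instances.pred_classes.tolist()

    rows = []
    for box, score, cls in zip(boxes, scores, classes):
        rows.append({
            "label": class_names[cls] if class_names else cls,
            "score": round(score, 3),
            "box_xyxy": [round(v, 1) for v in box],
        })
    return rows
```

With the snippet above, it could be called as `summarize_instances(outputs["instances"])`. Masks are available separately as a boolean tensor in `outputs["instances"].pred_masks`.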

More examples are shown in [Patch level results](4_patch_level_results.ipynb).

## Running models for larger scenes

Running on larger scenes requires the following steps:

1. Tiling the scenes into smaller image patches, optionally with overlap
2. Running the model on these smaller patches
3. Gathering the predictions into a single GIS data file
4. Optionally post-processing the results
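To illustrate the tiling step, the sketch below computes overlapping pixel windows for a scene. It is a simplified illustration and not necessarily how drone_detector tiles images: the function names are hypothetical, and placing an extra tile flush with the far edge when the stride leaves a gap is an assumption.

```python
def tile_origins(length, tile_size, overlap):
    """Start offsets of tiles along one axis, covering the range [0, length)."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be in [0, tile_size)")
    if length <= tile_size:
        return [0]
    stride = tile_size - overlap
    origins = list(range(0, length - tile_size + 1, stride))
    # Assumption: add a final tile flush with the far edge if a gap remains.
    if origins[-1] + tile_size < length:
        origins.append(length - tile_size)
    return origins


def tile_windows(width, height, tile_size=512, overlap=256):
    """Return (x_min, y_min, x_max, y_max) pixel windows covering the scene."""
    return [
        (x, y, min(x + tile_size, width), min(y + tile_size, height))
        for y in tile_origins(height, tile_size, overlap)
        for x in tile_origins(width, tile_size, overlap)
    ]


windows = tile_windows(2560, 2560)
print(len(windows), windows[0], windows[-1])
```

Each window can then be read from the scene, passed to the predictor, and its predictions shifted by `(x_min, y_min)` to get scene coordinates.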

The [drone_detector](https://jaeeolma.github.io/drone_detector) package has helpers for this:

```python
from drone_detector.engines.detectron2.predict import predict_instance_masks

predict_instance_masks(path_to_model_config='<path_to_model_config>', # model config file
                       path_to_image='<path_to_image>', # which image to process
                       outfile='<name_for_predictions>.geojson', # where to save the results
                       processing_dir='temp', # directory for temporary files, deleted afterwards. Default: temp
                       tile_size=512, # image patch size in pixels, square patches. Default: 400
                       tile_overlap=256, # overlap between tiles. Default: 100
                       smooth_preds=False, # not yet implemented; planned to run dilation+erosion to smooth polygons. Default: False
                       coco_set='<path_to_coco>', # the coco set the model was trained on to infer the classes. If empty, defaults to dummy categories. Default: None
                       postproc_results=True # whether to discard masks in the edge regions of patches. Default: False
                      )
```

After installing the package, `predict_instance_masks_detectron2` can also be used as a CLI command with identical syntax.

## Available models and results

Models are trained only with the Hiidenportti dataset.

Patch-level data are non-overlapping 512x512 pixel tiles extracted from larger virtual plots. The results presented here are run with test-time augmentation.

Scene-level data are the full virtual plots extracted from the full images. For Hiidenportti, the virtual plot sizes vary between 2560x2560 px and 8192x4864 px. These plots also contain non-annotated buffer areas so that the complete annotated area can be extracted. For Sudenpesänkangas, all 71 scenes are 100x100 meters (2063x2062 pixels), and during inference they are extracted from the full mosaic with enough buffer to cover the full area. The results presented here are run on 512x512 pixel tiles with 256 px overlap, with both the edge filtering and mask merging described in the workflow.
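The idea behind edge filtering can be sketched as follows. This is one possible rule, stated here as an assumption and not necessarily the one used in the workflow: keep a detection only if its box centroid falls in the central part of its tile, except on sides where the tile touches the scene border. With 512 px tiles, 256 px overlap and a 128 px margin, these central regions cover the scene without overlapping. The function name and the `margin` parameter are hypothetical.

```python
def in_core_region(box, window, scene_size, margin=128):
    """Check whether a detection's centroid lies in the tile's core region.

    box: (x_min, y_min, x_max, y_max) in tile-local pixels
    window: (x_min, y_min, x_max, y_max) of the tile in scene pixels
    scene_size: (width, height) of the scene in pixels
    margin: hypothetical edge width to discard, in pixels
    """
    bx0, by0, bx1, by1 = box
    wx0, wy0, wx1, wy1 = window
    scene_w, scene_h = scene_size

    # Box centroid in scene coordinates.
    cx = wx0 + (bx0 + bx1) / 2
    cy = wy0 + (by0 + by1) / 2

    # Shrink the tile by `margin`, but not on sides that lie on the scene border.
    left = wx0 if wx0 == 0 else wx0 + margin
    top = wy0 if wy0 == 0 else wy0 + margin
    right = wx1 if wx1 == scene_w else wx1 - margin
    bottom = wy1 if wy1 == scene_h else wy1 - margin

    return left <= cx < right and top <= cy < bottom


tile = (256, 0, 768, 512)
print(in_core_region((0, 0, 100, 100), tile, (2560, 2560)))      # near the tile's left edge
print(in_core_region((200, 200, 300, 300), tile, (2560, 2560)))  # in the middle of the tile
```

Objects near a tile edge are then taken from the neighboring tile in which they appear closer to the center, which reduces truncated masks before any merging step.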

### Hiidenportti test set

The Hiidenportti test set contains 241 non-overlapping 512x512 pixel image patches, extracted from 5 scenes that cover 11 circular field plots.

|Model|Patch AP50|Patch AP|Patch AP-groundwood|Patch AP-uprightwood|Scene AP50|Scene AP|Scene AP-groundwood|Scene AP-uprightwood|
|----|-----------|--------|-------------------|--------------------|-----------|--------|-------------------|--------------------|
|mask_rcnn_R_50_FPN_3x|-|-|-|-|-|-|-|-|
|mask_rcnn_R_101_FPN_3x|0.704|0.366|0.326|0.406|0.596|0.284|0.244|0.324|
|mask_rcnn_X_101_32x8d_FPN_3x|-|-|-|-|-|-|-|-|
|cascade_mask_rcnn_R_50_FPN_3x|-|-|-|-|-|-|-|-|

### Sudenpesänkangas dataset

The Sudenpesänkangas dataset contains 798 non-overlapping 512x512 pixel image patches, extracted from 71 scenes.

|Model|Patch AP50|Patch AP|Patch AP-groundwood|Patch AP-uprightwood|Scene AP50|Scene AP|Scene AP-groundwood|Scene AP-uprightwood|
|----|-----------|--------|-------------------|--------------------|-----------|--------|-------------------|--------------------|
|mask_rcnn_R_50_FPN_3x|-|-|-|-|-|-|-|-|
|mask_rcnn_R_101_FPN_3x|0.519|0.252|0.183|0.321|0.480|0.220|0.153|0.286|
|mask_rcnn_X_101_32x8d_FPN_3x|-|-|-|-|-|-|-|-|
|cascade_mask_rcnn_R_50_FPN_3x|-|-|-|-|-|-|-|-|