# Evaluation

- Compute the average precision (AP) and average recall (AR) of a given model on the test dataset.
- The metrics are the ones defined for the COCO dataset evaluation.
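
As a reminder of what `IoU=0.50:0.95` means in the COCO summary, here is a minimal sketch in plain Python. COCO averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The `box_iou` helper and the two example boxes are illustrative assumptions, not part of maskflow or pycocotools.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Hypothetical ground-truth and predicted boxes (illustration only).
iou = box_iou((0, 0, 10, 10), (2, 0, 12, 10))

# A detection counts as a match at every threshold its IoU reaches.
n_passed = sum(iou >= t for t in thresholds)
print(f"IoU = {iou:.3f}, matched at {n_passed} of {len(thresholds)} thresholds")
```

A prediction that overlaps its target only loosely still counts at `IoU=0.50` but not at the stricter thresholds, which is why AP@0.50 is always at least as high as the averaged AP.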

In [3]:
%env CUDA_VISIBLE_DEVICES=1
%load_ext autoreload
%autoreload 2
from pathlib import Path

import matplotlib.pyplot as plt
import tqdm
import torch

import sys; sys.path.append("../")
import maskflow

root_dir = Path("/home/hadim/.data/Neural_Network/Maskflow/C_elegans")
data_dir = root_dir / "Data"
model_dir = root_dir / "Models"
model_dir.mkdir(exist_ok=True)

# Import the configuration associated with this dataset and network.
config = maskflow.config.load_config(root_dir / "config.yaml")

env: CUDA_VISIBLE_DEVICES=1
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [4]:
# Select the model
model_name = '2018.11.10-23:30:55'
model_path = model_dir / model_name

# Set some configurations
config['MODEL']['DEVICE'] = "cuda"
config['DATALOADER']['NUM_WORKERS'] = 16
config['TEST']['IMS_PER_BATCH'] = 16
    
# Run the evaluation
# (it will create a file called evaluation.json in `model_path`)
results = maskflow.inference.run_evaluation(config, model_path, data_dir)

2018-11-10 23:47:58,230:INFO:maskrcnn_benchmark.utils.checkpoint: Loading checkpoint from /home/hadim/.data/Neural_Network/Maskflow/C_elegans/Models/2018.11.10-23:30:55/model_0000250.pth
2018-11-10 23:47:58,653:INFO:maskrcnn_benchmark.inference: Start evaluation on 20 images


loading annotations into memory...
Done (t=0.06s)
creating index...
index created!


  "See the documentation of nn.Upsample for details.".format(mode))
2it [00:08,  5.64s/it]
2018-11-10 23:48:08,401:INFO:maskrcnn_benchmark.inference: Total inference time: 0:00:09.743486 (0.48717432022094725 s / img per device, on 1 devices)
2018-11-10 23:48:08,402:INFO:maskrcnn_benchmark.inference: Preparing results for COCO format
2018-11-10 23:48:08,404:INFO:maskrcnn_benchmark.inference: Preparing bbox results
2018-11-10 23:48:08,419:INFO:maskrcnn_benchmark.inference: Preparing segm results
20it [00:00, 85.18it/s]
2018-11-10 23:48:08,665:INFO:maskrcnn_benchmark.inference: Evaluating predictions


Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.32s).
Accumulating evaluation results...
DONE (t=0.04s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.622
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.880
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.693
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.634
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.665
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.057
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.564
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.689
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1

2018-11-10 23:48:09,512:INFO:maskrcnn_benchmark.inference: OrderedDict([('bbox', OrderedDict([('AP', 0.6222989867726723), ('AP50', 0.8799681542576778), ('AP75', 0.6933596511797944), ('APs', 0.6344390275095129), ('APm', 0.6649455794430983), ('APl', -1.0)])), ('segm', OrderedDict([('AP', 0.5057224507325584), ('AP50', 0.8040537209755729), ('AP75', 0.5941074543219045), ('APs', 0.46026071657222634), ('APm', 0.5761194578043566), ('APl', -1.0)]))])


DONE (t=0.36s).
Accumulating evaluation results...
DONE (t=0.04s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.804
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.594
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.460
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.576
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.048
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.480
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.592
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.545
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.660
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= la

---

Microtubule, `2018.11.10-16:39:38`:

| Step | mAP | mAR | AP@0.50 | AP@0.75 |
| --- | --- | --- | --- | --- |
| 750 | 0.33 | 0.02 | 0.62 | 0.32 |
| 1000 | 0.33 | 0.02 | 0.61 | 0.31 |
| 1500 | 0.34 | 0.02 | 0.61 | 0.33 |
| 2500 | 0.36 | 0.019 | 0.65 | 0.36 |
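
Rows like the ones above can be built from the metrics dictionary logged at the end of an evaluation. This is a minimal sketch, assuming the nested layout shown in the log (`bbox`/`segm`, then `AP`, `AP50`, `AP75`, ...). `format_row` is a hypothetical helper, not part of maskflow. The logged dictionary does not contain a recall entry, so mAR is passed separately. The example values are copied from the C. elegans log above (checkpoint `model_0000250.pth`), not from the Microtubule model.

```python
def format_row(step, metrics, task="bbox", mar=None):
    """Return one markdown row: Step | mAP | mAR | AP@0.50 | AP@0.75."""

    def fmt(value):
        # pycocotools reports -1 when a metric has no valid entries
        # (e.g. no large objects in the dataset); show those as n/a.
        return "n/a" if value is None or value < 0 else f"{value:.2f}"

    m = metrics[task]
    cells = [str(step), fmt(m["AP"]), fmt(mar), fmt(m["AP50"]), fmt(m["AP75"])]
    return "| " + " | ".join(cells) + " |"


# Values copied from the evaluation log of the C. elegans model.
metrics = {
    "bbox": {"AP": 0.6222989867726723, "AP50": 0.8799681542576778,
             "AP75": 0.6933596511797944, "APl": -1.0},
    "segm": {"AP": 0.5057224507325584, "AP50": 0.8040537209755729,
             "AP75": 0.5941074543219045, "APl": -1.0},
}

print(format_row(250, metrics, task="bbox"))
print(format_row(250, metrics, task="segm"))
```

Which task (`bbox` or `segm`) and which AR variant the table reports is not stated here, so both are left to the caller.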