## Model Evaluation

This notebook evaluates the trained Faster R-CNN ResNet-101 model on the 153-image test set using the TensorFlow Object Detection API evaluation script. Model performance is assessed using COCO evaluation metrics, with a particular focus on mean Average Precision (mAP) at IoU thresholds of 0.50 and 0.75.

The evaluation runs once on the final checkpoint (step 20,000) to measure generalization performance on completely unseen data.

### What is Evaluation?

**Evaluation** means running the trained model on the **test set** (10% of our data that was NOT used for training) and measuring:
- **mAP @ IoU 0.50** - Mean Average Precision when a detection only counts as correct if its box overlaps the ground-truth box with an Intersection over Union (IoU) of at least 0.50
- **mAP @ IoU 0.75** - The same metric with a required IoU of at least 0.75 (stricter, so it rewards tighter boxes)
- **mAP (mean Average Precision)** - Overall detection accuracy across all three species; the headline COCO figure averages it over IoU thresholds from 0.50 to 0.95
- **Evaluation Loss** - The loss computed on the unseen test images, which can be compared with the training loss
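
To make the IoU thresholds in the list above concrete, here is a minimal sketch of how IoU is computed for two axis-aligned boxes. The `iou` helper and the example boxes are illustrative assumptions, not code from the Object Detection API; boxes are taken to be `(ymin, xmin, ymax, xmax)` in pixels.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (ymin, xmin, ymax, xmax)."""
    # Corners of the overlapping rectangle (if any)
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give an empty intersection
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: a 100x100 ground-truth box and a prediction shifted 20 px right
ground_truth = (0, 0, 100, 100)
prediction = (0, 20, 100, 120)

score = iou(ground_truth, prediction)  # 8000 / 12000
print(f"IoU = {score:.3f}")
print("Counts as a match at IoU 0.50:", score >= 0.50)
print("Counts as a match at IoU 0.75:", score >= 0.75)
```

In this example the same prediction is accepted at the 0.50 threshold but rejected at 0.75, which is why mAP @ IoU 0.75 is usually the lower of the two numbers.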

### Why Evaluation is Critical

Training loss only tells us how well the model learned the training images. It could be "memorizing" instead of truly learning. Evaluation metrics tell us if the model can generalize to NEW images.

In [19]:
from IPython.display import HTML
HTML("""
<style>
/* Wrap long code and outputs */
pre, code { 
  white-space: pre-wrap !important;
  word-wrap: break-word !important;
}

/* Extra: make code smaller only when printing */
@media print {
  pre, code { font-size: 9pt !important; }
}
</style>
""")

## Import Required Libraries

In [4]:
import os
import sys
import glob
import subprocess
from pathlib import Path

print("Libraries imported successfully")
print(f"Python interpreter: {sys.executable}")

Libraries imported successfully
Python interpreter: C:\Users\MSC1\anaconda3\envs\Env-7144COMP\python.exe


## Configuration Paths

In [20]:
MODEL_DIR = r"C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\object_detection\training\TF2\faster_rcnn_output"
PIPELINE_CONFIG = r"C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\object_detection\training\TF2\training\faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8\pipeline.config"
MODEL_MAIN = r"C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\models\research\object_detection\model_main_tf2.py"

print("Evaluation paths configured.")

Evaluation paths configured.
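
Since the evaluation script is launched as a subprocess, a wrong path only shows up later as an error buried in its output. The sketch below is one possible way to check the configured paths up front; `find_missing` is a hypothetical helper, and the demo uses a temporary directory in place of the real locations.

```python
import tempfile
from pathlib import Path


def find_missing(paths):
    """Return the names in `paths` (name -> path string) that do not exist on disk."""
    return [name for name, location in paths.items() if not Path(location).exists()]


# Demo with placeholder paths: the directory exists, the config file does not
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = {
        "MODEL_DIR": tmp,
        "PIPELINE_CONFIG": str(Path(tmp) / "pipeline.config"),
    }
    demo_missing = find_missing(demo_paths)

print("Missing:", demo_missing)
```

In this notebook the call would be `find_missing({"MODEL_DIR": MODEL_DIR, "PIPELINE_CONFIG": PIPELINE_CONFIG, "MODEL_MAIN": MODEL_MAIN})`, and an empty list would mean all three locations were found.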


## Configure Python Paths for TF Object Detection API

In [21]:
import os, sys

TFOD_ROOT = r"C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\models"
RESEARCH_DIR = os.path.join(TFOD_ROOT, "research")
SLIM_DIR = os.path.join(RESEARCH_DIR, "slim")

for p in [RESEARCH_DIR, SLIM_DIR]:
    if p not in sys.path:
        sys.path.insert(0, p)

os.environ["PYTHONPATH"] = RESEARCH_DIR + os.pathsep + SLIM_DIR

print("TF Object Detection API paths configured.")

TF Object Detection API paths configured.


In [22]:
from object_detection import model_lib_v2
print("✅ object_detection import works!")

✅ object_detection import works!


## Run Model Evaluation on Test Set

In [18]:
import os
import sys
import subprocess

# Paths
MODEL_DIR = r"C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\object_detection\training\TF2\faster_rcnn_output"

PIPELINE_CONFIG = r"C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\object_detection\training\TF2\training\faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8\pipeline.config"

MODEL_MAIN = r"C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\models\research\object_detection\model_main_tf2.py"

print("MODEL_DIR:", MODEL_DIR)
print("PIPELINE_CONFIG:", PIPELINE_CONFIG)
print("MODEL_MAIN:", MODEL_MAIN)

# Build evaluation command
eval_command = [
    sys.executable,
    MODEL_MAIN,
    f"--pipeline_config_path={PIPELINE_CONFIG}",
    f"--model_dir={MODEL_DIR}",
    f"--checkpoint_dir={MODEL_DIR}",
    "--run_once=True"
]

print("\nRunning evaluation command:\n")
print(" ".join(eval_command))

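# Run evaluation (the child process inherits the notebook kernel's environment)
result = subprocess.run(eval_command, capture_output=True, text=True)

# A non-zero return code means the script failed; the cause is usually at the end of STDERR
print("\nReturn code:", result.returncode)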

print("\n==== STDOUT (last 5000 chars) ====")
print(result.stdout[-5000:])

print("\n==== STDERR (last 5000 chars) ====")
print(result.stderr[-5000:])

MODEL_DIR: C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\object_detection\training\TF2\faster_rcnn_output
PIPELINE_CONFIG: C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\object_detection\training\TF2\training\faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8\pipeline.config
MODEL_MAIN: C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\models\research\object_detection\model_main_tf2.py

Running evaluation command:

C:\Users\MSC1\anaconda3\envs\Env-7144COMP\python.exe C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\models\research\object_detection\model_main_tf2.py --pipeline_config_path=C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\object_detection\training\TF2\training\faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8\pipeline.config --model_dir=C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\object_detection\training\TF2\faster_rcnn_output --checkpoint_dir=C:\Users\MSC1\Desktop\Tensorflow-Object-Detection-API\Base\v1\o

## Evaluation Result Summary
The trained Faster R-CNN ResNet-101 model was evaluated on the held-out test set containing 153 images using the TensorFlow Object Detection API evaluation script. Evaluation was performed once using the final checkpoint at step 20,000 to measure generalisation performance on unseen data.

### COCO Evaluation Metrics

| Metric | Value | Interpretation |
|------|------|--------------|
| mAP@[0.50:0.95] | 0.660 | Overall detection performance across multiple IoU thresholds |
| mAP@0.50 | 0.954 | High detection and classification performance under moderate localisation constraints |
| mAP@0.75 | 0.811 | Strong localisation performance under stricter overlap requirements |
| AR@100 | 0.750 | The model retrieves most ground-truth objects when up to 100 detections are allowed |
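
The values in a table like this can be pulled out of the captured evaluation output rather than copied by hand. The sketch below assumes the COCO summary lines follow the layout that pycocotools prints; `SAMPLE_LOG` is a hand-written sample in that layout using the values from the table above, not the actual captured output, and `parse_coco_summary` is a hypothetical helper.

```python
import re

# Hand-written sample in the pycocotools summary layout (values taken from the table)
SAMPLE_LOG = """
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.660
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.954
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.811
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.750
"""

SUMMARY_LINE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\) @\[ IoU=([\d.:]+)\s*"
    r"\| area=\s*(\w+) \| maxDets=\s*(\d+) \] = (-?\d+\.\d+)"
)


def parse_coco_summary(text):
    """Map (kind, iou_range, area, max_dets) to the reported value."""
    metrics = {}
    for kind, iou_range, area, max_dets, value in SUMMARY_LINE.findall(text):
        metrics[(kind, iou_range, area, int(max_dets))] = float(value)
    return metrics


metrics = parse_coco_summary(SAMPLE_LOG)
for key, value in metrics.items():
    print(key, value)
```

Depending on how logging is configured, the summary may land in either stream, so in this notebook it would be reasonable to try `parse_coco_summary(result.stdout + result.stderr)`.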

### Interpretation of Results

The high mAP@0.50 score (0.954) indicates that the model reliably detects and classifies the three wildlife species when a moderate bounding box overlap is required. The drop to 0.811 at IoU 0.75 is expected, as stricter overlap thresholds place higher demands on precise localisation, particularly for animals with variable poses, occlusions, and scale differences.

The absence of small and medium object metrics reflects the dataset characteristics, where most annotated animals fall into the large object category. Overall, the evaluation results show strong generalisation performance on the test set and suggest that the model has learned robust wildlife-specific features.
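
For context on the size categories mentioned above: COCO groups objects by bounding box area, with small below 32² pixels, medium between 32² and 96² pixels, and large above that. When a category contains no ground-truth objects, pycocotools reports its metric as -1.000 instead of a real score. The sketch below shows the bucketing; `coco_size_bucket` and the example box sizes are hypothetical, and the handling of boxes exactly on a boundary is an assumption.

```python
def coco_size_bucket(width, height):
    """Assign a box to a COCO size category based on its area in pixels."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# Hypothetical box sizes in pixels (width, height), not taken from the dataset
example_boxes = [(20, 20), (64, 48), (300, 220)]
buckets = [coco_size_bucket(w, h) for w, h in example_boxes]

for (w, h), bucket in zip(example_boxes, buckets):
    print(f"{w}x{h} px -> {bucket}")
```

Running this over the width and height of every test annotation would show directly how many objects fall into each category.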

