GorillaWatch: An Automated System for In-the-Wild Gorilla Re-Identification and Population Monitoring
Paper accepted at WACV 2026 | arXiv:2512.07776
GorillaWatch is an automated system for in-the-wild gorilla re-identification and population monitoring. Monitoring critically endangered western lowland gorillas has historically required immense manual effort to re-identify individuals across vast archives of camera-trap footage. GorillaWatch addresses this challenge with an end-to-end pipeline that integrates detection, tracking, and re-identification. We leverage multi-frame self-supervised pretraining and show that aggregating features from large-scale image backbones outperforms specialized video architectures for this task.
- **Novel Benchmark Datasets**: Three large-scale, in-the-wild datasets for gorilla analysis:
  - Gorilla-SPAC-Wild: the largest video dataset for wild primate re-identification
  - Gorilla-Berlin-Zoo: cross-domain re-identification generalization assessment
  - Gorilla-SPAC-MoT: multi-object tracking in camera trap footage
- **End-to-End Detection & Tracking Pipeline**: Integrated framework for automatic gorilla detection, tracking, and re-identification from video
- **Multi-Frame Self-Supervised Pre-training**: Leverages temporal consistency in tracklets to learn domain-specific features without manual labels
- **Interpretability Verification**: A differentiable adaptation of AttnLRP verifies that the model relies on discriminative biometric traits rather than background correlations
- **Large-Scale Backbone Analysis**: Demonstrates that aggregating features from large-scale image backbones outperforms specialized video architectures
- **Unsupervised Population Counting**: Spatiotemporal constraints integrated into clustering mitigate over-segmentation, enabling accurate population monitoring
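The counting idea can be sketched as agglomerative clustering with cannot-link constraints: two tracklets whose time intervals overlap show two animals at once, so they can never belong to the same individual. Everything below (average linkage, cosine distance, the `threshold` value) is an illustrative assumption, not the repository's implementation.

```python
import numpy as np
from itertools import combinations

def count_individuals(features, intervals, threshold=0.4):
    """Agglomerative clustering of tracklet embeddings with cannot-link
    constraints: tracklets whose time intervals overlap show two animals
    at once, so they must never be merged into one individual."""
    feats = np.asarray(features)
    clusters = [{i} for i in range(len(feats))]

    def overlaps(a, b):  # do the two tracklets appear at the same time?
        return not (intervals[a][1] < intervals[b][0]
                    or intervals[b][1] < intervals[a][0])

    def cannot_link(ca, cb):
        return any(overlaps(a, b) for a in ca for b in cb)

    def dist(ca, cb):  # average-linkage cosine distance (normalized feats)
        return np.mean([1.0 - feats[a] @ feats[b] for a in ca for b in cb])

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            if cannot_link(clusters[i], clusters[j]):
                continue  # spatiotemporal constraint blocks this merge
            d = dist(clusters[i], clusters[j])
            if d < threshold and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return len(clusters)  # no admissible merge left: cluster count = population estimate
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
```

Without the constraint, visually similar gorillas filmed simultaneously would collapse into one cluster and the population would be undercounted.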
The GorillaWatch system comprises:
- Detection & Tracking: Automatic gorilla detection and temporal tracking from video streams
- Re-Identification Backbone: Large-scale image backbones (Vision Transformers, ConvNets) that learn discriminative gorilla embeddings
- Multi-Frame Self-Supervised Learning: Temporal pretraining that leverages consistency across frames in tracklets
- Population Monitoring: Constrained clustering with spatiotemporal constraints for accurate gorilla counting and re-identification
- Interpretability: AttnLRP-based verification to ensure predictions rely on valid biometric features (distinctive markings, body shape) rather than spurious background correlations
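The multi-frame self-supervised component reduces to treating frames of the same tracklet as positive pairs, so tracking output alone supervises the features. A minimal sketch follows; the InfoNCE formulation and the `temperature` value are illustrative assumptions, not necessarily the paper's exact loss:

```python
import numpy as np

def tracklet_infonce(anchors, positives, temperature=0.07):
    """Contrastive objective over tracklets: row i of `positives` is another
    frame of the SAME tracklet as row i of `anchors`; all other rows in the
    batch act as negatives. No manual identity labels are needed, only
    tracking output. Embeddings are assumed L2-normalized."""
    logits = anchors @ positives.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))            # diagonal = matching pair
```

The loss approaches zero when same-tracklet frames are far more similar to each other than to frames from other tracklets.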
Requirements:
- Docker

Using Docker:

```shell
./scripts/run-in-docker.sh -g 0
```

Repository layout:

```
src/gorillawatch/
├── data/                  # Data loading and dataset utilities
├── model/                 # Model architectures and training logic
├── clustering/            # Constrained clustering for evaluation
├── losses/                # Triplet loss and regularization implementations
├── qualitative_evaluate/  # Qualitative analysis utilities
├── utils/                 # Determinism and type helpers
├── train_and_eval.py      # Main training entry point
└── evaluate.py            # Evaluation and inference
```
The easiest way to train a model is the provided shell script:

```shell
./scripts/train.sh
```

This trains a ViT-Small DINOv2 model on the Gorilla-SPAC-Wild dataset with paper-compliant hyperparameters:
- Batch size: 8 (effective: 48, with 6 gradient accumulation steps)
- Epochs: 100, with early stopping (patience = 10)
- Learning rate: 1.9×10⁻⁶ (cosine schedule)
- Regularization: L2 = 0.0059, L2SP = 1.3×10⁻⁷
- Loss: hard triplet mining with margin = 0.647
- Evaluation frequency: every 10 epochs (configurable via EVAL_FREQUENCY)
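Two ingredients from the hyperparameter list can be sketched in a few lines (illustrative assumptions, not the repository's code): batch-hard triplet mining with the listed margin, and the L2-SP term, which penalizes drift from the pretrained weights rather than from zero.

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.647):
    """Batch-hard triplet mining: for each anchor, take the farthest
    positive and the closest negative in the batch (cosine distance,
    embeddings assumed L2-normalized)."""
    dists = 1.0 - embeddings @ embeddings.T
    n = len(labels)
    losses = []
    for i in range(n):
        pos = (labels == labels[i]) & (np.arange(n) != i)
        neg = labels != labels[i]
        if not pos.any() or not neg.any():
            continue  # no valid triplet for this anchor
        losses.append(max(dists[i][pos].max() - dists[i][neg].min() + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0

def l2sp_penalty(params, pretrained, l2=0.0059, l2sp=1.3e-7):
    """L2 weight decay plus L2-SP: the second term anchors each fine-tuned
    weight tensor to its pretrained starting point instead of to zero."""
    return sum(l2 * np.sum(w ** 2) + l2sp * np.sum((w - pretrained[k]) ** 2)
               for k, w in params.items())
```

L2-SP is a natural fit here because the backbone starts from strong DINOv2 features that fine-tuning should refine, not overwrite.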
You can customize parameters by editing the variables in scripts/train.sh, or run the Python script directly:

```shell
python src/gorillawatch/train_and_eval.py \
    --wandb_project "GorillaWatch-Training" \
    --wandb_run "my_experiment" \
    --backbone_name vit_small_patch14_dinov2.lvd142m \
    --dataset "gorilla-watch/Gorilla-SPAC-Wild" \
    --dataset_config "face" \
    --epochs 100 \
    --batch_size 8 \
    --lr 0.0000019 \
    --eval_frequency 10
```

The dataset will be automatically downloaded from Hugging Face on the first run.
Evaluate a trained checkpoint:

```shell
./scripts/eval_fine_tuned.sh
```

Or evaluate a specific checkpoint:

```shell
python src/gorillawatch/evaluate.py \
    --evaluate_model_path saved_checkpoints/your_model.pth \
    --backbone_name vit_small_patch14_dinov2.lvd142m \
    --dataset "gorilla-watch/Gorilla-SPAC-Wild" \
    --dataset_config "face" \
    --batch_size 8
```
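Because the backbone is an image model, tracklet-level re-identification needs the per-frame embeddings aggregated into one descriptor. Mean pooling, as below, is one simple choice and an assumption for illustration:

```python
import numpy as np

def tracklet_embedding(frame_embeddings):
    """Pool per-frame image-backbone embeddings into a single tracklet-level
    descriptor by mean pooling, then re-normalize to unit length so cosine
    similarity stays comparable across tracklets of different lengths."""
    mean = np.asarray(frame_embeddings).mean(axis=0)
    return mean / np.linalg.norm(mean)
```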
Evaluate pre-trained backbones without fine-tuning:

```shell
./scripts/eval_zero_shot.sh
```

Run the constrained-clustering population-counting evaluation:

```shell
./scripts/clustering.sh
```

The paper demonstrates:
- State-of-the-art re-identification performance on Gorilla-SPAC-Wild
- Cross-domain generalization assessment using Gorilla-Berlin-Zoo
- Multi-object tracking evaluation on Gorilla-SPAC-MoT
- Interpretability analysis via AttnLRP showing reliance on biometric features
- Comparative analysis of backbone architectures for gorilla re-identification
- Population counting accuracy using spatiotemporal-constrained clustering
Detailed results and ablation studies are available in the full paper.
If you use GorillaWatch in your research, please cite our WACV 2026 paper:
```bibtex
@inproceedings{GorillaWatch2026,
  title={GorillaWatch: An Automated System for In-the-Wild Gorilla Re-Identification and Population Monitoring},
  author={Maximilian Schall and Felix Leonard Knöfel and Noah Elias König and Jan Jonas Kubeler and Maximilian von Klinski and Joan Wilhelm Linnemann and Xiaoshi Liu and Iven Jelle Schlegelmilch and Ole Woyciniuk and Alexandra Schild and Dante Wasmuht and Magdalena Bermejo Espinet and German Illera Basas and Gerard de Melo},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2026},
  archivePrefix={arXiv},
  eprint={2512.07776}
}
```

This project is licensed under the MIT License; see the LICENSE file for details.
We thank the collaborators and data providers who made this research possible. The work was conducted in collaboration with wildlife conservation organizations to ensure practical impact on endangered species monitoring.