Reducing small object false positives with high-resolution training / Hyperparameter Tuning #24589
Replies: 2 comments
👋 Hello @cosminlovin7, thank you for your detailed discussion and for your interest in Ultralytics 🚀 This is an automated response to help get things moving quickly, and an Ultralytics engineer will also assist soon. We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered. If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it. Since this appears to be a custom training and small-object performance ❓ question, please provide as much information as possible, including dataset image examples, validation predictions, training logs, and the exact training command or config used, and verify you are following our Tips for Best Training Results. Helpful additions would be:
Join the Ultralytics community where it suits you best. For real-time chat, head to Discord 🎧. Prefer in-depth discussions? Check out Discourse. Or dive into threads on our Subreddit to share knowledge with the community.
Upgrade
Upgrade to the latest version: pip install -U ultralytics
Environments
YOLO may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLO Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
on the assignment side, our default detect head has smallest stride 8 (P3), and the task-aligned assigner clamps any GT box with width or height <
q1 - filtering boxes with shortest side < 8px: reasonable for a standard P3-head model, but it does not gain you much because the assigner already clamps such boxes up to stride. if you switch to a P2 head, drop the filter floor to <4px. before changing the filter, run a size-bucketed val to see how many of your AP_small failures actually sit below 8px vs in the 10-16px range you care about.
q2 -
q3 - 50/50 internet vs your own camera: any sensor-domain delta (color, noise, lens distortion, fov) can leak as a confidence-domain shift on your camera. if inference is fixed to your camera, oversampling your own data or running a short second-stage fine-tune on your-camera-only after the joint training usually closes the precision gap without throwing away the diversity.
q4 - other small-object precision strategies: the highest-impact ones we recommend are P2 head, higher training resolution (you are already at 1920), and treating your high-confidence false positives as a labelling target - crop them and add the surrounding tiles as background-only images (about 0-10% of the dataset, per tips for best training results). your underrepresented "counter-example" class is the same mechanism; extending it with mined FP crops keeps doing the same thing.
q5 / q6 - tracking and SAHI together: TrackerNano is single-object and fully driven by the initial ROI, so any first-frame FP propagates. for moving-camera multi-object tracking that is robust to per-frame detection noise, our default tracking flow (
q7 - COCO@640 → 1920 fine-tune: not a problem in itself. confirmed by running it: pretrained weights transfer and train cleanly at
depends on dataset and domain factors, but these are the levers that usually move small-object precision on this symptom.
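The size-bucketed check suggested under q1 can be approximated before touching any filter. The following is a minimal sketch, not part of the Ultralytics API: it assumes YOLO-format labels (normalized width and height), assumes the long image side is resized to imgsz with the aspect ratio preserved, and uses hypothetical bucket edges and function names chosen for illustration. The boxes passed in can be ground-truth boxes, missed boxes, or false-positive predictions.

```python
from collections import Counter

# Hypothetical bucket edges in pixels. The 8 px edge mirrors the stride-8 (P3)
# floor mentioned in the reply; the other edges are illustrative assumptions.
BUCKETS = [
    ("<8px", 0.0, 8.0),
    ("8-16px", 8.0, 16.0),
    ("16-32px", 16.0, 32.0),
    (">=32px", 32.0, float("inf")),
]


def shortest_side_px(w_norm, h_norm, img_w, img_h, imgsz=1920):
    """Shortest box side in pixels after resizing the long image side to imgsz."""
    scale = imgsz / max(img_w, img_h)  # assumed aspect-preserving resize
    return min(w_norm * img_w, h_norm * img_h) * scale


def bucket_name(side_px):
    """Map a side length in pixels to the name of its bucket."""
    for name, lo, hi in BUCKETS:
        if lo <= side_px < hi:
            return name
    raise ValueError(f"side length out of range: {side_px}")


def size_histogram(boxes, img_w, img_h, imgsz=1920):
    """Count boxes per size bucket. boxes is an iterable of (w_norm, h_norm)."""
    return Counter(
        bucket_name(shortest_side_px(w, h, img_w, img_h, imgsz)) for w, h in boxes
    )


# Example: four boxes from a 3840x2160 frame trained at imgsz=1920 (scale 0.5).
example = size_histogram(
    [(0.005, 0.005), (0.006, 0.012), (0.01, 0.02), (0.1, 0.1)],
    img_w=3840,
    img_h=2160,
)
```

Running this over the missed or false-positive boxes from validation shows whether the failures sit below the 8px floor or in the 10-16px range, which is the question to settle before changing the filter.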
Setup & Training Configuration
I trained YOLOv9-S on a custom dataset with the following setup:
Training takes about 24h on 2 GPUs (~10–12 min/epoch with per-epoch validation).
Results
Overall mAP@0.5:0.95 = 0.78 on the validation set, but there's a significant gap across object sizes:
Core Problem: High-confidence false positives on small objects
The high-resolution training (imgsz=1920) does improve recall on small objects, but it comes with a notable downside: a high number of false positives on small objects, many with confidence scores around 0.75. This makes confidence-threshold filtering ineffective: I use a threshold of 0.7, because good predictions score between 0.7 and 0.95 depending on object size, so these false positives pass the filter.
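To make the threshold problem concrete, here is a minimal sketch that measures precision on small-object predictions at several confidence thresholds. The scores below are hypothetical values invented for illustration (false positives clustered near 0.75, as described above), and precision_at_thresholds is an illustrative helper, not a library function.

```python
def precision_at_thresholds(preds, thresholds):
    """Precision of the predictions kept at each confidence threshold.

    preds: list of (confidence, is_true_positive) pairs.
    Returns {threshold: precision}, with None when no prediction is kept.
    """
    out = {}
    for t in thresholds:
        kept = [is_tp for conf, is_tp in preds if conf >= t]
        out[t] = sum(kept) / len(kept) if kept else None
    return out


# Hypothetical small-object predictions: (confidence, matched a ground-truth box?)
preds = [
    (0.92, True), (0.85, True), (0.78, True), (0.76, False),
    (0.75, False), (0.74, False), (0.71, True), (0.40, False),
]
result = precision_at_thresholds(preds, [0.5, 0.7, 0.8])
```

With these made-up scores, moving the threshold from 0.5 to 0.7 leaves precision unchanged at 4/7, while 0.8 removes the false positives only by also discarding the true positives at 0.71 and 0.78. That is the trade-off a threshold alone cannot resolve.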
My use case is object tracking (using TrackerNano from OpenCV), where the model output is used as the initial ROI. A false positive ROI leads directly to bad tracking, so precision on small objects is critical - even more than recall.
What I've already tried / considered
Questions
I have a couple of questions regarding what I could do:
Any guidance or pointers would be appreciated. Thanks!