Reducing small object false positives with high-resolution training / Hyperparameter Tuning #24589

cosminlovin7 · 2026-05-25T07:02:37Z

Setup & Training Configuration
I trained YOLOv9-S on a custom dataset with the following setup:

~20k training images at Full HD and 4K resolution, 3 classes
One class is underrepresented (only ~1k images). It serves as a counter-example to reduce confusion between classes: I don't need to detect it, only to prevent the other two classes from being confused with it
imgsz = 1920
100 epochs
default hyperparameters (e.g. mosaic=1.0, lr0=0.01, lrf=0.01, mixup=0.0, cutmix=0.0, scale=0.5)

Training takes about 24h on 2 GPUs (~10–12 min/epoch with per-epoch validation)

Results
Overall mAP@0.5:0.95 = 0.78 on the validation set, but there's a significant gap across object sizes:

AP_small: 0.3284
AP_medium: 0.7739
AP_large: 0.7593

Core Problem: High-confidence false positives on small objects

The high-resolution training (imgsz=1920) does improve recall on small objects, but it comes with a notable downside: a high number of false positives on small objects, many with confidence scores around ~0.75. This makes confidence threshold filtering ineffective (I use a confidence threshold of 0.7 as good predictions have around 0.7 - 0.95 confidence depending on object size).
My use case is object tracking (using TrackerNano from OpenCV), where the model output is used as the initial ROI. A false positive ROI leads directly to bad tracking, so precision on small objects is critical - even more than recall.

What I've already tried / considered

Careful annotation: bounding boxes are tightly fitted around objects
Pre-training cleanup: removed boxes with shortest side < 8px (at training resolution) and dropped images that had no boxes remaining after this step - I am not sure whether this was the right call
SAHI: tested it, but performance was similar to full-HD training, and it introduced duplicate/unstable overlapping boxes for the same object, which hurts tracking
Hyperparameter tuning: considered it, but impractical given that a single full run takes ~24h and the documentation itself notes that short tuning runs rarely produce transferable results. According to documentation: "Hyperparameters derived from short or small-scale tuning runs are rarely optimal for real-world training. In practice, tuning should be performed under settings similar to full training — including comparable datasets, epochs, and augmentations — to ensure reliable and transferable results. Quick tuning may bias parameters toward faster convergence or short-term validation gains that do not generalize."
YOLO26 (tiny version): I also tried the tiny version of YOLO26 with the same training settings, but I got the same problems
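
For reference, the pre-training cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the exact script that was used: it assumes YOLO-format label lines (`class x_center y_center width height`, normalized), and it assumes the long side of each image is resized to `imgsz` while preserving aspect ratio. The function name `filter_small_boxes` is hypothetical.

```python
def filter_small_boxes(label_lines, img_w, img_h, imgsz=1920, min_side_px=8.0):
    """Keep YOLO-format label lines whose shortest side is >= min_side_px
    once the image is resized so that its long side equals imgsz."""
    scale = imgsz / max(img_w, img_h)  # assumption: aspect-preserving resize
    kept = []
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip malformed lines in this sketch
        w_px = float(parts[3]) * img_w * scale  # box width at training scale
        h_px = float(parts[4]) * img_h * scale  # box height at training scale
        if min(w_px, h_px) >= min_side_px:
            kept.append(line.strip())
    return kept


# Example: a 4K frame (3840x2160) trained at imgsz=1920, i.e. scale 0.5.
labels = [
    "0 0.5 0.5 0.005 0.005",  # 9.6 x 5.4 px at training scale -> dropped
    "1 0.2 0.3 0.010 0.010",  # 19.2 x 10.8 px at training scale -> kept
]
print(filter_small_boxes(labels, img_w=3840, img_h=2160))
```

Note that a box annotated on a 4K frame is halved in pixel size at imgsz=1920, so the filter has to be applied at the training scale rather than at the source resolution.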

Questions

I have a couple of questions regarding what I could do:

Is removing boxes with shortest side < 8px before training a good practice, or should I leave them in? Ideally, I'd rather not detect anything below 8px and treat it as a known limitation, but I still want reliable predictions for objects in the 10–16px range.
Is mosaic=1.0 potentially harmful here? I suspect mosaic augmentation may introduce small artifact patches that the model learns to associate with my classes, contributing to false positives. Would reducing or disabling mosaic help?
My dataset is a 50/50 mix of data collected from the internet and data I collected myself. All my testing is done with my own camera setup. Can the internet data hurt performance on my camera setup? Should I focus more on data from my own camera?
Are there any other strategies to improve precision specifically for small objects, beyond what I've already tried?
Any suggestions for more robust tracking approaches that are less sensitive to an imperfect initial ROI, or that handle high-confidence false positives more gracefully? It's worth mentioning that my camera is not static: it is mounted on a moving object.
From your experience, which approach is better suited for object tracking: SAHI or high-resolution training?
Can it be a problem that I fine-tune at 1920px on my custom dataset starting from weights pre-trained on COCO at 640px?

Any guidance or pointers would be appreciated. Thanks!

UltralyticsAssistant · 2026-05-25T07:03:09Z

UltralyticsAssistant
May 25, 2026
Maintainer

👋 Hello @cosminlovin7, thank you for your detailed discussion and for your interest in Ultralytics 🚀 This is an automated response to help get things moving quickly, and an Ultralytics engineer will also assist soon.

We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

Since this appears to be a custom training and small-object performance ❓ question, please provide as much information as possible, including dataset image examples, validation predictions, training logs, and the exact training command or config used, and verify you are following our Tips for Best Training Results. Helpful additions would be:

A few representative images showing the small-object false positives
Validation batch visualizations or confusion patterns for the affected classes
Your exact model, package version, and whether the same behavior appears after upgrading
Whether the issue is reproducible on a smaller subset suitable for a minimum reproducible example 🔍

Join the Ultralytics community where it suits you best. For real-time chat, head to Discord 🎧. Prefer in-depth discussions? Check out Discourse. Or dive into threads on our Subreddit to share knowledge with the community.

Upgrade

Upgrade to the latest ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8 to verify your issue is not already resolved in the latest version:

pip install -U ultralytics

Environments

YOLO may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLO Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

raimbekovm · 2026-05-25T08:48:25Z

raimbekovm
May 25, 2026
Maintainer

on the assignment side, our default detect head has smallest stride 8 (P3), and the task-aligned assigner clamps any GT box with width or height < stride[0] up to stride[0] so it still gets positives (ultralytics/utils/tal.py). i verified this on a synthetic GT: a 4×4 box gets clamped to 8×8 under a P3 head, and stays 4×4 under a P2 head (which has stride[0]=4). practically, objects below ~8 px at training resolution get weak supervision but are not dropped. for objects you want to detect reliably in the 10-16px range, the cleanest lever is a P2-head model (yolo26-p2.yaml, yolov8-p2.yaml), which lowers the smallest stride to 4 - see the small-objects blog.
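
The clamping behaviour described above can be illustrated with a small sketch. This is not the code in ultralytics/utils/tal.py, only a simplified stand-in for the idea under the stated assumption that width and height are each raised to the smallest stride; both function names are hypothetical.

```python
def clamp_gt_wh(w, h, min_stride):
    """Raise each side of a ground-truth box to at least the smallest stride."""
    return max(w, min_stride), max(h, min_stride)


def cells_spanned(side_px, stride):
    """How many grid cells of the given stride one box side covers."""
    return side_px / stride


# A 4x4 box under a P3 head (smallest stride 8) vs a P2 head (smallest stride 4).
print(clamp_gt_wh(4, 4, min_stride=8))  # (8, 8)
print(clamp_gt_wh(4, 4, min_stride=4))  # (4, 4)

# A 12 px object covers 1.5 cells at stride 8 but 3 cells at stride 4.
print(cells_spanned(12, 8), cells_spanned(12, 4))  # 1.5 3.0
```

The second helper shows why a P2 head tends to help in the 10-16px range: the same object covers roughly twice as many grid cells per side on the finest feature map.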

q1 - filtering boxes with shortest side < 8px: reasonable for a standard P3-head model, but it does not gain you much because the assigner already clamps such boxes up to stride. if you switch to a P2 head, drop the filter floor to <4px. before changing the filter, run a size-bucketed val to see how many of your AP_small failures actually sit below 8px vs in the 10-16px range you care about.
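
A size-bucketed check along these lines could look roughly like the sketch below. It assumes you have already collected the width and height, in pixels at training resolution, of the boxes involved in failures (missed ground truths or false positives). The bucket edges and function names are illustrative choices, not part of any library.

```python
from collections import Counter


def size_bucket(short_side_px):
    """Assign a box to a size bucket by its shortest side (px at training scale)."""
    if short_side_px < 8:
        return "<8"
    if short_side_px < 10:
        return "8-10"
    if short_side_px <= 16:
        return "10-16"
    return ">16"


def bucket_counts(boxes_wh):
    """Count boxes per size bucket; boxes_wh is an iterable of (w, h) in pixels."""
    return Counter(size_bucket(min(w, h)) for w, h in boxes_wh)


# Hypothetical failure cases, given as (width, height) in pixels.
failures = [(4, 20), (9, 9), (12, 30), (16, 16), (40, 17)]
print(bucket_counts(failures))
```

If most failures land in the "<8" bucket, the filter floor matters more than the head; if they land in "10-16", the P2 head is the more relevant lever.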

q2 - mosaic=1.0: i measured the geometric effect of the full augmentation pipeline at imgsz=1920 with default scale=0.5 on a small sample - median object linear size after mosaic+scale+letterbox drops to about 60% of the raw source size, so an already-small object loses another ~30-40% on its shortest side. it is not necessarily harmful for precision, but it does push already-small classes further into the sub-stride zone. before disabling outright, extend close_mosaic from the default 10 to e.g. 30-50 epochs so the model stabilises on un-mosaicked data, or lower the probability to mosaic=0.5. see data augmentation.
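
The two options above (longer `close_mosaic`, or lower `mosaic` probability) could be expressed as training overrides roughly like this. It is a sketch only: the weights file and dataset YAML in the commented call are placeholders, and the helper names are hypothetical. `mosaic` and `close_mosaic` are standard training arguments, and `close_mosaic` disables mosaic for the final N epochs.

```python
def mosaic_overrides(mosaic=1.0, close_mosaic=30, epochs=100, imgsz=1920):
    """Build training overrides for a mosaic experiment."""
    if not 0.0 <= mosaic <= 1.0:
        raise ValueError("mosaic must be a probability in [0, 1]")
    if not 0 <= close_mosaic <= epochs:
        raise ValueError("close_mosaic must be between 0 and epochs")
    return {"imgsz": imgsz, "epochs": epochs,
            "mosaic": mosaic, "close_mosaic": close_mosaic}


def epochs_with_mosaic(overrides):
    """Number of epochs during which mosaic can still be applied."""
    if overrides["mosaic"] == 0:
        return 0
    return overrides["epochs"] - overrides["close_mosaic"]


def train_with_overrides(weights, data_yaml, overrides):
    """Run training; ultralytics is imported lazily so the helpers above
    can be used without it installed."""
    from ultralytics import YOLO
    return YOLO(weights).train(data=data_yaml, **overrides)


option_a = mosaic_overrides(mosaic=1.0, close_mosaic=30)  # longer mosaic-free tail
option_b = mosaic_overrides(mosaic=0.5, close_mosaic=10)  # lower mosaic probability
print(epochs_with_mosaic(option_a), epochs_with_mosaic(option_b))  # 70 90

# Placeholder paths, shown for illustration only:
# train_with_overrides("yolov9s.pt", "data.yaml", option_a)
```

Given the ~24h cost per run, it is probably worth changing one of the two knobs at a time so the effect can be attributed.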

q3 - 50/50 internet vs your own camera: any sensor-domain difference (color, noise, lens distortion, fov) can show up as shifted confidence scores on your camera's images. if inference is fixed to your camera, oversampling your own data or running a short second-stage fine-tune on your-camera-only data after the joint training usually closes the precision gap without throwing away the diversity.
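
One simple way to oversample own-camera data is to repeat those image paths in a training list file, sketched below. This assumes the dataset YAML's `train:` entry points at a text file of image paths and that repeated paths are treated as repeated samples, which is worth verifying on your version. The function names and example paths are hypothetical.

```python
def build_train_list(internet_images, camera_images, camera_repeat=2):
    """Return a training list in which own-camera images appear camera_repeat times."""
    if camera_repeat < 1:
        raise ValueError("camera_repeat must be >= 1")
    return list(internet_images) + list(camera_images) * camera_repeat


def write_train_list(paths, out_file):
    """Write one image path per line."""
    with open(out_file, "w", encoding="utf-8") as f:
        f.write("\n".join(paths) + "\n")


internet = ["web/img_0001.jpg", "web/img_0002.jpg"]
camera = ["cam/frame_0001.jpg", "cam/frame_0002.jpg"]
train_list = build_train_list(internet, camera, camera_repeat=2)
print(len(train_list))  # 6: own-camera share goes from 50% to about 67%
```

A second-stage fine-tune on camera-only data is the alternative mentioned above; the list-based approach just has the advantage of needing a single training run.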

q4 - other small-object precision strategies: the highest-impact ones we recommend are P2 head, higher training resolution (you are already at 1920), and treating your high-confidence false positives as a labelling target - crop them and add the surrounding tiles as background-only images (about 0-10% of the dataset, per tips for best training results). your underrepresented "counter-example" class works through the same mechanism, so extending it with mined FP crops reinforces that effect.
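
Mining high-confidence false positives for this purpose could be sketched as below. It is a simplified, class-agnostic illustration: a prediction counts as a false positive if its confidence is at or above the threshold and it overlaps no ground-truth box at the given IoU. The box format (`x1, y1, x2, y2` in pixels), the thresholds, and the function names are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def high_conf_false_positives(preds, gts, conf_thres=0.7, iou_thres=0.5):
    """preds: list of (x1, y1, x2, y2, conf); gts: list of (x1, y1, x2, y2).
    Returns confident predictions that match no ground-truth box."""
    return [
        p for p in preds
        if p[4] >= conf_thres and all(iou(p[:4], g) < iou_thres for g in gts)
    ]


gts = [(100, 100, 120, 120)]
preds = [
    (101, 101, 121, 121, 0.90),  # overlaps the ground truth -> true positive
    (300, 300, 312, 312, 0.76),  # confident, no matching GT -> mined
    (500, 500, 510, 510, 0.40),  # below the confidence threshold -> ignored
]
print(high_conf_false_positives(preds, gts))
```

Tiles cropped around the mined boxes should contain no labelled objects if they are to be used as background-only images.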

q5 / q6 - tracking and SAHI together: TrackerNano is single-object and fully driven by the initial ROI, so any first-frame FP propagates. for moving-camera multi-object tracking that is robust to per-frame detection noise, our default tracking flow (model.track(..., tracker="botsort.yaml", persist=True)) is the natural fit - BoT-SORT has camera motion compensation built in (gmc_method: sparseOptFlow in botsort.yaml) and frame-to-frame association absorbs occasional FPs. see tracking modes. on SAHI: in a quick check on a stress-tested image, the default postprocess (GREEDYNMM at match threshold 0.5) deduped slice-boundary detections cleanly, so the duplicates you saw are likely sensitive to postprocess_type and postprocess_match_threshold - worth tuning these before walking away from SAHI. that said, for a tracking workload high-resolution training tends to give more frame-to-frame consistent boxes than SAHI, which is what trackers need most.
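
The tracking flow mentioned above could be wired up roughly as follows. This is a sketch, assuming `ultralytics` and `opencv-python` are installed; the weights path and video source are placeholders, `conf=0.7` mirrors the threshold from the question, and `stable_track_ids` is a hypothetical helper (not part of any library) that keeps only tracks seen in several frames before they are trusted.

```python
def track_video(weights, source, tracker="botsort.yaml", conf=0.7):
    """Run per-frame tracking and return {track_id: [(frame_idx, xyxy), ...]}."""
    import cv2  # imported lazily so the helper below works without these packages
    from ultralytics import YOLO

    model = YOLO(weights)
    cap = cv2.VideoCapture(source)
    history, frame_idx = {}, 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # persist=True keeps tracker state between consecutive calls
        results = model.track(frame, persist=True, tracker=tracker,
                              conf=conf, verbose=False)
        boxes = results[0].boxes
        if boxes.id is not None:  # id is None when no track is active
            for tid, xyxy in zip(boxes.id.int().tolist(), boxes.xyxy.tolist()):
                history.setdefault(tid, []).append((frame_idx, xyxy))
        frame_idx += 1
    cap.release()
    return history


def stable_track_ids(history, min_frames=5):
    """Track ids observed in at least min_frames frames (illustrative filter)."""
    return sorted(tid for tid, obs in history.items() if len(obs) >= min_frames)


# Placeholder paths, shown for illustration only:
# history = track_video("best.pt", "flight.mp4")
# print(stable_track_ids(history, min_frames=5))
```

Requiring a minimum track length is one possible way to let short-lived false positives drop out before a downstream consumer acts on a box; the right `min_frames` depends on frame rate and object speed.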

q7 - COCO@640 → 1920 fine-tune: not a problem in itself. confirmed by running it: pretrained weights transfer and train cleanly at imgsz=1920, stride stays the same. the backbone and head are fully convolutional and stride-based, so they handle any imgsz that is a multiple of the stride.
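
The stride arithmetic behind this can be checked with a short sketch. It assumes the default P3-P5 head with strides 8, 16, and 32; the function names are illustrative.

```python
import math


def round_up_to_stride(imgsz, max_stride=32):
    """Smallest multiple of max_stride that is >= imgsz."""
    return math.ceil(imgsz / max_stride) * max_stride


def grid_sizes(imgsz, strides=(8, 16, 32)):
    """Feature-map side length per stride for a square input of size imgsz."""
    return {s: imgsz // s for s in strides}


print(round_up_to_stride(1920))  # 1920, already a multiple of 32
print(round_up_to_stride(1080))  # 1088
print(grid_sizes(640))           # {8: 80, 16: 40, 32: 20}
print(grid_sizes(1920))          # {8: 240, 16: 120, 32: 60}
```

Going from 640 to 1920 only enlarges the feature-map grids by 3x per side; the strides, and therefore the convolutional weights, are unchanged.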

results depend on dataset and domain factors, but these are the levers that usually improve small-object precision for this symptom.
