# 🥇 Multi-Instance Object Detection Challenge - 1st Place Solution
**Author:** Siam Arefin  
**Score:** 0.995 mAP@50 (Private Leaderboard)  
**Data:** 100% FalconCloud Synthetic Data Only

This notebook walks through my full pipeline: training a YOLO11m model on synthetic data generated in FalconCloud, then running multi-scale, test-time-augmented inference with Weighted Boxes Fusion to detect soup cans in real-world test images.


# 🙏 Acknowledgments
This solution was made possible thanks to the valuable contributions and resources shared by the Kaggle community. I would like to express my sincere gratitude to the following:

Ángel Jacinto Sánchez Ruiz
📁 Dataset: Falcon Soup Cans
For sharing a diverse, well-curated synthetic dataset generated with FalconCloud.

mehul0518 (Owner), Annabelle Min (Editor), Nikhil Reddy (Editor)
📁 Dataset: Soup Can
For publishing another rich synthetic dataset that helped enhance training diversity.

Younes Benalia
💡 For code structure and implementation insights that inspired parts of this notebook and training pipeline.

🧠 The final training dataset was a merged version of the above two FalconCloud-generated datasets, and all code was adapted while respecting competition rules and original sources.

This competition was a great collaborative experience. Thank you all!

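# Create YAML file

Ultralytics reads the training and validation image folders, the number of classes (`nc`) and the class names from a dataset YAML file.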

In [1]:
import yaml

data = {
    'train': [
        '/kaggle/input/soup-can/CAN_Dataset/cameraDistance/train',
        '/kaggle/input/soup-can/CAN_Dataset/baseData/train',
        '/kaggle/input/soup-can/CAN_Dataset/coolLighting/train',
        '/kaggle/input/soup-can/CAN_Dataset/furniture/train',
        '/kaggle/input/soup-can/CAN_Dataset/plants/train',
        '/kaggle/input/soup-can/CAN_Dataset/genericObjects/train',
        '/kaggle/input/soup-can/CAN_Dataset/misclassifications2/train',
        '/kaggle/input/soup-can/CAN_Dataset/misclassified-objects/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputallcouch/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputallfridge/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputallplant/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputalltable/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputalltv/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputcarpet2/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputcarpet3/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputcarpet4/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputcarpet5/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputcouch2/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputcouch3/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputcouch4/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputcouch5/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputfridge2/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputfridge3/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputfridge4/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputfridge5/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputplant2/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputplant3/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputplant4/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputplant5/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputtable2/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputtable3/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputtable4/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputtable5/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputtv2/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputtv3/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputtv4/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/outputtv5/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/plants/train',  # repeats an entry above, so these images are probably loaded twice
        '/kaggle/input/soup-can/CAN_Dataset/topfridge1/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/topfridge2/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/topfridge3/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/topfridge4/Output/1/train',
        '/kaggle/input/soup-can/CAN_Dataset/topfridge5/Output/1/train',
        
    ],
    'val': [
        '/kaggle/input/soup-can/CAN_Dataset/cameraDistance/val',
        '/kaggle/input/soup-can/CAN_Dataset/baseData/val',
        '/kaggle/input/soup-can/CAN_Dataset/coolLighting/val',
        '/kaggle/input/soup-can/CAN_Dataset/furniture/val',
        '/kaggle/input/soup-can/CAN_Dataset/plants/val',
        '/kaggle/input/soup-can/CAN_Dataset/genericObjects/val',
        '/kaggle/input/soup-can/CAN_Dataset/misclassifications2/val',
        '/kaggle/input/soup-can/CAN_Dataset/misclassified-objects/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputallcouch/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputallfridge/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputallplant/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputalltable/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputalltv/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputcarpet2/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputcarpet3/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputcarpet4/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputcarpet5/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputcouch2/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputcouch3/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputcouch4/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputcouch5/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputfridge2/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputfridge3/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputfridge4/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputfridge5/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputplant2/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputplant3/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputplant4/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputplant5/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputtable2/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputtable3/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputtable4/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputtable5/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputtv2/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputtv3/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputtv4/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/outputtv5/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/plants/val',  # repeats an entry above, so these images are probably counted twice
        '/kaggle/input/soup-can/CAN_Dataset/topfridge1/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/topfridge2/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/topfridge3/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/topfridge4/Output/1/val',
        '/kaggle/input/soup-can/CAN_Dataset/topfridge5/Output/1/val',
       
    ],
    'test': '/kaggle/input/kan-dataset/Kan/test_dataset',
    'nc': 1,
    'names': ['Soup']
}

with open('yolo_params.yaml', 'w') as file:
    yaml.dump(data, file)
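The long `train`/`val` lists above follow one pattern: dataset root, subset folder, split name. A minimal sketch that builds both lists from a single list of subset names and drops repeated entries (for example, `plants` appears twice in each list above). `ROOT`, `SUBSETS`, `build_split_paths` and `build_data_config` are hypothetical names, and the subset list is shortened for illustration:

```python
from typing import Dict, List

# Hypothetical root and a shortened subset list; the cell above lists ~40 folders
ROOT = "/kaggle/input/soup-can/CAN_Dataset"
SUBSETS = [
    "cameraDistance",
    "baseData",
    "plants",
    "outputcarpet2/Output/1",
    "plants",  # accidental repeat, dropped by build_split_paths
]


def build_split_paths(root: str, subsets: List[str], split: str) -> List[str]:
    """Return '<root>/<subset>/<split>' for each subset, keeping first-occurrence order."""
    seen = set()
    paths = []
    for subset in subsets:
        path = f"{root}/{subset}/{split}"
        if path not in seen:  # each folder appears only once in the output
            seen.add(path)
            paths.append(path)
    return paths


def build_data_config(root: str, subsets: List[str]) -> Dict[str, object]:
    """Assemble the same dict structure the cell above passes to yaml.dump."""
    return {
        "train": build_split_paths(root, subsets, "train"),
        "val": build_split_paths(root, subsets, "val"),
        "nc": 1,
        "names": ["Soup"],
    }


config = build_data_config(ROOT, SUBSETS)
print(len(config["train"]))
print(config["val"][0])
```

Writing it out is then the same `yaml.dump(config, file)` call used above; keeping the subset names in one list also guarantees `train` and `val` stay in sync.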


In [2]:
!pip install ultralytics > /dev/null 

In [3]:
!pip install ensemble-boxes  > /dev/null 

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ultralytics import YOLO
from pathlib import Path
import csv
import os
import random
import torch
# Set random seeds for reproducibility
np.random.seed(42)
random.seed(42)
torch.manual_seed(42)




In [5]:
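model = YOLO("yolo11m.pt")  # COCO-pretrained YOLO11 medium checkpoint
data_yaml = "/kaggle/working/yolo_params.yaml"

model.train(
    data=data_yaml,
    epochs=10,
    batch=4,
    imgsz=1056,                # training resolution (33 x 32-px stride)
    patience=150,              # early-stopping patience; larger than epochs, so it never triggers here
    optimizer='SGD',
    momentum=0.937,
    lr0=0.001,                 # initial learning rate
    weight_decay=0.0005,
    cos_lr=True,               # cosine learning-rate decay
    save_period=1,             # save a checkpoint every epoch
    workers=8,                 # upper bound; the log below shows 4 workers were used
    # Augmentations
    close_mosaic=20,           # final N epochs trained without mosaic; note N exceeds epochs=10
    hsv_h=0.015,               # hue / saturation / value jitter
    hsv_s=0.7,
    hsv_v=0.4,
    flipud=0.5,                # vertical flip probability
    fliplr=0.5,                # horizontal flip probability
    translate=0.1,
    scale=0.5,
    shear=0.01
)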

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11m.pt to 'yolo11m.pt'...


100%|██████████| 38.8M/38.8M [00:01<00:00, 33.2MB/s]


Ultralytics 8.3.165 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (Tesla P100-PCIE-16GB, 16269MiB)
[34m[1mengine/trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=4, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=20, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=True, cutmix=0.0, data=/kaggle/working/yolo_params.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.5, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=1056, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.001, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolo11m.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train, nbs=64, nms=False, opset=None, optimize=False, optimizer=SGD, overlap_mask=True, patience=150, perspective=0.0, plots

100%|██████████| 755k/755k [00:00<00:00, 3.84MB/s]


Overriding model.yaml nc=80 with nc=1

                   from  n    params  module                                       arguments                     
  0                  -1  1      1856  ultralytics.nn.modules.conv.Conv             [3, 64, 3, 2]                 
  1                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  2                  -1  1    111872  ultralytics.nn.modules.block.C3k2            [128, 256, 1, True, 0.25]     
  3                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
  4                  -1  1    444928  ultralytics.nn.modules.block.C3k2            [256, 512, 1, True, 0.25]     
  5                  -1  1   2360320  ultralytics.nn.modules.conv.Conv             [512, 512, 3, 2]              
  6                  -1  1   1380352  ultralytics.nn.modules.block.C3k2            [512, 512, 1, True]           
  7                  -1  1   2360320  ultralytics

100%|██████████| 5.35M/5.35M [00:00<00:00, 16.2MB/s]


[34m[1mAMP: [0mchecks passed ✅
[34m[1mtrain: [0mFast image access ✅ (ping: 0.8±0.5 ms, read: 93.6±32.2 MB/s, size: 3641.0 KB)


[34m[1mtrain: [0mScanning /kaggle/input/soup-can/CAN_Dataset/baseData/train/labels... 5275 images, 183 backgrounds, 0 corrupt: 100%|██████████| 5275/5275 [01:14<00:00, 70.77it/s]


[34m[1malbumentations: [0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, method='weighted_average', num_output_channels=3), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))
[34m[1mval: [0mFast image access ✅ (ping: 0.0±0.0 ms, read: 66.8±47.6 MB/s, size: 3562.0 KB)


[34m[1mval: [0mScanning /kaggle/input/soup-can/CAN_Dataset/baseData/val/labels... 760 images, 21 backgrounds, 0 corrupt: 100%|██████████| 760/760 [00:11<00:00, 64.62it/s]


Plotting labels to runs/detect/train/labels.jpg... 
[34m[1moptimizer:[0m SGD(lr=0.001, momentum=0.937) with parameter groups 106 weight(decay=0.0), 113 weight(decay=0.0005), 112 bias(decay=0.0)
Image sizes 1056 train, 1056 val
Using 4 dataloader workers
Logging results to [1mruns/detect/train[0m
Starting training for 10 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/10      5.89G     0.3653     0.9098     0.8876         13       1056: 100%|██████████| 1319/1319 [12:37<00:00,  1.74it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:26<00:00,  3.58it/s]
                   all        760       1021      0.987      0.948      0.979      0.968

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       2/10      6.79G     0.3132     0.3407     0.8491          7       1056: 100%|██████████| 1319/1319 [12:27<00:00,  1.76it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.81it/s]
                   all        760       1021      0.995      0.965       0.99      0.975

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       3/10      6.81G      0.318     0.3057     0.8445          9       1056: 100%|██████████| 1319/1319 [12:25<00:00,  1.77it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.87it/s]
                   all        760       1021      0.993      0.972      0.992      0.974

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       4/10      6.81G     0.3097     0.2936     0.8332         15       1056: 100%|██████████| 1319/1319 [12:24<00:00,  1.77it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.82it/s]
                   all        760       1021      0.988      0.971      0.986      0.975

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       5/10      6.83G     0.2981     0.2669     0.8292         11       1056: 100%|██████████| 1319/1319 [12:24<00:00,  1.77it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.85it/s]
                   all        760       1021      0.994      0.965      0.991      0.976

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       6/10      6.87G     0.2846     0.2482     0.8279          7       1056: 100%|██████████| 1319/1319 [12:24<00:00,  1.77it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.89it/s]
                   all        760       1021      0.992      0.976      0.994      0.979

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       7/10      6.89G     0.2675     0.2255     0.8241         13       1056: 100%|██████████| 1319/1319 [12:25<00:00,  1.77it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.84it/s]
                   all        760       1021      0.998      0.967      0.994      0.981

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       8/10      6.89G     0.2553     0.2129     0.8177          4       1056: 100%|██████████| 1319/1319 [12:25<00:00,  1.77it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:25<00:00,  3.78it/s]
                   all        760       1021      0.989      0.976      0.994      0.983

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       9/10      6.89G     0.2485     0.1998     0.8171          6       1056: 100%|██████████| 1319/1319 [12:25<00:00,  1.77it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.89it/s]
                   all        760       1021      0.997      0.971      0.994      0.982

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      10/10      6.89G     0.2446     0.1941     0.8153          8       1056: 100%|██████████| 1319/1319 [12:25<00:00,  1.77it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.85it/s]
                   all        760       1021      0.986      0.983      0.994      0.982

10 epochs completed in 2.146 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 40.6MB
Optimizer stripped from runs/detect/train/weights/best.pt, 40.6MB

Validating runs/detect/train/weights/best.pt...
Ultralytics 8.3.165 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (Tesla P100-PCIE-16GB, 16269MiB)
YOLO11m summary (fused): 125 layers, 20,030,803 parameters, 0 gradients, 67.6 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 95/95 [00:24<00:00,  3.85it/s]


                   all        760       1021      0.988      0.976      0.994      0.983
Speed: 0.3ms preprocess, 21.8ms inference, 0.0ms loss, 1.7ms postprocess per image
Results saved to runs/detect/train
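In the validation table, mAP50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP50-95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal IoU sketch for `[x1, y1, x2, y2]` boxes; `iou_xyxy` is a hypothetical helper, not the Ultralytics implementation:

```python
from typing import Sequence


def iou_xyxy(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    # Width/height of the overlap rectangle (zero when the boxes do not intersect)
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


gt = [100, 100, 200, 300]    # ground-truth can (pixels)
pred = [110, 120, 210, 310]  # slightly shifted prediction
iou = iou_xyxy(gt, pred)
print(f"IoU = {iou:.3f}, counts as a hit at mAP50: {iou >= 0.5}")
```

A shift of 10 to 20 pixels on a 100x200 box still clears the 0.5 threshold comfortably, which is one reason mAP50 saturates near 0.99 here while mAP50-95 stays lower.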


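With `cos_lr=True`, Ultralytics decays the learning rate from `lr0` toward `lr0 * lrf` along a half cosine (the trainer log above shows the default `lrf=0.01`). A sketch of the per-epoch value, based on my reading of Ultralytics' `one_cycle` schedule; the warmup epochs Ultralytics applies at the start are not modeled, and `cosine_lr` is a hypothetical name:

```python
import math


def cosine_lr(epoch: int, epochs: int, lr0: float, lrf: float) -> float:
    """Learning rate at `epoch` under a half-cosine decay from lr0 to lr0 * lrf.

    Assumption: mirrors a one-cycle cosine factor; warmup is not modeled.
    """
    # The factor falls from 1.0 at epoch 0 to lrf at the final epoch
    factor = ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1
    return lr0 * factor


# Values from the training cell: lr0=0.001, lrf=0.01 (default), epochs=10
for e in (0, 5, 8, 10):
    print(e, f"{cosine_lr(e, 10, 0.001, 0.01):.6f}")
```

With only 10 epochs the rate is already around a tenth of `lr0` by epoch 8, so the last epochs act mostly as fine-tuning.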

In [6]:
model = YOLO("/kaggle/working/runs/detect/train/weights/best.pt")

test_images_path = "/kaggle/input/multi-instance-object-detection-challenge/Starter_Dataset/TestImages/images"
output_dir = "/kaggle/working/predictions/labels"

# Very low confidence threshold: keep low-score boxes so the mAP ranking sees every candidate
conf = 0.0001

def predict(test_images_path, output_dir, model, conf):
    """Write one .txt per test image with lines 'class confidence x_center y_center width height' (normalized)."""
    os.makedirs(output_dir, exist_ok=True)
    # The Ultralytics predictor puts the network in eval mode itself, so no model.eval() is needed
    for img_path in Path(test_images_path).glob("*"):
        if img_path.suffix.lower() not in [".png", ".jpg", ".jpeg"]:
            continue

        # augment=True enables test-time augmentation; iou is the NMS IoU threshold
        results = model.predict(img_path, conf=conf, augment=True, iou=0.4, max_det=600, verbose=False)

        output_txt = Path(output_dir) / f"{img_path.stem}.txt"
        with open(output_txt, "w") as f:
            for result in results:
                img_height, img_width = result.orig_shape
                # Each row of boxes.data is [x1, y1, x2, y2, confidence, class] in pixels
                for x1, y1, x2, y2, confidence, cls_id in result.boxes.data.tolist():
                    x_center = ((x1 + x2) / 2) / img_width
                    y_center = ((y1 + y2) / 2) / img_height
                    width = (x2 - x1) / img_width
                    height = (y2 - y1) / img_height
                    # Single-class task, so the class id is always written as 0
                    f.write(f"0 {confidence:.6f} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")

    print(f"[notice] ✅ Predictions saved: {output_dir}")

predict(test_images_path, output_dir, model, conf)

[notice] ✅ Predictions saved: /kaggle/working/predictions/labels
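The conversion inside `predict` turns absolute corner coordinates into the normalized center/size format used in the submission. A standalone sketch of the same arithmetic, which is easy to unit-test apart from the model; `xyxy_to_norm_xywh` is a hypothetical name:

```python
from typing import Tuple


def xyxy_to_norm_xywh(x1: float, y1: float, x2: float, y2: float,
                      img_width: int, img_height: int) -> Tuple[float, float, float, float]:
    """Pixel corners -> (x_center, y_center, width, height), each divided by the image size."""
    x_center = ((x1 + x2) / 2) / img_width
    y_center = ((y1 + y2) / 2) / img_height
    width = (x2 - x1) / img_width
    height = (y2 - y1) / img_height
    return x_center, y_center, width, height


# A 400x300-pixel box on a 4000x3000 image
print(xyxy_to_norm_xywh(1000, 600, 1400, 900, 4000, 3000))
```

Because every value is divided by the matching image dimension, the result does not depend on the resolution the model ran at, only on `result.orig_shape`.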


In [7]:
# Convert predictions to CSV
def predictions_to_csv(
    preds_folder: str = "/kaggle/working/predictions/labels", 
    output_csv: str = "/kaggle/working/submission.csv", 
    test_images_folder: str = "/kaggle/input/multi-instance-object-detection-challenge/Starter_Dataset/TestImages/images",
    allowed_extensions: tuple = (".jpg", ".png", ".jpeg")
):
    preds_path = Path(preds_folder)
    test_images_path = Path(test_images_folder)

    test_images = {p.stem for p in test_images_path.glob("*") if p.suffix.lower() in allowed_extensions}

    predictions = []
    predicted_images = set()

    for txt_file in preds_path.glob("*.txt"):
        image_id = txt_file.stem
        predicted_images.add(image_id)

        with open(txt_file, "r") as f:
            valid_lines = [line.strip() for line in f if len(line.strip().split()) == 6]

        pred_str = " ".join(valid_lines) if valid_lines else "no boxes"
        predictions.append({"image_id": image_id, "prediction_string": pred_str})

    missing_images = test_images - predicted_images
    for image_id in missing_images:
        predictions.append({"image_id": image_id, "prediction_string": "no boxes"})

    submission_df = pd.DataFrame(predictions)
    submission_df.to_csv(output_csv, index=False, quoting=csv.QUOTE_MINIMAL)
    print(submission_df.head(10))
    print(f"[notice] ✅ Submission saved to {output_csv}")

predictions_to_csv()

   image_id                                  prediction_string
0  IMG_9739  0 0.966308 0.178656 0.659910 0.304898 0.444785...
1  IMG_9721  0 0.966872 0.626559 0.672425 0.148692 0.179362...
2  IMG_9787  0 0.958117 0.575858 0.312043 0.047019 0.104097...
3  IMG_9636  0 0.985332 0.645029 0.627094 0.120642 0.201542...
4  IMG_9773  0 0.940645 0.352912 0.774163 0.032285 0.065712...
5  IMG_9618  0 0.979862 0.371084 0.738629 0.185067 0.219681...
6  IMG_9621  0 0.969845 0.281797 0.703377 0.291019 0.345015...
7  IMG_9740  0 0.978620 0.338172 0.588835 0.111748 0.162160...
8  IMG_9785  0 0.967875 0.574419 0.229472 0.073226 0.148235...
9  IMG_9700  0 0.969870 0.530899 0.413692 0.093558 0.198100...
[notice] ✅ Submission saved to /kaggle/working/submission.csv
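Each `prediction_string` is a flat sequence of 6-token groups, `class confidence x_center y_center width height`, or the literal `no boxes`. A small parser sketch for sanity-checking a submission before uploading it; `parse_prediction_string` is a hypothetical helper, not part of the competition kit, and the example string is made up:

```python
from typing import List, Tuple

Box = Tuple[int, float, float, float, float, float]


def parse_prediction_string(pred: str) -> List[Box]:
    """Split 'cls conf xc yc w h cls conf ...' into tuples; 'no boxes' -> []."""
    if pred.strip() == "no boxes":
        return []
    tokens = pred.split()
    if len(tokens) % 6 != 0:
        raise ValueError(f"expected a multiple of 6 tokens, got {len(tokens)}")
    boxes = []
    for i in range(0, len(tokens), 6):
        cls_id = int(tokens[i])
        conf, xc, yc, w, h = (float(t) for t in tokens[i + 1:i + 6])
        # Coordinates in this format should all be normalized to [0, 1]
        if not all(0.0 <= v <= 1.0 for v in (xc, yc, w, h)):
            raise ValueError(f"box {i // 6} is not normalized: {(xc, yc, w, h)}")
        boxes.append((cls_id, conf, xc, yc, w, h))
    return boxes


example = "0 0.97 0.25 0.40 0.10 0.20 0 0.12 0.70 0.55 0.05 0.08"
print(parse_prediction_string(example))
```

Running it over every row of `submission.csv` catches truncated strings or un-normalized boxes early.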


In [8]:
import csv
from pathlib import Path

import pandas as pd
from ensemble_boxes import weighted_boxes_fusion

# NOTE: `conf`, `iou_thr` and `skip_box_thr` are notebook globals set in the next cell,
# which runs before these functions are called.


def filter_invalid_boxes(boxes, scores, labels):
    """Drop degenerate boxes (zero width or height) before they reach WBF."""
    filtered_boxes, filtered_scores, filtered_labels = [], [], []
    for b, s, l in zip(boxes, scores, labels):
        if abs(b[2] - b[0]) > 1e-6 and abs(b[3] - b[1]) > 1e-6:
            filtered_boxes.append(b)
            filtered_scores.append(s)
            filtered_labels.append(l)
    return filtered_boxes, filtered_scores, filtered_labels


def to_prediction_string(boxes, scores, labels):
    """Format normalized [x1, y1, x2, y2] boxes as 'class conf x_center y_center width height ...'."""
    if len(boxes) == 0:
        return "no boxes"
    return " ".join(
        f"{int(lbl)} {score:.6f} {(b[0]+b[2])/2:.6f} {(b[1]+b[3])/2:.6f} {(b[2]-b[0]):.6f} {(b[3]-b[1]):.6f}"
        for b, score, lbl in zip(boxes, scores, labels)
    )


def run_inference(models, image_sizes, test_images_path):
    """Run every model at every image size (with TTA) and collect normalized boxes per image."""
    image_paths = [p for p in Path(test_images_path).glob("*") if p.suffix.lower() in [".jpg", ".jpeg", ".png"]]
    predictions = {}

    for model_idx, model in enumerate(models):
        predictions[model_idx] = {}
        for size in image_sizes:
            predictions[model_idx][size] = {}
            pred = []
            for img_path in image_paths:
                image_id = img_path.stem
                results = model.predict(source=str(img_path), conf=conf, iou=0.4, max_det=600,
                                        augment=True, imgsz=size, verbose=False)

                # Reset for every image so boxes from a previous image are never reused
                norm_boxes, scores, labels = [], [], []
                for result in results:
                    if result.boxes is None:
                        continue
                    # xyxyn is normalized by result.orig_shape, the frame the boxes refer to
                    norm_boxes = result.boxes.xyxyn.cpu().numpy().tolist()
                    scores = result.boxes.conf.cpu().numpy().tolist()
                    labels = result.boxes.cls.cpu().numpy().tolist()
                    norm_boxes, scores, labels = filter_invalid_boxes(norm_boxes, scores, labels)

                predictions[model_idx][size][image_id] = {"boxes": norm_boxes, "scores": scores, "labels": labels}
                # Decide between a box list and "no boxes" using the *filtered* boxes
                pred.append({"image_id": image_id,
                             "prediction_string": to_prediction_string(norm_boxes, scores, labels)})

            # Save one CSV per (model, size) pair for inspection
            df = pd.DataFrame(pred)
            csv_path = f"submission_{model_idx}_{size}.csv"
            df.to_csv(csv_path, index=False, quoting=csv.QUOTE_MINIMAL)
            print(f"[saved] {csv_path}")
            print(df.head(10))

    return predictions


def apply_wbf_and_save_final_submission(predictions, image_ids, output_path="submission_wbf.csv"):
    """Fuse every (model, image size) run per image with Weighted Boxes Fusion and save the CSV."""
    wbf_results = []

    for image_id in image_ids:
        all_boxes, all_scores, all_labels = [], [], []

        for model_preds in predictions.values():
            for size_preds in model_preds.values():
                pred = size_preds.get(image_id)
                # Skip runs that produced nothing for this image
                if not pred or not pred["boxes"]:
                    continue
                all_boxes.append(pred["boxes"])
                all_scores.append(pred["scores"])
                all_labels.append(pred["labels"])

        if all_boxes:
            fused_boxes, fused_scores, fused_labels = weighted_boxes_fusion(
                all_boxes, all_scores, all_labels, iou_thr=iou_thr, skip_box_thr=skip_box_thr
            )
            # WBF can return empty arrays if every box falls below skip_box_thr
            pred_str = to_prediction_string(fused_boxes, fused_scores, fused_labels)
        else:
            pred_str = "no boxes"

        wbf_results.append({"image_id": image_id, "prediction_string": pred_str})

    wbf_df = pd.DataFrame(wbf_results)
    wbf_df.to_csv(output_path, index=False, quoting=csv.QUOTE_MINIMAL)
    print(f"[notice] ✅ WBF submission saved to {output_path}")
    print(wbf_df.head(10))
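Weighted Boxes Fusion differs from NMS: instead of keeping only the highest-scoring box in a group of overlapping predictions, it averages the group's coordinates weighted by confidence. A simplified single-cluster sketch of that averaging step, assuming equal weights for every run; `fuse_cluster` is hypothetical, and as I understand it `ensemble_boxes.weighted_boxes_fusion` additionally forms the clusters using `iou_thr`, drops boxes below `skip_box_thr`, and rescales the fused score by how many runs contributed:

```python
from typing import List, Sequence, Tuple


def fuse_cluster(boxes: Sequence[Sequence[float]], scores: Sequence[float],
                 num_models: int) -> Tuple[List[float], float]:
    """Fuse one cluster of overlapping [x1, y1, x2, y2] boxes (simplified WBF, 'avg' scores).

    Coordinates: confidence-weighted mean. Score: mean confidence, scaled down
    when fewer boxes than runs voted for this object.
    """
    total = sum(scores)
    fused = [sum(s * b[k] for b, s in zip(boxes, scores)) / total for k in range(4)]
    mean_score = total / len(scores)
    fused_score = mean_score * min(len(scores), num_models) / num_models
    return fused, fused_score


# Two runs (e.g. two image sizes) found the same can with slightly different boxes
cluster = [[0.10, 0.20, 0.30, 0.60], [0.12, 0.22, 0.32, 0.62]]
cluster_scores = [0.9, 0.6]
print(fuse_cluster(cluster, cluster_scores, num_models=2))
print(fuse_cluster(cluster, cluster_scores, num_models=5))
```

Under this simplified scheme, a can found in only two of five runs keeps 2/5 of its average confidence, so objects detected consistently across image sizes rank higher.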



In [9]:
import os
from pathlib import Path
import pandas as pd
import csv
from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion
from PIL import Image

model_paths = [
    "/kaggle/working/runs/detect/train/weights/best.pt",
    # "/kaggle/working/runs/detect/train/weights/last.pt",
]

test_images_path = "/kaggle/input/multi-instance-object-detection-challenge/Starter_Dataset/TestImages/images"
conf = 0.0001          # detection confidence threshold (read by run_inference)
iou_thr = 0.5          # WBF IoU threshold for merging boxes across runs
skip_box_thr = 0.01    # WBF ignores boxes scoring below this
image_sizes = [1056, 1440, 1920, 2560, 3200]  # multi-scale inference sizes (training used 1056)

models = [YOLO(path) for path in model_paths]
predictions = run_inference(models, image_sizes, test_images_path)

# Every (model, size) run covers the same images, so take the ids from the first run
image_ids = list(predictions[0][image_sizes[0]].keys())

apply_wbf_and_save_final_submission(predictions, image_ids)

[saved] submission_0_1056.csv
   image_id                                  prediction_string
0  IMG_9602  0 0.979048 0.751044 0.414001 0.137646 0.232900...
1  IMG_9594  0 0.980289 0.377842 0.685819 0.100592 0.133639...
2  IMG_9717  0 0.970737 0.675509 0.837135 0.103430 0.247215...
3  IMG_9703  0 0.966215 0.349797 0.298296 0.074875 0.159365...
4  IMG_9780  0 0.977819 0.525275 0.492996 0.204275 0.436754...
5  IMG_9617  0 0.959200 0.346113 0.632760 0.133522 0.195139...
6  IMG_9788  0 0.971883 0.337447 0.488986 0.081239 0.166027...
7  IMG_9786  0 0.955648 0.513414 0.334809 0.047593 0.102705...
8  IMG_9769  0 0.974109 0.309803 0.548142 0.319678 0.686106...
9  IMG_9588  0 0.976255 0.884056 0.405728 0.109969 0.138087...
[saved] submission_0_1440.csv
   image_id                                  prediction_string
0  IMG_9602  0 0.976509 0.750919 0.413694 0.137876 0.233863...
1  IMG_9594  0 0.979818 0.377823 0.685302 0.100007 0.133934...
2  IMG_9717  0 0.964238 0.322737 0.552359 0.051268 0.09982