# Kuzushiji Recognition: Plan

Goals:
- Establish a GPU-enabled environment; install and verify the CUDA 12.1 torch stack if needed.
- Inspect data schema (train.csv, unicode_translation.csv, sample_submission.csv) and image zips.
- Define CV strategy mirroring test distribution.
- Ship a fast baseline (e.g., lightweight detector/classifier or simple heuristic) to create a valid submission.
- Iterate toward a medal through stronger modeling (e.g., CNN-based detector/classifier with augmentations, ensembling).

Initial Milestones:
1) Environment + data EDA
2) Baseline pipeline and valid submission
3) Cross-validation and OOF checks
4) Model improvements and ensembling

At each milestone, we will request expert review.
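
One way to make the CV split mirror the test distribution is to keep all pages of a book on the same side of the split. The sketch below assumes the book identifier is the part of `image_id` before the first `_` or `-` (for example `200004148` in `200004148_00015_1`). That is a guess from the sample IDs, not something confirmed by the data description, and `book_id` and `group_split` are hypothetical helper names.

```python
import random
import re
from collections import defaultdict

def book_id(image_id: str) -> str:
    """Assumed book key: the text before the first '_' or '-' in the image id."""
    return re.split(r"[_-]", image_id, maxsplit=1)[0]

def group_split(image_ids, val_frac=0.1, seed=42):
    """Hold out whole books until roughly val_frac of the images are in validation."""
    groups = defaultdict(list)
    for iid in image_ids:
        groups[book_id(iid)].append(iid)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    target = val_frac * len(image_ids)
    val = []
    for k in keys:
        if len(val) >= target:
            break
        val.extend(groups[k])
    val_set = set(val)
    train = [iid for iid in image_ids if iid not in val_set]
    return train, val

ids = ["200004148_00015_1", "200004148_00016_1", "brsk001-030", "brsk001-031", "umgy007-028"]
train_ids, val_ids = group_split(ids, val_frac=0.2)
print("train:", train_ids)
print("val:", val_ids)
```

Because whole books are moved at once, the validation share can overshoot `val_frac` when a large book is picked; that trade-off is usually acceptable for a leakage-free estimate.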

In [1]:
import os, sys, time, subprocess, json, math, re, textwrap, zipfile
import pandas as pd
from pathlib import Path

def run_cmd(cmd):
    print('>',' '.join(cmd), flush=True)
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT).decode()
        print(out, flush=True)
    except subprocess.CalledProcessError as e:
        print(e.output.decode(), flush=True)

print('Checking GPU via nvidia-smi...')
run_cmd(['bash','-lc','nvidia-smi || true'])

base = Path('.')
files = [
    'train.csv',
    'unicode_translation.csv',
    'sample_submission.csv',
    'train_images.zip',
    'test_images.zip',
]
for f in files:
    p = base / f
    print(f'{f}: exists={p.exists()} size={p.stat().st_size if p.exists() else None}')

print('\nLoading CSVs...')
train_df = pd.read_csv(base/'train.csv')
utr_df = pd.read_csv(base/'unicode_translation.csv')
ss_df = pd.read_csv(base/'sample_submission.csv')
print('train.csv shape:', train_df.shape)
print('train.csv columns:', list(train_df.columns))
print(train_df.head(3))
print('\nunicode_translation.csv shape:', utr_df.shape)
print(utr_df.head(3))
print('\nsample_submission.csv shape:', ss_df.shape)
print(ss_df.head(3))

def peek_zip(zpath, n=10):
    if not Path(zpath).exists():
        print(f'{zpath} not found')
        return
    with zipfile.ZipFile(zpath) as zf:
        infos = zf.infolist()
        print(f'{zpath}: {len(infos)} files in archive')
        for i, zi in enumerate(infos[:n]):
            print(f'  {i}: {zi.filename} size={zi.file_size}')

print('\nPeeking into zips...')
peek_zip('train_images.zip', n=5)
peek_zip('test_images.zip', n=5)

# Quick inference about submission format
print('\nSubmission format sample row:')
print(ss_df.iloc[0].to_dict())
print('\nDone EDA baseline. Next: confirm whether labels string uses triplets or quintets and pixel units.')

Checking GPU via nvidia-smi...
> bash -lc nvidia-smi || true


Tue Sep 30 07:21:09 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     414MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

train.csv: exists=True size=14069467
unicode_translation.csv: exists=True size=52646
sample_submission.csv: exists=True size=13700
train_images.zip: exists=True size=2711943248
test_images.zip: exists=True size=307454375

Loading CSVs...
train.csv shape: (3244, 2)
train.csv columns: ['image_id', 'labels']
            image_id                                             labels
0  200004148_00015_1  U+306F 1187 361 47 27 U+306F 1487 2581 48 28 U...
1  200021712-00008_2  U+4E00 1543 1987 58 11 U+4E00 1296 1068 91 11 ...
2  100249416_00034_1  U+4E00 1214 415 73 11 U+4E00 1386 412 72 13 U+...

unicode_translation.csv shape: (4781, 2)
  Unicode char
0  U+0031    1
1  U+0032    2
2  U+0034    4

sample_submission.csv shape: (361, 2)
            image_id                 labels
0        umgy007-028  U+003F 1 1 U+FF2F 2 2
1        hnsd004-026  U+003F 1 1 U+FF2F 2 2
2  200003076_00034_2  U+003F 1 1 U+FF2F 2 2

Peeking into zips...
train_images.zip: 3244 files in archive
  0: brsk001-030.jpg size=
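
The printed heads suggest that `train.csv` stores five tokens per box while `sample_submission.csv` stores three per prediction. A minimal sketch for checking that programmatically, assuming every record starts with a `U+XXXX` token (`infer_record_width` is a hypothetical helper, not part of the notebook):

```python
def infer_record_width(labels: str) -> int:
    """Return the number of tokens per record, using 'U+' tokens as record starts."""
    toks = labels.split()
    starts = [i for i, t in enumerate(toks) if t.startswith("U+")]
    if not starts or starts[0] != 0:
        raise ValueError("labels must start with a U+XXXX token")
    # Distance from each record start to the next one (or to the end of the string).
    widths = {b - a for a, b in zip(starts, starts[1:] + [len(toks)])}
    if len(widths) != 1:
        raise ValueError(f"inconsistent record widths: {sorted(widths)}")
    return widths.pop()

# Row taken from the sample_submission.csv head
print(infer_record_width("U+003F 1 1 U+FF2F 2 2"))  # 3
# Complete records from the first train.csv row
print(infer_record_width("U+306F 1187 361 47 27 U+306F 1487 2581 48 28"))  # 5
```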

In [2]:
import shutil
from collections import Counter

base = Path('.')
train_zip = base/'train_images.zip'
test_zip = base/'test_images.zip'
train_dir = base/'train_images'
test_dir = base/'test_images'

def ensure_unzip(zpath: Path, out_dir: Path, max_preview: int = 0):
    if out_dir.exists() and any(out_dir.iterdir()):
        print(f'Exists: {out_dir} (skipping unzip)')
        return
    print(f'Extracting {zpath} -> {out_dir} ...')
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zpath) as zf:
        zf.extractall(out_dir)
    print('Done extract.')
    if max_preview:
        print('Preview files:')
        for i,p in enumerate(sorted(out_dir.iterdir())[:max_preview]):
            print(i, p.name, p.stat().st_size)

ensure_unzip(train_zip, train_dir, max_preview=5)
ensure_unzip(test_zip, test_dir, max_preview=5)

# Parse train labels: tokens repeating as [unicode x y w h] in pixels
def parse_labels_row(row):
    image_id = row['image_id']
    s = str(row['labels']) if pd.notna(row['labels']) else ''
    if not s.strip():
        return []
    toks = s.strip().split()
    out = []
    i = 0
    while i < len(toks):
        u = toks[i]
        if not u.startswith('U+') or i+4 >= len(toks):
            # malformed; break
            break
        try:
            x = int(toks[i+1]); y = int(toks[i+2]); w = int(toks[i+3]); h = int(toks[i+4])
        except Exception:
            break
        out.append({'image_id': image_id, 'unicode': u, 'x': x, 'y': y, 'w': w, 'h': h})
        i += 5
    return out

all_rows = []
for _, r in train_df.iterrows():
    all_rows.extend(parse_labels_row(r))
boxes_df = pd.DataFrame(all_rows)
print('Parsed boxes:', boxes_df.shape, 'columns:', list(boxes_df.columns))
print(boxes_df.head())

# Basic stats
per_image_counts = boxes_df.groupby('image_id').size().rename('n').reset_index()
print('Images:', train_df.shape[0], 'Total boxes:', len(boxes_df), 'Mean per image:', per_image_counts['n'].mean())
print('Quantiles per image:', per_image_counts['n'].quantile([0,0.25,0.5,0.75,0.9,0.95,0.99]).to_dict())

# Build unicode <-> class_id map for training
unicodes = sorted(boxes_df['unicode'].unique().tolist())
u2id = {u:i for i,u in enumerate(unicodes)}
id2u = {i:u for u,i in u2id.items()}
print('Num classes:', len(unicodes))
print('Sample mapping:', list(u2id.items())[:5])

# Prepare YOLO labels (normalized cx,cy,w,h) into yolo_labels/<image_id>.txt
yolo_labels_dir = base/'yolo_labels'
yolo_labels_dir.mkdir(exist_ok=True)

from PIL import Image

def write_yolo_label_for_image(image_id):
    img_path = train_dir / f'{image_id}.jpg'
    if not img_path.exists():
        # Some files may use .png (unlikely here), try alternate
        alt = list(train_dir.glob(f'{image_id}.*'))
        if alt:
            img_path = alt[0]
        else:
            return 0
    with Image.open(img_path) as im:
        w_img, h_img = im.size
    df = boxes_df[boxes_df.image_id == image_id]
    lines = []
    for _, b in df.iterrows():
        cx = (b['x'] + b['w']/2) / w_img
        cy = (b['y'] + b['h']/2) / h_img
        ww = b['w'] / w_img
        hh = b['h'] / h_img
        cls = u2id[b['unicode']]
        # clamp
        cx = min(max(cx, 0.0), 1.0); cy = min(max(cy, 0.0), 1.0)
        ww = min(max(ww, 0.0), 1.0); hh = min(max(hh, 0.0), 1.0)
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {ww:.6f} {hh:.6f}")
    (yolo_labels_dir/f'{image_id}.txt').write_text('\n'.join(lines))
    return len(lines)

t0 = time.time()
n_written = 0
for img_id in train_df['image_id']:
    n_written += write_yolo_label_for_image(img_id)
print(f'Wrote YOLO labels for {train_df.shape[0]} images, total boxes {n_written}, elapsed {time.time()-t0:.1f}s')

# Create simple train/val split by image with stratification by box-count buckets
def make_split(df_images, per_image_counts, val_frac=0.1, seed=42):
    df = df_images.merge(per_image_counts, on='image_id', how='left').fillna({'n':0})
    bins = pd.qcut(df['n'], q=min(10, max(2, df.shape[0]//50)), duplicates='drop')
    df['bin'] = bins.astype(str)
    # simple stratified split: shuffle once, then take the first k rows of each bin
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    val_idx = []
    for b, g in df.groupby('bin'):
        k = max(1, int(len(g)*val_frac))
        val_idx.extend(g.index[:k].tolist())
    df['is_val'] = False
    df.loc[val_idx, 'is_val'] = True
    return df[['image_id','is_val']]

split_df = make_split(train_df[['image_id']], per_image_counts, val_frac=0.1, seed=42)
print(split_df['is_val'].value_counts())
split_df.head()

print('Ready to install Ultralytics and kick off YOLO training next. Also confirmed submission format appears as triplets (unicode cx cy) from sample_submission; we will output centers from predicted boxes in pixels.')

Exists: train_images (skipping unzip)
Exists: test_images (skipping unzip)


Parsed boxes: (613505, 6) columns: ['image_id', 'unicode', 'x', 'y', 'w', 'h']
            image_id unicode     x     y   w   h
0  200004148_00015_1  U+306F  1187   361  47  27
1  200004148_00015_1  U+306F  1487  2581  48  28
2  200004148_00015_1  U+3070  1187  1063  74  30
3  200004148_00015_1  U+3070   594  1154  93  31
4  200004148_00015_1  U+306F  1192  1842  52  32
Images: 3244 Total boxes: 613505 Mean per image: 189.1199136868064
Quantiles per image: {0.0: 2.0, 0.25: 132.0, 0.5: 188.0, 0.75: 228.0, 0.9: 322.0, 0.95: 350.0, 0.99: 403.0}
Num classes: 4113
Sample mapping: [('U+003F', 0), ('U+2000B', 1), ('U+20D45', 2), ('U+2123D', 3), ('U+22999', 4)]


Wrote YOLO labels for 3244 images, total boxes 613505, elapsed 94.8s
is_val
False    2925
True      319
Name: count, dtype: int64
Ready to install Ultralytics and kick off YOLO training next. Also confirmed submission format appears as triplets (unicode cx cy) from sample_submission; we will output centers from predicted boxes in pixels.
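
The YOLO conversion and the submission format are two views of the same box: YOLO wants a normalized center and size, the submission wants the center in pixels. The sketch below shows the round trip for the first parsed box, treating `x, y` as the top-left corner as the conversion cell does. The 2000x3000 image size is a made-up placeholder, and `to_yolo` and `yolo_center_to_px` are hypothetical names.

```python
def to_yolo(x, y, w, h, img_w, img_h):
    """Top-left pixel box -> normalized (cx, cy, w, h)."""
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

def yolo_center_to_px(cx, cy, img_w, img_h):
    """Normalized center -> pixel center (floats; round when writing a submission)."""
    return cx * img_w, cy * img_h

IMG_W, IMG_H = 2000, 3000  # hypothetical image size, for illustration only
cx, cy, ww, hh = to_yolo(1187, 361, 47, 27, IMG_W, IMG_H)
print(f"yolo line: 0 {cx:.6f} {cy:.6f} {ww:.6f} {hh:.6f}")
px, py = yolo_center_to_px(cx, cy, IMG_W, IMG_H)
print(f"pixel center: {px:.1f} {py:.1f}")
```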


In [None]:
import os, sys, subprocess, shutil, time
from pathlib import Path

def pip(*args):
    print('> pip', ' '.join(args), flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

print('Uninstalling any preexisting torch stack (idempotent) ...', flush=True)
for pkg in ('torch','torchvision','torchaudio'):
    subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', pkg], check=False)

for d in (
    '/app/.pip-target/torch',
    '/app/.pip-target/torch-2.8.0.dist-info',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchvision-0.23.0.dist-info',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torchaudio-2.8.0.dist-info',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
    '/app/.pip-target/torchgen',
    '/app/.pip-target/functorch',
):
    if os.path.exists(d):
        print('Removing', d, flush=True)
        shutil.rmtree(d, ignore_errors=True)

print('Installing CUDA 12.1 torch stack ...', flush=True)
pip('install',
    '--index-url', 'https://download.pytorch.org/whl/cu121',
    '--extra-index-url', 'https://pypi.org/simple',
    'torch==2.4.1', 'torchvision==0.19.1', 'torchaudio==2.4.1')

Path('constraints.txt').write_text('torch==2.4.1\ntorchvision==0.19.1\ntorchaudio==2.4.1\n')

print('Installing ultralytics and deps under constraints ...', flush=True)
pip('install', '-c', 'constraints.txt',
    'ultralytics==8.3.32',
    'opencv-python-headless',
    'albumentations',
    'pyyaml',
    '--upgrade-strategy', 'only-if-needed')

import torch
print('torch', torch.__version__, 'CUDA build', getattr(torch.version, 'cuda', None))
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))

print('Installation complete.')
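
After a later `pip install` it can be worth confirming that the pinned stack was not silently upgraded. A small sketch using only the standard library; `parse_constraints` and `check_pins` are hypothetical helpers, and the comparison ignores a local build tag such as `+cu121`.

```python
from importlib import metadata

def parse_constraints(text: str) -> dict:
    """Parse 'name==version' lines into a dict; other lines are ignored."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins

def check_pins(pins: dict) -> dict:
    """Map each package to (pinned, installed or None, versions match)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local build tag such as '+cu121' before comparing.
        base = installed.split("+", 1)[0] if installed else None
        report[name] = (pinned, installed, base == pinned)
    return report

pins = parse_constraints("torch==2.4.1\ntorchvision==0.19.1\ntorchaudio==2.4.1\n")
for name, (pinned, installed, ok) in check_pins(pins).items():
    print(f"{name}: pinned={pinned} installed={installed} ok={ok}")
```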

In [None]:
import json
from pathlib import Path

# Persist mappings and splits
Path('artifacts').mkdir(exist_ok=True)
Path('artifacts/u2id.json').write_text(json.dumps(u2id, ensure_ascii=False))
Path('artifacts/id2u.json').write_text(json.dumps(id2u, ensure_ascii=False))
split_df.to_csv('artifacts/split.csv', index=False)

# Create train/val txt lists for Ultralytics
train_list = []
val_list = []
for _, r in split_df.iterrows():
    img_path = train_dir / f"{r['image_id']}.jpg"
    if not img_path.exists():
        alts = list(train_dir.glob(f"{r['image_id']}.*"))
        if alts:
            img_path = alts[0]
        else:
            continue
    if r['is_val']:
        val_list.append(str(img_path.resolve()))
    else:
        train_list.append(str(img_path.resolve()))

Path('artifacts/train.txt').write_text('\n'.join(train_list))
Path('artifacts/val.txt').write_text('\n'.join(val_list))
print(f'Train images: {len(train_list)}, Val images: {len(val_list)}')

# Build dataset YAML for Ultralytics (uses txt lists and external labels dir)
names = [id2u[i] for i in range(len(id2u))]
dataset_yaml = f'''
path: .
train: artifacts/train.txt
val: artifacts/val.txt
names: {json.dumps(names, ensure_ascii=False)}
nc: {len(names)}
roboflow: null
'''
Path('kuz_dataset.yaml').write_text(dataset_yaml)
print('Wrote kuz_dataset.yaml with', len(names), 'classes')

# Symlink or inform Ultralytics where labels are located: we keep labels in yolo_labels/
# Ultralytics infers labels by replacing /images/ with /labels/. Since we pass txts, it will still do that.
# To accommodate, create a parallel labels folder structure via a flat symlink directory named 'labels' at repo root.
labels_dir = Path('labels')
if not labels_dir.exists():
    labels_dir.mkdir(exist_ok=True)
    # Create symlinks for each label file into labels/ with same basename as image basename but .txt
    created = 0
    for p in train_dir.iterdir():
        if p.is_file():
            stem = p.stem
            src = Path('yolo_labels')/f'{stem}.txt'
            if src.exists():
                dst = labels_dir/f'{stem}.txt'
                try:
                    if not dst.exists():
                        dst.symlink_to(src.resolve())
                        created += 1
                except Exception:
                    # fallback: copy if symlink not permitted
                    if not dst.exists():
                        dst.write_text(src.read_text())
                        created += 1
    print('Prepared labels links/copied:', created)
else:
    print('Labels dir exists; assuming prepared.')
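
As far as we understand it, Ultralytics derives each label path by swapping the last `/images/` path component for `/labels/` and changing the extension to `.txt`. The sketch below mimics that rule under this assumption (POSIX separators only); it is not the library's own code, and `image_to_label_path` is a hypothetical name.

```python
def image_to_label_path(img_path: str) -> str:
    """Approximate the assumed images -> labels path rule."""
    head, sep, tail = img_path.rpartition("/images/")
    swapped = f"{head}/labels/{tail}" if sep else img_path
    return swapped.rsplit(".", 1)[0] + ".txt"

print(image_to_label_path("/data/dataset/images/train/brsk001-030.jpg"))
# /data/dataset/labels/train/brsk001-030.txt
print(image_to_label_path("/data/train_images/brsk001-030.jpg"))
# /data/train_images/brsk001-030.txt
```

If the rule holds, images that sit in `train_images/` would be matched with label files next to the images rather than with a root-level `labels/` directory, which is a reason to prefer an `images/<split>` and `labels/<split>` layout.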

In [None]:
from pathlib import Path
import json  # needed for json.dumps in the YAML below
import shutil

root = Path('dataset')
img_tr = root/'images'/'train'
img_va = root/'images'/'val'
lab_tr = root/'labels'/'train'
lab_va = root/'labels'/'val'
for d in (img_tr, img_va, lab_tr, lab_va):
    d.mkdir(parents=True, exist_ok=True)

def safe_link(src: Path, dst: Path):
    try:
        if dst.exists():
            return
        dst.symlink_to(src.resolve())
    except Exception:
        if not dst.exists():
            if src.is_file():
                shutil.copy2(src, dst)

n_tr = n_va = 0
for _, r in split_df.iterrows():
    img = train_dir / f"{r['image_id']}.jpg"
    if not img.exists():
        alts = list(train_dir.glob(f"{r['image_id']}.*"))
        if not alts:
            continue
        img = alts[0]
    lab = Path('yolo_labels')/f"{img.stem}.txt"
    if r['is_val']:
        dst_img = img_va / img.name
        dst_lab = lab_va / lab.name
        safe_link(img, dst_img)
        if lab.exists():
            safe_link(lab, dst_lab)
        n_va += 1
    else:
        dst_img = img_tr / img.name
        dst_lab = lab_tr / lab.name
        safe_link(img, dst_img)
        if lab.exists():
            safe_link(lab, dst_lab)
        n_tr += 1

print('Symlinked/copied images -> train:', n_tr, 'val:', n_va)
print('Images/train files:', len(list(img_tr.glob('*'))), 'Images/val files:', len(list(img_va.glob('*'))))
print('Labels/train files:', len(list(lab_tr.glob('*.txt'))), 'Labels/val files:', len(list(lab_va.glob('*.txt'))))

# Overwrite dataset YAML to use directory structure (more robust for Ultralytics)
names = [id2u[i] for i in range(len(id2u))]
yaml_dir = f'''
path: {root.as_posix()}
train: images/train
val: images/val
names: {json.dumps(names, ensure_ascii=False)}
nc: {len(names)}
roboflow: null
'''
Path('kuz_dataset.yaml').write_text(yaml_dir)
print('Wrote kuz_dataset.yaml (dir-based) with', len(names), 'classes')

In [None]:
import os, time
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
from ultralytics import YOLO

print('Starting YOLOv8 training (single-class detector to avoid OOM)...', flush=True)
t0 = time.time()
# Use smallest model and single class (class-agnostic detector)
model = YOLO('yolov8n.pt')
result = model.train(
    data='kuz_dataset_single.yaml',
    imgsz=1024,
    epochs=15,
    batch=8,
    workers=4,
    device=0,
    optimizer='auto',
    cos_lr=False,
    patience=7,
    project='runs',
    name='yolo8n_kuz_single',
    exist_ok=True,
    save_period=1,
    pretrained=True,
    amp=True,
    plots=False,
    single_cls=True
)
print('Training done. Elapsed: %.1fs' % (time.time()-t0), flush=True)
print('Results dir:', getattr(result, 'save_dir', 'runs/yolo8n_kuz_single'), flush=True)

In [None]:
import subprocess, sys
def bash(cmd):
    print('> bash -lc', cmd, flush=True)
    print(subprocess.check_output(['bash','-lc',cmd], stderr=subprocess.STDOUT).decode())

print('Installing system libs for OpenCV (libGL/libglib)...', flush=True)
bash('apt-get update -y && apt-get install -y libgl1 libglib2.0-0')
print('Testing cv2 import after install...', flush=True)
import cv2
print('cv2 version:', cv2.__version__)
print('Re-trying ultralytics import...')
from ultralytics import YOLO
print('Ultralytics import OK.')

In [None]:
import sys, subprocess

def pip(*args):
    print('> pip', ' '.join(args), flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

print('Fixing OpenCV import by enforcing headless build...', flush=True)
subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', 'opencv-python', 'opencv-contrib-python', 'opencv-python-headless'], check=False)
pip('install', 'opencv-python-headless==4.10.0.84')

print('Testing cv2 import...', flush=True)
import cv2, os
print('cv2 version:', cv2.__version__)
print('cv2 file:', cv2.__file__)

print('Re-trying ultralytics import...', flush=True)
from ultralytics import YOLO
print('Ultralytics import OK.')

In [None]:
from pathlib import Path
import json

# Rewrite kuz_dataset.yaml with absolute image dirs to bypass Ultralytics datasets_dir prefixing
root = Path('dataset').resolve()
img_tr_abs = (root/'images'/'train').as_posix()
img_va_abs = (root/'images'/'val').as_posix()
names = [id2u[i] for i in range(len(id2u))]
yaml_abs = f'''
train: {img_tr_abs}
val: {img_va_abs}
names: {json.dumps(names, ensure_ascii=False)}
nc: {len(names)}
roboflow: null
'''
Path('kuz_dataset.yaml').write_text(yaml_abs)
print('Rewrote kuz_dataset.yaml with absolute paths:')
print(Path('kuz_dataset.yaml').read_text()[:500])

In [None]:
import json, math, time
from pathlib import Path
import pandas as pd
from ultralytics import YOLO
from PIL import Image

def load_id2u(path='artifacts/id2u.json'):
    with open(path, 'r') as f:
        d = json.load(f)
    # keys may be strings; convert to int-indexed list
    max_k = max(int(k) for k in d.keys())
    arr = [None]*(max_k+1)
    for k,v in d.items():
        arr[int(k)] = v
    return arr

def build_submission(weights_path: str, conf=0.25, iou=0.65, imgsz=1024, max_det=1000, save_name='submission.csv'):
    print(f'Loading model: {weights_path}', flush=True)
    model = YOLO(weights_path)
    id2u_list = load_id2u('artifacts/id2u.json')
    ss = pd.read_csv('sample_submission.csv')
    image_ids = ss['image_id'].tolist()
    img_paths = []
    for img_id in image_ids:
        p = Path('test_images')/f'{img_id}.jpg'
        if not p.exists():
            alts = list(Path('test_images').glob(f'{img_id}.*'))
            if alts:
                p = alts[0]
        img_paths.append(str(p))
    print('Running inference on', len(img_paths), 'images ...', flush=True)
    results = model.predict(source=img_paths, imgsz=imgsz, conf=conf, iou=iou, max_det=max_det, device=0, stream=True, verbose=False)
    rows = []
    t0 = time.time()
    for i, (img_id, img_path, res) in enumerate(zip(image_ids, img_paths, results)):
        if i % 25 == 0:
            print(f'Processed {i}/{len(image_ids)} images, elapsed {time.time()-t0:.1f}s', flush=True)
        try:
            with Image.open(img_path) as im:
                w_img, h_img = im.size
        except Exception:
            w_img = h_img = None
        labels = []
        if res and hasattr(res, 'boxes') and res.boxes is not None:
            boxes = res.boxes
            if boxes.xyxy is not None and boxes.cls is not None:
                xyxy = boxes.xyxy.cpu().numpy()
                cls = boxes.cls.cpu().numpy().astype(int)
                for (x1,y1,x2,y2), c in zip(xyxy, cls):
                    cx = int(round((float(x1)+float(x2))/2.0))
                    cy = int(round((float(y1)+float(y2))/2.0))
                    # optional clamp if dims known
                    if w_img is not None and h_img is not None:
                        cx = max(0, min(cx, w_img-1))
                        cy = max(0, min(cy, h_img-1))
                    u = id2u_list[c] if 0 <= c < len(id2u_list) else None
                    if u:
                        labels.extend([u, str(cx), str(cy)])
        rows.append({'image_id': img_id, 'labels': ' '.join(labels)})
    sub = pd.DataFrame(rows)
    sub.to_csv(save_name, index=False)
    print('Wrote', save_name, 'with shape', sub.shape, flush=True)
    return sub

print('Inference/submission utilities ready. After training completes, call:')
print("build_submission('runs/yolo8m_kuz/weights/best.pt', conf=0.25, iou=0.65, imgsz=1024, max_det=1000)")
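
For local OOF checks it helps to score predicted centers against the validation boxes. The sketch below assumes the metric counts a predicted point as correct when it falls inside a ground-truth box with the same label, with each box matched at most once. That rule is an assumption not verified in this notebook, and `match_points` and `f1_from_counts` are hypothetical helpers.

```python
def match_points(preds, gts):
    """preds: [(unicode, cx, cy)], gts: [(unicode, x, y, w, h)] for one image.

    Returns (tp, fp, fn) using greedy one-to-one matching.
    """
    used = set()
    tp = 0
    for u, cx, cy in preds:
        for j, (gu, x, y, w, h) in enumerate(gts):
            if j in used or gu != u:
                continue
            if x <= cx <= x + w and y <= cy <= y + h:
                used.add(j)
                tp += 1
                break
    return tp, len(preds) - tp, len(gts) - tp

def f1_from_counts(tp, fp, fn):
    """F1 from pooled counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

gts = [("U+306F", 1187, 361, 47, 27), ("U+3070", 1187, 1063, 74, 30)]
preds = [("U+306F", 1210, 374), ("U+3070", 10, 10)]
tp, fp, fn = match_points(preds, gts)
print(tp, fp, fn, f1_from_counts(tp, fp, fn))  # 1 1 1 0.5
```

Summing `tp`, `fp`, and `fn` over all validation images before calling `f1_from_counts` gives a pooled score.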

In [None]:
from pathlib import Path

# Create a single-class dataset YAML to avoid huge classification head
root_abs = Path('dataset').resolve()
img_tr_abs = (root_abs/'images'/'train').as_posix()
img_va_abs = (root_abs/'images'/'val').as_posix()
single_yaml = f'''
train: {img_tr_abs}
val: {img_va_abs}
names: ['char']
nc: 1
roboflow: null
'''
Path('kuz_dataset_single.yaml').write_text(single_yaml)
print('Wrote kuz_dataset_single.yaml with nc=1 at absolute paths:')
print(Path('kuz_dataset_single.yaml').read_text())

In [None]:
import sys, subprocess, os, time, math, json, gc
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image
import cv2

def pip(*args):
    print('> pip', ' '.join(args), flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

print('Installing kNN embedding deps (timm, faiss-cpu) if missing...', flush=True)
try:
    import timm  # noqa
except Exception:
    pip('install', '-c', 'constraints.txt', 'timm==1.0.9', '--upgrade-strategy', 'only-if-needed')
try:
    import faiss  # noqa
except Exception:
    pip('install', 'faiss-cpu==1.8.0.post1')

import torch
import timm
import faiss

# Global CLAHE instance (reuse for speed)
CLAHE = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))

# Preprocess: grayscale->3ch, optional CLAHE, pad bbox by 15%, resize to 224
def crop_pad_resize(img: Image.Image, x:int, y:int, w:int, h:int, pad_frac:float=0.15, out_size:int=224):
    W, H = img.size
    cx = x + w/2.0
    cy = y + h/2.0
    pw = max(2, int(round(w * pad_frac)))
    ph = max(2, int(round(h * pad_frac)))
    x1 = max(0, int(round(cx - w/2 - pw)))
    y1 = max(0, int(round(cy - h/2 - ph)))
    x2 = min(W, int(round(cx + w/2 + pw)))
    y2 = min(H, int(round(cy + h/2 + ph)))
    crop = img.crop((x1, y1, x2, y2))
    # grayscale -> 3ch
    crop = crop.convert('L')
    arr = np.array(crop)
    # CLAHE light
    arr = CLAHE.apply(arr)
    # pad to square keep aspect
    h0, w0 = arr.shape[:2]
    m = max(h0, w0)
    pad_top = (m - h0) // 2
    pad_bottom = m - h0 - pad_top
    pad_left = (m - w0) // 2
    pad_right = m - w0 - pad_left
    arr = cv2.copyMakeBorder(arr, pad_top, pad_bottom, pad_left, pad_right, borderType=cv2.BORDER_CONSTANT, value=0)
    arr = cv2.resize(arr, (out_size, out_size), interpolation=cv2.INTER_AREA)
    arr = np.stack([arr, arr, arr], axis=0).astype(np.float32) / 255.0
    return arr

# Build embedding model (pretrained, global pooled, L2-normalized) with ImageNet normalization
def build_backbone(model_name:str='convnext_tiny', device='cuda' if torch.cuda.is_available() else 'cpu'):
    model = timm.create_model(model_name, pretrained=True, num_classes=0, global_pool='avg')
    model.eval().to(device)
    # resolve data config for proper normalization
    data_cfg = timm.data.resolve_model_data_config(model)
    mean = torch.tensor(data_cfg.get('mean', (0.485, 0.456, 0.406)), dtype=torch.float32, device=device).view(1,3,1,1)
    std = torch.tensor(data_cfg.get('std', (0.229, 0.224, 0.225)), dtype=torch.float32, device=device).view(1,3,1,1)
    # get feature dim
    with torch.no_grad():
        dummy = torch.zeros(1,3,224,224, device=device)
        dummy = (dummy - mean) / std
        f = model(dummy)
    feat_dim = int(f.shape[1])
    return model, feat_dim, device, mean, std

def embed_batch(model, device, batch_np, mean:torch.Tensor, std:torch.Tensor):
    with torch.no_grad():
        t = torch.from_numpy(batch_np).to(device)
        t = (t - mean) / std
        feats = model(t)
        feats = torch.nn.functional.normalize(feats, p=2, dim=1)
        return feats.detach().cpu().numpy().astype(np.float32)

# Create prototype medians per unicode using train split, capping per-class samples for speed
def build_prototypes(max_per_class:int=100, img_dir='train_images', out_dir='artifacts', model_name='convnext_tiny'):
    out = Path(out_dir); out.mkdir(exist_ok=True, parents=True)
    model, feat_dim, device, mean, std = build_backbone(model_name)
    # select training rows (not val) and pre-sample per class to avoid scanning all rows
    split_map = dict(zip(split_df['image_id'], split_df['is_val']))
    df = boxes_df.copy()
    df = df[~df['image_id'].map(split_map).fillna(False).astype(bool)]  # keep train-only; cast so ~ is a logical NOT
    # random sample up to max_per_class per unicode
    rng = np.random.RandomState(42)
    def _sample_grp(g):
        if len(g) <= max_per_class:
            return g
        idx = rng.choice(len(g), size=max_per_class, replace=False)
        return g.iloc[idx]
    t0 = time.time()
    df_s = df.groupby('unicode', as_index=False, group_keys=False).apply(_sample_grp)
    df_s = df_s.sample(frac=1.0, random_state=42).reset_index(drop=True)  # shuffle for IO locality somewhat
    total_target = len(df_s)
    n_classes = df_s['unicode'].nunique()
    print(f'Prototype sampling: classes={n_classes}, target_crops={total_target} (cap={max_per_class})', flush=True)

    batch, metas = [], []
    feats_list = {}  # unicode -> list of embeddings
    last_log = time.time()
    for i, r in enumerate(df_s.itertuples(index=False)):
        iid = r.image_id; u = r.unicode; x=int(r.x); y=int(r.y); w=int(r.w); h=int(r.h)
        p = Path(img_dir)/f'{iid}.jpg'
        if not p.exists():
            alts = list(Path(img_dir).glob(f'{iid}.*'))
            if not alts:
                continue
            p = alts[0]
        try:
            with Image.open(p) as im:
                arr = crop_pad_resize(im, x,y,w,h, pad_frac=0.15, out_size=224)
        except Exception:
            continue
        batch.append(arr); metas.append(u)
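        if len(batch) == 64:
            embs = embed_batch(model, device, np.stack(batch,0), mean, std)
            for e,u_ in zip(embs, metas):
                feats_list.setdefault(u_, []).append(e.copy())
            batch, metas = [], []
        if (i+1) % 5000 == 0 or (time.time()-last_log) > 120:
            print(f'Embedded {i+1}/{total_target} crops, elapsed {time.time()-t0:.1f}s', flush=True)
            last_log = time.time()
    # Flush the final partial batch here: an in-loop check on i == total_target-1 misses it
    # whenever the last row is skipped by one of the `continue` branches above.
    if batch:
        embs = embed_batch(model, device, np.stack(batch,0), mean, std)
        for e,u_ in zip(embs, metas):
            feats_list.setdefault(u_, []).append(e.copy())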
    unicodes = sorted(feats_list.keys())
    # median aggregation per class
    protos = []
    for u in unicodes:
        arr = np.stack(feats_list[u], 0).astype(np.float32)
        med = np.median(arr, axis=0)
        protos.append(med)
    protos = np.stack(protos, 0).astype(np.float32)
    # L2 normalize just in case
    faiss.normalize_L2(protos)
    np.save(out/'prototypes.npy', protos)  # write a real .npy file (header + data) so np.load can read it back
    Path(out/'prototypes_unicodes.json').write_text(json.dumps(unicodes, ensure_ascii=False))
    print('Saved prototypes:', protos.shape, 'classes:', len(unicodes))
    return protos, unicodes, model_name

# Evaluate quick top-1 on val GT crops vs prototypes
def eval_val_top1(protos:np.ndarray, prot_u:list, model_name='convnext_tiny', img_dir='train_images', max_val_samples:int=20000):
    u2idx = {u:i for i,u in enumerate(prot_u)}
    # Force CPU to avoid occupying GPU VRAM prior to YOLO detection
    model, feat_dim, device, mean, std = build_backbone(model_name, device='cpu')
    # gather val rows
    split_map = dict(zip(split_df['image_id'], split_df['is_val']))
    val_rows = []
    # itertuples is much faster than iterrows over the ~600k parsed boxes
    for b in boxes_df.itertuples(index=False):
        iid = b.image_id
        if not split_map.get(iid, False):
            continue
        if b.unicode not in u2idx:
            continue
        val_rows.append((iid, b.unicode, int(b.x), int(b.y), int(b.w), int(b.h)))
    if len(val_rows) > max_val_samples:
        rng = np.random.RandomState(42)
        val_rows = [val_rows[i] for i in rng.choice(len(val_rows), size=max_val_samples, replace=False)]
    correct = 0; total = 0
    batch, gts = [], []
    t0 = time.time()
    index = faiss.IndexFlatIP(protos.shape[1])
    index.add(protos)
    for i, (iid,u,x,y,w,h) in enumerate(val_rows):
        p = Path(img_dir)/f'{iid}.jpg'
        if not p.exists():
            alts = list(Path(img_dir).glob(f'{iid}.*'))
            if not alts:
                continue
            p = alts[0]
        try:
            with Image.open(p) as im:
                arr = crop_pad_resize(im, x,y,w,h, pad_frac=0.15, out_size=224)
        except Exception:
            continue
        batch.append(arr); gts.append(u)
        if len(batch) == 128:
            embs = embed_batch(model, device, np.stack(batch,0), mean, std)
            # cosine via IP on L2-normalized vectors
            D,I = index.search(embs, 1)
            preds = [prot_u[i0] for i0 in I.flatten().tolist()]
            correct += sum(int(a==b) for a,b in zip(preds, gts))
            total += len(preds)
            batch, gts = [], []
        if (i+1) % 5000 == 0:
            print(f'Val processed {i+1}/{len(val_rows)}, elapsed {time.time()-t0:.1f}s', flush=True)
    if batch:
        embs = embed_batch(model, device, np.stack(batch,0), mean, std)
        D,I = index.search(embs, 1)
        preds = [prot_u[i0] for i0 in I.flatten().tolist()]
        correct += sum(int(a==b) for a,b in zip(preds, gts))
        total += len(preds)
    acc = correct / max(1,total)
    print(f'Val top1 accuracy vs prototypes: {acc:.4f} ({correct}/{total})')
    return acc

print('Two-stage kNN embedding utilities ready.', flush=True)
print('Next (after detector finishes or when GPU is free):')
print('- Build prototypes: protos, prot_u, model_name = build_prototypes(max_per_class=30)')
print('- Eval on holdout: eval_val_top1(protos, prot_u, model_name)')
print('Later for submission: detect -> crop -> embed -> nearest prototype -> unicode mapping')
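The evaluation above relies on the fact that an inner-product search over L2-normalized vectors ranks by cosine similarity. As a minimal NumPy-only sketch of that idea (no faiss; `l2_normalize`, `nearest_prototype`, and the toy vectors are hypothetical names and values used only for illustration):

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so that inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def nearest_prototype(queries: np.ndarray, protos: np.ndarray):
    """Return (best_similarity, best_index) per query row via exhaustive inner product."""
    sims = l2_normalize(queries) @ l2_normalize(protos).T  # shape (n_queries, n_protos)
    idx = sims.argmax(axis=1)
    return sims[np.arange(len(idx)), idx], idx

# Toy data: three 2-D "prototypes" and two queries (hypothetical values).
toy_protos = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], dtype=np.float32)
toy_queries = np.array([[2.0, 0.1], [-0.2, 3.0]], dtype=np.float32)
toy_sims, toy_idx = nearest_prototype(toy_queries, toy_protos)
print(toy_idx.tolist())  # prints [0, 1]
```

The returned similarity is what a `min_cosine` threshold would be compared against; an exact flat inner-product index should give the same ranking, only faster at scale.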

In [None]:
import json, time, gc, os
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image
import faiss, torch, timm
from ultralytics import YOLO
import cv2

# Minimal duplicate of crop preprocess to be self-contained
def crop_pad_resize(img: Image.Image, x:int, y:int, w:int, h:int, pad_frac:float=0.15, out_size:int=224):
    W, H = img.size
    cx = x + w/2.0; cy = y + h/2.0
    pw = max(2, int(round(w * pad_frac))); ph = max(2, int(round(h * pad_frac)))
    x1 = max(0, int(round(cx - w/2 - pw))); y1 = max(0, int(round(cy - h/2 - ph)))
    x2 = min(W, int(round(cx + w/2 + pw))); y2 = min(H, int(round(cy + h/2 + ph)))
    crop = img.crop((x1, y1, x2, y2)).convert('L')
    arr = np.array(crop)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    arr = clahe.apply(arr)
    h0, w0 = arr.shape[:2]; m = max(h0, w0)
    pad_top = (m - h0) // 2; pad_bottom = m - h0 - pad_top
    pad_left = (m - w0) // 2; pad_right = m - w0 - pad_left
    arr = cv2.copyMakeBorder(arr, pad_top, pad_bottom, pad_left, pad_right, borderType=cv2.BORDER_CONSTANT, value=0)
    arr = cv2.resize(arr, (out_size, out_size), interpolation=cv2.INTER_AREA)
    arr = np.stack([arr, arr, arr], axis=0).astype(np.float32) / 255.0
    return arr

def build_backbone(model_name:str='convnext_tiny', device='cpu'):
    model = timm.create_model(model_name, pretrained=True, num_classes=0, global_pool='avg')
    model.eval().to(device)
    # resolve data config for proper normalization
    data_cfg = timm.data.resolve_model_data_config(model)
    mean = torch.tensor(data_cfg.get('mean', (0.485, 0.456, 0.406)), dtype=torch.float32, device=device).view(1,3,1,1)
    std = torch.tensor(data_cfg.get('std', (0.229, 0.224, 0.225)), dtype=torch.float32, device=device).view(1,3,1,1)
    with torch.no_grad():
        dummy = torch.zeros(1,3,224,224, device=device)
        dummy = (dummy - mean) / std
        f = model(dummy)
    feat_dim = int(f.shape[1])
    return model, feat_dim, device, mean, std

def embed_batch(model, device, batch_np, mean:torch.Tensor, std:torch.Tensor):
    with torch.no_grad():
        t = torch.from_numpy(batch_np).to(device)
        t = (t - mean) / std
        feats = model(t)
        feats = torch.nn.functional.normalize(feats, p=2, dim=1)
        return feats.detach().cpu().numpy().astype(np.float32)

def two_stage_build_submission(det_weights: str,
                               prototypes_path: str = 'artifacts/prototypes.npy',
                               prot_unicodes_path: str = 'artifacts/prototypes_unicodes.json',
                               imgsz_det: int = 768,
                               conf: float = 0.08,
                               iou: float = 0.65,
                               max_det: int = 4000,
                               crop_size: int = 224,
                               pad_frac: float = 0.15,
                               backbone_name: str = 'convnext_tiny',
                               min_cosine: float = 0.45,
                               det_device: str = 'cpu',
                               save_name: str = 'submission.csv',
                               predict_half: bool = True,
                               dedup_cell: int = 7):
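    # mitigate fragmentation (allocator settings are generally only honored if set before the first CUDA allocation in this process)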
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = os.environ.get('PYTORCH_CUDA_ALLOC_CONF', 'expandable_segments:True')
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        gc.collect()

    ss = pd.read_csv('sample_submission.csv')
    image_ids = ss['image_id'].tolist()
    img_paths = []
    for img_id in image_ids:
        p = Path('test_images')/f'{img_id}.jpg'
        if not p.exists():
            alts = list(Path('test_images').glob(f'{img_id}.*'))
            if alts:
                p = alts[0]
        img_paths.append(str(p))

    print('Loading detector:', det_weights, flush=True)
    det = YOLO(det_weights)
    print('Loading prototypes from', prototypes_path, flush=True)
    protos = np.fromfile(prototypes_path, dtype=np.float32)
    unicodes = json.loads(Path(prot_unicodes_path).read_text())
    n_cls = len(unicodes)
    feat_dim = protos.size // max(1, n_cls)
    protos = protos.reshape(n_cls, feat_dim)
    faiss.normalize_L2(protos)
    index = faiss.IndexFlatIP(feat_dim)
    index.add(protos)
    u_list = unicodes

    print('Loading backbone on CPU to avoid GPU OOM:', backbone_name, flush=True)
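    model, feat_dim_backbone, device, mean, std = build_backbone(backbone_name, device='cpu')
    # Fail early with a readable message if the prototypes were built with a different feature width
    if feat_dim_backbone != feat_dim:
        raise ValueError(f'Backbone {backbone_name} outputs {feat_dim_backbone}-d features but prototypes are {feat_dim}-d')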

    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    print('Running detection stream on', len(img_paths), 'images ...', flush=True)
    results = det.predict(source=img_paths, imgsz=imgsz_det, conf=conf, iou=iou, max_det=max_det, augment=True, device=det_device, stream=True, verbose=False, batch=1, half=predict_half)

    rows = []
    t0 = time.time()
    for i, (img_id, img_path, res) in enumerate(zip(image_ids, img_paths, results)):
        if i % 25 == 0:
            print(f'Processed {i}/{len(image_ids)} images, elapsed {time.time()-t0:.1f}s', flush=True)
        labels_out = []
        with Image.open(img_path) as im:
            W, H = im.size
            if res and hasattr(res, 'boxes') and res.boxes is not None and len(res.boxes) > 0:
                b = res.boxes
                xyxy = b.xyxy.cpu().numpy()
                batch = []
                centers = []
                for (x1,y1,x2,y2) in xyxy:
                    x = int(round(x1)); y = int(round(y1)); w = int(round(x2-x1)); h = int(round(y2-y1))
                    # precision filter: skip tiny boxes
                    if w < 5 or h < 5:
                        continue
                    arr = crop_pad_resize(im, x,y,w,h, pad_frac=pad_frac, out_size=crop_size)
                    batch.append(arr)
                    cx = int(round((x1+x2)/2.0)); cy = int(round((y1+y2)/2.0))
                    cx = max(0, min(cx, W-1)); cy = max(0, min(cy, H-1))
                    centers.append((cx, cy))
                if batch:
                    embs = embed_batch(model, device, np.stack(batch,0), mean, std)
                    D,I = index.search(embs, 1)
                    sims = D.flatten().tolist()
                    idxs = I.flatten().tolist()
                    # collect predictions
                    pred_list = []  # (u, (cx,cy), sim)
                    for (cx,cy), j, sim in zip(centers, idxs, sims):
                        if sim >= min_cosine:
                            u = u_list[j] if 0 <= j < len(u_list) else None
                            if u:
                                pred_list.append((u, (cx,cy), float(sim)))
                    # deduplicate within small center cells per unicode
                    if pred_list:
                        cells = {}  # key: (u, cx_cell, cy_cell) -> (u,(cx,cy),sim)
                        for u, (cx,cy), sim in pred_list:
                            key = (u, cx//dedup_cell, cy//dedup_cell)
                            if key not in cells or sim > cells[key][2]:
                                cells[key] = (u, (cx,cy), sim)
                        final_preds = list(cells.values())
                        for u, (cx,cy), sim in final_preds:
                            labels_out.extend([u, str(cx), str(cy)])
        rows.append({'image_id': img_id, 'labels': ' '.join(labels_out)})
        del res

    sub = pd.DataFrame(rows)
    sub.to_csv(save_name, index=False)
    print('Saved', save_name, 'shape', sub.shape)
    return sub

print('Two-stage submission function ready: two_stage_build_submission(det_weights, ...)')
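The deduplication step inside `two_stage_build_submission` keeps, for each unicode, only the highest-similarity prediction per grid cell of `dedup_cell` pixels. The sketch below isolates that logic and the `unicode cx cy` label serialization in pure Python; `dedup_by_cell`, `to_labels`, and the sample detections are hypothetical and exist only to show the behavior.

```python
from typing import Dict, List, Tuple

Pred = Tuple[str, Tuple[int, int], float]  # (unicode, (cx, cy), similarity)

def dedup_by_cell(preds: List[Pred], cell: int = 7) -> List[Pred]:
    """Keep the highest-similarity prediction per (unicode, grid cell)."""
    if cell <= 0:
        raise ValueError("cell must be a positive integer")
    best: Dict[Tuple[str, int, int], Pred] = {}
    for u, (cx, cy), sim in preds:
        key = (u, cx // cell, cy // cell)
        if key not in best or sim > best[key][2]:
            best[key] = (u, (cx, cy), sim)
    return list(best.values())

def to_labels(preds: List[Pred]) -> str:
    """Serialize predictions as 'unicode cx cy' triples separated by spaces."""
    return " ".join(f"{u} {cx} {cy}" for u, (cx, cy), _ in preds)

# Hypothetical detections: the first two land in the same 7-px cell for U+3042.
demo = [
    ("U+3042", (100, 200), 0.81),
    ("U+3042", (102, 201), 0.93),
    ("U+3044", (100, 200), 0.75),
    ("U+3042", (300, 200), 0.70),
]
demo_labels = to_labels(dedup_by_cell(demo, cell=7))
print(demo_labels)  # prints U+3042 102 201 U+3044 100 200 U+3042 300 200
```

One limitation of a fixed grid is that two near-identical centers on opposite sides of a cell boundary are not merged; a distance-based merge would handle that case at the cost of extra computation.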

In [None]:
from pathlib import Path
import numpy as np, json, pandas as pd

# Driver: rebuild prototypes with normalization, then two-stage submission (skip eval for speed)
print('Rebuilding prototypes with normalization (max_per_class=30)...', flush=True)
protos, prot_u, model_name = build_prototypes(max_per_class=30, img_dir='train_images', out_dir='artifacts', model_name='convnext_tiny')

# Skip eval_val_top1 to keep GPU free and save time
# acc = eval_val_top1(protos, prot_u, model_name=model_name, img_dir='train_images', max_val_samples=10000)
# print('Val top1:', acc)

det_weights = 'runs/yolo8n_kuz_single/weights/best.pt'
print('Generating two-stage submission using detector:', det_weights, flush=True)
two_stage_build_submission(det_weights=det_weights,
                           prototypes_path='artifacts/prototypes.npy',
                           prot_unicodes_path='artifacts/prototypes_unicodes.json',
                           imgsz_det=832,
                           conf=0.15,
                           iou=0.65,
                           max_det=2000,
                           crop_size=224,
                           pad_frac=0.15,
                           backbone_name='convnext_tiny',
                           min_cosine=0.45,
                           det_device=0,
                           save_name='submission.csv')
print('Done. Check submission.csv head:')
print(pd.read_csv('submission.csv').head())
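`two_stage_build_submission` reads `artifacts/prototypes.npy` with `np.fromfile`, which interprets the file as headerless float32 bytes. That only works if the file was written as raw bytes (for example with `tobytes()`); a file produced by `np.save` carries a header and would be misparsed. The round trip below is a sketch under the assumption that the prototypes are stored raw; `save_raw_bank`, `load_raw_bank`, and the `demo` stem are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def save_raw_bank(vecs: np.ndarray, labels, out_dir: Path, stem: str) -> None:
    """Write vectors as headerless float32 bytes plus a JSON label list."""
    vecs = np.ascontiguousarray(vecs, dtype=np.float32)
    (out_dir / f"{stem}.npy").write_bytes(vecs.tobytes())
    (out_dir / f"{stem}_unicodes.json").write_text(json.dumps(list(labels), ensure_ascii=False))

def load_raw_bank(out_dir: Path, stem: str):
    """Read the pair back, inferring the feature dimension from the label count."""
    labels = json.loads((out_dir / f"{stem}_unicodes.json").read_text())
    flat = np.fromfile(out_dir / f"{stem}.npy", dtype=np.float32)
    if not labels or flat.size % len(labels) != 0:
        raise ValueError("vector file size is not a multiple of the label count")
    return flat.reshape(len(labels), -1), labels

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    bank = np.arange(12, dtype=np.float32).reshape(3, 4)
    save_raw_bank(bank, ["U+3042", "U+3044", "U+3046"], tmp_dir, "demo")
    loaded, loaded_labels = load_raw_bank(tmp_dir, "demo")
    round_trip_ok = bool(np.array_equal(bank, loaded))
print(loaded.shape, round_trip_ok)  # prints (3, 4) True
```

The divisibility check catches a label list that is out of sync with the vector file, which the integer division in the loader above would otherwise hide until the reshape fails or silently misaligns rows.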

In [None]:
import numpy as np, json, time
from pathlib import Path
import faiss, torch, timm
from PIL import Image
import pandas as pd

# Sweep min_cosine on validation crops to choose threshold (F1 proxy)
@torch.no_grad()
def _build_backbone_cpu(model_name='convnext_tiny'):
    model = timm.create_model(model_name, pretrained=True, num_classes=0, global_pool='avg')
    model.eval().to('cpu')
    data_cfg = timm.data.resolve_model_data_config(model)
    mean = torch.tensor(data_cfg.get('mean', (0.485,0.456,0.406)), dtype=torch.float32, device='cpu').view(1,3,1,1)
    std  = torch.tensor(data_cfg.get('std',  (0.229,0.224,0.225)), dtype=torch.float32, device='cpu').view(1,3,1,1)
    return model, mean, std

@torch.no_grad()
def _embed_batch_cpu(model, batch_np, mean, std):
    t = torch.from_numpy(batch_np).to('cpu')
    t = (t - mean) / std
    f = model(t)
    f = torch.nn.functional.normalize(f, p=2, dim=1)
    return f.cpu().numpy().astype(np.float32)

def sweep_min_cosine(protos_path='artifacts/prototypes.npy',
                     prot_unicodes_path='artifacts/prototypes_unicodes.json',
                     model_name='convnext_tiny',
                     thresholds=np.arange(0.35, 0.71, 0.05),
                     max_val_samples=20000,
                     img_dir='train_images'):
    protos = np.fromfile(protos_path, dtype=np.float32)
    unicodes = json.loads(Path(prot_unicodes_path).read_text())
    n_cls = len(unicodes)
    d = protos.size // max(1, n_cls)
    protos = protos.reshape(n_cls, d).astype(np.float32)
    faiss.normalize_L2(protos)
    index = faiss.IndexFlatIP(d); index.add(protos)
    u2idx = {u:i for i,u in enumerate(unicodes)}
    split_map = dict(zip(split_df['image_id'], split_df['is_val']))
    rows = [(b.image_id, b.unicode, int(b.x), int(b.y), int(b.w), int(b.h))
            for _, b in boxes_df.iterrows()
            if split_map.get(b.image_id, False) and (b.unicode in u2idx)]
    if len(rows) > max_val_samples:
        rng = np.random.RandomState(42)
        rows = [rows[i] for i in rng.choice(len(rows), size=max_val_samples, replace=False)]
    model, mean, std = _build_backbone_cpu(model_name)
    sims = []; corrects = []
    batch = []; gts = []
    t0 = time.time()
    for i, (iid,u,x,y,w,h) in enumerate(rows):
        p = Path(img_dir)/f'{iid}.jpg'
        if not p.exists():
            alts = list(Path(img_dir).glob(f'{iid}.*'))
            if not alts:
                continue
            p = alts[0]
        with Image.open(p) as im:
            arr = crop_pad_resize(im, x,y,w,h, pad_frac=0.15, out_size=224)
        batch.append(arr); gts.append(u)
        if len(batch) == 128 or i == len(rows)-1:
            embs = _embed_batch_cpu(model, np.stack(batch,0), mean, std)
            D,I = index.search(embs, 1)
            for d_, i0, u_gt in zip(D.flatten(), I.flatten(), gts):
                sims.append(float(d_)); corrects.append(1 if unicodes[i0]==u_gt else 0)
            batch = []; gts = []
        if (i+1) % 5000 == 0:
            print(f'Sweep gather {i+1}/{len(rows)} elapsed {time.time()-t0:.1f}s', flush=True)
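    # Flush any remainder left over when the final rows were skipped (missing image)
    if batch:
        embs = _embed_batch_cpu(model, np.stack(batch,0), mean, std)
        D,I = index.search(embs, 1)
        for d_, i0, u_gt in zip(D.flatten(), I.flatten(), gts):
            sims.append(float(d_)); corrects.append(1 if unicodes[i0]==u_gt else 0)
    sims = np.array(sims); corrects = np.array(corrects)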
    best_t, best_f1 = 0.50, -1.0
    for t in thresholds:
        m = sims >= t
        if m.sum() == 0:
            print(f't={t:.2f} no positives'); continue
        prec = corrects[m].mean()
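        # NOTE: this is coverage (fraction of crops kept), not true recall; wrong matches above the threshold still count
        rec = m.mean()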
        f1 = 2*prec*rec/(prec+rec+1e-9)
        print(f't={t:.2f} prec={prec:.4f} rec={rec:.4f} F1p={f1:.4f}')
        if f1 > best_f1:
            best_f1, best_t = f1, float(t)
    print('Best min_cosine:', best_t, 'F1proxy:', round(best_f1,4))
    return best_t

# 1) Sweep for best min_cosine (≈20–30 min on 20k crops); if short on time, comment this block out and set best_t = 0.50
try:
    best_t = sweep_min_cosine()
except Exception as e:
    print('Sweep failed, defaulting min_cosine=0.50. Error:', e)
    best_t = 0.50

# 2) Re-run submission with improved detector params
print('Building improved submission with imgsz=1024, conf=0.12, min_cosine=', best_t)
try:
    two_stage_build_submission(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                               prototypes_path='artifacts/prototypes.npy',
                               prot_unicodes_path='artifacts/prototypes_unicodes.json',
                               imgsz_det=1024,
                               conf=0.12,
                               iou=0.65,
                               max_det=2000,
                               crop_size=224,
                               pad_frac=0.15,
                               backbone_name='convnext_tiny',
                               min_cosine=best_t,
                               det_device=0,
                               save_name='submission_v2.csv')
except RuntimeError as e:
    # Fallback to 960 if OOM on 1024
    print('Got error (likely OOM) at 1024. Falling back to imgsz=960. Error:', e)
    two_stage_build_submission(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                               prototypes_path='artifacts/prototypes.npy',
                               prot_unicodes_path='artifacts/prototypes_unicodes.json',
                               imgsz_det=960,
                               conf=0.12,
                               iou=0.65,
                               max_det=2000,
                               crop_size=224,
                               pad_frac=0.15,
                               backbone_name='convnext_tiny',
                               min_cosine=best_t,
                               det_device=0,
                               save_name='submission_v2.csv')

print('submission_v2.csv head:')
print(pd.read_csv('submission_v2.csv').head())
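The sweep above scores each threshold with precision and a quantity it prints as `rec`, which is the fraction of crops that pass the threshold. As a hedged sketch of how that proxy differs from recall computed over correct matches only, the NumPy function below reports both; `sweep_thresholds` and the toy arrays are hypothetical and not part of the pipeline.

```python
import numpy as np

def sweep_thresholds(sims, corrects, thresholds):
    """For each threshold, report precision, coverage, recall, and F1 of top-1 matches."""
    sims = np.asarray(sims, dtype=np.float64)
    corrects = np.asarray(corrects, dtype=bool)
    out = []
    for t in thresholds:
        keep = sims >= t
        n_keep = int(keep.sum())
        if n_keep == 0:
            continue  # nothing predicted at this threshold
        tp = int((keep & corrects).sum())
        precision = tp / n_keep
        coverage = n_keep / len(sims)  # what the sweep above prints as "rec"
        recall = tp / len(sims)        # kept-and-correct over all ground-truth crops
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        out.append({"t": float(t), "precision": precision,
                    "coverage": coverage, "recall": recall, "f1": f1})
    return out

# Toy similarities and correctness flags (hypothetical values).
sweep_rows = sweep_thresholds([0.9, 0.8, 0.6, 0.4], [1, 1, 0, 1], [0.5, 0.7])
best_row = max(sweep_rows, key=lambda r: r["f1"])
print(best_row["t"])  # prints 0.7
```

Because coverage counts wrong matches that pass the threshold, an F1 built on it tends to favor lower thresholds than one built on recall. Neither accounts for detector false positives, so the chosen value is still only a starting point for the full pipeline.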

In [None]:
import os, time, json, pandas as pd
import torch

def try_build_submission_chain(sizes=(1024, 960, 896, 864, 832, 768),
                               conf=0.12,
                               min_cosine=0.7,
                               save_name='submission_v2.csv'):
    det_weights='runs/yolo8n_kuz_single/weights/best.pt'
    print('Starting fallback chain with sizes:', sizes, 'conf=', conf, 'min_cosine=', min_cosine, flush=True)
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = os.environ.get('PYTORCH_CUDA_ALLOC_CONF','expandable_segments:True')
    for s in sizes:
        try:
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            print(f'Attempting imgsz={s} ...', flush=True)
            sub = two_stage_build_submission(det_weights=det_weights,
                                            prototypes_path='artifacts/prototypes.npy',
                                            prot_unicodes_path='artifacts/prototypes_unicodes.json',
                                            imgsz_det=int(s),
                                            conf=float(conf),
                                            iou=0.65,
                                            max_det=2000,
                                            crop_size=224,
                                            pad_frac=0.15,
                                            backbone_name='convnext_tiny',
                                            min_cosine=float(min_cosine),
                                            det_device=0,
                                            save_name=save_name,
                                            predict_half=True)
            print('Success at imgsz', s, '->', save_name, flush=True)
            print(pd.read_csv(save_name).head())
            return s
        except Exception as e:
            print(f'Failed at imgsz={s}: {e}', flush=True)
            if torch.cuda.is_available():
                try:
                    print('CUDA mem allocated/reserved (GB):',
                          round(torch.cuda.memory_allocated()/1e9,3),
                          round(torch.cuda.memory_reserved()/1e9,3))
                except Exception:
                    pass
            time.sleep(1)
    raise RuntimeError('All sizes failed in fallback chain')

# Use best threshold from sweep (0.7) and run chain
best_t = 0.7
chosen_size = try_build_submission_chain(sizes=(1024, 960, 896, 864, 832, 768), conf=0.12, min_cosine=best_t, save_name='submission_v2.csv')
print('Chosen imgsz_det:', chosen_size)
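The fallback chain above retries on any exception, so a non-memory failure (for instance a missing weights file) is retried at every size before surfacing. A possible refinement, sketched here in pure Python with a stand-in for the inference call, is to retry only on selected exception types; `first_success` and `fake_predict` are hypothetical names.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

def first_success(candidates: Iterable[int],
                  attempt: Callable[[int], T],
                  retry_on: Tuple[type, ...] = (RuntimeError, MemoryError)) -> Tuple[int, T]:
    """Try candidates in order; return (candidate, result) for the first that succeeds."""
    last_err: Optional[BaseException] = None
    for c in candidates:
        try:
            return c, attempt(c)
        except retry_on as e:  # other exception types propagate immediately
            last_err = e
    raise RuntimeError("all candidates failed") from last_err

# Stand-in for the real inference call: pretend anything above 896 runs out of memory.
def fake_predict(imgsz: int) -> str:
    if imgsz > 896:
        raise RuntimeError(f"simulated OOM at imgsz={imgsz}")
    return f"ok@{imgsz}"

chosen, result = first_success((1024, 960, 896, 864), fake_predict)
print(chosen, result)  # prints 896 ok@896
```

In the real chain, `attempt` would wrap `two_stage_build_submission` with a fixed set of keyword arguments and a varying `imgsz_det`.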

In [None]:
import shutil, pandas as pd, os
src = 'submission_v2.csv'
dst = 'submission.csv'
assert os.path.exists(src), f"{src} not found"
shutil.copy2(src, dst)
print('Copied', src, '->', dst)
print(pd.read_csv(dst).head())

In [None]:
import shutil, os, pandas as pd, torch
print('Rebuilding median prototypes with max_per_class=50 using convnext_small ...', flush=True)
protos, prot_u, model_name = build_prototypes(max_per_class=50, img_dir='train_images', out_dir='artifacts', model_name='convnext_small')
print('Prototypes ready:', protos.shape, 'classes:', len(prot_u))
if torch.cuda.is_available():
    torch.cuda.empty_cache()
print('Running two-stage with augment=True, max_det=3000, dedup, imgsz=1024, conf=0.10, min_cosine=0.7, backbone=convnext_small ...', flush=True)
two_stage_build_submission(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                           prototypes_path='artifacts/prototypes.npy',
                           prot_unicodes_path='artifacts/prototypes_unicodes.json',
                           imgsz_det=1024,
                           conf=0.10,
                           iou=0.65,
                           max_det=3000,
                           crop_size=224,
                           pad_frac=0.15,
                           backbone_name='convnext_small',
                           min_cosine=0.7,
                           det_device=0,
                           save_name='submission_v4.csv',
                           predict_half=True,
                           dedup_cell=5)
print('Copying submission_v4.csv -> submission.csv for grading ...', flush=True)
shutil.copy2('submission_v4.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())
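The cell above refers to "median prototypes", but `build_prototypes` is defined in an earlier cell. As an assumption about what such a builder might do, the sketch below takes the element-wise median of each class's embeddings and re-normalizes the result, since a median of unit vectors is generally not unit length. `median_prototypes` and the toy embeddings are hypothetical.

```python
import numpy as np

def median_prototypes(embs: np.ndarray, labels):
    """Return (protos, classes): per-class element-wise median, re-normalized to unit length."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    protos = np.stack([np.median(embs[labels == c], axis=0) for c in classes]).astype(np.float32)
    protos /= np.maximum(np.linalg.norm(protos, axis=1, keepdims=True), 1e-12)
    return protos, classes

# Toy embeddings: three samples of class "a", one of class "b" (hypothetical values).
toy_embs = np.array([[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]], dtype=np.float32)
toy_labels = ["a", "a", "a", "b"]
toy_median_protos, toy_classes = median_prototypes(toy_embs, toy_labels)
print(toy_classes, toy_median_protos.shape)  # prints ['a', 'b'] (2, 2)
```

Compared with a mean, a median is less sensitive to a few badly cropped or mislabeled samples, which may matter more as `max_per_class` grows.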

In [None]:
import numpy as np, json, time, pandas as pd, torch
from pathlib import Path
import faiss
from PIL import Image

# Quick sweep for min_cosine using convnext_small prototypes
@torch.no_grad()
def _build_backbone_cpu(model_name='convnext_small'):
    import timm, torch
    model = timm.create_model(model_name, pretrained=True, num_classes=0, global_pool='avg')
    model.eval().to('cpu')
    data_cfg = timm.data.resolve_model_data_config(model)
    mean = torch.tensor(data_cfg.get('mean', (0.485,0.456,0.406)), dtype=torch.float32, device='cpu').view(1,3,1,1)
    std  = torch.tensor(data_cfg.get('std',  (0.229,0.224,0.225)), dtype=torch.float32, device='cpu').view(1,3,1,1)
    return model, mean, std

@torch.no_grad()
def _embed_batch_cpu(model, batch_np, mean, std):
    import torch
    t = torch.from_numpy(batch_np).to('cpu')
    t = (t - mean) / std
    f = model(t)
    f = torch.nn.functional.normalize(f, p=2, dim=1)
    return f.cpu().numpy().astype(np.float32)

def sweep_min_cosine_small(protos_path='artifacts/prototypes.npy',
                           prot_unicodes_path='artifacts/prototypes_unicodes.json',
                           thresholds=np.arange(0.40, 0.76, 0.05),
                           max_val_samples=20000,
                           img_dir='train_images'):
    protos = np.fromfile(protos_path, dtype=np.float32)
    unicodes = json.loads(Path(prot_unicodes_path).read_text())
    n_cls = len(unicodes)
    d = protos.size // max(1, n_cls)
    protos = protos.reshape(n_cls, d).astype(np.float32)
    faiss.normalize_L2(protos)
    index = faiss.IndexFlatIP(d); index.add(protos)
    u2idx = {u:i for i,u in enumerate(unicodes)}
    split_map = dict(zip(split_df['image_id'], split_df['is_val']))
    rows = [(b.image_id, b.unicode, int(b.x), int(b.y), int(b.w), int(b.h))
            for _, b in boxes_df.iterrows()
            if split_map.get(b.image_id, False) and (b.unicode in u2idx)]
    if len(rows) > max_val_samples:
        rng = np.random.RandomState(42)
        rows = [rows[i] for i in rng.choice(len(rows), size=max_val_samples, replace=False)]
    model, mean, std = _build_backbone_cpu('convnext_small')
    sims = []; corrects = []
    batch = []; gts = []
    t0 = time.time()
    for i, (iid,u,x,y,w,h) in enumerate(rows):
        p = Path(img_dir)/f'{iid}.jpg'
        if not p.exists():
            alts = list(Path(img_dir).glob(f'{iid}.*'))
            if not alts:
                continue
            p = alts[0]
        with Image.open(p) as im:
            arr = crop_pad_resize(im, x,y,w,h, pad_frac=0.15, out_size=224)
        batch.append(arr); gts.append(u)
        if len(batch) == 128 or i == len(rows)-1:
            embs = _embed_batch_cpu(model, np.stack(batch,0), mean, std)
            D,I = index.search(embs, 1)
            for d_, i0, u_gt in zip(D.flatten(), I.flatten(), gts):
                sims.append(float(d_)); corrects.append(1 if unicodes[i0]==u_gt else 0)
            batch = []; gts = []
        if (i+1) % 5000 == 0:
            print(f'Sweep gather {i+1}/{len(rows)} elapsed {time.time()-t0:.1f}s', flush=True)
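    # Flush any remainder left over when the final rows were skipped (missing image)
    if batch:
        embs = _embed_batch_cpu(model, np.stack(batch,0), mean, std)
        D,I = index.search(embs, 1)
        for d_, i0, u_gt in zip(D.flatten(), I.flatten(), gts):
            sims.append(float(d_)); corrects.append(1 if unicodes[i0]==u_gt else 0)
    sims = np.array(sims); corrects = np.array(corrects)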
    best_t, best_f1 = 0.60, -1.0
    for t in thresholds:
        m = sims >= t
        if m.sum() == 0:
            print(f't={t:.2f} no positives'); continue
        prec = corrects[m].mean()
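        # NOTE: this is coverage (fraction of crops kept), not true recall; wrong matches above the threshold still count
        rec = m.mean()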
        f1 = 2*prec*rec/(prec+rec+1e-9)
        print(f't={t:.2f} prec={prec:.4f} rec={rec:.4f} F1p={f1:.4f}')
        if f1 > best_f1:
            best_f1, best_t = f1, float(t)
    print('Best min_cosine:', best_t, 'F1proxy:', round(best_f1,4))
    return best_t

print('Sweeping min_cosine for convnext_small prototypes ...', flush=True)
try:
    best_t = sweep_min_cosine_small()
except Exception as e:
    print('Sweep failed; defaulting to 0.60. Error:', e); best_t = 0.60

print('Building v5 submission with conf=0.08, iou=0.65, max_det=4000, dedup_cell=7, min_cosine=', best_t, flush=True)
sub = two_stage_build_submission(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                 prototypes_path='artifacts/prototypes.npy',
                                 prot_unicodes_path='artifacts/prototypes_unicodes.json',
                                 imgsz_det=1024,
                                 conf=0.08,
                                 iou=0.65,
                                 max_det=4000,
                                 crop_size=224,
                                 pad_frac=0.15,
                                 backbone_name='convnext_small',
                                 min_cosine=best_t,
                                 det_device=0,
                                 save_name='submission_v5.csv',
                                 predict_half=True,
                                 dedup_cell=7)
import shutil
shutil.copy2('submission_v5.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())

In [None]:
# v6: Exemplar kNN voting + optional two-backbone fusion + adaptive dedup
import os, json, time, gc, math
from pathlib import Path
from typing import List, Tuple, Dict, Optional
import numpy as np
import pandas as pd
import faiss, torch, timm, cv2
from PIL import Image
from ultralytics import YOLO

def _crop_pad_resize(img: Image.Image, x:int, y:int, w:int, h:int, pad_frac:float=0.15, out_size:int=224):
    W, H = img.size
    cx = x + w/2.0; cy = y + h/2.0
    pw = max(2, int(round(w * pad_frac))); ph = max(2, int(round(h * pad_frac)))
    x1 = max(0, int(round(cx - w/2 - pw))); y1 = max(0, int(round(cy - h/2 - ph)))
    x2 = min(W, int(round(cx + w/2 + pw))); y2 = min(H, int(round(cy + h/2 + ph)))
    crop = img.crop((x1, y1, x2, y2)).convert('L')
    arr = np.array(crop)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    arr = clahe.apply(arr)
    h0, w0 = arr.shape[:2]; m = max(h0, w0)
    pad_top = (m - h0) // 2; pad_bottom = m - h0 - pad_top
    pad_left = (m - w0) // 2; pad_right = m - w0 - pad_left
    arr = cv2.copyMakeBorder(arr, pad_top, pad_bottom, pad_left, pad_right, borderType=cv2.BORDER_CONSTANT, value=0)
    arr = cv2.resize(arr, (out_size, out_size), interpolation=cv2.INTER_AREA)
    arr = np.stack([arr, arr, arr], axis=0).astype(np.float32) / 255.0
    return arr

@torch.no_grad()
def _build_backbone_cpu(model_name:str):
    model = timm.create_model(model_name, pretrained=True, num_classes=0, global_pool='avg')
    model.eval().to('cpu')
    data_cfg = timm.data.resolve_model_data_config(model)
    mean = torch.tensor(data_cfg.get('mean', (0.485,0.456,0.406)), dtype=torch.float32, device='cpu').view(1,3,1,1)
    std  = torch.tensor(data_cfg.get('std',  (0.229,0.224,0.225)), dtype=torch.float32, device='cpu').view(1,3,1,1)
    # warmup
    _ = model((torch.zeros(1,3,224,224)-mean)/std)
    return model, mean, std

@torch.no_grad()
def _embed_batch_cpu(model, batch_np, mean, std):
    t = torch.from_numpy(batch_np).to('cpu')
    t = (t - mean) / std
    f = model(t)
    f = torch.nn.functional.normalize(f, p=2, dim=1)
    return f.cpu().numpy().astype(np.float32)

def build_exemplar_bank(max_per_class:int=3,
                        img_dir:str='train_images',
                        out_dir:str='artifacts',
                        model_name:str='convnext_small',
                        pad_frac:float=0.15,
                        out_size:int=224):
    out = Path(out_dir); out.mkdir(parents=True, exist_ok=True)
    split_map = dict(zip(split_df['image_id'], split_df['is_val']))
    df = boxes_df.copy()
    df = df[~df['image_id'].map(split_map).fillna(False).astype(bool)]  # train-only; astype(bool) keeps ~ a logical NOT even if map() yields object/int dtype
    # cap exemplars per unicode
    rng = np.random.RandomState(42)
    def _sample_grp(g):
        if len(g) <= max_per_class: return g
        idx = rng.choice(len(g), size=max_per_class, replace=False)
        return g.iloc[idx]
    df_s = df.groupby('unicode', as_index=False, group_keys=False).apply(_sample_grp)
    df_s = df_s.sample(frac=1.0, random_state=42).reset_index(drop=True)
    print(f'Exemplar sampling: classes={df_s.unicode.nunique()}, exemplars={len(df_s)} (<= {max_per_class}/class)', flush=True)
    model, mean, std = _build_backbone_cpu(model_name)
    batch, metas = [], []
    embs = []
    t0 = time.time(); last = t0
    for i, r in enumerate(df_s.itertuples(index=False)):
        iid = r.image_id; u = r.unicode; x=int(r.x); y=int(r.y); w=int(r.w); h=int(r.h)
        p = Path(img_dir)/f'{iid}.jpg'
        if not p.exists():
            alts = list(Path(img_dir).glob(f'{iid}.*'))
            if not alts: continue
            p = alts[0]
        try:
            with Image.open(p) as im:
                arr = _crop_pad_resize(im, x,y,w,h, pad_frac=pad_frac, out_size=out_size)
        except Exception:
            continue
        batch.append(arr); metas.append(u)
        if len(batch)==128 or i==len(df_s)-1:
            e = _embed_batch_cpu(model, np.stack(batch,0), mean, std)
            embs.append(e); batch=[]
        if (i+1)%5000==0 or (time.time()-last)>120:
            print(f'Exemplar embed {i+1}/{len(df_s)} elapsed {time.time()-t0:.1f}s', flush=True); last=time.time()
    if batch:
        e = _embed_batch_cpu(model, np.stack(batch,0), mean, std)
        embs.append(e)
    embs = np.concatenate(embs, 0).astype(np.float32)
    faiss.normalize_L2(embs)
    # Save bank
    (out/'exemplars.npy').write_bytes(embs.tobytes())
    (out/'exemplars_unicodes.json').write_text(json.dumps(metas, ensure_ascii=False))
    (out/'exemplars_backbone.txt').write_text(model_name)
    print('Saved exemplars:', embs.shape, 'backbone:', model_name, flush=True)
    return embs, metas, model_name

def _search_k_per_unicode(index: faiss.IndexFlatIP,
                          queries: np.ndarray,
                          exemplar_unicodes: List[str],
                          k:int=5) -> List[Dict[str, float]]:
    # Returns for each query: dict unicode -> summed cosine over top-k neighbors of that unicode
    D, I = index.search(queries, k)
    out = []
    for drow, irow in zip(D, I):
        acc: Dict[str, float] = {}
        for sim, idx in zip(drow.tolist(), irow.tolist()):
            if idx == -1: continue
            u = exemplar_unicodes[idx]
            acc[u] = acc.get(u, 0.0) + float(sim)
        out.append(acc)
    return out

def two_stage_build_submission_exemplars(det_weights: str,
                                         imgsz_det:int=1024,
                                         conf:float=0.08,
                                         iou:float=0.65,
                                         max_det:int=4000,
                                         pad_frac:float=0.15,
                                         crop_size:int=224,
                                         backbone_primary:str='convnext_small',
                                         min_cosine:float=0.65,
                                         k_vote:int=5,
                                         save_name:str='submission_v6.csv',
                                         predict_half:bool=True,
                                         det_device=0,
                                         tiny_filter:int=5,
                                         exemplars_path:str='artifacts/exemplars.npy',
                                         exemplars_unicodes_path:str='artifacts/exemplars_unicodes.json',
                                         # Optional second backbone fusion
                                         exemplars2_path: Optional[str]=None,
                                         exemplars2_unicodes_path: Optional[str]=None,
                                         backbone_secondary: Optional[str]=None):
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = os.environ.get('PYTORCH_CUDA_ALLOC_CONF','expandable_segments:True')
    if torch.cuda.is_available():
        torch.cuda.empty_cache(); gc.collect()
    ss = pd.read_csv('sample_submission.csv')
    image_ids = ss['image_id'].tolist()
    img_paths = []
    for img_id in image_ids:
        p = Path('test_images')/f'{img_id}.jpg'
        if not p.exists():
            alts = list(Path('test_images').glob(f'{img_id}.*'))
            if alts: p = alts[0]
        img_paths.append(str(p))
    print('Loading detector:', det_weights, flush=True)
    det = YOLO(det_weights)
    # Load exemplar bank(s)
    ex1 = np.fromfile(exemplars_path, dtype=np.float32)
    metas1: List[str] = json.loads(Path(exemplars_unicodes_path).read_text())
    d1 = ex1.size // max(1,len(metas1)); ex1 = ex1.reshape(-1, d1).astype(np.float32)
    faiss.normalize_L2(ex1)
    idx1 = faiss.IndexFlatIP(d1); idx1.add(ex1)
    print('Exemplar bank1:', ex1.shape, 'k=', k_vote, flush=True)
    use_fusion = False
    if exemplars2_path and exemplars2_unicodes_path and backbone_secondary:
        ex2 = np.fromfile(exemplars2_path, dtype=np.float32)
        metas2: List[str] = json.loads(Path(exemplars2_unicodes_path).read_text())
        d2 = ex2.size // max(1,len(metas2)); ex2 = ex2.reshape(-1, d2).astype(np.float32)
        faiss.normalize_L2(ex2)
        idx2 = faiss.IndexFlatIP(d2); idx2.add(ex2)
        print('Exemplar bank2:', ex2.shape, 'k=', k_vote, flush=True)
        use_fusion = True
    # Load backbone(s) on CPU
    model1, mean1, std1 = _build_backbone_cpu(backbone_primary)
    if use_fusion:
        model2, mean2, std2 = _build_backbone_cpu(backbone_secondary)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    print('Running detection stream on', len(img_paths), 'images ...', flush=True)
    results = det.predict(source=img_paths, imgsz=imgsz_det, conf=conf, iou=iou, max_det=max_det, augment=True, device=det_device, stream=True, verbose=False, batch=1, half=predict_half)
    rows = []; t0 = time.time()
    for i, (img_id, img_path, res) in enumerate(zip(image_ids, img_paths, results)):
        if i % 25 == 0:
            print(f'Processed {i}/{len(image_ids)} images, elapsed {time.time()-t0:.1f}s', flush=True)
        labels_out = []
        with Image.open(img_path) as im:
            W, H = im.size
            if res and hasattr(res,'boxes') and res.boxes is not None and len(res.boxes)>0:
                b = res.boxes
                xyxy = b.xyxy.cpu().numpy()
                batch1 = []; centers = []; ws = []; hs = []
                for (x1,y1,x2,y2) in xyxy:
                    x = int(round(x1)); y = int(round(y1)); w = int(round(x2-x1)); h = int(round(y2-y1))
                    if w < tiny_filter or h < tiny_filter:
                        continue
                    arr = _crop_pad_resize(im, x,y,w,h, pad_frac=pad_frac, out_size=crop_size)
                    batch1.append(arr)
                    cx = int(round((x1+x2)/2.0)); cy = int(round((y1+y2)/2.0))
                    cx = max(0, min(cx, W-1)); cy = max(0, min(cy, H-1))
                    centers.append((cx,cy)); ws.append(w); hs.append(h)
                if batch1:
                    q1 = _embed_batch_cpu(model1, np.stack(batch1,0), mean1, std1)
                    # vote on bank1
                    votes1 = _search_k_per_unicode(idx1, q1, metas1, k=k_vote)
                    if use_fusion:
                        q2 = _embed_batch_cpu(model2, np.stack(batch1,0), mean2, std2)
                        votes2 = _search_k_per_unicode(idx2, q2, metas2, k=k_vote)
                    # finalize per-crop prediction
                    preds = []  # (u, cx, cy, score, w, h)
                    for j in range(len(centers)):
                        # primary
                        v1 = votes1[j]
                        best_u = None; best_score = -1.0
                        # candidate set:
                        cand_us = set(v1.keys())
                        if use_fusion:
                            v2 = votes2[j]
                            cand_us |= set(v2.keys())
                        for u in cand_us:
                            s1 = v1.get(u, 0.0)
                            if use_fusion:
                                s2 = v2.get(u, 0.0)
                                s = max(s1, s2)  # fuse by taking the larger per-unicode vote of the two banks
                            else:
                                s = s1
                            if s > best_score:
                                best_score = s; best_u = u
                        # Gate on the fused vote. Votes sum the top-k cosines per unicode, so best_score
                        # can exceed the strongest single neighbor's cosine (and even 1.0); comparing it
                        # with min_cosine only approximates a gate on the best neighbor's similarity.
                        if best_u is not None and best_score >= float(min_cosine):
                            (cx,cy) = centers[j]
                            preds.append((best_u, cx, cy, float(best_score), ws[j], hs[j]))
                    # adaptive dedup by unicode and size-aware grid
                    if preds:
                        kept = {}  # key: (u, gx, gy) -> (u,cx,cy,score)
                        for u, cx, cy, sc, w, h in preds:
                            cell_size = max(7, int(max(1, min(w,h))//4))
                            gx = cx // cell_size; gy = cy // cell_size
                            key = (u, gx, gy)
                            if key not in kept or sc > kept[key][3]:
                                kept[key] = (u, cx, cy, sc)
                        for u, cx, cy, sc in kept.values():
                            labels_out.extend([u, str(cx), str(cy)])
        rows.append({'image_id': img_id, 'labels': ' '.join(labels_out)})
        del res
    sub = pd.DataFrame(rows)
    sub.to_csv(save_name, index=False)
    print('Saved', save_name, 'shape', sub.shape, flush=True)
    return sub

print('v6 utilities ready:')
print('- build_exemplar_bank(max_per_class=3, model_name=\'convnext_small\')')
print('- Optional: also build tiny bank for fusion, then call two_stage_build_submission_exemplars(...)')
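
The gate in `two_stage_build_submission_exemplars` compares a per-unicode vote with `min_cosine`. Its own comment says the votes sum the top-k cosines, which means several weak neighbors of one class can pass a threshold that none of them would pass alone. The sketch below illustrates that gap with plain Python. `votes_sum_and_max` and the neighbor values are hypothetical, chosen only for illustration, and it assumes `_search_k_per_unicode` (defined outside this section) really sums similarities per unicode.

```python
from typing import Dict, List, Tuple

def votes_sum_and_max(neighbors: List[Tuple[str, float]]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Aggregate (unicode, cosine) neighbors into per-unicode sum and per-unicode max."""
    acc_sum: Dict[str, float] = {}
    acc_max: Dict[str, float] = {}
    for u, sim in neighbors:
        acc_sum[u] = acc_sum.get(u, 0.0) + sim
        acc_max[u] = max(acc_max.get(u, float('-inf')), sim)
    return acc_sum, acc_max

# Hypothetical top-5 neighbors for one crop: three weak hits on the same class.
neighbors = [('U+3042', 0.30), ('U+3042', 0.28), ('U+3042', 0.25),
             ('U+3044', 0.22), ('U+3046', 0.20)]
s, m = votes_sum_and_max(neighbors)
min_cosine = 0.65
print(s['U+3042'] >= min_cosine)  # True: the summed vote (0.83) passes the gate
print(m['U+3042'] >= min_cosine)  # False: the strongest single neighbor (0.30) does not
```

Ranking candidates by the summed vote while gating on the per-unicode maximum keeps `min_cosine` interpretable as a cosine similarity.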

In [None]:
# Driver: v6 exemplar kNN voting (single backbone) with adaptive dedup
import shutil, pandas as pd
print('Building exemplar bank (convnext_small, max_per_class=3) ...', flush=True)
embs, metas, bb = build_exemplar_bank(max_per_class=3, img_dir='train_images', out_dir='artifacts', model_name='convnext_small', pad_frac=0.15, out_size=224)
print('Exemplars built:', embs.shape, 'backbone:', bb, flush=True)
print('Running two_stage_build_submission_exemplars with voting k=5, min_cosine=0.65 ...', flush=True)
sub6 = two_stage_build_submission_exemplars(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                            imgsz_det=1024,
                                            conf=0.08,
                                            iou=0.65,
                                            max_det=4000,
                                            pad_frac=0.15,
                                            crop_size=224,
                                            backbone_primary='convnext_small',
                                            min_cosine=0.65,
                                            k_vote=5,
                                            save_name='submission_v6.csv',
                                            predict_half=True,
                                            det_device=0,
                                            tiny_filter=5,
                                            exemplars_path='artifacts/exemplars.npy',
                                            exemplars_unicodes_path='artifacts/exemplars_unicodes.json')
print('Copying submission_v6.csv -> submission.csv for potential submit ...', flush=True)
shutil.copy2('submission_v6.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())
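
Before copying a file to `submission.csv`, it can help to confirm that every `labels` string is a flat sequence of `unicode x y` triples, which is what the `labels_out.extend([u, str(cx), str(cy)])` call above produces. The checker below is a minimal sketch. `count_label_triples` is a hypothetical helper, and the `U+` prefix check assumes the codes follow the format used in `unicode_translation.csv`.

```python
from typing import Any, List

def count_label_triples(labels: Any) -> int:
    """Return the number of (unicode, x, y) triples; raise ValueError if malformed."""
    # Empty predictions come back from pandas as NaN (a float), not as ''.
    if not isinstance(labels, str) or not labels.strip():
        return 0
    toks: List[str] = labels.split()
    if len(toks) % 3 != 0:
        raise ValueError(f'token count {len(toks)} is not a multiple of 3')
    for u, x, y in zip(toks[0::3], toks[1::3], toks[2::3]):
        if not u.startswith('U+'):
            raise ValueError(f'unexpected label token: {u!r}')
        int(x); int(y)  # raises ValueError for non-integer coordinates
    return len(toks) // 3

rows = ['U+3042 10 20 U+3044 30 40', '', float('nan')]
counts = [count_label_triples(r) for r in rows]
print(counts)  # [2, 0, 0]
```

Applied to a real file, the same function could be mapped over `pd.read_csv('submission.csv')['labels']`.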

In [None]:
# Driver: v7 two-backbone fusion (convnext_small + convnext_tiny) with exemplar voting and adaptive dedup
import shutil, os, pandas as pd
from pathlib import Path

art = Path('artifacts')
small_np = art/'exemplars.npy'
small_js = art/'exemplars_unicodes.json'
small_np_tag = art/'exemplars_small.npy'
small_js_tag = art/'exemplars_small_unicodes.json'
tiny_np_tag = art/'exemplars_tiny.npy'
tiny_js_tag = art/'exemplars_tiny_unicodes.json'

# 1) Preserve existing convnext_small exemplars as bank1
assert small_np.exists() and small_js.exists(), 'convnext_small exemplars not found; run v6 driver first'
shutil.copy2(small_np, small_np_tag)
shutil.copy2(small_js, small_js_tag)
print('Saved bank1 (small) ->', small_np_tag.name, small_js_tag.name, flush=True)

# 2) Build convnext_tiny exemplars as bank2 (this overwrites artifacts/exemplars.*; bank1 stays available under the *_small names saved above)
print('Building exemplar bank2 (convnext_tiny, max_per_class=3) ...', flush=True)
embs2, metas2, bb2 = build_exemplar_bank(max_per_class=3, img_dir='train_images', out_dir='artifacts', model_name='convnext_tiny', pad_frac=0.15, out_size=224)
print('Exemplars2 built:', embs2.shape, 'backbone:', bb2, flush=True)
shutil.copy2(art/'exemplars.npy', tiny_np_tag)
shutil.copy2(art/'exemplars_unicodes.json', tiny_js_tag)
print('Saved bank2 (tiny) ->', tiny_np_tag.name, tiny_js_tag.name, flush=True)

# 3) Run fused inference: vote per bank (k=5), fuse by max cosine per unicode, threshold 0.65
print('Running v7 fused inference (small+tiny) k=5 min_cosine=0.65 ...', flush=True)
sub7 = two_stage_build_submission_exemplars(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                            imgsz_det=1024,
                                            conf=0.08,
                                            iou=0.65,
                                            max_det=4000,
                                            pad_frac=0.15,
                                            crop_size=224,
                                            backbone_primary='convnext_small',
                                            min_cosine=0.65,
                                            k_vote=5,
                                            save_name='submission_v7.csv',
                                            predict_half=True,
                                            det_device=0,
                                            tiny_filter=5,
                                            exemplars_path=str(small_np_tag),
                                            exemplars_unicodes_path=str(small_js_tag),
                                            exemplars2_path=str(tiny_np_tag),
                                            exemplars2_unicodes_path=str(tiny_js_tag),
                                            backbone_secondary='convnext_tiny')
print('Copying submission_v7.csv -> submission.csv ...', flush=True)
shutil.copy2('submission_v7.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())
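
The fusion step in v7 takes, for each candidate unicode, the larger of the two banks' votes and then picks the best-scoring unicode. The toy example below isolates that rule. `fuse_by_max` and the vote values are hypothetical; the only deliberate change from the loop in `two_stage_build_submission_exemplars` is that candidates are iterated in sorted order so that ties resolve deterministically.

```python
from typing import Dict, Optional, Tuple

def fuse_by_max(v1: Dict[str, float], v2: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Pick the unicode with the highest max(vote_bank1, vote_bank2)."""
    best_u: Optional[str] = None
    best_s = -1.0
    for u in sorted(set(v1) | set(v2)):  # sorted only to make tie-breaking reproducible
        s = max(v1.get(u, 0.0), v2.get(u, 0.0))
        if s > best_s:
            best_u, best_s = u, s
    return best_u, best_s

# Hypothetical votes for one crop from the small and tiny banks.
v_small = {'U+3042': 0.72, 'U+3044': 0.40}
v_tiny = {'U+3044': 0.81}
print(fuse_by_max(v_small, v_tiny))  # ('U+3044', 0.81)
```

With max fusion, one bank can override the other outright, so the result is sensitive to how comparable the two backbones' similarity scales are; averaging the votes would be a more conservative alternative.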

In [None]:
# v7b: Re-run fused inference with stricter threshold (min_cosine=0.67) using existing banks
import pandas as pd, shutil
from pathlib import Path

art = Path('artifacts')
small_np_tag = art/'exemplars_small.npy'
small_js_tag = art/'exemplars_small_unicodes.json'
tiny_np_tag = art/'exemplars_tiny.npy'
tiny_js_tag = art/'exemplars_tiny_unicodes.json'
assert small_np_tag.exists() and small_js_tag.exists() and tiny_np_tag.exists() and tiny_js_tag.exists(), 'Exemplar banks not found; run v7 first.'
print('Running v7b fused inference with min_cosine=0.67 (stricter) ...', flush=True)
sub7b = two_stage_build_submission_exemplars(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                             imgsz_det=1024,
                                             conf=0.08,
                                             iou=0.65,
                                             max_det=4000,
                                             pad_frac=0.15,
                                             crop_size=224,
                                             backbone_primary='convnext_small',
                                             min_cosine=0.67,
                                             k_vote=5,
                                             save_name='submission_v7b.csv',
                                             predict_half=True,
                                             det_device=0,
                                             tiny_filter=5,
                                             exemplars_path=str(small_np_tag),
                                             exemplars_unicodes_path=str(small_js_tag),
                                             exemplars2_path=str(tiny_np_tag),
                                             exemplars2_unicodes_path=str(tiny_js_tag),
                                             backbone_secondary='convnext_tiny')
print('Copying submission_v7b.csv -> submission.csv ...', flush=True)
shutil.copy2('submission_v7b.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())
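
The "adaptive dedup" used by these runs buckets predictions by `(unicode, cx // cell_size, cy // cell_size)` and keeps the best score per bucket. A grid like this merges duplicates only when they fall in the same cell, so two near-identical centers that straddle a cell boundary both survive. The sketch below reproduces the bucketing on hypothetical predictions to show both cases; `dedup_grid` is an illustrative name, not a function from the notebook.

```python
from typing import Dict, List, Tuple

Pred = Tuple[str, int, int, float, int, int]  # (unicode, cx, cy, score, w, h)

def dedup_grid(preds: List[Pred]) -> List[Tuple[str, int, int, float]]:
    """Keep the highest-scoring prediction per (unicode, grid cell)."""
    kept: Dict[Tuple[str, int, int], Tuple[str, int, int, float]] = {}
    for u, cx, cy, sc, w, h in preds:
        cell = max(7, max(1, min(w, h)) // 4)  # cell size scales with the box's short side
        key = (u, cx // cell, cy // cell)
        if key not in kept or sc > kept[key][3]:
            kept[key] = (u, cx, cy, sc)
    return list(kept.values())

# 40x40 boxes give cell = 10. Both centers land in cell (10, 10): merged.
same_cell = [('U+3042', 101, 101, 0.9, 40, 40), ('U+3042', 105, 108, 0.8, 40, 40)]
# Centers 2 px apart but on either side of x = 100: cells (9, 10) and (10, 10), both kept.
straddle = [('U+3042', 99, 101, 0.9, 40, 40), ('U+3042', 101, 101, 0.8, 40, 40)]
print(len(dedup_grid(same_cell)))  # 1
print(len(dedup_grid(straddle)))   # 2
```

A distance-based rule (suppress same-unicode centers closer than some fraction of the box size) would avoid the boundary effect at the cost of a pairwise comparison.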

In [None]:
# v8: Multi-scale union (896 + 1024) + two-backbone exemplar voting with adaptive dedup
import time, json, gc, os, math
from pathlib import Path
from typing import List, Tuple, Optional, Dict
import numpy as np
import pandas as pd
import torch, faiss, timm, cv2
from PIL import Image
from ultralytics import YOLO

# Reuse helpers from v6 cell (crop/emb/backbone/search).
def _nms_iou_xyxy(boxes: np.ndarray, scores: np.ndarray, iou_thr: float=0.70, limit: Optional[int]=None) -> List[int]:
    # boxes: (N,4) xyxy, scores: (N,)
    if boxes.size == 0:
        return []
    x1 = boxes[:,0]; y1 = boxes[:,1]; x2 = boxes[:,2]; y2 = boxes[:,3]
    areas = (np.maximum(0, x2 - x1 + 1) * np.maximum(0, y2 - y1 + 1))
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if limit is not None and len(keep) >= limit:
            break
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        inds = np.where(ovr <= iou_thr)[0]
        order = order[inds + 1]
    return keep

def two_stage_build_submission_exemplars_multiscale(det_weights: str,
                                                    sizes: Tuple[int,int]=(896,1024),
                                                    conf: float=0.08,
                                                    iou: float=0.65,
                                                    max_det: int=4000,
                                                    union_iou: float=0.70,
                                                    pad_frac: float=0.15,
                                                    crop_size: int=224,
                                                    backbone_primary: str='convnext_small',
                                                    min_cosine: float=0.65,
                                                    k_vote: int=5,
                                                    save_name: str='submission_v8.csv',
                                                    predict_half: bool=True,
                                                    det_device=0,
                                                    tiny_filter:int=5,
                                                    exemplars_path:str='artifacts/exemplars_small.npy',
                                                    exemplars_unicodes_path:str='artifacts/exemplars_small_unicodes.json',
                                                    exemplars2_path:str='artifacts/exemplars_tiny.npy',
                                                    exemplars2_unicodes_path:str='artifacts/exemplars_tiny_unicodes.json',
                                                    backbone_secondary: str='convnext_tiny'):
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = os.environ.get('PYTORCH_CUDA_ALLOC_CONF','expandable_segments:True')
    if torch.cuda.is_available():
        torch.cuda.empty_cache(); gc.collect()
    ss = pd.read_csv('sample_submission.csv')
    image_ids = ss['image_id'].tolist()
    img_paths = []
    for img_id in image_ids:
        p = Path('test_images')/f'{img_id}.jpg'
        if not p.exists():
            alts = list(Path('test_images').glob(f'{img_id}.*'))
            if alts: p = alts[0]
        img_paths.append(str(p))
    print('Loading detector:', det_weights, flush=True)
    det = YOLO(det_weights)
    # Exemplar banks
    ex1 = np.fromfile(exemplars_path, dtype=np.float32)
    metas1: List[str] = json.loads(Path(exemplars_unicodes_path).read_text())
    d1 = ex1.size // max(1,len(metas1)); ex1 = ex1.reshape(-1, d1).astype(np.float32)
    faiss.normalize_L2(ex1); idx1 = faiss.IndexFlatIP(d1); idx1.add(ex1)
    ex2 = np.fromfile(exemplars2_path, dtype=np.float32)
    metas2: List[str] = json.loads(Path(exemplars2_unicodes_path).read_text())
    d2 = ex2.size // max(1,len(metas2)); ex2 = ex2.reshape(-1, d2).astype(np.float32)
    faiss.normalize_L2(ex2); idx2 = faiss.IndexFlatIP(d2); idx2.add(ex2)
    print('Banks:', ex1.shape, ex2.shape, 'k=', k_vote, flush=True)
    # Backbones CPU
    model1, mean1, std1 = _build_backbone_cpu(backbone_primary)
    model2, mean2, std2 = _build_backbone_cpu(backbone_secondary)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    # 1) Run detection at multiple scales and collect per-image boxes
    def detect_at_size(s:int):
        return det.predict(source=img_paths, imgsz=s, conf=conf, iou=iou, max_det=max_det, augment=True, device=det_device, stream=True, verbose=False, batch=1, half=predict_half)
    print('Running detection size', sizes[0], '...', flush=True)
    # list() drains the stream, so every result for this scale stays in memory until the union loop.
    res_a = list(detect_at_size(int(sizes[0])))
    print('Running detection size', sizes[1], '...', flush=True)
    res_b = list(detect_at_size(int(sizes[1])))
    # 2) Union per image via class-agnostic NMS
    rows = []; t0 = time.time()
    for i, (img_id, img_path, ra, rb) in enumerate(zip(image_ids, img_paths, res_a, res_b)):
        if i % 25 == 0:
            print(f'Union {i}/{len(image_ids)} images, elapsed {time.time()-t0:.1f}s', flush=True)
        labels_out = []
        with Image.open(img_path) as im:
            W, H = im.size
            all_xyxy = []; all_scores = []
            for r in (ra, rb):
                if r is not None and hasattr(r,'boxes') and r.boxes is not None and len(r.boxes)>0:
                    b = r.boxes
                    xyxy = b.xyxy.cpu().numpy()
                    confs = b.conf.cpu().numpy() if getattr(b, 'conf', None) is not None else np.ones((xyxy.shape[0],), dtype=np.float32)
                    if xyxy.size:
                        all_xyxy.append(xyxy); all_scores.append(confs)
            if all_xyxy:
                xyxy_u = np.concatenate(all_xyxy, 0).astype(np.float32)
                scores_u = np.concatenate(all_scores, 0).astype(np.float32)
                keep = _nms_iou_xyxy(xyxy_u, scores_u, iou_thr=union_iou, limit=max_det)
                xyxy_u = xyxy_u[keep]; scores_u = scores_u[keep]
                # 3) Classify merged boxes with exemplar fusion
                batch1 = []; centers = []; ws = []; hs = []
                for (x1,y1,x2,y2) in xyxy_u:
                    x = int(round(x1)); y = int(round(y1)); w = int(round(x2-x1)); h = int(round(y2-y1))
                    if w < tiny_filter or h < tiny_filter:
                        continue
                    arr = _crop_pad_resize(im, x,y,w,h, pad_frac=pad_frac, out_size=crop_size)
                    batch1.append(arr)
                    cx = int(round((x1+x2)/2.0)); cy = int(round((y1+y2)/2.0))
                    cx = max(0, min(cx, W-1)); cy = max(0, min(cy, H-1))
                    centers.append((cx,cy)); ws.append(w); hs.append(h)
                if batch1:
                    q1 = _embed_batch_cpu(model1, np.stack(batch1,0), mean1, std1)
                    q2 = _embed_batch_cpu(model2, np.stack(batch1,0), mean2, std2)
                    # vote and fuse by max
                    votes1 = _search_k_per_unicode(idx1, q1, metas1, k=k_vote)
                    votes2 = _search_k_per_unicode(idx2, q2, metas2, k=k_vote)
                    preds = []
                    for j in range(len(centers)):
                        v1 = votes1[j]; v2 = votes2[j]
                        cand_us = set(v1.keys()) | set(v2.keys())
                        best_u=None; best_s=-1.0
                        for u in cand_us:
                            s = max(v1.get(u,0.0), v2.get(u,0.0))
                            if s > best_s:
                                best_s = s; best_u = u
                        if best_u is not None and best_s >= float(min_cosine):
                            (cx,cy) = centers[j]
                            preds.append((best_u, cx, cy, float(best_s), ws[j], hs[j]))
                    if preds:
                        kept = {}
                        for u, cx, cy, sc, w, h in preds:
                            cell_size = max(7, int(max(1, min(w,h))//4))
                            gx = cx // cell_size; gy = cy // cell_size
                            key = (u, gx, gy)
                            if key not in kept or sc > kept[key][3]:
                                kept[key] = (u, cx, cy, sc)
                        for u, cx, cy, sc in kept.values():
                            labels_out.extend([u, str(cx), str(cy)])
        rows.append({'image_id': img_id, 'labels': ' '.join(labels_out)})
        del ra, rb
    sub = pd.DataFrame(rows)
    sub.to_csv(save_name, index=False)
    print('Saved', save_name, 'shape', sub.shape, flush=True)
    return sub

# Driver for v8
import shutil
print('Running v8 multi-scale union (896,1024) with fused exemplar voting (small+tiny), min_cosine=0.65 ...', flush=True)
sub8 = two_stage_build_submission_exemplars_multiscale(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                                      sizes=(896,1024),
                                                      conf=0.08,
                                                      iou=0.65,
                                                      max_det=4000,
                                                      union_iou=0.70,
                                                      pad_frac=0.15,
                                                      crop_size=224,
                                                      backbone_primary='convnext_small',
                                                      min_cosine=0.65,
                                                      k_vote=5,
                                                      save_name='submission_v8.csv',
                                                      predict_half=True,
                                                      det_device=0,
                                                      tiny_filter=5,
                                                      exemplars_path='artifacts/exemplars_small.npy',
                                                      exemplars_unicodes_path='artifacts/exemplars_small_unicodes.json',
                                                      exemplars2_path='artifacts/exemplars_tiny.npy',
                                                      exemplars2_unicodes_path='artifacts/exemplars_tiny_unicodes.json',
                                                      backbone_secondary='convnext_tiny')
print('Copying submission_v8.csv -> submission.csv ...', flush=True)
shutil.copy2('submission_v8.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())
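
The multi-scale union in v8 concatenates the boxes from both detector scales and runs a class-agnostic NMS over them, so a character found at both scales is kept once and a character found at only one scale is still kept. The pure-Python sketch below shows that behavior on three hypothetical boxes. `iou_xyxy` and `greedy_union` are illustrative stand-ins that follow the same inclusive-pixel (`+ 1`) area convention as `_nms_iou_xyxy`; they are not the notebook's implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou_xyxy(a: Box, b: Box) -> float:
    """IoU with the inclusive-pixel (+1) convention."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

def greedy_union(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.70) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep those not overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou_xyxy(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep

boxes = [(10, 10, 50, 50),      # found at scale A
         (11, 11, 51, 51),      # same character, found at scale B
         (100, 100, 140, 140)]  # found only at scale B
scores = [0.9, 0.8, 0.5]
keep = greedy_union(boxes, scores)
print(keep)  # [0, 2]
```

Raising `iou_thr` makes the union keep more near-duplicates, which the later grid dedup then has to absorb; lowering it risks merging adjacent characters in dense columns.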

In [None]:
# v8b: Multi-scale union with stricter threshold (min_cosine=0.67) using existing banks
import shutil, pandas as pd
print('Running v8b multi-scale union (896,1024) with fused exemplar voting (small+tiny), min_cosine=0.67 ...', flush=True)
sub8b = two_stage_build_submission_exemplars_multiscale(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                                       sizes=(896,1024),
                                                       conf=0.08,
                                                       iou=0.65,
                                                       max_det=4000,
                                                       union_iou=0.70,
                                                       pad_frac=0.15,
                                                       crop_size=224,
                                                       backbone_primary='convnext_small',
                                                       min_cosine=0.67,
                                                       k_vote=5,
                                                       save_name='submission_v8b.csv',
                                                       predict_half=True,
                                                       det_device=0,
                                                       tiny_filter=5,
                                                       exemplars_path='artifacts/exemplars_small.npy',
                                                       exemplars_unicodes_path='artifacts/exemplars_small_unicodes.json',
                                                       exemplars2_path='artifacts/exemplars_tiny.npy',
                                                       exemplars2_unicodes_path='artifacts/exemplars_tiny_unicodes.json',
                                                       backbone_secondary='convnext_tiny')
print('Copying submission_v8b.csv -> submission.csv ...', flush=True)
shutil.copy2('submission_v8b.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())

In [None]:
# v8c: 3-scale union (832,960,1024) + corrected gating (single-neighbor max) + dynamic per-image threshold
import time, json, gc, os, math, numpy as np, pandas as pd, torch, faiss, timm, cv2
from pathlib import Path
from typing import List, Tuple, Optional, Dict
from PIL import Image
from ultralytics import YOLO

# Helpers from earlier cells (crop, CPU backbone/embedding, NMS) are redefined below; the NMS default iou_thr is now 0.75
def _crop_pad_resize(img: Image.Image, x:int, y:int, w:int, h:int, pad_frac:float=0.15, out_size:int=224):
    W, H = img.size
    cx = x + w/2.0; cy = y + h/2.0
    pw = max(2, int(round(w * pad_frac))); ph = max(2, int(round(h * pad_frac)))
    x1 = max(0, int(round(cx - w/2 - pw))); y1 = max(0, int(round(cy - h/2 - ph)))
    x2 = min(W, int(round(cx + w/2 + pw))); y2 = min(H, int(round(cy + h/2 + ph)))
    crop = img.crop((x1, y1, x2, y2)).convert('L')
    arr = np.array(crop)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    arr = clahe.apply(arr)
    h0, w0 = arr.shape[:2]; m = max(h0, w0)
    pad_top = (m - h0) // 2; pad_bottom = m - h0 - pad_top
    pad_left = (m - w0) // 2; pad_right = m - w0 - pad_left
    arr = cv2.copyMakeBorder(arr, pad_top, pad_bottom, pad_left, pad_right, borderType=cv2.BORDER_CONSTANT, value=0)
    arr = cv2.resize(arr, (out_size, out_size), interpolation=cv2.INTER_AREA)
    arr = np.stack([arr, arr, arr], axis=0).astype(np.float32) / 255.0
    return arr

@torch.no_grad()
def _build_backbone_cpu(model_name:str):
    model = timm.create_model(model_name, pretrained=True, num_classes=0, global_pool='avg')
    model.eval().to('cpu')
    data_cfg = timm.data.resolve_model_data_config(model)
    mean = torch.tensor(data_cfg.get('mean', (0.485,0.456,0.406)), dtype=torch.float32, device='cpu').view(1,3,1,1)
    std  = torch.tensor(data_cfg.get('std',  (0.229,0.224,0.225)), dtype=torch.float32, device='cpu').view(1,3,1,1)
    _ = model((torch.zeros(1,3,224,224)-mean)/std)
    return model, mean, std

@torch.no_grad()
def _embed_batch_cpu(model, batch_np, mean, std):
    t = torch.from_numpy(batch_np).to('cpu')
    t = (t - mean) / std
    f = model(t)
    f = torch.nn.functional.normalize(f, p=2, dim=1)
    return f.cpu().numpy().astype(np.float32)

def _nms_iou_xyxy(boxes: np.ndarray, scores: np.ndarray, iou_thr: float=0.75, limit: Optional[int]=None) -> List[int]:
    if boxes.size == 0:
        return []
    x1 = boxes[:,0]; y1 = boxes[:,1]; x2 = boxes[:,2]; y2 = boxes[:,3]
    areas = (np.maximum(0, x2 - x1 + 1) * np.maximum(0, y2 - y1 + 1))
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if limit is not None and len(keep) >= limit:
            break
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        inds = np.where(ovr <= iou_thr)[0]
        order = order[inds + 1]
    return keep

def _search_k_sum_and_max(index: faiss.IndexFlatIP, queries: np.ndarray, exemplar_unicodes: List[str], k:int=5):
    # Returns (sum_list, max_list): one dict per query, mapping unicode -> sum (resp. max)
    # of the cosines of that unicode's exemplars among the query's top-k neighbors.
    D, I = index.search(queries, k)
    sum_list = []; max_list = []
    for drow, irow in zip(D, I):
        acc_sum: Dict[str, float] = {}; acc_max: Dict[str, float] = {}
        for sim, idx in zip(drow.tolist(), irow.tolist()):
            if idx == -1: continue
            u = exemplar_unicodes[idx]
            acc_sum[u] = acc_sum.get(u, 0.0) + float(sim)
            acc_max[u] = max(acc_max.get(u, -1e9), float(sim))
        sum_list.append(acc_sum); max_list.append(acc_max)
    return sum_list, max_list

def two_stage_build_submission_exemplars_multiscale_v8c(det_weights: str,
                                                        sizes: Tuple[int,int,int]=(832,960,1024),
                                                        conf: float=0.08,
                                                        iou: float=0.65,
                                                        max_det: int=4000,
                                                        union_iou: float=0.75,
                                                        pad_frac: float=0.15,
                                                        crop_size: int=224,
                                                        backbone_primary: str='convnext_small',
                                                        min_cosine_floor: float=0.65,
                                                        k_vote: int=5,
                                                        save_name: str='submission_v8c.csv',
                                                        predict_half: bool=True,
                                                        det_device=0,
                                                        tiny_filter:int=5,
                                                        exemplars_path:str='artifacts/exemplars_small.npy',
                                                        exemplars_unicodes_path:str='artifacts/exemplars_small_unicodes.json',
                                                        exemplars2_path:str='artifacts/exemplars_tiny.npy',
                                                        exemplars2_unicodes_path:str='artifacts/exemplars_tiny_unicodes.json',
                                                        backbone_secondary: str='convnext_tiny',
                                                        use_dynamic_thresh: bool=True):
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = os.environ.get('PYTORCH_CUDA_ALLOC_CONF','expandable_segments:True')
    if torch.cuda.is_available():
        torch.cuda.empty_cache(); gc.collect()
    ss = pd.read_csv('sample_submission.csv')
    image_ids = ss['image_id'].tolist()
    img_paths = []
    for img_id in image_ids:
        p = Path('test_images')/f'{img_id}.jpg'
        if not p.exists():
            alts = list(Path('test_images').glob(f'{img_id}.*'))
            if alts: p = alts[0]
        img_paths.append(str(p))
    print('Loading detector:', det_weights, flush=True)
    det = YOLO(det_weights)
    # Exemplar banks
    ex1 = np.fromfile(exemplars_path, dtype=np.float32)
    metas1: List[str] = json.loads(Path(exemplars_unicodes_path).read_text())
    d1 = ex1.size // max(1,len(metas1)); ex1 = ex1.reshape(-1, d1).astype(np.float32)
    faiss.normalize_L2(ex1); idx1 = faiss.IndexFlatIP(d1); idx1.add(ex1)
    ex2 = np.fromfile(exemplars2_path, dtype=np.float32)
    metas2: List[str] = json.loads(Path(exemplars2_unicodes_path).read_text())
    d2 = ex2.size // max(1,len(metas2)); ex2 = ex2.reshape(-1, d2).astype(np.float32)
    faiss.normalize_L2(ex2); idx2 = faiss.IndexFlatIP(d2); idx2.add(ex2)
    print('Banks:', ex1.shape, ex2.shape, 'k=', k_vote, flush=True)
    # Backbones CPU
    model1, mean1, std1 = _build_backbone_cpu(backbone_primary)
    model2, mean2, std2 = _build_backbone_cpu(backbone_secondary)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    def detect_at_size(s:int):
        return det.predict(source=img_paths, imgsz=s, conf=conf, iou=iou, max_det=max_det, augment=True, device=det_device, stream=True, verbose=False, batch=1, half=predict_half)
    print('Running detection sizes', sizes, '...', flush=True)
    res_all = [list(detect_at_size(int(s))) for s in sizes]
    rows = []; t0 = time.time()
    for i, items in enumerate(zip(image_ids, img_paths, *res_all)):
        img_id = items[0]; img_path = items[1]; res_list = items[2:]
        if i % 25 == 0:
            print(f'Union {i}/{len(image_ids)} images, elapsed {time.time()-t0:.1f}s', flush=True)
        labels_out = []
        with Image.open(img_path) as im:
            W, H = im.size
            all_xyxy = []; all_scores = []
            for r in res_list:
                if r is not None and hasattr(r,'boxes') and r.boxes is not None and len(r.boxes)>0:
                    b = r.boxes
                    xyxy = b.xyxy.cpu().numpy()
                    confs = b.conf.cpu().numpy() if getattr(b, 'conf', None) is not None else np.ones((xyxy.shape[0],), dtype=np.float32)
                    if xyxy.size:
                        all_xyxy.append(xyxy); all_scores.append(confs)
            if all_xyxy:
                xyxy_u = np.concatenate(all_xyxy, 0).astype(np.float32)
                scores_u = np.concatenate(all_scores, 0).astype(np.float32)
                keep = _nms_iou_xyxy(xyxy_u, scores_u, iou_thr=union_iou, limit=max_det)
                xyxy_u = xyxy_u[keep]; scores_u = scores_u[keep]
                batch1 = []; centers = []; ws = []; hs = []
                for (x1,y1,x2,y2) in xyxy_u:
                    x = int(round(x1)); y = int(round(y1)); w = int(round(x2-x1)); h = int(round(y2-y1))
                    if w < tiny_filter or h < tiny_filter:
                        continue
                    arr = _crop_pad_resize(im, x,y,w,h, pad_frac=pad_frac, out_size=crop_size)
                    batch1.append(arr)
                    cx = int(round((x1+x2)/2.0)); cy = int(round((y1+y2)/2.0))
                    cx = max(0, min(cx, W-1)); cy = max(0, min(cy, H-1))
                    centers.append((cx,cy)); ws.append(w); hs.append(h)
                preds_all = []  # store (u, cx, cy, s_gate, best_sum, second_sum, w, h)
                if batch1:
                    q1 = _embed_batch_cpu(model1, np.stack(batch1,0), mean1, std1)
                    q2 = _embed_batch_cpu(model2, np.stack(batch1,0), mean2, std2)
                    v1_sum, v1_max = _search_k_sum_and_max(idx1, q1, metas1, k=k_vote)
                    v2_sum, v2_max = _search_k_sum_and_max(idx2, q2, metas2, k=k_vote)
                    for j in range(len(centers)):
                        sdict1 = v1_sum[j]; mdict1 = v1_max[j]
                        sdict2 = v2_sum[j]; mdict2 = v2_max[j]
                        cand_us = set(sdict1.keys()) | set(sdict2.keys())
                        # choose by fused sum (0.6 small + 0.4 tiny)
                        fused_scores = []
                        for u in cand_us:
                            fs = 0.6*sdict1.get(u,0.0) + 0.4*sdict2.get(u,0.0)
                            fused_scores.append((u, fs))
                        if not fused_scores:
                            continue
                        fused_scores.sort(key=lambda t: t[1], reverse=True)
                        best_u, best_sum = fused_scores[0]
                        second_sum = fused_scores[1][1] if len(fused_scores) > 1 else -1.0
                        s_gate = max(mdict1.get(best_u, -1.0), mdict2.get(best_u, -1.0))
                        (cx,cy) = centers[j]
                        preds_all.append((best_u, cx, cy, float(s_gate), float(best_sum), float(second_sum), ws[j], hs[j]))
                # dynamic per-image threshold from s_gate list
                if preds_all:
                    s_list = np.array([p[3] for p in preds_all], dtype=np.float32)
                    if use_dynamic_thresh and s_list.size > 0:
                        t_img = float(np.percentile(s_list, 85))
                        t_img = float(np.clip(t_img, 0.63, 0.72))
                        thr = max(t_img, float(min_cosine_floor))
                    else:
                        thr = float(min_cosine_floor)
                    kept = {}  # (u,gx,gy) -> (u,cx,cy,s_gate)
                    for (u, cx, cy, s_gate, best_sum, second_sum, w, h) in preds_all:
                        local_thr = thr
                        if (best_sum - second_sum) < 0.02:
                            local_thr = thr + 0.02
                        if s_gate >= local_thr:
                            cell_size = max(7, int(max(1, min(w,h))//4))
                            gx = cx // cell_size; gy = cy // cell_size
                            key = (u, gx, gy)
                            if key not in kept or s_gate > kept[key][3]:
                                kept[key] = (u, cx, cy, s_gate)
                    for u, cx, cy, s_gate in kept.values():
                        labels_out.extend([u, str(cx), str(cy)])
        rows.append({'image_id': img_id, 'labels': ' '.join(labels_out)})
    sub = pd.DataFrame(rows)
    sub.to_csv(save_name, index=False)
    print('Saved', save_name, 'shape', sub.shape, flush=True)
    return sub

# Driver v8c
import shutil
print('Running v8c: 3-scale union (832,960,1024), union_iou=0.75, fused-sum choice (0.6/0.4), gate by single-neighbor max with dynamic per-image threshold ...', flush=True)
sub8c = two_stage_build_submission_exemplars_multiscale_v8c(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                                           sizes=(832,960,1024),
                                                           conf=0.08,
                                                           iou=0.65,
                                                           max_det=4000,
                                                           union_iou=0.75,
                                                           pad_frac=0.15,
                                                           crop_size=224,
                                                           backbone_primary='convnext_small',
                                                           min_cosine_floor=0.65,
                                                           k_vote=5,
                                                           save_name='submission_v8c.csv',
                                                           predict_half=True,
                                                           det_device=0,
                                                           tiny_filter=5,
                                                           exemplars_path='artifacts/exemplars_small.npy',
                                                           exemplars_unicodes_path='artifacts/exemplars_small_unicodes.json',
                                                           exemplars2_path='artifacts/exemplars_tiny.npy',
                                                           exemplars2_unicodes_path='artifacts/exemplars_tiny_unicodes.json',
                                                           backbone_secondary='convnext_tiny',
                                                           use_dynamic_thresh=True)
print('Copying submission_v8c.csv -> submission.csv ...', flush=True)
shutil.copy2('submission_v8c.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())
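
Note on the v8c gate: the per-image threshold is the 85th percentile of the gate scores, clipped to [0.63, 0.72], then raised to at least `min_cosine_floor`. Below is a minimal standalone sketch of that rule in pure Python. The names `percentile` and `image_threshold` are illustrative and do not exist in the notebook, and the percentile uses linear interpolation, which is assumed here to match the `np.percentile` default.

```python
from typing import List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def image_threshold(gates: List[float],
                    floor: float = 0.65,
                    pct: float = 85.0,
                    clamp: Tuple[float, float] = (0.63, 0.72)) -> float:
    """Per-image gate: clipped percentile of the scores, never below the floor."""
    t = percentile(gates, pct)
    t = max(clamp[0], min(clamp[1], t))
    return max(t, floor)


# A page of uniformly weak matches falls back to the floor;
# a page of uniformly strong matches is capped by the upper clip.
print(image_threshold([0.5] * 10))
print(image_threshold([0.9] * 10))
```

With `min_cosine_floor=0.65` the lower clip of 0.63 never binds, so the effective range is [0.65, 0.72]. Also note that whenever the percentile itself sets the threshold (neither clip nor floor active), only about 15% of the boxes on that page can pass the gate, which is probably worth checking against validation recall.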

In [3]:
# v9: Supervised crop classifier (convnext_small) training + inference hooks
import os, math, time, json, gc, random
from pathlib import Path
from typing import List, Tuple, Dict
import numpy as np
import pandas as pd
from PIL import Image
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
import torchvision.transforms as T
import timm

# Globals from earlier cells: boxes_df, split_df, train_dir, id2u, u2id
u2id = json.loads(Path('artifacts/u2id.json').read_text())
id2u = json.loads(Path('artifacts/id2u.json').read_text())
num_classes = len(id2u)

def clamp_int(v, lo, hi):
    return int(max(lo, min(hi, v)))

class CropDataset(Dataset):
    def __init__(self, rows: List[Tuple[str,str,int,int,int,int]], img_dir='train_images', train=True):
        self.rows = rows
        self.img_dir = Path(img_dir)
        self.train = train
        # Augs as per expert advice (light, glyph-safe)
        self.jitter = T.ColorJitter(0.2,0.2,0.2) if train else nn.Identity()
        self.affine = T.RandomAffine(degrees=5, shear=5) if train else nn.Identity()
        self.blur = T.RandomApply([T.GaussianBlur(kernel_size=3, sigma=(0.1, 1.0))], p=0.15) if train else nn.Identity()
        self.erase = T.RandomErasing(p=0.1) if train else nn.Identity()
        self.to_tensor = T.ToTensor()
        self.norm_mean = torch.tensor([0.485,0.456,0.406]).view(3,1,1)
        self.norm_std = torch.tensor([0.229,0.224,0.225]).view(3,1,1)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        iid, u, x, y, w, h = self.rows[idx]
        p = self.img_dir/f"{iid}.jpg"
        if not p.exists():
            alts = list(self.img_dir.glob(f"{iid}.*"))
            if alts:
                p = alts[0]
        with Image.open(p) as im:
            W,H = im.size
            # pad_frac in [0.25,0.35] during train; fixed 0.30 during val
            pad_frac = random.uniform(0.25,0.35) if self.train else 0.30
            cx = x + w/2.0; cy = y + h/2.0
            pw = max(2, int(round(w*pad_frac))); ph = max(2, int(round(h*pad_frac)))
            x1 = clamp_int(round(cx - w/2 - pw), 0, W-1)
            y1 = clamp_int(round(cy - h/2 - ph), 0, H-1)
            x2 = clamp_int(round(cx + w/2 + pw), 0, W-1)
            y2 = clamp_int(round(cy + h/2 + ph), 0, H-1)
            crop = im.crop((x1,y1,x2,y2)).convert('L')
            # square pad to 224
            arr = np.array(crop)
            h0,w0 = arr.shape[:2]; m = max(h0,w0)
            pad_top = (m - h0)//2; pad_bottom = m - h0 - pad_top
            pad_left = (m - w0)//2; pad_right = m - w0 - pad_left
            arr = np.pad(arr, ((pad_top,pad_bottom),(pad_left,pad_right)), mode='constant', constant_values=0)
            crop = Image.fromarray(arr).resize((224,224), resample=Image.BILINEAR)
            # grayscale->3ch
            img3 = Image.merge('RGB', (crop,crop,crop))
        # torchvision augs
        if self.train:
            img3 = self.jitter(img3)
            img3 = self.affine(img3)
            img3 = self.blur(img3)
        t = self.to_tensor(img3)  # [0,1]
        # RandomErasing requires Tensor
        if self.train:
            t = self.erase(t)
        t = (t - self.norm_mean) / self.norm_std
        y_lbl = u2id.get(u, -1)
        return t, y_lbl

def make_train_val_rows():
    split_map = dict(zip(split_df['image_id'], split_df['is_val']))
    rows_tr = []; rows_va = []
    for r in boxes_df.itertuples(index=False):
        tup = (r.image_id, r.unicode, int(r.x), int(r.y), int(r.w), int(r.h))
        if split_map.get(r.image_id, False):
            rows_va.append(tup)
        else:
            rows_tr.append(tup)
    return rows_tr, rows_va

def build_samplers(rows_tr: List[Tuple[str,str,int,int,int,int]], num_samples_per_epoch=160_000):
    # compute class frequencies
    from collections import Counter
    freq = Counter([u for (_,u,_,_,_,_) in rows_tr])
    # weight per sample = 1/sqrt(freq[label])
    weights = []
    for (_,u,_,_,_,_) in rows_tr:
        f = max(1, freq[u])
        weights.append(1.0/math.sqrt(f))
    weights = torch.tensor(weights, dtype=torch.float32)
    sampler = WeightedRandomSampler(weights=weights, num_samples=num_samples_per_epoch, replacement=True)
    return sampler

def create_model_convnext_small(nc: int):
    model = timm.create_model('convnext_small', pretrained=True, num_classes=nc)
    return model

def train_classifier(epochs_head=2, epochs_full=8, batch_size=256, num_workers=8, lr_head=3e-3, lr_all=5e-4, wd=0.05, label_smoothing=0.1, steps_warmup=200, samples_per_epoch=160_000, out_dir='artifacts/cls_convnext_small'):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    rows_tr, rows_va = make_train_val_rows()
    ds_tr = CropDataset(rows_tr, train=True)
    ds_va = CropDataset(rows_va, train=False)
    sampler = build_samplers(rows_tr, num_samples_per_epoch=samples_per_epoch)
    dl_tr = DataLoader(ds_tr, batch_size=batch_size, sampler=sampler, num_workers=num_workers, pin_memory=True, persistent_workers=(num_workers>0))
    dl_va = DataLoader(ds_va, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True, persistent_workers=(num_workers>0))
    model = create_model_convnext_small(num_classes).to(device)
    # EMA
    ema = timm.utils.ModelEmaV2(model, decay=0.999, device=device)
    # Phase 1: strict freeze backbone, train head only (ConvNeXt: model.head.fc is final Linear)
    for p in model.parameters():
        p.requires_grad = False
    assert hasattr(model, 'head') and hasattr(model.head, 'fc'), 'Expected convnext_small with model.head.fc'
    for p in model.head.fc.parameters():
        p.requires_grad = True
    head_params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(head_params, lr=lr_head, weight_decay=wd)
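    # torch.cuda.amp.GradScaler is deprecated in recent PyTorch releases; torch.amp.GradScaler replaces it
    scaler = torch.amp.GradScaler('cuda', enabled=(device=='cuda'))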
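    # ignore_index=-1 matches the sentinel CropDataset returns for unicodes missing from u2id
    loss_fn = nn.CrossEntropyLoss(label_smoothing=label_smoothing, ignore_index=-1)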
    best_acc = 0.0
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    def run_val():
        model_to_eval = ema.module if ema is not None else model
        model_to_eval.eval()
        correct=0; total=0
        with torch.no_grad():
            for xb, yb in dl_va:
                xb = xb.to(device, non_blocking=True); yb = yb.to(device, non_blocking=True)
                with torch.amp.autocast('cuda', enabled=(device=='cuda')):
                    logits = model_to_eval(xb)
                pred = logits.argmax(1)
                correct += (pred==yb).sum().item()
                total += yb.numel()
        acc = correct/max(1,total)
        return acc

    step=0
    print('Phase 1: training head for', epochs_head, 'epochs', flush=True)
    for ep in range(epochs_head):
        model.train()
        t0=time.time()
        for it,(xb,yb) in enumerate(dl_tr):
            xb = xb.to(device, non_blocking=True); yb = yb.to(device, non_blocking=True)
            with torch.amp.autocast('cuda', enabled=(device=='cuda')):
                logits = model(xb)
                loss = loss_fn(logits, yb)
            opt.zero_grad(set_to_none=True)
            scaler.scale(loss).backward()
            # grad clipping
            scaler.unscale_(opt)
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            scaler.step(opt)
            scaler.update()
            ema.update(model)
            step+=1
            if it%200==0:
                print(f'Head ep{ep} it{it} loss={loss.item():.4f}', flush=True)
        acc = run_val()
        print(f'Head epoch {ep} val_acc={acc:.4f} elapsed={time.time()-t0:.1f}s', flush=True)
        if acc>best_acc:
            best_acc=acc
            torch.save({'model': ema.module.state_dict(), 'acc': acc, 'ep': ep}, Path(out_dir)/'best_head.pt')

    # Phase 2: unfreeze all
    for p in model.parameters():
        p.requires_grad = True
    opt = torch.optim.AdamW(model.parameters(), lr=lr_all, weight_decay=wd)
    # cosine decay to 1e-5
    total_steps = epochs_full * math.ceil(samples_per_epoch/batch_size)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps, eta_min=1e-5)
    print('Phase 2: training full model for', epochs_full, 'epochs', flush=True)
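    warmup_steps = int(steps_warmup)
    # Restart the counter: Phase 1 already advanced `step` past warmup_steps,
    # so without this reset the warmup branch below never runs.
    step = 0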
    for ep in range(epochs_full):
        model.train()
        t0=time.time()
        iters = math.ceil(samples_per_epoch/batch_size)
        it = 0
        for xb,yb in dl_tr:
            xb = xb.to(device, non_blocking=True); yb = yb.to(device, non_blocking=True)
            # warmup lr
            if step <= warmup_steps:
                lr_now = 1e-6 + (lr_all - 1e-6) * (step / max(1,warmup_steps))
                for pg in opt.param_groups: pg['lr'] = lr_now
            with torch.amp.autocast('cuda', enabled=(device=='cuda')):
                logits = model(xb)
                loss = loss_fn(logits, yb)
            opt.zero_grad(set_to_none=True)
            scaler.scale(loss).backward()
            scaler.unscale_(opt)
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            scaler.step(opt)
            scaler.update()
            ema.update(model)
            if step > warmup_steps:
                sched.step()
            step+=1
            it+=1
            if it>=iters:
                break
            if it%200==0:
                lr_disp = opt.param_groups[0]['lr']
                print(f'Full ep{ep} it{it}/{iters} loss={loss.item():.4f} lr={lr_disp:.2e}', flush=True)
        acc = run_val()
        print(f'Full epoch {ep} val_acc={acc:.4f} elapsed={time.time()-t0:.1f}s', flush=True)
        if acc>best_acc:
            best_acc=acc
            torch.save({'model': ema.module.state_dict(), 'acc': acc, 'ep': (ep)}, Path(out_dir)/'best.pt')
    # save final
    torch.save({'model': ema.module.state_dict(), 'acc': best_acc}, Path(out_dir)/'last.pt')
    with open(Path(out_dir)/'id2u.json','w') as f:
        json.dump(id2u, f, ensure_ascii=False)
    print('Training done. Best val_acc=', best_acc, 'artifacts in', out_dir, flush=True)
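    # best.pt is written only when a Phase 2 epoch beats the running best (which includes Phase 1);
    # fall back to last.pt if that never happened.
    best_path = Path(out_dir)/'best.pt'
    return str(best_path if best_path.exists() else Path(out_dir)/'last.pt')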

print('v9 classifier training utilities ready:')
print('- Run: best_ckpt = train_classifier(epochs_head=2, epochs_full=8, batch_size=256)')
print('- Then integrate into detection pipeline: crop -> model logits -> softmax; gate by prob>=0.45 and dedup as before.')

v9 classifier training utilities ready:
- Run: best_ckpt = train_classifier(epochs_head=2, epochs_full=8, batch_size=256)
- Then integrate into detection pipeline: crop -> model logits -> softmax; gate by prob>=0.45 and dedup as before.
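
Note on the sampler: `build_samplers` weights each crop by `1/sqrt(freq[label])`, so a class's total sampling mass grows like `sqrt(freq)` instead of `freq`. A small pure-Python sketch of what that does to class-level shares follows; `class_sampling_share` is an illustrative helper, not part of the notebook, and the toy label counts are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def class_sampling_share(labels: List[str]) -> Dict[str, float]:
    """Expected fraction of draws per class under 1/sqrt(freq) per-sample weights."""
    freq = Counter(labels)
    weights = [1.0 / math.sqrt(freq[u]) for u in labels]
    total = sum(weights)
    share: Dict[str, float] = {}
    for u, w in zip(labels, weights):
        share[u] = share.get(u, 0.0) + w / total
    return share


# Toy example: one class with 100 crops, one class with a single crop.
labels = ["common"] * 100 + ["rare"] * 1
share = class_sampling_share(labels)
raw_rare = labels.count("rare") / len(labels)
print(f"rare share: raw={raw_rare:.4f} weighted={share['rare']:.4f}")
```

In this toy case the rare class goes from roughly 1% of draws to roughly 9%, while the common class still dominates. That is the intended middle ground between uniform-over-samples and uniform-over-classes sampling.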


In [5]:
# v9 driver: launch supervised crop classifier training with auto batch fallback
import torch, time
print('Starting supervised classifier training (convnext_small) ...', flush=True)
# Enable TF32 for speed/stability per expert advice
torch.backends.cuda.matmul.allow_tf32 = True
torch.set_float32_matmul_precision('high')
torch.backends.cudnn.benchmark = True
# Start with safer batches to avoid wasting time on OOM at large sizes
bs_list = [128, 96, 64]
best_ckpt = None
ooms = []
for bs in bs_list:
    try:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        print(f'Trying batch_size={bs} ...', flush=True)
        best_ckpt = train_classifier(epochs_head=2, epochs_full=8, batch_size=bs, num_workers=8, samples_per_epoch=160_000, out_dir='artifacts/cls_convnext_small')
        print('Training finished with batch_size=', bs, '->', best_ckpt, flush=True)
        break
    except RuntimeError as e:
        msg = str(e)
        if 'CUDA out of memory' in msg or 'CUDAMemoryError' in msg:
            print(f'OOM at batch_size={bs}. Reducing batch...', flush=True)
            ooms.append((bs, msg[:200]))
            continue
        else:
            raise
if best_ckpt is None:
    raise RuntimeError(f'All batch sizes failed. OOMs: {ooms[:3]} ...')
print('Best checkpoint:', best_ckpt, flush=True)

Starting supervised classifier training (convnext_small) ...


Trying batch_size=128 ...


Phase 1: training head for 2 epochs


Head ep0 it0 loss=8.5947


Head ep0 it200 loss=8.3531


Head ep0 it400 loss=9.0206


Head ep0 it600 loss=8.5483


Head ep0 it800 loss=7.4343


Head ep0 it1000 loss=7.9846


Head ep0 it1200 loss=7.2566


Head epoch 0 val_acc=0.4116 elapsed=614.9s


Head ep1 it0 loss=8.1001


Head ep1 it200 loss=7.3557


Head ep1 it400 loss=7.6340


Head ep1 it600 loss=7.8747


Head ep1 it800 loss=7.3639


Head ep1 it1000 loss=7.5143


Head ep1 it1200 loss=7.7730


Head epoch 1 val_acc=0.4702 elapsed=610.7s


Phase 2: training full model for 8 epochs


Full ep0 it200/1250 loss=7.3283 lr=5.00e-04


Full ep0 it400/1250 loss=6.8937 lr=4.98e-04


Full ep0 it600/1250 loss=4.6939 lr=4.96e-04


Full ep0 it800/1250 loss=3.4583 lr=4.92e-04


Full ep0 it1000/1250 loss=2.7400 lr=4.88e-04


Full ep0 it1200/1250 loss=2.5390 lr=4.83e-04


Full epoch 0 val_acc=0.3902 elapsed=661.6s


Full ep1 it200/1250 loss=2.2542 lr=4.75e-04


Full ep1 it400/1250 loss=2.0767 lr=4.68e-04


Full ep1 it600/1250 loss=1.8703 lr=4.60e-04


Full ep1 it800/1250 loss=1.7180 lr=4.51e-04


Full ep1 it1000/1250 loss=1.6815 lr=4.41e-04


Full ep1 it1200/1250 loss=1.7369 lr=4.31e-04


Full epoch 1 val_acc=0.9284 elapsed=651.0s


KeyboardInterrupt: 
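
Note on the Phase 2 schedule: the intent is a linear warmup from 1e-6 to `lr_all` over `steps_warmup` steps, followed by cosine decay toward 1e-5. The closed-form sketch below makes it easy to sanity-check the logged `lr=` values. `lr_at` is a hypothetical helper, and it is a variant rather than a copy of the training loop: here the cosine spans the steps remaining after warmup so that the final step lands exactly on `eta_min`, whereas the loop uses `CosineAnnealingLR` with `T_max=total_steps`.

```python
import math


def lr_at(step: int,
          total_steps: int,
          base_lr: float = 5e-4,
          warmup_steps: int = 200,
          lr_start: float = 1e-6,
          eta_min: float = 1e-5) -> float:
    """Learning rate at a given Phase 2 step: linear warmup, then cosine decay."""
    if step <= warmup_steps:
        return lr_start + (base_lr - lr_start) * step / max(1, warmup_steps)
    # Fraction of the post-warmup budget that has been consumed, capped at 1.
    t = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return eta_min + (base_lr - eta_min) * 0.5 * (1.0 + math.cos(math.pi * t))


# 8 epochs x 1250 iterations, as in the batch_size=128 run above.
total = 8 * 1250
for s in (0, 100, 200, 1250, 5000, total):
    print(s, f"{lr_at(s, total):.2e}")
```

If the logged learning rate is already at its peak at the first printed iteration and decays from there, the warmup branch is not being hit; comparing against this function makes that visible quickly.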

In [6]:
# v9 inference: YOLO single-class detector + supervised convnext_small classifier
import os, time, json, math, gc
from pathlib import Path
from typing import List, Tuple, Dict, Optional
import numpy as np
import pandas as pd
from PIL import Image
import torch
from torch import nn
import torchvision.transforms as T
import timm
from ultralytics import YOLO

@torch.no_grad()
def load_cls_model(ckpt_path='artifacts/cls_convnext_small/best.pt', nc=None, device='cuda'):
    d = torch.load(ckpt_path, map_location='cpu')
    if nc is None:
        id2u_local = json.loads(Path('artifacts/id2u.json').read_text())
        nc = len(id2u_local)
    model = timm.create_model('convnext_small', pretrained=False, num_classes=nc)
    model.load_state_dict(d['model'], strict=True)
    model.eval().to(device)
    return model

def preprocess_crop_for_cls(im: Image.Image, x:int, y:int, w:int, h:int, pad_frac:float=0.30, out_size:int=224):
    W,H = im.size
    cx = x + w/2.0; cy = y + h/2.0
    pw = max(2, int(round(w*pad_frac))); ph = max(2, int(round(h*pad_frac)))
    x1 = max(0, int(round(cx - w/2 - pw))); y1 = max(0, int(round(cy - h/2 - ph)))
    x2 = min(W-1, int(round(cx + w/2 + pw))); y2 = min(H-1, int(round(cy + h/2 + ph)))
    crop = im.crop((x1, y1, x2, y2)).convert('L')
    arr = np.array(crop)
    h0,w0 = arr.shape[:2]; m = max(h0,w0)
    pad_top = (m-h0)//2; pad_bottom = m-h0-pad_top
    pad_left = (m-w0)//2; pad_right = m-w0-pad_left
    arr = np.pad(arr, ((pad_top,pad_bottom),(pad_left,pad_right)), mode='constant', constant_values=0)
    crop = Image.fromarray(arr).resize((out_size,out_size), resample=Image.BILINEAR)
    img3 = Image.merge('RGB', (crop,crop,crop))
    return img3

class ImgNetNorm(nn.Module):
    def __init__(self):
        super().__init__()
        mean = torch.tensor([0.485,0.456,0.406]).view(1,3,1,1)
        std = torch.tensor([0.229,0.224,0.225]).view(1,3,1,1)
        self.register_buffer('mean', mean, persistent=False)
        self.register_buffer('std', std, persistent=False)
    def forward(self, x):
        return (x - self.mean) / self.std

@torch.no_grad()
def classify_crops_batch(model, crops: List[Image.Image], device='cuda', batch_size=512):
    tfm = T.ToTensor()
    norm = ImgNetNorm().to(device)
    probs_all = []
    for i in range(0, len(crops), batch_size):
        batch_imgs = crops[i:i+batch_size]
        if not batch_imgs:
            continue
        t = torch.stack([tfm(img) for img in batch_imgs], 0).to(device, non_blocking=True)
        t = norm(t)
        with torch.amp.autocast('cuda', enabled=(device=='cuda')):
            logits = model(t)
            probs = torch.softmax(logits, dim=1)
        probs_all.append(probs.detach().cpu())
    if probs_all:
        return torch.cat(probs_all, 0).numpy()
    return np.zeros((0,1), dtype=np.float32)

def _nms_iou_xyxy(boxes: np.ndarray, scores: np.ndarray, iou_thr: float=0.75, limit: Optional[int]=None) -> List[int]:
    if boxes.size == 0:
        return []
    x1 = boxes[:,0]; y1 = boxes[:,1]; x2 = boxes[:,2]; y2 = boxes[:,3]
    areas = (np.maximum(0, x2 - x1 + 1) * np.maximum(0, y2 - y1 + 1))
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if limit is not None and len(keep) >= limit:
            break
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        inds = np.where(ovr <= iou_thr)[0]
        order = order[inds + 1]
    return keep

@torch.no_grad()
def two_stage_supervised_submission(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                   cls_ckpt='artifacts/cls_convnext_small/best.pt',
                                   sizes=(832,960,1024),
                                   conf=0.08,
                                   iou=0.65,
                                   union_iou=0.75,
                                   max_det=4000,
                                   pad_frac=0.30,
                                   crop_size=224,
                                   prob_floor=0.45,
                                   dyn_pct=85,
                                   dyn_clamp=(0.46, 0.58),
                                   ambiguity_delta=0.02,
                                   tiny_filter=5,
                                   save_name='submission_v9.csv',
                                   device_cls='cuda',
                                   predict_half=True,
                                   det_device=0):
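    # Note: the CUDA caching allocator reads this setting when it is first initialized, so it only
    # takes effect if no earlier cell in this kernel has already used the GPU.
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = os.environ.get('PYTORCH_CUDA_ALLOC_CONF','expandable_segments:True')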
    id2u_local = json.loads(Path('artifacts/id2u.json').read_text())
    model = load_cls_model(cls_ckpt, nc=len(id2u_local), device=device_cls)
    det = YOLO(det_weights)
    ss = pd.read_csv('sample_submission.csv')
    image_ids = ss['image_id'].tolist()
    img_paths = []
    for img_id in image_ids:
        p = Path('test_images')/f'{img_id}.jpg'
        if not p.exists():
            alts = list(Path('test_images').glob(f'{img_id}.*'))
            if alts: p = alts[0]
        img_paths.append(str(p))
    # run detection at multiple sizes
    def detect_at_size(s:int):
        return det.predict(source=img_paths, imgsz=int(s), conf=float(conf), iou=float(iou), max_det=int(max_det), augment=True, device=det_device, stream=True, verbose=False, batch=1, half=predict_half)
    print('Running detector at sizes:', sizes, flush=True)
    res_all = [list(detect_at_size(s)) for s in sizes]
    rows = []; t0=time.time()
    for i, items in enumerate(zip(image_ids, img_paths, *res_all)):
        img_id = items[0]; img_path = items[1]; rlist = items[2:]
        if i % 25 == 0:
            print(f'Infer {i}/{len(image_ids)} elapsed {time.time()-t0:.1f}s', flush=True)
        labels_out = []
        with Image.open(img_path) as im:
            W,H = im.size
            # union NMS
            all_xyxy = []; all_scores = []
            for r in rlist:
                if r is not None and hasattr(r,'boxes') and r.boxes is not None and len(r.boxes)>0:
                    b = r.boxes
                    xyxy = b.xyxy.cpu().numpy()
                    confs = b.conf.cpu().numpy() if getattr(b,'conf',None) is not None else np.ones((xyxy.shape[0],), dtype=np.float32)
                    if xyxy.size:
                        all_xyxy.append(xyxy); all_scores.append(confs)
            if all_xyxy:
                xyxy_u = np.concatenate(all_xyxy, 0).astype(np.float32)
                scores_u = np.concatenate(all_scores, 0).astype(np.float32)
                keep = _nms_iou_xyxy(xyxy_u, scores_u, iou_thr=float(union_iou), limit=int(max_det))
                xyxy_u = xyxy_u[keep]; scores_u = scores_u[keep]
                # build crops
                crops = []; centers = []; ws = []; hs = []
                for (x1,y1,x2,y2) in xyxy_u:
                    x = int(round(x1)); y = int(round(y1)); w = int(round(x2-x1)); h = int(round(y2-y1))
                    if w < tiny_filter or h < tiny_filter:
                        continue
                    img3 = preprocess_crop_for_cls(im, x,y,w,h, pad_frac=pad_frac, out_size=crop_size)
                    crops.append(img3)
                    cx = int(round((x1+x2)/2.0)); cy = int(round((y1+y2)/2.0))
                    cx = max(0, min(cx, W-1)); cy = max(0, min(cy, H-1))
                    centers.append((cx,cy)); ws.append(w); hs.append(h)
                if crops:
                    probs = classify_crops_batch(model, crops, device=device_cls, batch_size=512)  # (N, C)
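                    # Top-1 probability per crop. (np.partition leaves the two smallest entries unordered,
                    # so taking column 0 of the partitioned array could return the runner-up instead.)
                    top1_prob = probs.max(axis=1)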
                    # dynamic threshold
                    if len(top1_prob) > 0:
                        t_img = float(np.percentile(top1_prob, dyn_pct))
                        t_img = float(np.clip(t_img, dyn_clamp[0], dyn_clamp[1]))
                        thr = max(prob_floor, t_img)
                    else:
                        thr = prob_floor
                    preds = []  # (u, cx, cy, p, w, h, gap)
                    for j in range(probs.shape[0]):
                        p = probs[j]
                        k1 = int(p.argmax()); v1 = float(p[k1])
                        # compute top2
                        k2 = int(np.argpartition(p, -2)[-2]) if p.shape[0] >= 2 else k1
                        v2 = float(p[k2]) if p.shape[0] >= 2 else 0.0
                        gap = v1 - v2
                        local_thr = thr + (ambiguity_delta if gap < 0.04 else 0.0)
                        if v1 >= local_thr:
                            u = id2u_local[str(k1)] if isinstance(id2u_local, dict) else id2u_local[k1]
                            cx,cy = centers[j]
                            preds.append((u, cx, cy, v1, ws[j], hs[j], gap))
                    # unicode-aware size/grid dedup
                    if preds:
                        kept = {}  # key: (u, gx, gy) -> best (by prob)
                        for (u, cx, cy, p1, w, h, gap) in preds:
                            cell_size = max(7, int(max(1, min(w,h))//4))
                            gx = cx // cell_size; gy = cy // cell_size
                            key = (u, gx, gy)
                            if key not in kept or p1 > kept[key][3]:
                                kept[key] = (u, cx, cy, p1)
                        for u, cx, cy, p1 in kept.values():
                            labels_out.extend([u, str(cx), str(cy)])
        rows.append({'image_id': img_id, 'labels': ' '.join(labels_out)})
    sub = pd.DataFrame(rows)
    sub.to_csv(save_name, index=False)
    print('Saved', save_name, 'shape', sub.shape, flush=True)
    return sub

print('v9 supervised inference utilities ready: two_stage_supervised_submission(...)')

v9 supervised inference utilities ready: two_stage_supervised_submission(...)
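
Note on the dedup step: predictions are bucketed by `(unicode, cx // cell, cy // cell)` with `cell = max(7, min(w, h) // 4)`, and only the highest-probability prediction per bucket survives. A self-contained sketch of that logic follows; `grid_dedup` and the sample tuples are illustrative and not taken from the notebook's data.

```python
from typing import Dict, List, Tuple

Pred = Tuple[str, int, int, float, int, int]  # (unicode, cx, cy, prob, w, h)


def grid_dedup(preds: List[Pred]) -> List[Tuple[str, int, int, float]]:
    """Keep the best-scoring prediction per (unicode, grid cell)."""
    kept: Dict[Tuple[str, int, int], Tuple[str, int, int, float]] = {}
    for u, cx, cy, p, w, h in preds:
        # Cell size scales with the box, with a 7 px lower bound.
        cell = max(7, max(1, min(w, h)) // 4)
        key = (u, cx // cell, cy // cell)
        if key not in kept or p > kept[key][3]:
            kept[key] = (u, cx, cy, p)
    return list(kept.values())


same_cell = [
    ("U+3042", 100, 100, 0.9, 40, 40),
    ("U+3042", 103, 104, 0.7, 40, 40),  # same glyph, same cell: dropped
    ("U+3044", 101, 101, 0.6, 40, 40),  # different glyph: kept
]
print(grid_dedup(same_cell))

# Centers one pixel apart can still land in different cells.
straddle = [
    ("U+3042", 99, 100, 0.9, 40, 40),
    ("U+3042", 100, 100, 0.8, 40, 40),
]
print(len(grid_dedup(straddle)))
```

The second case shows the known weakness of fixed grid bucketing: near-duplicates that straddle a cell boundary both survive. A distance-based merge would close that gap at some extra cost, if the duplicates turn out to matter on validation.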


In [7]:
# Driver: v9 supervised two-stage inference to produce submission_v9.csv and submission.csv
import shutil, pandas as pd
print('Running v9 two-stage supervised inference ...', flush=True)
sub = two_stage_supervised_submission(det_weights='runs/yolo8n_kuz_single/weights/best.pt',
                                      cls_ckpt='artifacts/cls_convnext_small/best.pt',
                                      sizes=(832,960,1024),
                                      conf=0.08,
                                      iou=0.65,
                                      union_iou=0.75,
                                      max_det=4000,
                                      pad_frac=0.30,
                                      crop_size=224,
                                      prob_floor=0.45,
                                      dyn_pct=85,
                                      dyn_clamp=(0.46,0.58),
                                      ambiguity_delta=0.02,
                                      tiny_filter=5,
                                      save_name='submission_v9.csv',
                                      device_cls='cuda',
                                      predict_half=True,
                                      det_device=0)
print('Copying submission_v9.csv -> submission.csv for scoring ...', flush=True)
shutil.copy2('submission_v9.csv', 'submission.csv')
print(pd.read_csv('submission.csv').head())

Running v9 two-stage supervised inference ...


Running detector at sizes: (832, 960, 1024)


Infer 0/361 elapsed 0.0s


Infer 25/361 elapsed 101.1s


Infer 50/361 elapsed 193.4s


Infer 75/361 elapsed 277.5s


Infer 100/361 elapsed 363.0s


Infer 125/361 elapsed 466.2s


Infer 150/361 elapsed 534.6s


Infer 175/361 elapsed 612.6s


Infer 200/361 elapsed 666.8s


Infer 225/361 elapsed 745.8s


Infer 250/361 elapsed 804.4s


Infer 275/361 elapsed 861.1s


Infer 300/361 elapsed 893.7s


Infer 325/361 elapsed 945.6s


Infer 350/361 elapsed 999.1s


Saved submission_v9.csv shape (361, 2)


Copying submission_v9.csv -> submission.csv for scoring ...


            image_id                                             labels
0        umgy007-028  U+69D8 910 1691 U+904A 528 2362 U+4F4D 715 214...
1        hnsd004-026  U+6253 687 2506 U+79CB 892 1301 U+679C 309 475...
2  200003076_00034_2  U+91D1 1937 1577 U+6E05 1929 2335 U+6210 1943 ...
3        brsk001-014  U+64AD 1990 816 U+25CB 955 1016 U+76E7 602 289...
4  200014685-00003_2  U+5AC1 936 2303 U+4E39 1182 1482 U+4ED6 702 19...
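
Before submitting, it can help to confirm that every `labels` string decomposes into `unicode x y` triples, which is the layout the inference code writes via `labels_out.extend([u, str(cx), str(cy)])`. The checker below is a minimal sketch; `parse_labels` is a hypothetical helper, and treating non-string values (such as the NaN that pandas reads for an empty cell) as "no predictions" is an assumption.

```python
from typing import List, Tuple


def parse_labels(labels) -> List[Tuple[str, int, int]]:
    """Split a submission `labels` string into (unicode, x, y) triples."""
    # Empty cells come back from pandas as NaN (a float), not as "".
    if not isinstance(labels, str) or not labels.strip():
        return []
    toks = labels.split()
    if len(toks) % 3 != 0:
        raise ValueError(f"expected triples, got {len(toks)} tokens")
    out = []
    for i in range(0, len(toks), 3):
        u = toks[i]
        if not u.startswith("U+"):
            raise ValueError(f"bad unicode token: {u!r}")
        out.append((u, int(toks[i + 1]), int(toks[i + 2])))
    return out


# The first two triples of the first row printed above.
print(parse_labels("U+69D8 910 1691 U+904A 528 2362"))
```

Running this over `pd.read_csv('submission.csv')['labels']` would raise on the first malformed row, which is cheaper than discovering a formatting problem from a failed submission.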
