# iMet 2020 FGVC7: Plan

Objectives:
- Get a strong baseline quickly; iterate to medal.
- Maintain rigorous CV and logging; avoid long blind runs.

Initial Baseline (Phase 1):
- Environment check: confirm GPU and correct torch stack.
- Data sanity check: train.csv, labels.csv, sample_submission.csv, image paths.
- CV: 5-fold Multilabel Stratified KFold (iterative stratification).
- Model: timm pretrained CNN (e.g., tf_efficientnet_b3_ns or nfnet_l0 if VRAM allows), multilabel BCEWithLogitsLoss.
- Image size 384→512 (start 384 for speed), AMP + gradient accumulation if needed.
- Augmentations: A.Resize->A.RandomResizedCrop(384), HFlip, ColorJitter(soft), Cutout optional; Normalize as timm pretrained.
- Optimizer: AdamW, cosine schedule with warmup. Early stopping on F1.
- Thresholding: one global threshold per fold, chosen by a sweep that maximizes micro-F1 on that fold's out-of-fold (OOF) predictions; save the per-fold thresholds.
- Artifacts: save OOF logits, test logits, model weights per fold. Log times/folds.
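
The threshold sweep in the list above could look roughly like the sketch below. It is an illustration under assumptions, not the pipeline's actual code: the function names, the 0.05 to 0.95 grid, and the toy data are all made up here, and NumPy is assumed to be available.

```python
import numpy as np

def micro_f1(pred: np.ndarray, target: np.ndarray) -> float:
    """Micro-averaged F1 over all (sample, label) pairs of two 0/1 arrays."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def sweep_global_threshold(logits, targets, grid=None):
    """Return (best_threshold, best_f1) for one threshold shared by all labels."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # illustrative grid, not from the plan
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid, since the model outputs logits
    scores = [micro_f1(probs >= t, targets) for t in grid]
    best = int(np.argmax(scores))           # first grid point with the top score
    return float(grid[best]), scores[best]

# Toy check on synthetic data: positives get logit +2, negatives -2.
rng = np.random.default_rng(0)
toy_targets = (rng.random((50, 8)) < 0.2).astype(np.uint8)
toy_logits = np.where(toy_targets == 1, 2.0, -2.0)
best_t, best_f1 = sweep_global_threshold(toy_logits, toy_targets)
print('best threshold:', best_t, 'micro-F1:', best_f1)
```

In a real run, `logits` and `targets` would be the saved OOF logits and the matching label matrix for the validation rows of one fold.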

Improvements (Phase 2):
- Larger resolution (512), stronger augmentation (Mixup/CutMix, applied carefully for multilabel targets), EMA, label smoothing.
- Try different backbones (Swin-T/S, ConvNeXt-T, EfficientNetV2-S/B3).
- TTA (hflip + minor scale).
- Blend diverse seeds/backbones via logit averaging and re-threshold using OOF.
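
A minimal sketch of the logit-averaging blend from the last bullet, assuming every model's logits are stored as arrays of the same shape (samples x labels) and in the same row order. The function name, the optional weights, and the toy arrays are hypothetical.

```python
import numpy as np

def blend_logits(logit_list, weights=None):
    """Weighted mean of same-shaped logit arrays (equal weights by default)."""
    stacked = np.stack([np.asarray(l, dtype=np.float64) for l in logit_list])
    if weights is None:
        weights = np.ones(len(logit_list))
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()              # normalize so weights sum to 1
    return np.tensordot(weights, stacked, axes=1)  # contract over the model axis

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical OOF logits from two models for 2 samples x 3 labels.
model_a = np.array([[2.0, -1.0, 0.0], [-3.0, 1.0, 4.0]])
model_b = np.array([[0.0, -3.0, 2.0], [-1.0, 3.0, 0.0]])
blended = blend_logits([model_a, model_b])
probs = sigmoid(blended)
print(blended)
```

Averaging changes the scale of the scores, so a threshold tuned on a single model may no longer be the best one. That is why the bullet calls for re-thresholding on the blended OOF logits before applying the blend to test logits.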

Validation Discipline:
- One fixed fold assignment (train_folds.csv), saved once and reused across all runs.
- Avoid leakage: fit transforms inside folds only; no peeking.
- Track micro-F1 OOF; expect strong baseline ~0.60–0.63 at 384, improve to ≥0.65 with 512/backbone/ensemble.
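
For reference, micro-F1 pools the counts over every (image, label) pair before computing F1, so frequent labels dominate the score. The formula below is the standard definition. The notation is introduced here for illustration: $y_{ic}$ is the 0/1 target, $\hat{y}_{ic}$ the thresholded 0/1 prediction, $N$ the number of images, and $C$ the number of labels (3474 in labels.csv).

```latex
\mathrm{TP} = \sum_{i=1}^{N}\sum_{c=1}^{C} \hat{y}_{ic}\, y_{ic}, \qquad
\mathrm{FP} = \sum_{i=1}^{N}\sum_{c=1}^{C} \hat{y}_{ic}\,(1 - y_{ic}), \qquad
\mathrm{FN} = \sum_{i=1}^{N}\sum_{c=1}^{C} (1 - \hat{y}_{ic})\, y_{ic}

\mathrm{F1}_{\text{micro}} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
```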

Execution Plan:
1) Env check (GPU, torch install).
2) Data EDA: counts, classes, label freq, basic sanity.
3) Implement training pipeline script (train.py) with cfg and logging.
4) Smoke test on 1 fold, 1000 images, few epochs to verify.
5) Full 5-fold at 384. Save OOF/test logits.
6) Threshold sweep, create submission. Request expert review.
7) Iterate with improved backbone/resolution/ensembles until CV reaches medal range.
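
The training pipeline in step 3 needs the cosine-with-warmup schedule named in Phase 1. One possible shape for it is sketched below as a plain function of the step index. The function name, the linear warmup form, and all numeric values are assumptions for illustration. In practice a scheduler from torch or timm would likely be used instead.

```python
import math

def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup up to base_lr, then cosine decay down to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly; the last warmup step reaches base_lr exactly.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Illustrative values only: 1000 optimizer steps, 100 of them warmup.
schedule = [lr_at_step(s, total_steps=1000, warmup_steps=100, base_lr=3e-4)
            for s in range(1001)]
print(schedule[0], schedule[99], schedule[550], schedule[1000])
```

Logging the learning rate once per epoch during the smoke test in step 4 is a cheap way to confirm that the schedule behaves as intended before the full 5-fold run.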

We will solicit expert advice at each major milestone.

In [2]:
import subprocess
import pandas as pd
from pathlib import Path

print('=== Environment Check ===', flush=True)
try:
    out = subprocess.run(['bash','-lc','nvidia-smi || true'], capture_output=True, text=True, check=False)
    print(out.stdout)
except Exception as e:
    print('nvidia-smi failed:', e)

try:
    import torch
    print('torch:', torch.__version__, 'CUDA avail:', torch.cuda.is_available())
    if torch.cuda.is_available():
        print('GPU name:', torch.cuda.get_device_name(0))
except Exception as e:
    print('torch not available yet:', e)

print('=== Data Sanity ===', flush=True)
base = Path('.')
train_dir = base/'train'
test_dir = base/'test'

# Count any files (any extension) to avoid extension mismatch issues
train_files = [p.name for p in train_dir.iterdir() if p.is_file()]
test_files = [p.name for p in test_dir.iterdir() if p.is_file()]
print('train files:', len(train_files))
print('test files:', len(test_files))
print('sample train files:', train_files[:5])
print('sample test files:', test_files[:5])

train_csv = pd.read_csv(base/'train.csv')
labels_csv = pd.read_csv(base/'labels.csv')
sub_csv = pd.read_csv(base/'sample_submission.csv')
print('train.csv shape:', train_csv.shape)
print('labels.csv shape:', labels_csv.shape)
print('sample_submission.csv shape:', sub_csv.shape)
print('train.csv head:\n', train_csv.head(3))
print('labels.csv head:\n', labels_csv.head(3))

# Determine extension by probing first few ids
def find_ext_for_id(img_id: str, roots):
    for ext in ('.jpg', '.jpeg', '.png', '.webp', '.bmp'):
        for root in roots:
            p = root/f'{img_id}{ext}'
            if p.exists():
                return ext
    # fallback: scan by prefix
    for root in roots:
        cands = list(root.glob(f'{img_id}.*'))
        if cands:
            return cands[0].suffix
    return None

train_id_col = 'id' if 'id' in train_csv.columns else ('image_id' if 'image_id' in train_csv.columns else None)
test_id_col = 'id' if 'id' in sub_csv.columns else ('image_id' if 'image_id' in sub_csv.columns else None)
print('Detected id columns -> train:', train_id_col, ' test:', test_id_col)

probe_ids = list(train_csv[train_id_col].head(5)) if train_id_col else []
probe_exts = {pid: find_ext_for_id(pid, [train_dir]) for pid in probe_ids}
print('Probe extensions:', probe_exts)
default_ext = None
vals = [e for e in probe_exts.values() if e]
if vals:
    default_ext = max(set(vals), key=vals.count)
print('Chosen default ext:', default_ext)

# Basic existence checks using detected extension or prefix matching.
# Reuse the file-name lists collected above instead of rescanning the directory per id.
def count_missing(ids, names, ext):
    name_set = set(names)
    missing = 0
    for img in ids:
        if ext:
            exists = f'{img}{ext}' in name_set
        else:
            exists = any(n.startswith(str(img)) for n in names)
        if not exists:
            missing += 1
    return missing

missing_train = count_missing(train_csv[train_id_col].head(1000), train_files, default_ext)
print('Missing among first 1000 train ids:', missing_train)

missing_test = count_missing(sub_csv[test_id_col].head(1000), test_files, default_ext)
print('Missing among first 1000 test ids:', missing_test)

print('Unique labels in labels.csv:', labels_csv['attribute_id'].nunique() if 'attribute_id' in labels_csv.columns else 'N/A')
print('Done.')

=== Environment Check ===


Sat Sep 27 16:44:14 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     182MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

train files: 120801
test files: 21318
sample train files: ['f4e684acbb6f2b33b45a16e586f87369.png', '727510fa873bed3a7a5b9902567b0d9f.png', 'e6b8484abfef0045ce9e4577f2ffd9e8.png', '8a074a477f1ccbd865151a9866419940.png', 'b22aa8832224499fdff061348f07bed2.png']
sample test files: ['4e0ba2b09affaf8525695752214b1dc4.png', 'ac5b7b0322f8c2ef4035b394956da403.png', 'df1d4aa72346aacaa4d4d1cc880fcc78.png', '878e96135e73845501059fb1d022d459.png', 'b007aba41e9d5a4c2df63fa452bf1640.png']
train.csv shape: (120801, 2)
labels.csv shape: (3474, 2)
sample_submission.csv shape: (21318, 2)
train.csv head:
                                  id                attribute_ids
0  4d0f6eada4ccb283551bc2f75e2ba588  3077 3187 3418 448 1625 782
1  75a9baea36b82e81263716fac427e416        2802 287 370 1419 784
2  cc7cbf14ef9e9261508ba27f9d2f4f28                      922 785
labels.csv head:
    attribute_id        attribute_name
0             0  country::afghanistan
1             1     country::alamania
2             2

In [3]:
import os, sys, subprocess, shutil
from pathlib import Path

def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

print('=== Install CUDA 12.1 torch stack and deps ===', flush=True)
# Uninstall any existing torch stack
for pkg in ('torch','torchvision','torchaudio'):
    try:
        subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', pkg], check=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    except Exception as e:
        print('uninstall error:', pkg, e)

# Clean potential stray site dirs that can shadow correct wheels
for d in (
    '/app/.pip-target/torch',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torch-2.8.0.dist-info',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torchvision-0.23.0.dist-info',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchaudio-2.8.0.dist-info',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
    '/app/.pip-target/torchgen',
    '/app/.pip-target/functorch',
):
    if os.path.exists(d):
        print('Removing', d)
        shutil.rmtree(d, ignore_errors=True)

# Install exact cu121 torch stack
pip('install',
    '--index-url', 'https://download.pytorch.org/whl/cu121',
    '--extra-index-url', 'https://pypi.org/simple',
    'torch==2.4.1', 'torchvision==0.19.1', 'torchaudio==2.4.1')

# Freeze versions for later installs
Path('constraints.txt').write_text('torch==2.4.1\ntorchvision==0.19.1\ntorchaudio==2.4.1\n')

# Core deps
pip('install', '-c', 'constraints.txt',
    'timm==1.0.9',
    'albumentations==1.4.14',
    'scikit-learn==1.5.2',
    'iterative-stratification==0.1.7',
    'opencv-python-headless==4.10.0.84',
    'pandas',
    '--upgrade-strategy', 'only-if-needed')

import torch
cuda_build = getattr(torch.version, 'cuda', None)
print('torch:', torch.__version__, 'CUDA build:', cuda_build)
print('CUDA available:', torch.cuda.is_available())
# No permissive default: a missing or CPU-only build (None) must fail this check
assert str(cuda_build).startswith('12.1'), f'Wrong CUDA build: {cuda_build}'
assert torch.cuda.is_available(), 'CUDA not available'
print('GPU:', torch.cuda.get_device_name(0))
print('Done installing.')

=== Install CUDA 12.1 torch stack and deps ===


> pip install --index-url https://download.pytorch.org/whl/cu121 --extra-index-url https://pypi.org/simple torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1


Looking in indexes: https://download.pytorch.org/whl/cu121, https://pypi.org/simple


Collecting torch==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torch-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (799.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 799.0/799.0 MB 392.8 MB/s eta 0:00:00


Collecting torchvision==0.19.1
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.19.1%2Bcu121-cp311-cp311-linux_x86_64.whl (7.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 382.9 MB/s eta 0:00:00


Collecting torchaudio==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 233.0 MB/s eta 0:00:00


Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 128.0 MB/s eta 0:00:00


Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 273.7 MB/s eta 0:00:00


Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 269.7 MB/s eta 0:00:00


Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 287.3 MB/s eta 0:00:00


Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 243.6 MB/s eta 0:00:00
Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 238.7 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 316.4 MB/s eta 0:00:00


Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 258.0 MB/s eta 0:00:00


Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 246.8 MB/s eta 0:00:00


Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 345.3 MB/s eta 0:00:00


Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 471.8 MB/s eta 0:00:00


Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 287.5 MB/s eta 0:00:00


Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 461.0 MB/s eta 0:00:00


Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 446.7 MB/s eta 0:00:00


Collecting fsspec
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 451.3 MB/s eta 0:00:00


Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 500.3 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 257.4 MB/s eta 0:00:00


Collecting pillow!=8.3.*,>=5.3.0
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 261.5 MB/s eta 0:00:00


Collecting numpy
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 244.7 MB/s eta 0:00:00


Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 194.5 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-3.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB)


Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 530.9 MB/s eta 0:00:00


Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio


Successfully installed MarkupSafe-3.0.2 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.3.0 sympy-1.14.0 torch-2.4.1+cu121 torchaudio-2.4.1+cu121 torchvision-0.19.1+cu121 triton-3.0.0 typing-extensions-4.15.0


> pip install -c constraints.txt timm==1.0.9 albumentations==1.4.14 scikit-learn==1.5.2 iterative-stratification==0.1.7 opencv-python-headless==4.10.0.84 pandas --upgrade-strategy only-if-needed


Collecting timm==1.0.9
  Downloading timm-1.0.9-py3-none-any.whl (2.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 67.3 MB/s eta 0:00:00
Collecting albumentations==1.4.14
  Downloading albumentations-1.4.14-py3-none-any.whl (177 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.0/178.0 KB 283.1 MB/s eta 0:00:00


Collecting scikit-learn==1.5.2
  Downloading scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.3/13.3 MB 205.3 MB/s eta 0:00:00
Collecting iterative-stratification==0.1.7
  Downloading iterative_stratification-0.1.7-py3-none-any.whl (8.5 kB)


Collecting opencv-python-headless==4.10.0.84
  Downloading opencv_python_headless-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.9/49.9 MB 213.3 MB/s eta 0:00:00


Collecting pandas
  Downloading pandas-2.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.4/12.4 MB 211.6 MB/s eta 0:00:00
Collecting pyyaml
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.6/806.6 KB 521.1 MB/s eta 0:00:00


Collecting torch
  Downloading torch-2.4.1-cp311-cp311-manylinux1_x86_64.whl (797.1 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 797.1/797.1 MB 316.9 MB/s eta 0:00:00


Collecting huggingface_hub
  Downloading huggingface_hub-0.35.1-py3-none-any.whl (563 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 563.3/563.3 KB 508.8 MB/s eta 0:00:00
Collecting torchvision
  Downloading torchvision-0.19.1-cp311-cp311-manylinux1_x86_64.whl (7.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 386.3 MB/s eta 0:00:00


Collecting safetensors
  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 485.8/485.8 KB 270.9 MB/s eta 0:00:00
Collecting typing-extensions>=4.9.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 398.9 MB/s eta 0:00:00


Collecting numpy>=1.24.4
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 241.4 MB/s eta 0:00:00
Collecting eval-type-backport
  Downloading eval_type_backport-0.2.2-py3-none-any.whl (5.8 kB)


Collecting pydantic>=2.7.0
  Downloading pydantic-2.11.9-py3-none-any.whl (444 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 444.9/444.9 KB 494.3 MB/s eta 0:00:00
Collecting albucore>=0.0.13
  Downloading albucore-0.0.33-py3-none-any.whl (18 kB)


Collecting scipy>=1.10.0
  Downloading scipy-1.16.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (35.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.9/35.9 MB 264.4 MB/s eta 0:00:00


Collecting scikit-image>=0.21.0
  Downloading scikit_image-0.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.8/14.8 MB 250.2 MB/s eta 0:00:00
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 308.4/308.4 KB 466.9 MB/s eta 0:00:00


Collecting python-dateutil>=2.8.2
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 KB 454.3 MB/s eta 0:00:00
Collecting tzdata>=2022.7
  Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 347.8/347.8 KB 473.5 MB/s eta 0:00:00
Collecting pytz>=2020.1
  Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 509.2/509.2 KB 483.6 MB/s eta 0:00:00


Collecting simsimd>=5.9.2
  Downloading simsimd-6.5.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 206.5 MB/s eta 0:00:00


Collecting stringzilla>=3.10.4
  Downloading stringzilla-4.0.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (496 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 496.5/496.5 KB 355.6 MB/s eta 0:00:00
Collecting typing-inspection>=0.4.0
  Downloading typing_inspection-0.4.1-py3-none-any.whl (14 kB)
Collecting annotated-types>=0.6.0
  Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)


Collecting pydantic-core==2.33.2
  Downloading pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 517.7 MB/s eta 0:00:00
Collecting six>=1.5
  Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)
Collecting imageio!=2.35.0,>=2.33
  Downloading imageio-2.37.0-py3-none-any.whl (315 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.8/315.8 KB 493.6 MB/s eta 0:00:00
Collecting lazy-loader>=0.4
  Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)


Collecting pillow>=10.1
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 285.9 MB/s eta 0:00:00
Collecting tifffile>=2022.8.12
  Downloading tifffile-2025.9.20-py3-none-any.whl (230 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 230.1/230.1 KB 458.3 MB/s eta 0:00:00
Collecting packaging>=21
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 412.7 MB/s eta 0:00:00
Collecting networkx>=3.0
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 519.0 MB/s eta 0:00:00


Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting fsspec>=2023.5.0
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 477.1 MB/s eta 0:00:00
Collecting hf-xet<2.0.0,>=1.1.3
  Downloading hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 527.4 MB/s eta 0:00:00
Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.7/64.7 KB 385.1 MB/s eta 0:00:00
Collecting tqdm>=4.42.1
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 KB 415.3 MB/s eta 0:00:00


Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 246.5 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 227.2 MB/s eta 0:00:00


Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 460.9 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 277.7 MB/s eta 0:00:00
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 397.6 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 223.8 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 230.6 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 307.0 MB/s eta 0:00:00


Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 489.7 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 258.4 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 229.7 MB/s eta 0:00:00


Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 525.8 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 280.7 MB/s eta 0:00:00
Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 287.1 MB/s eta 0:00:00
Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 161.0 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-3.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB)
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.8/129.8 KB 428.3 MB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Downloading certifi-2025.8.3-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 449.3 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Downloading idna-3.10-py3-none-any.whl (70 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 KB 426.6 MB/s eta 0:00:00


Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (150 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.3/150.3 KB 483.6 MB/s eta 0:00:00
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 428.5 MB/s eta 0:00:00


Installing collected packages: simsimd, pytz, mpmath, urllib3, tzdata, typing-extensions, tqdm, threadpoolctl, sympy, stringzilla, six, safetensors, pyyaml, pillow, packaging, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, joblib, idna, hf-xet, fsspec, filelock, eval-type-backport, charset_normalizer, certifi, annotated-types, typing-inspection, triton, tifffile, scipy, requests, python-dateutil, pydantic-core, opencv-python-headless, nvidia-cusparse-cu12, nvidia-cudnn-cu12, lazy-loader, jinja2, imageio, scikit-learn, scikit-image, pydantic, pandas, nvidia-cusolver-cu12, huggingface_hub, albucore, torch, iterative-stratification, albumentations, torchvision, timm


Successfully installed MarkupSafe-3.0.2 albucore-0.0.33 albumentations-1.4.14 annotated-types-0.7.0 certifi-2025.8.3 charset_normalizer-3.4.3 eval-type-backport-0.2.2 filelock-3.19.1 fsspec-2025.9.0 hf-xet-1.1.10 huggingface_hub-0.35.1 idna-3.10 imageio-2.37.0 iterative-stratification-0.1.7 jinja2-3.1.6 joblib-1.5.2 lazy-loader-0.4 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 opencv-python-headless-4.10.0.84 packaging-25.0 pandas-2.3.2 pillow-11.3.0 pydantic-2.11.9 pydantic-core-2.33.2 python-dateutil-2.9.0.post0 pytz-2025.2 pyyaml-6.0.3 requests-2.32.5 safetensors-0.6.2 scikit-image-0.25.2 scikit-learn-1.5.2 scipy-1.16.2 simsimd-6.5.3 six-1.1

torch: 2.4.1+cu121 CUDA build: 12.1
CUDA available: True
GPU: NVIDIA A10-24Q
Done installing.


In [4]:
# Create and save 5-fold Multilabel Stratified CV splits
import time, numpy as np, pandas as pd
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

t0 = time.time()
train_df = pd.read_csv('train.csv')
labels_df = pd.read_csv('labels.csv')
num_labels = labels_df['attribute_id'].nunique()
print('Num samples:', len(train_df), 'Num labels:', num_labels, flush=True)

# Parse attribute_ids -> list of ints
attrs = train_df['attribute_ids'].fillna('').apply(lambda s: [int(x) for x in str(s).split() if x!=''])
lens = attrs.apply(len)
print('Label cardinality: mean', lens.mean(), 'median', lens.median(), 'max', lens.max(), flush=True)

# Build dense 0/1 indicator matrix (uint8) for stratification
y = np.zeros((len(train_df), num_labels), dtype=np.uint8)
t1 = time.time()
for i, lab_list in enumerate(attrs):
    if lab_list:
        y[i, lab_list] = 1
    if (i+1) % 20000 == 0:
        print(f'..filled {i+1}/{len(train_df)} rows in {time.time()-t1:.1f}s', flush=True)

skf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = np.full(len(train_df), -1, dtype=np.int16)
for fold, (_, val_idx) in enumerate(skf.split(train_df.index.values, y)):
    folds[val_idx] = fold
    print(f'Assigned fold {fold}: {len(val_idx)} samples', flush=True)
assert (folds >= 0).all()

train_folds = train_df.copy()
train_folds['fold'] = folds
train_folds.to_csv('train_folds.csv', index=False)
print('Saved train_folds.csv. Time:', round(time.time()-t0,1), 's', flush=True)

Num samples: 120801 Num labels: 3474


Label cardinality: mean 4.421097507470964 median 4.0 max 26


..filled 20000/120801 rows in 0.0s


..filled 40000/120801 rows in 0.1s


..filled 60000/120801 rows in 0.1s


..filled 80000/120801 rows in 0.1s


..filled 100000/120801 rows in 0.2s


..filled 120000/120801 rows in 0.2s


KeyboardInterrupt: 

In [5]:
import sys, subprocess
def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

print('Fixing albumentations/albucore mismatch by downgrading albumentations to 1.3.1', flush=True)
pip('install', '-c', 'constraints.txt', 'albumentations==1.3.1', '--upgrade-strategy', 'only-if-needed')
import albumentations as A
print('albumentations version:', A.__version__)
from albumentations.pytorch import ToTensorV2
print('Albumentations import OK')

Fixing albumentations/albucore mismatch by downgrading albumentations to 1.3.1


> pip install -c constraints.txt albumentations==1.3.1 --upgrade-strategy only-if-needed


Collecting albumentations==1.3.1
  Downloading albumentations-1.3.1-py3-none-any.whl (125 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.7/125.7 KB 6.2 MB/s eta 0:00:00


Collecting scipy>=1.1.0
  Downloading scipy-1.16.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (35.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.9/35.9 MB 208.8 MB/s eta 0:00:00


Collecting opencv-python-headless>=4.1.1
  Downloading opencv_python_headless-4.12.0.88-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (54.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.0/54.0 MB 218.1 MB/s eta 0:00:00
Collecting scikit-image>=0.16.1
  Downloading scikit_image-0.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.8/14.8 MB 246.7 MB/s eta 0:00:00


Collecting PyYAML
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.6/806.6 KB 545.1 MB/s eta 0:00:00


Collecting numpy>=1.11.1
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 187.6 MB/s eta 0:00:00


Collecting qudida>=0.0.4
  Downloading qudida-0.0.4-py3-none-any.whl (3.5 kB)
Collecting opencv-python-headless>=4.1.1
  Downloading opencv_python_headless-4.11.0.86-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (50.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.0/50.0 MB 269.0 MB/s eta 0:00:00


Collecting scikit-learn>=0.19.1
  Downloading scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (9.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.7/9.7 MB 292.7 MB/s eta 0:00:00
Collecting typing-extensions
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 383.1 MB/s eta 0:00:00
Collecting packaging>=21
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 421.1 MB/s eta 0:00:00
Collecting imageio!=2.35.0,>=2.33
  Downloading imageio-2.37.0-py3-none-any.whl (315 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.8/315.8 KB 528.8 MB/s eta 0:00:00


Collecting networkx>=3.0
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 561.8 MB/s eta 0:00:00


Collecting pillow>=10.1
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 253.2 MB/s eta 0:00:00
Collecting tifffile>=2022.8.12
  Downloading tifffile-2025.9.20-py3-none-any.whl (230 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 230.1/230.1 KB 498.5 MB/s eta 0:00:00
Collecting lazy-loader>=0.4
  Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)


Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 308.4/308.4 KB 516.2 MB/s eta 0:00:00


Installing collected packages: typing-extensions, threadpoolctl, PyYAML, pillow, packaging, numpy, networkx, joblib, tifffile, scipy, opencv-python-headless, lazy-loader, imageio, scikit-learn, scikit-image, qudida, albumentations


Successfully installed PyYAML-6.0.3 albumentations-1.3.1 imageio-2.37.0 joblib-1.5.2 lazy-loader-0.4 networkx-3.5 numpy-1.26.4 opencv-python-headless-4.11.0.86 packaging-25.0 pillow-11.3.0 qudida-0.0.4 scikit-image-0.25.2 scikit-learn-1.7.2 scipy-1.16.2 threadpoolctl-3.6.0 tifffile-2025.9.20 typing-extensions-4.15.0




ImportError: cannot import name 'preserve_channel_dim' from 'albucore.utils' (/app/.pip-target/albucore/utils.py)

In [13]:
import sys, subprocess, importlib, os
def run_pip(cmd):
    print('> pip', *cmd, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *cmd], check=True)

print('Hard-reset albumentations to 1.3.1 (remove albucore), forcing overwrite', flush=True)
run_pip(['uninstall', '-y', 'albumentations', 'albucore'])
run_pip(['install', '--no-cache-dir', 'albumentations==1.3.1'])
import albumentations as A
print('albumentations version:', A.__version__)
print('albumentations file:', A.__file__)
from albumentations.pytorch import ToTensorV2
print('Albumentations import OK')

Hard-reset albumentations to 1.3.1 (remove albucore), forcing overwrite


> pip uninstall -y albumentations albucore


Found existing installation: albumentations 1.3.1
Uninstalling albumentations-1.3.1:
  Successfully uninstalled albumentations-1.3.1
> pip install --no-cache-dir albumentations==1.3.1




Collecting albumentations==1.3.1
  Downloading albumentations-1.3.1-py3-none-any.whl (125 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.7/125.7 KB 5.6 MB/s eta 0:00:00


Collecting scipy>=1.1.0
  Downloading scipy-1.16.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (35.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.9/35.9 MB 169.0 MB/s eta 0:00:00
Collecting scikit-image>=0.16.1
  Downloading scikit_image-0.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.8/14.8 MB 240.5 MB/s eta 0:00:00
Collecting qudida>=0.0.4
  Downloading qudida-0.0.4-py3-none-any.whl (3.5 kB)
Collecting opencv-python-headless>=4.1.1


  Downloading opencv_python_headless-4.12.0.88-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (54.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.0/54.0 MB 191.8 MB/s eta 0:00:00


Collecting numpy>=1.11.1
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 319.6 MB/s eta 0:00:00
Collecting PyYAML
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.6/806.6 KB 524.7 MB/s eta 0:00:00
Collecting opencv-python-headless>=4.1.1
  Downloading opencv_python_headless-4.11.0.86-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (50.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.0/50.0 MB 176.5 MB/s eta 0:00:00


Collecting scikit-learn>=0.19.1
  Downloading scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (9.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.7/9.7 MB 208.1 MB/s eta 0:00:00
Collecting typing-extensions
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 388.7 MB/s eta 0:00:00


Collecting pillow>=10.1
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 540.1 MB/s eta 0:00:00
Collecting imageio!=2.35.0,>=2.33
  Downloading imageio-2.37.0-py3-none-any.whl (315 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.8/315.8 KB 511.4 MB/s eta 0:00:00
Collecting packaging>=21
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 426.5 MB/s eta 0:00:00
Collecting networkx>=3.0
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 542.1 MB/s eta 0:00:00
Collecting tifffile>=2022.8.12
  Downloading tifffile-2025.9.20-py3-none-any.whl (230 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 230.1/230.1 KB 497.9 MB/s eta 0:00:00
Collecting lazy-loader>=0.4
  Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)


Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 308.4/308.4 KB 529.1 MB/s eta 0:00:00


Installing collected packages: typing-extensions, threadpoolctl, PyYAML, pillow, packaging, numpy, networkx, joblib, tifffile, scipy, opencv-python-headless, lazy-loader, imageio, scikit-learn, scikit-image, qudida, albumentations


Successfully installed PyYAML-6.0.3 albumentations-1.3.1 imageio-2.37.0 joblib-1.5.2 lazy-loader-0.4 networkx-3.5 numpy-1.26.4 opencv-python-headless-4.11.0.86 packaging-25.0 pillow-11.3.0 qudida-0.0.4 scikit-image-0.25.2 scikit-learn-1.7.2 scipy-1.16.2 threadpoolctl-3.6.0 tifffile-2025.9.20 typing-extensions-4.15.0




albumentations version: 1.3.1
albumentations file: /app/.pip-target/albumentations/__init__.py
Albumentations import OK
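
The import error in the earlier cell was attributed to an albumentations/albucore mismatch. A minimal sketch for listing what is actually installed before attempting the import, using only the standard library. The helper name `installed_version` and the list of packages to inspect are illustrative choices, not part of the original notebook.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Package names below are an assumption about what is worth inspecting here.
for pkg in ("albumentations", "albucore", "opencv-python-headless", "numpy"):
    print(f"{pkg}: {installed_version(pkg)}")
```

If `albucore` still reports a version after the downgrade, a stale copy may remain on the import path, which would be one possible explanation for the first failure.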


In [15]:
import os, sys, subprocess, time, shlex
from pathlib import Path
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold

print('=== Smoke: precompute fast folds (cardinality) + tiny 1-fold run (no-pretrained, unbuffered) ===', flush=True)
# 1) Fast folds via label-cardinality bins to avoid slow MSKF in subprocess
folds_path = Path('train_folds_smoke.csv')
if folds_path.exists():
    folds_path.unlink()
    print('Deleted existing train_folds_smoke.csv')
train_df = pd.read_csv('train.csv')
attrs = train_df['attribute_ids'].fillna('').astype(str).apply(lambda s: [int(x) for x in s.split() if x!=''])
card = attrs.apply(len).values
bins = np.clip(card, 0, 8)  # labels per image, capped at 8 so the sparse high-cardinality tail shares one stratum
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = np.full(len(train_df), -1, dtype=np.int16)
for f, (_, vidx) in enumerate(skf.split(np.zeros(len(train_df)), bins)):
    folds[vidx] = f
train_df2 = train_df.copy()
train_df2['fold'] = folds
train_df2.to_csv(folds_path, index=False)
print('Wrote fast folds to train_folds_smoke.csv (cardinality SKFold)', flush=True)

# 2) Run a tiny smoke of train.py using these folds
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'tf_efficientnet_b3_ns',
    '--img-size', '224',
    '--epochs', '1',
    '--batch-size', '64',
    '--val-batch-size', '96',
    '--num-workers', '4',
    '--folds', '0',
    '--folds-csv', 'train_folds_smoke.csv',
    '--out-dir', 'out_smoke_fast',
    '--limit-train-steps', '30',
    '--limit-val-steps', '10',
    '--early-stop-patience', '1',
    '--no-pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
env = dict(os.environ)
env['PYTHONUNBUFFERED'] = '1'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
elapsed = time.time() - t0
print(f'Exit code: {rc}, elapsed {elapsed/60:.1f} min', flush=True)
assert rc == 0, 'Smoke training failed'
print('Smoke training completed.')

=== Smoke: precompute fast folds (cardinality) + tiny 1-fold run (no-pretrained, unbuffered) ===


Deleted existing train_folds_smoke.csv


Wrote fast folds to train_folds_smoke.csv (cardinality SKFold)


Running: /usr/bin/python3.11 -u train.py --model tf_efficientnet_b3_ns --img-size 224 --epochs 1 --batch-size 64 --val-batch-size 96 --num-workers 4 --folds 0 --folds-csv train_folds_smoke.csv --out-dir out_smoke_fast --limit-train-steps 30 --limit-val-steps 10 --early-stop-patience 1 --no-pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):



=== VAL DIAG fold 0 epoch 1 ===
val_size=24161 probs_shape=(960, 3474) tgts_shape=(960, 3474)
probs_range=[0.000001,0.167815]
tgt_pos_rate=0.00126745 mean_pos_per_img=4.403
thr=0.2 pred_pos_rate=0.00000000 mean_pred_per_img=0.000 empty_frac=1.000000 TP=0 FP=0 FN=4227 f1@0.2=0.000000


fold 0 epoch 1 val micro-f1 0.17101 @ thr 0.090


==== Fold 0 done: best_f1 0.17101 thr 0.090 ====


OOF micro-f1 0.08716 @ thr 0.050
  state = torch.load(pth, map_location='cpu')
  model = create_fn(


Wrote submission.csv


Exit code: 0, elapsed 2.7 min


Smoke training completed.
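
The VAL DIAG block reports micro-F1 at a fixed threshold of 0.2 and then a best threshold from a sweep. How `train.py` implements this is not shown here, so the following is only a sketch under assumptions: `probs` and `targets` are arrays of shape `(n_images, n_classes)`, and the grid of 0.05 to 0.50 in steps of 0.01 is a guess that merely covers the thresholds seen in the logs. The function names are hypothetical.

```python
import numpy as np


def micro_f1_counts(tp, fp, fn):
    """Micro-F1 from pooled counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def sweep_threshold(probs, targets, thresholds=None):
    """Return (best_threshold, best_micro_f1) for one global threshold."""
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.50, 46)  # assumed grid, step 0.01
    targets = np.asarray(targets).astype(bool)
    probs = np.asarray(probs)
    n_pos = int(targets.sum())
    best_thr, best_f1 = float(thresholds[0]), -1.0
    for thr in thresholds:
        pred = probs >= thr
        tp = int((pred & targets).sum())
        fp = int(pred.sum()) - tp
        fn = n_pos - tp
        f1 = micro_f1_counts(tp, fp, fn)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_thr, best_f1 = float(thr), f1
    return best_thr, best_f1


# Counts taken from the smoke-run VAL DIAG line at thr=0.2 (no predictions at all).
print(micro_f1_counts(tp=0, fp=0, fn=4227))
```

With no positive predictions, TP and FP are both zero, which is why the smoke run shows `f1@0.2=0.000000` even though the sweep finds a non-zero score at a lower threshold.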


In [18]:
import os, sys, subprocess, time, shlex
from pathlib import Path

print('=== PRODUCTION RUN: b3@384 5-fold, EMA+TTA (cardinality folds) ===', flush=True)
cmd = [
    sys.executable, 'train.py',
    '--model', 'tf_efficientnet_b3_ns',
    '--img-size', '384',
    '--epochs', '10',
    '--batch-size', '56',
    '--val-batch-size', '96',
    '--num-workers', '10',
    '--lr', '2e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '3',
    '--folds', '0,1,2,3,4',
    '--folds-csv', 'train_folds_smoke.csv',  # cardinality-stratified folds written by the smoke cell
    '--out-dir', 'out_b3_384_card',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
elapsed = time.time() - t0
print(f'Exit code: {rc}, elapsed {elapsed/3600:.2f} h', flush=True)
assert rc == 0, 'Production run failed'
print('Production run completed.')

=== PRODUCTION RUN: b3@384 5-fold, EMA+TTA (cardinality folds) ===


Running: /usr/bin/python3.11 train.py --model tf_efficientnet_b3_ns --img-size 384 --epochs 10 --batch-size 56 --val-batch-size 96 --num-workers 10 --lr 2e-4 --use-ema --tta --early-stop-patience 3 --folds 0,1,2,3,4 --folds-csv train_folds_smoke.csv --out-dir out_b3_384_card --pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):


fold 0 epoch 1 iter 100/1725 loss 2.4412 elapsed 0.9m


fold 0 epoch 1 iter 200/1725 loss 2.4382 elapsed 1.4m


fold 0 epoch 1 iter 300/1725 loss 2.4356 elapsed 1.9m


fold 0 epoch 1 iter 400/1725 loss 2.4342 elapsed 2.4m


fold 0 epoch 1 iter 500/1725 loss 2.4334 elapsed 2.9m


fold 0 epoch 1 iter 600/1725 loss 2.4328 elapsed 3.4m


fold 0 epoch 1 iter 700/1725 loss 2.4324 elapsed 4.0m


fold 0 epoch 1 iter 800/1725 loss 2.4321 elapsed 4.5m


fold 0 epoch 1 iter 900/1725 loss 2.4319 elapsed 5.0m


fold 0 epoch 1 iter 1000/1725 loss 2.4317 elapsed 5.5m


fold 0 epoch 1 iter 1100/1725 loss 2.4316 elapsed 6.0m


fold 0 epoch 1 iter 1200/1725 loss 2.4314 elapsed 6.5m


fold 0 epoch 1 iter 1300/1725 loss 2.4313 elapsed 7.0m


fold 0 epoch 1 iter 1400/1725 loss 2.4312 elapsed 7.5m


fold 0 epoch 1 iter 1500/1725 loss 2.4312 elapsed 8.1m


fold 0 epoch 1 iter 1600/1725 loss 2.4311 elapsed 8.6m


fold 0 epoch 1 iter 1700/1725 loss 2.4310 elapsed 9.1m



=== VAL DIAG fold 0 epoch 1 ===
val_size=24161 probs_shape=(24161, 3474) tgts_shape=(24161, 3474)
probs_range=[0.000001,0.674683]
tgt_pos_rate=0.00127242 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.00584288 mean_pred_per_img=20.298 empty_frac=0.002235 TP=44515 FP=445909 FN=62286 f1@0.2=0.149073


fold 0 epoch 1 val micro-f1 0.18500 @ thr 0.350


fold 0 epoch 2 iter 100/1725 loss 2.4300 elapsed 11.5m


fold 0 epoch 2 iter 200/1725 loss 2.4300 elapsed 12.0m


fold 0 epoch 2 iter 300/1725 loss 2.4300 elapsed 12.5m


fold 0 epoch 2 iter 400/1725 loss 2.4300 elapsed 13.0m


fold 0 epoch 2 iter 500/1725 loss 2.4300 elapsed 13.5m


fold 0 epoch 2 iter 600/1725 loss 2.4300 elapsed 14.1m


fold 0 epoch 2 iter 700/1725 loss 2.4300 elapsed 14.6m


fold 0 epoch 2 iter 800/1725 loss 2.4300 elapsed 15.1m


fold 0 epoch 2 iter 900/1725 loss 2.4300 elapsed 15.6m


fold 0 epoch 2 iter 1000/1725 loss 2.4300 elapsed 16.1m


fold 0 epoch 2 iter 1100/1725 loss 2.4300 elapsed 16.6m


fold 0 epoch 2 iter 1200/1725 loss 2.4300 elapsed 17.1m


fold 0 epoch 2 iter 1300/1725 loss 2.4300 elapsed 17.7m


fold 0 epoch 2 iter 1400/1725 loss 2.4300 elapsed 18.2m


fold 0 epoch 2 iter 1500/1725 loss 2.4300 elapsed 18.7m


fold 0 epoch 2 iter 1600/1725 loss 2.4300 elapsed 19.2m


fold 0 epoch 2 iter 1700/1725 loss 2.4300 elapsed 19.7m



=== VAL DIAG fold 0 epoch 2 ===
val_size=24161 probs_shape=(24161, 3474) tgts_shape=(24161, 3474)
probs_range=[0.000001,0.939251]
tgt_pos_rate=0.00127242 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.00633276 mean_pred_per_img=22.000 empty_frac=0.000000 TP=46455 FP=485087 FN=60346 f1@0.2=0.145549


fold 0 epoch 2 val micro-f1 0.14555 @ thr 0.050


fold 0 epoch 3 iter 100/1725 loss 2.4300 elapsed 21.6m


fold 0 epoch 3 iter 200/1725 loss 2.4300 elapsed 22.1m


fold 0 epoch 3 iter 300/1725 loss 2.4300 elapsed 22.7m


fold 0 epoch 3 iter 400/1725 loss 2.4300 elapsed 23.2m


fold 0 epoch 3 iter 500/1725 loss 2.4300 elapsed 23.7m


fold 0 epoch 3 iter 600/1725 loss 2.4300 elapsed 24.2m


fold 0 epoch 3 iter 700/1725 loss 2.4300 elapsed 24.7m


fold 0 epoch 3 iter 800/1725 loss 2.4300 elapsed 25.2m


fold 0 epoch 3 iter 900/1725 loss 2.4300 elapsed 25.7m


fold 0 epoch 3 iter 1000/1725 loss 2.4300 elapsed 26.3m


fold 0 epoch 3 iter 1100/1725 loss 2.4300 elapsed 26.8m


fold 0 epoch 3 iter 1200/1725 loss 2.4300 elapsed 27.3m


fold 0 epoch 3 iter 1300/1725 loss 2.4300 elapsed 27.8m


fold 0 epoch 3 iter 1400/1725 loss 2.4300 elapsed 28.3m


fold 0 epoch 3 iter 1500/1725 loss 2.4300 elapsed 28.8m


fold 0 epoch 3 iter 1600/1725 loss 2.4300 elapsed 29.3m


fold 0 epoch 3 iter 1700/1725 loss 2.4300 elapsed 29.8m



=== VAL DIAG fold 0 epoch 3 ===
val_size=24161 probs_shape=(24161, 3474) tgts_shape=(24161, 3474)
probs_range=[0.000001,0.968680]
tgt_pos_rate=0.00127242 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.00633276 mean_pred_per_img=22.000 empty_frac=0.000000 TP=46455 FP=485087 FN=60346 f1@0.2=0.145549


fold 0 epoch 3 val micro-f1 0.14555 @ thr 0.480


fold 0 epoch 4 iter 100/1725 loss 2.4300 elapsed 31.8m


fold 0 epoch 4 iter 200/1725 loss 2.4300 elapsed 32.3m


fold 0 epoch 4 iter 300/1725 loss 2.4300 elapsed 32.8m


fold 0 epoch 4 iter 400/1725 loss 2.4300 elapsed 33.3m


fold 0 epoch 4 iter 500/1725 loss 2.4300 elapsed 33.8m


fold 0 epoch 4 iter 600/1725 loss 2.4300 elapsed 34.3m


fold 0 epoch 4 iter 700/1725 loss 2.4300 elapsed 34.8m


fold 0 epoch 4 iter 800/1725 loss 2.4300 elapsed 35.4m


fold 0 epoch 4 iter 900/1725 loss 2.4300 elapsed 35.9m


fold 0 epoch 4 iter 1000/1725 loss 2.4300 elapsed 36.4m


fold 0 epoch 4 iter 1100/1725 loss 2.4300 elapsed 36.9m


fold 0 epoch 4 iter 1200/1725 loss 2.4300 elapsed 37.4m


fold 0 epoch 4 iter 1300/1725 loss 2.4300 elapsed 37.9m


fold 0 epoch 4 iter 1400/1725 loss 2.4300 elapsed 38.4m


fold 0 epoch 4 iter 1500/1725 loss 2.4300 elapsed 39.0m


fold 0 epoch 4 iter 1600/1725 loss 2.4300 elapsed 39.5m


fold 0 epoch 4 iter 1700/1725 loss 2.4300 elapsed 40.0m



=== VAL DIAG fold 0 epoch 4 ===
val_size=24161 probs_shape=(24161, 3474) tgts_shape=(24161, 3474)
probs_range=[0.000001,0.984481]
tgt_pos_rate=0.00127242 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.00633276 mean_pred_per_img=22.000 empty_frac=0.000000 TP=46455 FP=485087 FN=60346 f1@0.2=0.145549


fold 0 epoch 4 val micro-f1 0.14555 @ thr 0.050
Early stopping at epoch 4


==== Fold 0 done: best_f1 0.18500 thr 0.350 ====
==== Fold 1 start ====


fold 1 epoch 1 iter 100/1725 loss 2.4413 elapsed 0.5m


fold 1 epoch 1 iter 200/1725 loss 2.4379 elapsed 1.0m


fold 1 epoch 1 iter 300/1725 loss 2.4347 elapsed 1.6m


fold 1 epoch 1 iter 400/1725 loss 2.4330 elapsed 2.1m


fold 1 epoch 1 iter 500/1725 loss 2.4320 elapsed 2.6m


fold 1 epoch 1 iter 600/1725 loss 2.4313 elapsed 3.1m


fold 1 epoch 1 iter 700/1725 loss 2.4308 elapsed 3.6m


fold 1 epoch 1 iter 800/1725 loss 2.4305 elapsed 4.1m


fold 1 epoch 1 iter 900/1725 loss 2.4302 elapsed 4.7m


fold 1 epoch 1 iter 1000/1725 loss 2.4300 elapsed 5.2m


fold 1 epoch 1 iter 1100/1725 loss 2.4298 elapsed 5.7m


fold 1 epoch 1 iter 1200/1725 loss 2.4296 elapsed 6.2m


fold 1 epoch 1 iter 1300/1725 loss 2.4295 elapsed 6.7m


fold 1 epoch 1 iter 1400/1725 loss 2.4294 elapsed 7.2m


fold 1 epoch 1 iter 1500/1725 loss 2.4293 elapsed 7.7m


fold 1 epoch 1 iter 1600/1725 loss 2.4292 elapsed 8.3m


fold 1 epoch 1 iter 1700/1725 loss 2.4291 elapsed 8.8m



=== VAL DIAG fold 1 epoch 1 ===
val_size=24160 probs_shape=(24160, 3474) tgts_shape=(24160, 3474)
probs_range=[0.000001,0.520331]
tgt_pos_rate=0.00127227 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.00415277 mean_pred_per_img=14.427 empty_frac=0.006416 TP=37122 FP=311428 FN=69662 f1@0.2=0.163054


fold 1 epoch 1 val micro-f1 0.18269 @ thr 0.250


fold 1 epoch 2 iter 100/1725 loss 2.4279 elapsed 10.9m


fold 1 epoch 2 iter 200/1725 loss 2.4279 elapsed 11.4m


fold 1 epoch 2 iter 300/1725 loss 2.4279 elapsed 11.9m


fold 1 epoch 2 iter 400/1725 loss 2.4279 elapsed 12.4m


fold 1 epoch 2 iter 500/1725 loss 2.4279 elapsed 12.9m


fold 1 epoch 2 iter 600/1725 loss 2.4279 elapsed 13.4m


fold 1 epoch 2 iter 700/1725 loss 2.4279 elapsed 13.9m


fold 1 epoch 2 iter 800/1725 loss 2.4279 elapsed 14.4m


fold 1 epoch 2 iter 900/1725 loss 2.4279 elapsed 15.0m


fold 1 epoch 2 iter 1000/1725 loss 2.4279 elapsed 15.5m


fold 1 epoch 2 iter 1100/1725 loss 2.4279 elapsed 16.0m


fold 1 epoch 2 iter 1200/1725 loss 2.4279 elapsed 16.5m


fold 1 epoch 2 iter 1300/1725 loss 2.4279 elapsed 17.0m


fold 1 epoch 2 iter 1400/1725 loss 2.4279 elapsed 17.5m


fold 1 epoch 2 iter 1500/1725 loss 2.4279 elapsed 18.0m


fold 1 epoch 2 iter 1600/1725 loss 2.4279 elapsed 18.5m


fold 1 epoch 2 iter 1700/1725 loss 2.4279 elapsed 19.1m



=== VAL DIAG fold 1 epoch 2 ===
val_size=24160 probs_shape=(24160, 3474) tgts_shape=(24160, 3474)
probs_range=[0.000001,0.917003]
tgt_pos_rate=0.00127227 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.00719632 mean_pred_per_img=25.000 empty_frac=0.000000 TP=48827 FP=555173 FN=57957 f1@0.2=0.137389


fold 1 epoch 2 val micro-f1 0.13739 @ thr 0.050


fold 1 epoch 3 iter 100/1725 loss 2.4279 elapsed 21.0m


fold 1 epoch 3 iter 200/1725 loss 2.4279 elapsed 21.5m


fold 1 epoch 3 iter 300/1725 loss 2.4279 elapsed 22.0m


fold 1 epoch 3 iter 400/1725 loss 2.4279 elapsed 22.5m


fold 1 epoch 3 iter 500/1725 loss 2.4279 elapsed 23.0m


fold 1 epoch 3 iter 600/1725 loss 2.4279 elapsed 23.5m


fold 1 epoch 3 iter 700/1725 loss 2.4279 elapsed 24.1m


fold 1 epoch 3 iter 800/1725 loss 2.4279 elapsed 24.6m


fold 1 epoch 3 iter 900/1725 loss 2.4279 elapsed 25.1m


fold 1 epoch 3 iter 1000/1725 loss 2.4279 elapsed 25.6m


fold 1 epoch 3 iter 1100/1725 loss 2.4279 elapsed 26.1m


fold 1 epoch 3 iter 1200/1725 loss 2.4279 elapsed 26.6m


fold 1 epoch 3 iter 1300/1725 loss 2.4279 elapsed 27.1m


fold 1 epoch 3 iter 1400/1725 loss 2.4279 elapsed 27.6m


fold 1 epoch 3 iter 1500/1725 loss 2.4279 elapsed 28.2m


fold 1 epoch 3 iter 1600/1725 loss 2.4279 elapsed 28.7m


fold 1 epoch 3 iter 1700/1725 loss 2.4279 elapsed 29.2m



=== VAL DIAG fold 1 epoch 3 ===
val_size=24160 probs_shape=(24160, 3474) tgts_shape=(24160, 3474)
probs_range=[0.000001,0.871343]
tgt_pos_rate=0.00127227 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.00719632 mean_pred_per_img=25.000 empty_frac=0.000000 TP=48827 FP=555173 FN=57957 f1@0.2=0.137389


fold 1 epoch 3 val micro-f1 0.13818 @ thr 0.500


fold 1 epoch 4 iter 100/1725 loss 2.4279 elapsed 31.1m


fold 1 epoch 4 iter 200/1725 loss 2.4279 elapsed 31.6m


fold 1 epoch 4 iter 300/1725 loss 2.4279 elapsed 32.1m


fold 1 epoch 4 iter 400/1725 loss 2.4279 elapsed 32.6m


fold 1 epoch 4 iter 500/1725 loss 2.4279 elapsed 33.1m


fold 1 epoch 4 iter 600/1725 loss 2.4279 elapsed 33.7m


fold 1 epoch 4 iter 700/1725 loss 2.4279 elapsed 34.2m


fold 1 epoch 4 iter 800/1725 loss 2.4279 elapsed 34.7m


fold 1 epoch 4 iter 900/1725 loss 2.4279 elapsed 35.2m


fold 1 epoch 4 iter 1000/1725 loss 2.4279 elapsed 35.7m


fold 1 epoch 4 iter 1100/1725 loss 2.4279 elapsed 36.2m


fold 1 epoch 4 iter 1200/1725 loss 2.4279 elapsed 36.7m


fold 1 epoch 4 iter 1300/1725 loss 2.4279 elapsed 37.2m


fold 1 epoch 4 iter 1400/1725 loss 2.4279 elapsed 37.7m


fold 1 epoch 4 iter 1500/1725 loss 2.4279 elapsed 38.3m


fold 1 epoch 4 iter 1600/1725 loss 2.4279 elapsed 38.8m


fold 1 epoch 4 iter 1700/1725 loss 2.4279 elapsed 39.3m



=== VAL DIAG fold 1 epoch 4 ===
val_size=24160 probs_shape=(24160, 3474) tgts_shape=(24160, 3474)
probs_range=[0.000001,0.964950]
tgt_pos_rate=0.00127227 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.00719632 mean_pred_per_img=25.000 empty_frac=0.000000 TP=48827 FP=555173 FN=57957 f1@0.2=0.137389


fold 1 epoch 4 val micro-f1 0.13739 @ thr 0.050
Early stopping at epoch 4


==== Fold 1 done: best_f1 0.18269 thr 0.250 ====
==== Fold 2 start ====
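
In epochs 2 to 4 of folds 0 and 1, the training loss stays flat (2.4300 and 2.4279), the TP/FP/FN counts at thr=0.2 repeat exactly, and `mean_pred_per_img` is a whole number (22.000 and 25.000). One reading, not confirmed anywhere in this notebook, is that the model emits the same label set for every image. A minimal sketch of a check for that, assuming a `probs` array of shape `(n_images, n_classes)`; the function name `collapse_report` and the tolerance are hypothetical.

```python
import numpy as np


def collapse_report(probs, thr=0.2, tol=1e-6):
    """Summarise how much predictions vary across images."""
    probs = np.asarray(probs, dtype=np.float64)
    per_class_std = probs.std(axis=0)  # variation of each class across images
    packed = np.packbits(probs >= thr, axis=1)  # compact per-image prediction pattern
    n_unique = len({row.tobytes() for row in packed})
    return {
        "frac_constant_classes": float((per_class_std < tol).mean()),
        "max_per_class_std": float(per_class_std.max()),
        "n_unique_pred_rows": n_unique,
    }


# Toy data: one matrix with image-dependent outputs, one with identical rows.
rng = np.random.default_rng(0)
varied = rng.random((8, 5))
collapsed = np.tile(varied[0], (8, 1))
print("varied   :", collapse_report(varied))
print("collapsed:", collapse_report(collapsed))
```

A single unique prediction row together with near-zero per-class spread would support the collapse reading; many distinct rows would point elsewhere, for example at the thresholding or the loss scaling.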


In [17]:
# Precompute and lock 5-fold MSKF folds (seed=42) to train_folds.csv
import time, numpy as np, pandas as pd
from pathlib import Path
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
from scipy.sparse import csr_matrix

t0 = time.time()
train_df = pd.read_csv('train.csv')
labels_df = pd.read_csv('labels.csv')
attr_ids = sorted(labels_df['attribute_id'].unique().tolist())
attr_to_idx = {a:i for i,a in enumerate(attr_ids)}
n = len(train_df); C = len(attr_ids)
print(f'n={n} classes={C}', flush=True)

# Build sparse label matrix via rows/cols
rows, cols = [], []
for i, s in enumerate(train_df['attribute_ids'].fillna('').astype(str).tolist()):
    if s:
        for a in map(int, s.split()):
            j = attr_to_idx.get(a, None)
            if j is not None:
                rows.append(i); cols.append(j)
print(f'nonzeros={len(rows)}', flush=True)
y_sparse = csr_matrix((np.ones(len(rows), dtype=np.uint8), (rows, cols)), shape=(n, C), dtype=np.uint8)

# Dense conversion (fits in RAM ~420MB) for iterstrat; one-time cost
t1 = time.time()
y = y_sparse.toarray(order='C')
print(f'dense built in {time.time()-t1:.1f}s, shape={y.shape}, dtype={y.dtype}', flush=True)

mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = np.full(n, -1, np.int16)
t2 = time.time()
for f, (_, vidx) in enumerate(mskf.split(np.zeros(n), y)):
    folds[vidx] = f
    print(f'fold {f}: {len(vidx)}', flush=True)
print(f'MSKF split time: {time.time()-t2:.1f}s', flush=True)
assert (folds >= 0).all()

out = train_df.copy(); out['fold'] = folds
out.to_csv('train_folds.csv', index=False)
print(f'Saved train_folds.csv in {time.time()-t0:.1f}s')

n=120801 classes=3474


nonzeros=534073


dense built in 0.0s, shape=(120801, 3474), dtype=uint8


KeyboardInterrupt: 
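
Since the full 3474-label MSKF split was interrupted and the runs in this notebook rely on cheaper fold files, it may be worth checking any fold CSV for basic balance before training on it. A minimal sketch, assuming a frame with the `attribute_ids` (space-separated) and `fold` columns written by the cells above; the helper name `fold_summary` and the toy data are illustrative only.

```python
import pandas as pd


def fold_summary(df):
    """Per-fold image count and mean number of labels per image."""
    card = df["attribute_ids"].fillna("").astype(str).str.split().str.len()
    out = df.assign(cardinality=card).groupby("fold")["cardinality"].agg(["size", "mean"])
    return out.rename(columns={"size": "n_images", "mean": "mean_labels_per_image"})


# Toy frame standing in for pd.read_csv('train_folds_smoke.csv').
toy = pd.DataFrame({
    "attribute_ids": ["1 2 3", "4", "5 6", "7 8 9 10"],
    "fold": [0, 0, 1, 1],
})
print(fold_summary(toy))
```

Folds with similar sizes and similar mean cardinality are a necessary condition for a fair split, though this says nothing about how individual rare labels are distributed.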

In [19]:
import os, sys, subprocess, time, shlex
from pathlib import Path
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold

print('=== Diagnostic single-fold run: b3@384, epochs=5, pretrained ===', flush=True)
# Ensure fast cardinality folds exist (for speed)
folds_path = Path('train_folds_smoke.csv')
if not folds_path.exists():
    train_df = pd.read_csv('train.csv')
    attrs = train_df['attribute_ids'].fillna('').astype(str).apply(lambda s: [int(x) for x in s.split() if x!=''])
    card = attrs.apply(len).values
    bins = np.clip(card, 0, 8)  # labels per image, capped at 8 so the sparse high-cardinality tail shares one stratum
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    folds = np.full(len(train_df), -1, dtype=np.int16)
    for f, (_, vidx) in enumerate(skf.split(np.zeros(len(train_df)), bins)):
        folds[vidx] = f
    train_df2 = train_df.copy()
    train_df2['fold'] = folds
    train_df2.to_csv(folds_path, index=False)
    print('Wrote train_folds_smoke.csv')
else:
    print('Using existing train_folds_smoke.csv')

cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'tf_efficientnet_b3_ns',
    '--img-size', '384',
    '--epochs', '5',
    '--batch-size', '32',
    '--val-batch-size', '64',
    '--num-workers', '8',
    '--folds', '0',
    '--folds-csv', 'train_folds_smoke.csv',
    '--out-dir', 'out_debug',
    '--early-stop-patience', '2',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
env = dict(os.environ)
env['PYTHONUNBUFFERED'] = '1'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
elapsed = time.time() - t0
print(f'Exit code: {rc}, elapsed {elapsed/60:.1f} min', flush=True)
assert rc == 0, 'Diagnostic run failed'
print('Diagnostic run completed.')

=== Diagnostic single-fold run: b3@384, epochs=5, pretrained ===


Using existing train_folds_smoke.csv
Running: /usr/bin/python3.11 -u train.py --model tf_efficientnet_b3_ns --img-size 384 --epochs 5 --batch-size 32 --val-batch-size 64 --num-workers 8 --folds 0 --folds-csv train_folds_smoke.csv --out-dir out_debug --early-stop-patience 2 --pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):


fold 0 epoch 1 iter 100/3020 loss 0.0014 elapsed 0.5m


fold 0 epoch 1 iter 200/3020 loss 0.0014 elapsed 0.8m


fold 0 epoch 1 iter 300/3020 loss 0.0013 elapsed 1.1m


fold 0 epoch 1 iter 400/3020 loss 0.0013 elapsed 1.3m


fold 0 epoch 1 iter 500/3020 loss 0.0012 elapsed 1.6m


fold 0 epoch 1 iter 600/3020 loss 0.0012 elapsed 1.9m


fold 0 epoch 1 iter 700/3020 loss 0.0011 elapsed 2.2m


fold 0 epoch 1 iter 800/3020 loss 0.0011 elapsed 2.5m


fold 0 epoch 1 iter 900/3020 loss 0.0011 elapsed 2.8m


fold 0 epoch 1 iter 1000/3020 loss 0.0011 elapsed 3.1m


fold 0 epoch 1 iter 1100/3020 loss 0.0010 elapsed 3.3m


fold 0 epoch 1 iter 1200/3020 loss 0.0010 elapsed 3.6m


fold 0 epoch 1 iter 1300/3020 loss 0.0010 elapsed 3.9m


fold 0 epoch 1 iter 1400/3020 loss 0.0010 elapsed 4.2m


fold 0 epoch 1 iter 1500/3020 loss 0.0010 elapsed 4.5m


fold 0 epoch 1 iter 1600/3020 loss 0.0010 elapsed 4.8m


fold 0 epoch 1 iter 1700/3020 loss 0.0010 elapsed 5.1m


fold 0 epoch 1 iter 1800/3020 loss 0.0009 elapsed 5.3m


fold 0 epoch 1 iter 1900/3020 loss 0.0009 elapsed 5.6m


fold 0 epoch 1 iter 2000/3020 loss 0.0009 elapsed 5.9m


fold 0 epoch 1 iter 2100/3020 loss 0.0009 elapsed 6.2m


fold 0 epoch 1 iter 2200/3020 loss 0.0009 elapsed 6.5m


fold 0 epoch 1 iter 2300/3020 loss 0.0009 elapsed 6.8m


fold 0 epoch 1 iter 2400/3020 loss 0.0009 elapsed 7.1m


fold 0 epoch 1 iter 2500/3020 loss 0.0009 elapsed 7.4m


fold 0 epoch 1 iter 2600/3020 loss 0.0009 elapsed 7.6m


fold 0 epoch 1 iter 2700/3020 loss 0.0009 elapsed 7.9m


fold 0 epoch 1 iter 2800/3020 loss 0.0009 elapsed 8.2m


fold 0 epoch 1 iter 2900/3020 loss 0.0009 elapsed 8.5m


fold 0 epoch 1 iter 3000/3020 loss 0.0009 elapsed 8.8m



=== VAL DIAG fold 0 epoch 1 ===
val_size=24161 probs_shape=(24161, 3474) tgts_shape=(24161, 3474)
probs_range=[0.000000,0.999465]
tgt_pos_rate=0.00127242 mean_pos_per_img=4.420


thr=0.2 pred_pos_rate=0.02936688 mean_pred_per_img=102.021 empty_frac=0.000000 TP=100599 FP=2364319 FN=6202 f1@0.2=0.078235


fold 0 epoch 1 val micro-f1 0.54314 @ thr 0.500


fold 0 epoch 2 iter 100/3020 loss 0.0007 elapsed 10.7m


fold 0 epoch 2 iter 200/3020 loss 0.0006 elapsed 11.0m


fold 0 epoch 2 iter 300/3020 loss 0.0007 elapsed 11.2m


fold 0 epoch 2 iter 400/3020 loss 0.0007 elapsed 11.5m


fold 0 epoch 2 iter 500/3020 loss 0.0007 elapsed 11.8m


fold 0 epoch 2 iter 600/3020 loss 0.0007 elapsed 12.1m


fold 0 epoch 2 iter 700/3020 loss 0.0007 elapsed 12.4m


fold 0 epoch 2 iter 800/3020 loss 0.0007 elapsed 12.7m


fold 0 epoch 2 iter 900/3020 loss 0.0007 elapsed 13.0m


fold 0 epoch 2 iter 1000/3020 loss 0.0007 elapsed 13.3m


fold 0 epoch 2 iter 1100/3020 loss 0.0007 elapsed 13.5m


In [20]:
import os, sys, time, shlex, subprocess
from pathlib import Path
import numpy as np
import pandas as pd
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

print('=== Build fast MSKF folds: top-512 labels + cardinality bins (one-hot) ===', flush=True)
t0 = time.time()
train_df = pd.read_csv('train.csv')
labels_df = pd.read_csv('labels.csv')
attr_ids = labels_df['attribute_id'].astype(int).tolist()
n = len(train_df)
print('n samples:', n, 'num labels:', len(attr_ids), flush=True)

# Parse labels
labs = train_df['attribute_ids'].fillna('').astype(str).str.split()

# Count label frequencies
from collections import Counter
cnt = Counter()
for s in labs:
    for x in s:
        cnt[int(x)] += 1
topK = 512
top_attrs = [a for a,_ in cnt.most_common(topK)]
top_map = {a:i for i,a in enumerate(top_attrs)}

# Build reduced multilabel matrix: K (top labels) + 9 bin one-hots for cardinality (0..8, 8=8+)
K = len(top_attrs)
B = 9
y = np.zeros((n, K + B), dtype=np.uint8)
for i, s in enumerate(labs):
    # top-K one-hots (no-op for rows without labels)
    for x in s:
        j = top_map.get(int(x))
        if j is not None:
            y[i, j] = 1
    # cardinality bin one-hot; set for every row so bin 0 (no labels) is populated too
    b = min(len(s), B-1)
    y[i, K + b] = 1
    if (i+1) % 20000 == 0:
        print(f'..processed {i+1}/{n}', flush=True)

mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = np.full(n, -1, np.int16)
for f, (_, vidx) in enumerate(mskf.split(np.zeros(n), y)):
    folds[vidx] = f
    print('fold', f, 'size', len(vidx), flush=True)
assert (folds >= 0).all()

out_path = Path('train_folds_top512.csv')
out = train_df.copy()
out['fold'] = folds
out.to_csv(out_path, index=False)
print('Saved', str(out_path), 'in', f'{time.time()-t0:.1f}s', flush=True)

print('=== Launch 5-fold b3@384 EMA+TTA with new folds ===', flush=True)
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'tf_efficientnet_b3_ns',
    '--img-size', '384',
    '--epochs', '10',
    '--batch-size', '56',
    '--val-batch-size', '96',
    '--num-workers', '10',
    '--lr', '2e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '3',
    '--folds', '0,1,2,3,4',
    '--folds-csv', 'train_folds_top512.csv',
    '--out-dir', 'out_b3_384_top512',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t1 = time.time()
env = dict(os.environ)
env['PYTHONUNBUFFERED'] = '1'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
print(f'Exit code: {rc}, elapsed {(time.time()-t1)/3600:.2f} h', flush=True)
assert rc == 0, 'Production run failed'
print('Production run completed.')

=== Build fast MSKF folds: top-512 labels + cardinality bins (one-hot) ===


n samples: 120801 num labels: 3474


..processed 20000/120801


..processed 40000/120801


..processed 60000/120801


..processed 80000/120801


..processed 100000/120801


..processed 120000/120801


fold 0 size 24189


fold 1 size 24136


fold 2 size 24188


fold 3 size 24167


fold 4 size 24121


Saved train_folds_top512.csv in 15.7s


=== Launch 5-fold b3@384 EMA+TTA with new folds ===


Running: /usr/bin/python3.11 -u train.py --model tf_efficientnet_b3_ns --img-size 384 --epochs 10 --batch-size 56 --val-batch-size 96 --num-workers 10 --lr 2e-4 --use-ema --tta --early-stop-patience 3 --folds 0,1,2,3,4 --folds-csv train_folds_top512.csv --out-dir out_b3_384_top512 --pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):


fold 0 epoch 1 iter 100/1725 loss 0.0014 elapsed 0.9m


fold 0 epoch 1 iter 200/1725 loss 0.0014 elapsed 1.4m


fold 0 epoch 1 iter 300/1725 loss 0.0013 elapsed 1.9m


fold 0 epoch 1 iter 400/1725 loss 0.0012 elapsed 2.4m


fold 0 epoch 1 iter 500/1725 loss 0.0012 elapsed 2.9m


fold 0 epoch 1 iter 600/1725 loss 0.0011 elapsed 3.5m


fold 0 epoch 1 iter 700/1725 loss 0.0011 elapsed 4.0m


fold 0 epoch 1 iter 800/1725 loss 0.0011 elapsed 4.5m


fold 0 epoch 1 iter 900/1725 loss 0.0010 elapsed 5.0m


fold 0 epoch 1 iter 1000/1725 loss 0.0010 elapsed 5.5m


fold 0 epoch 1 iter 1100/1725 loss 0.0010 elapsed 6.0m


fold 0 epoch 1 iter 1200/1725 loss 0.0010 elapsed 6.6m


fold 0 epoch 1 iter 1300/1725 loss 0.0010 elapsed 7.1m


fold 0 epoch 1 iter 1400/1725 loss 0.0010 elapsed 7.6m


fold 0 epoch 1 iter 1500/1725 loss 0.0009 elapsed 8.1m


fold 0 epoch 1 iter 1600/1725 loss 0.0009 elapsed 8.6m


fold 0 epoch 1 iter 1700/1725 loss nan elapsed 9.1m



=== VAL DIAG fold 0 epoch 1 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.012800,0.977794]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.03767297 mean_pred_per_img=130.876 empty_frac=0.000000 TP=98246 FP=3067511 FN=8629 f1@0.2=0.060041


fold 0 epoch 1 val micro-f1 0.49668 @ thr 0.460


fold 0 epoch 2 iter 100/1725 loss 0.0007 elapsed 11.6m


fold 0 epoch 2 iter 200/1725 loss 0.0007 elapsed 12.1m


fold 0 epoch 2 iter 300/1725 loss 0.0007 elapsed 12.6m


fold 0 epoch 2 iter 400/1725 loss 0.0007 elapsed 13.1m


fold 0 epoch 2 iter 500/1725 loss 0.0007 elapsed 13.7m


fold 0 epoch 2 iter 600/1725 loss 0.0007 elapsed 14.2m


fold 0 epoch 2 iter 700/1725 loss 0.0007 elapsed 14.7m


fold 0 epoch 2 iter 800/1725 loss 0.0007 elapsed 15.2m


fold 0 epoch 2 iter 900/1725 loss 0.0007 elapsed 15.7m


fold 0 epoch 2 iter 1000/1725 loss 0.0007 elapsed 16.2m


fold 0 epoch 2 iter 1100/1725 loss 0.0007 elapsed 16.8m


fold 0 epoch 2 iter 1200/1725 loss 0.0007 elapsed 17.3m


fold 0 epoch 2 iter 1300/1725 loss 0.0007 elapsed 17.8m


fold 0 epoch 2 iter 1400/1725 loss 0.0007 elapsed 18.3m


fold 0 epoch 2 iter 1500/1725 loss 0.0007 elapsed 18.8m


fold 0 epoch 2 iter 1600/1725 loss 0.0007 elapsed 19.3m


fold 0 epoch 2 iter 1700/1725 loss 0.0007 elapsed 19.9m



=== VAL DIAG fold 0 epoch 2 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000094,0.998641]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.02908528 mean_pred_per_img=101.042 empty_frac=0.000000 TP=101958 FP=2342153 FN=4917 f1@0.2=0.079936


fold 0 epoch 2 val micro-f1 0.58421 @ thr 0.500


fold 0 epoch 3 iter 100/1725 loss 0.0006 elapsed 21.8m


fold 0 epoch 3 iter 200/1725 loss 0.0006 elapsed 22.3m


fold 0 epoch 3 iter 300/1725 loss 0.0006 elapsed 22.8m


fold 0 epoch 3 iter 400/1725 loss 0.0006 elapsed 23.3m


fold 0 epoch 3 iter 500/1725 loss 0.0006 elapsed 23.8m


fold 0 epoch 3 iter 600/1725 loss 0.0006 elapsed 24.4m


fold 0 epoch 3 iter 700/1725 loss 0.0006 elapsed 24.9m


fold 0 epoch 3 iter 800/1725 loss 0.0006 elapsed 25.4m


fold 0 epoch 3 iter 900/1725 loss 0.0006 elapsed 25.9m


fold 0 epoch 3 iter 1000/1725 loss 0.0006 elapsed 26.4m


fold 0 epoch 3 iter 1100/1725 loss 0.0006 elapsed 26.9m


fold 0 epoch 3 iter 1200/1725 loss 0.0006 elapsed 27.4m


fold 0 epoch 3 iter 1300/1725 loss 0.0006 elapsed 28.0m


fold 0 epoch 3 iter 1400/1725 loss 0.0006 elapsed 28.5m


fold 0 epoch 3 iter 1500/1725 loss 0.0006 elapsed 29.0m


fold 0 epoch 3 iter 1600/1725 loss 0.0006 elapsed 29.5m


fold 0 epoch 3 iter 1700/1725 loss 0.0006 elapsed 30.0m



=== VAL DIAG fold 0 epoch 3 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000001,0.999317]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.02313137 mean_pred_per_img=80.358 empty_frac=0.000000 TP=101747 FP=1842042 FN=5128 f1@0.2=0.099233


fold 0 epoch 3 val micro-f1 0.60110 @ thr 0.500


fold 0 epoch 4 iter 100/1725 loss 0.0005 elapsed 31.9m


fold 0 epoch 4 iter 200/1725 loss 0.0005 elapsed 32.5m


fold 0 epoch 4 iter 300/1725 loss 0.0005 elapsed 33.0m


fold 0 epoch 4 iter 400/1725 loss 0.0005 elapsed 33.5m


fold 0 epoch 4 iter 500/1725 loss 0.0005 elapsed 34.0m


fold 0 epoch 4 iter 600/1725 loss 0.0005 elapsed 34.5m


fold 0 epoch 4 iter 700/1725 loss 0.0005 elapsed 35.0m


fold 0 epoch 4 iter 800/1725 loss 0.0005 elapsed 35.6m


fold 0 epoch 4 iter 900/1725 loss 0.0005 elapsed 36.1m


fold 0 epoch 4 iter 1000/1725 loss 0.0005 elapsed 36.6m


fold 0 epoch 4 iter 1100/1725 loss 0.0005 elapsed 37.1m


fold 0 epoch 4 iter 1200/1725 loss 0.0005 elapsed 37.6m


fold 0 epoch 4 iter 1300/1725 loss 0.0005 elapsed 38.1m


fold 0 epoch 4 iter 1400/1725 loss 0.0005 elapsed 38.7m


fold 0 epoch 4 iter 1500/1725 loss 0.0005 elapsed 39.2m


fold 0 epoch 4 iter 1600/1725 loss 0.0005 elapsed 39.7m


fold 0 epoch 4 iter 1700/1725 loss 0.0005 elapsed 40.2m



=== VAL DIAG fold 0 epoch 4 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999590]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01911626 mean_pred_per_img=66.410 empty_frac=0.000000 TP=100931 FP=1505458 FN=5944 f1@0.2=0.117823


fold 0 epoch 4 val micro-f1 0.60824 @ thr 0.500


fold 0 epoch 5 iter 100/1725 loss 0.0004 elapsed 42.1m


fold 0 epoch 5 iter 200/1725 loss 0.0004 elapsed 42.6m


fold 0 epoch 5 iter 300/1725 loss 0.0004 elapsed 43.1m


fold 0 epoch 5 iter 400/1725 loss 0.0004 elapsed 43.7m


fold 0 epoch 5 iter 500/1725 loss 0.0004 elapsed 44.2m


fold 0 epoch 5 iter 600/1725 loss 0.0004 elapsed 44.7m


fold 0 epoch 5 iter 700/1725 loss 0.0004 elapsed 45.2m


fold 0 epoch 5 iter 800/1725 loss 0.0004 elapsed 45.7m


fold 0 epoch 5 iter 900/1725 loss 0.0004 elapsed 46.3m


fold 0 epoch 5 iter 1000/1725 loss nan elapsed 46.8m


fold 0 epoch 5 iter 1100/1725 loss nan elapsed 47.3m


fold 0 epoch 5 iter 1200/1725 loss nan elapsed 47.8m


fold 0 epoch 5 iter 1300/1725 loss nan elapsed 48.3m


fold 0 epoch 5 iter 1400/1725 loss nan elapsed 48.8m


fold 0 epoch 5 iter 1500/1725 loss nan elapsed 49.3m


fold 0 epoch 5 iter 1600/1725 loss nan elapsed 49.9m


fold 0 epoch 5 iter 1700/1725 loss nan elapsed 50.4m



=== VAL DIAG fold 0 epoch 5 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999794]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01620693 mean_pred_per_img=56.303 empty_frac=0.000000 TP=99852 FP=1262058 FN=7023 f1@0.2=0.135965


fold 0 epoch 5 val micro-f1 0.60904 @ thr 0.500


fold 0 epoch 6 iter 100/1725 loss 0.0004 elapsed 52.3m


fold 0 epoch 6 iter 200/1725 loss 0.0004 elapsed 52.8m


fold 0 epoch 6 iter 300/1725 loss 0.0004 elapsed 53.3m


fold 0 epoch 6 iter 400/1725 loss nan elapsed 53.8m


fold 0 epoch 6 iter 500/1725 loss nan elapsed 54.4m


fold 0 epoch 6 iter 600/1725 loss nan elapsed 54.9m


fold 0 epoch 6 iter 700/1725 loss nan elapsed 55.4m


fold 0 epoch 6 iter 800/1725 loss nan elapsed 55.9m


fold 0 epoch 6 iter 900/1725 loss nan elapsed 56.4m


fold 0 epoch 6 iter 1000/1725 loss nan elapsed 56.9m


fold 0 epoch 6 iter 1100/1725 loss nan elapsed 57.5m


fold 0 epoch 6 iter 1200/1725 loss nan elapsed 58.0m


fold 0 epoch 6 iter 1300/1725 loss nan elapsed 58.5m


fold 0 epoch 6 iter 1400/1725 loss nan elapsed 59.0m


fold 0 epoch 6 iter 1500/1725 loss nan elapsed 59.5m


fold 0 epoch 6 iter 1600/1725 loss nan elapsed 60.0m


fold 0 epoch 6 iter 1700/1725 loss nan elapsed 60.5m



=== VAL DIAG fold 0 epoch 6 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999910]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01401615 mean_pred_per_img=48.692 empty_frac=0.000000 TP=98605 FP=1079208 FN=8270 f1@0.2=0.153508


fold 0 epoch 6 val micro-f1 0.60923 @ thr 0.500


fold 0 epoch 7 iter 100/1725 loss 0.0003 elapsed 62.5m


fold 0 epoch 7 iter 200/1725 loss 0.0003 elapsed 63.0m


fold 0 epoch 7 iter 300/1725 loss 0.0003 elapsed 63.5m


fold 0 epoch 7 iter 400/1725 loss 0.0003 elapsed 64.0m


fold 0 epoch 7 iter 500/1725 loss 0.0003 elapsed 64.5m


fold 0 epoch 7 iter 600/1725 loss 0.0003 elapsed 65.0m


fold 0 epoch 7 iter 700/1725 loss 0.0003 elapsed 65.5m


fold 0 epoch 7 iter 800/1725 loss nan elapsed 66.1m


fold 0 epoch 7 iter 900/1725 loss nan elapsed 66.6m


fold 0 epoch 7 iter 1000/1725 loss nan elapsed 67.1m


fold 0 epoch 7 iter 1100/1725 loss nan elapsed 67.6m


fold 0 epoch 7 iter 1200/1725 loss nan elapsed 68.1m


fold 0 epoch 7 iter 1300/1725 loss nan elapsed 68.6m


fold 0 epoch 7 iter 1400/1725 loss nan elapsed 69.1m


fold 0 epoch 7 iter 1500/1725 loss nan elapsed 69.7m


fold 0 epoch 7 iter 1600/1725 loss nan elapsed 70.2m


fold 0 epoch 7 iter 1700/1725 loss nan elapsed 70.7m



=== VAL DIAG fold 0 epoch 7 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999974]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01242836 mean_pred_per_img=43.176 empty_frac=0.000000 TP=97470 FP=946917 FN=9405 f1@0.2=0.169327


fold 0 epoch 7 val micro-f1 0.60768 @ thr 0.500


fold 0 epoch 8 iter 100/1725 loss 0.0003 elapsed 72.6m


fold 0 epoch 8 iter 200/1725 loss 0.0003 elapsed 73.1m


fold 0 epoch 8 iter 300/1725 loss 0.0003 elapsed 73.6m


fold 0 epoch 8 iter 400/1725 loss 0.0003 elapsed 74.1m


fold 0 epoch 8 iter 500/1725 loss 0.0003 elapsed 74.6m


fold 0 epoch 8 iter 600/1725 loss nan elapsed 75.2m


fold 0 epoch 8 iter 700/1725 loss nan elapsed 75.7m


fold 0 epoch 8 iter 800/1725 loss nan elapsed 76.2m


fold 0 epoch 8 iter 900/1725 loss nan elapsed 76.7m


fold 0 epoch 8 iter 1000/1725 loss nan elapsed 77.2m


fold 0 epoch 8 iter 1100/1725 loss nan elapsed 77.7m


fold 0 epoch 8 iter 1200/1725 loss nan elapsed 78.2m


fold 0 epoch 8 iter 1300/1725 loss nan elapsed 78.7m


fold 0 epoch 8 iter 1400/1725 loss nan elapsed 79.2m


fold 0 epoch 8 iter 1500/1725 loss nan elapsed 79.8m


fold 0 epoch 8 iter 1600/1725 loss nan elapsed 80.3m


fold 0 epoch 8 iter 1700/1725 loss nan elapsed 80.8m



=== VAL DIAG fold 0 epoch 8 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999985]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01206099 mean_pred_per_img=41.900 empty_frac=0.000000 TP=96866 FP=916650 FN=10009 f1@0.2=0.172915


fold 0 epoch 8 val micro-f1 0.60473 @ thr 0.500


fold 0 epoch 9 iter 100/1725 loss 0.0003 elapsed 82.7m


fold 0 epoch 9 iter 200/1725 loss 0.0003 elapsed 83.2m


fold 0 epoch 9 iter 300/1725 loss 0.0003 elapsed 83.7m


fold 0 epoch 9 iter 400/1725 loss nan elapsed 84.2m


fold 0 epoch 9 iter 500/1725 loss nan elapsed 84.7m


fold 0 epoch 9 iter 600/1725 loss nan elapsed 85.2m


fold 0 epoch 9 iter 700/1725 loss nan elapsed 85.8m


fold 0 epoch 9 iter 800/1725 loss nan elapsed 86.3m


fold 0 epoch 9 iter 900/1725 loss nan elapsed 86.8m


fold 0 epoch 9 iter 1000/1725 loss nan elapsed 87.3m


fold 0 epoch 9 iter 1100/1725 loss nan elapsed 87.8m


fold 0 epoch 9 iter 1200/1725 loss nan elapsed 88.3m


fold 0 epoch 9 iter 1300/1725 loss nan elapsed 88.8m


fold 0 epoch 9 iter 1400/1725 loss nan elapsed 89.3m


fold 0 epoch 9 iter 1500/1725 loss nan elapsed 89.8m


fold 0 epoch 9 iter 1600/1725 loss nan elapsed 90.3m


fold 0 epoch 9 iter 1700/1725 loss nan elapsed 90.9m



=== VAL DIAG fold 0 epoch 9 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999991]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01194632 mean_pred_per_img=41.502 empty_frac=0.000000 TP=96597 FP=907283 FN=10278 f1@0.2=0.173930


fold 0 epoch 9 val micro-f1 0.60314 @ thr 0.500
Early stopping at epoch 9


==== Fold 0 done: best_f1 0.60923 thr 0.500 ====
==== Fold 1 start ====


fold 1 epoch 1 iter 100/1726 loss 0.0014 elapsed 0.5m


fold 1 epoch 1 iter 200/1726 loss 0.0014 elapsed 1.0m


fold 1 epoch 1 iter 300/1726 loss 0.0013 elapsed 1.6m


fold 1 epoch 1 iter 400/1726 loss 0.0012 elapsed 2.1m


fold 1 epoch 1 iter 500/1726 loss 0.0012 elapsed 2.6m


fold 1 epoch 1 iter 600/1726 loss 0.0011 elapsed 3.1m


fold 1 epoch 1 iter 700/1726 loss 0.0011 elapsed 3.6m


fold 1 epoch 1 iter 800/1726 loss 0.0011 elapsed 4.1m


fold 1 epoch 1 iter 900/1726 loss 0.0010 elapsed 4.7m


fold 1 epoch 1 iter 1000/1726 loss 0.0010 elapsed 5.2m


fold 1 epoch 1 iter 1100/1726 loss 0.0010 elapsed 5.7m


fold 1 epoch 1 iter 1200/1726 loss 0.0010 elapsed 6.2m


fold 1 epoch 1 iter 1300/1726 loss 0.0010 elapsed 6.7m


fold 1 epoch 1 iter 1400/1726 loss 0.0010 elapsed 7.2m


fold 1 epoch 1 iter 1500/1726 loss 0.0009 elapsed 7.8m


fold 1 epoch 1 iter 1600/1726 loss 0.0009 elapsed 8.3m


fold 1 epoch 1 iter 1700/1726 loss 0.0009 elapsed 8.8m



=== VAL DIAG fold 1 epoch 1 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.013146,0.974764]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.03784180 mean_pred_per_img=131.462 empty_frac=0.000000 TP=98410 FP=3074567 FN=8362 f1@0.2=0.060011


fold 1 epoch 1 val micro-f1 0.49859 @ thr 0.450


fold 1 epoch 2 iter 100/1726 loss 0.0007 elapsed 10.8m


fold 1 epoch 2 iter 200/1726 loss 0.0007 elapsed 11.4m


fold 1 epoch 2 iter 300/1726 loss 0.0007 elapsed 11.9m


fold 1 epoch 2 iter 400/1726 loss 0.0007 elapsed 12.4m


fold 1 epoch 2 iter 500/1726 loss 0.0007 elapsed 12.9m


fold 1 epoch 2 iter 600/1726 loss 0.0007 elapsed 13.4m


fold 1 epoch 2 iter 700/1726 loss 0.0007 elapsed 13.9m


fold 1 epoch 2 iter 800/1726 loss 0.0007 elapsed 14.5m


fold 1 epoch 2 iter 900/1726 loss 0.0007 elapsed 15.0m


fold 1 epoch 2 iter 1000/1726 loss 0.0007 elapsed 15.5m


fold 1 epoch 2 iter 1100/1726 loss 0.0007 elapsed 16.0m


fold 1 epoch 2 iter 1200/1726 loss 0.0007 elapsed 16.5m


fold 1 epoch 2 iter 1300/1726 loss 0.0007 elapsed 17.0m


fold 1 epoch 2 iter 1400/1726 loss 0.0007 elapsed 17.6m


fold 1 epoch 2 iter 1500/1726 loss 0.0007 elapsed 18.1m


fold 1 epoch 2 iter 1600/1726 loss 0.0007 elapsed 18.6m


fold 1 epoch 2 iter 1700/1726 loss 0.0007 elapsed 19.1m



=== VAL DIAG fold 1 epoch 2 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.000502,0.997274]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.02895624 mean_pred_per_img=100.594 empty_frac=0.000000 TP=101933 FP=2326003 FN=4839 f1@0.2=0.080430


fold 1 epoch 2 val micro-f1 0.58535 @ thr 0.500


fold 1 epoch 3 iter 100/1726 loss 0.0006 elapsed 21.0m


fold 1 epoch 3 iter 200/1726 loss 0.0006 elapsed 21.6m


fold 1 epoch 3 iter 300/1726 loss 0.0006 elapsed 22.1m


fold 1 epoch 3 iter 400/1726 loss 0.0006 elapsed 22.6m


fold 1 epoch 3 iter 500/1726 loss 0.0006 elapsed 23.1m


fold 1 epoch 3 iter 600/1726 loss 0.0006 elapsed 23.6m


fold 1 epoch 3 iter 700/1726 loss 0.0006 elapsed 24.1m


fold 1 epoch 3 iter 800/1726 loss 0.0006 elapsed 24.7m


fold 1 epoch 3 iter 900/1726 loss 0.0006 elapsed 25.2m


fold 1 epoch 3 iter 1000/1726 loss 0.0006 elapsed 25.7m


fold 1 epoch 3 iter 1100/1726 loss 0.0006 elapsed 26.2m


fold 1 epoch 3 iter 1200/1726 loss 0.0006 elapsed 26.7m


fold 1 epoch 3 iter 1300/1726 loss 0.0006 elapsed 27.2m


fold 1 epoch 3 iter 1400/1726 loss 0.0006 elapsed 27.8m


fold 1 epoch 3 iter 1500/1726 loss 0.0006 elapsed 28.3m


fold 1 epoch 3 iter 1600/1726 loss 0.0006 elapsed 28.8m


fold 1 epoch 3 iter 1700/1726 loss 0.0006 elapsed 29.3m



=== VAL DIAG fold 1 epoch 3 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.000009,0.998926]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.02297969 mean_pred_per_img=79.831 empty_frac=0.000000 TP=101661 FP=1825151 FN=5111 f1@0.2=0.099982


fold 1 epoch 3 val micro-f1 0.60128 @ thr 0.500


fold 1 epoch 4 iter 100/1726 loss 0.0005 elapsed 31.2m


fold 1 epoch 4 iter 200/1726 loss 0.0005 elapsed 31.8m


fold 1 epoch 4 iter 300/1726 loss 0.0005 elapsed 32.3m


fold 1 epoch 4 iter 400/1726 loss 0.0005 elapsed 32.8m


fold 1 epoch 4 iter 500/1726 loss 0.0005 elapsed 33.3m


fold 1 epoch 4 iter 600/1726 loss 0.0005 elapsed 33.8m


fold 1 epoch 4 iter 700/1726 loss 0.0005 elapsed 34.3m


fold 1 epoch 4 iter 800/1726 loss 0.0005 elapsed 34.9m


fold 1 epoch 4 iter 900/1726 loss 0.0005 elapsed 35.4m


fold 1 epoch 4 iter 1000/1726 loss 0.0005 elapsed 35.9m


fold 1 epoch 4 iter 1100/1726 loss 0.0005 elapsed 36.4m


fold 1 epoch 4 iter 1200/1726 loss 0.0005 elapsed 36.9m


fold 1 epoch 4 iter 1300/1726 loss 0.0005 elapsed 37.4m


fold 1 epoch 4 iter 1400/1726 loss 0.0005 elapsed 38.0m


fold 1 epoch 4 iter 1500/1726 loss 0.0005 elapsed 38.5m


fold 1 epoch 4 iter 1600/1726 loss 0.0005 elapsed 39.0m


fold 1 epoch 4 iter 1700/1726 loss 0.0005 elapsed 39.5m



=== VAL DIAG fold 1 epoch 4 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.000002,0.999317]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.01898507 mean_pred_per_img=65.954 empty_frac=0.000000 TP=100748 FP=1491121 FN=6024 f1@0.2=0.118622


fold 1 epoch 4 val micro-f1 0.60766 @ thr 0.500


fold 1 epoch 5 iter 100/1726 loss 0.0004 elapsed 41.4m


fold 1 epoch 5 iter 200/1726 loss 0.0004 elapsed 42.0m


fold 1 epoch 5 iter 300/1726 loss 0.0004 elapsed 42.5m


fold 1 epoch 5 iter 400/1726 loss 0.0004 elapsed 43.0m


fold 1 epoch 5 iter 500/1726 loss 0.0004 elapsed 43.5m


fold 1 epoch 5 iter 600/1726 loss 0.0004 elapsed 44.0m


fold 1 epoch 5 iter 700/1726 loss 0.0004 elapsed 44.5m


fold 1 epoch 5 iter 800/1726 loss 0.0004 elapsed 45.0m


fold 1 epoch 5 iter 900/1726 loss 0.0004 elapsed 45.6m


fold 1 epoch 5 iter 1000/1726 loss nan elapsed 46.1m


fold 1 epoch 5 iter 1100/1726 loss nan elapsed 46.6m


fold 1 epoch 5 iter 1200/1726 loss nan elapsed 47.1m


fold 1 epoch 5 iter 1300/1726 loss nan elapsed 47.6m


fold 1 epoch 5 iter 1400/1726 loss nan elapsed 48.1m


fold 1 epoch 5 iter 1500/1726 loss nan elapsed 48.7m


fold 1 epoch 5 iter 1600/1726 loss nan elapsed 49.2m


fold 1 epoch 5 iter 1700/1726 loss nan elapsed 49.7m



=== VAL DIAG fold 1 epoch 5 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.000000,0.999402]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.01601625 mean_pred_per_img=55.640 empty_frac=0.000000 TP=99639 FP=1243299 FN=7133 f1@0.2=0.137461


fold 1 epoch 5 val micro-f1 0.60960 @ thr 0.500


fold 1 epoch 6 iter 100/1726 loss 0.0004 elapsed 51.6m


fold 1 epoch 6 iter 200/1726 loss 0.0004 elapsed 52.1m


fold 1 epoch 6 iter 300/1726 loss 0.0004 elapsed 52.7m


fold 1 epoch 6 iter 400/1726 loss 0.0004 elapsed 53.2m


fold 1 epoch 6 iter 500/1726 loss 0.0004 elapsed 53.7m


fold 1 epoch 6 iter 600/1726 loss 0.0004 elapsed 54.2m


fold 1 epoch 6 iter 700/1726 loss 0.0004 elapsed 54.7m


fold 1 epoch 6 iter 800/1726 loss 0.0004 elapsed 55.2m


fold 1 epoch 6 iter 900/1726 loss 0.0004 elapsed 55.8m


fold 1 epoch 6 iter 1000/1726 loss 0.0004 elapsed 56.3m


fold 1 epoch 6 iter 1100/1726 loss 0.0004 elapsed 56.8m


fold 1 epoch 6 iter 1200/1726 loss 0.0004 elapsed 57.3m


fold 1 epoch 6 iter 1300/1726 loss 0.0004 elapsed 57.8m


fold 1 epoch 6 iter 1400/1726 loss 0.0004 elapsed 58.3m


fold 1 epoch 6 iter 1500/1726 loss 0.0004 elapsed 58.9m


fold 1 epoch 6 iter 1600/1726 loss 0.0004 elapsed 59.4m


fold 1 epoch 6 iter 1700/1726 loss nan elapsed 59.9m



=== VAL DIAG fold 1 epoch 6 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.000001,0.999750]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.01383818 mean_pred_per_img=48.074 empty_frac=0.000000 TP=98359 FP=1061951 FN=8413 f1@0.2=0.155253


fold 1 epoch 6 val micro-f1 0.60846 @ thr 0.500


fold 1 epoch 7 iter 100/1726 loss 0.0003 elapsed 61.8m


fold 1 epoch 7 iter 200/1726 loss 0.0003 elapsed 62.3m


fold 1 epoch 7 iter 300/1726 loss 0.0003 elapsed 62.9m


fold 1 epoch 7 iter 400/1726 loss 0.0003 elapsed 63.4m


fold 1 epoch 7 iter 500/1726 loss 0.0003 elapsed 63.9m


fold 1 epoch 7 iter 600/1726 loss 0.0003 elapsed 64.4m


fold 1 epoch 7 iter 700/1726 loss 0.0003 elapsed 64.9m


fold 1 epoch 7 iter 800/1726 loss 0.0003 elapsed 65.4m


fold 1 epoch 7 iter 900/1726 loss 0.0003 elapsed 66.0m


fold 1 epoch 7 iter 1000/1726 loss 0.0003 elapsed 66.5m


fold 1 epoch 7 iter 1100/1726 loss 0.0003 elapsed 67.0m


fold 1 epoch 7 iter 1200/1726 loss 0.0003 elapsed 67.5m


fold 1 epoch 7 iter 1300/1726 loss 0.0003 elapsed 68.0m


fold 1 epoch 7 iter 1400/1726 loss 0.0003 elapsed 68.5m


fold 1 epoch 7 iter 1500/1726 loss 0.0003 elapsed 69.1m


fold 1 epoch 7 iter 1600/1726 loss 0.0003 elapsed 69.6m


fold 1 epoch 7 iter 1700/1726 loss 0.0003 elapsed 70.1m



=== VAL DIAG fold 1 epoch 7 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.000000,0.999760]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.01225629 mean_pred_per_img=42.578 empty_frac=0.000000 TP=97203 FP=930468 FN=9569 f1@0.2=0.171367


fold 1 epoch 7 val micro-f1 0.60656 @ thr 0.500


fold 1 epoch 8 iter 100/1726 loss 0.0003 elapsed 72.0m


fold 1 epoch 8 iter 200/1726 loss 0.0003 elapsed 72.5m


fold 1 epoch 8 iter 300/1726 loss 0.0003 elapsed 73.1m


fold 1 epoch 8 iter 400/1726 loss 0.0003 elapsed 73.6m


fold 1 epoch 8 iter 500/1726 loss 0.0003 elapsed 74.1m


fold 1 epoch 8 iter 600/1726 loss 0.0003 elapsed 74.6m


fold 1 epoch 8 iter 700/1726 loss 0.0003 elapsed 75.1m


fold 1 epoch 8 iter 800/1726 loss 0.0003 elapsed 75.6m


fold 1 epoch 8 iter 900/1726 loss 0.0003 elapsed 76.2m


fold 1 epoch 8 iter 1000/1726 loss 0.0003 elapsed 76.7m


fold 1 epoch 8 iter 1100/1726 loss nan elapsed 77.2m


fold 1 epoch 8 iter 1200/1726 loss nan elapsed 77.7m


fold 1 epoch 8 iter 1300/1726 loss nan elapsed 78.2m


fold 1 epoch 8 iter 1400/1726 loss nan elapsed 78.7m


fold 1 epoch 8 iter 1500/1726 loss nan elapsed 79.3m


fold 1 epoch 8 iter 1600/1726 loss nan elapsed 79.8m


fold 1 epoch 8 iter 1700/1726 loss nan elapsed 80.3m



=== VAL DIAG fold 1 epoch 8 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.000000,0.999833]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.01121518 mean_pred_per_img=38.962 empty_frac=0.000000 TP=96208 FP=844168 FN=10564 f1@0.2=0.183752


fold 1 epoch 8 val micro-f1 0.60493 @ thr 0.500
Early stopping at epoch 8


==== Fold 1 done: best_f1 0.60960 thr 0.500 ====
==== Fold 2 start ====


fold 2 epoch 1 iter 100/1725 loss 0.0014 elapsed 0.5m


fold 2 epoch 1 iter 200/1725 loss 0.0013 elapsed 1.0m


fold 2 epoch 1 iter 300/1725 loss 0.0013 elapsed 1.6m


fold 2 epoch 1 iter 400/1725 loss 0.0012 elapsed 2.1m


fold 2 epoch 1 iter 500/1725 loss 0.0012 elapsed 2.6m


fold 2 epoch 1 iter 600/1725 loss 0.0011 elapsed 3.1m


fold 2 epoch 1 iter 700/1725 loss 0.0011 elapsed 3.6m


fold 2 epoch 1 iter 800/1725 loss 0.0011 elapsed 4.2m


fold 2 epoch 1 iter 900/1725 loss 0.0010 elapsed 4.7m


fold 2 epoch 1 iter 1000/1725 loss 0.0010 elapsed 5.2m


fold 2 epoch 1 iter 1100/1725 loss 0.0010 elapsed 5.7m


fold 2 epoch 1 iter 1200/1725 loss 0.0010 elapsed 6.2m


fold 2 epoch 1 iter 1300/1725 loss 0.0010 elapsed 6.7m


fold 2 epoch 1 iter 1400/1725 loss 0.0010 elapsed 7.3m


fold 2 epoch 1 iter 1500/1725 loss 0.0009 elapsed 7.8m


fold 2 epoch 1 iter 1600/1725 loss 0.0009 elapsed 8.3m


fold 2 epoch 1 iter 1700/1725 loss 0.0009 elapsed 8.8m



=== VAL DIAG fold 2 epoch 1 ===
val_size=24188 probs_shape=(24188, 3474) tgts_shape=(24188, 3474)
probs_range=[0.011247,0.979846]
tgt_pos_rate=0.00127107 mean_pos_per_img=4.416


thr=0.2 pred_pos_rate=0.03796108 mean_pred_per_img=131.877 empty_frac=0.000000 TP=98509 FP=3091327 FN=8298 f1@0.2=0.059763


fold 2 epoch 1 val micro-f1 0.49709 @ thr 0.450


fold 2 epoch 2 iter 100/1725 loss 0.0007 elapsed 11.0m


fold 2 epoch 2 iter 200/1725 loss 0.0007 elapsed 11.5m


fold 2 epoch 2 iter 300/1725 loss 0.0007 elapsed 12.0m


fold 2 epoch 2 iter 400/1725 loss 0.0007 elapsed 12.5m


fold 2 epoch 2 iter 500/1725 loss 0.0007 elapsed 13.0m


fold 2 epoch 2 iter 600/1725 loss 0.0007 elapsed 13.6m


fold 2 epoch 2 iter 700/1725 loss 0.0007 elapsed 14.1m


fold 2 epoch 2 iter 800/1725 loss 0.0007 elapsed 14.6m


fold 2 epoch 2 iter 900/1725 loss 0.0007 elapsed 15.1m


fold 2 epoch 2 iter 1000/1725 loss 0.0007 elapsed 15.6m


fold 2 epoch 2 iter 1100/1725 loss 0.0007 elapsed 16.2m


fold 2 epoch 2 iter 1200/1725 loss 0.0007 elapsed 16.7m


fold 2 epoch 2 iter 1300/1725 loss 0.0007 elapsed 17.2m


fold 2 epoch 2 iter 1400/1725 loss 0.0007 elapsed 17.7m


fold 2 epoch 2 iter 1500/1725 loss nan elapsed 18.2m


fold 2 epoch 2 iter 1600/1725 loss nan elapsed 18.7m


fold 2 epoch 2 iter 1700/1725 loss nan elapsed 19.2m



=== VAL DIAG fold 2 epoch 2 ===
val_size=24188 probs_shape=(24188, 3474) tgts_shape=(24188, 3474)
probs_range=[0.000130,0.999524]
tgt_pos_rate=0.00127107 mean_pos_per_img=4.416


thr=0.2 pred_pos_rate=0.02886725 mean_pred_per_img=100.285 empty_frac=0.000000 TP=102021 FP=2323668 FN=4786 f1@0.2=0.080570


fold 2 epoch 2 val micro-f1 0.58535 @ thr 0.500


fold 2 epoch 3 iter 100/1725 loss 0.0006 elapsed 21.2m


fold 2 epoch 3 iter 200/1725 loss 0.0006 elapsed 21.7m


fold 2 epoch 3 iter 300/1725 loss 0.0006 elapsed 22.2m


fold 2 epoch 3 iter 400/1725 loss 0.0006 elapsed 22.7m


fold 2 epoch 3 iter 500/1725 loss 0.0006 elapsed 23.2m


fold 2 epoch 3 iter 600/1725 loss 0.0006 elapsed 23.8m


fold 2 epoch 3 iter 700/1725 loss 0.0006 elapsed 24.3m


fold 2 epoch 3 iter 800/1725 loss 0.0006 elapsed 24.8m


fold 2 epoch 3 iter 900/1725 loss 0.0006 elapsed 25.3m


fold 2 epoch 3 iter 1000/1725 loss nan elapsed 25.8m


fold 2 epoch 3 iter 1100/1725 loss nan elapsed 26.3m


fold 2 epoch 3 iter 1200/1725 loss nan elapsed 26.8m


fold 2 epoch 3 iter 1300/1725 loss nan elapsed 27.4m


fold 2 epoch 3 iter 1400/1725 loss nan elapsed 27.9m


fold 2 epoch 3 iter 1500/1725 loss nan elapsed 28.4m


fold 2 epoch 3 iter 1600/1725 loss nan elapsed 28.9m


fold 2 epoch 3 iter 1700/1725 loss nan elapsed 29.4m



=== VAL DIAG fold 2 epoch 3 ===
val_size=24188 probs_shape=(24188, 3474) tgts_shape=(24188, 3474)
probs_range=[0.000006,0.999459]
tgt_pos_rate=0.00127107 mean_pos_per_img=4.416


thr=0.2 pred_pos_rate=0.02314043 mean_pred_per_img=80.390 empty_frac=0.000000 TP=101736 FP=1842734 FN=5071 f1@0.2=0.099193


fold 2 epoch 3 val micro-f1 0.60188 @ thr 0.500


fold 2 epoch 4 iter 100/1725 loss 0.0005 elapsed 31.3m


fold 2 epoch 4 iter 200/1725 loss 0.0005 elapsed 31.8m


fold 2 epoch 4 iter 300/1725 loss 0.0005 elapsed 32.4m


fold 2 epoch 4 iter 400/1725 loss 0.0005 elapsed 32.9m


fold 2 epoch 4 iter 500/1725 loss 0.0005 elapsed 33.4m


fold 2 epoch 4 iter 600/1725 loss 0.0005 elapsed 33.9m


fold 2 epoch 4 iter 700/1725 loss 0.0005 elapsed 34.4m


fold 2 epoch 4 iter 800/1725 loss 0.0005 elapsed 34.9m


fold 2 epoch 4 iter 900/1725 loss 0.0005 elapsed 35.4m


fold 2 epoch 4 iter 1000/1725 loss 0.0005 elapsed 36.0m


fold 2 epoch 4 iter 1100/1725 loss 0.0005 elapsed 36.5m


fold 2 epoch 4 iter 1200/1725 loss 0.0005 elapsed 37.0m


In [22]:
# Queue the next production run: b3 at 448 px with the stabilized train.py (FP32 ASL + ColorJitter)
import os, sys, time, shlex, subprocess
from pathlib import Path

print('=== Launch: b3@448 5-fold, EMA+TTA, using train_folds_top512.csv ===', flush=True)
assert Path('train_folds_top512.csv').exists(), 'Missing train_folds_top512.csv'
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'tf_efficientnet_b3_ns',
    '--img-size', '448',
    '--epochs', '10',
    '--batch-size', '48',
    '--val-batch-size', '96',
    '--num-workers', '10',
    '--lr', '2e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '3',
    '--folds', '0,1,2,3,4',
    '--folds-csv', 'train_folds_top512.csv',
    '--out-dir', 'out_b3_448_top512',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
env = dict(os.environ)
env['PYTHONUNBUFFERED'] = '1'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
print(f'Exit code: {rc}, elapsed {(time.time()-t0)/3600:.2f} h', flush=True)
assert rc == 0, 'b3@448 production run failed'
print('b3@448 production run completed.')

=== Launch: b3@448 5-fold, EMA+TTA, using train_folds_top512.csv ===


Running: /usr/bin/python3.11 -u train.py --model tf_efficientnet_b3_ns --img-size 448 --epochs 10 --batch-size 48 --val-batch-size 96 --num-workers 10 --lr 2e-4 --use-ema --tta --early-stop-patience 3 --folds 0,1,2,3,4 --folds-csv train_folds_top512.csv --out-dir out_b3_448_top512 --pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):


fold 0 epoch 1 iter 100/2012 loss 0.0014 elapsed 1.0m


fold 0 epoch 1 iter 200/2012 loss 0.0014 elapsed 1.6m


fold 0 epoch 1 iter 300/2012 loss 0.0013 elapsed 2.2m


fold 0 epoch 1 iter 400/2012 loss 0.0013 elapsed 2.8m


fold 0 epoch 1 iter 500/2012 loss 0.0012 elapsed 3.4m


fold 0 epoch 1 iter 600/2012 loss 0.0012 elapsed 4.0m


fold 0 epoch 1 iter 700/2012 loss 0.0011 elapsed 4.6m


fold 0 epoch 1 iter 800/2012 loss 0.0011 elapsed 5.2m


fold 0 epoch 1 iter 900/2012 loss 0.0011 elapsed 5.8m


fold 0 epoch 1 iter 1000/2012 loss 0.0010 elapsed 6.4m


fold 0 epoch 1 iter 1100/2012 loss 0.0010 elapsed 7.0m


fold 0 epoch 1 iter 1200/2012 loss 0.0010 elapsed 7.6m


fold 0 epoch 1 iter 1300/2012 loss 0.0010 elapsed 8.2m


fold 0 epoch 1 iter 1400/2012 loss 0.0010 elapsed 8.8m


fold 0 epoch 1 iter 1500/2012 loss 0.0010 elapsed 9.4m


fold 0 epoch 1 iter 1600/2012 loss 0.0009 elapsed 10.0m


fold 0 epoch 1 iter 1700/2012 loss 0.0009 elapsed 10.6m


fold 0 epoch 1 iter 1800/2012 loss 0.0009 elapsed 11.2m


fold 0 epoch 1 iter 1900/2012 loss 0.0009 elapsed 11.8m


fold 0 epoch 1 iter 2000/2012 loss 0.0009 elapsed 12.4m



=== VAL DIAG fold 0 epoch 1 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.007554,0.988183]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.03720266 mean_pred_per_img=129.242 empty_frac=0.000000 TP=99421 FP=3026815 FN=7454 f1@0.2=0.061502


fold 0 epoch 1 val micro-f1 0.51705 @ thr 0.470


fold 0 epoch 2 iter 100/2012 loss 0.0007 elapsed 15.5m


fold 0 epoch 2 iter 200/2012 loss 0.0007 elapsed 16.0m


fold 0 epoch 2 iter 300/2012 loss 0.0007 elapsed 16.6m


fold 0 epoch 2 iter 400/2012 loss 0.0007 elapsed 17.2m


fold 0 epoch 2 iter 500/2012 loss 0.0007 elapsed 17.8m


fold 0 epoch 2 iter 600/2012 loss 0.0007 elapsed 18.4m


fold 0 epoch 2 iter 700/2012 loss 0.0007 elapsed 19.0m


fold 0 epoch 2 iter 800/2012 loss 0.0007 elapsed 19.6m


fold 0 epoch 2 iter 900/2012 loss 0.0007 elapsed 20.2m


fold 0 epoch 2 iter 1000/2012 loss 0.0007 elapsed 20.8m


fold 0 epoch 2 iter 1100/2012 loss 0.0007 elapsed 21.4m


fold 0 epoch 2 iter 1200/2012 loss 0.0007 elapsed 22.0m


fold 0 epoch 2 iter 1300/2012 loss 0.0007 elapsed 22.6m


fold 0 epoch 2 iter 1400/2012 loss 0.0007 elapsed 23.2m


fold 0 epoch 2 iter 1500/2012 loss 0.0007 elapsed 23.8m


fold 0 epoch 2 iter 1600/2012 loss 0.0007 elapsed 24.4m


fold 0 epoch 2 iter 1700/2012 loss 0.0007 elapsed 25.0m


fold 0 epoch 2 iter 1800/2012 loss 0.0007 elapsed 25.6m


fold 0 epoch 2 iter 1900/2012 loss 0.0007 elapsed 26.2m


fold 0 epoch 2 iter 2000/2012 loss 0.0007 elapsed 26.8m



=== VAL DIAG fold 0 epoch 2 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000062,0.999027]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.02782863 mean_pred_per_img=96.677 empty_frac=0.000000 TP=102001 FP=2236511 FN=4874 f1@0.2=0.083423


fold 0 epoch 2 val micro-f1 0.58784 @ thr 0.500


fold 0 epoch 3 iter 100/2012 loss 0.0006 elapsed 29.1m


fold 0 epoch 3 iter 200/2012 loss 0.0006 elapsed 29.7m


fold 0 epoch 3 iter 300/2012 loss 0.0006 elapsed 30.3m


fold 0 epoch 3 iter 400/2012 loss 0.0006 elapsed 30.9m


fold 0 epoch 3 iter 500/2012 loss 0.0006 elapsed 31.5m


fold 0 epoch 3 iter 600/2012 loss 0.0006 elapsed 32.1m


fold 0 epoch 3 iter 700/2012 loss 0.0006 elapsed 32.7m


fold 0 epoch 3 iter 800/2012 loss 0.0006 elapsed 33.3m


fold 0 epoch 3 iter 900/2012 loss 0.0006 elapsed 33.9m


fold 0 epoch 3 iter 1000/2012 loss 0.0006 elapsed 34.5m


fold 0 epoch 3 iter 1100/2012 loss 0.0006 elapsed 35.1m


fold 0 epoch 3 iter 1200/2012 loss 0.0006 elapsed 35.7m


fold 0 epoch 3 iter 1300/2012 loss 0.0006 elapsed 36.3m


fold 0 epoch 3 iter 1400/2012 loss 0.0006 elapsed 36.9m


fold 0 epoch 3 iter 1500/2012 loss 0.0006 elapsed 37.5m


fold 0 epoch 3 iter 1600/2012 loss 0.0006 elapsed 38.1m


fold 0 epoch 3 iter 1700/2012 loss 0.0006 elapsed 38.7m


fold 0 epoch 3 iter 1800/2012 loss 0.0006 elapsed 39.3m


fold 0 epoch 3 iter 1900/2012 loss 0.0006 elapsed 39.9m


fold 0 epoch 3 iter 2000/2012 loss 0.0006 elapsed 40.5m



=== VAL DIAG fold 0 epoch 3 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000003,0.999498]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.02209133 mean_pred_per_img=76.745 empty_frac=0.000000 TP=101618 FP=1754774 FN=5257 f1@0.2=0.103519


fold 0 epoch 3 val micro-f1 0.60401 @ thr 0.500


fold 0 epoch 4 iter 100/2012 loss 0.0005 elapsed 42.8m


fold 0 epoch 4 iter 200/2012 loss 0.0005 elapsed 43.4m


fold 0 epoch 4 iter 300/2012 loss 0.0005 elapsed 44.0m


fold 0 epoch 4 iter 400/2012 loss 0.0005 elapsed 44.6m


fold 0 epoch 4 iter 500/2012 loss 0.0005 elapsed 45.2m


fold 0 epoch 4 iter 600/2012 loss 0.0005 elapsed 45.8m


fold 0 epoch 4 iter 700/2012 loss 0.0005 elapsed 46.4m


fold 0 epoch 4 iter 800/2012 loss 0.0005 elapsed 47.0m


fold 0 epoch 4 iter 900/2012 loss 0.0005 elapsed 47.6m


fold 0 epoch 4 iter 1000/2012 loss 0.0005 elapsed 48.2m


fold 0 epoch 4 iter 1100/2012 loss 0.0005 elapsed 48.8m


fold 0 epoch 4 iter 1200/2012 loss 0.0005 elapsed 49.4m


fold 0 epoch 4 iter 1300/2012 loss 0.0005 elapsed 50.0m


fold 0 epoch 4 iter 1400/2012 loss 0.0005 elapsed 50.6m


fold 0 epoch 4 iter 1500/2012 loss 0.0005 elapsed 51.2m


fold 0 epoch 4 iter 1600/2012 loss 0.0005 elapsed 51.8m


fold 0 epoch 4 iter 1700/2012 loss 0.0005 elapsed 52.4m


fold 0 epoch 4 iter 1800/2012 loss 0.0005 elapsed 53.0m


fold 0 epoch 4 iter 1900/2012 loss 0.0005 elapsed 53.6m


fold 0 epoch 4 iter 2000/2012 loss 0.0005 elapsed 54.2m



=== VAL DIAG fold 0 epoch 4 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999646]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01831340 mean_pred_per_img=63.621 empty_frac=0.000000 TP=100738 FP=1438184 FN=6137 f1@0.2=0.122418


fold 0 epoch 4 val micro-f1 0.61032 @ thr 0.500


fold 0 epoch 5 iter 100/2012 loss 0.0004 elapsed 56.5m


fold 0 epoch 5 iter 200/2012 loss 0.0004 elapsed 57.1m


fold 0 epoch 5 iter 300/2012 loss 0.0004 elapsed 57.7m


fold 0 epoch 5 iter 400/2012 loss 0.0004 elapsed 58.3m


fold 0 epoch 5 iter 500/2012 loss 0.0004 elapsed 58.9m


fold 0 epoch 5 iter 600/2012 loss 0.0004 elapsed 59.5m


fold 0 epoch 5 iter 700/2012 loss 0.0004 elapsed 60.1m


fold 0 epoch 5 iter 800/2012 loss 0.0004 elapsed 60.7m


fold 0 epoch 5 iter 900/2012 loss 0.0004 elapsed 61.3m


fold 0 epoch 5 iter 1000/2012 loss 0.0004 elapsed 61.9m


fold 0 epoch 5 iter 1100/2012 loss 0.0004 elapsed 62.5m


fold 0 epoch 5 iter 1200/2012 loss 0.0004 elapsed 63.1m


fold 0 epoch 5 iter 1300/2012 loss 0.0004 elapsed 63.7m


fold 0 epoch 5 iter 1400/2012 loss 0.0004 elapsed 64.3m


fold 0 epoch 5 iter 1500/2012 loss 0.0004 elapsed 64.9m


fold 0 epoch 5 iter 1600/2012 loss 0.0004 elapsed 65.5m


fold 0 epoch 5 iter 1700/2012 loss 0.0004 elapsed 66.1m


fold 0 epoch 5 iter 1800/2012 loss 0.0004 elapsed 66.7m


fold 0 epoch 5 iter 1900/2012 loss 0.0004 elapsed 67.3m


fold 0 epoch 5 iter 2000/2012 loss 0.0004 elapsed 67.9m



=== VAL DIAG fold 0 epoch 5 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999792]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01557610 mean_pred_per_img=54.111 empty_frac=0.000000 TP=99671 FP=1209229 FN=7204 f1@0.2=0.140801


fold 0 epoch 5 val micro-f1 0.61259 @ thr 0.500


fold 0 epoch 6 iter 100/2012 loss 0.0004 elapsed 70.2m


fold 0 epoch 6 iter 200/2012 loss 0.0004 elapsed 70.8m


fold 0 epoch 6 iter 300/2012 loss 0.0004 elapsed 71.4m


fold 0 epoch 6 iter 400/2012 loss 0.0004 elapsed 72.0m


fold 0 epoch 6 iter 500/2012 loss 0.0004 elapsed 72.6m


fold 0 epoch 6 iter 600/2012 loss 0.0004 elapsed 73.2m


fold 0 epoch 6 iter 700/2012 loss 0.0004 elapsed 73.8m


fold 0 epoch 6 iter 800/2012 loss 0.0004 elapsed 74.4m


fold 0 epoch 6 iter 900/2012 loss 0.0004 elapsed 75.0m


fold 0 epoch 6 iter 1000/2012 loss 0.0004 elapsed 75.6m


fold 0 epoch 6 iter 1100/2012 loss 0.0004 elapsed 76.2m


fold 0 epoch 6 iter 1200/2012 loss 0.0004 elapsed 76.8m


fold 0 epoch 6 iter 1300/2012 loss 0.0004 elapsed 77.4m


fold 0 epoch 6 iter 1400/2012 loss 0.0004 elapsed 78.0m


fold 0 epoch 6 iter 1500/2012 loss 0.0004 elapsed 78.6m


fold 0 epoch 6 iter 1600/2012 loss 0.0004 elapsed 79.2m


fold 0 epoch 6 iter 1700/2012 loss 0.0004 elapsed 79.8m


fold 0 epoch 6 iter 1800/2012 loss 0.0004 elapsed 80.4m


fold 0 epoch 6 iter 1900/2012 loss 0.0004 elapsed 81.0m


fold 0 epoch 6 iter 2000/2012 loss 0.0004 elapsed 81.6m



=== VAL DIAG fold 0 epoch 6 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999938]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01347444 mean_pred_per_img=46.810 empty_frac=0.000000 TP=98498 FP=1033794 FN=8377 f1@0.2=0.158975


fold 0 epoch 6 val micro-f1 0.61262 @ thr 0.500


fold 0 epoch 7 iter 100/2012 loss 0.0003 elapsed 83.9m


fold 0 epoch 7 iter 200/2012 loss 0.0003 elapsed 84.5m


fold 0 epoch 7 iter 300/2012 loss 0.0003 elapsed 85.1m


fold 0 epoch 7 iter 400/2012 loss 0.0003 elapsed 85.7m


fold 0 epoch 7 iter 500/2012 loss 0.0003 elapsed 86.3m


fold 0 epoch 7 iter 600/2012 loss 0.0003 elapsed 86.9m


fold 0 epoch 7 iter 700/2012 loss 0.0003 elapsed 87.5m


fold 0 epoch 7 iter 800/2012 loss 0.0003 elapsed 88.1m


fold 0 epoch 7 iter 900/2012 loss 0.0003 elapsed 88.7m


fold 0 epoch 7 iter 1000/2012 loss 0.0003 elapsed 89.3m


fold 0 epoch 7 iter 1100/2012 loss 0.0003 elapsed 89.9m


fold 0 epoch 7 iter 1200/2012 loss 0.0003 elapsed 90.5m


fold 0 epoch 7 iter 1300/2012 loss 0.0003 elapsed 91.1m


fold 0 epoch 7 iter 1400/2012 loss 0.0003 elapsed 91.7m


fold 0 epoch 7 iter 1500/2012 loss 0.0003 elapsed 92.3m


fold 0 epoch 7 iter 1600/2012 loss 0.0003 elapsed 92.9m


fold 0 epoch 7 iter 1700/2012 loss 0.0003 elapsed 93.5m


fold 0 epoch 7 iter 1800/2012 loss 0.0003 elapsed 94.1m


fold 0 epoch 7 iter 1900/2012 loss 0.0003 elapsed 94.7m


fold 0 epoch 7 iter 2000/2012 loss 0.0003 elapsed 95.3m



=== VAL DIAG fold 0 epoch 7 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999929]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01193456 mean_pred_per_img=41.461 empty_frac=0.000000 TP=97377 FP=905515 FN=9498 f1@0.2=0.175491


fold 0 epoch 7 val micro-f1 0.61105 @ thr 0.500


fold 0 epoch 8 iter 100/2012 loss 0.0003 elapsed 97.6m


fold 0 epoch 8 iter 200/2012 loss 0.0003 elapsed 98.2m


fold 0 epoch 8 iter 300/2012 loss 0.0003 elapsed 98.8m


fold 0 epoch 8 iter 400/2012 loss 0.0003 elapsed 99.4m


fold 0 epoch 8 iter 500/2012 loss 0.0003 elapsed 100.0m


fold 0 epoch 8 iter 600/2012 loss 0.0003 elapsed 100.6m


fold 0 epoch 8 iter 700/2012 loss 0.0003 elapsed 101.2m


fold 0 epoch 8 iter 800/2012 loss 0.0003 elapsed 101.8m


fold 0 epoch 8 iter 900/2012 loss 0.0003 elapsed 102.4m


fold 0 epoch 8 iter 1000/2012 loss 0.0003 elapsed 103.0m


fold 0 epoch 8 iter 1100/2012 loss 0.0003 elapsed 103.6m


fold 0 epoch 8 iter 1200/2012 loss 0.0003 elapsed 104.2m


fold 0 epoch 8 iter 1300/2012 loss 0.0003 elapsed 104.8m


fold 0 epoch 8 iter 1400/2012 loss 0.0003 elapsed 105.4m


fold 0 epoch 8 iter 1500/2012 loss 0.0003 elapsed 106.0m


fold 0 epoch 8 iter 1600/2012 loss 0.0003 elapsed 106.6m


fold 0 epoch 8 iter 1700/2012 loss 0.0003 elapsed 107.2m


fold 0 epoch 8 iter 1800/2012 loss 0.0003 elapsed 107.8m


fold 0 epoch 8 iter 1900/2012 loss 0.0003 elapsed 108.4m


fold 0 epoch 8 iter 2000/2012 loss 0.0003 elapsed 109.0m



=== VAL DIAG fold 0 epoch 8 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999938]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01087303 mean_pred_per_img=37.773 empty_frac=0.000000 TP=96436 FP=817253 FN=10439 f1@0.2=0.188986


fold 0 epoch 8 val micro-f1 0.61021 @ thr 0.500


fold 0 epoch 9 iter 100/2012 loss 0.0003 elapsed 111.3m


fold 0 epoch 9 iter 200/2012 loss 0.0003 elapsed 111.9m


fold 0 epoch 9 iter 300/2012 loss 0.0003 elapsed 112.5m


fold 0 epoch 9 iter 400/2012 loss 0.0003 elapsed 113.1m


fold 0 epoch 9 iter 500/2012 loss 0.0003 elapsed 113.7m


fold 0 epoch 9 iter 600/2012 loss 0.0003 elapsed 114.3m


fold 0 epoch 9 iter 700/2012 loss 0.0003 elapsed 114.9m


fold 0 epoch 9 iter 800/2012 loss 0.0003 elapsed 115.5m


fold 0 epoch 9 iter 900/2012 loss 0.0003 elapsed 116.1m


fold 0 epoch 9 iter 1000/2012 loss 0.0003 elapsed 116.7m


fold 0 epoch 9 iter 1100/2012 loss 0.0003 elapsed 117.3m


fold 0 epoch 9 iter 1200/2012 loss 0.0003 elapsed 117.9m


fold 0 epoch 9 iter 1300/2012 loss 0.0003 elapsed 118.5m


fold 0 epoch 9 iter 1400/2012 loss 0.0003 elapsed 119.1m


fold 0 epoch 9 iter 1500/2012 loss 0.0003 elapsed 119.7m


fold 0 epoch 9 iter 1600/2012 loss 0.0003 elapsed 120.3m


fold 0 epoch 9 iter 1700/2012 loss 0.0003 elapsed 120.9m


fold 0 epoch 9 iter 1800/2012 loss 0.0003 elapsed 121.5m


fold 0 epoch 9 iter 1900/2012 loss 0.0003 elapsed 122.1m


fold 0 epoch 9 iter 2000/2012 loss 0.0003 elapsed 122.7m



=== VAL DIAG fold 0 epoch 9 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999965]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01032592 mean_pred_per_img=35.872 empty_frac=0.000000 TP=95906 FP=771808 FN=10969 f1@0.2=0.196813


fold 0 epoch 9 val micro-f1 0.60977 @ thr 0.500
Early stopping at epoch 9


==== Fold 0 done: best_f1 0.61262 thr 0.500 ====
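The fold 0 log above is consistent with patience-based early stopping: validation micro-F1 peaks at epoch 6 and the run stops three epochs later. Below is a minimal sketch of such a rule, replaying the logged scores. `early_stop_epoch` is a hypothetical helper, and the strict-improvement comparison is an assumption, since the logic inside train.py is not shown here.

```python
def early_stop_epoch(scores, patience):
    """Replay per-epoch validation scores under a patience rule.

    Returns (best_epoch, best_score, stop_epoch) with 1-based epochs.
    stop_epoch is None if patience never runs out.
    Assumption: only a strict improvement resets the patience counter.
    """
    best_score = float('-inf')
    best_epoch = 0
    bad_epochs = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, best_score, epoch
    return best_epoch, best_score, None


# Validation micro-F1 per epoch, copied from the fold 0 log above.
fold0_f1 = [0.51705, 0.58784, 0.60401, 0.61032, 0.61259,
            0.61262, 0.61105, 0.61021, 0.60977]
result = early_stop_epoch(fold0_f1, patience=3)
print(result)
```

Under these assumptions the replay returns epoch 6 as best and epoch 9 as the stopping point, matching the `Early stopping at epoch 9` and `best_f1 0.61262` lines in the log.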
==== Fold 1 start ====


fold 1 epoch 1 iter 100/2013 loss 0.0014 elapsed 0.6m


fold 1 epoch 1 iter 200/2013 loss 0.0014 elapsed 1.2m


fold 1 epoch 1 iter 300/2013 loss 0.0013 elapsed 1.8m


fold 1 epoch 1 iter 400/2013 loss 0.0013 elapsed 2.4m


fold 1 epoch 1 iter 500/2013 loss 0.0012 elapsed 3.0m


fold 1 epoch 1 iter 600/2013 loss 0.0012 elapsed 3.6m


fold 1 epoch 1 iter 700/2013 loss 0.0011 elapsed 4.2m


fold 1 epoch 1 iter 800/2013 loss 0.0011 elapsed 4.8m


fold 1 epoch 1 iter 900/2013 loss 0.0011 elapsed 5.4m


fold 1 epoch 1 iter 1000/2013 loss 0.0010 elapsed 6.0m


fold 1 epoch 1 iter 1100/2013 loss 0.0010 elapsed 6.6m


fold 1 epoch 1 iter 1200/2013 loss 0.0010 elapsed 7.2m


fold 1 epoch 1 iter 1300/2013 loss 0.0010 elapsed 7.8m


fold 1 epoch 1 iter 1400/2013 loss 0.0010 elapsed 8.4m


fold 1 epoch 1 iter 1500/2013 loss 0.0010 elapsed 9.0m


fold 1 epoch 1 iter 1600/2013 loss 0.0010 elapsed 9.6m


fold 1 epoch 1 iter 1700/2013 loss 0.0009 elapsed 10.2m


fold 1 epoch 1 iter 1800/2013 loss 0.0009 elapsed 10.8m


fold 1 epoch 1 iter 1900/2013 loss 0.0009 elapsed 11.4m


fold 1 epoch 1 iter 2000/2013 loss 0.0009 elapsed 12.0m



=== VAL DIAG fold 1 epoch 1 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.009203,0.986488]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.03718645 mean_pred_per_img=129.186 empty_frac=0.000000 TP=99500 FP=3018527 FN=7272 f1@0.2=0.061709


fold 1 epoch 1 val micro-f1 0.51974 @ thr 0.470


fold 1 epoch 2 iter 100/2013 loss 0.0007 elapsed 14.5m


fold 1 epoch 2 iter 200/2013 loss 0.0007 elapsed 15.1m


fold 1 epoch 2 iter 300/2013 loss 0.0007 elapsed 15.7m


fold 1 epoch 2 iter 400/2013 loss 0.0007 elapsed 16.3m


fold 1 epoch 2 iter 500/2013 loss 0.0007 elapsed 16.9m


fold 1 epoch 2 iter 600/2013 loss 0.0007 elapsed 17.5m


fold 1 epoch 2 iter 700/2013 loss 0.0007 elapsed 18.0m


fold 1 epoch 2 iter 800/2013 loss 0.0007 elapsed 18.6m


fold 1 epoch 2 iter 900/2013 loss 0.0007 elapsed 19.2m


fold 1 epoch 2 iter 1000/2013 loss 0.0007 elapsed 19.8m


fold 1 epoch 2 iter 1100/2013 loss 0.0007 elapsed 20.4m


fold 1 epoch 2 iter 1200/2013 loss 0.0007 elapsed 21.0m


fold 1 epoch 2 iter 1300/2013 loss 0.0007 elapsed 21.6m


fold 1 epoch 2 iter 1400/2013 loss 0.0007 elapsed 22.2m


fold 1 epoch 2 iter 1500/2013 loss 0.0007 elapsed 22.8m


fold 1 epoch 2 iter 1600/2013 loss 0.0007 elapsed 23.4m


fold 1 epoch 2 iter 1700/2013 loss 0.0007 elapsed 24.0m


fold 1 epoch 2 iter 1800/2013 loss 0.0007 elapsed 24.6m


fold 1 epoch 2 iter 1900/2013 loss 0.0007 elapsed 25.2m


fold 1 epoch 2 iter 2000/2013 loss 0.0007 elapsed 25.8m



=== VAL DIAG fold 1 epoch 2 ===
val_size=24136 probs_shape=(24136, 3474) tgts_shape=(24136, 3474)
probs_range=[0.000067,0.998448]
tgt_pos_rate=0.00127339 mean_pos_per_img=4.424


thr=0.2 pred_pos_rate=0.02769274 mean_pred_per_img=96.205 empty_frac=0.000000 TP=102042 FP=2219952 FN=4730 f1@0.2=0.084028


fold 1 epoch 2 val micro-f1 0.58936 @ thr 0.500


fold 1 epoch 3 iter 100/2013 loss 0.0006 elapsed 28.2m


fold 1 epoch 3 iter 200/2013 loss 0.0006 elapsed 28.8m
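The `f1@0.2` values in the VAL DIAG lines above can be cross-checked from the logged TP/FP/FN counts with the usual micro-F1 identity, F1 = 2TP / (2TP + FP + FN). The sketch below does that for two fold 0 epochs. `micro_f1_from_counts` is a hypothetical helper written for this check and is not taken from train.py.

```python
def micro_f1_from_counts(tp, fp, fn):
    """Micro-averaged F1 from pooled true/false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# (TP, FP, FN, logged f1@0.2) copied from the b3 fold 0 VAL DIAG lines.
logged = [
    (99421, 3026815, 7454, 0.061502),   # epoch 1
    (95906, 771808, 10969, 0.196813),   # epoch 9
]

for tp, fp, fn, f1_logged in logged:
    f1 = micro_f1_from_counts(tp, fp, fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(f'f1={f1:.6f} logged={f1_logged:.6f} '
          f'precision={precision:.4f} recall={recall:.4f}')
```

The precision and recall printout also shows why the fixed 0.2 threshold scores so poorly: recall is high but precision is only a few percent, which is why the swept threshold near 0.5 gives a much higher micro-F1.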


In [25]:
import os, sys, time, shlex, subprocess
from pathlib import Path

print('=== Launch: b4@448 5-fold, EMA+TTA, using train_folds_top512.csv ===', flush=True)
assert Path('train_folds_top512.csv').exists(), 'Missing train_folds_top512.csv'
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'tf_efficientnet_b4_ns',
    '--img-size', '448',
    '--epochs', '12',
    '--batch-size', '32',
    '--val-batch-size', '64',
    '--num-workers', '10',
    '--lr', '5e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '3',
    '--folds', '0,1,2,3,4',
    '--folds-csv', 'train_folds_top512.csv',
    '--out-dir', 'out_b4_448_top512',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
# Force unbuffered output in the child so log lines arrive as they are written.
env = dict(os.environ)
env['PYTHONUNBUFFERED'] = '1'
# Merge stderr into stdout and read line-buffered text to stream progress into the notebook.
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    # Always reap the child, even if streaming is interrupted.
    rc = p.wait()
print(f'Exit code: {rc}, elapsed {(time.time()-t0)/3600:.2f} h', flush=True)
assert rc == 0, 'b4@448 production run failed'
print('b4@448 production run completed.')

=== Launch: b4@448 5-fold, EMA+TTA, using train_folds_top512.csv ===


Running: /usr/bin/python3.11 -u train.py --model tf_efficientnet_b4_ns --img-size 448 --epochs 12 --batch-size 32 --val-batch-size 64 --num-workers 10 --lr 5e-4 --use-ema --tta --early-stop-patience 3 --folds 0,1,2,3,4 --folds-csv train_folds_top512.csv --out-dir out_b4_448_top512 --pretrained


Detected image extension: .png
==== Fold 0 start ====


fold 0 epoch 1 iter 100/3019 loss 0.0014 elapsed 0.9m


fold 0 epoch 1 iter 200/3019 loss 0.0013 elapsed 1.4m


fold 0 epoch 1 iter 300/3019 loss 0.0013 elapsed 1.9m


fold 0 epoch 1 iter 400/3019 loss 0.0012 elapsed 2.4m


fold 0 epoch 1 iter 500/3019 loss 0.0012 elapsed 3.0m


fold 0 epoch 1 iter 600/3019 loss 0.0011 elapsed 3.5m


fold 0 epoch 1 iter 700/3019 loss 0.0011 elapsed 4.0m


fold 0 epoch 1 iter 800/3019 loss 0.0011 elapsed 4.5m


fold 0 epoch 1 iter 900/3019 loss 0.0011 elapsed 5.1m


fold 0 epoch 1 iter 1000/3019 loss 0.0010 elapsed 5.6m


fold 0 epoch 1 iter 1100/3019 loss 0.0010 elapsed 6.1m


fold 0 epoch 1 iter 1200/3019 loss 0.0010 elapsed 6.7m


fold 0 epoch 1 iter 1300/3019 loss 0.0010 elapsed 7.2m


fold 0 epoch 1 iter 1400/3019 loss 0.0010 elapsed 7.7m


fold 0 epoch 1 iter 1500/3019 loss 0.0010 elapsed 8.3m


fold 0 epoch 1 iter 1600/3019 loss 0.0010 elapsed 8.8m


fold 0 epoch 1 iter 1700/3019 loss 0.0010 elapsed 9.3m


fold 0 epoch 1 iter 1800/3019 loss 0.0009 elapsed 9.8m


fold 0 epoch 1 iter 1900/3019 loss 0.0009 elapsed 10.4m


fold 0 epoch 1 iter 2000/3019 loss 0.0009 elapsed 10.9m


fold 0 epoch 1 iter 2100/3019 loss 0.0009 elapsed 11.4m


fold 0 epoch 1 iter 2200/3019 loss 0.0009 elapsed 12.0m


fold 0 epoch 1 iter 2300/3019 loss 0.0009 elapsed 12.5m


fold 0 epoch 1 iter 2400/3019 loss 0.0009 elapsed 13.0m


fold 0 epoch 1 iter 2500/3019 loss 0.0009 elapsed 13.6m


fold 0 epoch 1 iter 2600/3019 loss 0.0009 elapsed 14.1m


fold 0 epoch 1 iter 2700/3019 loss 0.0009 elapsed 14.6m


fold 0 epoch 1 iter 2800/3019 loss 0.0009 elapsed 15.2m


fold 0 epoch 1 iter 2900/3019 loss 0.0009 elapsed 15.7m


fold 0 epoch 1 iter 3000/3019 loss 0.0009 elapsed 16.2m



=== VAL DIAG fold 0 epoch 1 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.020371,0.734199]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.04986531 mean_pred_per_img=173.232 empty_frac=0.000000 TP=94532 FP=4095779 FN=12343 f1@0.2=0.043997


fold 0 epoch 1 val micro-f1 0.40535 @ thr 0.420


fold 0 epoch 2 iter 100/3019 loss 0.0007 elapsed 19.5m


fold 0 epoch 2 iter 200/3019 loss 0.0007 elapsed 20.0m


fold 0 epoch 2 iter 300/3019 loss 0.0007 elapsed 20.6m


fold 0 epoch 2 iter 400/3019 loss 0.0007 elapsed 21.1m


fold 0 epoch 2 iter 500/3019 loss 0.0007 elapsed 21.6m


fold 0 epoch 2 iter 600/3019 loss 0.0007 elapsed 22.1m


fold 0 epoch 2 iter 700/3019 loss 0.0007 elapsed 22.7m


fold 0 epoch 2 iter 800/3019 loss 0.0007 elapsed 23.2m


fold 0 epoch 2 iter 900/3019 loss 0.0007 elapsed 23.7m


fold 0 epoch 2 iter 1000/3019 loss 0.0007 elapsed 24.3m


fold 0 epoch 2 iter 1100/3019 loss 0.0007 elapsed 24.8m


fold 0 epoch 2 iter 1200/3019 loss 0.0007 elapsed 25.3m


fold 0 epoch 2 iter 1300/3019 loss 0.0007 elapsed 25.9m


fold 0 epoch 2 iter 1400/3019 loss 0.0007 elapsed 26.4m


fold 0 epoch 2 iter 1500/3019 loss 0.0007 elapsed 26.9m


fold 0 epoch 2 iter 1600/3019 loss 0.0007 elapsed 27.5m


fold 0 epoch 2 iter 1700/3019 loss 0.0007 elapsed 28.0m


fold 0 epoch 2 iter 1800/3019 loss 0.0007 elapsed 28.5m


fold 0 epoch 2 iter 1900/3019 loss 0.0007 elapsed 29.1m


fold 0 epoch 2 iter 2000/3019 loss 0.0007 elapsed 29.6m


fold 0 epoch 2 iter 2100/3019 loss 0.0007 elapsed 30.1m


fold 0 epoch 2 iter 2200/3019 loss 0.0007 elapsed 30.7m


fold 0 epoch 2 iter 2300/3019 loss 0.0007 elapsed 31.2m


fold 0 epoch 2 iter 2400/3019 loss 0.0007 elapsed 31.7m


fold 0 epoch 2 iter 2500/3019 loss 0.0007 elapsed 32.3m


fold 0 epoch 2 iter 2600/3019 loss 0.0007 elapsed 32.8m


fold 0 epoch 2 iter 2700/3019 loss 0.0007 elapsed 33.3m


fold 0 epoch 2 iter 2800/3019 loss 0.0007 elapsed 33.9m


fold 0 epoch 2 iter 2900/3019 loss 0.0007 elapsed 34.4m


fold 0 epoch 2 iter 3000/3019 loss 0.0007 elapsed 34.9m



=== VAL DIAG fold 0 epoch 2 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.005754,0.988331]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.06444909 mean_pred_per_img=223.896 empty_frac=0.000000 TP=103180 FP=5312644 FN=3695 f1@0.2=0.037366


fold 0 epoch 2 val micro-f1 0.55272 @ thr 0.500


fold 0 epoch 3 iter 100/3019 loss 0.0006 elapsed 37.7m


fold 0 epoch 3 iter 200/3019 loss 0.0006 elapsed 38.2m


fold 0 epoch 3 iter 300/3019 loss 0.0006 elapsed 38.7m


fold 0 epoch 3 iter 400/3019 loss 0.0006 elapsed 39.3m


fold 0 epoch 3 iter 500/3019 loss 0.0006 elapsed 39.8m


fold 0 epoch 3 iter 600/3019 loss 0.0006 elapsed 40.3m


fold 0 epoch 3 iter 700/3019 loss 0.0006 elapsed 40.9m


fold 0 epoch 3 iter 800/3019 loss 0.0006 elapsed 41.4m


fold 0 epoch 3 iter 900/3019 loss 0.0006 elapsed 41.9m


fold 0 epoch 3 iter 1000/3019 loss 0.0006 elapsed 42.5m


fold 0 epoch 3 iter 1100/3019 loss 0.0006 elapsed 43.0m


fold 0 epoch 3 iter 1200/3019 loss 0.0006 elapsed 43.5m


fold 0 epoch 3 iter 1300/3019 loss 0.0006 elapsed 44.1m


fold 0 epoch 3 iter 1400/3019 loss 0.0006 elapsed 44.6m


fold 0 epoch 3 iter 1500/3019 loss 0.0006 elapsed 45.1m


fold 0 epoch 3 iter 1600/3019 loss 0.0006 elapsed 45.7m


fold 0 epoch 3 iter 1700/3019 loss 0.0006 elapsed 46.2m


fold 0 epoch 3 iter 1800/3019 loss 0.0006 elapsed 46.7m


fold 0 epoch 3 iter 1900/3019 loss 0.0006 elapsed 47.3m


fold 0 epoch 3 iter 2000/3019 loss 0.0006 elapsed 47.8m


fold 0 epoch 3 iter 2100/3019 loss 0.0006 elapsed 48.3m


fold 0 epoch 3 iter 2200/3019 loss 0.0006 elapsed 48.8m


fold 0 epoch 3 iter 2300/3019 loss 0.0006 elapsed 49.4m


fold 0 epoch 3 iter 2400/3019 loss 0.0006 elapsed 49.9m


fold 0 epoch 3 iter 2500/3019 loss 0.0006 elapsed 50.4m


fold 0 epoch 3 iter 2600/3019 loss 0.0006 elapsed 51.0m


fold 0 epoch 3 iter 2700/3019 loss 0.0006 elapsed 51.5m


fold 0 epoch 3 iter 2800/3019 loss 0.0006 elapsed 52.0m


fold 0 epoch 3 iter 2900/3019 loss 0.0006 elapsed 52.6m


fold 0 epoch 3 iter 3000/3019 loss 0.0006 elapsed 53.1m



=== VAL DIAG fold 0 epoch 3 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000200,0.999089]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.05399261 mean_pred_per_img=187.570 empty_frac=0.000000 TP=104182 FP=4432957 FN=2693 f1@0.2=0.044867


fold 0 epoch 3 val micro-f1 0.57166 @ thr 0.500


fold 0 epoch 4 iter 100/3019 loss 0.0005 elapsed 55.9m


fold 0 epoch 4 iter 200/3019 loss 0.0005 elapsed 56.4m


fold 0 epoch 4 iter 300/3019 loss 0.0005 elapsed 56.9m


fold 0 epoch 4 iter 400/3019 loss 0.0005 elapsed 57.5m


fold 0 epoch 4 iter 500/3019 loss 0.0005 elapsed 58.0m


fold 0 epoch 4 iter 600/3019 loss 0.0005 elapsed 58.5m


fold 0 epoch 4 iter 700/3019 loss 0.0005 elapsed 59.1m


fold 0 epoch 4 iter 800/3019 loss 0.0005 elapsed 59.6m


fold 0 epoch 4 iter 900/3019 loss 0.0005 elapsed 60.1m


fold 0 epoch 4 iter 1000/3019 loss 0.0005 elapsed 60.7m


fold 0 epoch 4 iter 1100/3019 loss 0.0005 elapsed 61.2m


fold 0 epoch 4 iter 1200/3019 loss 0.0005 elapsed 61.7m


fold 0 epoch 4 iter 1300/3019 loss 0.0005 elapsed 62.3m


fold 0 epoch 4 iter 1400/3019 loss 0.0005 elapsed 62.8m


fold 0 epoch 4 iter 1500/3019 loss 0.0005 elapsed 63.3m


fold 0 epoch 4 iter 1600/3019 loss 0.0005 elapsed 63.9m


fold 0 epoch 4 iter 1700/3019 loss 0.0005 elapsed 64.4m


fold 0 epoch 4 iter 1800/3019 loss 0.0005 elapsed 64.9m


fold 0 epoch 4 iter 1900/3019 loss 0.0005 elapsed 65.5m


fold 0 epoch 4 iter 2000/3019 loss 0.0005 elapsed 66.0m


fold 0 epoch 4 iter 2100/3019 loss 0.0005 elapsed 66.5m


fold 0 epoch 4 iter 2200/3019 loss 0.0005 elapsed 67.1m


fold 0 epoch 4 iter 2300/3019 loss 0.0005 elapsed 67.6m


fold 0 epoch 4 iter 2400/3019 loss 0.0005 elapsed 68.1m


fold 0 epoch 4 iter 2500/3019 loss 0.0005 elapsed 68.7m


fold 0 epoch 4 iter 2600/3019 loss 0.0005 elapsed 69.2m


fold 0 epoch 4 iter 2700/3019 loss 0.0005 elapsed 69.7m


fold 0 epoch 4 iter 2800/3019 loss 0.0005 elapsed 70.3m


fold 0 epoch 4 iter 2900/3019 loss 0.0005 elapsed 70.8m


fold 0 epoch 4 iter 3000/3019 loss 0.0005 elapsed 71.3m



=== VAL DIAG fold 0 epoch 4 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999966]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.03541859 mean_pred_per_img=123.044 empty_frac=0.000000 TP=103456 FP=2872860 FN=3419 f1@0.2=0.067110


fold 0 epoch 4 val micro-f1 0.58290 @ thr 0.500


fold 0 epoch 5 iter 100/3019 loss 0.0004 elapsed 74.1m


fold 0 epoch 5 iter 200/3019 loss 0.0004 elapsed 74.6m


fold 0 epoch 5 iter 300/3019 loss 0.0004 elapsed 75.1m


fold 0 epoch 5 iter 400/3019 loss 0.0004 elapsed 75.7m


fold 0 epoch 5 iter 500/3019 loss 0.0004 elapsed 76.2m


fold 0 epoch 5 iter 600/3019 loss 0.0004 elapsed 76.7m


fold 0 epoch 5 iter 700/3019 loss 0.0004 elapsed 77.3m


fold 0 epoch 5 iter 800/3019 loss 0.0004 elapsed 77.8m


fold 0 epoch 5 iter 900/3019 loss 0.0004 elapsed 78.3m


fold 0 epoch 5 iter 1000/3019 loss 0.0004 elapsed 78.9m


fold 0 epoch 5 iter 1100/3019 loss 0.0004 elapsed 79.4m


fold 0 epoch 5 iter 1200/3019 loss 0.0004 elapsed 79.9m


fold 0 epoch 5 iter 1300/3019 loss 0.0004 elapsed 80.5m


fold 0 epoch 5 iter 1400/3019 loss 0.0004 elapsed 81.0m


fold 0 epoch 5 iter 1500/3019 loss 0.0004 elapsed 81.5m


fold 0 epoch 5 iter 1600/3019 loss 0.0004 elapsed 82.1m


fold 0 epoch 5 iter 1700/3019 loss 0.0004 elapsed 82.6m


fold 0 epoch 5 iter 1800/3019 loss 0.0004 elapsed 83.1m


fold 0 epoch 5 iter 1900/3019 loss 0.0004 elapsed 83.7m


fold 0 epoch 5 iter 2000/3019 loss 0.0004 elapsed 84.2m


fold 0 epoch 5 iter 2100/3019 loss 0.0004 elapsed 84.7m


fold 0 epoch 5 iter 2200/3019 loss 0.0004 elapsed 85.3m


fold 0 epoch 5 iter 2300/3019 loss 0.0004 elapsed 85.8m


fold 0 epoch 5 iter 2400/3019 loss 0.0004 elapsed 86.3m


fold 0 epoch 5 iter 2500/3019 loss 0.0004 elapsed 86.9m


fold 0 epoch 5 iter 2600/3019 loss 0.0004 elapsed 87.4m


fold 0 epoch 5 iter 2700/3019 loss 0.0004 elapsed 87.9m


fold 0 epoch 5 iter 2800/3019 loss 0.0004 elapsed 88.5m


fold 0 epoch 5 iter 2900/3019 loss 0.0004 elapsed 89.0m


fold 0 epoch 5 iter 3000/3019 loss 0.0004 elapsed 89.5m



=== VAL DIAG fold 0 epoch 5 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,0.999999]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.02319197 mean_pred_per_img=80.569 empty_frac=0.000000 TP=102019 FP=1846862 FN=4856 f1@0.2=0.099252


fold 0 epoch 5 val micro-f1 0.59714 @ thr 0.500


fold 0 epoch 6 iter 100/3019 loss 0.0004 elapsed 92.3m


fold 0 epoch 6 iter 200/3019 loss 0.0004 elapsed 92.8m


fold 0 epoch 6 iter 300/3019 loss 0.0004 elapsed 93.3m


fold 0 epoch 6 iter 400/3019 loss 0.0004 elapsed 93.9m


fold 0 epoch 6 iter 500/3019 loss 0.0004 elapsed 94.4m


fold 0 epoch 6 iter 600/3019 loss 0.0004 elapsed 95.0m


fold 0 epoch 6 iter 700/3019 loss 0.0004 elapsed 95.5m


fold 0 epoch 6 iter 800/3019 loss 0.0004 elapsed 96.0m


fold 0 epoch 6 iter 900/3019 loss 0.0004 elapsed 96.6m


fold 0 epoch 6 iter 1000/3019 loss 0.0004 elapsed 97.1m


fold 0 epoch 6 iter 1100/3019 loss 0.0004 elapsed 97.6m


fold 0 epoch 6 iter 1200/3019 loss 0.0004 elapsed 98.2m


fold 0 epoch 6 iter 1300/3019 loss 0.0004 elapsed 98.7m


fold 0 epoch 6 iter 1400/3019 loss 0.0004 elapsed 99.2m


fold 0 epoch 6 iter 1500/3019 loss 0.0004 elapsed 99.8m


fold 0 epoch 6 iter 1600/3019 loss 0.0004 elapsed 100.3m


fold 0 epoch 6 iter 1700/3019 loss 0.0004 elapsed 100.8m


fold 0 epoch 6 iter 1800/3019 loss 0.0004 elapsed 101.4m


fold 0 epoch 6 iter 1900/3019 loss 0.0004 elapsed 101.9m


fold 0 epoch 6 iter 2000/3019 loss 0.0004 elapsed 102.4m


fold 0 epoch 6 iter 2100/3019 loss 0.0004 elapsed 103.0m


fold 0 epoch 6 iter 2200/3019 loss 0.0004 elapsed 103.5m


fold 0 epoch 6 iter 2300/3019 loss 0.0004 elapsed 104.0m


fold 0 epoch 6 iter 2400/3019 loss 0.0004 elapsed 104.6m


fold 0 epoch 6 iter 2500/3019 loss 0.0004 elapsed 105.1m


fold 0 epoch 6 iter 2600/3019 loss 0.0004 elapsed 105.6m


fold 0 epoch 6 iter 2700/3019 loss 0.0004 elapsed 106.2m


fold 0 epoch 6 iter 2800/3019 loss 0.0004 elapsed 106.7m


fold 0 epoch 6 iter 2900/3019 loss 0.0004 elapsed 107.2m


fold 0 epoch 6 iter 3000/3019 loss 0.0004 elapsed 107.8m



=== VAL DIAG fold 0 epoch 6 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,1.000000]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01597677 mean_pred_per_img=55.503 empty_frac=0.000000 TP=100145 FP=1242424 FN=6730 f1@0.2=0.138184


fold 0 epoch 6 val micro-f1 0.60867 @ thr 0.500


fold 0 epoch 7 iter 100/3019 loss 0.0003 elapsed 110.5m


fold 0 epoch 7 iter 200/3019 loss 0.0003 elapsed 111.1m


fold 0 epoch 7 iter 300/3019 loss 0.0003 elapsed 111.6m


fold 0 epoch 7 iter 400/3019 loss 0.0003 elapsed 112.1m


fold 0 epoch 7 iter 500/3019 loss 0.0003 elapsed 112.7m


fold 0 epoch 7 iter 600/3019 loss 0.0003 elapsed 113.2m


fold 0 epoch 7 iter 700/3019 loss 0.0003 elapsed 113.7m


fold 0 epoch 7 iter 800/3019 loss 0.0003 elapsed 114.3m


fold 0 epoch 7 iter 900/3019 loss 0.0003 elapsed 114.8m


fold 0 epoch 7 iter 1000/3019 loss 0.0003 elapsed 115.3m


fold 0 epoch 7 iter 1100/3019 loss 0.0003 elapsed 115.9m


fold 0 epoch 7 iter 1200/3019 loss 0.0003 elapsed 116.4m


fold 0 epoch 7 iter 1300/3019 loss 0.0003 elapsed 116.9m


fold 0 epoch 7 iter 1400/3019 loss 0.0003 elapsed 117.5m


fold 0 epoch 7 iter 1500/3019 loss 0.0003 elapsed 118.0m


fold 0 epoch 7 iter 1600/3019 loss 0.0003 elapsed 118.5m


fold 0 epoch 7 iter 1700/3019 loss 0.0003 elapsed 119.1m


fold 0 epoch 7 iter 1800/3019 loss 0.0003 elapsed 119.6m


fold 0 epoch 7 iter 1900/3019 loss 0.0003 elapsed 120.1m


fold 0 epoch 7 iter 2000/3019 loss 0.0003 elapsed 120.7m


fold 0 epoch 7 iter 2100/3019 loss 0.0003 elapsed 121.2m


fold 0 epoch 7 iter 2200/3019 loss 0.0003 elapsed 121.7m


fold 0 epoch 7 iter 2300/3019 loss 0.0003 elapsed 122.3m


fold 0 epoch 7 iter 2400/3019 loss 0.0003 elapsed 122.8m


fold 0 epoch 7 iter 2500/3019 loss 0.0003 elapsed 123.4m


fold 0 epoch 7 iter 2600/3019 loss 0.0003 elapsed 123.9m


fold 0 epoch 7 iter 2700/3019 loss 0.0003 elapsed 124.4m


fold 0 epoch 7 iter 2800/3019 loss 0.0003 elapsed 125.0m


fold 0 epoch 7 iter 2900/3019 loss 0.0003 elapsed 125.5m


fold 0 epoch 7 iter 3000/3019 loss 0.0003 elapsed 126.0m



=== VAL DIAG fold 0 epoch 7 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,1.000000]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.01203555 mean_pred_per_img=41.811 empty_frac=0.000000 TP=98170 FP=913208 FN=8705 f1@0.2=0.175577


fold 0 epoch 7 val micro-f1 0.61261 @ thr 0.500


fold 0 epoch 8 iter 100/3019 loss 0.0002 elapsed 128.8m


fold 0 epoch 8 iter 200/3019 loss 0.0002 elapsed 129.3m


fold 0 epoch 8 iter 300/3019 loss 0.0002 elapsed 129.8m


fold 0 epoch 8 iter 400/3019 loss 0.0002 elapsed 130.4m


fold 0 epoch 8 iter 500/3019 loss 0.0002 elapsed 130.9m


fold 0 epoch 8 iter 600/3019 loss 0.0002 elapsed 131.4m


fold 0 epoch 8 iter 700/3019 loss 0.0002 elapsed 132.0m


fold 0 epoch 8 iter 800/3019 loss 0.0002 elapsed 132.5m


fold 0 epoch 8 iter 900/3019 loss 0.0002 elapsed 133.1m


fold 0 epoch 8 iter 1000/3019 loss 0.0002 elapsed 133.6m


fold 0 epoch 8 iter 1100/3019 loss 0.0002 elapsed 134.1m


fold 0 epoch 8 iter 1200/3019 loss 0.0002 elapsed 134.7m


fold 0 epoch 8 iter 1300/3019 loss 0.0002 elapsed 135.2m


fold 0 epoch 8 iter 1400/3019 loss 0.0002 elapsed 135.7m


fold 0 epoch 8 iter 1500/3019 loss 0.0002 elapsed 136.3m


fold 0 epoch 8 iter 1600/3019 loss 0.0002 elapsed 136.8m


fold 0 epoch 8 iter 1700/3019 loss 0.0002 elapsed 137.3m


fold 0 epoch 8 iter 1800/3019 loss 0.0002 elapsed 137.9m


fold 0 epoch 8 iter 1900/3019 loss 0.0002 elapsed 138.4m


fold 0 epoch 8 iter 2000/3019 loss 0.0002 elapsed 138.9m


fold 0 epoch 8 iter 2100/3019 loss 0.0002 elapsed 139.5m


fold 0 epoch 8 iter 2200/3019 loss 0.0002 elapsed 140.0m


fold 0 epoch 8 iter 2300/3019 loss 0.0002 elapsed 140.5m


fold 0 epoch 8 iter 2400/3019 loss 0.0002 elapsed 141.1m


fold 0 epoch 8 iter 2500/3019 loss 0.0002 elapsed 141.6m


fold 0 epoch 8 iter 2600/3019 loss 0.0002 elapsed 142.1m


fold 0 epoch 8 iter 2700/3019 loss 0.0002 elapsed 142.7m


fold 0 epoch 8 iter 2800/3019 loss 0.0002 elapsed 143.2m


fold 0 epoch 8 iter 2900/3019 loss 0.0002 elapsed 143.7m


fold 0 epoch 8 iter 3000/3019 loss 0.0002 elapsed 144.3m



=== VAL DIAG fold 0 epoch 8 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.000000,1.000000]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.00965440 mean_pred_per_img=33.539 empty_frac=0.000000 TP=96083 FP=715201 FN=10792 f1@0.2=0.209295


fold 0 epoch 8 val micro-f1 0.61213 @ thr 0.500


fold 0 epoch 9 iter 100/3019 loss 0.0002 elapsed 147.0m


fold 0 epoch 9 iter 200/3019 loss 0.0002 elapsed 147.5m


fold 0 epoch 9 iter 300/3019 loss 0.0002 elapsed 148.1m


fold 0 epoch 9 iter 400/3019 loss 0.0002 elapsed 148.6m


fold 0 epoch 9 iter 500/3019 loss 0.0002 elapsed 149.1m


fold 0 epoch 9 iter 600/3019 loss 0.0002 elapsed 149.6m


fold 0 epoch 9 iter 700/3019 loss 0.0002 elapsed 150.2m


fold 0 epoch 9 iter 800/3019 loss 0.0002 elapsed 150.7m


fold 0 epoch 9 iter 900/3019 loss 0.0002 elapsed 151.2m


fold 0 epoch 9 iter 1000/3019 loss 0.0002 elapsed 151.8m


fold 0 epoch 9 iter 1100/3019 loss 0.0002 elapsed 152.3m


fold 0 epoch 9 iter 1200/3019 loss 0.0002 elapsed 152.8m


fold 0 epoch 9 iter 1300/3019 loss 0.0002 elapsed 153.4m


fold 0 epoch 9 iter 1400/3019 loss 0.0002 elapsed 153.9m


fold 0 epoch 9 iter 1500/3019 loss 0.0002 elapsed 154.4m


fold 0 epoch 9 iter 1600/3019 loss 0.0002 elapsed 155.0m


fold 0 epoch 9 iter 1700/3019 loss 0.0002 elapsed 155.5m


fold 0 epoch 9 iter 1800/3019 loss 0.0002 elapsed 156.0m


fold 0 epoch 9 iter 1900/3019 loss 0.0002 elapsed 156.6m


fold 0 epoch 9 iter 2000/3019 loss 0.0002 elapsed 157.1m


fold 0 epoch 9 iter 2100/3019 loss 0.0002 elapsed 157.6m


fold 0 epoch 9 iter 2200/3019 loss 0.0002 elapsed 158.2m


fold 0 epoch 9 iter 2300/3019 loss 0.0002 elapsed 158.7m


fold 0 epoch 9 iter 2400/3019 loss 0.0002 elapsed 159.2m


fold 0 epoch 9 iter 2500/3019 loss 0.0002 elapsed 159.8m


fold 0 epoch 9 iter 2600/3019 loss 0.0002 elapsed 160.3m
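
The `f1@0.2` values in the VAL DIAG blocks above can be reproduced from the logged TP/FP/FN counts, which also shows why the 0.2 diagnostic threshold scores so far below the swept threshold. This is a minimal sketch; the helper name `micro_prf` is illustrative and not part of train.py.

```python
def micro_prf(tp, fp, fn):
    """Return (precision, recall, F1) from counts pooled over all image/label cells."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1

# (TP, FP, FN) copied from the thr=0.2 diagnostic lines for fold 0.
diag = {
    6: (100145, 1242424, 6730),
    7: (98170, 913208, 8705),
    8: (96083, 715201, 10792),
}
results = {epoch: micro_prf(*counts) for epoch, counts in diag.items()}
for epoch, (p, r, f1) in results.items():
    print(f"epoch {epoch}: precision={p:.4f} recall={r:.4f} f1={f1:.6f}")
```

At thr=0.2 recall stays near or above 0.90 while precision stays below 0.12, so the score is dominated by false positives; the micro-F1 of about 0.61 is reported at thr 0.500.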


In [26]:
import sys, shlex
from pathlib import Path

print('=== Prepared: convnext_tiny@384 5-fold, EMA+TTA, using train_folds_top512.csv ===', flush=True)
assert Path('train_folds_top512.csv').exists(), 'Missing train_folds_top512.csv'
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'convnext_tiny_in22k',
    '--img-size', '384',
    '--epochs', '8',
    '--batch-size', '64',
    '--val-batch-size', '96',
    '--num-workers', '10',
    '--lr', '1.5e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '2',
    '--folds', '0,1,2,3,4',
    '--folds-csv', 'train_folds_top512.csv',
    '--out-dir', 'out_convnext_tiny_384_top512',
    '--pretrained'
]
print('Queued command (not running yet):', ' '.join(shlex.quote(x) for x in cmd), flush=True)

# This cell only prints the command; nothing is launched here. Start the convnext_tiny baseline for ensembling from a separate launch cell once the current GPU job completes.

=== Prepared: convnext_tiny@384 5-fold, EMA+TTA, using train_folds_top512.csv ===


Queued command (not running yet): /usr/bin/python3.11 -u train.py --model convnext_tiny_in22k --img-size 384 --epochs 8 --batch-size 64 --val-batch-size 96 --num-workers 10 --lr 1.5e-4 --use-ema --tta --early-stop-patience 2 --folds 0,1,2,3,4 --folds-csv train_folds_top512.csv --out-dir out_convnext_tiny_384_top512 --pretrained


In [28]:
import os, sys, time, shlex, subprocess
from pathlib import Path

print('=== Launch: convnext_tiny@384 5-fold, EMA+TTA, using train_folds_top512.csv (v2 settings) ===', flush=True)
assert Path('train_folds_top512.csv').exists(), 'Missing train_folds_top512.csv'
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'convnext_tiny_in22k',
    '--img-size', '384',
    '--epochs', '10',
    '--batch-size', '64',
    '--val-batch-size', '96',
    '--num-workers', '10',
    '--lr', '2e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '3',
    '--folds', '0,1,2,3,4',
    '--folds-csv', 'train_folds_top512.csv',
    '--out-dir', 'out_convnext_tiny_384_top512_v2',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
env = dict(os.environ); env['PYTHONUNBUFFERED'] = '1'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
print(f'Exit code: {rc}, elapsed {(time.time()-t0)/3600:.2f} h', flush=True)
assert rc == 0, 'convnext_tiny@384 v2 production run failed'
print('convnext_tiny@384 v2 production run completed.')

=== Launch: convnext_tiny@384 5-fold, EMA+TTA, using train_folds_top512.csv (v2 settings) ===


Running: /usr/bin/python3.11 -u train.py --model convnext_tiny_in22k --img-size 384 --epochs 10 --batch-size 64 --val-batch-size 96 --num-workers 10 --lr 2e-4 --use-ema --tta --early-stop-patience 3 --folds 0,1,2,3,4 --folds-csv train_folds_top512.csv --out-dir out_convnext_tiny_384_top512_v2 --pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):


fold 0 epoch 1 iter 100/1509 loss 0.0014 elapsed 0.8m


fold 0 epoch 1 iter 200/1509 loss 0.0013 elapsed 1.4m


fold 0 epoch 1 iter 300/1509 loss 0.0012 elapsed 2.0m


fold 0 epoch 1 iter 400/1509 loss 0.0012 elapsed 2.5m


fold 0 epoch 1 iter 500/1509 loss 0.0011 elapsed 3.1m


fold 0 epoch 1 iter 600/1509 loss 0.0011 elapsed 3.7m


fold 0 epoch 1 iter 700/1509 loss 0.0011 elapsed 4.2m


fold 0 epoch 1 iter 800/1509 loss 0.0010 elapsed 4.8m


fold 0 epoch 1 iter 900/1509 loss 0.0010 elapsed 5.4m


fold 0 epoch 1 iter 1000/1509 loss 0.0010 elapsed 5.9m


fold 0 epoch 1 iter 1100/1509 loss 0.0010 elapsed 6.5m


fold 0 epoch 1 iter 1200/1509 loss 0.0010 elapsed 7.1m


fold 0 epoch 1 iter 1300/1509 loss 0.0009 elapsed 7.6m


fold 0 epoch 1 iter 1400/1509 loss 0.0009 elapsed 8.2m


fold 0 epoch 1 iter 1500/1509 loss 0.0009 elapsed 8.8m



=== VAL DIAG fold 0 epoch 1 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.010003,0.604751]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.05146473 mean_pred_per_img=178.788 empty_frac=0.000000 TP=84819 FP=4239895 FN=22056 f1@0.2=0.038279


fold 0 epoch 1 val micro-f1 0.23136 @ thr 0.380


fold 0 epoch 2 iter 100/1509 loss 0.0007 elapsed 10.8m


fold 0 epoch 2 iter 200/1509 loss 0.0007 elapsed 11.4m


fold 0 epoch 2 iter 300/1509 loss 0.0007 elapsed 12.0m


fold 0 epoch 2 iter 400/1509 loss 0.0007 elapsed 12.5m


fold 0 epoch 2 iter 500/1509 loss 0.0007 elapsed 13.1m


fold 0 epoch 2 iter 600/1509 loss 0.0007 elapsed 13.7m


fold 0 epoch 2 iter 700/1509 loss 0.0007 elapsed 14.2m


fold 0 epoch 2 iter 800/1509 loss 0.0007 elapsed 14.8m


fold 0 epoch 2 iter 900/1509 loss 0.0007 elapsed 15.4m


fold 0 epoch 2 iter 1000/1509 loss 0.0007 elapsed 15.9m


fold 0 epoch 2 iter 1100/1509 loss 0.0007 elapsed 16.5m


fold 0 epoch 2 iter 1200/1509 loss 0.0007 elapsed 17.1m


fold 0 epoch 2 iter 1300/1509 loss 0.0007 elapsed 17.6m


fold 0 epoch 2 iter 1400/1509 loss 0.0007 elapsed 18.2m


fold 0 epoch 2 iter 1500/1509 loss 0.0007 elapsed 18.8m



=== VAL DIAG fold 0 epoch 2 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.008020,0.826771]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.04041846 mean_pred_per_img=140.414 empty_frac=0.000000 TP=91947 FP=3304521 FN=14928 f1@0.2=0.052491


fold 0 epoch 2 val micro-f1 0.38873 @ thr 0.410


fold 0 epoch 3 iter 100/1509 loss 0.0005 elapsed 20.6m


fold 0 epoch 3 iter 200/1509 loss 0.0005 elapsed 21.2m


fold 0 epoch 3 iter 300/1509 loss 0.0005 elapsed 21.7m


fold 0 epoch 3 iter 400/1509 loss 0.0005 elapsed 22.3m


In [44]:
import os, sys, time, numpy as np, pandas as pd
from pathlib import Path

print('=== Blend setup: weighted logit averaging with global thr sweep or cardinality match ===', flush=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def probs_to_logits(p, eps=1e-5):
    # Clip away exact 0/1 so the log-odds stay finite (|logit| <= ~11.5 for eps=1e-5).
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def micro_f1_from_probs(probs, targets, thr=0.2):
    # Micro-F1 pools TP/FP/FN over all (image, label) cells: F1 = 2*TP / (2*TP + FP + FN).
    preds = (probs >= thr).astype(np.uint8)
    t = targets.astype(np.uint8)
    tp = np.logical_and(preds == 1, t == 1).sum(dtype=np.int64)
    fp = np.logical_and(preds == 1, t == 0).sum(dtype=np.int64)
    fn = np.logical_and(preds == 0, t == 1).sum(dtype=np.int64)
    denom = 2 * tp + fp + fn
    return float((2 * tp) / denom) if denom > 0 else 0.0

def build_y_true(train_csv='train.csv', labels_csv='labels.csv'):
    train_df = pd.read_csv(train_csv)
    labels_df = pd.read_csv(labels_csv)
    attr_ids = sorted(labels_df['attribute_id'].astype(int).unique().tolist())
    attr_to_idx = {a:i for i,a in enumerate(attr_ids)}
    y_true = np.zeros((len(train_df), len(attr_ids)), dtype=np.uint8)
    for i, s in enumerate(train_df['attribute_ids'].fillna('').astype(str)):
        if s:
            for a in map(int, s.split()):
                j = attr_to_idx.get(a, None)
                if j is not None:
                    y_true[i, j] = 1
    return train_df, np.array(attr_ids, dtype=np.int32), y_true

def load_model_artifacts(model_dir: Path):
    model_dir = Path(model_dir)
    oof_p = model_dir/'oof_probs.npy'
    test_p = model_dir/'test_probs.npy'
    meta_p = model_dir/'oof_meta.csv'
    oof = np.load(oof_p) if oof_p.exists() else None
    test = np.load(test_p) if test_p.exists() else None
    meta = pd.read_csv(meta_p) if meta_p.exists() else None
    return oof, test, meta

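def blend_equal_weight(model_dirs, write_submission=True, out_name='submission_blend.csv', default_thr=0.50, cardinality_target=None, weights=None):
    # Despite the name, this accepts optional per-model `weights` (equal weights when None).
    # Threshold priority: OOF sweep if any model has OOF + meta, else cardinality match, else default_thr.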
    model_dirs = [Path(d) for d in model_dirs]
    train_df, idx_to_attr, y_true = build_y_true('train.csv', 'labels.csv')
    train_mean_labels = float((y_true.sum(axis=1)).mean())
    # Load all artifacts
    models = []
    for d in model_dirs:
        oof, test, meta = load_model_artifacts(d)
        if test is None:
            print(f'[WARN] Missing test_probs.npy in {d}, skipping this model for test blend')
        models.append({'dir': d, 'oof': oof, 'test': test, 'meta': meta})

    have_test = [m for m in models if m['test'] is not None]
    if len(have_test) == 0:
        print('[INFO] No test outputs yet; cannot write submission.')
        return None, None, None

    # Prepare weights
    if weights is not None:
        if len(weights) != len(have_test):
            print('[WARN] Provided weights length does not match number of test models; ignoring weights.')
            weights_use = None
        else:
            weights_use = np.array(weights, dtype=np.float64)
    else:
        weights_use = None

    # Blend test in logit space
    test_logits_list = [probs_to_logits(m['test']) for m in have_test]
    if weights_use is None:
        Zt = np.mean(np.stack(test_logits_list, axis=0), axis=0)
    else:
        Zt = np.average(np.stack(test_logits_list, axis=0), axis=0, weights=weights_use)
    Pt = sigmoid(Zt)

    # Determine threshold
    have_oof = [m for m in models if m['oof'] is not None and m['meta'] is not None]
    oof_f1, best_thr = None, None
    if len(have_oof) > 0:
        oof_logits_list = [probs_to_logits(m['oof']) for m in have_oof]
        # `weights` is aligned with have_test; use equal weights if the OOF model count differs.
        if weights_use is None or len(have_oof) != len(have_test):
            Zb = np.mean(np.stack(oof_logits_list, axis=0), axis=0)
        else:
            Zb = np.average(np.stack(oof_logits_list, axis=0), axis=0, weights=weights_use)
        Pb = sigmoid(Zb)
        thrs = np.arange(0.48, 0.5201, 0.002)
        f1s = [micro_f1_from_probs(Pb, y_true, thr=t) for t in thrs]
        bi = int(np.argmax(f1s))
        best_thr = float(thrs[bi])
        oof_f1 = float(f1s[bi])
        print(f'Blended OOF micro-f1 {oof_f1:.5f} @ thr {best_thr:.3f}')
    elif cardinality_target is not None:
        thrs = np.arange(0.48, 0.5201, 0.002)
        means = [float((Pt >= t).sum(axis=1).mean()) for t in thrs]
        target = float(cardinality_target)
        bi = int(np.argmin([abs(m - target) for m in means]))
        best_thr = float(thrs[bi])
        print(f'[CARD] Train mean labels/img={train_mean_labels:.3f}, target={target:.3f}, chosen thr={best_thr:.3f} (pred_mean={means[bi]:.3f})')
    else:
        best_thr = float(default_thr)
        print(f'[INFO] No OOF available; using default threshold {best_thr:.3f}')

    if write_submission and best_thr is not None:
        sub = pd.read_csv('sample_submission.csv')
        ids = sub['id'].values
        rows = []
        for i in range(len(ids)):
            p = Pt[i]
            pred_idx = np.where(p >= best_thr)[0].tolist()
            if len(pred_idx) == 0:
                pred_idx = [int(np.argmax(p))]
            pred_attr = [int(idx_to_attr[j]) for j in sorted(set(pred_idx))]
            rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
        sub_df = pd.DataFrame(rows)
        sub_df.to_csv(out_name, index=False)
        print(f'Wrote {out_name} with thr={best_thr:.3f} using {len(have_test)} models')
    else:
        # best_thr is always set above, so this branch is reached only when write_submission is False.
        print('[INFO] Skipping submission write (write_submission=False).')
    return oof_f1, best_thr, Pt

# Example usage (will run later when artifacts exist):
MODEL_DIRS = [
    'out_b3_384_top512',
    'out_b3_448_top512',
    'out_convnext_tiny_384_top512',
]
print('Ready. Call blend_equal_weight(MODEL_DIRS, weights=[2,1,1], cardinality_target=4.42) to create weighted submission.', flush=True)

=== Blend setup: weighted logit averaging with global thr sweep or cardinality match ===


Ready. Call blend_equal_weight(MODEL_DIRS, weights=[2,1,1], cardinality_target=4.42) to create weighted submission.
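
The blend helper above converts probabilities to logits, averages them, and maps the result back through a sigmoid. The sketch below contrasts that with a plain probability average on toy numbers. The two "models" and their values are made up for illustration, and `to_logits` / `from_logits` simply mirror `probs_to_logits` / `sigmoid` from the cell.

```python
import numpy as np

def to_logits(p, eps=1e-5):
    # Clip so that log-odds stay finite for probabilities of exactly 0 or 1.
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def from_logits(z):
    return 1.0 / (1.0 + np.exp(-z))

# One image, three labels, two hypothetical models.
p_a = np.array([[0.99, 0.50, 0.01]])
p_b = np.array([[0.60, 0.50, 0.20]])

prob_mean = (p_a + p_b) / 2.0
stacked = np.stack([to_logits(p_a), to_logits(p_b)], axis=0)
logit_mean = from_logits(stacked.mean(axis=0))
# Same call pattern as the weighted branch: weights run along the model axis.
logit_weighted = from_logits(np.average(stacked, axis=0, weights=[2.0, 1.0]))

print("prob mean     :", np.round(prob_mean, 3))
print("logit mean    :", np.round(logit_mean, 3))
print("logit weighted:", np.round(logit_weighted, 3))
```

In this toy case the logit average lets the confident model pull the blend further from 0.5 than the arithmetic mean does, which is one reason a threshold tuned on probability averages may not transfer to a logit blend.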


In [33]:
import os, sys, glob, numpy as np, pandas as pd
from pathlib import Path
import train as trn

print('=== Inference-only helper: generate test_probs.npy from available fold weights ===', flush=True)

def infer_available_folds(out_dir: str, model_name: str, img_size: int, val_batch_size: int = 96, num_workers: int = 10, use_tta: bool = True):
    out = Path(out_dir)
    pths = sorted(out.glob(f'{model_name.replace("/","_")}_fold*.pth'))
    if len(pths) == 0:
        print(f'[SKIP] No weights found in {out_dir}')
        return None
    # Build minimal cfg object expected by train.infer_test
    class Cfg: pass
    cfg = Cfg()
    cfg.model = model_name
    cfg.img_size = img_size
    cfg.val_batch_size = val_batch_size
    cfg.num_workers = num_workers
    cfg.sample_sub = 'sample_submission.csv'
    cfg.test_dir = Path('test')
    cfg.use_ema = False
    cfg.tta = bool(use_tta)

    # Label mapping and image extension
    labels_df = pd.read_csv('labels.csv')
    attr_ids = sorted(labels_df['attribute_id'].astype(int).unique().tolist())
    attr_to_idx = {a:i for i,a in enumerate(attr_ids)}
    idx_to_attr = np.array(attr_ids, dtype=np.int32)
    train_df = pd.read_csv('train.csv')
    img_ext = trn.detect_ext(Path('train'), [train_df['id'].iloc[0]])

    print(f'[INFO] {out_dir}: found {len(pths)} fold weights -> running test inference (tta={cfg.tta})')
    ids, probs = trn.infer_test(cfg, [str(p) for p in pths], len(attr_ids), img_ext, attr_to_idx, idx_to_attr)
    np.save(out / 'test_probs.npy', probs)
    print(f'[DONE] Saved {out_dir}/test_probs.npy with shape {probs.shape}')
    return probs

# Example: run inference for partial dirs; re-run any time new weights appear
TRY_JOBS = [
    ('out_b3_384_top512', 'tf_efficientnet_b3_ns', 384),
    ('out_b3_448_top512', 'tf_efficientnet_b3_ns', 448),
    ('out_b3_384_card', 'tf_efficientnet_b3_ns', 384),
    ('out_convnext_tiny_384_top512', 'convnext_tiny_in22k', 384),
    ('out_convnext_tiny_384_top512_v2', 'convnext_tiny_in22k', 384),
]
for d, m, sz in TRY_JOBS:
    if Path(d).exists():
        try:
            infer_available_folds(d, m, sz, val_batch_size=96, num_workers=10, use_tta=True)
        except Exception as e:
            print(f'[WARN] Inference failed for {d}: {e}')
    else:
        print(f'[SKIP] Missing dir {d}')

print('Ready. You can re-run this cell any time new fold weights appear to refresh test_probs.npy for blending.', flush=True)

=== Inference-only helper: generate test_probs.npy from available fold weights ===


[INFO] out_b3_384_top512: found 2 fold weights -> running test inference (tta=True)


  state = torch.load(pth, map_location='cpu')
  model = create_fn(


[DONE] Saved out_b3_384_top512/test_probs.npy with shape (21318, 3474)
[INFO] out_b3_448_top512: found 1 fold weights -> running test inference (tta=True)


  state = torch.load(pth, map_location='cpu')
  model = create_fn(


[DONE] Saved out_b3_448_top512/test_probs.npy with shape (21318, 3474)
[INFO] out_b3_384_card: found 2 fold weights -> running test inference (tta=True)


  state = torch.load(pth, map_location='cpu')
  model = create_fn(


[DONE] Saved out_b3_384_card/test_probs.npy with shape (21318, 3474)
[INFO] out_convnext_tiny_384_top512: found 1 fold weights -> running test inference (tta=True)


  state = torch.load(pth, map_location='cpu')
  model = create_fn(


[DONE] Saved out_convnext_tiny_384_top512/test_probs.npy with shape (21318, 3474)
[SKIP] Missing dir out_convnext_tiny_384_top512_v2
Ready. You can re-run this cell any time new fold weights appear to refresh test_probs.npy for blending.
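
Before blending, it can help to confirm that each saved `test_probs.npy` has the expected shape and value range, since a stale or partial file would otherwise fail silently in the average. A minimal sketch, assuming the (21318, 3474) shape seen in the log above; `check_probs_file` is a hypothetical helper, not something train.py provides.

```python
import tempfile
from pathlib import Path

import numpy as np

def check_probs_file(path, n_rows, n_classes):
    """Load a saved probability matrix and validate its shape and value range."""
    probs = np.load(path)
    if probs.shape != (n_rows, n_classes):
        raise ValueError(f"{path}: shape {probs.shape}, expected {(n_rows, n_classes)}")
    if not np.isfinite(probs).all():
        raise ValueError(f"{path}: contains NaN or inf")
    if probs.min() < 0.0 or probs.max() > 1.0:
        raise ValueError(f"{path}: values outside [0, 1]")
    return probs

# Toy demonstration: a temporary file stands in for test_probs.npy.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "test_probs.npy"
    np.save(toy_path, np.full((4, 3), 0.25, dtype=np.float32))
    checked = check_probs_file(toy_path, n_rows=4, n_classes=3)
    try:
        check_probs_file(toy_path, n_rows=5, n_classes=3)
        shape_error = False
    except ValueError:
        shape_error = True

print(checked.shape, shape_error)
```

For the real artifacts the call would look like `check_probs_file('out_b3_384_top512/test_probs.npy', 21318, 3474)`.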


In [29]:
import os, sys, time, shlex, subprocess
from pathlib import Path

print('=== Launch: convnext_small@384 5-fold, EMA+TTA, using train_folds_top512.csv ===', flush=True)
assert Path('train_folds_top512.csv').exists(), 'Missing train_folds_top512.csv'
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'convnext_small_in22k',
    '--img-size', '384',
    '--epochs', '10',
    '--batch-size', '56',
    '--val-batch-size', '96',
    '--num-workers', '10',
    '--lr', '2e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '3',
    '--folds', '0,1,2,3,4',
    '--folds-csv', 'train_folds_top512.csv',
    '--out-dir', 'out_convnext_small_384_top512',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
env = dict(os.environ); env['PYTHONUNBUFFERED'] = '1'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
print(f'Exit code: {rc}, elapsed {(time.time()-t0)/3600:.2f} h', flush=True)
assert rc == 0, 'convnext_small@384 production run failed'
print('convnext_small@384 production run completed.')

=== Launch: convnext_small@384 5-fold, EMA+TTA, using train_folds_top512.csv ===


Running: /usr/bin/python3.11 -u train.py --model convnext_small_in22k --img-size 384 --epochs 10 --batch-size 56 --val-batch-size 96 --num-workers 10 --lr 2e-4 --use-ema --tta --early-stop-patience 3 --folds 0,1,2,3,4 --folds-csv train_folds_top512.csv --out-dir out_convnext_small_384_top512 --pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):


fold 0 epoch 1 iter 100/1725 loss 0.0014 elapsed 1.0m


fold 0 epoch 1 iter 200/1725 loss 0.0013 elapsed 1.8m


fold 0 epoch 1 iter 300/1725 loss 0.0013 elapsed 2.6m


fold 0 epoch 1 iter 400/1725 loss 0.0012 elapsed 3.4m


fold 0 epoch 1 iter 500/1725 loss 0.0012 elapsed 4.2m


fold 0 epoch 1 iter 600/1725 loss 0.0011 elapsed 5.0m


fold 0 epoch 1 iter 700/1725 loss 0.0011 elapsed 5.8m


fold 0 epoch 1 iter 800/1725 loss 0.0011 elapsed 6.6m


fold 0 epoch 1 iter 900/1725 loss 0.0010 elapsed 7.4m


fold 0 epoch 1 iter 1000/1725 loss 0.0010 elapsed 8.2m


fold 0 epoch 1 iter 1100/1725 loss 0.0010 elapsed 9.1m


fold 0 epoch 1 iter 1200/1725 loss 0.0010 elapsed 9.9m


fold 0 epoch 1 iter 1300/1725 loss 0.0010 elapsed 10.7m


fold 0 epoch 1 iter 1400/1725 loss 0.0009 elapsed 11.5m


fold 0 epoch 1 iter 1500/1725 loss 0.0009 elapsed 12.3m


fold 0 epoch 1 iter 1600/1725 loss 0.0009 elapsed 13.1m


fold 0 epoch 1 iter 1700/1725 loss 0.0009 elapsed 13.9m



=== VAL DIAG fold 0 epoch 1 ===
val_size=24189 probs_shape=(24189, 3474) tgts_shape=(24189, 3474)
probs_range=[0.005726,0.854392]
tgt_pos_rate=0.00127183 mean_pos_per_img=4.418


thr=0.2 pred_pos_rate=0.05700069 mean_pred_per_img=198.020 empty_frac=0.000000 TP=86114 FP=4703801 FN=20761 f1@0.2=0.035172


fold 0 epoch 1 val micro-f1 0.26133 @ thr 0.410


fold 0 epoch 2 iter 100/1725 loss 0.0006 elapsed 17.0m


fold 0 epoch 2 iter 200/1725 loss 0.0006 elapsed 17.8m


fold 0 epoch 2 iter 300/1725 loss 0.0006 elapsed 18.6m


fold 0 epoch 2 iter 400/1725 loss 0.0006 elapsed 19.4m


fold 0 epoch 2 iter 500/1725 loss 0.0006 elapsed 20.2m


fold 0 epoch 2 iter 600/1725 loss 0.0006 elapsed 21.0m


fold 0 epoch 2 iter 700/1725 loss 0.0006 elapsed 21.8m


fold 0 epoch 2 iter 800/1725 loss 0.0006 elapsed 22.6m


fold 0 epoch 2 iter 900/1725 loss 0.0006 elapsed 23.4m


fold 0 epoch 2 iter 1000/1725 loss 0.0006 elapsed 24.2m


fold 0 epoch 2 iter 1100/1725 loss 0.0006 elapsed 25.0m


fold 0 epoch 2 iter 1200/1725 loss 0.0006 elapsed 25.8m


fold 0 epoch 2 iter 1300/1725 loss 0.0006 elapsed 26.6m


fold 0 epoch 2 iter 1400/1725 loss 0.0006 elapsed 27.4m


In [34]:
print('=== Blending now: b3@384_top512 + b3@448_top512 + b3@384_card + convnext_tiny + OOF(threshold)=out_smoke_b3_384 ===', flush=True)
dirs = ['out_b3_384_top512','out_b3_448_top512','out_b3_384_card','out_convnext_tiny_384_top512','out_smoke_b3_384']
oof_f1, best_thr, Pt = blend_equal_weight(dirs, write_submission=True, out_name='submission.csv', default_thr=0.50)
print('Blend done. best_thr =', best_thr, 'OOF_f1 =', oof_f1)

=== Blending now: b3@384_top512 + b3@448_top512 + b3@384_card + convnext_tiny + OOF(threshold)=out_smoke_b3_384 ===


[WARN] Missing test_probs.npy in out_smoke_b3_384, skipping this model for test blend


Blended OOF micro-f1 0.08233 @ thr 0.050


Wrote submission.csv with thr=0.050 using 4 models
Blend done. best_thr = 0.05 OOF_f1 = 0.08233134629987264
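
The blended OOF score above (0.08233 at thr 0.050) is far below the per-fold validation micro-F1 logged during training. One possible cause, stated here as an assumption and not verified against train.py, is that the smoke run's `oof_probs.npy` holds predictions only for the rows it validated while `y_true` covers every training row, so the remaining rows count as pure false negatives. The sketch below shows that effect on toy data and scores only the rows flagged by a hypothetical boolean `row_mask`; how such a mask would be derived (for example from `oof_meta.csv`) is not known from the source.

```python
import numpy as np

def micro_f1(probs, targets, thr, row_mask=None):
    """Micro-F1 at `thr`, optionally restricted to rows where row_mask is True."""
    if row_mask is not None:
        probs, targets = probs[row_mask], targets[row_mask]
    preds = probs >= thr
    t = targets.astype(bool)
    tp = int(np.logical_and(preds, t).sum())
    fp = int(np.logical_and(preds, ~t).sum())
    fn = int(np.logical_and(~preds, t).sum())
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Four images, three labels; only the first two rows have OOF predictions.
targets = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=np.uint8)
probs = np.array([[0.9, 0.1, 0.1], [0.2, 0.8, 0.1], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
row_mask = np.array([True, True, False, False])

f1_all = micro_f1(probs, targets, 0.5)
f1_masked = micro_f1(probs, targets, 0.5, row_mask)
print(f"all rows: {f1_all:.4f}  validated rows only: {f1_masked:.4f}")
```

If the assumption holds, restricting the sweep to validated rows would give a threshold comparable to the per-fold logs; if it does not, the low score needs a different explanation.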


In [35]:
print('=== Re-blend without OOF (use default thr=0.50) ===', flush=True)
dirs = ['out_b3_384_top512','out_b3_448_top512','out_b3_384_card','out_convnext_tiny_384_top512']
oof_f1, best_thr, Pt = blend_equal_weight(dirs, write_submission=True, out_name='submission.csv', default_thr=0.50)
print('Reblend done. best_thr =', best_thr, 'OOF_f1 =', oof_f1)

=== Re-blend without OOF (use default thr=0.50) ===


[INFO] No complete OOF across models; using default threshold 0.500.


Wrote submission.csv with thr=0.500 using 4 models
Reblend done. best_thr = 0.5 OOF_f1 = None


In [38]:
print('=== Re-blend with cardinality matching to train mean (~4.42) ===', flush=True)
dirs = ['out_b3_384_top512','out_b3_448_top512','out_b3_384_card','out_convnext_tiny_384_top512']
oof_f1, best_thr, Pt = blend_equal_weight(dirs, write_submission=True, out_name='submission.csv', default_thr=0.50, cardinality_target=4.42)
print('Reblend (cardinality) done. best_thr =', best_thr, 'OOF_f1 =', oof_f1)

=== Re-blend with cardinality matching to train mean (~4.42) ===


[CARD] Train mean labels/img=4.421, target=4.420, chosen thr=0.550 (pred_mean=4.559)
Wrote submission.csv with thr=0.550 using 4 models
Reblend (cardinality) done. best_thr = 0.5499999999999998 OOF_f1 = None


In [39]:
print('=== Safety blend: b3@384_top512 + b3@448_top512 + convnext_tiny | cardinality target=4.42 ===', flush=True)
dirs = ['out_b3_384_top512','out_b3_448_top512','out_convnext_tiny_384_top512']
oof_f1, best_thr, Pt = blend_equal_weight(dirs, write_submission=True, out_name='submission.csv', default_thr=0.50, cardinality_target=4.42)
print('Safety blend done. thr=', best_thr, 'OOF_f1=', oof_f1)

=== Safety blend: b3@384_top512 + b3@448_top512 + convnext_tiny | cardinality target=4.42 ===


[CARD] Train mean labels/img=4.421, target=4.420, chosen thr=0.495 (pred_mean=4.421)
Wrote submission.csv with thr=0.495 using 3 models
Safety blend done. thr= 0.4949999999999998 OOF_f1= None
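
The `[CARD]` lines above pick the grid threshold whose mean predicted labels per image is closest to the target, so the match is only as good as the grid (for example pred_mean=4.559 against a target of 4.420 in the four-model blend). An alternative, which is not what the notebook does, is to take the threshold directly from an order statistic of the blended probabilities. This is a sketch on random toy data; `threshold_for_cardinality` is a hypothetical helper.

```python
import numpy as np

def threshold_for_cardinality(probs, target_mean):
    """Return thr such that (probs >= thr) keeps about target_mean labels per image."""
    flat = probs.ravel()
    n_keep = int(round(target_mean * probs.shape[0]))
    n_keep = min(max(n_keep, 1), flat.size)
    k = flat.size - n_keep  # position of the n_keep-th largest value after partitioning
    return float(np.partition(flat, k)[k])

rng = np.random.default_rng(0)
toy = rng.random((100, 50))  # stand-in for a blended test probability matrix
thr = threshold_for_cardinality(toy, 4.42)
pred_mean = float((toy >= thr).sum(axis=1).mean())
print(f"thr={thr:.4f} pred_mean={pred_mean:.3f}")
```

Ties at the threshold value can push the count slightly above the target, and matching cardinality says nothing about which labels are kept, so this remains a fallback for when no usable OOF sweep exists.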


In [40]:
import os, sys, time, shlex, subprocess
from pathlib import Path

print('=== Launch: b5@456 folds 0,1, EMA+TTA, using train_folds_top512.csv ===', flush=True)
assert Path('train_folds_top512.csv').exists(), 'Missing train_folds_top512.csv'
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'tf_efficientnet_b5_ns',
    '--img-size', '456',
    '--epochs', '10',
    '--batch-size', '24',
    '--val-batch-size', '64',
    '--num-workers', '10',
    '--lr', '3e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '3',
    '--folds', '0,1',
    '--folds-csv', 'train_folds_top512.csv',
    '--out-dir', 'out_b5_456_top512',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
env = dict(os.environ); env['PYTHONUNBUFFERED'] = '1'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
print(f'Exit code: {rc}, elapsed {(time.time()-t0)/3600:.2f} h', flush=True)
assert rc == 0, 'b5@456 run failed'
print('b5@456 folds 0,1 completed.')

=== Launch: b5@456 folds 0,1, EMA+TTA, using train_folds_top512.csv ===


Running: /usr/bin/python3.11 -u train.py --model tf_efficientnet_b5_ns --img-size 456 --epochs 10 --batch-size 24 --val-batch-size 64 --num-workers 10 --lr 3e-4 --use-ema --tta --early-stop-patience 3 --folds 0,1 --folds-csv train_folds_top512.csv --out-dir out_b5_456_top512 --pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):


Traceback (most recent call last):
  File "/var/lib/simon/agent_run_states/imet-2020-fgvc7-20250927-162858/train.py", line 484, in <module>
    main()
  File "/var/lib/simon/agent_run_states/imet-2020-fgvc7-20250927-162858/train.py", line 445, in main
    val_idx, oof_probs, thr, f1 = train_one_fold(cfg, f, train_df, num_classes, img_ext, attr_to_idx)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/simon/agent_run_states/imet-2020-fgvc7-20250927-162858/train.py", line 250, in train_one_fold
    logits = model(imgs)
             ^^^^^^^^^^^
  File "/app/.pip-target/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.pip-target/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.pip-target/timm/models/efficientn

Exit code: 1, elapsed 0.02 h


AssertionError: b5@456 run failed

In [41]:
import os, sys, time, shlex, subprocess
from pathlib import Path

print('=== Relaunch: b5@456 fold 0 ONLY (reduced BS to avoid OOM) ===', flush=True)
assert Path('train_folds_top512.csv').exists(), 'Missing train_folds_top512.csv'
cmd = [
    sys.executable, '-u', 'train.py',
    '--model', 'tf_efficientnet_b5_ns',
    '--img-size', '456',
    '--epochs', '10',
    '--batch-size', '16',
    '--val-batch-size', '48',
    '--num-workers', '10',
    '--lr', '3e-4',
    '--use-ema',
    '--tta',
    '--early-stop-patience', '3',
    '--folds', '0',
    '--folds-csv', 'train_folds_top512.csv',
    '--out-dir', 'out_b5_456_top512',
    '--pretrained'
]
print('Running:', ' '.join(shlex.quote(x) for x in cmd), flush=True)
t0 = time.time()
env = dict(os.environ)
env['PYTHONUNBUFFERED'] = '1'
# Let the CUDA caching allocator grow segments in place to reduce fragmentation.
env['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, env=env)
try:
    for line in p.stdout:
        print(line, end='')
finally:
    rc = p.wait()
print(f'Exit code: {rc}, elapsed {(time.time()-t0)/3600:.2f} h', flush=True)
assert rc == 0, 'b5@456 fold0 run failed'
print('b5@456 fold 0 completed.')

=== Relaunch: b5@456 fold 0 ONLY (reduced BS to avoid OOM) ===


Running: /usr/bin/python3.11 -u train.py --model tf_efficientnet_b5_ns --img-size 456 --epochs 10 --batch-size 16 --val-batch-size 48 --num-workers 10 --lr 3e-4 --use-ema --tta --early-stop-patience 3 --folds 0 --folds-csv train_folds_top512.csv --out-dir out_b5_456_top512 --pretrained


Detected image extension: .png
==== Fold 0 start ====
  model = create_fn(


  scaler = torch.cuda.amp.GradScaler(enabled=True)


  with torch.cuda.amp.autocast(enabled=True):


Traceback (most recent call last):
  File "/var/lib/simon/agent_run_states/imet-2020-fgvc7-20250927-162858/train.py", line 484, in <module>
    main()
  File "/var/lib/simon/agent_run_states/imet-2020-fgvc7-20250927-162858/train.py", line 445, in main
    val_idx, oof_probs, thr, f1 = train_one_fold(cfg, f, train_df, num_classes, img_ext, attr_to_idx)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/simon/agent_run_states/imet-2020-fgvc7-20250927-162858/train.py", line 250, in train_one_fold
    logits = model(imgs)
             ^^^^^^^^^^^
  File "/app/.pip-target/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.pip-target/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.pip-target/timm/models/efficientn
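
The traceback above is truncated, so the failure cause is not confirmed; the relaunch assumes out-of-memory and lowers the batch size from 24 to 16. If memory is the limit, gradient accumulation is one way to shrink the per-step batch while keeping the effective batch size. The NumPy sketch below only demonstrates the arithmetic on a toy linear model (all names are illustrative): averaging the gradients of equal-sized micro-batches reproduces the full-batch gradient.

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of mean((X @ w - y) ** 2) with respect to w.
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 5))  # 24 samples, the batch size of the first b5 launch
y = rng.normal(size=24)
w = np.zeros(5)

full_grad = mse_grad(w, X, y)

accum_steps = 3  # three micro-batches of 8 samples each
accum_grad = np.zeros_like(w)
for Xb, yb in zip(np.split(X, accum_steps), np.split(y, accum_steps)):
    # Scale each micro-batch gradient so the sum equals the full-batch mean gradient.
    accum_grad += mse_grad(w, Xb, yb) / accum_steps

print(np.allclose(full_grad, accum_grad))
```

In a real training loop the same scaling is applied to the loss before each backward pass, with the optimizer step taken once per `accum_steps` micro-batches. BatchNorm layers, which EfficientNet uses, still compute statistics per micro-batch, so the equivalence is only approximate there.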

In [42]:
print('=== Safety blend #2: b3@384_top512 + b3@448_top512 + convnext_tiny | cardinality target=4.33 ===', flush=True)
dirs = ['out_b3_384_top512','out_b3_448_top512','out_convnext_tiny_384_top512']
oof_f1, best_thr, Pt = blend_equal_weight(dirs, write_submission=True, out_name='submission_alt.csv', default_thr=0.50, cardinality_target=4.33)
print('Safety blend #2 done. thr=', best_thr, 'OOF_f1=', oof_f1)

=== Safety blend #2: b3@384_top512 + b3@448_top512 + convnext_tiny | cardinality target=4.33 ===


[CARD] Train mean labels/img=4.421, target=4.330, chosen thr=0.500 (pred_mean=4.332)
Wrote submission_alt.csv with thr=0.500 using 3 models
Safety blend #2 done. thr= 0.4999999999999998 OOF_f1= None
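
The `[CARD]` line suggests that `blend_equal_weight` picks the global threshold whose mean predicted label count lands closest to `cardinality_target`. Its implementation is not shown in this section, so the sketch below is only an illustration of that idea; the function name `pick_threshold_by_cardinality` and the toy data are assumptions, not the notebook's code. Since `OOF_f1` is reported as `None`, the chosen threshold here rests on the label-count match alone.

```python
import numpy as np

def pick_threshold_by_cardinality(probs, target, thrs):
    """Return (thr, mean_labels) whose mean predicted labels/img is closest to target.

    Hypothetical helper: probs is an (n_images, n_classes) array of probabilities.
    """
    probs = np.asarray(probs, dtype=np.float64)
    best_thr, best_mean, best_gap = None, None, np.inf
    for t in thrs:
        # Mean number of labels per image that clear the threshold
        mean_t = float((probs >= t).sum(axis=1).mean())
        gap = abs(mean_t - target)
        if gap < best_gap:
            best_thr, best_mean, best_gap = float(t), mean_t, gap
    return best_thr, best_mean

# Toy example: two images, three classes
toy = np.array([[0.9, 0.6, 0.2],
                [0.7, 0.4, 0.1]])
thr, mean_labels = pick_threshold_by_cardinality(toy, target=1.5, thrs=[0.3, 0.5, 0.8])
print(thr, mean_labels)
```

Matching cardinality is a proxy only; where OOF predictions are available, sweeping the threshold against micro-F1 on OOF is the more direct check.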


In [45]:
print('=== Weighted blends: 2:1:1 (b3_384:b3_448:convnext), prob clip->logits, narrow thr sweep via cardinality ===', flush=True)
dirs_main = ['out_b3_384_top512','out_b3_448_top512','out_convnext_tiny_384_top512']

# Main pick: match train mean ~4.42, overwrite submission.csv
oof_f1, best_thr, Pt = blend_equal_weight(dirs_main, weights=[2,1,1], write_submission=True, out_name='submission.csv', cardinality_target=4.42)
print('[MAIN] submission.csv thr=', best_thr, 'OOF_f1=', oof_f1)

# Backup: slightly lower cardinality target ~4.36
oof_f12, best_thr2, Pt2 = blend_equal_weight(dirs_main, weights=[2,1,1], write_submission=True, out_name='submission_weighted_alt.csv', cardinality_target=4.36)
print('[ALT ] submission_weighted_alt.csv thr=', best_thr2, 'OOF_f1=', oof_f12)

# Optional hedge: b3-only (drop convnext), 2:1 weight, at 4.42
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
oof_f1b3, thr_b3, _ = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=True, out_name='submission_b3_only.csv', cardinality_target=4.42)
print('[B3  ] submission_b3_only.csv thr=', thr_b3, 'OOF_f1=', oof_f1b3)
print('Weighted blending complete.')

=== Weighted blends: 2:1:1 (b3_384:b3_448:convnext), prob clip->logits, narrow thr sweep via cardinality ===


[CARD] Train mean labels/img=4.421, target=4.420, chosen thr=0.496 (pred_mean=4.403)
Wrote submission.csv with thr=0.496 using 3 models
[MAIN] submission.csv thr= 0.496 OOF_f1= None


[CARD] Train mean labels/img=4.421, target=4.360, chosen thr=0.498 (pred_mean=4.367)
Wrote submission_weighted_alt.csv with thr=0.498 using 3 models
[ALT ] submission_weighted_alt.csv thr= 0.498 OOF_f1= None


[CARD] Train mean labels/img=4.421, target=4.420, chosen thr=0.502 (pred_mean=4.406)
Wrote submission_b3_only.csv with thr=0.502 using 2 models
[B3  ] submission_b3_only.csv thr= 0.502 OOF_f1= None
Weighted blending complete.
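
The banner above mentions "prob clip->logits" with 2:1:1 weights. A minimal sketch of what such a blend might look like follows, assuming each model contributes an array of per-class probabilities; `weighted_logit_blend` and `eps` are hypothetical names, and the real `blend_equal_weight` may differ in its details.

```python
import numpy as np

def weighted_logit_blend(prob_list, weights, eps=1e-6):
    """Average models in logit space with normalized weights, then map back to probabilities."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    blended = np.zeros_like(np.asarray(prob_list[0], dtype=np.float64))
    for p, wi in zip(prob_list, w):
        # Clip so the logit stays finite at p == 0 or p == 1
        p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
        blended += wi * np.log(p / (1.0 - p))
    return 1.0 / (1.0 + np.exp(-blended))

# Toy example: one confident model (weight 2) and two uninformative ones (weight 1 each)
a = np.array([[0.8, 0.2]])
b = np.array([[0.5, 0.5]])
c = np.array([[0.5, 0.5]])
out = weighted_logit_blend([a, b, c], weights=[2, 1, 1])
print(out)
```

Averaging in logit space keeps a confident model from being flattened as much as plain probability averaging would, which is one common reason to prefer it.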


In [46]:
import numpy as np, pandas as pd
from pathlib import Path

print('=== Post-process: per-group caps on weighted blend (2:1:1) ===', flush=True)

# 1) Recompute weighted blend probs (do not write submission here)
dirs_main = ['out_b3_384_top512','out_b3_448_top512','out_convnext_tiny_384_top512']
oof_f1, best_thr, Pt = blend_equal_weight(dirs_main, weights=[2,1,1], write_submission=False, out_name='noop.csv', cardinality_target=4.42)
print('[BLEND] thr chosen =', best_thr, 'OOF_f1=', oof_f1)

# 2) Build group mapping from labels.csv prefixes
labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids = labels_df['attribute_id'].astype(int).tolist()
attr_to_idx = {a:i for i,a in enumerate(sorted(attr_ids))}
idx_to_attr = np.array(sorted(attr_ids), dtype=np.int32)
idx_to_group = labels_df.set_index('attribute_id').loc[idx_to_attr, 'group'].values

# Inspect available groups
groups = pd.Series(idx_to_group).value_counts().index.tolist()
print('[GROUPS]', groups)

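# 3) Define caps per group (fallback default=5)
# NOTE: only 'country', 'culture' and 'tags' match the prefixes printed by [GROUPS] below;
# 'medium' and 'dimension' fall through to default_cap, and the remaining keys never apply.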
caps = {
    'country': 1,
    'culture': 1,
    'century': 1,
    'object_type': 1,
    'technique': 2,
    'material': 2,
    'color': 1,
    'tag': 5,
    'tags': 5,
    'subject': 5,
}
default_cap = 5

# 4) Apply threshold then per-group caps
sub = pd.read_csv('sample_submission.csv')
ids = sub['id'].values
rows = []
thr = float(best_thr if best_thr is not None else 0.5)
for i in range(len(ids)):
    p = Pt[i]
    cand = np.where(p >= thr)[0]
    if cand.size == 0:
        # ensure at least one label
        top1 = int(np.argmax(p)); cand = np.array([top1], dtype=np.int64)
    # group -> indices within cand
    kept = []
    # sort candidates by prob desc
    cand_sorted = cand[np.argsort(-p[cand])]
    used_per_group = {}
    for j in cand_sorted:
        g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
        kcap = caps.get(g, default_cap)
        c = used_per_group.get(g, 0)
        if c < kcap:
            kept.append(j); used_per_group[g] = c + 1
    if len(kept) == 0:
        kept = [int(cand_sorted[0])]
    pred_attr = [int(idx_to_attr[j]) for j in kept]
    rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})

sub_caps = pd.DataFrame(rows)
sub_caps.to_csv('submission_caps.csv', index=False)
print('Wrote submission_caps.csv with per-group caps. thr=', thr)

=== Post-process: per-group caps on weighted blend (2:1:1) ===


[CARD] Train mean labels/img=4.421, target=4.420, chosen thr=0.496 (pred_mean=4.403)
[INFO] Skipping submission write (best_thr not available).
[BLEND] thr chosen = 0.496 OOF_f1= None
[GROUPS] ['medium', 'tags', 'culture', 'country', 'dimension']


Wrote submission_caps.csv with per-group caps. thr= 0.496
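
The threshold-then-cap loop in the cell above can be factored into one function so later cells do not need to repeat it. The sketch below mirrors that loop for a single image; `apply_group_caps` is a hypothetical name and the toy arrays are made up for illustration. It assumes every cap is at least 1, in which case the top candidate is always kept and no extra empty-result fallback is needed.

```python
import numpy as np

def apply_group_caps(p, thr, idx_to_group, caps, default_cap):
    """Return class indices kept for one image after thresholding and per-group caps."""
    p = np.asarray(p)
    cand = np.where(p >= thr)[0]
    if cand.size == 0:
        # Ensure at least one label: fall back to the top-1 class
        cand = np.array([int(np.argmax(p))])
    # Visit candidates from most to least probable
    cand_sorted = cand[np.argsort(-p[cand])]
    used, kept = {}, []
    for j in cand_sorted:
        g = idx_to_group[j]
        if used.get(g, 0) < caps.get(g, default_cap):
            kept.append(int(j))
            used[g] = used.get(g, 0) + 1
    return kept

# Toy example: two 'country' candidates compete for a cap of 1
toy_groups = np.array(['country', 'country', 'tags', 'tags', 'medium'])
toy_p = np.array([0.9, 0.8, 0.7, 0.6, 0.1])
toy_caps = {'country': 1, 'tags': 5}
kept = apply_group_caps(toy_p, 0.5, toy_groups, toy_caps, default_cap=3)
kept_fallback = apply_group_caps(toy_p, 0.95, toy_groups, toy_caps, default_cap=3)
print(kept, kept_fallback)
```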


In [47]:
import shutil, os
print('=== Set final submission to per-group capped blend ===', flush=True)
src, dst = 'submission_caps.csv', 'submission.csv'
assert os.path.exists(src), f'Missing {src}; run the per-group caps cell first.'
shutil.copyfile(src, dst)
print(f'Copied {src} -> {dst}')

=== Set final submission to per-group capped blend ===


Copied submission_caps.csv -> submission.csv


In [48]:
import numpy as np, pandas as pd, os, shutil
from pathlib import Path

print('=== Post-process: b3-only (2:1) weighted blend with per-group caps ===', flush=True)

# 1) Recompute b3-only weighted blend probs (no write) with a slightly lower target (4.38)
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
oof_f1_b3, thr_b3, Pt_b3 = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.38)
print('[BLEND B3] thr chosen =', thr_b3, 'OOF_f1=', oof_f1_b3)

# 2) Group mapping
labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids = labels_df['attribute_id'].astype(int).tolist()
attr_to_idx = {a:i for i,a in enumerate(sorted(attr_ids))}
idx_to_attr = np.array(sorted(attr_ids), dtype=np.int32)
idx_to_group = labels_df.set_index('attribute_id').loc[idx_to_attr, 'group'].values

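# 3) Caps per group
# NOTE: labels.csv prefixes observed earlier are medium, tags, culture, country, dimension;
# only 'country', 'culture' and 'tags' match here, so 'medium'/'dimension' use default_cap.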
caps = {
    'country': 1,
    'culture': 1,
    'century': 1,
    'object_type': 1,
    'technique': 2,
    'material': 2,
    'color': 1,
    'tag': 5,
    'tags': 5,
    'subject': 5,
}
default_cap = 5

# 4) Threshold then apply caps
sub = pd.read_csv('sample_submission.csv')
ids = sub['id'].values
rows = []
thr = float(thr_b3 if thr_b3 is not None else 0.5)
for i in range(len(ids)):
    p = Pt_b3[i]
    cand = np.where(p >= thr)[0]
    if cand.size == 0:
        top1 = int(np.argmax(p)); cand = np.array([top1], dtype=np.int64)
    cand_sorted = cand[np.argsort(-p[cand])]
    kept, used_per_group = [], {}
    for j in cand_sorted:
        g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
        kcap = caps.get(g, default_cap)
        c = used_per_group.get(g, 0)
        if c < kcap:
            kept.append(j); used_per_group[g] = c + 1
    if len(kept) == 0:
        kept = [int(cand_sorted[0])]
    pred_attr = [int(idx_to_attr[j]) for j in kept]
    rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})

sub_b3_caps = pd.DataFrame(rows)
sub_b3_caps.to_csv('submission_b3_caps.csv', index=False)
print('Wrote submission_b3_caps.csv with per-group caps. thr=', thr)

# 5) Make it the active submission for scoring
shutil.copyfile('submission_b3_caps.csv', 'submission.csv')
print('Copied submission_b3_caps.csv -> submission.csv')

=== Post-process: b3-only (2:1) weighted blend with per-group caps ===


[CARD] Train mean labels/img=4.421, target=4.380, chosen thr=0.504 (pred_mean=4.370)
[INFO] Skipping submission write (best_thr not available).
[BLEND B3] thr chosen = 0.504 OOF_f1= None


Wrote submission_b3_caps.csv with per-group caps. thr= 0.504
Copied submission_b3_caps.csv -> submission.csv


In [49]:
import numpy as np, pandas as pd, shutil, os
from pathlib import Path

print('=== Per-group caps v2 tuned to observed groups (medium, dimension, country, culture, tags) ===', flush=True)

def make_caps_submission(model_dirs, weights, target_card, out_path):
    # Blend to get probs
    oof_f1, best_thr, Pt = blend_equal_weight(model_dirs, weights=weights, write_submission=False, out_name='noop.csv', cardinality_target=target_card)
    print(f'[BLEND] thr={best_thr} target={target_card} models={model_dirs} weights={weights}')
    # Groups
    labels_df = pd.read_csv('labels.csv')
    labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
    attr_ids = labels_df['attribute_id'].astype(int).tolist()
    idx_to_attr = np.array(sorted(attr_ids), dtype=np.int32)
    idx_to_group = labels_df.set_index('attribute_id').loc[idx_to_attr, 'group'].values
    # Caps tuned to observed groups
    # Seen: ['medium','tags','culture','country','dimension']
    caps = {
        'country': 1,
        'culture': 1,
        'medium': 2,        # limit materials/technique-like to 2
        'dimension': 1,     # typically one dimension-related attr
        'tags': 5,
        'tag': 5,
    }
    default_cap = 3
    # Threshold + caps
    sub = pd.read_csv('sample_submission.csv')
    ids = sub['id'].values
    thr = float(best_thr if best_thr is not None else 0.5)
    rows = []
    for i in range(len(ids)):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap)
            c = used.get(g, 0)
            if c < kcap:
                kept.append(j); used[g] = c + 1
        if len(kept) == 0:
            kept = [int(cand_sorted[0])]
        pred_attr = [int(idx_to_attr[j]) for j in kept]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
    sub_df = pd.DataFrame(rows)
    sub_df.to_csv(out_path, index=False)
    print('Wrote', out_path, 'thr=', thr)

# Produce two variants quickly
dirs_weighted = ['out_b3_384_top512','out_b3_448_top512','out_convnext_tiny_384_top512']
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']

make_caps_submission(dirs_weighted, weights=[2,1,1], target_card=4.42, out_path='submission_caps_v2_weighted.csv')
make_caps_submission(dirs_b3, weights=[2,1], target_card=4.38, out_path='submission_caps_v2_b3.csv')

# Default active: weighted v2
shutil.copyfile('submission_caps_v2_weighted.csv', 'submission.csv')
print('Set submission.csv -> submission_caps_v2_weighted.csv')

=== Per-group caps v2 tuned to observed groups (medium, dimension, country, culture, tags) ===


[CARD] Train mean labels/img=4.421, target=4.420, chosen thr=0.496 (pred_mean=4.403)
[INFO] Skipping submission write (best_thr not available).
[BLEND] thr=0.496 target=4.42 models=['out_b3_384_top512', 'out_b3_448_top512', 'out_convnext_tiny_384_top512'] weights=[2, 1, 1]


Wrote submission_caps_v2_weighted.csv thr= 0.496


[CARD] Train mean labels/img=4.421, target=4.380, chosen thr=0.504 (pred_mean=4.370)
[INFO] Skipping submission write (best_thr not available).
[BLEND] thr=0.504 target=4.38 models=['out_b3_384_top512', 'out_b3_448_top512'] weights=[2, 1]


Wrote submission_caps_v2_b3.csv thr= 0.504
Set submission.csv -> submission_caps_v2_weighted.csv
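
The first caps dictionary listed keys such as `century` and `material`, while the `[GROUPS]` log shows only five prefixes in `labels.csv`, so `medium` and `dimension` silently used the default cap. A small check like the one below could flag that mismatch before a submission is written. This is a sketch: `check_cap_keys` is a hypothetical helper, and the group list is copied from the `[GROUPS]` log line.

```python
def check_cap_keys(caps, observed_groups):
    """Return (cap keys that match no group, groups that have no explicit cap)."""
    observed = set(observed_groups)
    unused = sorted(k for k in caps if k not in observed)
    uncovered = sorted(g for g in observed if g not in caps)
    return unused, uncovered

# Groups as printed by the [GROUPS] log line
observed = ['medium', 'tags', 'culture', 'country', 'dimension']

# The v1 caps used before tuning to the observed groups
caps_v1 = {
    'country': 1, 'culture': 1, 'century': 1, 'object_type': 1,
    'technique': 2, 'material': 2, 'color': 1,
    'tag': 5, 'tags': 5, 'subject': 5,
}
unused, uncovered = check_cap_keys(caps_v1, observed)
print('unused cap keys:', unused)
print('groups on default cap:', uncovered)
```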


In [50]:
import numpy as np, pandas as pd, shutil
from pathlib import Path

print('=== B3-only v2 caps @ target=4.42 with post-cap thr sweep to mean~4.40 ===', flush=True)

# 1) Get b3-only blended probs (weights 2:1). We'll sweep thr post caps.
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
oof_f1_b3, thr_b3_base, Pt_b3 = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.42)
print('[BLEND B3] base thr=', thr_b3_base, 'OOF_f1=', oof_f1_b3)

# 2) Mapping and v2 caps tuned to observed groups
labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids = labels_df['attribute_id'].astype(int).tolist()
idx_to_attr = np.array(sorted(attr_ids), dtype=np.int32)
idx_to_group = labels_df.set_index('attribute_id').loc[idx_to_attr, 'group'].values
caps = {
    'country': 1,
    'culture': 1,
    'medium': 2,
    'dimension': 1,
    'tags': 5,
    'tag': 5,
}
default_cap = 3

sub = pd.read_csv('sample_submission.csv')
ids = sub['id'].values

def apply_caps(Pt, thr):
    rows = []
    counts = []
    for i in range(len(ids)):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap)
            c = used.get(g, 0)
            if c < kcap:
                kept.append(j); used[g] = c + 1
        if len(kept) == 0:
            kept = [int(cand_sorted[0])]
        pred_attr = [int(idx_to_attr[j]) for j in kept]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
        counts.append(len(pred_attr))
    return rows, float(np.mean(counts))

# 3) Sweep global thr after caps to hit mean ~4.40
thrs = np.arange(0.494, 0.5061, 0.002)
target_mean = 4.40
best = None
best_rows = None
for t in thrs:
    rows_t, mean_t = apply_caps(Pt_b3, t)
    delta = abs(mean_t - target_mean)
    print(f'[SWEEP] thr={t:.3f} post-cap mean={mean_t:.3f} delta={delta:.3f}')
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t)
        best_rows = rows_t

_, best_thr_cap, best_mean = best
print(f'[CHOSEN] thr={best_thr_cap:.3f} post-cap mean={best_mean:.3f}')
sub_df = pd.DataFrame(best_rows)
out_path = 'submission_b3_caps_442.csv'
sub_df.to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print('Wrote', out_path, 'and set as submission.csv')

=== B3-only v2 caps @ target=4.42 with post-cap thr sweep to mean~4.40 ===


[CARD] Train mean labels/img=4.421, target=4.420, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).
[BLEND B3] base thr= 0.502 OOF_f1= None


[SWEEP] thr=0.494 post-cap mean=4.000 delta=0.400
[SWEEP] thr=0.496 post-cap mean=3.975 delta=0.425


[SWEEP] thr=0.498 post-cap mean=3.952 delta=0.448
[SWEEP] thr=0.500 post-cap mean=3.929 delta=0.471


[SWEEP] thr=0.502 post-cap mean=3.903 delta=0.497
[SWEEP] thr=0.504 post-cap mean=3.880 delta=0.520


[SWEEP] thr=0.506 post-cap mean=3.857 delta=0.543
[CHOSEN] thr=0.494 post-cap mean=4.000
Wrote submission_b3_caps_442.csv and set as submission.csv
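
The narrow sweep above never gets near the 4.40 target (the closest point is 4.000 at the lower edge of the range). Because the post-cap mean only falls as the threshold rises, as the `[SWEEP]` lines show, a bisection over the threshold may be a simpler way to hit a target than guessing a grid. The sketch below assumes that monotonic behaviour; `bisect_threshold` and the random toy data are illustrative only and do not use the notebook's `apply_caps`.

```python
import numpy as np

def bisect_threshold(mean_fn, target, lo=0.0, hi=1.0, iters=30):
    """Find thr in [lo, hi] where mean_fn(thr) is close to target.

    Assumes mean_fn is non-increasing in thr.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_fn(mid) > target:
            lo = mid  # too many labels kept: raise the threshold
        else:
            hi = mid  # at or below target: lower the threshold
    return 0.5 * (lo + hi)

# Toy stand-in for "mean labels per image at a given threshold"
rng = np.random.default_rng(0)
toy_probs = rng.random((2000, 10))

def toy_mean(t):
    return float((toy_probs >= t).sum(axis=1).mean())

thr = bisect_threshold(toy_mean, target=4.0)
print(round(thr, 4), round(toy_mean(thr), 4))
```

In the notebook setting, `mean_fn` would wrap the second return value of `apply_caps(Pt_b3, thr)`.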


In [54]:
import numpy as np, pandas as pd, shutil
from pathlib import Path

print('=== B3-only v2 caps @ target=4.42 with wider post-cap thr sweep to hit mean~4.40 ===', flush=True)
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
oof_f1_b3, thr_b3_base, Pt_b3 = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.42)
print('[BLEND B3] base thr=', thr_b3_base, 'OOF_f1=', oof_f1_b3)

labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids = labels_df['attribute_id'].astype(int).tolist()
idx_to_attr = np.array(sorted(attr_ids), dtype=np.int32)
idx_to_group = labels_df.set_index('attribute_id').loc[idx_to_attr, 'group'].values
caps = {
    'country': 1,
    'culture': 1,
    'medium': 2,
    'dimension': 1,
    'tags': 5,
    'tag': 5,
}
default_cap = 3

sub = pd.read_csv('sample_submission.csv')
ids = sub['id'].values

def apply_caps(Pt, thr):
    rows = []; counts = []
    for i in range(len(ids)):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap)
            c = used.get(g, 0)
            if c < kcap:
                kept.append(j); used[g] = c + 1
        if len(kept) == 0:
            kept = [int(cand_sorted[0])]
        pred_attr = [int(idx_to_attr[j]) for j in kept]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
        counts.append(len(pred_attr))
    return rows, float(np.mean(counts))

# Wider sweep to reach post-cap mean ~4.40
thrs = np.arange(0.460, 0.5201, 0.002)
target_mean = 4.40
best = None; best_rows = None
for t in thrs:
    rows_t, mean_t = apply_caps(Pt_b3, t)
    delta = abs(mean_t - target_mean)
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t); best_rows = rows_t
    # NOTE: int() truncates the float quotient, so this logs at uneven steps (e.g. 0.502, not 0.500)
    if int((t-0.460)/0.002) % 10 == 0:
        print(f'[SWEEP] thr={t:.3f} post-cap mean={mean_t:.3f}')

_, best_thr_cap, best_mean = best
print(f'[CHOSEN] thr={best_thr_cap:.3f} post-cap mean={best_mean:.3f}')
sub_df = pd.DataFrame(best_rows)
out_path = 'submission_b3_caps_442_tuned.csv'
sub_df.to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print('Wrote', out_path, 'and set as submission.csv')

=== B3-only v2 caps @ target=4.42 with wider post-cap thr sweep to hit mean~4.40 ===


[CARD] Train mean labels/img=4.421, target=4.420, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).
[BLEND B3] base thr= 0.502 OOF_f1= None


[SWEEP] thr=0.460 post-cap mean=4.436


[SWEEP] thr=0.480 post-cap mean=4.173


[SWEEP] thr=0.502 post-cap mean=3.903


[SWEEP] thr=0.520 post-cap mean=3.692
[CHOSEN] thr=0.462 post-cap mean=4.409
Wrote submission_b3_caps_442_tuned.csv and set as submission.csv
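
The `[SWEEP]` lines land on 0.502 rather than 0.500 because `int((t-0.460)/0.002)` truncates a float quotient that can fall just below a whole number. One way to log at even intervals is to key on the loop index instead of the float value, as in this sketch (`np.linspace` is used here in place of `np.arange` so that the endpoint and step are exact by construction):

```python
import numpy as np

# 31 evenly spaced thresholds from 0.460 to 0.520 (step 0.002)
thrs = np.linspace(0.460, 0.520, 31)

# Log every 10th threshold by index, which avoids float truncation issues
logged = [round(float(t), 3) for k, t in enumerate(thrs) if k % 10 == 0]
print(logged)
```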


In [60]:
import numpy as np, pandas as pd, shutil
from pathlib import Path

print('=== B3-only v2 caps @ target=4.38 with wider post-cap thr sweep to hit mean~4.40 ===', flush=True)
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
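# NOTE: cardinality_target only changes the base thr printed below. The post-cap sweep picks its
# own thr from Pt_b3, which appears unaffected by the target (the [SWEEP] means match the 4.42 run).
oof_f1_b3, thr_b3_base, Pt_b3 = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.38)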
print('[BLEND B3] base thr=', thr_b3_base, 'OOF_f1=', oof_f1_b3)

labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids = labels_df['attribute_id'].astype(int).tolist()
idx_to_attr = np.array(sorted(attr_ids), dtype=np.int32)
idx_to_group = labels_df.set_index('attribute_id').loc[idx_to_attr, 'group'].values
caps = {
    'country': 1,
    'culture': 1,
    'medium': 2,
    'dimension': 1,
    'tags': 5,
    'tag': 5,
}
default_cap = 3

sub = pd.read_csv('sample_submission.csv')
ids = sub['id'].values

def apply_caps(Pt, thr):
    rows = []; counts = []
    for i in range(len(ids)):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap)
            c = used.get(g, 0)
            if c < kcap:
                kept.append(j); used[g] = c + 1
        if len(kept) == 0:
            kept = [int(cand_sorted[0])]
        pred_attr = [int(idx_to_attr[j]) for j in kept]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
        counts.append(len(pred_attr))
    return rows, float(np.mean(counts))

# Wider sweep to reach post-cap mean ~4.40
thrs = np.arange(0.460, 0.5201, 0.002)
target_mean = 4.40
best = None; best_rows = None
for t in thrs:
    rows_t, mean_t = apply_caps(Pt_b3, t)
    delta = abs(mean_t - target_mean)
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t); best_rows = rows_t
    # NOTE: int() truncates the float quotient, so this logs at uneven steps (e.g. 0.502, not 0.500)
    if int((t-0.460)/0.002) % 10 == 0:
        print(f'[SWEEP] thr={t:.3f} post-cap mean={mean_t:.3f}')

_, best_thr_cap, best_mean = best
print(f'[CHOSEN] thr={best_thr_cap:.3f} post-cap mean={best_mean:.3f}')
sub_df = pd.DataFrame(best_rows)
out_path = 'submission_b3_caps_438_tuned.csv'
sub_df.to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print('Wrote', out_path, 'and set as submission.csv')

=== B3-only v2 caps @ target=4.38 with wider post-cap thr sweep to hit mean~4.40 ===


[CARD] Train mean labels/img=4.421, target=4.380, chosen thr=0.504 (pred_mean=4.370)
[INFO] Skipping submission write (best_thr not available).
[BLEND B3] base thr= 0.504 OOF_f1= None


[SWEEP] thr=0.460 post-cap mean=4.436


[SWEEP] thr=0.480 post-cap mean=4.173


[SWEEP] thr=0.502 post-cap mean=3.903


[SWEEP] thr=0.520 post-cap mean=3.692
[CHOSEN] thr=0.462 post-cap mean=4.409
Wrote submission_b3_caps_438_tuned.csv and set as submission.csv


In [53]:
import numpy as np, pandas as pd, shutil
from pathlib import Path

print('=== B3-only v2 caps @ target=4.35 with wider post-cap thr sweep to hit mean~4.40 ===', flush=True)
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
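# NOTE: cardinality_target only changes the base thr printed below. The post-cap sweep picks its
# own thr from Pt_b3, which appears unaffected by the target (the [SWEEP] means match the 4.42 run).
oof_f1_b3, thr_b3_base, Pt_b3 = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.35)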
print('[BLEND B3] base thr=', thr_b3_base, 'OOF_f1=', oof_f1_b3)

labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids = labels_df['attribute_id'].astype(int).tolist()
idx_to_attr = np.array(sorted(attr_ids), dtype=np.int32)
idx_to_group = labels_df.set_index('attribute_id').loc[idx_to_attr, 'group'].values
caps = {
    'country': 1,
    'culture': 1,
    'medium': 2,
    'dimension': 1,
    'tags': 5,
    'tag': 5,
}
default_cap = 3

sub = pd.read_csv('sample_submission.csv')
ids = sub['id'].values

def apply_caps(Pt, thr):
    rows = []; counts = []
    for i in range(len(ids)):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap)
            c = used.get(g, 0)
            if c < kcap:
                kept.append(j); used[g] = c + 1
        if len(kept) == 0:
            kept = [int(cand_sorted[0])]
        pred_attr = [int(idx_to_attr[j]) for j in kept]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
        counts.append(len(pred_attr))
    return rows, float(np.mean(counts))

# Wider sweep to reach post-cap mean ~4.40
thrs = np.arange(0.460, 0.5201, 0.002)
target_mean = 4.40
best = None; best_rows = None
for t in thrs:
    rows_t, mean_t = apply_caps(Pt_b3, t)
    delta = abs(mean_t - target_mean)
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t); best_rows = rows_t
    # NOTE: int() truncates the float quotient, so this logs at uneven steps (0.480 is skipped here)
    if int((t-0.460)/0.002) % 10 == 0:
        print(f'[SWEEP] thr={t:.3f} post-cap mean={mean_t:.3f}')

_, best_thr_cap, best_mean = best
print(f'[CHOSEN] thr={best_thr_cap:.3f} post-cap mean={best_mean:.3f}')
sub_df = pd.DataFrame(best_rows)
out_path = 'submission_b3_caps_435_tuned.csv'
sub_df.to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print('Wrote', out_path, 'and set as submission.csv')

=== B3-only v2 caps @ target=4.35 with wider post-cap thr sweep to hit mean~4.40 ===


[CARD] Train mean labels/img=4.421, target=4.350, chosen thr=0.506 (pred_mean=4.335)
[INFO] Skipping submission write (best_thr not available).
[BLEND B3] base thr= 0.506 OOF_f1= None


[SWEEP] thr=0.460 post-cap mean=4.436


[SWEEP] thr=0.502 post-cap mean=3.903


[SWEEP] thr=0.520 post-cap mean=3.692
[CHOSEN] thr=0.462 post-cap mean=4.409
Wrote submission_b3_caps_435_tuned.csv and set as submission.csv
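
The three tuned runs (targets 4.42, 4.38 and 4.35) all report `thr=0.462` and a post-cap mean of 4.409, which suggests the resulting CSVs may be byte-identical. A quick hash comparison can confirm that before treating them as separate candidates. The helper names below are hypothetical, and the demo uses temporary files rather than the real submissions.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def group_identical(paths):
    """Group paths whose contents hash to the same digest."""
    groups = {}
    for p in paths:
        groups.setdefault(file_digest(p), []).append(str(p))
    return [sorted(v) for v in groups.values()]

# Demo with throwaway files: a.csv and b.csv match, c.csv differs
with tempfile.TemporaryDirectory() as d:
    a, b, c = (Path(d) / name for name in ('a.csv', 'b.csv', 'c.csv'))
    a.write_text('id,attribute_ids\nx,1 2\n')
    b.write_text('id,attribute_ids\nx,1 2\n')
    c.write_text('id,attribute_ids\nx,1 3\n')
    demo = [[Path(p).name for p in grp] for grp in group_identical([a, b, c])]
print(demo)
```

For the notebook's files, the call would be `group_identical(['submission_b3_caps_442_tuned.csv', 'submission_b3_caps_438_tuned.csv', 'submission_b3_caps_435_tuned.csv'])`.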


In [56]:
import sys, subprocess, os, time, math, gc, numpy as np, pandas as pd
from pathlib import Path
from PIL import Image

print('=== Install and prepare CLIP kNN label transfer pipeline ===', flush=True)

# 0) Ensure deps (open_clip_torch, faiss-cpu) are installed
def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

try:
    import open_clip
except Exception:
    pip('install', 'open_clip_torch==2.26.1')
    import open_clip
try:
    import faiss
except Exception:
    pip('install', 'faiss-cpu==1.8.0.post1')
    import faiss

import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T

# 1) Dataset for CLIP embedding extraction
class ImgDataset(Dataset):
    def __init__(self, ids, root: Path, ext: str, transform):
        self.ids = ids
        self.root = Path(root)
        self.ext = ext
        self.t = transform
    def __len__(self):
        return len(self.ids)
    def __getitem__(self, i):
        img_id = self.ids[i]
        p = self.root / f'{img_id}{self.ext}'
        with Image.open(p) as im:
            im = im.convert('RGB')
            x = self.t(im)
        return x, i

# 2) Extract CLIP embeddings for train/test and save
def extract_clip_embeddings(model_name='ViT-B-32', pretrained='laion2b_s34b_b79k', image_size=224, batch_size=256, num_workers=8, device='cuda'):
    import train as trn
    base = Path('.')
    train_df = pd.read_csv('train.csv')
    test_df = pd.read_csv('sample_submission.csv')
    # detect extension (png)
    img_ext = trn.detect_ext(Path('train'), [train_df['id'].iloc[0]])
    print('Detected ext:', img_ext, flush=True)

    # Load CLIP
    print('Loading CLIP:', model_name, pretrained, flush=True)
    model, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=pretrained, device=device)
    model.eval()
    # Use open_clip's bundled preprocess as-is: it already applies the resize/crop and
    # normalization stats the checkpoint expects (the image_size argument is not used here).
    transform = preprocess

    def run_split(split_name, ids, img_root, out_path):
        ds = ImgDataset(ids, img_root, img_ext, transform)
        dl = DataLoader(ds, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True)
        feats = np.zeros((len(ids), model.visual.output_dim), dtype=np.float32)
        t0 = time.time()
        seen = 0
        with torch.no_grad():
            for bi, (xb, idx) in enumerate(dl):
                xb = xb.to(device, non_blocking=True)
                with torch.autocast(device_type='cuda', enabled=True):
                    f = model.encode_image(xb)
                f = f.float()
                f = torch.nn.functional.normalize(f, dim=1)
                feats[idx.numpy()] = f.cpu().numpy()
                seen += xb.size(0)
                if (bi+1) % 20 == 0:
                    dt = time.time()-t0
                    ips = seen / max(dt,1e-6)
                    print(f'[{split_name}] {seen}/{len(ids)} in {dt/60:.1f}m ({ips:.1f} img/s)', flush=True)
        np.save(out_path, feats)
        print(f'[{split_name}] Saved {out_path} shape={feats.shape}', flush=True)
        del feats; gc.collect()

    # Train and test embeddings (each split is skipped if its .npy file already exists)
    train_ids = train_df['id'].tolist()
    test_ids = test_df['id'].tolist()
    if not Path('clip_train_emb.npy').exists():
        run_split('train', train_ids, Path('train'), 'clip_train_emb.npy')
    else:
        print('[train] clip_train_emb.npy exists, skipping')
    if not Path('clip_test_emb.npy').exists():
        run_split('test', test_ids, Path('test'), 'clip_test_emb.npy')
    else:
        print('[test] clip_test_emb.npy exists, skipping')
    print('CLIP embedding extraction done.')

print('Ready: run this cell to install deps and define functions, then call extract_clip_embeddings() in the next cell.', flush=True)
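
The label-transfer step itself is not part of this cell. As a rough sketch of what kNN label transfer over L2-normalized embeddings could look like, the example below uses brute-force NumPy similarity in place of faiss; the function name, `k`, and `temperature` are assumptions for illustration, not the notebook's settings.

```python
import numpy as np

def knn_label_transfer(train_emb, train_labels, test_emb, k=5, temperature=0.1):
    """Score each test row by a similarity-weighted vote over its k nearest train rows.

    Assumes embeddings are L2-normalized, so the dot product is cosine similarity.
    """
    sims = test_emb @ train_emb.T
    k = min(k, train_emb.shape[0])
    # Indices of the k most similar train rows (unordered within the top k)
    nn_idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    nn_sims = np.take_along_axis(sims, nn_idx, axis=1)
    # Softmax over neighbour similarities gives the vote weights
    w = np.exp((nn_sims - nn_sims.max(axis=1, keepdims=True)) / temperature)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted average of neighbour label vectors -> (n_test, n_classes)
    return np.einsum('nk,nkc->nc', w, train_labels[nn_idx])

# Toy example: the test point coincides with train rows 0 and 2, both labelled class 0
train_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
train_labels = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
test_emb = np.array([[1.0, 0.0]])
scores = knn_label_transfer(train_emb, train_labels, test_emb, k=2)
print(scores)
```

The full similarity matrix may not fit in memory for the real train/test sizes, which is presumably why faiss is installed above; the voting logic would stay the same with an index-based neighbour search.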

=== Install and prepare CLIP kNN label transfer pipeline ===


> pip install open_clip_torch==2.26.1


Collecting open_clip_torch==2.26.1
  Downloading open_clip_torch-2.26.1-py3-none-any.whl (1.5 MB)
Collecting tqdm
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Collecting ftfy
  Downloading ftfy-6.3.1-py3-none-any.whl (44 kB)
Collecting huggingface-hub
  Downloading huggingface_hub-0.35.1-py3-none-any.whl (563 kB)
Collecting regex
  Downloading regex-2025.9.18-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (798 kB)
Collecting torchvision
  Downloading torchvision-0.23.0-cp311-cp311-manylinux_2_28_x86_64.whl (8.6 MB)
Collecting torch>=1.9.0
  Downloading torch-2.8.0-cp311-cp311-manylinux_2_28_x86_64.whl (888.1 MB)
Collecting timm
  Downloading timm-1.0.20-py3-none-any.whl (2.5 MB)
Collecting nvidia-cufile-cu12==1.13.1.3
  Downloading nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB)
Collecting nvidia-curand-cu12==10.3.9.90
  Downloading nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl (63.6 MB)
Collecting nvidia-cusparse-cu12==12.5.8.93
  Downloading nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (288.2 MB)
Collecting nvidia-cufft-cu12==11.3.3.83
  Downloading nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB)
Collecting typing-extensions>=4.10.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
Collecting nvidia-cuda-cupti-cu12==12.8.90
  Downloading nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB)
Collecting nvidia-nccl-cu12==2.27.3
  Downloading nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.4 MB)
Collecting fsspec
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
Collecting nvidia-cublas-cu12==12.8.4.1
  Downloading nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl (594.3 MB)
Collecting nvidia-cusolver-cu12==11.7.3.90
  Downloading nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl (267.5 MB)
Collecting nvidia-cusparselt-cu12==0.7.1
  Downloading nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB)
Collecting nvidia-cuda-runtime-cu12==12.8.90
  Downloading nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB)
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
Collecting nvidia-nvjitlink-cu12==12.8.93
  Downloading nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.3 MB)
Collecting nvidia-cuda-nvrtc-cu12==12.8.93
  Downloading nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB)
Collecting nvidia-cudnn-cu12==9.10.2.21
  Downloading nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB)
Collecting nvidia-nvtx-cu12==12.8.90
  Downloading nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB)
Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting triton==3.4.0
  Downloading triton-3.4.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.5 MB)
Collecting sympy>=1.13.3
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)


Collecting setuptools>=40.8.0
  Downloading setuptools-80.9.0-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 536.6 MB/s eta 0:00:00
Collecting wcwidth
  Downloading wcwidth-0.2.14-py2.py3-none-any.whl (37 kB)
Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.7/64.7 KB 452.8 MB/s eta 0:00:00
Collecting hf-xet<2.0.0,>=1.1.3
  Downloading hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 563.9 MB/s eta 0:00:00


Collecting pyyaml>=5.1
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.6/806.6 KB 539.2 MB/s eta 0:00:00
Collecting packaging>=20.9
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 431.0 MB/s eta 0:00:00


Collecting safetensors
  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 485.8/485.8 KB 541.8 MB/s eta 0:00:00


Collecting pillow!=8.3.*,>=5.3.0
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 565.4 MB/s eta 0:00:00


Collecting numpy
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 317.7 MB/s eta 0:00:00
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 545.4 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB)
Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (150 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.3/150.3 KB 515.2 MB/s eta 0:00:00
Collecting urllib3<3,>=1.21.1


  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.8/129.8 KB 393.3 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Downloading idna-3.10-py3-none-any.whl (70 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 KB 443.3 MB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Downloading certifi-2025.8.3-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 505.9 MB/s eta 0:00:00


ERROR: Operation cancelled by user


KeyboardInterrupt: 

In [57]:
import sys, subprocess, os, time, math, gc, numpy as np, pandas as pd
from pathlib import Path
from PIL import Image

print('=== Safe install for CLIP kNN (no torch upgrade) and define extract ===', flush=True)

def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

# 1) Install open_clip without deps to avoid torch override; then install minimal deps under constraints
try:
    import open_clip
    print('open_clip already available')
except Exception:
    assert Path('constraints.txt').exists(), 'constraints.txt missing'
    pip('install', '--no-deps', 'open_clip_torch==2.26.1')
    # Minimal deps; honor torch constraints to prevent upgrades
    pip('install', '-c', 'constraints.txt', 'ftfy==6.3.1', 'tqdm', 'huggingface-hub==0.35.1', 'regex', 'safetensors', 'pillow', '--upgrade-strategy', 'only-if-needed')
    import open_clip  # noqa

try:
    import faiss  # noqa
    print('faiss available')
except Exception:
    pip('install', 'faiss-cpu==1.8.0.post1')
    import faiss  # noqa

import torch
from torch.utils.data import Dataset, DataLoader

class ImgDataset(Dataset):
    def __init__(self, ids, root: Path, ext: str, transform):
        self.ids = ids
        self.root = Path(root)
        self.ext = ext
        self.t = transform
    def __len__(self):
        return len(self.ids)
    def __getitem__(self, i):
        img_id = self.ids[i]
        p = self.root / f'{img_id}{self.ext}'
        with Image.open(p) as im:
            im = im.convert('RGB')
            x = self.t(im)
        return x, i

def extract_clip_embeddings(model_name='ViT-B-32', pretrained='laion2b_s34b_b79k', batch_size=256, num_workers=8, device='cuda'):
    import open_clip
    import train as trn
    train_df = pd.read_csv('train.csv')
    test_df = pd.read_csv('sample_submission.csv')
    img_ext = trn.detect_ext(Path('train'), [train_df['id'].iloc[0]])
    print('Detected ext:', img_ext, flush=True)
    print('Loading CLIP:', model_name, pretrained, flush=True)
    model, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=pretrained, device=device)
    model.eval()
    transform = preprocess

    def run_split(split_name, ids, img_root, out_path):
        dl = DataLoader(ImgDataset(ids, img_root, img_ext, transform), batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True)
        feats = np.zeros((len(ids), model.visual.output_dim), dtype=np.float32)
        t0 = time.time(); seen = 0
        with torch.no_grad():
            for bi, (xb, idx) in enumerate(dl):
                xb = xb.to(device, non_blocking=True)
                with torch.cuda.amp.autocast(enabled=True):
                    f = model.encode_image(xb)
                f = torch.nn.functional.normalize(f.float(), dim=1)
                feats[idx.numpy()] = f.cpu().numpy()
                seen += xb.size(0)
                if (bi+1) % 20 == 0:
                    dt = time.time()-t0
                    print(f'[{split_name}] {seen}/{len(ids)} ({seen/max(dt,1e-6):.1f} img/s) elapsed {dt/60:.1f}m', flush=True)
        np.save(out_path, feats)
        print(f'[{split_name}] Saved {out_path} shape={feats.shape}', flush=True)
        del feats; gc.collect()

    train_ids = train_df['id'].tolist()
    test_ids = test_df['id'].tolist()
    if not Path('clip_train_emb.npy').exists():
        run_split('train', train_ids, Path('train'), 'clip_train_emb.npy')
    else:
        print('[train] clip_train_emb.npy exists, skipping')
    if not Path('clip_test_emb.npy').exists():
        run_split('test', test_ids, Path('test'), 'clip_test_emb.npy')
    else:
        print('[test] clip_test_emb.npy exists, skipping')
    print('CLIP embedding extraction done.')

print('Safe installer ready. Next: run extract_clip_embeddings() to generate CLIP embeddings, then build FAISS kNN and blend.', flush=True)

=== Safe install for CLIP kNN (no torch upgrade) and define extract ===


> pip install --no-deps open_clip_torch==2.26.1


Collecting open_clip_torch==2.26.1
  Downloading open_clip_torch-2.26.1-py3-none-any.whl (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 55.6 MB/s eta 0:00:00
Installing collected packages: open_clip_torch
Successfully installed open_clip_torch-2.26.1
> pip install -c constraints.txt ftfy==6.3.1 tqdm huggingface-hub==0.35.1 regex safetensors pillow --upgrade-strategy only-if-needed


Collecting ftfy==6.3.1
  Downloading ftfy-6.3.1-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.8/44.8 KB 3.3 MB/s eta 0:00:00
Collecting tqdm
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 KB 25.6 MB/s eta 0:00:00
Collecting huggingface-hub==0.35.1
  Downloading huggingface_hub-0.35.1-py3-none-any.whl (563 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 563.3/563.3 KB 82.8 MB/s eta 0:00:00


Collecting regex
  Downloading regex-2025.9.18-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (798 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 799.0/799.0 KB 450.7 MB/s eta 0:00:00
Collecting safetensors
  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 485.8/485.8 KB 294.3 MB/s eta 0:00:00


Collecting pillow
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 443.1 MB/s eta 0:00:00
Collecting wcwidth
  Downloading wcwidth-0.2.14-py2.py3-none-any.whl (37 kB)
Collecting hf-xet<2.0.0,>=1.1.3
  Downloading hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 478.2 MB/s eta 0:00:00


Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.7/64.7 KB 414.0 MB/s eta 0:00:00
Collecting pyyaml>=5.1
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.6/806.6 KB 505.9 MB/s eta 0:00:00
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting typing-extensions>=3.7.4.3
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 403.5 MB/s eta 0:00:00
Collecting fsspec>=2023.5.0
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 488.5 MB/s eta 0:00:00
Collecting packaging>=20.9
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 401.7 MB/s eta 0:00:00


Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (150 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.3/150.3 KB 465.7 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Downloading idna-3.10-py3-none-any.whl (70 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 KB 404.0 MB/s eta 0:00:00
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.8/129.8 KB 441.4 MB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Downloading certifi-2025.8.3-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 469.7 MB/s eta 0:00:00


Installing collected packages: wcwidth, urllib3, typing-extensions, tqdm, safetensors, regex, pyyaml, pillow, packaging, idna, hf-xet, fsspec, filelock, charset_normalizer, certifi, requests, ftfy, huggingface-hub


Successfully installed certifi-2025.8.3 charset_normalizer-3.4.3 filelock-3.19.1 fsspec-2025.9.0 ftfy-6.3.1 hf-xet-1.1.10 huggingface-hub-0.35.1 idna-3.10 packaging-25.0 pillow-11.3.0 pyyaml-6.0.3 regex-2025.9.18 requests-2.32.5 safetensors-0.6.2 tqdm-4.67.1 typing-extensions-4.15.0 urllib3-2.5.0 wcwidth-0.2.14


Collecting faiss-cpu==1.8.0.post1
  Downloading faiss_cpu-1.8.0.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.0/27.0 MB 179.1 MB/s eta 0:00:00


Collecting numpy<2.0,>=1.0
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 260.0 MB/s eta 0:00:00
Collecting packaging
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 396.0 MB/s eta 0:00:00


Installing collected packages: packaging, numpy, faiss-cpu


Successfully installed faiss-cpu-1.8.0.post1 numpy-1.26.4 packaging-25.0
Safe installer ready. Next: run extract_clip_embeddings() to generate CLIP embeddings, then build FAISS kNN and blend.
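
The installer cell above asserts that a `constraints.txt` exists so that pip cannot replace the working torch stack. A minimal sketch of how such a file might be produced from the current environment follows. The helper name `write_constraints` and the package list are assumptions for illustration, not what this notebook used.

```python
from importlib import metadata
from pathlib import Path


def write_constraints(packages, path="constraints.txt"):
    """Pin each installed package to its current version.

    Packages that are not installed are skipped rather than pinned.
    Returns the list of 'name==version' lines that were written.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            continue
    Path(path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return lines


if __name__ == "__main__":
    # Hypothetical package list; adjust to whatever must not be upgraded.
    pinned = write_constraints(["torch", "torchvision", "torchaudio", "numpy"])
    print("Pinned:", pinned)
```

Passing the resulting file with `pip install -c constraints.txt ...` lets pip resolve new dependencies while refusing versions that conflict with the pins.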




In [58]:
print('=== Extracting CLIP embeddings (ViT-B/32 @224, BS=256) for kNN label transfer ===', flush=True)
try:
    extract_clip_embeddings(model_name='ViT-B-32', pretrained='laion2b_s34b_b79k', batch_size=256, num_workers=8, device='cuda')
except Exception as e:
    print('CLIP extraction failed:', e)

=== Extracting CLIP embeddings (ViT-B/32 @224, BS=256) for kNN label transfer ===


Detected ext: .png


Loading CLIP: ViT-B-32 laion2b_s34b_b79k


  checkpoint = torch.load(checkpoint_path, map_location=map_location)


  with torch.cuda.amp.autocast(enabled=True):


[train] 5120/120801 (1150.8 img/s) elapsed 0.1m


[train] 10240/120801 (1457.1 img/s) elapsed 0.1m


[train] 15360/120801 (1469.3 img/s) elapsed 0.2m


[train] 20480/120801 (1562.4 img/s) elapsed 0.2m


[train] 25600/120801 (1554.2 img/s) elapsed 0.3m


[train] 30720/120801 (1601.4 img/s) elapsed 0.3m


[train] 35840/120801 (1596.7 img/s) elapsed 0.4m


[train] 40960/120801 (1626.7 img/s) elapsed 0.4m


[train] 46080/120801 (1617.7 img/s) elapsed 0.5m


[train] 51200/120801 (1641.7 img/s) elapsed 0.5m


[train] 56320/120801 (1633.5 img/s) elapsed 0.6m


[train] 61440/120801 (1651.5 img/s) elapsed 0.6m


[train] 66560/120801 (1643.1 img/s) elapsed 0.7m


[train] 71680/120801 (1658.2 img/s) elapsed 0.7m


[train] 76800/120801 (1652.7 img/s) elapsed 0.8m


[train] 81920/120801 (1668.3 img/s) elapsed 0.8m


[train] 87040/120801 (1661.6 img/s) elapsed 0.9m


[train] 92160/120801 (1674.6 img/s) elapsed 0.9m


[train] 97280/120801 (1670.3 img/s) elapsed 1.0m


[train] 102400/120801 (1679.1 img/s) elapsed 1.0m


[train] 107520/120801 (1677.2 img/s) elapsed 1.1m


[train] 112640/120801 (1680.3 img/s) elapsed 1.1m


[train] 117760/120801 (1681.7 img/s) elapsed 1.2m


[train] Saved clip_train_emb.npy shape=(120801, 512)


[test] 5120/21318 (1135.0 img/s) elapsed 0.1m


[test] 10240/21318 (1423.9 img/s) elapsed 0.1m


[test] 15360/21318 (1487.9 img/s) elapsed 0.2m


[test] 20480/21318 (1581.3 img/s) elapsed 0.2m


[test] Saved clip_test_emb.npy shape=(21318, 512)


CLIP embedding extraction done.
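
The kNN cells that follow rely on the saved embeddings being L2-normalized, because `faiss.IndexFlatIP` returns raw inner products and those equal cosine similarity only for unit vectors. A small sanity check along these lines could be run on the `.npy` files before indexing. The function name `check_unit_norm` and the tolerance are assumptions.

```python
import numpy as np


def check_unit_norm(emb, atol=1e-3):
    """Return (ok, min_norm, max_norm) for the row-wise L2 norms of emb."""
    norms = np.linalg.norm(emb, axis=1)
    ok = bool(np.all(np.abs(norms - 1.0) <= atol))
    return ok, float(norms.min()), float(norms.max())


if __name__ == "__main__":
    # Toy data standing in for clip_train_emb.npy.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 512)).astype(np.float32)
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    print(check_unit_norm(x))
    # For unit vectors the inner product equals the cosine similarity.
    cos = (x[0] @ x[1]) / (np.linalg.norm(x[0]) * np.linalg.norm(x[1]))
    print(np.isclose(x[0] @ x[1], cos, atol=1e-6))
```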


In [59]:
import numpy as np, pandas as pd, os, time, faiss, shutil
from pathlib import Path

print('=== kNN label transfer with CLIP + b3-only logit blend + v2 caps + tuned threshold ===', flush=True)

# 0) Preconditions: embeddings must exist
train_emb_p = Path('clip_train_emb.npy')
test_emb_p = Path('clip_test_emb.npy')
if not (train_emb_p.exists() and test_emb_p.exists()):
    print('[SKIP] Missing CLIP embeddings. Run extract_clip_embeddings() first.')
else:
    t0 = time.time()
    Xtr = np.load(train_emb_p).astype(np.float32)
    Xte = np.load(test_emb_p).astype(np.float32)
    # Assumed already L2-normalized in extractor
    print('Loaded embeddings:', Xtr.shape, Xte.shape, flush=True)

    # 1) Build FAISS index (cosine via inner-product on normalized vectors)
    d = Xtr.shape[1]
    index = faiss.IndexFlatIP(d)
    index.add(Xtr)
    print('FAISS index built. nt=', index.ntotal, 'dim=', d, flush=True)

    # 2) kNN search
    k = 20
    sims, nn_idx = index.search(Xte, k)  # sims in [-1,1] due to cosine
    sims = np.maximum(sims, 0.0).astype(np.float32)  # clamp negatives to 0
    print('kNN done. sims shape:', sims.shape, 'idx shape:', nn_idx.shape, flush=True)

    # 3) Build train label matrix (CSR-like via manual aggregation)
    labels_df = pd.read_csv('labels.csv')
    attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
    attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
    C = len(attr_ids_sorted)
    train_df = pd.read_csv('train.csv')
    n = len(train_df)
    # Per-image lists of label column indices (avoids a dense n x C label matrix)
    lab_sets = []
    for s in train_df['attribute_ids'].fillna('').astype(str).tolist():
        if s:
            lab_sets.append([attr_to_col[int(x)] for x in s.split() if x!='' and int(x) in attr_to_col])
        else:
            lab_sets.append([])
    print('Built train label indices (lists).', flush=True)

    # 4) Compute kNN label probabilities for test: weighted average over neighbors
    ntest = Xte.shape[0]
    probs_knn = np.zeros((ntest, C), dtype=np.float32)
    wsum = sims.sum(axis=1, keepdims=True) + 1e-8
    for i in range(ntest):
        nn = nn_idx[i]
        ws = sims[i] / wsum[i]
        # accumulate weighted labels
        for nbr, w in zip(nn, ws):
            for c in lab_sets[nbr]:
                probs_knn[i, c] += float(w)
        if (i+1) % 2000 == 0:
            print(f'[kNN] processed {i+1}/{ntest}', flush=True)
    print('kNN label transfer complete.', flush=True)

    # 5) Get b3-only model probs on test via our blender
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    def probs_to_logits(p, eps=1e-5):
        p = np.clip(p, eps, 1.0 - eps)
        return np.log(p / (1.0 - p))
    from __main__ import blend_equal_weight  # defined earlier in this notebook
    dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
    _, _, Pt_b3 = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
    print('Blender produced model probs:', None if Pt_b3 is None else Pt_b3.shape, flush=True)

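    # 6) Logit blend: z_final = z_model + alpha * z_knn
    # Note: classes with no neighbor support have probs_knn == 0, which
    # probs_to_logits clips to eps=1e-5, i.e. a logit of about -11.5.
    alpha = 0.5
    Zm = probs_to_logits(Pt_b3)
    Zk = probs_to_logits(probs_knn)
    Zf = Zm + alpha * Zk
    Pf = sigmoid(Zf)
    del Zm, Zk, Zf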

    # 7) v2 caps + post-cap threshold sweep to target mean ~4.40
    labels_df = pd.read_csv('labels.csv')
    labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
    idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
    caps = {
        'country': 1,
        'culture': 1,
        'medium': 2,
        'dimension': 1,
        'tags': 5,
        'tag': 5,
    }
    default_cap = 3
    sub = pd.read_csv('sample_submission.csv')
    ids = sub['id'].values

    def apply_caps(Pt, thr):
        rows = []; counts = []
        for i in range(Pt.shape[0]):
            p = Pt[i]
            cand = np.where(p >= thr)[0]
            if cand.size == 0:
                cand = np.array([int(np.argmax(p))], dtype=np.int64)
            cand_sorted = cand[np.argsort(-p[cand])]
            used, kept = {}, []
            for j in cand_sorted:
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap)
                c = used.get(g, 0)
                if c < kcap:
                    kept.append(j); used[g] = c + 1
            if len(kept) == 0:
                kept = [int(cand_sorted[0])]
            pred_attr = [int(attr_ids_sorted[j]) for j in kept]
            rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
            counts.append(len(pred_attr))
        return rows, float(np.mean(counts))

    thrs = np.arange(0.460, 0.5201, 0.002)
    target_mean = 4.40
    best = None; best_rows = None
    for t in thrs:
        rows_t, mean_t = apply_caps(Pf, t)
        delta = abs(mean_t - target_mean)
        if (best is None) or (delta < best[0]):
            best = (delta, t, mean_t); best_rows = rows_t
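        # periodic log; int() truncates float error, so this can land on
        # thr=0.502 instead of 0.500 (round() or an enumerate index avoids that)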
        if int((t-0.460)/0.002) % 10 == 0:
            print(f'[SWEEP] thr={t:.3f} post-cap mean={mean_t:.3f}')
    _, best_thr_cap, best_mean = best
    print(f'[CHOSEN] thr={best_thr_cap:.3f} post-cap mean={best_mean:.3f}')
    sub_df = pd.DataFrame(best_rows)
    out_path = 'submission_knn_blend_tuned.csv'
    sub_df.to_csv(out_path, index=False)
    shutil.copyfile(out_path, 'submission.csv')
    print('Wrote', out_path, 'and set as submission.csv. Elapsed:', round((time.time()-t0)/60,1),'min', flush=True)

=== kNN label transfer with CLIP + b3-only logit blend + v2 caps + tuned threshold ===


Loaded embeddings: (120801, 512) (21318, 512)


FAISS index built. nt= 120801 dim= 512


kNN done. sims shape: (21318, 20) idx shape: (21318, 20)


Built train label indices (lists).


[kNN] processed 2000/21318


[kNN] processed 4000/21318


[kNN] processed 6000/21318


[kNN] processed 8000/21318


[kNN] processed 10000/21318


[kNN] processed 12000/21318


[kNN] processed 14000/21318


[kNN] processed 16000/21318


[kNN] processed 18000/21318


[kNN] processed 20000/21318


kNN label transfer complete.


[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).
Blender produced model probs: (21318, 3474)


[SWEEP] thr=0.460 post-cap mean=3.097


[SWEEP] thr=0.480 post-cap mean=2.995


[SWEEP] thr=0.502 post-cap mean=2.888


[SWEEP] thr=0.520 post-cap mean=2.804
[CHOSEN] thr=0.460 post-cap mean=3.097
Wrote submission_knn_blend_tuned.csv and set as submission.csv. Elapsed: 1.2 min
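
In the run above the post-cap mean stays near 3.0 across the whole sweep, so the lowest threshold is chosen by default and the 4.40 target is never reached. One plausible explanation is the logit blend itself: any class with a kNN probability below 0.5 contributes a negative logit, and a class with zero neighbor support is clipped to `eps` and contributes about `alpha * -11.5`. The sketch below works through that arithmetic on single values and contrasts it with a convex probability blend. `logit_blend` and `prob_blend` are illustrative names, and the convex formula is an assumption about what a "prob-blend" might look like.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def probs_to_logits(p, eps=1e-5):
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


def logit_blend(p_model, p_knn, alpha, eps=1e-5):
    """Additive blend in logit space, as in the cell above."""
    return sigmoid(probs_to_logits(p_model, eps) + alpha * probs_to_logits(p_knn, eps))


def prob_blend(p_model, p_knn, beta):
    """Convex blend in probability space (assumed form)."""
    return (1.0 - beta) * p_model + beta * p_knn


if __name__ == "__main__":
    for p_knn in (0.0, 0.3, 0.5, 0.9):
        lb = float(logit_blend(0.9, p_knn, alpha=0.5))
        pb = float(prob_blend(0.9, p_knn, beta=0.1))
        print(f"p_model=0.9 p_knn={p_knn:.1f} logit_blend={lb:.3f} prob_blend={pb:.3f}")
```

With `p_model=0.9` and no neighbor support, the logit blend drops the class to roughly 0.03, well below any threshold in the sweep, whereas the convex blend only lowers it to 0.81.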


In [61]:
import numpy as np, pandas as pd, shutil
from pathlib import Path

print('=== B3-only v2 caps @ target=4.38, weights 1:1, wide post-cap thr sweep to hit mean~4.40 ===', flush=True)
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
oof_f1_b3, thr_b3_base, Pt_b3 = blend_equal_weight(dirs_b3, weights=[1,1], write_submission=False, out_name='noop.csv', cardinality_target=4.38)
print('[BLEND B3 1:1] base thr=', thr_b3_base, 'OOF_f1=', oof_f1_b3)

labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids = labels_df['attribute_id'].astype(int).tolist()
idx_to_attr = np.array(sorted(attr_ids), dtype=np.int32)
idx_to_group = labels_df.set_index('attribute_id').loc[idx_to_attr, 'group'].values
caps = {
    'country': 1,
    'culture': 1,
    'medium': 2,
    'dimension': 1,
    'tags': 5,
    'tag': 5,
}
default_cap = 3

sub = pd.read_csv('sample_submission.csv')
ids = sub['id'].values

def apply_caps(Pt, thr):
    rows = []; counts = []
    for i in range(len(ids)):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap)
            c = used.get(g, 0)
            if c < kcap:
                kept.append(j); used[g] = c + 1
        if len(kept) == 0:
            kept = [int(cand_sorted[0])]
        pred_attr = [int(idx_to_attr[j]) for j in kept]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
        counts.append(len(pred_attr))
    return rows, float(np.mean(counts))

# Wide sweep to reach post-cap mean ~4.40
thrs = np.arange(0.460, 0.5201, 0.002)
target_mean = 4.40
best = None; best_rows = None
for t in thrs:
    rows_t, mean_t = apply_caps(Pt_b3, t)
    delta = abs(mean_t - target_mean)
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t); best_rows = rows_t
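    # periodic log; int() truncates float error, so this can land on
    # thr=0.502 instead of 0.500 (round() or an enumerate index avoids that)
    if int((t-0.460)/0.002) % 10 == 0:
        print(f'[SWEEP] thr={t:.3f} post-cap mean={mean_t:.3f}')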

_, best_thr_cap, best_mean = best
print(f'[CHOSEN] thr={best_thr_cap:.3f} post-cap mean={best_mean:.3f}')
sub_df = pd.DataFrame(best_rows)
out_path = 'submission_b3_caps_438_w11_tuned.csv'
sub_df.to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print('Wrote', out_path, 'and set as submission.csv')

=== B3-only v2 caps @ target=4.38, weights 1:1, wide post-cap thr sweep to hit mean~4.40 ===


[CARD] Train mean labels/img=4.421, target=4.380, chosen thr=0.504 (pred_mean=4.394)
[INFO] Skipping submission write (best_thr not available).
[BLEND B3 1:1] base thr= 0.504 OOF_f1= None


[SWEEP] thr=0.460 post-cap mean=4.456


[SWEEP] thr=0.480 post-cap mean=4.197


[SWEEP] thr=0.502 post-cap mean=3.922


[SWEEP] thr=0.520 post-cap mean=3.711
[CHOSEN] thr=0.464 post-cap mean=4.400
Wrote submission_b3_caps_438_w11_tuned.csv and set as submission.csv
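
The per-group cap logic in `apply_caps` is easier to follow on a single toy row. The sketch below isolates that step: take candidates above the threshold, walk them in descending probability, and keep a label only while its group is under its cap. `cap_by_group` is a hypothetical helper written for this illustration, and the toy groups and probabilities are made up.

```python
import numpy as np


def cap_by_group(p, groups, thr, caps, default_cap=3):
    """Return kept class indices for one row of probabilities.

    p       : 1-D array of class probabilities
    groups  : sequence of group names, one per class
    caps    : dict group -> max labels allowed from that group
    """
    cand = np.where(p >= thr)[0]
    if cand.size == 0:
        # Fallback: always predict at least the top class.
        cand = np.array([int(np.argmax(p))])
    cand = cand[np.argsort(-p[cand])]
    used, kept = {}, []
    for j in cand:
        g = groups[j]
        if used.get(g, 0) < caps.get(g, default_cap):
            kept.append(int(j))
            used[g] = used.get(g, 0) + 1
    return kept


if __name__ == "__main__":
    p = np.array([0.9, 0.8, 0.7, 0.6, 0.2])
    groups = ["culture", "culture", "tags", "medium", "tags"]
    caps = {"culture": 1, "medium": 2, "tags": 5}
    # The second 'culture' label (index 1) is dropped by the cap of 1.
    print(cap_by_group(p, groups, thr=0.5, caps=caps))
```

Because caps only ever remove labels, the post-cap mean is at most the pre-cap mean at the same threshold, which is why the sweep above needs a lower threshold (0.464) than the uncapped cardinality match (0.504).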


In [62]:
import numpy as np, pandas as pd, os, time, faiss, shutil
from pathlib import Path

print('=== kNN label transfer v2: k=50, alpha=0.3, b3-only blend, v2 caps, tuned threshold ===', flush=True)

train_emb_p = Path('clip_train_emb.npy')
test_emb_p = Path('clip_test_emb.npy')
if not (train_emb_p.exists() and test_emb_p.exists()):
    print('[SKIP] Missing CLIP embeddings.')
else:
    t0 = time.time()
    Xtr = np.load(train_emb_p).astype(np.float32)
    Xte = np.load(test_emb_p).astype(np.float32)
    print('Loaded embeddings:', Xtr.shape, Xte.shape, flush=True)

    d = Xtr.shape[1]
    index = faiss.IndexFlatIP(d)
    index.add(Xtr)
    print('FAISS index built. nt=', index.ntotal, 'dim=', d, flush=True)

    k = 50
    sims, nn_idx = index.search(Xte, k)
    sims = np.maximum(sims, 0.0).astype(np.float32)
    print('kNN done. sims shape:', sims.shape, 'idx shape:', nn_idx.shape, flush=True)

    labels_df = pd.read_csv('labels.csv')
    attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
    attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
    C = len(attr_ids_sorted)
    train_df = pd.read_csv('train.csv')
    lab_lists = []
    for s in train_df['attribute_ids'].fillna('').astype(str).tolist():
        if s:
            lab_lists.append([attr_to_col[int(x)] for x in s.split() if x!='' and int(x) in attr_to_col])
        else:
            lab_lists.append([])
    print('Built train label lists.', flush=True)

    ntest = Xte.shape[0]
    probs_knn = np.zeros((ntest, C), dtype=np.float32)
    wsum = sims.sum(axis=1, keepdims=True) + 1e-8
    for i in range(ntest):
        nn = nn_idx[i]
        ws = sims[i] / wsum[i]
        for nbr, w in zip(nn, ws):
            for c in lab_lists[nbr]:
                probs_knn[i, c] += float(w)
        if (i+1) % 2000 == 0:
            print(f'[kNN] {i+1}/{ntest}', flush=True)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    def probs_to_logits(p, eps=1e-5):
        p = np.clip(p, eps, 1.0 - eps)
        return np.log(p / (1.0 - p))

    from __main__ import blend_equal_weight
    dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
    _, _, Pt_b3 = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
    print('Model probs shape:', None if Pt_b3 is None else Pt_b3.shape, flush=True)

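    # Note: classes with probs_knn == 0 are clipped to eps=1e-5 (logit about -11.5),
    # so alpha * Zk pulls every unsupported class down by a constant offset.
    alpha = 0.3
    Zm = probs_to_logits(Pt_b3)
    Zk = probs_to_logits(probs_knn)
    Pf = sigmoid(Zm + alpha * Zk)
    del Zm, Zk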

    labels_df = pd.read_csv('labels.csv')
    labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
    idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
    caps = {
        'country': 1,
        'culture': 1,
        'medium': 2,
        'dimension': 1,
        'tags': 5,
        'tag': 5,
    }
    default_cap = 3
    sub = pd.read_csv('sample_submission.csv')
    ids = sub['id'].values

    def apply_caps(Pt, thr):
        rows = []; counts = []
        for i in range(Pt.shape[0]):
            p = Pt[i]
            cand = np.where(p >= thr)[0]
            if cand.size == 0:
                cand = np.array([int(np.argmax(p))], dtype=np.int64)
            cand_sorted = cand[np.argsort(-p[cand])]
            used, kept = {}, []
            for j in cand_sorted:
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap)
                c = used.get(g, 0)
                if c < kcap:
                    kept.append(j); used[g] = c + 1
            if len(kept) == 0:
                kept = [int(cand_sorted[0])]
            pred_attr = [int(attr_ids_sorted[j]) for j in kept]
            rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
            counts.append(len(pred_attr))
        return rows, float(np.mean(counts))

    thrs = np.arange(0.440, 0.5401, 0.002)
    target_mean = 4.40
    best = None; best_rows = None
    for t in thrs:
        rows_t, mean_t = apply_caps(Pf, t)
        delta = abs(mean_t - target_mean)
        if (best is None) or (delta < best[0]):
            best = (delta, t, mean_t); best_rows = rows_t
        if int((t-0.440)/0.002) % 15 == 0:
            print(f'[SWEEP] thr={t:.3f} post-cap mean={mean_t:.3f}')

    _, best_thr_cap, best_mean = best
    print(f'[CHOSEN] thr={best_thr_cap:.3f} post-cap mean={best_mean:.3f}')
    sub_df = pd.DataFrame(best_rows)
    out_path = 'submission_knn_blend_tuned_v2.csv'
    sub_df.to_csv(out_path, index=False)
    shutil.copyfile(out_path, 'submission.csv')
    print('Wrote', out_path, 'and set as submission.csv. Elapsed:', round((time.time()-t0)/60,1), 'min', flush=True)

=== kNN label transfer v2: k=50, alpha=0.3, b3-only blend, v2 caps, tuned threshold ===


Loaded embeddings: (120801, 512) (21318, 512)


FAISS index built. nt= 120801 dim= 512


kNN done. sims shape: (21318, 50) idx shape: (21318, 50)


Built train label lists.


[kNN] 2000/21318


[kNN] 4000/21318


[kNN] 6000/21318


[kNN] 8000/21318


[kNN] 10000/21318


[kNN] 12000/21318


[kNN] 14000/21318


[kNN] 16000/21318


[kNN] 18000/21318


[kNN] 20000/21318


[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).
Model probs shape: (21318, 3474)


[SWEEP] thr=0.440 post-cap mean=3.431


[SWEEP] thr=0.470 post-cap mean=3.231


[SWEEP] thr=0.500 post-cap mean=3.040


[SWEEP] thr=0.530 post-cap mean=2.860


[CHOSEN] thr=0.440 post-cap mean=3.431
Wrote submission_knn_blend_tuned_v2.csv and set as submission.csv. Elapsed: 1.4 min
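
The similarity-weighted label transfer in the two kNN cells above runs a triple Python loop over test rows, neighbors, and labels. A NumPy-only sketch of the same aggregation is shown below: it flattens the train label lists into a CSR-style layout and scatters all (row, class, weight) triples with `np.add.at`. `knn_label_probs` is an illustrative name, and this is not the code that produced the logged results.

```python
import numpy as np


def knn_label_probs(sims, nn_idx, lab_lists, n_classes):
    """Similarity-weighted average of neighbor labels, without Python loops."""
    sims = np.maximum(sims, 0.0).astype(np.float32)
    w = sims / (sims.sum(axis=1, keepdims=True) + 1e-8)
    ntest, k = nn_idx.shape

    # CSR-style layout of train labels: flat class ids plus per-image offsets.
    lab_len = np.array([len(l) for l in lab_lists], dtype=np.int64)
    lab_ptr = np.concatenate([[0], np.cumsum(lab_len)[:-1]]).astype(np.int64)
    lab_flat = np.array([c for l in lab_lists for c in l], dtype=np.int64)

    flat_nbr = nn_idx.ravel()
    counts = lab_len[flat_nbr]  # labels per (test, neighbor) pair
    rows = np.repeat(np.repeat(np.arange(ntest), k), counts)
    wts = np.repeat(w.ravel(), counts)
    # Offset of each expanded entry within its neighbor's label slice.
    offs = np.arange(counts.sum()) - np.repeat(np.cumsum(counts) - counts, counts)
    cols = lab_flat[np.repeat(lab_ptr[flat_nbr], counts) + offs]

    probs = np.zeros((ntest, n_classes), dtype=np.float32)
    np.add.at(probs, (rows, cols), wts.astype(np.float32))
    return probs


if __name__ == "__main__":
    lab_lists = [[0, 1], [2], [], [1]]          # toy train labels
    nn_idx = np.array([[0, 1], [3, 2]])         # 2 test rows, k=2
    sims = np.array([[0.8, 0.2], [0.5, 0.5]])
    print(knn_label_probs(sims, nn_idx, lab_lists, n_classes=3))
```

At the logged sizes (21318 test rows, k=50, about 4.4 labels per image) this expands to a few million triples, which should fit comfortably in memory, though the speedup over the loop version has not been measured here.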


In [63]:
import numpy as np, pandas as pd, os, time, faiss, shutil
from pathlib import Path

print('=== kNN label transfer v3: softmax-weighted (tau=0.10) + IDF^0.5 + per-neighbor |L|^-1 + prob-blend beta=0.10 + caps + min-total ===', flush=True)

t0 = time.time()
train_emb_p = Path('clip_train_emb.npy')
test_emb_p = Path('clip_test_emb.npy')
assert train_emb_p.exists() and test_emb_p.exists(), 'Missing CLIP embeddings; run extraction first.'
Xtr = np.load(train_emb_p).astype(np.float32)
Xte = np.load(test_emb_p).astype(np.float32)
print('Loaded embeddings:', Xtr.shape, Xte.shape, flush=True)

# Build FAISS IP (cosine on L2-normalized feats)
d = Xtr.shape[1]
index = faiss.IndexFlatIP(d)
index.add(Xtr)
print('FAISS index built. nt=', index.ntotal, 'dim=', d, flush=True)

# Parameters per expert
k = 100
tau = 0.10
eta = 1.0  # per-neighbor normalization by |L_j|^eta
gamma_idf = 0.5

# kNN search
sims, nn_idx = index.search(Xte, k)  # cosine similarities
print('kNN done. sims shape:', sims.shape, 'idx shape:', nn_idx.shape, flush=True)

# Labels and priors
labels_df = pd.read_csv('labels.csv')
attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
C = len(attr_ids_sorted)
train_df = pd.read_csv('train.csv')
n_train = len(train_df)

# Build train label lists and |L_j|
lab_lists = []
lab_sizes = np.zeros(n_train, dtype=np.int32)
df_counts = np.zeros(C, dtype=np.int32)  # document frequency per class
for i, s in enumerate(train_df['attribute_ids'].fillna('').astype(str).tolist()):
    if s:
        cols = [attr_to_col[int(x)] for x in s.split() if x!='' and int(x) in attr_to_col]
        lab_lists.append(cols)
        lab_sizes[i] = len(cols)
        for c in set(cols):
            df_counts[c] += 1
    else:
        lab_lists.append([])
        lab_sizes[i] = 0

# IDF weights
idf = np.log(n_train / (df_counts.astype(np.float32) + 1.0))
idf = np.clip(idf, 0.0, None) ** gamma_idf  # idf^gamma

# Softmax weights over neighbors with temperature tau
sims_sm = sims / max(tau, 1e-6)
sims_sm = sims_sm - sims_sm.max(axis=1, keepdims=True)  # stabilize
w = np.exp(sims_sm)
w /= (w.sum(axis=1, keepdims=True) + 1e-12)

# Optional prune by cumulative weight >= 0.95 (skip for speed/consistency)

# Accumulate class scores with per-neighbor normalization by |L_j|^eta
ntest = Xte.shape[0]
probs_knn = np.zeros((ntest, C), dtype=np.float32)
for i in range(ntest):
    nn = nn_idx[i]
    wi = w[i]
    for nbr, wj in zip(nn, wi):  # neighbor index and its softmax weight
        Lj = lab_lists[nbr]
        if not Lj:
            continue
        denom = (lab_sizes[nbr] ** eta) if eta > 0 else 1.0
        add = float(wj) / float(max(denom, 1.0))
        for c in Lj:
            probs_knn[i, c] += add
    if (i+1) % 2000 == 0:
        print(f'[kNN-accum] {i+1}/{ntest}', flush=True)

# Apply IDF and renormalize per image to sum 1
probs_knn *= idf[None, :]
row_sums = probs_knn.sum(axis=1, keepdims=True)
probs_knn = np.divide(probs_knn, np.where(row_sums > 0, row_sums, 1.0), out=np.zeros_like(probs_knn), where=row_sums>0)

# Load b3-only model probs and do probability-space blend
from __main__ import blend_equal_weight
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
_, _, P_model = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
assert P_model is not None and P_model.shape == probs_knn.shape, f'Shape mismatch: {None if P_model is None else P_model.shape} vs {probs_knn.shape}'
beta = 0.10
P_final = (1.0 - beta) * P_model + beta * probs_knn

# Caps + min-total enforcement
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
base_caps = {
    'country': 1,
    'culture': 1,
    'medium': 2,
    'dimension': 1,
    'tags': 5,
    'tag': 5,
}
default_cap = 3
sub = pd.read_csv('sample_submission.csv')
ids = sub['id'].values

def apply_caps_with_min(Pt, thr, caps, min_total, fill_probs):
    rows = []; counts = []
    for i in range(Pt.shape[0]):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []
        kept_set = set()
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap)
            c = used.get(g, 0)
            if c < kcap:
                kept.append(j); kept_set.add(j); used[g] = c + 1
        # Enforce min_total by filling from highest model probs not yet selected
        if len(kept) < min_total:
            order = np.argsort(-fill_probs[i])
            for j in order:
                if j in kept_set:
                    continue
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap)
                c = used.get(g, 0)
                if c < kcap:
                    kept.append(j); kept_set.add(j); used[g] = c + 1
                    if len(kept) >= min_total:
                        break
        if len(kept) == 0:
            kept = [int(np.argmax(p))]
        pred_attr = [int(attr_ids_sorted[j]) for j in kept]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
        counts.append(len(pred_attr))
    return rows, float(np.mean(counts))

# Sweep threshold to hit ~4.40 with min_total=3; if too low, relax tags cap and min_total=4
def sweep_and_write(Pt, caps, min_total, out_path, thr_lo=0.46, thr_hi=0.54, thr_step=0.002):
    thrs = np.arange(thr_lo, thr_hi + 1e-9, thr_step)
    target_mean = 4.40
    best = None; best_rows = None
    for t in thrs:
        rows_t, mean_t = apply_caps_with_min(Pt, t, caps, min_total, fill_probs=P_model)
        delta = abs(mean_t - target_mean)
        if (best is None) or (delta < best[0]):
            best = (delta, t, mean_t); best_rows = rows_t
    _, best_thr, best_mean = best
    sub_df = pd.DataFrame(best_rows)
    sub_df.to_csv(out_path, index=False)
    shutil.copyfile(out_path, 'submission.csv')
    print(f'[WRITE] {out_path} thr={best_thr:.3f} post-cap mean={best_mean:.3f} min_total={min_total} caps[tags]={caps.get("tags", None)}')
    return best_thr, best_mean

# Attempt 1: base caps, min_total=3
thr1, mean1 = sweep_and_write(P_final, base_caps.copy(), min_total=3, out_path='submission_knn_softmax_probblend.csv')

# If still under ~4.35, try tags cap=6 and/or min_total=4
if mean1 < 4.35:
    caps2 = base_caps.copy(); caps2['tags'] = 6; caps2['tag'] = 6
    thr2, mean2 = sweep_and_write(P_final, caps2, min_total=3, out_path='submission_knn_softmax_probblend_tags6.csv')
    if mean2 < 4.35:
        thr3, mean3 = sweep_and_write(P_final, caps2, min_total=4, out_path='submission_knn_softmax_probblend_tags6_min4.csv')

print('Done. Total elapsed: {:.1f} min'.format((time.time()-t0)/60.0), flush=True)

=== kNN label transfer v3: softmax-weighted (tau=0.10) + IDF^0.5 + per-neighbor |L|^-1 + prob-blend beta=0.10 + caps + min-total ===
Loaded embeddings: (120801, 512) (21318, 512)
FAISS index built. nt= 120801 dim= 512
kNN done. sims shape: (21318, 100) idx shape: (21318, 100)
[kNN-accum] 2000/21318
[kNN-accum] 4000/21318
[kNN-accum] 6000/21318
[kNN-accum] 8000/21318
[kNN-accum] 10000/21318
[kNN-accum] 12000/21318
[kNN-accum] 14000/21318
[kNN-accum] 16000/21318
[kNN-accum] 18000/21318
[kNN-accum] 20000/21318
[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).
[WRITE] submission_knn_softmax_probblend.csv thr=0.460 post-cap mean=4.087 min_total=3 caps[tags]=5
[WRITE] submission_knn_softmax_probblend_tags6.csv thr=0.460 post-cap mean=4.093 min_total=3 caps[tags]=6
[WRITE] submission_knn_softmax_probblend_tags6_min4.csv thr=0.482 post-cap mean=4.395 min_total=4 caps[tags]=6
Done. Total elapsed: 5.0 min
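
The accumulation step in the cell above visits every (test image, neighbor) pair in a Python double loop. As a minimal sketch of the same scoring rule (softmax weights with temperature `tau`, each neighbor's weight split by `|L_j|^eta`, optional IDF, then row renormalization), the version below loops over the `k` neighbor ranks only and scatters with `np.add.at`. The function name `knn_label_scores` and the toy inputs are hypothetical, and no speed-up has been measured here.

```python
import itertools
import numpy as np

def knn_label_scores(sims, nn_idx, lab_lists, n_classes, tau=0.10, eta=1.0, idf=None):
    """Softmax-weighted kNN label transfer; returns row-normalized class scores."""
    # Temperature softmax over each query's neighbors (max-subtracted for stability).
    z = sims / max(tau, 1e-6)
    z = z - z.max(axis=1, keepdims=True)
    w = np.exp(z)
    w /= w.sum(axis=1, keepdims=True)

    sizes = np.array([len(labs) for labs in lab_lists], dtype=np.int64)
    n_query, k = nn_idx.shape
    scores = np.zeros((n_query, n_classes), dtype=np.float64)
    query_ids = np.arange(n_query)
    for r in range(k):  # one pass per neighbor rank
        nbrs = nn_idx[:, r]
        counts = sizes[nbrs]  # |L_j| for the rank-r neighbor of each query
        share = w[:, r] / np.maximum(counts, 1).astype(np.float64) ** eta
        rows = np.repeat(query_ids, counts)
        cols = np.fromiter(
            itertools.chain.from_iterable(lab_lists[j] for j in nbrs), dtype=np.int64
        )
        # Unbuffered scatter-add, so repeated (row, col) pairs accumulate correctly.
        np.add.at(scores, (rows, cols), np.repeat(share, counts))

    if idf is not None:
        scores *= idf[None, :]
    row_sums = scores.sum(axis=1, keepdims=True)
    return np.divide(scores, row_sums, out=np.zeros_like(scores), where=row_sums > 0)

# Toy check: one query, two neighbors with label sets {0, 1} and {1}.
toy_scores = knn_label_scores(
    sims=np.array([[0.9, 0.8]]),
    nn_idx=np.array([[0, 1]]),
    lab_lists=[[0, 1], [1]],
    n_classes=3,
)
print(toy_scores)
```

In the toy case the closer neighbor splits its weight across two labels, so class 1 (shared by both neighbors) ends up with the largest score and class 2 stays at zero.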


In [None]:
import numpy as np, pandas as pd, shutil
from pathlib import Path

print('=== Post-process: enforce min-total=4 on kNN softmax blend (tags cap=6) using P_model for fills ===', flush=True)

# Inputs
sub_path = Path('submission_knn_softmax_probblend_tags6.csv')
assert sub_path.exists(), 'Missing submission_knn_softmax_probblend_tags6.csv from kNN v3 cell'

# Load mapping and model probabilities for fill ranking
labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
default_cap = 3
caps = {'country':1,'culture':1,'medium':2,'dimension':1,'tags':6,'tag':6}

# Get model-only probs for ranking fills
from __main__ import blend_equal_weight
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
_, _, P_model = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
assert P_model is not None, 'Failed to load model probabilities'

# Load existing submission
sub = pd.read_csv(sub_path)
ids = sub['id'].values
pred_lists = [str(x).strip().split() if isinstance(x, str) else [] for x in sub['attribute_ids'].values]

rows = []; counts = []
for i, (img_id, attrs) in enumerate(zip(ids, pred_lists)):
    chosen_attr_ids = [int(a) for a in attrs if a!='']
    chosen_idx = [attr_to_col[a] for a in chosen_attr_ids if a in attr_to_col]
    used = {}
    for j in chosen_idx:
        g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
        used[g] = used.get(g, 0) + 1
    # Fill until min-total=4 respecting caps: rank once by P_model, then take labels in order
    if len(chosen_idx) < 4:
        chosen_set = set(chosen_idx)
        for j in np.argsort(-P_model[i]):
            if j in chosen_set:
                continue
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap)
            c = used.get(g, 0)
            if c < kcap:
                chosen_idx.append(int(j)); chosen_set.add(int(j)); used[g] = c + 1
                if len(chosen_idx) >= 4:
                    break  # stops early; if caps block every label the loop simply runs out
    pred_attr = [int(attr_ids_sorted[j]) for j in chosen_idx]
    rows.append({'id': img_id, 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
    counts.append(len(pred_attr))

out_path = 'submission_knn_softmax_probblend_tags6_min4_fill.csv'
pd.DataFrame(rows).to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print(f'Wrote {out_path} (mean count={np.mean(counts):.3f}) and set as submission.csv')

In [64]:
import numpy as np, pandas as pd, time, shutil, faiss
from pathlib import Path

print('=== kNN union hedge: add up to L=2 kNN labels (P_knn>=0.10) to model+caps; sweep thr to mean~4.40 ===', flush=True)
t0 = time.time()

# Load CLIP embeddings
Xtr = np.load('clip_train_emb.npy').astype(np.float32)
Xte = np.load('clip_test_emb.npy').astype(np.float32)
d = Xtr.shape[1]
index = faiss.IndexFlatIP(d); index.add(Xtr)

# Params (per expert): k=150, tau=0.10, eta=1.0, idf^0.5
k = 150; tau = 0.10; eta = 1.0; gamma_idf = 0.5
sims, nn_idx = index.search(Xte, k)
print('kNN search done:', sims.shape, flush=True)

# Labels mapping
labels_df = pd.read_csv('labels.csv')
attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
C = len(attr_ids_sorted)
train_df = pd.read_csv('train.csv')
n_train = len(train_df)

# Train label lists and sizes + df counts
lab_lists = []; lab_sizes = np.zeros(n_train, dtype=np.int32); df_counts = np.zeros(C, dtype=np.int32)
for i, s in enumerate(train_df['attribute_ids'].fillna('').astype(str).tolist()):
    if s:
        cols = [attr_to_col[int(x)] for x in s.split() if x!='' and int(x) in attr_to_col]
        lab_lists.append(cols); lab_sizes[i] = len(cols)
        for c in set(cols): df_counts[c] += 1
    else:
        lab_lists.append([]); lab_sizes[i] = 0

# Softmax weights with temperature
sims_sm = sims / max(tau,1e-6)
sims_sm -= sims_sm.max(axis=1, keepdims=True)
w = np.exp(sims_sm); w /= (w.sum(axis=1, keepdims=True) + 1e-12)

# Accumulate probs_knn with per-neighbor |L|^-eta
ntest = Xte.shape[0]
probs_knn = np.zeros((ntest, C), dtype=np.float32)
for i in range(ntest):
    nn = nn_idx[i]; wi = w[i]
    for nbr, wj in zip(nn, wi):
        Lj = lab_lists[nbr]
        if not Lj: continue
        denom = (lab_sizes[nbr] ** eta) if eta > 0 else 1.0
        add = float(wj) / float(max(denom, 1.0))
        for c in Lj: probs_knn[i, c] += add
    if (i+1) % 2000 == 0: print(f'[kNN-accum] {i+1}/{ntest}', flush=True)

# IDF^gamma and renormalize
idf = np.log(n_train / (df_counts.astype(np.float32) + 1.0))
idf = np.clip(idf, 0.0, None) ** gamma_idf
probs_knn *= idf[None, :]
row_sums = probs_knn.sum(axis=1, keepdims=True)
probs_knn = np.divide(probs_knn, np.where(row_sums > 0, row_sums, 1.0), out=np.zeros_like(probs_knn), where=row_sums>0)

# Load model probabilities (b3 2:1)
from __main__ import blend_equal_weight
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
_, _, P_model = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
assert P_model is not None and P_model.shape == probs_knn.shape

# Groups and caps (tags cap=6 as per expert bump), default cap=3
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
caps = {'country':1,'culture':1,'medium':2,'dimension':1,'tags':6,'tag':6}
default_cap = 3
ids = pd.read_csv('sample_submission.csv')['id'].values

def build_union_rows(thr, L=2, gate=0.10, min_total=3):
    rows = []; counts = []
    for i in range(P_model.shape[0]):
        p = P_model[i]; pk = probs_knn[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0: cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used = {}; kept = []; kept_set = set()
        # apply caps on model-selected first
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap); c = used.get(g, 0)
            if c < kcap: kept.append(j); kept_set.add(j); used[g] = c + 1
        # union: add up to L labels from kNN by pk desc with gate
        if L > 0:
            order_knn = np.argsort(-pk)
            added = 0
            for j in order_knn:
                if pk[j] < gate: break
                if j in kept_set: continue
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap); c = used.get(g, 0)
                if c < kcap:
                    kept.append(int(j)); kept_set.add(int(j)); used[g] = c + 1; added += 1
                    if added >= L: break
        # enforce min_total using model ranking
        if len(kept) < min_total:
            order = np.argsort(-p)
            for j in order:
                if j in kept_set: continue
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap); c = used.get(g, 0)
                if c < kcap:
                    kept.append(int(j)); kept_set.add(int(j)); used[g] = c + 1
                    if len(kept) >= min_total: break
        if len(kept) == 0: kept = [int(np.argmax(p))]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(int(attr_ids_sorted[j])) for j in kept)})
        counts.append(len(kept))
    return rows, float(np.mean(counts))

# Sweep thr to target mean ~4.40 after union; L=2, gate=0.10, min_total=3
thrs = np.arange(0.460, 0.5401, 0.002)
target_mean = 4.40
best = None; best_rows = None
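for ti, t in enumerate(thrs):
    rows_t, mean_t = build_union_rows(t, L=2, gate=0.10, min_total=3)
    delta = abs(mean_t - target_mean)
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t); best_rows = rows_t
    # Log every 10th step by loop index; int((t-0.460)/0.002) truncates unevenly in floating point
    # (the recorded run logged thr=0.502 where 0.500 was intended)
    if ti % 10 == 0:
        print(f'[SWEEP] thr={t:.3f} post-union mean={mean_t:.3f}', flush=True)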

_, best_thr, best_mean = best
out_path = 'submission_union_knn_add2.csv'
pd.DataFrame(best_rows).to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print(f'[WRITE] {out_path} thr={best_thr:.3f} post-union mean={best_mean:.3f} | Elapsed {(time.time()-t0)/60:.1f}m', flush=True)

=== kNN union hedge: add up to L=2 kNN labels (P_knn>=0.10) to model+caps; sweep thr to mean~4.40 ===
kNN search done: (21318, 150)
[kNN-accum] 2000/21318
[kNN-accum] 4000/21318
[kNN-accum] 6000/21318
[kNN-accum] 8000/21318
[kNN-accum] 10000/21318
[kNN-accum] 12000/21318
[kNN-accum] 14000/21318
[kNN-accum] 16000/21318
[kNN-accum] 18000/21318
[kNN-accum] 20000/21318
[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).
[SWEEP] thr=0.460 post-union mean=4.696
[SWEEP] thr=0.480 post-union mean=4.458
[SWEEP] thr=0.502 post-union mean=4.234
[SWEEP] thr=0.520 post-union mean=4.070
[SWEEP] thr=0.540 post-union mean=3.919


[WRITE] submission_union_knn_add2.csv thr=0.486 post-union mean=4.395 | Elapsed 2.7m
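
The sweep log above prints thr=0.502 where an evenly spaced log (every 10th step of 0.002) would print 0.500. That is consistent with the step index being recovered from the float threshold via `int((t-0.460)/0.002)`, which truncates and can land one below the true index. A small sketch of the difference, using the same grid as the cell; the variable names `float_idx`, `off_by_one` and `logged` are illustrative only:

```python
import numpy as np

thrs = np.arange(0.460, 0.5401, 0.002)

# Fragile: recover the step index from the float threshold; truncation may be off by one.
float_idx = [int((t - 0.460) / 0.002) for t in thrs]
off_by_one = [i for i, f in enumerate(float_idx) if f != i]
print('steps where the float-derived index disagrees:', off_by_one)

# Robust: use the loop index directly to decide which steps to log.
logged = [round(float(t), 3) for i, t in enumerate(thrs) if i % 10 == 0]
print('logged thresholds:', logged)
```

Only the logging is affected; the threshold that is finally chosen does not depend on which steps are printed.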


In [65]:
import numpy as np, pandas as pd, time, shutil, faiss
from pathlib import Path

print('=== kNN v3 variants: (A) tau=0.05, beta=0.10; (B) tau=0.10, beta=0.15, restrict to {country,culture,dimension} ===', flush=True)
t0 = time.time()

# Common: load embeddings and build FAISS once with k_max=150
Xtr = np.load('clip_train_emb.npy').astype(np.float32)
Xte = np.load('clip_test_emb.npy').astype(np.float32)
d = Xtr.shape[1]
index = faiss.IndexFlatIP(d); index.add(Xtr)
k_max = 150
sims_all, nn_idx_all = index.search(Xte, k_max)
print('FAISS kNN ready:', sims_all.shape, flush=True)

# Labels and train annotations
labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
C = len(attr_ids_sorted)
train_df = pd.read_csv('train.csv')
n_train = len(train_df)
lab_lists = []; lab_sizes = np.zeros(n_train, dtype=np.int32); df_counts = np.zeros(C, dtype=np.int32)
for i, s in enumerate(train_df['attribute_ids'].fillna('').astype(str).tolist()):
    if s:
        cols = [attr_to_col[int(x)] for x in s.split() if x!='' and int(x) in attr_to_col]
        lab_lists.append(cols); lab_sizes[i] = len(cols)
        for c in set(cols): df_counts[c] += 1
    else:
        lab_lists.append([]); lab_sizes[i] = 0
idf = np.log(n_train / (df_counts.astype(np.float32) + 1.0))
idf = np.clip(idf, 0.0, None) ** 0.5  # gamma_idf=0.5

# Load model probabilities (b3 2:1)
from __main__ import blend_equal_weight
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
_, _, P_model = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
assert P_model is not None and P_model.shape[1] == C

sub = pd.read_csv('sample_submission.csv'); ids = sub['id'].values
base_caps = {'country':1,'culture':1,'medium':2,'dimension':1,'tags':6,'tag':6}
default_cap = 3

def apply_caps_with_min(Pt, thr, caps, min_total, fill_probs):
    rows = []; counts = []
    for i in range(Pt.shape[0]):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []; kept_set = set()
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap); c = used.get(g, 0)
            if c < kcap:
                kept.append(j); kept_set.add(j); used[g] = c + 1
        if len(kept) < min_total:
            order = np.argsort(-fill_probs[i])
            for j in order:
                if j in kept_set: continue
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap); c = used.get(g, 0)
                if c < kcap:
                    kept.append(j); kept_set.add(j); used[g] = c + 1
                    if len(kept) >= min_total: break
        if len(kept) == 0:
            kept = [int(np.argmax(p))]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(int(attr_ids_sorted[j])) for j in kept)})
        counts.append(len(kept))
    return rows, float(np.mean(counts))

def sweep_and_write(Pt, caps, min_total, out_path, thr_lo=0.46, thr_hi=0.54, thr_step=0.002):
    thrs = np.arange(thr_lo, thr_hi + 1e-9, thr_step)
    target_mean = 4.40
    best = None; best_rows = None
    for t in thrs:
        rows_t, mean_t = apply_caps_with_min(Pt, t, caps, min_total, fill_probs=P_model)
        delta = abs(mean_t - target_mean)
        if (best is None) or (delta < best[0]):
            best = (delta, t, mean_t); best_rows = rows_t
    _, best_thr, best_mean = best
    pd.DataFrame(best_rows).to_csv(out_path, index=False)
    shutil.copyfile(out_path, 'submission.csv')
    print(f'[WRITE] {out_path} thr={best_thr:.3f} post-cap mean={best_mean:.3f} min_total={min_total}')
    return best_thr, best_mean

def build_probs_knn(k_use, tau):
    sims = sims_all[:, :k_use]
    nn_idx = nn_idx_all[:, :k_use]
    sims_sm = sims / max(tau, 1e-6)
    sims_sm -= sims_sm.max(axis=1, keepdims=True)
    w = np.exp(sims_sm); w /= (w.sum(axis=1, keepdims=True) + 1e-12)
    ntest = sims.shape[0]
    pk = np.zeros((ntest, C), dtype=np.float32)
    for i in range(ntest):
        wi = w[i]; nn = nn_idx[i]
        for nbr, wj in zip(nn, wi):
            Lj = lab_lists[nbr]
            if not Lj: continue
            denom = float(max(lab_sizes[nbr], 1))
            add = float(wj) / denom
            for c in Lj: pk[i, c] += add
        if (i+1) % 2000 == 0: print(f'[accum] {i+1}/{ntest}', flush=True)
    pk *= idf[None, :]
    rs = pk.sum(axis=1, keepdims=True)
    pk = np.divide(pk, np.where(rs>0, rs, 1.0), out=np.zeros_like(pk), where=rs>0)
    return pk

# Variant A: tau=0.05, k=150, beta=0.10, min_total=4, caps tags=6
print('--- Variant A: tau=0.05, k=150, beta=0.10 ---', flush=True)
Pk_A = build_probs_knn(150, 0.05)
beta_A = 0.10
P_final_A = (1.0 - beta_A) * P_model + beta_A * Pk_A
sweep_and_write(P_final_A, base_caps, min_total=4, out_path='submission_knn_softmax_probblend_tau005.csv')

# Variant B: tau=0.10, k=150, beta=0.15, restrict kNN to only country/culture/dimension
print('--- Variant B: tau=0.10, k=150, beta=0.15, restrict groups ---', flush=True)
Pk_B = build_probs_knn(150, 0.10)
mask_groups = np.isin(idx_to_group, ['country','culture','dimension'])
# Masking happens after row normalization, so masked rows sum to less than 1 (no re-normalization)
Pk_B_masked = Pk_B.copy(); Pk_B_masked[:, ~mask_groups] = 0.0
beta_B = 0.15
P_final_B = (1.0 - beta_B) * P_model + beta_B * Pk_B_masked
sweep_and_write(P_final_B, base_caps, min_total=3, out_path='submission_knn_softmax_probblend_groupmask.csv')

print('Done variants. Elapsed {:.1f}m'.format((time.time()-t0)/60.0), flush=True)

=== kNN v3 variants: (A) tau=0.05, beta=0.10; (B) tau=0.10, beta=0.15, restrict to {country,culture,dimension} ===
FAISS kNN ready: (21318, 150)
[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).
--- Variant A: tau=0.05, k=150, beta=0.10 ---
[accum] 2000/21318
[accum] 4000/21318
[accum] 6000/21318
[accum] 8000/21318
[accum] 10000/21318
[accum] 12000/21318
[accum] 14000/21318
[accum] 16000/21318
[accum] 18000/21318
[accum] 20000/21318
[WRITE] submission_knn_softmax_probblend_tau005.csv thr=0.482 post-cap mean=4.395 min_total=4
--- Variant B: tau=0.10, k=150, beta=0.15, restrict groups ---
[accum] 2000/21318
[accum] 4000/21318
[accum] 6000/21318
[accum] 8000/21318
[accum] 10000/21318
[accum] 12000/21318
[accum] 14000/21318
[accum] 16000/21318
[accum] 18000/21318
[accum] 20000/21318
[WRITE] submission_knn_softmax_probblend_groupmask.csv thr=0.460 post-cap mean=3.825 min_total=3
Done variants. Elapsed 4.7m
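
Variant B's sweep ends at thr=0.460, the lower edge of the grid, with a post-cap mean of 3.825 against a target of 4.40. When the best threshold sits on an edge, the target was probably outside what the grid can reach, and the written submission is just the nearest endpoint. Below is a minimal sketch of a sweep that reports this case. It thresholds raw probabilities only (no caps or min-total), and `sweep_to_target` and `P_toy` are hypothetical names with random toy data.

```python
import numpy as np

def sweep_to_target(P, target_mean, thr_lo=0.46, thr_hi=0.54, thr_step=0.002):
    """Pick the threshold whose mean predicted count is closest to target_mean.

    Also reports whether the target lies outside the range of means the grid produces.
    """
    # Build the grid from integer steps to avoid accumulating float error.
    n_steps = int(round((thr_hi - thr_lo) / thr_step)) + 1
    thrs = thr_lo + thr_step * np.arange(n_steps)
    means = np.array([(P >= t).sum(axis=1).mean() for t in thrs])
    best = int(np.argmin(np.abs(means - target_mean)))
    unreachable = not (means.min() <= target_mean <= means.max())
    return float(thrs[best]), float(means[best]), unreachable

rng = np.random.default_rng(0)
P_toy = rng.random((200, 50))  # toy stand-in for blended probabilities

print(sweep_to_target(P_toy, target_mean=25.0))  # target inside the reachable range
print(sweep_to_target(P_toy, target_mean=4.4))   # target far below anything the grid yields
```

If the flag is set, widening `thr_lo`/`thr_hi` before writing a submission is one option; the same check would apply to the two thr=0.460 results from the v3 cell.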


In [66]:
import shutil, pandas as pd, numpy as np
print('=== Set submission to best kNN variant: tau=0.05 prob-blend (min_total=4) ===', flush=True)
src = 'submission_knn_softmax_probblend_tau005.csv'
df = pd.read_csv(src)
mean_cnt = df['attribute_ids'].fillna('').apply(lambda s: 0 if s=='' else len(str(s).split())).mean()
print(f'Source {src} mean count={mean_cnt:.3f}')
shutil.copyfile(src, 'submission.csv')
print('Copied', src, '-> submission.csv')

=== Set submission to best kNN variant: tau=0.05 prob-blend (min_total=4) ===


Source submission_knn_softmax_probblend_tau005.csv mean count=4.395
Copied submission_knn_softmax_probblend_tau005.csv -> submission.csv
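
One thing to keep in mind when reading the chosen threshold: the blend is `(1 - beta) * P_model + beta * P_knn`, and the kNN scores are renormalized to sum to 1 per image, so any single kNN score is likely to be small. The worked arithmetic below shows what model probability a label needs to clear thr=0.482 at beta=0.10 for a few kNN scores. The helper name `model_prob_needed` and the example kNN scores are illustrative assumptions.

```python
beta = 0.10
thr = 0.482  # threshold chosen by the tau=0.05 sweep in the log above

def model_prob_needed(thr, beta, p_knn):
    """Smallest model probability p with (1 - beta) * p + beta * p_knn >= thr."""
    return (thr - beta * p_knn) / (1.0 - beta)

for p_knn in (0.0, 0.05, 0.30):
    print(f'p_knn={p_knn:.2f} -> model prob needed={model_prob_needed(thr, beta, p_knn):.4f}')
```

With no kNN support a label needs a model probability of about 0.536, so the blended threshold of 0.482 is not directly comparable to the model-only threshold of 0.502 reported in the `[CARD]` line.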


In [67]:
import numpy as np, pandas as pd, shutil
from pathlib import Path

print('=== Enforce per-group min-1 on {country,culture,dimension} for tau=0.05 blend (min_total=4) ===', flush=True)
src = Path('submission_knn_softmax_probblend_tau005.csv')
assert src.exists(), 'Missing base submission file submission_knn_softmax_probblend_tau005.csv'

# Load labels and group mapping
labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values

# Caps and required groups
caps = {'country':1,'culture':1,'dimension':1,'medium':2,'tags':6,'tag':6}
default_cap = 3
required_groups = ['country','culture','dimension']

# Load model probs for selecting best fills within a group
from __main__ import blend_equal_weight
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
_, _, P_model = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
assert P_model is not None, 'Failed to load P_model'

# Build per-group column indices
group_to_cols = {}
for j, aid in enumerate(attr_ids_sorted):
    g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
    group_to_cols.setdefault(g, []).append(j)

# Load base submission and enforce per-group minimums
sub = pd.read_csv(src)
ids = sub['id'].values
pred_lists = [str(x).strip().split() if isinstance(x, str) else [] for x in sub['attribute_ids'].values]

rows = []; counts = []
for i, (img_id, attrs) in enumerate(zip(ids, pred_lists)):
    chosen_attr_ids = [int(a) for a in attrs if a!='']
    chosen_idx = [attr_to_col[a] for a in chosen_attr_ids if a in attr_to_col]
    used = {}
    for j in chosen_idx:
        g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
        used[g] = used.get(g, 0) + 1
    chosen_set = set(chosen_idx)
    # Enforce per-group min-1 for each required group if under its cap
    for g in required_groups:
        if used.get(g, 0) >= 1:
            continue
        cols = group_to_cols.get(g, [])
        if not cols:
            continue
        kcap = caps.get(g, default_cap)
        c = used.get(g, 0)
        if c >= kcap:
            continue
        # select top label by model prob within this group not already chosen
        probs_g = P_model[i, cols]
        order = np.argsort(-probs_g)
        for oi in order:
            j = int(cols[int(oi)])
            if j in chosen_set:
                continue
            chosen_idx.append(j); chosen_set.add(j); used[g] = c + 1
            break
        # if nothing was added (all labels of this group already present), move on
    pred_attr = [int(attr_ids_sorted[j]) for j in chosen_idx]
    rows.append({'id': img_id, 'attribute_ids': ' '.join(str(x) for x in pred_attr)})
    counts.append(len(pred_attr))

out_path = 'submission_knn_tau005_minGroup.csv'
pd.DataFrame(rows).to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print(f'Wrote {out_path} (mean count={np.mean(counts):.3f}) and set as submission.csv')

=== Enforce per-group min-1 on {country,culture,dimension} for tau=0.05 blend (min_total=4) ===


[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).


Wrote submission_knn_tau005_minGroup.csv (mean count=5.808) and set as submission.csv
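
Forcing one label per required group lifts the mean count from 4.395 to 5.808, well above the train mean of 4.421 reported in the `[CARD]` line. Whether a per-group minimum is justified depends on how often train images actually carry a label from each group. A check along these lines could answer that; the toy frames below are hypothetical stand-ins for `labels.csv` and `train.csv`, and `group_coverage` is an illustrative helper.

```python
import pandas as pd

def group_coverage(attr_strings, attr_to_group):
    """Fraction of rows whose label string has at least one attribute from each group."""
    strings = pd.Series(attr_strings).fillna('').astype(str)
    hits = {}
    for s in strings:
        groups = {attr_to_group[int(a)] for a in s.split() if int(a) in attr_to_group}
        for g in groups:
            hits[g] = hits.get(g, 0) + 1
    return {g: c / len(strings) for g, c in sorted(hits.items())}

# Toy stand-ins (not the competition data).
labels = pd.DataFrame({
    'attribute_id': [0, 1, 2, 3],
    'attribute_name': ['country::france', 'culture::greek', 'tags::gold', 'medium::oil'],
})
labels['group'] = labels['attribute_name'].str.split('::').str[0].str.lower()
attr_to_group = dict(zip(labels['attribute_id'].tolist(), labels['group'].tolist()))

train = pd.DataFrame({'attribute_ids': ['0 2', '2 3', '1 2 3', '2']})
coverage = group_coverage(train['attribute_ids'], attr_to_group)
print(coverage)
```

If coverage for a group on the real train set is well below 1.0, a hard min-1 rule for that group would add labels to images that likely have none.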


In [68]:
import numpy as np, pandas as pd, shutil, faiss, time
from pathlib import Path

print('=== kNN tau=0.05 prob-blend with per-group min-1 (country,culture,dimension) + thr sweep to mean~4.44 ===', flush=True)
t0 = time.time()

# Load embeddings and build FAISS (reuse config k=150, tau=0.05, IDF^0.5, |L|^-1)
Xtr = np.load('clip_train_emb.npy').astype(np.float32)
Xte = np.load('clip_test_emb.npy').astype(np.float32)
d = Xtr.shape[1]
index = faiss.IndexFlatIP(d); index.add(Xtr)
k = 150; tau = 0.05; gamma_idf = 0.5
sims, nn_idx = index.search(Xte, k)

# Labels and train annotations
labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
C = len(attr_ids_sorted)
train_df = pd.read_csv('train.csv')
n_train = len(train_df)
lab_lists = []; lab_sizes = np.zeros(n_train, dtype=np.int32); df_counts = np.zeros(C, dtype=np.int32)
for i, s in enumerate(train_df['attribute_ids'].fillna('').astype(str).tolist()):
    if s:
        cols = [attr_to_col[int(x)] for x in s.split() if x!='' and int(x) in attr_to_col]
        lab_lists.append(cols); lab_sizes[i] = len(cols)
        for c in set(cols): df_counts[c] += 1
    else:
        lab_lists.append([]); lab_sizes[i] = 0
idf = np.log(n_train / (df_counts.astype(np.float32) + 1.0))
idf = np.clip(idf, 0.0, None) ** gamma_idf

# Softmax weights
sims_sm = sims / max(tau,1e-6); sims_sm -= sims_sm.max(axis=1, keepdims=True)
w = np.exp(sims_sm); w /= (w.sum(axis=1, keepdims=True) + 1e-12)

# Accumulate kNN probs with |L|^-1 and IDF^0.5 + renorm
ntest = Xte.shape[0]
Pk = np.zeros((ntest, C), dtype=np.float32)
for i in range(ntest):
    wi = w[i]; nn = nn_idx[i]
    for nbr, wj in zip(nn, wi):
        Lj = lab_lists[nbr]
        if not Lj: continue
        add = float(wj) / float(max(lab_sizes[nbr], 1))
        for c in Lj: Pk[i, c] += add
    if (i+1) % 2000 == 0: print(f'[accum] {i+1}/{ntest}', flush=True)
Pk *= idf[None, :]
rs = Pk.sum(axis=1, keepdims=True)
Pk = np.divide(Pk, np.where(rs>0, rs, 1.0), out=np.zeros_like(Pk), where=rs>0)

# Load model probs and prob-blend (beta=0.10)
from __main__ import blend_equal_weight
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
_, _, P_model = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
beta = 0.10
P_final = (1.0 - beta) * P_model + beta * Pk

# Caps, required groups, and per-group min rule
caps = {'country':1,'culture':1,'dimension':1,'medium':2,'tags':6,'tag':6}
default_cap = 3
required_groups = ['country','culture','dimension']
group_to_cols = {}
for j, aid in enumerate(attr_ids_sorted):
    g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
    group_to_cols.setdefault(g, []).append(j)

ids = pd.read_csv('sample_submission.csv')['id'].values

def apply_caps_min_total_and_groupmins(Pt, thr, min_total=4):
    rows = []; counts = []
    for i in range(Pt.shape[0]):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []; kept_set = set()
        # Apply caps on thresholded set
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap); c = used.get(g, 0)
            if c < kcap:
                kept.append(j); kept_set.add(j); used[g] = c + 1
        # Enforce per-group min-1 for required groups using P_model ranking within the group
        for g in required_groups:
            if used.get(g, 0) >= 1: continue
            cols = group_to_cols.get(g, [])
            if not cols: continue
            kcap = caps.get(g, default_cap); c = used.get(g, 0)
            if c >= kcap: continue
            probs_g = P_model[i, cols]
            order = np.argsort(-probs_g)
            for oi in order:
                j = int(cols[int(oi)])
                if j in kept_set: continue
                kept.append(j); kept_set.add(j); used[g] = c + 1
                break
        # Enforce min_total using P_model globally
        if len(kept) < min_total:
            order = np.argsort(-P_model[i])
            for j in order:
                if j in kept_set: continue
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap); c = used.get(g, 0)
                if c < kcap:
                    kept.append(j); kept_set.add(j); used[g] = c + 1
                    if len(kept) >= min_total: break
        if len(kept) == 0:
            kept = [int(np.argmax(p))]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(int(attr_ids_sorted[j])) for j in kept)})
        counts.append(len(kept))
    return rows, float(np.mean(counts))

# Sweep global threshold to hit mean ~4.44 (recall-leaning per expert)
thrs = np.arange(0.460, 0.5401, 0.002)
target_mean = 4.44
best = None; best_rows = None
for t in thrs:
    rows_t, mean_t = apply_caps_min_total_and_groupmins(P_final, t, min_total=4)
    delta = abs(mean_t - target_mean)
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t); best_rows = rows_t
    if int(round((t - 0.460) / 0.002)) % 10 == 0:  # round: int() truncation skipped thr=0.500
        print(f'[SWEEP] thr={t:.3f} post-rule mean={mean_t:.3f}', flush=True)

_, best_thr, best_mean = best
out_path = 'submission_knn_tau005_minGroup_swept.csv'
pd.DataFrame(best_rows).to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print(f'[WRITE] {out_path} thr={best_thr:.3f} post-rule mean={best_mean:.3f} | Elapsed {(time.time()-t0)/60:.1f}m', flush=True)

=== kNN tau=0.05 prob-blend with per-group min-1 (country,culture,dimension) + thr sweep to mean~4.44 ===


[accum] 2000/21318


[accum] 4000/21318


[accum] 6000/21318


[accum] 8000/21318


[accum] 10000/21318


[accum] 12000/21318


[accum] 14000/21318


[accum] 16000/21318


[accum] 18000/21318


[accum] 20000/21318


[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).


[SWEEP] thr=0.460 post-rule mean=5.471


[SWEEP] thr=0.480 post-rule mean=5.287


[SWEEP] thr=0.502 post-rule mean=5.118


[SWEEP] thr=0.520 post-rule mean=4.997


[SWEEP] thr=0.540 post-rule mean=4.880


[WRITE] submission_knn_tau005_minGroup_swept.csv thr=0.540 post-rule mean=4.880 | Elapsed 2.4m
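
The sweep above ended on the last threshold in its range (thr=0.540, post-rule mean 4.880 against a target of 4.44) and still wrote a submission. Below is a minimal sketch of a sweep helper that reports when the best threshold sits on the range boundary or misses the target by more than a tolerance. `sweep_to_target`, `tol`, and the linear `toy_mean` are hypothetical stand-ins for illustration and are not part of the notebook; in practice `mean_fn` would wrap the row builder and return its mean label count.

```python
import numpy as np

def sweep_to_target(mean_fn, thrs, target_mean, tol=0.02):
    """Pick the threshold whose post-rule mean is closest to target_mean.

    Returns (best_thr, best_mean, at_edge, ok). at_edge is True when the
    winner is the first or last threshold tried; ok is True when the best
    mean lies within tol of the target.
    """
    thrs = [float(t) for t in thrs]
    best_thr, best_mean = None, None
    for t in thrs:
        m = float(mean_fn(t))
        if best_mean is None or abs(m - target_mean) < abs(best_mean - target_mean):
            best_thr, best_mean = t, m
    at_edge = best_thr in (thrs[0], thrs[-1])
    ok = abs(best_mean - target_mean) <= tol
    return best_thr, best_mean, at_edge, ok

# Hypothetical stand-in for the real post-rule mean as a function of threshold.
def toy_mean(t):
    return 7.0 - 4.0 * t

low = sweep_to_target(toy_mean, np.arange(0.460, 0.5401, 0.002), target_mean=4.44)
high = sweep_to_target(toy_mean, np.arange(0.560, 0.7001, 0.002), target_mean=4.44)
print(low)   # best threshold is on the upper edge and misses the target
print(high)  # best threshold is interior and within tolerance
```

When `at_edge` is True or `ok` is False, one option is to widen the range and re-sweep before writing `submission.csv`.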


In [69]:
import numpy as np, pandas as pd, shutil, faiss, time
from pathlib import Path

print('=== Re-sweep: kNN tau=0.05 prob-blend + per-group min-1 with higher thr range to hit mean~4.44 ===', flush=True)
t0 = time.time()

# Recompute P_final (tau=0.05, k=150, IDF^0.5, |L|^-1) to avoid depending on cached variables
Xtr = np.load('clip_train_emb.npy').astype(np.float32)
Xte = np.load('clip_test_emb.npy').astype(np.float32)
d = Xtr.shape[1]
index = faiss.IndexFlatIP(d); index.add(Xtr)
k = 150; tau = 0.05; gamma_idf = 0.5
sims, nn_idx = index.search(Xte, k)

labels_df = pd.read_csv('labels.csv')
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
C = len(attr_ids_sorted)
train_df = pd.read_csv('train.csv')
n_train = len(train_df)
lab_lists = []; lab_sizes = np.zeros(n_train, dtype=np.int32); df_counts = np.zeros(C, dtype=np.int32)
for i, s in enumerate(train_df['attribute_ids'].fillna('').astype(str).tolist()):
    if s:
        cols = [attr_to_col[int(x)] for x in s.split() if x!='' and int(x) in attr_to_col]
        lab_lists.append(cols); lab_sizes[i] = len(cols)
        for c in set(cols): df_counts[c] += 1
    else:
        lab_lists.append([]); lab_sizes[i] = 0
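# Smoothed IDF per label, clipped at 0 and raised to gamma_idf
idf = np.log(n_train / (df_counts.astype(np.float32) + 1.0))
idf = np.clip(idf, 0.0, None) ** gamma_idf

# Temperature-scaled softmax over each row of neighbor similarities (max-subtracted for stability)
sims_sm = sims / max(tau, 1e-6)
sims_sm -= sims_sm.max(axis=1, keepdims=True)
w = np.exp(sims_sm); w /= (w.sum(axis=1, keepdims=True) + 1e-12)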

ntest = Xte.shape[0]
Pk = np.zeros((ntest, C), dtype=np.float32)
for i in range(ntest):
    wi = w[i]; nn = nn_idx[i]
    for nbr, wj in zip(nn, wi):
        Lj = lab_lists[nbr]
        if not Lj: continue
        add = float(wj) / float(max(lab_sizes[nbr], 1))
        for c in Lj: Pk[i, c] += add
    if (i+1) % 2000 == 0: print(f'[accum] {i+1}/{ntest}', flush=True)
Pk *= idf[None, :]
rs = Pk.sum(axis=1, keepdims=True)
Pk = np.divide(Pk, np.where(rs>0, rs, 1.0), out=np.zeros_like(Pk), where=rs>0)

# Load model probs and prob-blend (beta=0.10)
from __main__ import blend_equal_weight
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
_, _, P_model = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
beta = 0.10
P_final = (1.0 - beta) * P_model + beta * Pk

# Caps and per-group min rule
caps = {'country':1,'culture':1,'dimension':1,'medium':2,'tags':6,'tag':6}
default_cap = 3
required_groups = ['country','culture','dimension']
group_to_cols = {}
for j, aid in enumerate(attr_ids_sorted):
    g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
    group_to_cols.setdefault(g, []).append(j)

ids = pd.read_csv('sample_submission.csv')['id'].values

def apply_caps_min_total_and_groupmins(Pt, thr, min_total=4):
    rows = []; counts = []
    for i in range(Pt.shape[0]):
        p = Pt[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0:
            cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used, kept = {}, []; kept_set = set()
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap); c = used.get(g, 0)
            if c < kcap:
                kept.append(j); kept_set.add(j); used[g] = c + 1
        # per-group min-1 for structural groups
        for g in required_groups:
            if used.get(g, 0) >= 1: continue
            cols = group_to_cols.get(g, [])
            if not cols: continue
            kcap = caps.get(g, default_cap); c = used.get(g, 0)
            if c >= kcap: continue
            probs_g = P_model[i, cols]
            order = np.argsort(-probs_g)
            for oi in order:
                j = int(cols[int(oi)])
                if j in kept_set: continue
                kept.append(j); kept_set.add(j); used[g] = c + 1
                break
        # enforce min_total using model ranking
        if len(kept) < min_total:
            order = np.argsort(-P_model[i])
            for j in order:
                if j in kept_set: continue
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap); c = used.get(g, 0)
                if c < kcap:
                    kept.append(j); kept_set.add(j); used[g] = c + 1
                    if len(kept) >= min_total: break
        if len(kept) == 0:
            kept = [int(np.argmax(p))]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(int(attr_ids_sorted[j])) for j in kept)})
        counts.append(len(kept))
    return rows, float(np.mean(counts))

# Higher threshold sweep to bring mean down near 4.44
thrs = np.arange(0.560, 0.7001, 0.002)
target_mean = 4.44
best = None; best_rows = None
for t in thrs:
    rows_t, mean_t = apply_caps_min_total_and_groupmins(P_final, t, min_total=4)
    delta = abs(mean_t - target_mean)
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t); best_rows = rows_t
    if int(round((t - 0.560) / 0.002)) % 20 == 0:  # round to avoid float truncation in the step index
        print(f'[SWEEP] thr={t:.3f} post-rule mean={mean_t:.3f}', flush=True)

_, best_thr, best_mean = best
out_path = 'submission_knn_tau005_minGroup_swept_hi.csv'
pd.DataFrame(best_rows).to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print(f'[WRITE] {out_path} thr={best_thr:.3f} post-rule mean={best_mean:.3f} | Elapsed {(time.time()-t0)/60:.1f}m', flush=True)

=== Re-sweep: kNN tau=0.05 prob-blend + per-group min-1 with higher thr range to hit mean~4.44 ===


[accum] 2000/21318


[accum] 4000/21318


[accum] 6000/21318


[accum] 8000/21318


[accum] 10000/21318


[accum] 12000/21318


[accum] 14000/21318


[accum] 16000/21318


[accum] 18000/21318


[accum] 20000/21318


[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).


[SWEEP] thr=0.560 post-rule mean=4.774


[SWEEP] thr=0.600 post-rule mean=4.606


[SWEEP] thr=0.640 post-rule mean=4.480


[SWEEP] thr=0.680 post-rule mean=4.380


[WRITE] submission_knn_tau005_minGroup_swept_hi.csv thr=0.654 post-rule mean=4.441 | Elapsed 3.8m
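
The kNN vote accumulation in the cells above loops over every test row and neighbor in Python. Below is a minimal sketch of the same `|L|^-1` voting step as a chunked NumPy scatter-add. It assumes `w` and `nn_idx` have shape `(ntest, k)` and that `lab_lists` holds column indices per training row, as in the notebook. `pad_label_lists`, `accumulate_knn_votes`, and `chunk` are hypothetical names, and the IDF scaling and row renormalization are left out.

```python
import numpy as np

def pad_label_lists(lab_lists):
    """Pack ragged label lists into an (n_train, Lmax) int array padded with -1."""
    lmax = max((len(l) for l in lab_lists), default=0)
    padded = np.full((len(lab_lists), max(lmax, 1)), -1, dtype=np.int64)
    for i, labs in enumerate(lab_lists):
        padded[i, :len(labs)] = labs
    return padded

def accumulate_knn_votes(w, nn_idx, lab_lists, n_classes, chunk=1024):
    """Sum w[i, j] / |L_nbr| into every label column of each neighbor."""
    padded = pad_label_lists(lab_lists)
    sizes = np.maximum((padded >= 0).sum(axis=1), 1).astype(np.float32)
    out = np.zeros((w.shape[0], n_classes), dtype=np.float32)
    for start in range(0, w.shape[0], chunk):
        nbrs = nn_idx[start:start + chunk]                 # (b, k)
        b = nbrs.shape[0]
        labs = padded[nbrs]                                # (b, k, Lmax)
        votes = (w[start:start + chunk] / sizes[nbrs])[:, :, None]
        votes = np.broadcast_to(votes, labs.shape)
        rows = np.broadcast_to(np.arange(b)[:, None, None], labs.shape)
        mask = labs >= 0                                   # drop padding slots
        flat = rows[mask] * n_classes + labs[mask]         # flattened (row, col) index
        acc = np.bincount(flat, weights=votes[mask], minlength=b * n_classes)
        out[start:start + chunk] = acc.reshape(b, n_classes)
    return out

# Toy usage with made-up data: 4 training rows, 5 test rows, 3 neighbors, 4 classes.
rng = np.random.default_rng(0)
toy_labels = [[0, 2], [], [1], [0, 1, 3]]
toy_nn = rng.integers(0, 4, size=(5, 3))
toy_w = rng.random((5, 3)).astype(np.float32)
toy_w /= toy_w.sum(axis=1, keepdims=True)
toy_votes = accumulate_knn_votes(toy_w, toy_nn, toy_labels, n_classes=4, chunk=2)
print(toy_votes.shape)
```

Chunking keeps the intermediate `(b, k, Lmax)` arrays small; the chunk size is a memory/speed trade-off that would need tuning on the real arrays.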


In [70]:
import numpy as np, pandas as pd, time, shutil, faiss
from pathlib import Path

print('=== kNN union (restricted): add up to L=3 kNN labels from {country,culture,dimension} with gate=0.08; sweep thr to mean~4.40 ===', flush=True)
t0 = time.time()

# Load CLIP embeddings
Xtr = np.load('clip_train_emb.npy').astype(np.float32)
Xte = np.load('clip_test_emb.npy').astype(np.float32)
d = Xtr.shape[1]
index = faiss.IndexFlatIP(d); index.add(Xtr)

# Params
k = 150; tau = 0.10; eta = 1.0; gamma_idf = 0.5
sims, nn_idx = index.search(Xte, k)
print('kNN search done:', sims.shape, flush=True)

# Labels mapping
labels_df = pd.read_csv('labels.csv')
attr_ids_sorted = np.array(sorted(labels_df['attribute_id'].astype(int).unique().tolist()), dtype=np.int32)
attr_to_col = {a:i for i,a in enumerate(attr_ids_sorted)}
C = len(attr_ids_sorted)
train_df = pd.read_csv('train.csv')
n_train = len(train_df)

# Train label lists and sizes + df counts
lab_lists = []; lab_sizes = np.zeros(n_train, dtype=np.int32); df_counts = np.zeros(C, dtype=np.int32)
for i, s in enumerate(train_df['attribute_ids'].fillna('').astype(str).tolist()):
    if s:
        cols = [attr_to_col[int(x)] for x in s.split() if x!='' and int(x) in attr_to_col]
        lab_lists.append(cols); lab_sizes[i] = len(cols)
        for c in set(cols): df_counts[c] += 1
    else:
        lab_lists.append([]); lab_sizes[i] = 0

# Softmax weights with temperature
sims_sm = sims / max(tau,1e-6)
sims_sm -= sims_sm.max(axis=1, keepdims=True)
w = np.exp(sims_sm); w /= (w.sum(axis=1, keepdims=True) + 1e-12)

# Accumulate probs_knn with per-neighbor |L|^-eta
ntest = Xte.shape[0]
probs_knn = np.zeros((ntest, C), dtype=np.float32)
for i in range(ntest):
    nn = nn_idx[i]; wi = w[i]
    for nbr, wj in zip(nn, wi):
        Lj = lab_lists[nbr]
        if not Lj: continue
        denom = (lab_sizes[nbr] ** eta) if eta > 0 else 1.0
        add = float(wj) / float(max(denom, 1.0))
        for c in Lj: probs_knn[i, c] += add
    if (i+1) % 2000 == 0: print(f'[kNN-accum] {i+1}/{ntest}', flush=True)

# IDF^gamma and renormalize
idf = np.log(n_train / (df_counts.astype(np.float32) + 1.0))
idf = np.clip(idf, 0.0, None) ** gamma_idf
probs_knn *= idf[None, :]
row_sums = probs_knn.sum(axis=1, keepdims=True)
probs_knn = np.divide(probs_knn, np.where(row_sums > 0, row_sums, 1.0), out=np.zeros_like(probs_knn), where=row_sums>0)

# Load model probabilities (b3 2:1)
from __main__ import blend_equal_weight
dirs_b3 = ['out_b3_384_top512','out_b3_448_top512']
_, _, P_model = blend_equal_weight(dirs_b3, weights=[2,1], write_submission=False, out_name='noop.csv', cardinality_target=4.40)
assert P_model is not None and P_model.shape == probs_knn.shape

# Groups and caps
labels_df['group'] = labels_df['attribute_name'].astype(str).str.split('::').str[0].str.lower()
idx_to_group = labels_df.set_index('attribute_id').loc[attr_ids_sorted, 'group'].values
allowed_groups = {'country','culture','dimension'}
caps = {'country':1,'culture':1,'dimension':1,'medium':2,'tags':6,'tag':6}
default_cap = 3
ids = pd.read_csv('sample_submission.csv')['id'].values

def build_union_rows_restricted(thr, L=3, gate=0.08, min_total=3):
    rows = []; counts = []
    for i in range(P_model.shape[0]):
        p = P_model[i]; pk = probs_knn[i]
        cand = np.where(p >= thr)[0]
        if cand.size == 0: cand = np.array([int(np.argmax(p))], dtype=np.int64)
        cand_sorted = cand[np.argsort(-p[cand])]
        used = {}; kept = []; kept_set = set()
        # model thresholded + caps
        for j in cand_sorted:
            g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
            kcap = caps.get(g, default_cap); c = used.get(g, 0)
            if c < kcap: kept.append(j); kept_set.add(j); used[g] = c + 1
        # restricted union: only structural groups
        if L > 0:
            order_knn = np.argsort(-pk)
            added = 0
            for j in order_knn:
                if pk[j] < gate: break
                if j in kept_set: continue
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                if g not in allowed_groups: continue
                kcap = caps.get(g, default_cap); c = used.get(g, 0)
                if c < kcap:
                    kept.append(int(j)); kept_set.add(int(j)); used[g] = c + 1; added += 1
                    if added >= L: break
        # enforce min_total using model ranking
        if len(kept) < min_total:
            order = np.argsort(-p)
            for j in order:
                if j in kept_set: continue
                g = idx_to_group[j] if j < len(idx_to_group) else 'misc'
                kcap = caps.get(g, default_cap); c = used.get(g, 0)
                if c < kcap:
                    kept.append(int(j)); kept_set.add(int(j)); used[g] = c + 1
                    if len(kept) >= min_total: break
        if len(kept) == 0: kept = [int(np.argmax(p))]
        rows.append({'id': ids[i], 'attribute_ids': ' '.join(str(int(attr_ids_sorted[j])) for j in kept)})
        counts.append(len(kept))
    return rows, float(np.mean(counts))

# Sweep thr to target mean ~4.40
thrs = np.arange(0.460, 0.5401, 0.002)
target_mean = 4.40
best = None; best_rows = None
for t in thrs:
    rows_t, mean_t = build_union_rows_restricted(t, L=3, gate=0.08, min_total=3)
    delta = abs(mean_t - target_mean)
    if (best is None) or (delta < best[0]):
        best = (delta, t, mean_t); best_rows = rows_t
    if int(round((t - 0.460) / 0.002)) % 10 == 0:  # round: int() truncation skipped thr=0.500
        print(f'[SWEEP] thr={t:.3f} post-union mean={mean_t:.3f}', flush=True)

_, best_thr, best_mean = best
out_path = 'submission_union_knn_structL3.csv'
pd.DataFrame(best_rows).to_csv(out_path, index=False)
shutil.copyfile(out_path, 'submission.csv')
print(f'[WRITE] {out_path} thr={best_thr:.3f} post-union mean={best_mean:.3f} | Elapsed {(time.time()-t0)/60:.1f}m', flush=True)

=== kNN union (restricted): add up to L=3 kNN labels from {country,culture,dimension} with gate=0.08; sweep thr to mean~4.40 ===


kNN search done: (21318, 150)


[kNN-accum] 2000/21318


[kNN-accum] 4000/21318


[kNN-accum] 6000/21318


[kNN-accum] 8000/21318


[kNN-accum] 10000/21318


[kNN-accum] 12000/21318


[kNN-accum] 14000/21318


[kNN-accum] 16000/21318


[kNN-accum] 18000/21318


[kNN-accum] 20000/21318


[CARD] Train mean labels/img=4.421, target=4.400, chosen thr=0.502 (pred_mean=4.406)
[INFO] Skipping submission write (best_thr not available).


[SWEEP] thr=0.460 post-union mean=4.646


[SWEEP] thr=0.480 post-union mean=4.404


[SWEEP] thr=0.502 post-union mean=4.175


[SWEEP] thr=0.520 post-union mean=4.009


[SWEEP] thr=0.540 post-union mean=3.855


[WRITE] submission_union_knn_structL3.csv thr=0.480 post-union mean=4.404 | Elapsed 2.8m
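
The per-group cap check is repeated in each pass of the row builders above (thresholded set, group fills, min-total fill). Below is a simplified sketch of how that check could be factored into one helper, assuming a single score vector per image and a `groups` sequence aligned with its columns. `select_with_caps` is a hypothetical name. It omits the required-group and kNN-union passes, and it fills to `min_total` from the same scores rather than from `P_model`.

```python
import numpy as np

def select_with_caps(scores, groups, caps, default_cap=3, thr=0.5, min_total=0):
    """Keep labels scoring >= thr, best first, without exceeding any group cap.

    If fewer than min_total labels survive, keep adding the next-best labels
    (still respecting the caps) until min_total is reached or none are left.
    """
    order = np.argsort(-scores)
    used, kept = {}, []

    def try_add(j):
        g = groups[j]
        if used.get(g, 0) < caps.get(g, default_cap):
            kept.append(int(j))
            used[g] = used.get(g, 0) + 1

    # Pass 1: thresholded candidates in descending score order.
    for j in order:
        if scores[j] < thr:
            break
        try_add(j)
    # Pass 2: top up to min_total with the best remaining labels.
    for j in order:
        if len(kept) >= min_total:
            break
        if int(j) not in kept:
            try_add(j)
    return kept

# Toy usage with made-up scores and groups.
toy_scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2])
toy_groups = ['country', 'country', 'medium', 'tags', 'culture']
toy_caps = {'country': 1, 'culture': 1, 'dimension': 1, 'medium': 2, 'tags': 6}
print(select_with_caps(toy_scores, toy_groups, toy_caps, thr=0.5))               # [0, 2]
print(select_with_caps(toy_scores, toy_groups, toy_caps, thr=0.5, min_total=3))  # [0, 2, 3]
```

The second `country` label (index 1) is skipped in both calls because the cap of 1 is already used, so the min-total fill takes the `tags` label instead.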
