# Herbarium 2021 FGVC8 - Plan

Goal: Build a GPU-accelerated image classification pipeline to achieve a medal-worthy macro-F1 on the leaderboard.

Plan:
- Env check: verify GPU availability and correct CUDA stack.
- Data audit: load train/test metadata.json, inspect fields, class counts, image-path mapping, and sample_submission format.
- CV protocol: stratified K-fold by label; ensure deterministic seeds.
- Baseline model: torchvision pretrained backbone (e.g., convnext_tiny / efficientnet_v2_s) fine-tune with mixed precision.
- Augmentations: standard image aug (resize, crop, color jitter, flips); class-balanced sampling.
- Training loop: early stopping, lr scheduling (OneCycle/Step), label smoothing; log progress by epoch/fold.
- Inference: TTA, save predictions to submission.csv in required format.
- Iteration: cache OOF and test logits; analyze errors; consider higher res or stronger backbones; blend if time.

Checkpoints requiring expert review:
- After this plan and environment/data audit
- After baseline CV setup
- After first baseline training results
- After improvements/ensembling decisions

Notes:
- Keep jobs fast; subsample for smoke tests first.
- Always print timing per fold and verify artifacts exist.

In [1]:
import os, json, sys, subprocess, time, textwrap, pandas as pd
from pathlib import Path

def run(cmd):
    print('> ', ' '.join(cmd), flush=True)
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True).stdout

print('Env check: nvidia-smi')
print(run(['bash','-lc','nvidia-smi || true']))
print('Python:', sys.version)

base = Path('.')
train_dir = base/'train'
test_dir = base/'test'
ss_path = base/'sample_submission.csv'

print('List dir train/test:')
print('train exists:', train_dir.exists(), 'test exists:', test_dir.exists())
print('train/images dirs:', len(list((train_dir/'images').glob('*'))))
print('test/images dirs:', len(list((test_dir/'images').glob('*'))))

def load_json(p):
    with open(p, 'r') as f:
        return json.load(f)

train_meta = load_json(train_dir/'metadata.json')
test_meta = load_json(test_dir/'metadata.json')
print('train_meta keys:', list(train_meta.keys()))
print('test_meta keys:', list(test_meta.keys()))

# Inspect a few entries
def head_dict(d, n=3):
    # if dict of lists or dict with key 'images', try to summarize
    if isinstance(d, dict) and 'images' in d:
        imgs = d['images'][:n]
        print('images sample:', imgs)
    elif isinstance(d, dict) and 'annotations' in d:
        print('annotations sample:', d['annotations'][:n])
    else:
        # print first n key:val pairs
        for i, (k,v) in enumerate(d.items()):
            if i>=n: break
            print(k, type(v), (v if isinstance(v,(int,str,float)) else '...'))

print('Train meta head:')
head_dict(train_meta)
print('Test meta head:')
head_dict(test_meta)

# Try to infer schema commonly used in Herbarium competitions
# Expect fields like: images (list of dicts with file_name, id), annotations (list with image_id, category_id), categories (list with id, name)
images = train_meta.get('images', [])
ann = train_meta.get('annotations', [])
cats = train_meta.get('categories', [])
print(f'Counts - images: {len(images)}, annotations: {len(ann)}, categories: {len(cats)}')

if images and ann and cats:
    import pandas as pd
    df_img = pd.DataFrame(images)
    df_anno = pd.DataFrame(ann)
    df_cat = pd.DataFrame(cats)
    print('df_img columns:', df_img.columns.tolist())
    print('df_anno columns:', df_anno.columns.tolist())
    print('df_cat columns:', df_cat.columns.tolist())
    # Merge labels
    if 'id' in df_img.columns and 'image_id' in df_anno.columns:
        df = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')
        if 'category_id' in df.columns:
            print('Train rows:', len(df))
            print('Unique classes:', df['category_id'].nunique())
            print('Label distribution (head):')
            print(df['category_id'].value_counts().head(10))
            # Build relative image path
            # Expect file_name in images and path under train/images/
            if 'file_name' in df_img.columns:
                pass

if ss_path.exists():
    ss = pd.read_csv(ss_path)
    print('sample_submission head:')
    print(ss.head())
else:
    print('sample_submission.csv missing')

print('Done audit at', time.strftime('%Y-%m-%d %H:%M:%S'))

Env check: nvidia-smi
>  bash -lc nvidia-smi || true


Sat Sep 27 04:39:42 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     182MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

train_meta keys: ['annotations', 'categories', 'images', 'info', 'institutions', 'licenses']
test_meta keys: ['images', 'info', 'licenses']
Train meta head:
images sample: [{'file_name': 'images/604/92/1608432.jpg', 'height': 1000, 'id': 1608432, 'license': 0, 'width': 671}, {'file_name': 'images/604/92/796948.jpg', 'height': 1000, 'id': 796948, 'license': 0, 'width': 678}, {'file_name': 'images/604/92/994447.jpg', 'height': 1000, 'id': 994447, 'license': 0, 'width': 671}]
Test meta head:
images sample: [{'file_name': 'images/000/0.jpg', 'height': 1000, 'id': '0', 'license': 0, 'width': 666}, {'file_name': 'images/000/1.jpg', 'height': 1000, 'id': '1', 'license': 0, 'width': 672}, {'file_name': 'images/000/2.jpg', 'height': 1000, 'id': '2', 'license': 0, 'width': 669}]
Counts - images: 1779953, annotations: 1779953, categories: 64500


df_img columns: ['file_name', 'height', 'id', 'license', 'width']
df_anno columns: ['category_id', 'id', 'image_id', 'institution_id']
df_cat columns: ['family', 'id', 'name', 'order']


Train rows: 1779953
Unique classes: 64500
Label distribution (head):
category_id
42811    2647
25229    1713
48372    1630
42843    1324
22344    1260
42580    1152
11117    1091
56357    1058
26119    1007
58922     987
Name: count, dtype: int64
sample_submission head:
   Id  Predicted
0   0          0
1   1          0
2   2          0
3   3          0
4   4          0
Done audit at 2025-09-27 04:39:49


In [2]:
# Safety-net baseline: Global majority-class submission
import json, pandas as pd
from pathlib import Path

base = Path('.')
train_meta = json.load(open(base/'train'/'metadata.json','r'))
test_meta = json.load(open(base/'test'/'metadata.json','r'))

df_img = pd.DataFrame(train_meta['images'])
df_anno = pd.DataFrame(train_meta['annotations'])
df = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')
mode_label = int(df['category_id'].mode().iloc[0])
print('Global mode category_id:', mode_label)

ss = pd.read_csv(base/'sample_submission.csv')
print('Sample submission shape:', ss.shape)
sub = ss.copy()
sub['Predicted'] = mode_label
sub.to_csv('submission.csv', index=False)
print('Wrote submission.csv with global majority class. Head:')
print(sub.head())

Global mode category_id: 42811
Sample submission shape: (477806, 2)
Wrote submission.csv with global majority class. Head:
   Id  Predicted
0   0      42811
1   1      42811
2   2      42811
3   3      42811
4   4      42811


In [3]:
# Metadata-only F0-conditioned baseline
import json, pandas as pd, numpy as np
from pathlib import Path

base = Path('.')
train_meta = json.load(open(base/'train'/'metadata.json','r'))
test_meta = json.load(open(base/'test'/'metadata.json','r'))
ss = pd.read_csv(base/'sample_submission.csv')

df_img = pd.DataFrame(train_meta['images'])[['file_name','width','height','id']]
df_anno = pd.DataFrame(train_meta['annotations'])[['image_id','category_id']]
df_tr = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')

def parse_F0_F1(s):
    # s like 'images/604/92/1608432.jpg'
    parts = s.split('/')
    f0 = parts[1] if len(parts) > 1 else ''
    f1 = parts[2] if len(parts) > 2 else ''
    return f0, f1

F0_F1 = df_tr['file_name'].map(parse_F0_F1)
df_tr['F0'] = [t[0] for t in F0_F1]
df_tr['F1'] = [t[1] for t in F0_F1]
df_tr['aspect'] = (df_tr['width'] / df_tr['height']).astype(float)
width_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
height_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
df_tr['wb'] = pd.cut(df_tr['width'], bins=width_bins, include_lowest=True).astype(str)
df_tr['hb'] = pd.cut(df_tr['height'], bins=height_bins, include_lowest=True).astype(str)

global_mode = int(df_tr['category_id'].mode().iloc[0])
print('Global mode:', global_mode)

def mode_map(df, keys):
    # returns dict mapping tuple(keys) -> modal category_id
    grp = df.groupby(keys)['category_id'].agg(lambda x: x.value_counts().idxmax())
    return grp.to_dict()

m_F0_wb_hb = mode_map(df_tr, ['F0','wb','hb'])
m_F0_wb = mode_map(df_tr, ['F0','wb'])
m_F0_hb = mode_map(df_tr, ['F0','hb'])
m_F0 = mode_map(df_tr, ['F0'])

# Build test features
df_te = pd.DataFrame(test_meta['images'])[['id','file_name','width','height']].copy()
F0_F1_te = df_te['file_name'].map(parse_F0_F1)
df_te['F0'] = [t[0] for t in F0_F1_te]
df_te['F1'] = [t[1] for t in F0_F1_te]
df_te['wb'] = pd.cut(df_te['width'], bins=width_bins, include_lowest=True).astype(str)
df_te['hb'] = pd.cut(df_te['height'], bins=height_bins, include_lowest=True).astype(str)

def predict_row(r):
    k3 = (r['F0'], r['wb'], r['hb'])
    if k3 in m_F0_wb_hb: return int(m_F0_wb_hb[k3])
    k2 = (r['F0'], r['wb'])
    if k2 in m_F0_wb: return int(m_F0_wb[k2])
    k2b = (r['F0'], r['hb'])
    if k2b in m_F0_hb: return int(m_F0_hb[k2b])
    k1 = (r['F0'],)
    if k1 in m_F0: return int(m_F0[r['F0']])
    return global_mode

t0 = pd.Timestamp.now()
preds = df_te.apply(predict_row, axis=1)
elapsed = (pd.Timestamp.now() - t0).total_seconds()
print(f'Predicted {len(preds)} rows in {elapsed:.2f}s')

# Align to sample submission order
sub = ss.copy()
id2pred = dict(zip(df_te['id'].astype(str), preds))
sub['Predicted'] = sub['Id'].astype(str).map(id2pred).fillna(global_mode).astype(int)
sub.to_csv('submission.csv', index=False)
sub.to_csv('submission_f0.csv', index=False)
print('Wrote submission.csv and submission_f0.csv. Head:')
print(sub.head())
print('Coverage check:')
covered = (sub['Predicted'] != global_mode).mean()
print('Fraction not falling back to global:', f'{covered:.3f}')

Global mode: 42811


Predicted 477806 rows in 2.47s


Wrote submission.csv and submission_f0.csv. Head:
   Id  Predicted
0   0         23
1   1         23
2   2         23
3   3         23
4   4         23
Coverage check:
Fraction not falling back to global: 0.998


In [4]:
# Enhanced metadata baseline: add F1 and aspect bins to fallback chain
import json, pandas as pd, numpy as np
from pathlib import Path

base = Path('.')
train_meta = json.load(open(base/'train'/'metadata.json','r'))
test_meta = json.load(open(base/'test'/'metadata.json','r'))
ss = pd.read_csv(base/'sample_submission.csv')

df_img = pd.DataFrame(train_meta['images'])[['file_name','width','height','id']]
df_anno = pd.DataFrame(train_meta['annotations'])[['image_id','category_id']]
df_tr = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')

def parse_F0_F1(s):
    parts = s.split('/')
    f0 = parts[1] if len(parts) > 1 else ''
    f1 = parts[2] if len(parts) > 2 else ''
    return f0, f1

F0_F1 = df_tr['file_name'].map(parse_F0_F1)
df_tr['F0'] = [t[0] for t in F0_F1]
df_tr['F1'] = [t[1] for t in F0_F1]
df_tr['aspect'] = (df_tr['width'] / df_tr['height']).astype(float)
width_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
height_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
aspect_bins = [0, 0.6, 0.75, 0.9, 1.0, 1.1, 1.3, 2.5]
df_tr['wb'] = pd.cut(df_tr['width'], bins=width_bins, include_lowest=True).astype(str)
df_tr['hb'] = pd.cut(df_tr['height'], bins=height_bins, include_lowest=True).astype(str)
df_tr['ab'] = pd.cut(df_tr['aspect'], bins=aspect_bins, include_lowest=True).astype(str)

global_mode = int(df_tr['category_id'].mode().iloc[0])
print('Global mode:', global_mode)

def mode_map(df, keys):
    grp = df.groupby(keys)['category_id'].agg(lambda x: x.value_counts().idxmax())
    return grp.to_dict()

# Build maps with increasing specificity
m_F0_F1_wb_hb = mode_map(df_tr, ['F0','F1','wb','hb'])
m_F0_F1_ab   = mode_map(df_tr, ['F0','F1','ab'])
m_F0_wb_hb   = mode_map(df_tr, ['F0','wb','hb'])
m_F0_ab      = mode_map(df_tr, ['F0','ab'])
m_F0_F1      = mode_map(df_tr, ['F0','F1'])
m_F0_wb      = mode_map(df_tr, ['F0','wb'])
m_F0_hb      = mode_map(df_tr, ['F0','hb'])
m_F0         = mode_map(df_tr, ['F0'])

# Test features
df_te = pd.DataFrame(test_meta['images'])[['id','file_name','width','height']].copy()
F0_F1_te = df_te['file_name'].map(parse_F0_F1)
df_te['F0'] = [t[0] for t in F0_F1_te]
df_te['F1'] = [t[1] for t in F0_F1_te]
df_te['aspect'] = (df_te['width'] / df_te['height']).astype(float)
df_te['wb'] = pd.cut(df_te['width'], bins=width_bins, include_lowest=True).astype(str)
df_te['hb'] = pd.cut(df_te['height'], bins=height_bins, include_lowest=True).astype(str)
df_te['ab'] = pd.cut(df_te['aspect'], bins=aspect_bins, include_lowest=True).astype(str)

levels = [
    ('F0_F1_wb_hb', lambda r: m_F0_F1_wb_hb.get((r['F0'], r['F1'], r['wb'], r['hb']))),
    ('F0_F1_ab',    lambda r: m_F0_F1_ab.get((r['F0'], r['F1'], r['ab']))),
    ('F0_wb_hb',    lambda r: m_F0_wb_hb.get((r['F0'], r['wb'], r['hb']))),
    ('F0_ab',       lambda r: m_F0_ab.get((r['F0'], r['ab']))),
    ('F0_F1',       lambda r: m_F0_F1.get((r['F0'], r['F1']))),
    ('F0_wb',       lambda r: m_F0_wb.get((r['F0'], r['wb']))),
    ('F0_hb',       lambda r: m_F0_hb.get((r['F0'], r['hb']))),
    ('F0',          lambda r: m_F0.get((r['F0'],)))
]

pred = np.full(len(df_te), global_mode, dtype=np.int64)
covered = np.zeros(len(df_te), dtype=bool)

for name, fn in levels:
    if covered.all():
        break
    idx = np.where(~covered)[0]
    # vectorized-ish apply on remaining subset
    vals = [fn(df_te.iloc[i]) for i in idx]
    take = [v is not None for v in vals]
    if any(take):
        pred_idx = np.array(idx)[np.array(take)]
        pred_vals = np.array([int(v) for v in np.array(vals, dtype=object)[np.array(take)]])
        pred[pred_idx] = pred_vals
        covered[pred_idx] = True
    print(f'Level {name}: newly covered {covered.mean():.3f} cumul.')

sub = ss.copy()
id2pred = dict(zip(df_te['id'].astype(str), pred.tolist()))
sub['Predicted'] = sub['Id'].astype(str).map(id2pred).fillna(global_mode).astype(int)
sub.to_csv('submission.csv', index=False)
sub.to_csv('submission_f0_ext.csv', index=False)
print('Wrote submission.csv and submission_f0_ext.csv. Head:')
print(sub.head())

Global mode: 42811


Level F0_F1_wb_hb: newly covered 0.000 cumul.


Level F0_F1_ab: newly covered 0.000 cumul.


Level F0_wb_hb: newly covered 0.994 cumul.
Level F0_ab: newly covered 0.994 cumul.
Level F0_F1: newly covered 0.994 cumul.


Level F0_wb: newly covered 0.994 cumul.
Level F0_hb: newly covered 1.000 cumul.
Level F0: newly covered 1.000 cumul.


Wrote submission.csv and submission_f0_ext.csv. Head:
   Id  Predicted
0   0         23
1   1         23
2   2         23
3   3         23
4   4         23


In [5]:
# Install Torch cu121 stack and vision deps; verify GPU
import os, sys, subprocess, shutil
from pathlib import Path

def pip(*args):
    print('> pip', ' '.join(args), flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

for pkg in ('torch','torchvision','torchaudio'):
    subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', pkg], check=False)

for d in (
    '/app/.pip-target/torch',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
):
    if os.path.exists(d):
        print('Removing', d)
        shutil.rmtree(d, ignore_errors=True)

pip('install', '--no-cache-dir', '--index-url', 'https://download.pytorch.org/whl/cu121', 'torch==2.4.1', 'torchvision==0.19.1', 'torchaudio==2.4.1')
Path('constraints.txt').write_text('torch==2.4.1\ntorchvision==0.19.1\ntorchaudio==2.4.1\n')
pip('install', '-c', 'constraints.txt', 'timm==1.0.9', 'albumentations==1.4.10', 'opencv-python-headless==4.10.0.84', '--upgrade-strategy', 'only-if-needed')

import torch, torchvision
print('torch:', torch.__version__, 'cuda build:', getattr(torch.version,'cuda', None), 'CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
else:
    raise SystemExit('CUDA not available; cannot proceed to CNN training')





> pip install --no-cache-dir --index-url https://download.pytorch.org/whl/cu121 torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1




Looking in indexes: https://download.pytorch.org/whl/cu121


Collecting torch==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torch-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (799.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 799.0/799.0 MB 403.7 MB/s eta 0:00:00


Collecting torchvision==0.19.1
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.19.1%2Bcu121-cp311-cp311-linux_x86_64.whl (7.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 446.7 MB/s eta 0:00:00


Collecting torchaudio==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 89.8 MB/s eta 0:00:00


Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 246.8 MB/s eta 0:00:00


Collecting nvidia-nccl-cu12==2.20.5
  Downloading https://download.pytorch.org/whl/cu121/nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 330.6 MB/s eta 0:00:00


Collecting filelock
  Downloading https://download.pytorch.org/whl/filelock-3.13.1-py3-none-any.whl (11 kB)


Collecting sympy
  Downloading https://download.pytorch.org/whl/sympy-1.13.3-py3-none-any.whl (6.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 531.1 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 416.8 MB/s eta 0:00:00


Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 285.6 MB/s eta 0:00:00


Collecting fsspec
  Downloading https://download.pytorch.org/whl/fsspec-2024.6.1-py3-none-any.whl (177 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.6/177.6 KB 476.1 MB/s eta 0:00:00


Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 313.1 MB/s eta 0:00:00


Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 193.9 MB/s eta 0:00:00


Collecting triton==3.0.0
  Downloading https://download.pytorch.org/whl/triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 374.3 MB/s eta 0:00:00


Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 502.1 MB/s eta 0:00:00


Collecting typing-extensions>=4.8.0
  Downloading https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl (37 kB)


Collecting nvidia-nvtx-cu12==12.1.105
  Downloading https://download.pytorch.org/whl/cu121/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 454.9 MB/s eta 0:00:00


Collecting jinja2
  Downloading https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 KB 457.2 MB/s eta 0:00:00


Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 193.9 MB/s eta 0:00:00


Collecting networkx
  Downloading https://download.pytorch.org/whl/networkx-3.3-py3-none-any.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 266.3 MB/s eta 0:00:00


Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 449.0 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading https://download.pytorch.org/whl/cu121/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 317.0 MB/s eta 0:00:00


Collecting numpy
  Downloading https://download.pytorch.org/whl/numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 341.8 MB/s eta 0:00:00


Collecting pillow!=8.3.*,>=5.3.0
  Downloading https://download.pytorch.org/whl/pillow-11.0.0-cp311-cp311-manylinux_2_28_x86_64.whl (4.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 275.3 MB/s eta 0:00:00


Collecting nvidia-nvjitlink-cu12
  Downloading https://download.pytorch.org/whl/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 381.6 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading https://download.pytorch.org/whl/MarkupSafe-2.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (28 kB)


Collecting mpmath<1.4,>=1.1.0
  Downloading https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 539.2 MB/s eta 0:00:00


Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio


Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.3 numpy-1.26.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.0.0 sympy-1.13.3 torch-2.4.1+cu121 torchaudio-2.4.1+cu121 torchvision-0.19.1+cu121 triton-3.0.0 typing-extensions-4.12.2


> pip install -c constraints.txt timm==1.0.9 albumentations==1.4.10 opencv-python-headless==4.10.0.84 --upgrade-strategy only-if-needed


Collecting timm==1.0.9
  Downloading timm-1.0.9-py3-none-any.whl (2.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 58.5 MB/s eta 0:00:00
Collecting albumentations==1.4.10
  Downloading albumentations-1.4.10-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.9/161.9 KB 423.9 MB/s eta 0:00:00


Collecting opencv-python-headless==4.10.0.84
  Downloading opencv_python_headless-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.9/49.9 MB 188.8 MB/s eta 0:00:00
Collecting huggingface_hub
  Downloading huggingface_hub-0.35.1-py3-none-any.whl (563 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 563.3/563.3 KB 522.2 MB/s eta 0:00:00


Collecting safetensors
  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 485.8/485.8 KB 431.1 MB/s eta 0:00:00
Collecting pyyaml
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.6/806.6 KB 514.0 MB/s eta 0:00:00
Collecting torch
  Downloading torch-2.4.1-cp311-cp311-manylinux1_x86_64.whl (797.1 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 797.1/797.1 MB 208.3 MB/s eta 0:00:00


Collecting torchvision
  Downloading torchvision-0.19.1-cp311-cp311-manylinux1_x86_64.whl (7.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 234.1 MB/s eta 0:00:00
Collecting typing-extensions>=4.9.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 382.0 MB/s eta 0:00:00
Collecting scikit-image>=0.21.0
  Downloading scikit_image-0.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.8/14.8 MB 141.6 MB/s eta 0:00:00


Collecting pydantic>=2.7.0
  Downloading pydantic-2.11.9-py3-none-any.whl (444 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 444.9/444.9 KB 495.6 MB/s eta 0:00:00
Collecting albucore>=0.0.11
  Downloading albucore-0.0.33-py3-none-any.whl (18 kB)


Collecting scikit-learn>=1.3.2
  Downloading scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (9.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.7/9.7 MB 159.4 MB/s eta 0:00:00


Collecting numpy<2,>=1.24.4
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 114.6 MB/s eta 0:00:00


Collecting scipy>=1.10.0
  Downloading scipy-1.16.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (35.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.9/35.9 MB 197.9 MB/s eta 0:00:00


Collecting stringzilla>=3.10.4
  Downloading stringzilla-4.0.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (496 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 496.5/496.5 KB 528.9 MB/s eta 0:00:00


Collecting simsimd>=5.9.2
  Downloading simsimd-6.5.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 135.3 MB/s eta 0:00:00
Collecting typing-inspection>=0.4.0
  Downloading typing_inspection-0.4.1-py3-none-any.whl (14 kB)
Collecting annotated-types>=0.6.0
  Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)


Collecting pydantic-core==2.33.2
  Downloading pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 398.1 MB/s eta 0:00:00
Collecting lazy-loader>=0.4
  Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)
Collecting tifffile>=2022.8.12
  Downloading tifffile-2025.9.20-py3-none-any.whl (230 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 230.1/230.1 KB 368.6 MB/s eta 0:00:00
Collecting imageio!=2.35.0,>=2.33
  Downloading imageio-2.37.0-py3-none-any.whl (315 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.8/315.8 KB 411.7 MB/s eta 0:00:00


Collecting packaging>=21
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 368.1 MB/s eta 0:00:00


Collecting pillow>=10.1
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 239.1 MB/s eta 0:00:00
Collecting networkx>=3.0
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 506.4 MB/s eta 0:00:00
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 308.4/308.4 KB 482.1 MB/s eta 0:00:00


Collecting tqdm>=4.42.1
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 KB 447.9 MB/s eta 0:00:00
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.7/64.7 KB 398.4 MB/s eta 0:00:00
Collecting hf-xet<2.0.0,>=1.1.3
  Downloading hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 222.6 MB/s eta 0:00:00
Collecting fsspec>=2023.5.0
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 508.4 MB/s eta 0:00:00


Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 206.5 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 140.4 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 467.9 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 197.6 MB/s eta 0:00:00


Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 224.5 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 191.4 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 133.8 MB/s eta 0:00:00


Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 156.7 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 186.1 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 445.0 MB/s eta 0:00:00
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 468.1 MB/s eta 0:00:00
Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 257.0 MB/s eta 0:00:00
Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 221.6 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 206.1 MB/s eta 0:00:00
Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 148.4 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-3.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB)
Collecting idna<4,>=2.5
  Downloading idna-3.10-py3-none-any.whl (70 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 KB 438.8 MB/s eta 0:00:00
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.8/129.8 KB 485.0 MB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Downloading certifi-2025.8.3-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 472.4 MB/s eta 0:00:00


Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (150 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.3/150.3 KB 162.5 MB/s eta 0:00:00
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 486.7 MB/s eta 0:00:00


Installing collected packages: simsimd, mpmath, urllib3, typing-extensions, tqdm, threadpoolctl, sympy, stringzilla, safetensors, pyyaml, pillow, packaging, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, joblib, idna, hf-xet, fsspec, filelock, charset_normalizer, certifi, annotated-types, typing-inspection, triton, tifffile, scipy, requests, pydantic-core, opencv-python-headless, nvidia-cusparse-cu12, nvidia-cudnn-cu12, lazy-loader, jinja2, imageio, scikit-learn, scikit-image, pydantic, nvidia-cusolver-cu12, huggingface_hub, albucore, torch, albumentations, torchvision, timm


Successfully installed MarkupSafe-3.0.2 albucore-0.0.33 albumentations-1.4.10 annotated-types-0.7.0 certifi-2025.8.3 charset_normalizer-3.4.3 filelock-3.19.1 fsspec-2025.9.0 hf-xet-1.1.10 huggingface_hub-0.35.1 idna-3.10 imageio-2.37.0 jinja2-3.1.6 joblib-1.5.2 lazy-loader-0.4 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 opencv-python-headless-4.10.0.84 packaging-25.0 pillow-11.3.0 pydantic-2.11.9 pydantic-core-2.33.2 pyyaml-6.0.3 requests-2.32.5 safetensors-0.6.2 scikit-image-0.25.2 scikit-learn-1.7.2 scipy-1.16.2 simsimd-6.5.3 stringzilla-4.0.14 sympy-1.14.0 threadpoolctl-3.6.0 tifffile-2025.9.20 timm-1.0.9 torch-2.4.1 torchvision-0.19.1 tqd







torch: 2.4.1+cu121 cuda build: 12.1 CUDA available: True
GPU: NVIDIA A10-24Q


In [7]:
# Dummy-ready CNN pipeline scaffold (hot-swap to real images when available)
import os, json, math, random, time, gc
from pathlib import Path
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
import torchvision.transforms as T
from PIL import Image
import timm

base = Path('.')
train_dir = base/'train'
test_dir = base/'test'

# Build train/test DataFrames
train_meta = json.load(open(train_dir/'metadata.json','r'))
test_meta = json.load(open(test_dir/'metadata.json','r'))
df_img = pd.DataFrame(train_meta['images'])[['id','file_name','width','height']]
df_anno = pd.DataFrame(train_meta['annotations'])[['image_id','category_id']]
df_cat = pd.DataFrame(train_meta['categories'])[['id','name','family','order']]
df_train = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')
df_train['path'] = df_train['file_name'].apply(lambda s: str(train_dir / s))
df_test = pd.DataFrame(test_meta['images'])[['id','file_name','width','height']].copy()
df_test['path'] = df_test['file_name'].apply(lambda s: str(test_dir / s))

# Label encoding
unique_cids = sorted(df_train['category_id'].unique())
cid2idx = {c:i for i,c in enumerate(unique_cids)}
idx2cid = np.array(unique_cids, dtype=np.int64)
df_train['label'] = df_train['category_id'].map(cid2idx).astype(np.int64)
num_classes = len(unique_cids)
print('Train rows:', len(df_train), 'Num classes:', num_classes)

def count_jpgs():
    # quick diagnostics for image availability
    import subprocess
    cmd = "bash -lc 'shopt -s nullglob; arr=(train/images/*/*/*.jpg); echo ${#arr[@]}; arr=(test/images/*/*.jpg); echo ${#arr[@]}'"
    out = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True).stdout.strip().splitlines()
    if len(out)>=2:
        print(f'jpg counts \u2192 train: {out[0]}, test: {out[1]}')
    else:
        print('jpg count check unavailable')
count_jpgs()

class HerbariumDataset(Dataset):
    def __init__(self, df, mode='train', img_size=384):
        self.df = df.reset_index(drop=True)
        self.mode = mode
        self.img_size = img_size
        self.tf_train = T.Compose([
            T.RandomResizedCrop(img_size, scale=(0.7,1.0), ratio=(0.75,1.33)),
            T.RandomHorizontalFlip(p=0.5),
            T.ToTensor(),
            T.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
        ])
        self.tf_test = T.Compose([
            T.Resize(img_size, interpolation=T.InterpolationMode.BICUBIC),
            T.CenterCrop(img_size),
            T.ToTensor(),
            T.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
        ])
    def __len__(self):
        return len(self.df)
    def _load_image(self, path):
        try:
            if os.path.exists(path):
                with Image.open(path) as im:
                    return im.convert('RGB')
        except Exception as e:
            pass
        # Fallback dummy if file missing/unreadable
        return Image.new('RGB', (self.img_size, self.img_size), (0,0,0))
    def __getitem__(self, idx):
        r = self.df.iloc[idx]
        img = self._load_image(r['path'])
        if self.mode == 'train':
            img = self.tf_train(img)
            label = int(r['label'])
            return img, label
        else:
            img = self.tf_test(img)
            return img, str(r['id'])

def make_model(backbone='convnext_tiny', num_classes=1):
        model = timm.create_model(backbone, pretrained=True, num_classes=num_classes)
        model = model.to('cuda' if torch.cuda.is_available() else 'cpu')
        return model

def make_sampler(labels, power=0.5):
    # inverse sqrt frequency weights by default
    vals, counts = np.unique(labels, return_counts=True)
    freq = np.zeros(labels.max()+1, dtype=np.float64)
    freq[vals] = counts
    w = 1.0 / np.clip(freq, 1, None)**power
    weights = w[labels]
    return WeightedRandomSampler(weights=torch.as_tensor(weights, dtype=torch.float32), num_samples=len(labels), replacement=True)

def train_smoke(df_train, epochs=1, batch_size=32, img_size=384, max_rows=4096):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    torch.backends.cudnn.benchmark = True
    df_small = df_train.sample(n=min(max_rows, len(df_train)), random_state=42) if len(df_train) > max_rows else df_train.copy()
    ds = HerbariumDataset(df_small, mode='train', img_size=img_size)
    sampler = make_sampler(df_small['label'].values)
    dl = DataLoader(ds, batch_size=batch_size, sampler=sampler, num_workers=2, pin_memory=True)
    model = make_model('convnext_tiny', num_classes=num_classes)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.02)
    scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
    model.train()
    t0 = time.time()
    for ep in range(epochs):
        running = 0.0
        n = 0
        for bi, (x,y) in enumerate(dl):
            x = x.to(device, non_blocking=True)
            y = torch.as_tensor(y, device=device)
            optimizer.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
                logits = model(x)
                loss = criterion(logits, y)
            scaler.scale(loss).step(optimizer)
            scaler.update()
            running += loss.item()*x.size(0)
            n += x.size(0)
            if (bi+1) % 50 == 0:
                elapsed = time.time()-t0
                print(f'Epoch {ep} Batch {bi+1}/{len(dl)} loss {running/max(n,1):.4f} elapsed {elapsed:.1f}s', flush=True)
        print(f'Epoch {ep} done. Avg loss {running/max(n,1):.4f}')
    return model

def infer_test(model, df_test, batch_size=64, img_size=384):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    torch.backends.cudnn.benchmark = True
    ds = HerbariumDataset(df_test, mode='test', img_size=img_size)
    dl = DataLoader(ds, batch_size=batch_size, shuffle=False, num_workers=2, pin_memory=True)
    model.eval()
    preds = []
    ids = []
    t0 = time.time()
    with torch.no_grad():
        for bi, (x, id_batch) in enumerate(dl):
            x = x.to(device, non_blocking=True)
            with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
                logits = model(x)
            pred_idx = torch.argmax(logits, dim=1).detach().cpu().numpy()
            preds.append(pred_idx)
            ids.extend(list(id_batch))
            if (bi+1) % 50 == 0:
                print(f'Infer batch {bi+1}/{len(dl)} elapsed {time.time()-t0:.1f}s')
    preds = np.concatenate(preds) if preds else np.array([], dtype=np.int64)
    pred_cids = idx2cid[preds] if len(preds)>0 else np.array([], dtype=np.int64)
    sub = pd.read_csv(base/'sample_submission.csv')
    id2pred = dict(zip(ids, pred_cids.tolist()))
    sub['Predicted'] = sub['Id'].astype(str).map(id2pred).fillna(int(df_train['category_id'].mode().iloc[0])).astype(int)
    return sub

print('CNN scaffold ready (PIL + torchvision transforms). Functions: train_smoke(), infer_test(). No training executed by default.')

  from .autonotebook import tqdm as notebook_tqdm


Train rows: 1779953 Num classes: 64500


jpg counts → train: 1779953, test: 477806
CNN scaffold ready (torchvision transforms). Functions: train_smoke(), infer_test(). No training executed by default.


In [8]:
# Smoke-run CNN scaffold (no training) and write submission_dummy.csv (small test subset)
print('Starting CNN smoke run (no training) ...')
model = train_smoke(df_train, epochs=0, batch_size=8, img_size=224, max_rows=64)

# Use a tiny subset of test to avoid heavy inference; fill rest with global mode
df_te_small = df_test.sample(n=min(256, len(df_test)), random_state=42).copy()
sub_dummy = infer_test(model, df_te_small, batch_size=64, img_size=224)
sub_dummy.to_csv('submission_dummy.csv', index=False)
print('Wrote submission_dummy.csv (from small test subset + global mode fallback). Head:')
print(sub_dummy.head())
print('Done.')

Starting CNN smoke run (no training) ...


INFO:timm.models._builder:Loading pretrained weights from Hugging Face hub (timm/convnext_tiny.in12k_ft_in1k)


INFO:timm.models._hub:[timm/convnext_tiny.in12k_ft_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.


INFO:timm.models._builder:Missing keys (head.fc.weight, head.fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.


  scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())


  with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):


Wrote submission_dummy.csv (from small test subset + global mode fallback). Head:
   Id  Predicted
0   0      42811
1   1      42811
2   2      42811
3   3      42811
4   4      42811
Done.


In [9]:
# Smoothed metadata baseline: min-count gating on fallback chain
import json, pandas as pd, numpy as np
from pathlib import Path

base = Path('.')
train_meta = json.load(open(base/'train'/'metadata.json','r'))
test_meta = json.load(open(base/'test'/'metadata.json','r'))
ss = pd.read_csv(base/'sample_submission.csv')

df_img = pd.DataFrame(train_meta['images'])[['file_name','width','height','id']]
df_anno = pd.DataFrame(train_meta['annotations'])[['image_id','category_id']]
df_tr = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')

def parse_F0_F1(s):
    parts = s.split('/')
    f0 = parts[1] if len(parts) > 1 else ''
    f1 = parts[2] if len(parts) > 2 else ''
    return f0, f1

F0_F1 = df_tr['file_name'].map(parse_F0_F1)
df_tr['F0'] = [t[0] for t in F0_F1]
df_tr['F1'] = [t[1] for t in F0_F1]
df_tr['aspect'] = (df_tr['width'] / df_tr['height']).astype(float)
width_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
height_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
aspect_bins = [0, 0.6, 0.75, 0.9, 1.0, 1.1, 1.3, 2.5]
df_tr['wb'] = pd.cut(df_tr['width'], bins=width_bins, include_lowest=True).astype(str)
df_tr['hb'] = pd.cut(df_tr['height'], bins=height_bins, include_lowest=True).astype(str)
df_tr['ab'] = pd.cut(df_tr['aspect'], bins=aspect_bins, include_lowest=True).astype(str)

global_mode = int(df_tr['category_id'].mode().iloc[0])
print('Global mode:', global_mode)

def mode_map_thresh(df, keys, min_count=10):
    grp = df.groupby(keys)['category_id']
    counts = grp.size().rename('n')
    modal = grp.agg(lambda x: x.value_counts().idxmax())
    m = pd.concat([modal, counts], axis=1).reset_index()
    m = m[m['n'] >= min_count]
    # build dict: key_tuple -> modal category
    keycols = keys.copy()
    m['key'] = list(map(tuple, m[keycols].values.tolist()))
    return dict(zip(m['key'], m['category_id']))

min_n = 10
m_F0_F1_wb_hb = mode_map_thresh(df_tr, ['F0','F1','wb','hb'], min_n)
m_F0_F1_ab   = mode_map_thresh(df_tr, ['F0','F1','ab'], min_n)
m_F0_wb_hb   = mode_map_thresh(df_tr, ['F0','wb','hb'], min_n)
m_F0_ab      = mode_map_thresh(df_tr, ['F0','ab'], min_n)
m_F0_F1      = mode_map_thresh(df_tr, ['F0','F1'], min_n)
m_F0_wb      = mode_map_thresh(df_tr, ['F0','wb'], min_n)
m_F0_hb      = mode_map_thresh(df_tr, ['F0','hb'], min_n)
m_F0         = mode_map_thresh(df_tr, ['F0'], min_n)

df_te = pd.DataFrame(test_meta['images'])[['id','file_name','width','height']].copy()
F0_F1_te = df_te['file_name'].map(parse_F0_F1)
df_te['F0'] = [t[0] for t in F0_F1_te]
df_te['F1'] = [t[1] for t in F0_F1_te]
df_te['aspect'] = (df_te['width'] / df_te['height']).astype(float)
df_te['wb'] = pd.cut(df_te['width'], bins=width_bins, include_lowest=True).astype(str)
df_te['hb'] = pd.cut(df_te['height'], bins=height_bins, include_lowest=True).astype(str)
df_te['ab'] = pd.cut(df_te['aspect'], bins=aspect_bins, include_lowest=True).astype(str)

levels = [
    ('F0_F1_wb_hb', lambda r: m_F0_F1_wb_hb.get((r['F0'], r['F1'], r['wb'], r['hb']))),
    ('F0_F1_ab',    lambda r: m_F0_F1_ab.get((r['F0'], r['F1'], r['ab']))),
    ('F0_wb_hb',    lambda r: m_F0_wb_hb.get((r['F0'], r['wb'], r['hb']))),
    ('F0_ab',       lambda r: m_F0_ab.get((r['F0'], r['ab']))),
    ('F0_F1',       lambda r: m_F0_F1.get((r['F0'], r['F1']))),
    ('F0_wb',       lambda r: m_F0_wb.get((r['F0'], r['wb']))),
    ('F0_hb',       lambda r: m_F0_hb.get((r['F0'], r['hb']))),
    ('F0',          lambda r: m_F0.get((r['F0'],)))
]

pred = np.full(len(df_te), global_mode, dtype=np.int64)
covered = np.zeros(len(df_te), dtype=bool)

for name, fn in levels:
    if covered.all():
        break
    idx = np.where(~covered)[0]
    vals = [fn(df_te.iloc[i]) for i in idx]
    take = [v is not None for v in vals]
    if any(take):
        pred_idx = np.array(idx)[np.array(take)]
        pred_vals = np.array([int(v) for v in np.array(vals, dtype=object)[np.array(take)]])
        pred[pred_idx] = pred_vals
        covered[pred_idx] = True
    print(f'Level {name}: covered {covered.mean():.3f}')

sub = ss.copy()
id2pred = dict(zip(df_te['id'].astype(str), pred.tolist()))
sub['Predicted'] = sub['Id'].astype(str).map(id2pred).fillna(global_mode).astype(int)
sub.to_csv('submission.csv', index=False)
sub.to_csv('submission_f0_smoothed.csv', index=False)
print('Wrote submission.csv and submission_f0_smoothed.csv. Head:')
print(sub.head())

Global mode: 42811


Level F0_F1_wb_hb: covered 0.000


Level F0_F1_ab: covered 0.000


Level F0_wb_hb: covered 0.990
Level F0_ab: covered 0.991


Level F0_F1: covered 0.991
Level F0_wb: covered 0.991


Level F0_hb: covered 1.000
Level F0: covered 1.000


Wrote submission.csv and submission_f0_smoothed.csv. Head:
   Id  Predicted
0   0         23
1   1         23
2   2         23
3   3         23
4   4         23


In [10]:
# Per-F0 Naive Bayes baseline with Laplace smoothing and top-K classes
import json, pandas as pd, numpy as np, time
from pathlib import Path

base = Path('.')
train_meta = json.load(open(base/'train'/'metadata.json','r'))
test_meta = json.load(open(base/'test'/'metadata.json','r'))
ss = pd.read_csv(base/'sample_submission.csv')

# Prepare train features
df_img = pd.DataFrame(train_meta['images'])[['file_name','width','height','id']]
df_anno = pd.DataFrame(train_meta['annotations'])[['image_id','category_id']]
df_tr = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')

def parse_F0_F1(s):
    parts = s.split('/')
    f0 = parts[1] if len(parts) > 1 else ''
    f1 = parts[2] if len(parts) > 2 else ''
    return f0, f1

F0_F1 = df_tr['file_name'].map(parse_F0_F1)
df_tr['F0'] = [t[0] for t in F0_F1]
df_tr['F1'] = [t[1] for t in F0_F1]
df_tr['aspect'] = (df_tr['width'] / df_tr['height']).astype(float)
width_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
height_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
aspect_bins = [0, 0.6, 0.75, 0.9, 1.0, 1.1, 1.3, 2.5]
df_tr['wb'] = pd.cut(df_tr['width'], bins=width_bins, include_lowest=True).astype(str)
df_tr['hb'] = pd.cut(df_tr['height'], bins=height_bins, include_lowest=True).astype(str)
df_tr['ab'] = pd.cut(df_tr['aspect'], bins=aspect_bins, include_lowest=True).astype(str)

# Fixed bin label universes (consistent mapping train/test)
wb_labels = pd.cut(pd.Series(width_bins[:-1]) + 1e-6, bins=width_bins, include_lowest=True).astype(str).unique().tolist()
hb_labels = pd.cut(pd.Series(height_bins[:-1]) + 1e-6, bins=height_bins, include_lowest=True).astype(str).unique().tolist()
ab_labels = pd.cut(pd.Series(aspect_bins[:-1]) + 1e-6, bins=aspect_bins, include_lowest=True).astype(str).unique().tolist()
wb2i = {l:i for i,l in enumerate(wb_labels)}
hb2i = {l:i for i,l in enumerate(hb_labels)}
ab2i = {l:i for i,l in enumerate(ab_labels)}
Bwb, Bhb, Bab = len(wb2i), len(hb2i), len(ab2i)

global_mode = int(df_tr['category_id'].mode().iloc[0])
print('Global mode:', global_mode)

# Build per-F0 NB parameters
alpha = 1.0
topK = 200
models = {}  # F0 -> dict with classes, log_prior, logPwb[Bwb,K], logPhb[Bhb,K], logPab[Bab,K]
t0 = time.time()
for f0, g in df_tr.groupby('F0', sort=False):
    cls_counts = g['category_id'].value_counts()
    classes = cls_counts.head(topK).index.values.astype(np.int64)
    counts = cls_counts.head(topK).values.astype(np.float64)
    C = len(classes)
    if C == 0:
        continue
    prior = (counts + alpha) / (counts.sum() + alpha * C)
    # Initialize count matrices with alpha for smoothing
    Cwb = np.full((Bwb, C), alpha, dtype=np.float64)
    Chb = np.full((Bhb, C), alpha, dtype=np.float64)
    Cab = np.full((Bab, C), alpha, dtype=np.float64)
    # Precompute bin indices for group
    gi_wb = g['wb'].map(wb2i).fillna(-1).astype(int).values
    gi_hb = g['hb'].map(hb2i).fillna(-1).astype(int).values
    gi_ab = g['ab'].map(ab2i).fillna(-1).astype(int).values
    gi_cls = g['category_id'].values
    # For each candidate class, accumulate bin counts
    cls2pos = {c:i for i,c in enumerate(classes)}
    for idx_row in range(len(g)):
        c = gi_cls[idx_row]
        j = cls2pos.get(c, None)
        if j is None:
            continue
        iw, ih, ia = gi_wb[idx_row], gi_hb[idx_row], gi_ab[idx_row]
        if iw >= 0: Cwb[iw, j] += 1.0
        if ih >= 0: Chb[ih, j] += 1.0
        if ia >= 0: Cab[ia, j] += 1.0
    # Normalize to probabilities and take logs
    log_prior = np.log(prior + 1e-12)
    logPwb = np.log(Cwb / Cwb.sum(axis=0, keepdims=True))
    logPhb = np.log(Chb / Chb.sum(axis=0, keepdims=True))
    logPab = np.log(Cab / Cab.sum(axis=0, keepdims=True))
    models[f0] = dict(classes=classes, log_prior=log_prior, logPwb=logPwb, logPhb=logPhb, logPab=logPab)
    if len(models) % 50 == 0:
        print(f'Built NB for {len(models)} F0 shards, elapsed {time.time()-t0:.1f}s', flush=True)
print(f'Total F0 shards modeled: {len(models)} in {time.time()-t0:.1f}s')

# Prepare test features
df_te = pd.DataFrame(test_meta['images'])[['id','file_name','width','height']].copy()
F0_F1_te = df_te['file_name'].map(parse_F0_F1)
df_te['F0'] = [t[0] for t in F0_F1_te]
df_te['aspect'] = (df_te['width'] / df_te['height']).astype(float)
df_te['wb'] = pd.cut(df_te['width'], bins=width_bins, include_lowest=True).astype(str)
df_te['hb'] = pd.cut(df_te['height'], bins=height_bins, include_lowest=True).astype(str)
df_te['ab'] = pd.cut(df_te['aspect'], bins=aspect_bins, include_lowest=True).astype(str)

# Inference per F0
preds = np.full(len(df_te), global_mode, dtype=np.int64)
t1 = time.time()
for f0, g in df_te.groupby('F0', sort=False):
    idx = g.index.values
    m = models.get(f0, None)
    if m is None:
        continue
    classes = m['classes']  # (K,)
    K = classes.shape[0]
    logp = np.tile(m['log_prior'][None, :], (len(g), 1))  # (n,K)
    iw = g['wb'].map(wb2i).fillna(-1).astype(int).values
    ih = g['hb'].map(hb2i).fillna(-1).astype(int).values
    ia = g['ab'].map(ab2i).fillna(-1).astype(int).values
    # Add likelihood terms where bins are known
    sel = iw >= 0
    if sel.any():
        logp[sel] += m['logPwb'][iw[sel], :]
    sel = ih >= 0
    if sel.any():
        logp[sel] += m['logPhb'][ih[sel], :]
    sel = ia >= 0
    if sel.any():
        logp[sel] += 0.5 * m['logPab'][ia[sel], :]
    jj = np.argmax(logp, axis=1)
    preds[idx] = classes[jj]
    if len(models) >= 1 and (len(idx) >= 20000):
        print(f'F0 {f0}: predicted {len(idx)} rows', flush=True)
print(f'Inference done in {time.time()-t1:.1f}s')

# Build submission
sub = ss.copy()
id2pred = dict(zip(df_te['id'].astype(str), preds.tolist()))
sub['Predicted'] = sub['Id'].astype(str).map(id2pred).fillna(global_mode).astype(int)
sub.to_csv('submission.csv', index=False)
sub.to_csv('submission_nb_f0.csv', index=False)
print('Wrote submission.csv and submission_nb_f0.csv. Head:')
print(sub.head())

Global mode: 42811


Built NB for 50 F0 shards, elapsed 0.6s


Built NB for 100 F0 shards, elapsed 0.9s


Built NB for 150 F0 shards, elapsed 1.1s


Built NB for 200 F0 shards, elapsed 1.4s


Built NB for 250 F0 shards, elapsed 1.7s


Built NB for 300 F0 shards, elapsed 1.9s


Built NB for 350 F0 shards, elapsed 2.1s


Built NB for 400 F0 shards, elapsed 2.4s


Built NB for 450 F0 shards, elapsed 2.6s


Built NB for 500 F0 shards, elapsed 2.8s


Built NB for 550 F0 shards, elapsed 3.0s


Built NB for 600 F0 shards, elapsed 3.2s


Total F0 shards modeled: 645 in 3.4s


Inference done in 0.7s


Wrote submission.csv and submission_nb_f0.csv. Head:
   Id  Predicted
0   0         23
1   1         23
2   2         23
3   3         23
4   4         23


In [11]:
# Tuned per-F0 Naive Bayes (topK=500, prior alpha=0.5, likelihood alpha=1.5, wb=1.0, hb=1.0, ab=0.3)
import json, pandas as pd, numpy as np, time
from pathlib import Path

base = Path('.')
train_meta = json.load(open(base/'train'/'metadata.json','r'))
test_meta = json.load(open(base/'test'/'metadata.json','r'))
ss = pd.read_csv(base/'sample_submission.csv')

df_img = pd.DataFrame(train_meta['images'])[['file_name','width','height','id']]
df_anno = pd.DataFrame(train_meta['annotations'])[['image_id','category_id']]
df_tr = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')

def parse_F0_F1(s):
    parts = s.split('/')
    f0 = parts[1] if len(parts) > 1 else ''
    f1 = parts[2] if len(parts) > 2 else ''
    return f0, f1

F0_F1 = df_tr['file_name'].map(parse_F0_F1)
df_tr['F0'] = [t[0] for t in F0_F1]
df_tr['aspect'] = (df_tr['width'] / df_tr['height']).astype(float)
width_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
height_bins = [0, 600, 700, 800, 900, 1000, 1100, 2000]
aspect_bins = [0, 0.6, 0.75, 0.9, 1.0, 1.1, 1.3, 2.5]
df_tr['wb'] = pd.cut(df_tr['width'], bins=width_bins, include_lowest=True).astype(str)
df_tr['hb'] = pd.cut(df_tr['height'], bins=height_bins, include_lowest=True).astype(str)
df_tr['ab'] = pd.cut(df_tr['aspect'], bins=aspect_bins, include_lowest=True).astype(str)

wb_labels = pd.cut(pd.Series(width_bins[:-1]) + 1e-6, bins=width_bins, include_lowest=True).astype(str).unique().tolist()
hb_labels = pd.cut(pd.Series(height_bins[:-1]) + 1e-6, bins=height_bins, include_lowest=True).astype(str).unique().tolist()
ab_labels = pd.cut(pd.Series(aspect_bins[:-1]) + 1e-6, bins=aspect_bins, include_lowest=True).astype(str).unique().tolist()
wb2i = {l:i for i,l in enumerate(wb_labels)}
hb2i = {l:i for i,l in enumerate(hb_labels)}
ab2i = {l:i for i,l in enumerate(ab_labels)}
Bwb, Bhb, Bab = len(wb2i), len(hb2i), len(ab2i)

global_mode = int(df_tr['category_id'].mode().iloc[0])
print('Global mode:', global_mode)

alpha_prior = 0.5
alpha_like = 1.5
topK = 500
w_wb, w_hb, w_ab = 1.0, 1.0, 0.3

models = {}
t0 = time.time()
for f0, g in df_tr.groupby('F0', sort=False):
    cls_counts = g['category_id'].value_counts()
    classes = cls_counts.head(topK).index.values.astype(np.int64)
    counts = cls_counts.head(topK).values.astype(np.float64)
    C = len(classes)
    if C == 0:
        continue
    prior = (counts + alpha_prior) / (counts.sum() + alpha_prior * C)
    Cwb = np.full((Bwb, C), alpha_like, dtype=np.float64)
    Chb = np.full((Bhb, C), alpha_like, dtype=np.float64)
    Cab = np.full((Bab, C), alpha_like, dtype=np.float64)
    gi_wb = g['wb'].map(wb2i).fillna(-1).astype(int).values
    gi_hb = g['hb'].map(hb2i).fillna(-1).astype(int).values
    gi_ab = g['ab'].map(ab2i).fillna(-1).astype(int).values
    gi_cls = g['category_id'].values
    cls2pos = {c:i for i,c in enumerate(classes)}
    for idx_row in range(len(g)):
        j = cls2pos.get(gi_cls[idx_row], None)
        if j is None:
            continue
        iw, ih, ia = gi_wb[idx_row], gi_hb[idx_row], gi_ab[idx_row]
        if iw >= 0: Cwb[iw, j] += 1.0
        if ih >= 0: Chb[ih, j] += 1.0
        if ia >= 0: Cab[ia, j] += 1.0
    log_prior = np.log(prior + 1e-12)
    logPwb = np.log(Cwb / Cwb.sum(axis=0, keepdims=True))
    logPhb = np.log(Chb / Chb.sum(axis=0, keepdims=True))
    logPab = np.log(Cab / Cab.sum(axis=0, keepdims=True))
    models[f0] = dict(classes=classes, log_prior=log_prior, logPwb=logPwb, logPhb=logPhb, logPab=logPab)
    if len(models) % 50 == 0:
        print(f'Built tuned NB for {len(models)} F0 shards, elapsed {time.time()-t0:.1f}s', flush=True)
print(f'Total F0 shards modeled (tuned): {len(models)} in {time.time()-t0:.1f}s')

df_te = pd.DataFrame(test_meta['images'])[['id','file_name','width','height']].copy()
F0_F1_te = df_te['file_name'].map(parse_F0_F1)
df_te['F0'] = [t[0] for t in F0_F1_te]
df_te['aspect'] = (df_te['width'] / df_te['height']).astype(float)
df_te['wb'] = pd.cut(df_te['width'], bins=width_bins, include_lowest=True).astype(str)
df_te['hb'] = pd.cut(df_te['height'], bins=height_bins, include_lowest=True).astype(str)
df_te['ab'] = pd.cut(df_te['aspect'], bins=aspect_bins, include_lowest=True).astype(str)

preds = np.full(len(df_te), global_mode, dtype=np.int64)
t1 = time.time()
for f0, g in df_te.groupby('F0', sort=False):
    idx = g.index.values
    m = models.get(f0, None)
    if m is None:
        continue
    classes = m['classes']
    K = classes.shape[0]
    logp = np.tile(m['log_prior'][None, :], (len(g), 1))
    iw = g['wb'].map(wb2i).fillna(-1).astype(int).values
    ih = g['hb'].map(hb2i).fillna(-1).astype(int).values
    ia = g['ab'].map(ab2i).fillna(-1).astype(int).values
    sel = iw >= 0
    if sel.any():
        logp[sel] += w_wb * m['logPwb'][iw[sel], :]
    sel = ih >= 0
    if sel.any():
        logp[sel] += w_hb * m['logPhb'][ih[sel], :]
    sel = ia >= 0
    if sel.any():
        logp[sel] += w_ab * m['logPab'][ia[sel], :]
    jj = np.argmax(logp, axis=1)
    preds[idx] = classes[jj]
print(f'Tuned NB inference done in {time.time()-t1:.1f}s')

sub = ss.copy()
id2pred = dict(zip(df_te['id'].astype(str), preds.tolist()))
sub['Predicted'] = sub['Id'].astype(str).map(id2pred).fillna(global_mode).astype(int)
sub.to_csv('submission.csv', index=False)
sub.to_csv('submission_nb_f0_tuned.csv', index=False)
print('Wrote submission.csv and submission_nb_f0_tuned.csv. Head:')
print(sub.head())

Global mode: 42811


Built tuned NB for 50 F0 shards, elapsed 0.6s


Built tuned NB for 100 F0 shards, elapsed 0.9s


Built tuned NB for 150 F0 shards, elapsed 1.1s


Built tuned NB for 200 F0 shards, elapsed 1.4s


Built tuned NB for 250 F0 shards, elapsed 1.7s


Built tuned NB for 300 F0 shards, elapsed 1.9s


Built tuned NB for 350 F0 shards, elapsed 2.1s


Built tuned NB for 400 F0 shards, elapsed 2.4s


Built tuned NB for 450 F0 shards, elapsed 2.6s


Built tuned NB for 500 F0 shards, elapsed 2.8s




Built tuned NB for 550 F0 shards, elapsed 3.0s


Built tuned NB for 600 F0 shards, elapsed 3.2s


Total F0 shards modeled (tuned): 645 in 3.4s


Tuned NB inference done in 0.7s


Wrote submission.csv and submission_nb_f0_tuned.csv. Head:
   Id  Predicted
0   0         23
1   1         23
2   2         23
3   3         23
4   4         23


In [None]:
# Blend NB(tuned) with smoothed fallback using per-F0 class prior as tie-break
import json, pandas as pd, numpy as np
from pathlib import Path

base = Path('.')
train_meta = json.load(open(base/'train'/'metadata.json','r'))
test_meta = json.load(open(base/'test'/'metadata.json','r'))

# Load candidate submissions
sub_nb = pd.read_csv('submission_nb_f0_tuned.csv') if Path('submission_nb_f0_tuned.csv').exists() else pd.read_csv('submission_nb_f0.csv')
sub_sm = pd.read_csv('submission_f0_smoothed.csv') if Path('submission_f0_smoothed.csv').exists() else pd.read_csv('submission_f0_ext.csv')

# Build per-F0 class prior from train
df_img = pd.DataFrame(train_meta['images'])[['file_name','width','height','id']]
df_anno = pd.DataFrame(train_meta['annotations'])[['image_id','category_id']]
df_tr = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')
def parse_F0(s):
    parts = s.split('/')
    return parts[1] if len(parts) > 1 else ''
df_tr['F0'] = df_tr['file_name'].map(parse_F0)
prior_f0 = df_tr.groupby(['F0','category_id']).size().rename('cnt').reset_index()
prior_f0['key'] = list(zip(prior_f0['F0'], prior_f0['category_id']))
prior_map = prior_f0.set_index('key')['cnt'].to_dict()

# Test F0 per Id
df_te = pd.DataFrame(test_meta['images'])[['id','file_name']].copy()
df_te['F0'] = df_te['file_name'].map(parse_F0)
id2f0 = dict(zip(df_te['id'].astype(str), df_te['F0']))

# Align and blend
sub = sub_nb.merge(sub_sm, on='Id', how='left', suffixes=('_nb','_sm'))
sub['Predicted_sm'].fillna(sub['Predicted_nb'], inplace=True)
def choose_row(r):
    f0 = id2f0.get(str(r['Id']), '')
    c_nb = int(r['Predicted_nb'])
    c_sm = int(r['Predicted_sm'])
    if c_nb == c_sm:
        return c_nb
    cnt_nb = prior_map.get((f0, c_nb), 0)
    cnt_sm = prior_map.get((f0, c_sm), 0)
    # prefer higher prior within F0; tie -> NB
    return c_nb if cnt_nb >= cnt_sm else c_sm

sub['Predicted'] = sub.apply(choose_row, axis=1).astype(int)
out = sub[['Id','Predicted']].copy()
out.to_csv('submission.csv', index=False)
out.to_csv('submission_blend_nb_smoothed.csv', index=False)
print('Wrote submission.csv and submission_blend_nb_smoothed.csv. Head:')
print(out.head())

In [17]:
# Main CNN training run per expert plan (ConvNeXtV2-Base @384, steps-based, EMA, AMP, class-balanced sampler) - updates-based scheduler
import os, math, time, random, json, gc
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image
from sklearn.metrics import f1_score
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
import torchvision.transforms as T
import timm
from timm.utils import ModelEmaV2
from timm.data.mixup import Mixup
from timm.loss import SoftTargetCrossEntropy

base = Path('.')
train_dir = base/'train'
test_dir = base/'test'

# Reuse df_train, df_test, idx2cid, cid2idx, num_classes from earlier cell if present; otherwise, build them
try:
    df_train
    df_test
    idx2cid
    cid2idx
    num_classes
except NameError:
    train_meta = json.load(open(train_dir/'metadata.json','r'))
    test_meta = json.load(open(test_dir/'metadata.json','r'))
    df_img = pd.DataFrame(train_meta['images'])[['id','file_name','width','height']]
    df_anno = pd.DataFrame(train_meta['annotations'])[['image_id','category_id']]
    df_train = df_anno.merge(df_img, left_on='image_id', right_on='id', how='left')
    df_train['path'] = df_train['file_name'].apply(lambda s: str(train_dir / s))
    df_test = pd.DataFrame(test_meta['images'])[['id','file_name','width','height']].copy()
    df_test['path'] = df_test['file_name'].apply(lambda s: str(test_dir / s))
    unique_cids = sorted(df_train['category_id'].unique())
    cid2idx = {c:i for i,c in enumerate(unique_cids)}
    idx2cid = np.array(unique_cids, dtype=np.int64)
    df_train['label'] = df_train['category_id'].map(cid2idx).astype(np.int64)
    num_classes = len(unique_cids)
print('Train rows:', len(df_train), 'Num classes:', num_classes)

def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = False
    torch.backends.cudnn.benchmark = True
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

class HerbariumDataset(Dataset):
    def __init__(self, df, mode='train', img_size=384):
        self.df = df.reset_index(drop=True)
        self.mode = mode
        self.img_size = img_size
        self.tf_train = T.Compose([
            T.RandomResizedCrop(img_size, scale=(0.7,1.0), ratio=(0.75,1.33)),
            T.RandomHorizontalFlip(p=0.5),
            T.ColorJitter(0.1,0.1,0.1,0.05),
            T.RandomRotation(degrees=15),
            T.ToTensor(),
            T.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
        ])
        self.tf_val = T.Compose([
            T.Resize(img_size, interpolation=T.InterpolationMode.BICUBIC),
            T.CenterCrop(img_size),
            T.ToTensor(),
            T.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
        ])
    def __len__(self):
        return len(self.df)
    def _load_image(self, path):
        try:
            if os.path.exists(path):
                with Image.open(path) as im:
                    return im.convert('RGB')
        except Exception:
            pass
        return Image.new('RGB', (self.img_size, self.img_size), (0,0,0))
    def __getitem__(self, idx):
        r = self.df.iloc[idx]
        img = self._load_image(r['path'])
        if self.mode == 'train':
            img = self.tf_train(img)
            return img, int(r['label'])
        elif self.mode == 'val':
            img = self.tf_val(img)
            return img, int(r['label'])
        else:
            img = self.tf_val(img)
            return img, str(r['id'])

def make_sampler(labels, power=0.5):
    vals, counts = np.unique(labels, return_counts=True)
    max_label = int(labels.max()) if len(labels)>0 else 0
    freq = np.zeros(max_label+1, dtype=np.float64)
    freq[vals] = counts
    w = 1.0 / np.clip(freq, 1, None)**power
    weights = w[labels]
    return WeightedRandomSampler(weights=torch.as_tensor(weights, dtype=torch.float32), num_samples=len(labels), replacement=True)

def macro_f1_from_logits(logits, y_true):
    y_pred = logits.argmax(1)
    return f1_score(y_true, y_pred, average='macro', zero_division=0)

def top1_acc_from_logits(logits, y_true):
    with torch.no_grad():
        pred = logits.argmax(1)
        return (pred == y_true).float().mean().item()

def make_val_split_min1_train(df, val_frac=0.05, seed=42):
    rng = np.random.default_rng(seed)
    by_class = df.groupby('label').indices
    val_indices = []
    p = val_frac
    for lbl, idxs in by_class.items():
        idxs = np.array(list(idxs))
        n = idxs.size
        if n <= 1:
            continue
        if rng.random() < p:
            choice = int(rng.choice(idxs))
            val_indices.append(choice)
    target_val = int(len(df) * val_frac)
    if len(val_indices) < target_val:
        need = target_val - len(val_indices)
        candidates = []
        for lbl, idxs in by_class.items():
            idxs = np.array(list(idxs))
            if idxs.size >= 3:
                candidates.append(int(idxs[0]))
        if candidates:
            extra = rng.choice(candidates, size=min(need, len(candidates)), replace=False)
            val_indices.extend(list(map(int, extra)))
    val_set = set(val_indices)
    va_idx = np.array([i for i in range(len(df)) if i in val_set], dtype=np.int64)
    tr_idx = np.array([i for i in range(len(df)) if i not in val_set], dtype=np.int64)
    tr_labels = set(df.iloc[tr_idx]['label'].unique().tolist())
    for lbl, idxs in by_class.items():
        if lbl not in tr_labels:
            idxs = list(idxs)
            moved = False
            for j in idxs:
                if j in val_set:
                    val_set.remove(j)
                    moved = True
                    break
            if moved:
                tr_labels.add(lbl)
    va_idx = np.array(sorted(list(val_set)), dtype=np.int64)
    tr_idx = np.array([i for i in range(len(df)) if i not in val_set], dtype=np.int64)
    return tr_idx, va_idx

def freeze_backbone_unfreeze_head(model):
    for n, p in model.named_parameters():
        p.requires_grad = ('head' in n)

def unfreeze_all(model):
    for p in model.parameters():
        p.requires_grad = True

def train_main(
    backbone='convnextv2_base',
    img_size=384,
    batch_size=32,
    eff_batch=128,
    updates_total=8_000,
    warmup_updates=300,
    lr_base=3e-4,
    weight_decay=0.02,
    seed=42,
    mixup_alpha=0.2,
    cutmix_alpha=0.2,
    mix_prob=0.8,
    val_frac=0.05,
    ckpt_dir='ckpts_main',
    head_warmup_updates=600,
    lr_head=3e-3
):
    seed_everything(seed)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    os.makedirs(ckpt_dir, exist_ok=True)

    # Split
    tr_idx, va_idx = make_val_split_min1_train(df_train, val_frac=val_frac, seed=seed)
    dtr = df_train.iloc[tr_idx].reset_index(drop=True)
    dva = df_train.iloc[va_idx].reset_index(drop=True)
    print(f'Train/Val sizes: {len(dtr)}/{len(dva)} | classes in train: {dtr.label.nunique()}')

    ds_tr = HerbariumDataset(dtr, mode='train', img_size=img_size)
    ds_va = HerbariumDataset(dva, mode='val', img_size=img_size)
    sampler = make_sampler(dtr['label'].values, power=0.5)
    num_workers = 8
    dl_tr = DataLoader(ds_tr, batch_size=batch_size, sampler=sampler, pin_memory=True, num_workers=num_workers, persistent_workers=True, prefetch_factor=2)
    dl_va = DataLoader(ds_va, batch_size=max(64, batch_size), shuffle=False, pin_memory=True, num_workers=num_workers, persistent_workers=True, prefetch_factor=2)

    model = timm.create_model(backbone, pretrained=True, num_classes=num_classes)
    if hasattr(model, 'set_grad_checkpointing'):
        try:
            model.set_grad_checkpointing(True)
        except Exception:
            pass
    model.to(device)
    model.to(memory_format=torch.channels_last)

    ema = ModelEmaV2(model, decay=0.999, device=device)

    accum_steps = max(1, eff_batch // batch_size)
    scaled_lr = lr_base * ((batch_size * accum_steps) / 256.0)

    # Optimizers: start with head-only
    freeze_backbone_unfreeze_head(model)
    head_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(head_params, lr=lr_head, weight_decay=weight_decay)

    def cosine_lr(u):
        if u < warmup_updates:
            return (u + 1) / max(1, warmup_updates)
        t = (u - warmup_updates) / max(1, updates_total - warmup_updates)
        t = min(1.0, max(0.0, t))
        return 0.5 * (1 + math.cos(math.pi * t))

    mixup_fn = Mixup(mixup_alpha=mixup_alpha, cutmix_alpha=cutmix_alpha, prob=mix_prob, switch_prob=0.5, mode='batch', label_smoothing=0.1, num_classes=num_classes)
    criterion_soft = SoftTargetCrossEntropy()
    criterion_hard = nn.CrossEntropyLoss(label_smoothing=0.1)
    criterion_val = nn.CrossEntropyLoss(reduction='mean')

    scaler = torch.amp.GradScaler('cuda', enabled=torch.cuda.is_available())

    best_f1 = -1.0
    best_path = None
    micro_step = 0
    global_step = 0
    running_loss = 0.0
    samples_seen = 0
    last_train_top1 = 0.0
    t0 = time.time()

    model.train()
    while global_step < updates_total:
        for it, (x, y) in enumerate(dl_tr):
            if global_step >= updates_total:
                break

            # Unfreeze and switch optimizer after head warmup
            if global_step == head_warmup_updates:
                unfreeze_all(model)
                optimizer = torch.optim.AdamW(model.parameters(), lr=scaled_lr, weight_decay=weight_decay)

                # Reset EMA to keep tracking post-unfreeze weights smoothly
                # Note: EMA state continues; no reset of ema needed.

            x = x.to(device, non_blocking=True).to(memory_format=torch.channels_last)
            y = y.to(device, non_blocking=True)

            # LR policy: head phase uses fixed lr_head; after unfreeze use cosine schedule
            if global_step >= head_warmup_updates:
                cur_lr = scaled_lr * cosine_lr(global_step)
                for pg in optimizer.param_groups:
                    pg['lr'] = cur_lr
            else:
                for pg in optimizer.param_groups:
                    pg['lr'] = lr_head

            # Mixup OFF during head-only phase and during last ~1500 updates
            use_mix = (mixup_fn is not None) and (global_step >= head_warmup_updates) and (global_step < max(0, updates_total - 1_500))
            if use_mix:
                x, y_mix = mixup_fn(x, y)

            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                logits = model(x)
                loss = criterion_soft(logits, y_mix) if use_mix else criterion_hard(logits, y)

            # Train mini-batch top-1 (sanity); compute against hard labels
            last_train_top1 = top1_acc_from_logits(logits.detach(), y)

            loss = loss / accum_steps
            scaler.scale(loss).backward()
            micro_step += 1

            if (micro_step % accum_steps) == 0:
                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad(set_to_none=True)
                ema.update(model)
                global_step += 1

            running_loss += loss.item() * x.size(0) * accum_steps
            samples_seen += x.size(0)

            if (micro_step % 200) == 0:
                elapsed = time.time() - t0
                avg_loss = running_loss / max(1, samples_seen)
                cur_lr_print = optimizer.param_groups[0]['lr']
                print(f'update {global_step}/{updates_total} | micro {micro_step} | avg_loss {avg_loss:.4f} | lr {cur_lr_print:.2e} | train_top1 {last_train_top1*100:.2f}% | elapsed {elapsed/60:.1f}m', flush=True)

            # Validation cadence: every 500 updates until 2k, then every 1k
            need_val = False
            if global_step > 0 and global_step < 2000 and (global_step % 500 == 0):
                need_val = True
            elif global_step >= 2000 and (global_step % 1000 == 0):
                need_val = True
            if need_val or global_step >= updates_total:
                val_f1, val_top1, val_loss = evaluate(model, ema, dl_va, device, criterion_val)
                is_best = val_f1 > best_f1
                best_f1 = max(best_f1, val_f1)
                ckpt_path = Path(ckpt_dir)/f'model_upd{global_step}_f1{val_f1:.5f}.pt'
                save_checkpoint(model, ema, optimizer, global_step, best_f1, ckpt_path)
                if is_best:
                    best_path = ckpt_path
                print(f'Validation @update {global_step}: macro-F1={val_f1:.5f} | top1={val_top1*100:.2f}% | loss={val_loss:.4f} | best={best_f1:.5f} | saved={ckpt_path.name}', flush=True)

    print('Training done. Best ckpt:', best_path)
    return dict(best_ckpt=str(best_path) if best_path else None, best_f1=best_f1)

def evaluate(model, ema, dl_va, device, criterion_val):
    # Use EMA model directly, compute F1, top1, and CE loss
    ema_model = ema.module
    was_training = ema_model.training
    ema_model.eval()
    y_preds = []
    y_trues = []
    running_loss = 0.0
    n_items = 0
    with torch.no_grad():
        for x, y in dl_va:
            x = x.to(device, non_blocking=True).to(memory_format=torch.channels_last)
            y = y.to(device, non_blocking=True)
            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                logits = ema_model(x)
                loss = criterion_val(logits, y)
            running_loss += loss.item() * x.size(0)
            n_items += x.size(0)
            y_preds.append(torch.argmax(logits, dim=1).detach().cpu().numpy())
            y_trues.append(y.detach().cpu().numpy())
    y_pred = np.concatenate(y_preds) if y_preds else np.array([], dtype=np.int64)
    y_true = np.concatenate(y_trues) if y_trues else np.array([], dtype=np.int64)
    f1 = f1_score(y_true, y_pred, average='macro', zero_division=0) if len(y_true) else 0.0
    top1 = (y_pred == y_true).mean() if len(y_true) else 0.0
    val_loss = running_loss / max(1, n_items)
    if was_training:
        ema_model.train()
    return float(f1), float(top1), float(val_loss)

def save_checkpoint(model, ema, optimizer, step, best_f1, path):
    state = {
        'model': model.state_dict(),
        'ema': ema.state_dict(),
        'optimizer': optimizer.state_dict(),
        'step': step,
        'best_f1': best_f1,
    }
    torch.save(state, path)

print('Launching updates-based training with head-only warmup: convnextv2_base @384 | eff_batch=128 | updates_total=8k ...')
train_summary = train_main(
    backbone='convnextv2_base',
    img_size=384,
    batch_size=32,
    eff_batch=128,
    updates_total=8_000,
    warmup_updates=300,
    lr_base=3e-4,
    weight_decay=0.02,
    seed=42,
    mixup_alpha=0.2,
    cutmix_alpha=0.2,
    mix_prob=0.8,
    val_frac=0.05,
    ckpt_dir='ckpts_main',
    head_warmup_updates=600,
    lr_head=3e-3
)
print('Train summary:', train_summary)

Train rows: 1779953 Num classes: 64500
Launching updates-based training with head-only warmup: convnextv2_base @384 | eff_batch=128 | updates_total=8k ...


Train/Val sizes: 1726076/53877 | classes in train: 64500


INFO:timm.models._builder:Loading pretrained weights from Hugging Face hub (timm/convnextv2_base.fcmae_ft_in22k_in1k)


INFO:timm.models._hub:[timm/convnextv2_base.fcmae_ft_in22k_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.


INFO:timm.models._builder:Missing keys (head.fc.weight, head.fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.




update 50/8000 | micro 200 | avg_loss 12.9173 | lr 3.00e-03 | train_top1 0.00% | elapsed 0.8m


update 100/8000 | micro 400 | avg_loss 12.4533 | lr 3.00e-03 | train_top1 6.25% | elapsed 1.6m


update 150/8000 | micro 600 | avg_loss 12.0611 | lr 3.00e-03 | train_top1 0.00% | elapsed 2.4m


update 200/8000 | micro 800 | avg_loss 11.7077 | lr 3.00e-03 | train_top1 0.00% | elapsed 3.3m


update 250/8000 | micro 1000 | avg_loss 11.4349 | lr 3.00e-03 | train_top1 6.25% | elapsed 4.1m


update 300/8000 | micro 1200 | avg_loss 11.2187 | lr 3.00e-03 | train_top1 0.00% | elapsed 4.9m


update 350/8000 | micro 1400 | avg_loss 11.0438 | lr 3.00e-03 | train_top1 0.00% | elapsed 5.7m


update 400/8000 | micro 1600 | avg_loss 10.8962 | lr 3.00e-03 | train_top1 3.12% | elapsed 6.5m


update 450/8000 | micro 1800 | avg_loss 10.7662 | lr 3.00e-03 | train_top1 3.12% | elapsed 7.3m


update 500/8000 | micro 2000 | avg_loss 10.6565 | lr 3.00e-03 | train_top1 3.12% | elapsed 8.2m


  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 500: macro-F1=0.00132 | top1=0.48% | loss=10.9631 | best=0.00132 | saved=model_upd500_f10.00132.pt


  return fn(*args, **kwargs)


  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 500: macro-F1=0.00132 | top1=0.48% | loss=10.9631 | best=0.00132 | saved=model_upd500_f10.00132.pt


  return fn(*args, **kwargs)


  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 500: macro-F1=0.00132 | top1=0.48% | loss=10.9631 | best=0.00132 | saved=model_upd500_f10.00132.pt


  return fn(*args, **kwargs)


  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 500: macro-F1=0.00132 | top1=0.48% | loss=10.9631 | best=0.00132 | saved=model_upd500_f10.00132.pt


  return fn(*args, **kwargs)


update 550/8000 | micro 2200 | avg_loss 10.5535 | lr 3.00e-03 | train_top1 0.00% | elapsed 35.4m


update 600/8000 | micro 2400 | avg_loss 10.4588 | lr 3.00e-03 | train_top1 9.38% | elapsed 36.2m


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 650/8000 | micro 2600 | avg_loss 10.3896 | lr 1.49e-04 | train_top1 6.25% | elapsed 39.9m


update 700/8000 | micro 2800 | avg_loss 10.3086 | lr 1.49e-04 | train_top1 0.00% | elapsed 43.6m


update 750/8000 | micro 3000 | avg_loss 10.2387 | lr 1.49e-04 | train_top1 0.00% | elapsed 47.3m


update 800/8000 | micro 3200 | avg_loss 10.1705 | lr 1.48e-04 | train_top1 0.00% | elapsed 51.1m


update 850/8000 | micro 3400 | avg_loss 10.1133 | lr 1.48e-04 | train_top1 9.38% | elapsed 54.8m


update 900/8000 | micro 3600 | avg_loss 10.0528 | lr 1.48e-04 | train_top1 3.12% | elapsed 58.5m


update 950/8000 | micro 3800 | avg_loss 10.0004 | lr 1.47e-04 | train_top1 0.00% | elapsed 62.2m


update 1000/8000 | micro 4000 | avg_loss 9.9518 | lr 1.47e-04 | train_top1 6.25% | elapsed 65.9m


  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 1000: macro-F1=0.00612 | top1=1.66% | loss=10.1875 | best=0.00612 | saved=model_upd1000_f10.00612.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 1000: macro-F1=0.00612 | top1=1.66% | loss=10.1875 | best=0.00612 | saved=model_upd1000_f10.00612.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 1000: macro-F1=0.00612 | top1=1.66% | loss=10.1875 | best=0.00612 | saved=model_upd1000_f10.00612.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 1000: macro-F1=0.00612 | top1=1.66% | loss=10.1875 | best=0.00612 | saved=model_upd1000_f10.00612.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 1050/8000 | micro 4200 | avg_loss 9.9080 | lr 1.47e-04 | train_top1 9.38% | elapsed 96.2m


update 1100/8000 | micro 4400 | avg_loss 9.8630 | lr 1.46e-04 | train_top1 0.00% | elapsed 99.9m


update 1150/8000 | micro 4600 | avg_loss 9.8216 | lr 1.46e-04 | train_top1 12.50% | elapsed 103.6m


update 1200/8000 | micro 4800 | avg_loss 9.7828 | lr 1.45e-04 | train_top1 9.38% | elapsed 107.3m


update 1250/8000 | micro 5000 | avg_loss 9.7440 | lr 1.44e-04 | train_top1 3.12% | elapsed 111.0m


update 1300/8000 | micro 5200 | avg_loss 9.7068 | lr 1.44e-04 | train_top1 15.62% | elapsed 114.7m


update 1350/8000 | micro 5400 | avg_loss 9.6664 | lr 1.43e-04 | train_top1 9.38% | elapsed 118.4m


update 1400/8000 | micro 5600 | avg_loss 9.6344 | lr 1.43e-04 | train_top1 9.38% | elapsed 122.1m


update 1450/8000 | micro 5800 | avg_loss 9.5995 | lr 1.42e-04 | train_top1 0.00% | elapsed 125.8m


update 1500/8000 | micro 6000 | avg_loss 9.5683 | lr 1.41e-04 | train_top1 0.00% | elapsed 129.5m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 1500: macro-F1=0.01169 | top1=3.04% | loss=9.5558 | best=0.01169 | saved=model_upd1500_f10.01169.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 1500: macro-F1=0.01169 | top1=3.04% | loss=9.5558 | best=0.01169 | saved=model_upd1500_f10.01169.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 1500: macro-F1=0.01169 | top1=3.04% | loss=9.5558 | best=0.01169 | saved=model_upd1500_f10.01169.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 1500: macro-F1=0.01169 | top1=3.04% | loss=9.5558 | best=0.01169 | saved=model_upd1500_f10.01169.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 1550/8000 | micro 6200 | avg_loss 9.5353 | lr 1.40e-04 | train_top1 18.75% | elapsed 159.7m


update 1600/8000 | micro 6400 | avg_loss 9.5044 | lr 1.40e-04 | train_top1 0.00% | elapsed 163.4m


update 1650/8000 | micro 6600 | avg_loss 9.4724 | lr 1.39e-04 | train_top1 15.62% | elapsed 167.1m


update 1700/8000 | micro 6800 | avg_loss 9.4426 | lr 1.38e-04 | train_top1 0.00% | elapsed 170.9m


update 1750/8000 | micro 7000 | avg_loss 9.4134 | lr 1.37e-04 | train_top1 6.25% | elapsed 174.5m


update 1800/8000 | micro 7200 | avg_loss 9.3826 | lr 1.36e-04 | train_top1 0.00% | elapsed 178.3m


update 1850/8000 | micro 7400 | avg_loss 9.3544 | lr 1.36e-04 | train_top1 18.75% | elapsed 182.0m


update 1900/8000 | micro 7600 | avg_loss 9.3272 | lr 1.35e-04 | train_top1 9.38% | elapsed 185.7m


update 1950/8000 | micro 7800 | avg_loss 9.2960 | lr 1.34e-04 | train_top1 0.00% | elapsed 189.4m


update 2000/8000 | micro 8000 | avg_loss 9.2693 | lr 1.33e-04 | train_top1 9.38% | elapsed 193.1m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 2000: macro-F1=0.01775 | top1=4.53% | loss=9.1338 | best=0.01775 | saved=model_upd2000_f10.01775.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 2000: macro-F1=0.01775 | top1=4.53% | loss=9.1338 | best=0.01775 | saved=model_upd2000_f10.01775.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 2000: macro-F1=0.01775 | top1=4.53% | loss=9.1338 | best=0.01775 | saved=model_upd2000_f10.01775.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 2000: macro-F1=0.01775 | top1=4.53% | loss=9.1338 | best=0.01775 | saved=model_upd2000_f10.01775.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 2050/8000 | micro 8200 | avg_loss 9.2415 | lr 1.32e-04 | train_top1 3.12% | elapsed 223.3m


update 2100/8000 | micro 8400 | avg_loss 9.2144 | lr 1.31e-04 | train_top1 12.50% | elapsed 227.0m


update 2150/8000 | micro 8600 | avg_loss 9.1859 | lr 1.30e-04 | train_top1 15.62% | elapsed 230.7m


update 2200/8000 | micro 8800 | avg_loss 9.1601 | lr 1.29e-04 | train_top1 0.00% | elapsed 234.4m


update 2250/8000 | micro 9000 | avg_loss 9.1343 | lr 1.28e-04 | train_top1 15.62% | elapsed 238.1m


update 2300/8000 | micro 9200 | avg_loss 9.1094 | lr 1.26e-04 | train_top1 0.00% | elapsed 241.8m


update 2350/8000 | micro 9400 | avg_loss 9.0836 | lr 1.25e-04 | train_top1 25.00% | elapsed 245.5m


update 2400/8000 | micro 9600 | avg_loss 9.0590 | lr 1.24e-04 | train_top1 18.75% | elapsed 249.2m


update 2450/8000 | micro 9800 | avg_loss 9.0342 | lr 1.23e-04 | train_top1 25.00% | elapsed 252.9m


update 2500/8000 | micro 10000 | avg_loss 9.0075 | lr 1.22e-04 | train_top1 12.50% | elapsed 256.6m


update 2550/8000 | micro 10200 | avg_loss 8.9843 | lr 1.21e-04 | train_top1 6.25% | elapsed 260.3m


update 2600/8000 | micro 10400 | avg_loss 8.9610 | lr 1.19e-04 | train_top1 21.88% | elapsed 264.0m


update 2650/8000 | micro 10600 | avg_loss 8.9359 | lr 1.18e-04 | train_top1 15.62% | elapsed 267.7m


update 2700/8000 | micro 10800 | avg_loss 8.9130 | lr 1.17e-04 | train_top1 15.62% | elapsed 271.4m


update 2750/8000 | micro 11000 | avg_loss 8.8886 | lr 1.16e-04 | train_top1 3.12% | elapsed 275.1m


update 2800/8000 | micro 11200 | avg_loss 8.8640 | lr 1.14e-04 | train_top1 12.50% | elapsed 278.8m


update 2850/8000 | micro 11400 | avg_loss 8.8406 | lr 1.13e-04 | train_top1 18.75% | elapsed 282.5m


update 2900/8000 | micro 11600 | avg_loss 8.8161 | lr 1.12e-04 | train_top1 0.00% | elapsed 286.2m


update 2950/8000 | micro 11800 | avg_loss 8.7916 | lr 1.10e-04 | train_top1 12.50% | elapsed 289.9m


update 3000/8000 | micro 12000 | avg_loss 8.7681 | lr 1.09e-04 | train_top1 3.12% | elapsed 293.6m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 3000: macro-F1=0.03439 | top1=7.88% | loss=8.4459 | best=0.03439 | saved=model_upd3000_f10.03439.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 3000: macro-F1=0.03439 | top1=7.88% | loss=8.4459 | best=0.03439 | saved=model_upd3000_f10.03439.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 3000: macro-F1=0.03439 | top1=7.88% | loss=8.4459 | best=0.03439 | saved=model_upd3000_f10.03439.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 3000: macro-F1=0.03439 | top1=7.88% | loss=8.4459 | best=0.03439 | saved=model_upd3000_f10.03439.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 3050/8000 | micro 12200 | avg_loss 8.7439 | lr 1.08e-04 | train_top1 21.88% | elapsed 323.9m


update 3100/8000 | micro 12400 | avg_loss 8.7223 | lr 1.06e-04 | train_top1 9.38% | elapsed 327.6m


update 3150/8000 | micro 12600 | avg_loss 8.6994 | lr 1.05e-04 | train_top1 34.38% | elapsed 331.3m


update 3200/8000 | micro 12800 | avg_loss 8.6766 | lr 1.03e-04 | train_top1 25.00% | elapsed 335.0m


update 3250/8000 | micro 13000 | avg_loss 8.6538 | lr 1.02e-04 | train_top1 0.00% | elapsed 338.7m


update 3300/8000 | micro 13200 | avg_loss 8.6331 | lr 1.01e-04 | train_top1 0.00% | elapsed 342.4m


update 3350/8000 | micro 13400 | avg_loss 8.6123 | lr 9.91e-05 | train_top1 18.75% | elapsed 346.1m


update 3400/8000 | micro 13600 | avg_loss 8.5919 | lr 9.76e-05 | train_top1 21.88% | elapsed 349.8m


update 3450/8000 | micro 13800 | avg_loss 8.5698 | lr 9.62e-05 | train_top1 12.50% | elapsed 353.5m


update 3500/8000 | micro 14000 | avg_loss 8.5483 | lr 9.47e-05 | train_top1 0.00% | elapsed 357.2m


update 3550/8000 | micro 14200 | avg_loss 8.5275 | lr 9.32e-05 | train_top1 25.00% | elapsed 360.9m


update 3600/8000 | micro 14400 | avg_loss 8.5062 | lr 9.17e-05 | train_top1 34.38% | elapsed 364.6m


update 3650/8000 | micro 14600 | avg_loss 8.4857 | lr 9.02e-05 | train_top1 21.88% | elapsed 368.3m


update 3700/8000 | micro 14800 | avg_loss 8.4632 | lr 8.87e-05 | train_top1 18.75% | elapsed 372.0m


update 3750/8000 | micro 15000 | avg_loss 8.4435 | lr 8.72e-05 | train_top1 12.50% | elapsed 375.7m


update 3800/8000 | micro 15200 | avg_loss 8.4226 | lr 8.57e-05 | train_top1 25.00% | elapsed 379.4m


update 3850/8000 | micro 15400 | avg_loss 8.4015 | lr 8.42e-05 | train_top1 21.88% | elapsed 383.1m


update 3900/8000 | micro 15600 | avg_loss 8.3813 | lr 8.27e-05 | train_top1 25.00% | elapsed 386.8m


update 3950/8000 | micro 15800 | avg_loss 8.3615 | lr 8.11e-05 | train_top1 25.00% | elapsed 390.5m


update 4000/8000 | micro 16000 | avg_loss 8.3425 | lr 7.96e-05 | train_top1 31.25% | elapsed 394.2m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 4000: macro-F1=0.05524 | top1=11.27% | loss=7.7639 | best=0.05524 | saved=model_upd4000_f10.05524.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 4000: macro-F1=0.05524 | top1=11.27% | loss=7.7639 | best=0.05524 | saved=model_upd4000_f10.05524.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 4000: macro-F1=0.05524 | top1=11.27% | loss=7.7639 | best=0.05524 | saved=model_upd4000_f10.05524.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 4000: macro-F1=0.05524 | top1=11.27% | loss=7.7639 | best=0.05524 | saved=model_upd4000_f10.05524.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 4050/8000 | micro 16200 | avg_loss 8.3230 | lr 7.81e-05 | train_top1 37.50% | elapsed 424.5m


update 4100/8000 | micro 16400 | avg_loss 8.3037 | lr 7.66e-05 | train_top1 25.00% | elapsed 428.2m


update 4150/8000 | micro 16600 | avg_loss 8.2855 | lr 7.50e-05 | train_top1 3.12% | elapsed 431.9m


update 4200/8000 | micro 16800 | avg_loss 8.2684 | lr 7.35e-05 | train_top1 18.75% | elapsed 435.6m


update 4250/8000 | micro 17000 | avg_loss 8.2495 | lr 7.20e-05 | train_top1 0.00% | elapsed 439.3m


update 4300/8000 | micro 17200 | avg_loss 8.2306 | lr 7.04e-05 | train_top1 0.00% | elapsed 443.0m


update 4350/8000 | micro 17400 | avg_loss 8.2130 | lr 6.89e-05 | train_top1 0.00% | elapsed 446.7m


update 4400/8000 | micro 17600 | avg_loss 8.1958 | lr 6.74e-05 | train_top1 18.75% | elapsed 450.4m


update 4450/8000 | micro 17800 | avg_loss 8.1775 | lr 6.59e-05 | train_top1 18.75% | elapsed 454.1m


update 4500/8000 | micro 18000 | avg_loss 8.1590 | lr 6.44e-05 | train_top1 12.50% | elapsed 457.8m


update 4550/8000 | micro 18200 | avg_loss 8.1409 | lr 6.28e-05 | train_top1 0.00% | elapsed 461.5m


update 4600/8000 | micro 18400 | avg_loss 8.1236 | lr 6.13e-05 | train_top1 0.00% | elapsed 465.2m


update 4650/8000 | micro 18600 | avg_loss 8.1042 | lr 5.98e-05 | train_top1 0.00% | elapsed 468.9m


update 4700/8000 | micro 18800 | avg_loss 8.0856 | lr 5.83e-05 | train_top1 43.75% | elapsed 472.6m


update 4750/8000 | micro 19000 | avg_loss 8.0685 | lr 5.69e-05 | train_top1 40.62% | elapsed 476.3m


update 4800/8000 | micro 19200 | avg_loss 8.0518 | lr 5.54e-05 | train_top1 18.75% | elapsed 480.0m


update 4850/8000 | micro 19400 | avg_loss 8.0347 | lr 5.39e-05 | train_top1 31.25% | elapsed 483.7m


update 4900/8000 | micro 19600 | avg_loss 8.0169 | lr 5.24e-05 | train_top1 31.25% | elapsed 487.4m


update 4950/8000 | micro 19800 | avg_loss 7.9999 | lr 5.10e-05 | train_top1 6.25% | elapsed 491.1m


update 5000/8000 | micro 20000 | avg_loss 7.9832 | lr 4.95e-05 | train_top1 0.00% | elapsed 494.8m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 5000: macro-F1=0.07740 | top1=14.51% | loss=7.1650 | best=0.07740 | saved=model_upd5000_f10.07740.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 5000: macro-F1=0.07740 | top1=14.51% | loss=7.1650 | best=0.07740 | saved=model_upd5000_f10.07740.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




In [26]:
# Inference with optional 2x TTA (orig + hflip) and dual-tau outputs; adds optional per-F0 masking and per-F0 prior adjustment
import os, re, math, time, json
from pathlib import Path
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
import torchvision.transforms.functional as TF
import timm

def find_best_ckpt(ckpt_dir='ckpts_main'):
    ckpt_dir = Path(ckpt_dir)
    files = sorted(ckpt_dir.glob('model_upd*_f1*.pt'))
    if not files:
        print('No checkpoints found in', ckpt_dir)
        return None
    def parse_f1(p):
        m = re.search(r'_f1([0-9]+\.[0-9]+)\.pt$', p.name)
        return float(m.group(1)) if m else -1.0
    files = sorted(files, key=parse_f1, reverse=True)
    best = files[0]
    print('Selected best ckpt by filename f1:', best.name)
    return best

def load_model_from_ckpt(ckpt_path, backbone='convnextv2_base', num_classes=64500, device='cuda'):
    model = timm.create_model(backbone, pretrained=False, num_classes=num_classes)
    state = torch.load(ckpt_path, map_location='cpu')
    # Load base model weights first (non-strict to allow head adaptation)
    model.load_state_dict(state.get('model', {}), strict=False)
    # If EMA is present, load EMA and then copy its weights into the model
    try:
        if 'ema' in state:
            from timm.utils import ModelEmaV2
            ema = ModelEmaV2(model, decay=0.999)
            ema.load_state_dict(state['ema'], strict=False)
            model.load_state_dict(ema.module.state_dict(), strict=True)
            print('Loaded EMA weights into model')
    except Exception as e:
        print('EMA load skipped:', e)
    model.to(device)
    model.to(memory_format=torch.channels_last)
    model.eval()
    return model

def class_log_prior(labels, num_classes):
    counts = np.bincount(labels.astype(int), minlength=num_classes).astype(np.float64)
    return np.log(counts + 1.0)

def build_f0_maps(df_train, num_classes):
    # Parse F0 from file_name
    def parse_f0(s):
        parts = str(s).split('/')
        return parts[1] if len(parts) > 1 else ''
    f0 = df_train['file_name'].map(parse_f0)
    labels = df_train['label'].astype(int).values
    df_tmp = pd.DataFrame({'f0': f0, 'label': labels})
    # Per-F0 allowed classes
    mask_f0 = {}  # f0 -> (num_classes,) tensor with 0 for allowed and -inf for disallowed
    prior_f0 = {} # f0 -> (num_classes,) tensor of log prior
    for key, g in df_tmp.groupby('f0', sort=False):
        counts = np.bincount(g['label'].values, minlength=num_classes).astype(np.float64)
        allowed = (counts > 0).astype(np.float32)
        m = torch.full((num_classes,), -1e9, dtype=torch.float32)
        m[torch.from_numpy(allowed.astype(bool))] = 0.0
        mask_f0[key] = m
        prior = np.log(counts + 1.0)
        prior_f0[key] = torch.from_numpy(prior.astype(np.float32))
    return mask_f0, prior_f0

def tta2_logits(model, x):
    # 2x TTA: original + horizontal flip
    logits_list = []
    logits_list.append(model(x))
    logits_list.append(model(torch.flip(x, dims=[3])))
    return sum(logits_list) / len(logits_list)

def infer_test(
    df_test, idx2cid,
    backbone='convnextv2_base', img_size=384, batch_size=128, num_workers=6, ckpt_dir='ckpts_main',
    use_tta=False, tau_list=(None, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8), out_prefix='submission_cnn',
    use_f0_mask=False, use_per_f0_prior=False
):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    ckpt = find_best_ckpt(ckpt_dir)
    assert ckpt is not None, 'No checkpoint to load'
    num_classes = len(idx2cid)
    model = load_model_from_ckpt(ckpt, backbone=backbone, num_classes=num_classes, device=device)

    # Optional F0 maps
    mask_f0 = None
    prior_f0 = None
    if use_f0_mask or use_per_f0_prior:
        mask_f0, prior_f0 = build_f0_maps(df_train, num_classes)

    # id -> f0 map from df_test
    def parse_f0_from_path(s):
        parts = str(s).split('/')
        return parts[1] if len(parts) > 1 else ''
    id2f0 = dict(zip(df_test['id'].astype(str), df_test['file_name'].map(parse_f0_from_path)))

    ds = HerbariumDataset(df_test, mode='test', img_size=img_size)
    dl = DataLoader(ds, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True, persistent_workers=False, prefetch_factor=2)

    # Global adjustment once (fallback when not per-F0)
    base_adj_global = class_log_prior(df_train['label'].values, num_classes)
    base_adj_global = torch.from_numpy(base_adj_global).to(device=device, dtype=torch.float32)

    id_list = []
    pred_buffers = {tau: [] for tau in tau_list}

    t0 = time.time()
    with torch.no_grad():
        for bi, (x, ids) in enumerate(dl):
            x = x.to(device, non_blocking=True).to(memory_format=torch.channels_last)
            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                logits = tta2_logits(model, x) if use_tta else model(x)
            # Per-sample F0 mask/prior stacks if enabled
            if use_f0_mask or use_per_f0_prior:
                f0_list = [id2f0.get(str(i), '') for i in ids]
            # Apply F0 mask (add large negative to disallowed classes)
            if use_f0_mask:
                m_list = [mask_f0.get(f0, None) for f0 in f0_list]
                m_stack = torch.stack([m if m is not None else torch.zeros(num_classes, dtype=torch.float32) for m in m_list], dim=0).to(device)
                logits = logits + m_stack
            # For each tau, adjust and argmax
            for tau in tau_list:
                if tau is None:
                    logits_adj = logits
                else:
                    if use_per_f0_prior:
                        adj_list = [prior_f0.get(f0, base_adj_global.cpu()).to(device) for f0 in (f0_list if (use_f0_mask or use_per_f0_prior) else [])]
                        if adj_list:
                            adj_stack = torch.stack(adj_list, dim=0).to(device)
                            logits_adj = logits - float(tau) * adj_stack
                        else:
                            logits_adj = logits - float(tau) * base_adj_global[None, :]
                    else:
                        logits_adj = logits - float(tau) * base_adj_global[None, :]
                pred_idx = torch.argmax(logits_adj, dim=1).detach().cpu().numpy()
                pred_buffers[tau].append(pred_idx)
            id_list.extend(list(ids))
            if (bi+1) % 50 == 0:
                print(f'Infer batch {bi+1}/{len(dl)} | elapsed {(time.time()-t0)/60:.1f}m', flush=True)

    outs = {}
    for tau in tau_list:
        pred_idx = np.concatenate(pred_buffers[tau]) if pred_buffers[tau] else np.array([], dtype=np.int64)
        pred_cids = idx2cid[pred_idx] if len(pred_idx)>0 else np.array([], dtype=np.int64)
        sub = pd.read_csv('sample_submission.csv')
        id2pred = dict(zip(id_list, pred_cids.tolist()))
        default_fill = int(sub['Predicted'].mode().iloc[0]) if 'Predicted' in sub.columns else 0
        sub['Predicted'] = sub['Id'].astype(str).map(id2pred).fillna(default_fill).astype(int)
        suffix = 'tauNone' if tau is None else 'tau' + str(tau).replace('.', '_')
        out_path = f'{out_prefix}_{suffix}.csv'
        sub.to_csv(out_path, index=False)
        print('Wrote', out_path, 'rows:', len(sub))
        outs[tau] = out_path
    return outs

print('Inference ready. Fast safety (no TTA, dual tau):')
print("infer_test(df_test, idx2cid, backbone='convnextv2_base', img_size=384, batch_size=128, num_workers=6, ckpt_dir='ckpts_main', use_tta=False, tau_list=(None,0.5), out_prefix='submission_cnn', use_f0_mask=False, use_per_f0_prior=False)")
print('For stronger run later, set use_tta=True (2x), use_per_f0_prior=True, and sweep tau in (None,0.2..0.8). Ensure you see: Loaded EMA weights into model.')

Inference ready. Fast safety (no TTA, dual tau):
infer_test(df_test, idx2cid, backbone='convnextv2_base', img_size=384, batch_size=128, num_workers=6, ckpt_dir='ckpts_main', use_tta=False, tau_list=(None,0.5), out_prefix='submission_cnn', use_f0_mask=False, use_per_f0_prior=False)
For stronger run later, set use_tta=True (2x), use_per_f0_prior=True, and sweep tau in (None,0.2..0.8). Ensure you see: Loaded EMA weights into model.


In [24]:
# Resume-capable training utilities (non-executing until called). Safe to add while main cell runs.
import os, math, time, json, random, numpy as np, torch, torch.nn as nn
from pathlib import Path

def load_resume_state(resume_path, model, ema, optimizer):
    start_step, best_f1 = 0, -1.0
    if resume_path is None or not os.path.exists(str(resume_path)):
        return start_step, best_f1, optimizer
    state = torch.load(resume_path, map_location='cpu')
    # Load model and EMA regardless of optimizer issues
    model.load_state_dict(state.get('model', {}), strict=False)
    if state.get('ema', None) is not None:
        try:
            ema.load_state_dict(state['ema'], strict=False)
        except Exception as e:
            print('EMA resume load skipped:', e)
    # Set resume metadata first
    start_step = int(state.get('step', 0))
    best_f1 = float(state.get('best_f1', -1.0))
    # Try optimizer resume; if it fails, keep training without it (do NOT reset step/best_f1)
    if 'optimizer' in state and optimizer is not None:
        try:
            optimizer.load_state_dict(state['optimizer'])
        except Exception as e:
            print('Optimizer state load failed; continuing without optimizer state:', e)
    print(f'Resumed from {resume_path} at step {start_step}, best_f1={best_f1:.5f}')
    return start_step, best_f1, optimizer

def train_main(
    backbone='convnextv2_base',
    img_size=384,
    batch_size=32,
    eff_batch=128,
    updates_total=8_000,
    warmup_updates=300,
    lr_base=3e-4,
    weight_decay=0.02,
    seed=42,
    mixup_alpha=0.2,
    cutmix_alpha=0.2,
    mix_prob=0.8,
    val_frac=0.05,
    ckpt_dir='ckpts_main',
    head_warmup_updates=600,
    lr_head=3e-3,
    resume_path=None
):
    # Reuse objects from earlier cells: df_train, HerbariumDataset, make_sampler, top1_acc_from_logits, evaluate, save_checkpoint, seed_everything
    seed_everything(seed)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    os.makedirs(ckpt_dir, exist_ok=True)

    tr_idx, va_idx = make_val_split_min1_train(df_train, val_frac=val_frac, seed=seed)
    dtr = df_train.iloc[tr_idx].reset_index(drop=True)
    dva = df_train.iloc[va_idx].reset_index(drop=True)
    print(f'Train/Val sizes: {len(dtr)}/{len(dva)} | classes in train: {dtr.label.nunique()}')

    ds_tr = HerbariumDataset(dtr, mode='train', img_size=img_size)
    ds_va = HerbariumDataset(dva, mode='val', img_size=img_size)
    sampler = make_sampler(dtr['label'].values, power=0.5)
    num_workers = 8
    dl_tr = torch.utils.data.DataLoader(ds_tr, batch_size=batch_size, sampler=sampler, pin_memory=True, num_workers=num_workers, persistent_workers=True, prefetch_factor=2)
    dl_va = torch.utils.data.DataLoader(ds_va, batch_size=max(64, batch_size), shuffle=False, pin_memory=True, num_workers=num_workers, persistent_workers=True, prefetch_factor=2)

    model = timm.create_model(backbone, pretrained=True, num_classes=len(idx2cid))
    if hasattr(model, 'set_grad_checkpointing'):
        try: model.set_grad_checkpointing(True)
        except Exception: pass
    model.to(device)
    model.to(memory_format=torch.channels_last)

    ema = ModelEmaV2(model, decay=0.999, device=device)

    accum_steps = max(1, eff_batch // batch_size)
    scaled_lr = lr_base * ((batch_size * accum_steps) / 256.0)

    # Phase-aware optimizer init
    def freeze_backbone_unfreeze_head(m):
        for n, p in m.named_parameters():
            p.requires_grad = ('head' in n)
    def unfreeze_all(m):
        for p in m.parameters():
            p.requires_grad = True

    optimizer = None
    start_step = 0
    best_f1 = -1.0

    # Tentatively start in head-only mode; may be replaced after resume load
    freeze_backbone_unfreeze_head(model)
    head_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(head_params, lr=lr_head, weight_decay=weight_decay)

    # Try to resume
    start_step, best_f1, optimizer = load_resume_state(resume_path, model, ema, optimizer)

    # If we've passed warmup, switch to full optimizer
    if start_step >= head_warmup_updates:
        unfreeze_all(model)
        optimizer = torch.optim.AdamW(model.parameters(), lr=scaled_lr, weight_decay=weight_decay)

    def cosine_lr(u):
        if u < warmup_updates:
            return (u + 1) / max(1, warmup_updates)
        t = (u - warmup_updates) / max(1, updates_total - warmup_updates)
        t = min(1.0, max(0.0, t))
        return 0.5 * (1 + math.cos(math.pi * t))

    mixup_fn = Mixup(mixup_alpha=mixup_alpha, cutmix_alpha=cutmix_alpha, prob=mix_prob, switch_prob=0.5, mode='batch', label_smoothing=0.1, num_classes=len(idx2cid))
    criterion_soft = SoftTargetCrossEntropy()
    criterion_hard = nn.CrossEntropyLoss(label_smoothing=0.1)
    criterion_val = nn.CrossEntropyLoss(reduction='mean')
    scaler = torch.amp.GradScaler('cuda', enabled=torch.cuda.is_available())

    best_path = None
    micro_step = start_step * accum_steps
    global_step = start_step
    running_loss = 0.0
    samples_seen = 0
    last_train_top1 = 0.0
    t0 = time.time()

    model.train()
    while global_step < updates_total:
        for it, (x, y) in enumerate(dl_tr):
            if global_step >= updates_total:
                break

            if (global_step == head_warmup_updates) and any(not p.requires_grad for p in model.parameters()):
                unfreeze_all(model)
                optimizer = torch.optim.AdamW(model.parameters(), lr=scaled_lr, weight_decay=weight_decay)

            x = x.to(device, non_blocking=True).to(memory_format=torch.channels_last)
            y = y.to(device, non_blocking=True)

            if global_step >= head_warmup_updates:
                cur_lr = scaled_lr * cosine_lr(global_step)
                for pg in optimizer.param_groups: pg['lr'] = cur_lr
            else:
                for pg in optimizer.param_groups: pg['lr'] = lr_head

            use_mix = (mixup_fn is not None) and (global_step >= head_warmup_updates) and (global_step < max(0, updates_total - 1500))
            if use_mix:
                x, y_mix = mixup_fn(x, y)

            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                logits = model(x)
                loss = criterion_soft(logits, y_mix) if use_mix else criterion_hard(logits, y)

            last_train_top1 = top1_acc_from_logits(logits.detach(), y)
            loss = loss / accum_steps
            scaler.scale(loss).backward()
            micro_step += 1

            if (micro_step % accum_steps) == 0:
                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad(set_to_none=True)
                ema.update(model)
                global_step += 1

            running_loss += loss.item() * x.size(0) * accum_steps
            samples_seen += x.size(0)

            if (micro_step % 200) == 0:
                elapsed = time.time() - t0
                avg_loss = running_loss / max(1, samples_seen)
                cur_lr_print = optimizer.param_groups[0]['lr']
                print(f'update {global_step}/{updates_total} | micro {micro_step} | avg_loss {avg_loss:.4f} | lr {cur_lr_print:.2e} | train_top1 {last_train_top1*100:.2f}% | elapsed {elapsed/60:.1f}m', flush=True)

            need_val = False
            if global_step > 0 and global_step < 2000 and (global_step % 500 == 0):
                need_val = True
            elif global_step >= 2000 and (global_step % 1000 == 0):
                need_val = True
            if need_val or global_step >= updates_total:
                val_f1, val_top1, val_loss = evaluate(model, ema, dl_va, device, criterion_val)
                is_best = val_f1 > best_f1
                best_f1 = max(best_f1, val_f1)
                ckpt_path = Path(ckpt_dir)/f'model_upd{global_step}_f1{val_f1:.5f}.pt'
                save_checkpoint(model, ema, optimizer, global_step, best_f1, ckpt_path)
                if is_best:
                    best_path = ckpt_path
                print(f'Validation @update {global_step}: macro-F1={val_f1:.5f} | top1={val_top1*100:.2f}% | loss={val_loss:.4f} | best={best_f1:.5f} | saved={ckpt_path.name}', flush=True)

    print('Training done. Best ckpt:', best_path)
    return dict(best_ckpt=str(best_path) if best_path else None, best_f1=best_f1)

print('Resume-capable train_main(resume_path=...) defined. Use after pausing to safely resume from a saved ckpt.')

Resume-capable train_main(resume_path=...) defined. Use after pausing to safely resume from a saved ckpt.


In [19]:
# Safety submission inference: no TTA, dual tau
print('Running safety inference with best ckpt (no TTA, tau=None and 0.5) ...', flush=True)
outs = infer_test(
    df_test, idx2cid,
    backbone='convnextv2_base', img_size=384, batch_size=128, num_workers=6, ckpt_dir='ckpts_main',
    use_tta=False, tau_list=(None, 0.5), out_prefix='submission_cnn',
    use_f0_mask=False, use_per_f0_prior=False
)
print('Inference outputs:', outs, flush=True)

Running safety inference with best ckpt (no TTA, tau=None and 0.5) ...


Selected best ckpt by filename f1: model_upd5000_f10.07740.pt


  state = torch.load(ckpt_path, map_location='cpu')


EMA load skipped: 'ModelEmaV2' object has no attribute 'copy_to'


Infer batch 50/3733 | elapsed 0.9m


Infer batch 100/3733 | elapsed 1.7m


Infer batch 150/3733 | elapsed 2.4m


Infer batch 200/3733 | elapsed 3.2m


Infer batch 250/3733 | elapsed 3.9m


Infer batch 300/3733 | elapsed 4.7m


Infer batch 350/3733 | elapsed 5.5m


Infer batch 400/3733 | elapsed 6.2m


Infer batch 450/3733 | elapsed 7.0m


Infer batch 500/3733 | elapsed 7.8m


Infer batch 550/3733 | elapsed 8.5m


Infer batch 600/3733 | elapsed 9.3m


Infer batch 650/3733 | elapsed 10.1m


Infer batch 700/3733 | elapsed 10.8m


Infer batch 750/3733 | elapsed 11.6m


Infer batch 800/3733 | elapsed 12.4m


Infer batch 850/3733 | elapsed 13.1m


Infer batch 900/3733 | elapsed 13.9m


Infer batch 950/3733 | elapsed 14.7m


Infer batch 1000/3733 | elapsed 15.5m


Infer batch 1050/3733 | elapsed 16.2m


Infer batch 1100/3733 | elapsed 17.0m


Infer batch 1150/3733 | elapsed 17.8m


Infer batch 1200/3733 | elapsed 18.5m


Infer batch 1250/3733 | elapsed 19.3m


Infer batch 1300/3733 | elapsed 20.1m


Infer batch 1350/3733 | elapsed 20.8m


Infer batch 1400/3733 | elapsed 21.6m


Infer batch 1450/3733 | elapsed 22.4m


Infer batch 1500/3733 | elapsed 23.1m


Infer batch 1550/3733 | elapsed 23.9m


Infer batch 1600/3733 | elapsed 24.7m


Infer batch 1650/3733 | elapsed 25.4m


Infer batch 1700/3733 | elapsed 26.2m


Infer batch 1750/3733 | elapsed 27.0m


Infer batch 1800/3733 | elapsed 27.7m


Infer batch 1850/3733 | elapsed 28.5m


Infer batch 1900/3733 | elapsed 29.3m


Infer batch 1950/3733 | elapsed 30.1m


Infer batch 2000/3733 | elapsed 30.8m


Infer batch 2050/3733 | elapsed 31.6m


Infer batch 2100/3733 | elapsed 32.4m


Infer batch 2150/3733 | elapsed 33.1m


Infer batch 2200/3733 | elapsed 33.9m


Infer batch 2250/3733 | elapsed 34.7m


Infer batch 2300/3733 | elapsed 35.4m


Infer batch 2350/3733 | elapsed 36.2m


Infer batch 2400/3733 | elapsed 37.0m


Infer batch 2450/3733 | elapsed 37.7m


Infer batch 2500/3733 | elapsed 38.5m


Infer batch 2550/3733 | elapsed 39.3m


Infer batch 2600/3733 | elapsed 40.0m


Infer batch 2650/3733 | elapsed 40.8m


Infer batch 2700/3733 | elapsed 41.6m


Infer batch 2750/3733 | elapsed 42.3m


Infer batch 2800/3733 | elapsed 43.1m


Infer batch 2850/3733 | elapsed 43.9m


Infer batch 2900/3733 | elapsed 44.6m


Infer batch 2950/3733 | elapsed 45.4m


Infer batch 3000/3733 | elapsed 46.2m


Infer batch 3050/3733 | elapsed 47.0m


Infer batch 3100/3733 | elapsed 47.7m


Infer batch 3150/3733 | elapsed 48.5m


Infer batch 3200/3733 | elapsed 49.3m


Infer batch 3250/3733 | elapsed 50.0m


Infer batch 3300/3733 | elapsed 50.8m


Infer batch 3350/3733 | elapsed 51.6m


Infer batch 3400/3733 | elapsed 52.3m


Infer batch 3450/3733 | elapsed 53.1m


Infer batch 3500/3733 | elapsed 53.9m


Infer batch 3550/3733 | elapsed 54.6m


Infer batch 3600/3733 | elapsed 55.4m


Infer batch 3650/3733 | elapsed 56.2m


Infer batch 3700/3733 | elapsed 56.9m


Wrote submission_cnn_tauNone.csv rows: 477806


Wrote submission_cnn_tau0_5.csv rows: 477806
Inference outputs: {None: 'submission_cnn_tauNone.csv', 0.5: 'submission_cnn_tau0_5.csv'}


In [25]:
# Safety: set current best CNN submission as fallback, then resume training to 10k updates
import shutil, os, pandas as pd
best_fast_sub = 'submission_cnn_tau0_5.csv' if os.path.exists('submission_cnn_tau0_5.csv') else 'submission_cnn_tauNone.csv'
if os.path.exists(best_fast_sub):
    shutil.copyfile(best_fast_sub, 'submission.csv')
    print(f'Copied {best_fast_sub} -> submission.csv')
    try:
        print(pd.read_csv('submission.csv').head())
    except Exception as e:
        print('Readback head failed:', e)
else:
    print('No CNN submission found to copy as safety.')

print('Resuming training from upd5000 to 10k updates ...', flush=True)
train_summary = train_main(
  backbone='convnextv2_base', img_size=384,
  batch_size=32, eff_batch=128,
  updates_total=10_000, warmup_updates=300, lr_base=3e-4, weight_decay=0.02, seed=42,
  mixup_alpha=0.2, cutmix_alpha=0.2, mix_prob=0.8,
  val_frac=0.05, ckpt_dir='ckpts_main',
  head_warmup_updates=600, lr_head=3e-3,
  resume_path='ckpts_main/model_upd5000_f10.07740.pt'
)
print('Train summary:', train_summary)

Copied submission_cnn_tau0_5.csv -> submission.csv
   Id  Predicted
0   0      18454
1   1       5434
2   2      21267
3   3      10029
4   4      33711
Resuming training from upd5000 to 10k updates ...


Train/Val sizes: 1726076/53877 | classes in train: 64500


INFO:timm.models._builder:Loading pretrained weights from Hugging Face hub (timm/convnextv2_base.fcmae_ft_in22k_in1k)


INFO:timm.models._hub:[timm/convnextv2_base.fcmae_ft_in22k_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.


INFO:timm.models._builder:Missing keys (head.fc.weight, head.fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.


  state = torch.load(resume_path, map_location='cpu')


Optimizer state load failed; continuing without optimizer state: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
Resumed from ckpts_main/model_upd5000_f10.07740.pt at step 5000, best_f1=0.07740


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 5000: macro-F1=0.07740 | top1=14.51% | loss=7.1650 | best=0.07740 | saved=model_upd5000_f10.07740.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 5000: macro-F1=0.07740 | top1=14.51% | loss=7.1650 | best=0.07740 | saved=model_upd5000_f10.07740.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 5000: macro-F1=0.07740 | top1=14.51% | loss=7.1650 | best=0.07740 | saved=model_upd5000_f10.07740.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 5050/10000 | micro 20200 | avg_loss 4.5682 | lr 7.75e-05 | train_top1 68.75% | elapsed 23.5m


update 5100/10000 | micro 20400 | avg_loss 4.3496 | lr 7.62e-05 | train_top1 0.00% | elapsed 27.2m


update 5150/10000 | micro 20600 | avg_loss 4.2186 | lr 7.50e-05 | train_top1 3.12% | elapsed 30.9m


update 5200/10000 | micro 20800 | avg_loss 4.0455 | lr 7.38e-05 | train_top1 0.00% | elapsed 34.6m


update 5250/10000 | micro 21000 | avg_loss 3.9238 | lr 7.26e-05 | train_top1 34.38% | elapsed 38.3m


update 5300/10000 | micro 21200 | avg_loss 3.7959 | lr 7.14e-05 | train_top1 90.62% | elapsed 42.0m


update 5350/10000 | micro 21400 | avg_loss 3.7272 | lr 7.02e-05 | train_top1 0.00% | elapsed 45.7m


update 5400/10000 | micro 21600 | avg_loss 3.6619 | lr 6.90e-05 | train_top1 90.62% | elapsed 49.4m


update 5450/10000 | micro 21800 | avg_loss 3.6131 | lr 6.77e-05 | train_top1 84.38% | elapsed 53.1m


update 5500/10000 | micro 22000 | avg_loss 3.5598 | lr 6.65e-05 | train_top1 18.75% | elapsed 56.8m


update 5550/10000 | micro 22200 | avg_loss 3.5171 | lr 6.53e-05 | train_top1 87.50% | elapsed 60.5m


update 5600/10000 | micro 22400 | avg_loss 3.5182 | lr 6.41e-05 | train_top1 34.38% | elapsed 64.2m


update 5650/10000 | micro 22600 | avg_loss 3.7013 | lr 6.29e-05 | train_top1 18.75% | elapsed 67.9m


update 5700/10000 | micro 22800 | avg_loss 3.8471 | lr 6.17e-05 | train_top1 37.50% | elapsed 71.6m


update 5750/10000 | micro 23000 | avg_loss 3.9767 | lr 6.05e-05 | train_top1 40.62% | elapsed 75.3m


update 5800/10000 | micro 23200 | avg_loss 4.0983 | lr 5.94e-05 | train_top1 59.38% | elapsed 79.0m


update 5850/10000 | micro 23400 | avg_loss 4.2039 | lr 5.82e-05 | train_top1 6.25% | elapsed 82.7m


update 5900/10000 | micro 23600 | avg_loss 4.2987 | lr 5.70e-05 | train_top1 3.12% | elapsed 86.4m


update 5950/10000 | micro 23800 | avg_loss 4.3812 | lr 5.58e-05 | train_top1 40.62% | elapsed 90.1m


update 6000/10000 | micro 24000 | avg_loss 4.4598 | lr 5.46e-05 | train_top1 0.00% | elapsed 93.8m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 6000: macro-F1=0.08185 | top1=14.76% | loss=7.2555 | best=0.08185 | saved=model_upd6000_f10.08185.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 6000: macro-F1=0.08185 | top1=14.76% | loss=7.2555 | best=0.08185 | saved=model_upd6000_f10.08185.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 6000: macro-F1=0.08185 | top1=14.76% | loss=7.2555 | best=0.08185 | saved=model_upd6000_f10.08185.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 6000: macro-F1=0.08185 | top1=14.76% | loss=7.2555 | best=0.08185 | saved=model_upd6000_f10.08185.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 6050/10000 | micro 24200 | avg_loss 4.5308 | lr 5.35e-05 | train_top1 43.75% | elapsed 124.1m


update 6100/10000 | micro 24400 | avg_loss 4.5992 | lr 5.23e-05 | train_top1 0.00% | elapsed 127.8m


update 6150/10000 | micro 24600 | avg_loss 4.6607 | lr 5.12e-05 | train_top1 15.62% | elapsed 131.5m


update 6200/10000 | micro 24800 | avg_loss 4.7139 | lr 5.00e-05 | train_top1 15.62% | elapsed 135.2m


update 6250/10000 | micro 25000 | avg_loss 4.7657 | lr 4.89e-05 | train_top1 43.75% | elapsed 138.9m


update 6300/10000 | micro 25200 | avg_loss 4.8159 | lr 4.77e-05 | train_top1 40.62% | elapsed 142.6m


update 6350/10000 | micro 25400 | avg_loss 4.8525 | lr 4.66e-05 | train_top1 0.00% | elapsed 146.3m


update 6400/10000 | micro 25600 | avg_loss 4.8938 | lr 4.55e-05 | train_top1 28.12% | elapsed 150.0m


update 6450/10000 | micro 25800 | avg_loss 4.9288 | lr 4.44e-05 | train_top1 18.75% | elapsed 153.7m


update 6500/10000 | micro 26000 | avg_loss 4.9624 | lr 4.33e-05 | train_top1 34.38% | elapsed 157.4m


update 6550/10000 | micro 26200 | avg_loss 4.9923 | lr 4.22e-05 | train_top1 50.00% | elapsed 161.1m


update 6600/10000 | micro 26400 | avg_loss 5.0242 | lr 4.11e-05 | train_top1 0.00% | elapsed 164.8m


update 6650/10000 | micro 26600 | avg_loss 5.0526 | lr 4.00e-05 | train_top1 62.50% | elapsed 168.5m


update 6700/10000 | micro 26800 | avg_loss 5.0799 | lr 3.89e-05 | train_top1 0.00% | elapsed 172.2m


update 6750/10000 | micro 27000 | avg_loss 5.1028 | lr 3.79e-05 | train_top1 43.75% | elapsed 175.9m


update 6800/10000 | micro 27200 | avg_loss 5.1245 | lr 3.68e-05 | train_top1 18.75% | elapsed 179.6m


update 6850/10000 | micro 27400 | avg_loss 5.1436 | lr 3.58e-05 | train_top1 28.12% | elapsed 183.3m


update 6900/10000 | micro 27600 | avg_loss 5.1598 | lr 3.48e-05 | train_top1 53.12% | elapsed 187.0m


update 6950/10000 | micro 27800 | avg_loss 5.1765 | lr 3.37e-05 | train_top1 21.88% | elapsed 190.7m


update 7000/10000 | micro 28000 | avg_loss 5.1941 | lr 3.27e-05 | train_top1 37.50% | elapsed 194.4m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 7000: macro-F1=0.09592 | top1=16.78% | loss=7.1565 | best=0.09592 | saved=model_upd7000_f10.09592.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 7000: macro-F1=0.09592 | top1=16.78% | loss=7.1565 | best=0.09592 | saved=model_upd7000_f10.09592.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 7000: macro-F1=0.09592 | top1=16.78% | loss=7.1565 | best=0.09592 | saved=model_upd7000_f10.09592.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 7000: macro-F1=0.09592 | top1=16.78% | loss=7.1565 | best=0.09592 | saved=model_upd7000_f10.09592.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 7050/10000 | micro 28200 | avg_loss 5.2081 | lr 3.17e-05 | train_top1 37.50% | elapsed 224.7m


update 7100/10000 | micro 28400 | avg_loss 5.2251 | lr 3.07e-05 | train_top1 59.38% | elapsed 228.4m


update 7150/10000 | micro 28600 | avg_loss 5.2375 | lr 2.98e-05 | train_top1 6.25% | elapsed 232.1m


update 7200/10000 | micro 28800 | avg_loss 5.2478 | lr 2.88e-05 | train_top1 25.00% | elapsed 235.8m


update 7250/10000 | micro 29000 | avg_loss 5.2580 | lr 2.79e-05 | train_top1 43.75% | elapsed 239.5m


update 7300/10000 | micro 29200 | avg_loss 5.2686 | lr 2.69e-05 | train_top1 0.00% | elapsed 243.2m


update 7350/10000 | micro 29400 | avg_loss 5.2760 | lr 2.60e-05 | train_top1 53.12% | elapsed 246.9m


update 7400/10000 | micro 29600 | avg_loss 5.2848 | lr 2.51e-05 | train_top1 6.25% | elapsed 250.6m


update 7450/10000 | micro 29800 | avg_loss 5.2925 | lr 2.42e-05 | train_top1 50.00% | elapsed 254.3m


update 7500/10000 | micro 30000 | avg_loss 5.3012 | lr 2.33e-05 | train_top1 50.00% | elapsed 258.0m


update 7550/10000 | micro 30200 | avg_loss 5.3103 | lr 2.24e-05 | train_top1 53.12% | elapsed 261.7m


update 7600/10000 | micro 30400 | avg_loss 5.3177 | lr 2.16e-05 | train_top1 56.25% | elapsed 265.4m


update 7650/10000 | micro 30600 | avg_loss 5.3225 | lr 2.07e-05 | train_top1 3.12% | elapsed 269.1m


update 7700/10000 | micro 30800 | avg_loss 5.3293 | lr 1.99e-05 | train_top1 6.25% | elapsed 272.8m


update 7750/10000 | micro 31000 | avg_loss 5.3346 | lr 1.91e-05 | train_top1 43.75% | elapsed 276.5m


update 7800/10000 | micro 31200 | avg_loss 5.3406 | lr 1.83e-05 | train_top1 40.62% | elapsed 280.3m


update 7850/10000 | micro 31400 | avg_loss 5.3468 | lr 1.75e-05 | train_top1 50.00% | elapsed 284.0m


update 7900/10000 | micro 31600 | avg_loss 5.3508 | lr 1.67e-05 | train_top1 0.00% | elapsed 287.7m


update 7950/10000 | micro 31800 | avg_loss 5.3569 | lr 1.59e-05 | train_top1 40.62% | elapsed 291.4m


update 8000/10000 | micro 32000 | avg_loss 5.3616 | lr 1.52e-05 | train_top1 37.50% | elapsed 295.1m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 8000: macro-F1=0.10895 | top1=18.48% | loss=6.9769 | best=0.10895 | saved=model_upd8000_f10.10895.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 8000: macro-F1=0.10895 | top1=18.48% | loss=6.9769 | best=0.10895 | saved=model_upd8000_f10.10895.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 8000: macro-F1=0.10895 | top1=18.48% | loss=6.9769 | best=0.10895 | saved=model_upd8000_f10.10895.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 8000: macro-F1=0.10895 | top1=18.48% | loss=6.9769 | best=0.10895 | saved=model_upd8000_f10.10895.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


update 8050/10000 | micro 32200 | avg_loss 5.3660 | lr 1.45e-05 | train_top1 18.75% | elapsed 325.3m


update 8100/10000 | micro 32400 | avg_loss 5.3683 | lr 1.38e-05 | train_top1 50.00% | elapsed 329.1m


update 8150/10000 | micro 32600 | avg_loss 5.3723 | lr 1.31e-05 | train_top1 31.25% | elapsed 332.8m


update 8200/10000 | micro 32800 | avg_loss 5.3751 | lr 1.24e-05 | train_top1 46.88% | elapsed 336.5m


update 8250/10000 | micro 33000 | avg_loss 5.3773 | lr 1.17e-05 | train_top1 40.62% | elapsed 340.2m


update 8300/10000 | micro 33200 | avg_loss 5.3810 | lr 1.11e-05 | train_top1 50.00% | elapsed 343.9m


update 8350/10000 | micro 33400 | avg_loss 5.3852 | lr 1.05e-05 | train_top1 46.88% | elapsed 347.6m


update 8400/10000 | micro 33600 | avg_loss 5.3907 | lr 9.86e-06 | train_top1 43.75% | elapsed 351.3m


update 8450/10000 | micro 33800 | avg_loss 5.3949 | lr 9.27e-06 | train_top1 37.50% | elapsed 355.0m


update 8500/10000 | micro 34000 | avg_loss 5.3980 | lr 8.69e-06 | train_top1 56.25% | elapsed 358.7m


update 8550/10000 | micro 34200 | avg_loss 5.3906 | lr 8.13e-06 | train_top1 37.50% | elapsed 362.4m


update 8600/10000 | micro 34400 | avg_loss 5.3830 | lr 7.59e-06 | train_top1 62.50% | elapsed 366.1m


update 8650/10000 | micro 34600 | avg_loss 5.3770 | lr 7.07e-06 | train_top1 50.00% | elapsed 369.8m


update 8700/10000 | micro 34800 | avg_loss 5.3704 | lr 6.56e-06 | train_top1 56.25% | elapsed 373.5m


update 8750/10000 | micro 35000 | avg_loss 5.3643 | lr 6.07e-06 | train_top1 50.00% | elapsed 377.2m


update 8800/10000 | micro 35200 | avg_loss 5.3585 | lr 5.60e-06 | train_top1 50.00% | elapsed 380.9m


update 8850/10000 | micro 35400 | avg_loss 5.3526 | lr 5.15e-06 | train_top1 46.88% | elapsed 384.6m


update 8900/10000 | micro 35600 | avg_loss 5.3467 | lr 4.72e-06 | train_top1 62.50% | elapsed 388.3m


update 8950/10000 | micro 35800 | avg_loss 5.3411 | lr 4.30e-06 | train_top1 43.75% | elapsed 392.0m


update 9000/10000 | micro 36000 | avg_loss 5.3357 | lr 3.91e-06 | train_top1 62.50% | elapsed 395.7m




  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)
  type_true = type_of_target(y_true, input_name="y_true")
  ys_types = set(type_of_target(x) for x in ys)


Validation @update 9000: macro-F1=0.11838 | top1=19.67% | loss=6.8369 | best=0.11838 | saved=model_upd9000_f10.11838.pt


  return fn(*args, **kwargs)


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]




KeyboardInterrupt: 

In [27]:
# Strong inference (EMA, 2x TTA, per-F0 prior, tau sweep up to 0.8)
print('Running strong inference (EMA, 2x TTA, per-F0 prior, tau sweep) ...', flush=True)
outs_strong = infer_test(
    df_test, idx2cid,
    backbone='convnextv2_base', img_size=384,
    batch_size=128, num_workers=6, ckpt_dir='ckpts_main',
    use_tta=True,
    tau_list=(None, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8),
    out_prefix='submission_cnn_strong',
    use_f0_mask=False,
    use_per_f0_prior=True
)
print('Strong inference outputs:', outs_strong, flush=True)

Running strong inference (EMA, 2x TTA, per-F0 prior, tau sweep) ...


Selected best ckpt by filename f1: model_upd9000_f10.11838.pt


  state = torch.load(ckpt_path, map_location='cpu')


Loaded EMA weights into model


Infer batch 50/3733 | elapsed 1.6m


Infer batch 100/3733 | elapsed 3.2m


Infer batch 150/3733 | elapsed 4.8m


Infer batch 200/3733 | elapsed 6.4m


Infer batch 250/3733 | elapsed 8.0m


Infer batch 300/3733 | elapsed 9.5m


Infer batch 350/3733 | elapsed 11.1m


Infer batch 400/3733 | elapsed 12.7m


Infer batch 450/3733 | elapsed 14.3m


Infer batch 500/3733 | elapsed 15.9m


Infer batch 550/3733 | elapsed 17.5m


Infer batch 600/3733 | elapsed 19.1m


Infer batch 650/3733 | elapsed 20.7m


Infer batch 700/3733 | elapsed 22.3m


Infer batch 750/3733 | elapsed 23.9m


Infer batch 800/3733 | elapsed 25.5m


Infer batch 850/3733 | elapsed 27.1m


Infer batch 900/3733 | elapsed 28.7m


Infer batch 950/3733 | elapsed 30.3m


Infer batch 1000/3733 | elapsed 31.9m


Infer batch 1050/3733 | elapsed 33.5m


Infer batch 1100/3733 | elapsed 35.1m


Infer batch 1150/3733 | elapsed 36.6m


Infer batch 1200/3733 | elapsed 38.2m


Infer batch 1250/3733 | elapsed 39.8m


Infer batch 1300/3733 | elapsed 41.4m


Infer batch 1350/3733 | elapsed 43.0m


Infer batch 1400/3733 | elapsed 44.6m


Infer batch 1450/3733 | elapsed 46.2m


Infer batch 1500/3733 | elapsed 47.8m


Infer batch 1550/3733 | elapsed 49.4m


Infer batch 1600/3733 | elapsed 51.0m


Infer batch 1650/3733 | elapsed 52.6m


Infer batch 1700/3733 | elapsed 54.2m


Infer batch 1750/3733 | elapsed 55.8m


Infer batch 1800/3733 | elapsed 57.4m


Infer batch 1850/3733 | elapsed 59.0m


Infer batch 1900/3733 | elapsed 60.6m


Infer batch 1950/3733 | elapsed 62.2m


Infer batch 2000/3733 | elapsed 63.8m


Infer batch 2050/3733 | elapsed 65.4m


Infer batch 2100/3733 | elapsed 66.9m


Infer batch 2150/3733 | elapsed 68.5m


Infer batch 2200/3733 | elapsed 70.1m


Infer batch 2250/3733 | elapsed 71.7m


Infer batch 2300/3733 | elapsed 73.3m


Infer batch 2350/3733 | elapsed 74.9m


Infer batch 2400/3733 | elapsed 76.5m


Infer batch 2450/3733 | elapsed 78.1m


Infer batch 2500/3733 | elapsed 79.6m


Infer batch 2550/3733 | elapsed 81.2m


Infer batch 2600/3733 | elapsed 82.8m


Infer batch 2650/3733 | elapsed 84.4m


Infer batch 2700/3733 | elapsed 86.0m


Infer batch 2750/3733 | elapsed 87.6m


Infer batch 2800/3733 | elapsed 89.2m


Infer batch 2850/3733 | elapsed 90.7m


Infer batch 2900/3733 | elapsed 92.3m


Infer batch 2950/3733 | elapsed 93.9m


Infer batch 3000/3733 | elapsed 95.5m


Infer batch 3050/3733 | elapsed 97.1m


Infer batch 3100/3733 | elapsed 98.7m


Infer batch 3150/3733 | elapsed 100.3m


Infer batch 3200/3733 | elapsed 101.9m


Infer batch 3250/3733 | elapsed 103.5m


Infer batch 3300/3733 | elapsed 105.1m


Infer batch 3350/3733 | elapsed 106.7m


Infer batch 3400/3733 | elapsed 108.3m


Infer batch 3450/3733 | elapsed 109.9m


Infer batch 3500/3733 | elapsed 111.5m


Infer batch 3550/3733 | elapsed 113.0m


Infer batch 3600/3733 | elapsed 114.6m


Infer batch 3650/3733 | elapsed 116.2m


Infer batch 3700/3733 | elapsed 117.8m


Wrote submission_cnn_strong_tauNone.csv rows: 477806


Wrote submission_cnn_strong_tau0_2.csv rows: 477806


Wrote submission_cnn_strong_tau0_3.csv rows: 477806


Wrote submission_cnn_strong_tau0_4.csv rows: 477806


Wrote submission_cnn_strong_tau0_5.csv rows: 477806


Wrote submission_cnn_strong_tau0_6.csv rows: 477806


Wrote submission_cnn_strong_tau0_7.csv rows: 477806


Wrote submission_cnn_strong_tau0_8.csv rows: 477806
Strong inference outputs: {None: 'submission_cnn_strong_tauNone.csv', 0.2: 'submission_cnn_strong_tau0_2.csv', 0.3: 'submission_cnn_strong_tau0_3.csv', 0.4: 'submission_cnn_strong_tau0_4.csv', 0.5: 'submission_cnn_strong_tau0_5.csv', 0.6: 'submission_cnn_strong_tau0_6.csv', 0.7: 'submission_cnn_strong_tau0_7.csv', 0.8: 'submission_cnn_strong_tau0_8.csv'}


In [28]:
# Select strong inference submission (tau=0.5 primary; tau=0.7 as fallback if desired)
import shutil, os, pandas as pd
cand_primary = 'submission_cnn_strong_tau0_5.csv'
cand_secondary = 'submission_cnn_strong_tau0_7.csv'
choose = cand_primary if os.path.exists(cand_primary) else (cand_secondary if os.path.exists(cand_secondary) else None)
assert choose is not None, 'No strong inference submission found'
shutil.copyfile(choose, 'submission.csv')
print(f'Copied {choose} -> submission.csv')
print(pd.read_csv('submission.csv').head())

Copied submission_cnn_strong_tau0_5.csv -> submission.csv
   Id  Predicted
0   0      18454
1   1       5434
2   2      21267
3   3      10029
4   4      63426
