# Herbarium 2021 - FGVC8: Medal Plan v1

Objectives:
- Establish a fast, correct baseline and produce a valid submission quickly.
- Build a reliable CV scheme that mirrors the test distribution, then iterate toward a medal with stronger backbones, augmentations, and ensembling.

Environment:
- GPU confirmed available (NVIDIA A10-24Q, 24 GB, driver reports CUDA 12.4). Install the torch cu121 build plus timm and albumentations.

Data understanding:
- Train: 1,779,953 images across 64,500 categories (long tail). Test: 477,806 images.
- Metadata: COCO-like lists for images and annotations; each annotation carries category_id and institution_id.
- Images are stored under train/images/<shard>/<subshard>/<id>.jpg and test/images/<shard>/<id>.jpg.

Validation:
- Stratified K-fold on category_id. Start with a single-fold holdout as a smoke test, then run all 5 folds.
- Track macro-F1 OOF; save per-class stats; ensure deterministic splits saved to disk.
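
The fold assignment described above could be sketched as follows. This is a dependency-free illustration, not the notebook's actual split code: `assign_folds`, the seed, and the toy labels are hypothetical, and scikit-learn's `StratifiedKFold` is the more usual choice. Each class is shuffled and dealt round-robin across folds, so classes with fewer samples than folds are still spread out instead of piling into one fold.

```python
import random
from collections import defaultdict


def assign_folds(labels, n_splits=5, seed=42):
    """Return one fold index per sample, spreading every class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [0] * len(labels)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Random starting fold so rare classes do not all land in fold 0
        offset = rng.randrange(n_splits)
        for j, i in enumerate(idxs):
            folds[i] = (offset + j) % n_splits
    return folds


# Toy labels (made up): class 0 has 10 samples, class 1 only 3
labels = [0] * 10 + [1] * 3
folds = assign_folds(labels, n_splits=5, seed=42)
print(folds)
```

Writing the returned list to disk once (for example with `json.dump`) and reloading it in later runs keeps the split identical across sessions.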

Baseline model:
- timm pretrained classifier; start convnext_tiny/efficientnet_b0 at 224px, AMP, label smoothing.
- Augs: RandomResizedCrop(224), HFlip, ColorJitter light. BalancedSampler or class-weighted loss.
- Loss: CrossEntropy with label smoothing; consider focal loss if needed.
- Optim: AdamW, cosine LR, warmup; early stopping; EMA optional.
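
As an illustration of the "cosine LR, warmup" item, here is a minimal schedule function. The name `lr_at_step` and all default values are assumptions chosen for the example, not settings taken from this plan.

```python
import math


def lr_at_step(step, total_steps, base_lr=1e-3, warmup_steps=500, min_lr=1e-6):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly; the last warmup step reaches base_lr
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical run of 1000 optimizer steps with 100 warmup steps
schedule = [lr_at_step(s, total_steps=1000, warmup_steps=100) for s in range(1000)]
print(schedule[0], schedule[99], schedule[-1])
```

In PyTorch, a function like this is typically divided by `base_lr` and passed to `torch.optim.lr_scheduler.LambdaLR` as the multiplicative factor; that wiring is left out here to keep the sketch dependency-free.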

Scaling:
- Increase resolution to 384/448; stronger augs (RandAug, MixUp/CutMix).
- Backbones: convnext_base/large; EfficientNetV2 S/M (tf_efficientnetv2_s/m in timm naming); eva02_tiny if available.
- Class reweighting/sampling for long-tail; fine-tune head+last stages first, then full.
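
One possible form of the class reweighting mentioned above is "effective number of samples" weighting, the idea behind class-balanced (CB) loss. The sketch below is an assumption about how it might be applied here; `class_balanced_weights` and `beta=0.999` are illustrative choices, not values from this plan.

```python
from collections import Counter


def class_balanced_weights(labels, beta=0.999):
    """Per-class weight proportional to (1 - beta) / (1 - beta ** n_c)."""
    counts = Counter(labels)
    raw = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}
    # Rescale so the weights average to 1 across classes
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}


# Toy long-tail example (made up): a head class and a tail class
toy_labels = [0] * 1000 + [1] * 10
weights = class_balanced_weights(toy_labels)
print(weights)
```

The resulting dictionary can be turned into a per-class weight vector for a weighted cross-entropy, or into per-sample weights for a weighted sampler. Using both at once tends to over-correct, so it is usually one or the other.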

Efficiency:
- Cache train/test filelists and class map; num_workers tuned; pin_memory; prefetch factor.
- Log elapsed times per epoch/fold; save OOF logits and test logits to .npy for blending.
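
A minimal sketch of the cached filelist and class map, assuming the COCO-like layout described under "Data understanding". `build_filelist` is a hypothetical helper; the first two toy records mirror entries from the train metadata, and the third is invented to give the example a second class.

```python
import csv
import io


def build_filelist(meta):
    """Join images with annotations and map category_id to a contiguous label."""
    cat_ids = sorted({a['category_id'] for a in meta['annotations']})
    cat_to_idx = {c: i for i, c in enumerate(cat_ids)}
    img_to_cat = {a['image_id']: a['category_id'] for a in meta['annotations']}
    rows = []
    for img in meta['images']:
        cat = img_to_cat[img['id']]
        rows.append((img['file_name'], cat, cat_to_idx[cat]))
    return rows, cat_to_idx


toy_meta = {
    'images': [
        {'file_name': 'images/604/92/1608432.jpg', 'id': 1608432},
        {'file_name': 'images/604/92/796948.jpg', 'id': 796948},
        {'file_name': 'images/000/00/1.jpg', 'id': 1},  # invented record
    ],
    'annotations': [
        {'image_id': 1608432, 'category_id': 60492},
        {'image_id': 796948, 'category_id': 60492},
        {'image_id': 1, 'category_id': 7},  # invented record
    ],
}

rows, cat_to_idx = build_filelist(toy_meta)

# Serialise to CSV text; in practice this would be written to a file once and reloaded
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['file_name', 'category_id', 'label'])
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

Storing `cat_to_idx` next to the filelist matters as much as the filelist itself, because the inverse map is needed to turn predicted labels back into category_id values at submission time.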

Submission:
- Predict the top-1 category_id per test image, decoded from the model's contiguous label index; ensure the rows align with the sample_submission Id order.
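
The alignment step could look roughly like the following. This is a sketch under assumptions: `build_submission` and the toy inputs are hypothetical, test image ids are taken to be numeric strings (as in the test metadata), and `idx_to_cat` is the inverse of whatever label encoding was used for training.

```python
import csv
import io


def build_submission(test_images, pred_idx, idx_to_cat, sample_ids):
    """Decode predicted label indices and order rows like sample_submission."""
    if len(test_images) != len(pred_idx):
        raise ValueError('one prediction per test image is required')
    by_id = {int(img['id']): idx_to_cat[p] for img, p in zip(test_images, pred_idx)}
    if len(by_id) != len(test_images):
        raise ValueError('duplicate test image ids')
    missing = [i for i in sample_ids if i not in by_id]
    if missing:
        raise ValueError(f'{len(missing)} sample_submission ids have no prediction')
    return [(i, by_id[i]) for i in sample_ids]


# Toy inputs (made up); the images are deliberately listed out of Id order
test_images = [{'id': '1'}, {'id': '0'}]
pred_idx = [1, 0]
idx_to_cat = {0: 7, 1: 60492}
sample_ids = [0, 1]

sub_rows = build_submission(test_images, pred_idx, idx_to_cat, sample_ids)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['Id', 'Predicted'])
writer.writerows(sub_rows)
print(buf.getvalue())
```

Keying on the image id instead of the list position means a reordered metadata file cannot silently shift every prediction by a row.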

Expert questions:
- Best-performing backbones/resolutions for this dataset under time constraints?
- Preferred sampling vs loss strategy for long-tail (CB loss vs focal vs LDAM)?
- Any known metadata pitfalls (e.g., institution leakage) and recommended CV protocol?
- Ensembling approaches that moved LB here?

Next steps:
1) Install exact cu121 torch stack + deps.
2) Build filelist from metadata, label encoder, and a tiny dataloader smoke run.
3) Train a 1-fold baseline at 224px; produce a first submission.
4) Request expert review before scaling training.

In [2]:
import os, json, time, subprocess, sys, shutil
from pathlib import Path

def run(cmd):
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True).stdout

print('Checking GPU with nvidia-smi...\n', flush=True)
print(run(['bash','-lc','nvidia-smi || true']))

root = Path.cwd()
print('CWD:', root)

data_root = root
train_dir = data_root / 'train'
test_dir = data_root / 'test'
print('Train dir exists:', train_dir.exists(), ' Test dir exists:', test_dir.exists())

# List top-level files
print('Top-level files:', [p.name for p in data_root.iterdir()])

# Inspect metadata files
def load_json(p):
    with open(p, 'r') as f:
        return json.load(f)

train_meta_path = train_dir / 'metadata.json'
test_meta_path = test_dir / 'metadata.json'
train_meta = load_json(train_meta_path)
test_meta = load_json(test_meta_path)

def brief_meta(meta, name):
    keys = list(meta.keys())[:10]
    print(f'{name} keys (first 10):', keys)
    # Common fields often include annotations, images, licenses, dataset, etc.
    for k in ['annotations','images','licenses','info','categories','species','image_dir','dataset','annotations_path']:
        if k in meta:
            v = meta[k]
            if isinstance(v, list):
                print(f'  {k}: list len={len(v)}')
            elif isinstance(v, dict):
                print(f'  {k}: dict keys={list(v.keys())[:10]}')
            else:
                print(f'  {k}: type={type(v).__name__}')

print('\nTrain metadata:')
brief_meta(train_meta, 'train_meta')
print('\nTest metadata:')
brief_meta(test_meta, 'test_meta')

# Sample few image subfolders
train_img_root = train_dir / 'images'
test_img_root = test_dir / 'images'
def sample_dirs(p, n=5):
    if not p.exists():
        return []
    return [d.name for d in list(p.iterdir()) if d.is_dir()][:n]

print('\nSample train image subdirs:', sample_dirs(train_img_root))
print('Sample test image subdirs:', sample_dirs(test_img_root))

# Check sample_submission format
ss_path = data_root / 'sample_submission.csv'
if ss_path.exists():
    import pandas as pd
    ss = pd.read_csv(ss_path)
    print('\nsample_submission.csv head:')
    print(ss.head())
    print('Columns:', ss.columns.tolist(), ' Shape:', ss.shape)
else:
    print('sample_submission.csv not found!')

print('\nEnvironment check complete. If GPU missing above, we will exit in the next step.')

Checking GPU with nvidia-smi...



Mon Sep 29 03:39:57 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     182MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                


Train metadata:
train_meta keys (first 10): ['annotations', 'categories', 'images', 'info', 'institutions', 'licenses']
  annotations: list len=1779953
  images: list len=1779953
  licenses: list len=3
  info: dict keys=['contributor', 'date_created', 'description', 'url', 'version', 'year']
  categories: list len=64500

Test metadata:
test_meta keys (first 10): ['images', 'info', 'licenses']
  images: list len=477806
  licenses: list len=3
  info: dict keys=['contributor', 'date_created', 'description', 'url', 'version', 'year']

Sample train image subdirs: ['544', '036', '392', '519', '110']
Sample test image subdirs: ['036', '392', '110', '037', '375']



sample_submission.csv head:
   Id  Predicted
0   0          0
1   1          0
2   2          0
3   3          0
4   4          0
Columns: ['Id', 'Predicted']  Shape: (477806, 2)

Environment check complete. If GPU missing above, we will exit in the next step.


In [9]:
import random
from PIL import Image

def verify_image_paths(meta, base_dir, n=8, name='train'):
    """Print whether up to n sampled image paths exist; open the first existing one as a decode check."""
    print(f'\nVerifying {name} image paths...')
    # Always include index 0, then add distinct random indices until n are collected
    idxs = list(range(min(len(meta['images']), 1)))
    while len(idxs) < n and len(idxs) < len(meta['images']):
        r = random.randrange(len(meta['images']))
        if r not in idxs:
            idxs.append(r)
    ok = 0  # number of images opened successfully (at most 1)
    for i in idxs:
        rel = meta['images'][i]['file_name']
        # file_name is relative to split root (e.g., 'images/604/92/xxxx.jpg')
        if rel.startswith('images/'):
            p = base_dir / rel
        else:
            # Fallback: assume file_name is relative to the images/ directory
            p = (base_dir / 'images') / rel
        exists = p.exists()
        print(f'  idx={i} rel={rel} -> path={p} exists={exists}')
        # Only the first existing file is opened, to keep the check cheap
        if exists and ok == 0:
            try:
                with Image.open(p) as im:
                    print('    opened:', im.size, im.mode)
                ok += 1
            except Exception as e:
                print('    open failed:', e)

verify_image_paths(train_meta, train_dir, n=8, name='train')
verify_image_paths(test_meta, test_dir, n=8, name='test')


Verifying train image paths...
  idx=0 rel=images/604/92/1608432.jpg -> path=/var/lib/simon/agent_run_states/herbarium-2021-fgvc8-20250929-022751/train/images/604/92/1608432.jpg exists=True
    opened: (671, 1000) RGB
  idx=199934 rel=images/577/03/2213991.jpg -> path=/var/lib/simon/agent_run_states/herbarium-2021-fgvc8-20250929-022751/train/images/577/03/2213991.jpg exists=True
  idx=339915 rel=images/422/13/1522494.jpg -> path=/var/lib/simon/agent_run_states/herbarium-2021-fgvc8-20250929-022751/train/images/422/13/1522494.jpg exists=True
  idx=541572 rel=images/589/41/386076.jpg -> path=/var/lib/simon/agent_run_states/herbarium-2021-fgvc8-20250929-022751/train/images/589/41/386076.jpg exists=True
  idx=299819 rel=images/291/50/1564602.jpg -> path=/var/lib/simon/agent_run_states/herbarium-2021-fgvc8-20250929-022751/train/images/291/50/1564602.jpg exists=True
  idx=384492 rel=images/433/13/2124494.jpg -> path=/var/lib/simon/agent_run_states/herbarium-2021-fgvc8-20250929-022751/train/i

In [4]:
import pandas as pd
from collections import Counter

# Fallback baseline: predict the most frequent category_id from train annotations for all test rows
ann = train_meta['annotations']
cnt = Counter(a['category_id'] for a in ann)
top_cat, top_freq = cnt.most_common(1)[0]
print('Most frequent category_id:', top_cat, 'freq:', top_freq)

# Build submission in test image order. This assumes Id equals the row index in test_meta['images'];
# verify against int(img['id']) before relying on it.
n_test = len(test_meta['images'])
sub = pd.DataFrame({'Id': range(n_test), 'Predicted': [top_cat]*n_test})
sub_path = Path('submission.csv')
sub.to_csv(sub_path, index=False)
print('Wrote', sub_path.resolve(), 'shape:', sub.shape)
print(sub.head())

Most frequent category_id: 42811 freq: 2647


Wrote /var/lib/simon/agent_run_states/herbarium-2021-fgvc8-20250929-022751/submission.csv shape: (477806, 2)
   Id  Predicted
0   0      42811
1   1      42811
2   2      42811
3   3      42811
4   4      42811


In [5]:
import itertools
import collections
import pandas as pd

def inspect_image_records(meta, name='train', n=3):
    imgs = meta['images']
    print(f'\n{name} images[0] keys:', list(imgs[0].keys()))
    for i in range(min(n, len(imgs))):
        rec = imgs[i]
        sample = {k: rec[k] for k in list(rec.keys())[:10]}
        print(f'  {name}[{i}] sample:', sample)
    # key presence frequencies (aggregated over the first 5000 image records)
    key_counts = collections.Counter(itertools.chain.from_iterable([list(d.keys()) for d in imgs[:5000]]))
    print(f'{name} key frequencies (top 20):', key_counts.most_common(20))

def inspect_annotation_records(meta, name='train', n=3):
    if 'annotations' not in meta:
        print(f'\n{name} has no annotations')
        return
    anns = meta['annotations']
    print(f'\n{name} annotations[0] keys:', list(anns[0].keys()))
    for i in range(min(n, len(anns))):
        rec = anns[i]
        sample = {k: rec[k] for k in list(rec.keys())[:10]}
        print(f'  {name}_ann[{i}] sample:', sample)

inspect_image_records(train_meta, 'train', n=2)
inspect_annotation_records(train_meta, 'train', n=2)
inspect_image_records(test_meta, 'test', n=2)


train images[0] keys: ['file_name', 'height', 'id', 'license', 'width']
  train[0] sample: {'file_name': 'images/604/92/1608432.jpg', 'height': 1000, 'id': 1608432, 'license': 0, 'width': 671}
  train[1] sample: {'file_name': 'images/604/92/796948.jpg', 'height': 1000, 'id': 796948, 'license': 0, 'width': 678}
train key frequencies (top 20): [('file_name', 5000), ('height', 5000), ('id', 5000), ('license', 5000), ('width', 5000)]

train annotations[0] keys: ['category_id', 'id', 'image_id', 'institution_id']
  train_ann[0] sample: {'category_id': 60492, 'id': 1608432, 'image_id': 1608432, 'institution_id': 0}
  train_ann[1] sample: {'category_id': 60492, 'id': 796948, 'image_id': 796948, 'institution_id': 0}

test images[0] keys: ['file_name', 'height', 'id', 'license', 'width']
  test[0] sample: {'file_name': 'images/000/0.jpg', 'height': 1000, 'id': '0', 'license': 0, 'width': 666}
  test[1] sample: {'file_name': 'images/000/1.jpg', 'height': 1000, 'id': '1', 'license': 0, 'width': 

In [6]:
import pandas as pd
from collections import defaultdict, Counter

# Build (width,height) -> top category_id mapping from train
img_recs = {img['id']: (img['width'], img['height']) for img in train_meta['images']}
wh_to_cats = defaultdict(Counter)
for ann in train_meta['annotations']:
    img_id = ann['image_id']
    wh = img_recs.get(img_id, None)
    if wh is None:
        continue
    wh_to_cats[wh][ann['category_id']] += 1

# For each (w,h), pick most common category
wh_to_topcat = {wh: cnt.most_common(1)[0][0] for wh, cnt in wh_to_cats.items()}
print('Unique (w,h) keys:', len(wh_to_topcat))

# Global fallback
global_cnt = Counter(a['category_id'] for a in train_meta['annotations'])
global_top = global_cnt.most_common(1)[0][0]
print('Global top category:', global_top)

# Build submission using test (w,h) -> mapped category, else global
preds = []
for rec in test_meta['images']:
    wh = (rec['width'], rec['height'])
    cat = wh_to_topcat.get(wh, global_top)
    preds.append(cat)

sub_wh = pd.DataFrame({'Id': range(len(preds)), 'Predicted': preds})
sub_wh.to_csv('submission.csv', index=False)
print('submission.csv written with (w,h)-based priors. Shape:', sub_wh.shape)
print(sub_wh.head())

Unique (w,h) keys: 507
Global top category: 42811


submission.csv written with (w,h)-based priors. Shape: (477806, 2)
   Id  Predicted
0   0      42811
1   1      25497
2   2      49885
3   3      42811
4   4       5954


In [7]:
# Improved metadata-only baseline: backoff (w,h) -> w -> global
import pandas as pd
from collections import defaultdict, Counter

img_recs = {img['id']: (img['width'], img['height']) for img in train_meta['images']}
wh_to_cats = defaultdict(Counter)
w_to_cats = defaultdict(Counter)
for ann in train_meta['annotations']:
    img_id = ann['image_id']
    wh = img_recs.get(img_id, None)
    if wh is None:
        continue
    w,h = wh
    wh_to_cats[wh][ann['category_id']] += 1
    w_to_cats[w][ann['category_id']] += 1

wh_to_top = {k: v.most_common(1)[0][0] for k,v in wh_to_cats.items()}
w_to_top = {k: v.most_common(1)[0][0] for k,v in w_to_cats.items()}
global_top = Counter(a['category_id'] for a in train_meta['annotations']).most_common(1)[0][0]
print('Sizes: wh_keys', len(wh_to_top), ' w_keys', len(w_to_top), ' global_top', global_top)

preds = []
for rec in test_meta['images']:
    w,h = rec['width'], rec['height']
    cat = wh_to_top.get((w,h))
    if cat is None:
        cat = w_to_top.get(w, global_top)
    preds.append(cat)

sub = pd.DataFrame({'Id': range(len(preds)), 'Predicted': preds})
sub.to_csv('submission.csv', index=False)
print('Wrote submission.csv with backoff strategy. Shape:', sub.shape)
print(sub.head())

Sizes: wh_keys 507  w_keys 446  global_top 42811


Wrote submission.csv with backoff strategy. Shape: (477806, 2)
   Id  Predicted
0   0      42811
1   1      25497
2   2      49885
3   3      42811
4   4       5954


In [8]:
# Metadata-only baseline v3: infer institution by (w,h) -> predict institution's top category, with backoffs
import pandas as pd
from collections import defaultdict, Counter

# Build image_id -> (w,h)
img_wh = {img['id']: (img['width'], img['height']) for img in train_meta['images']}

# Build (w,h) -> institution counts and institution -> category counts
wh_to_inst = defaultdict(Counter)
inst_to_cat = defaultdict(Counter)
for ann in train_meta['annotations']:
    img_id = ann['image_id']
    wh = img_wh.get(img_id)
    if wh is None:
        continue
    inst = ann.get('institution_id', None)
    if inst is not None:
        wh_to_inst[wh][inst] += 1
        inst_to_cat[inst][ann['category_id']] += 1

# Top institution per (w,h) and top category per institution
wh_to_top_inst = {wh: cnt.most_common(1)[0][0] for wh, cnt in wh_to_inst.items()}
inst_to_top_cat = {inst: cnt.most_common(1)[0][0] for inst, cnt in inst_to_cat.items()}

# Backoff tables from prior baseline
w_to_cats = defaultdict(Counter)
for ann in train_meta['annotations']:
    wh = img_wh.get(ann['image_id'])
    if wh is None:
        continue
    w_to_cats[wh[0]][ann['category_id']] += 1
w_to_top = {k: v.most_common(1)[0][0] for k,v in w_to_cats.items()}
global_top = Counter(a['category_id'] for a in train_meta['annotations']).most_common(1)[0][0]

# Predict for test
preds = []
for rec in test_meta['images']:
    wh = (rec['width'], rec['height'])
    cat = None
    inst = wh_to_top_inst.get(wh)
    if inst is not None:
        cat = inst_to_top_cat.get(inst)
    if cat is None:
        # backoff to width-only
        cat = w_to_top.get(wh[0])
    if cat is None:
        cat = global_top
    preds.append(cat)

sub = pd.DataFrame({'Id': range(len(preds)), 'Predicted': preds})
sub.to_csv('submission.csv', index=False)
print('Wrote submission.csv using inferred institution-by-(w,h) with backoffs. Shape:', sub.shape)
print(sub.head())
print('Tables: wh_to_top_inst:', len(wh_to_top_inst), ' inst_to_top_cat:', len(inst_to_top_cat), ' w_keys:', len(w_to_top))

Wrote submission.csv using inferred institution-by-(w,h) with backoffs. Shape: (477806, 2)
   Id  Predicted
0   0      42811
1   1      42811
2   2      42811
3   3      42811
4   4      42811
Tables: wh_to_top_inst: 507  inst_to_top_cat: 5  w_keys: 446
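
With only 5 entries in inst_to_top_cat, the institution path above can emit at most 5 distinct categories, and the head of the output shows the global top category for every row. A quick summary of the prediction distribution, run before overwriting submission.csv, would flag this kind of collapse. `prediction_summary` and the toy list are illustrative names, not part of the notebook.

```python
from collections import Counter


def prediction_summary(preds):
    """Report how concentrated a list of predicted categories is."""
    counts = Counter(preds)
    top_cat, top_n = counts.most_common(1)[0]
    return {
        'n': len(preds),
        'distinct': len(counts),
        'top': top_cat,
        'top_share': top_n / len(preds),
    }


# Toy predictions (made up): four of five rows share one category
toy_preds = [42811, 42811, 42811, 42811, 25497]
summary = prediction_summary(toy_preds)
print(summary)
```

Comparing `distinct` and `top_share` between baseline variants gives a cheap signal of whether a new variant is more or less degenerate than the one it replaces.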


In [10]:
# Install CUDA 12.1 torch stack and deps; verify GPU
import os, sys, subprocess, shutil
from pathlib import Path

def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

# Uninstall any existing torch stack (idempotent)
for pkg in ('torch','torchvision','torchaudio'):
    subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', pkg], check=False)

# Clean stray site dirs that can shadow correct wheels
for d in (
    '/app/.pip-target/torch',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torch-2.8.0.dist-info',
    '/app/.pip-target/torchvision-0.23.0.dist-info',
    '/app/.pip-target/torchaudio-2.8.0.dist-info',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
    '/app/.pip-target/torchgen',
    '/app/.pip-target/functorch',
):
    if os.path.exists(d):
        print('Removing', d, flush=True)
        shutil.rmtree(d, ignore_errors=True)

# Install exact cu121 torch stack
pip('install',
    '--index-url', 'https://download.pytorch.org/whl/cu121',
    '--extra-index-url', 'https://pypi.org/simple',
    'torch==2.4.1', 'torchvision==0.19.1', 'torchaudio==2.4.1')

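# Pin torch versions in a constraints file so later installs cannot change them.
# Caveat: a bare 'torch==2.4.1' pin is also satisfied by the PyPI wheel that lacks the '+cu121' local tag.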
Path('constraints.txt').write_text(
    'torch==2.4.1\n'
    'torchvision==0.19.1\n'
    'torchaudio==2.4.1\n'
)

# Install non-torch deps without upgrading torch
pip('install', '-c', 'constraints.txt',
    'timm==1.0.9',
    'albumentations==1.4.10',
    'opencv-python-headless==4.10.0.84',
    'pandas', 'scikit-learn', 'numpy', 'tqdm',
    '--upgrade-strategy', 'only-if-needed')

import torch
print('torch:', torch.__version__, 'built CUDA:', getattr(torch.version, 'cuda', None), flush=True)
print('CUDA available:', torch.cuda.is_available(), flush=True)
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0), flush=True)
assert str(getattr(torch.version,'cuda','')).startswith('12.1'), f'Wrong CUDA build: {torch.version.cuda}'
assert torch.cuda.is_available(), 'CUDA not available'
print('Torch stack installed and verified.', flush=True)

> pip install --index-url https://download.pytorch.org/whl/cu121 --extra-index-url https://pypi.org/simple torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
Looking in indexes: https://download.pytorch.org/whl/cu121, https://pypi.org/simple
Collecting torch==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torch-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (799.0 MB)
Collecting torchvision==0.19.1
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.19.1%2Bcu121-cp311-cp311-linux_x86_64.whl (7.1 MB)
Collecting torchaudio==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (3.4 MB)
Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
Collecting fsspec
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)
Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)
Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)
Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
Collecting pillow!=8.3.*,>=5.3.0
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
Collecting numpy
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)
Collecting MarkupSafe>=2.0
  Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB)
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio
Successfully installed MarkupSafe-3.0.3 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.3.0 sympy-1.14.0 torch-2.4.1+cu121 torchaudio-2.4.1+cu121 torchvision-0.19.1+cu121 triton-3.0.0 typing-extensions-4.15.0


> pip install -c constraints.txt timm==1.0.9 albumentations==1.4.10 opencv-python-headless==4.10.0.84 pandas scikit-learn numpy tqdm --upgrade-strategy only-if-needed
Collecting timm==1.0.9
  Downloading timm-1.0.9-py3-none-any.whl (2.3 MB)
Collecting albumentations==1.4.10
  Downloading albumentations-1.4.10-py3-none-any.whl (161 kB)
Collecting opencv-python-headless==4.10.0.84
  Downloading opencv_python_headless-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.9 MB)
Collecting pandas
  Downloading pandas-2.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)
Collecting scikit-learn
  Downloading scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (9.7 MB)
Collecting numpy
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Collecting tqdm
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Collecting huggingface_hub
  Downloading huggingface_hub-0.35.1-py3-none-any.whl (563 kB)
Collecting pyyaml
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
Collecting torch
  Downloading torch-2.4.1-cp311-cp311-manylinux1_x86_64.whl (797.1 MB)
Collecting safetensors
  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
Collecting torchvision
  Downloading torchvision-0.19.1-cp311-cp311-manylinux1_x86_64.whl (7.0 MB)
Collecting scipy>=1.10.0
  Downloading scipy-1.16.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (35.9 MB)
Collecting albucore>=0.0.11
  Downloading albucore-0.0.33-py3-none-any.whl (18 kB)
Collecting pydantic>=2.7.0
  Downloading pydantic-2.11.9-py3-none-any.whl (444 kB)
Collecting typing-extensions>=4.9.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
Collecting scikit-image>=0.21.0
  Downloading scikit_image-0.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.8 MB)
Collecting python-dateutil>=2.8.2
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting tzdata>=2022.7
  Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Collecting pytz>=2020.1
  Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
Collecting simsimd>=5.9.2
  Downloading simsimd-6.5.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (1.1 MB)
Collecting stringzilla>=3.10.4
  Downloading stringzilla-4.0.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (496 kB)
Collecting annotated-types>=0.6.0
  Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)
Collecting typing-inspection>=0.4.0
  Downloading typing_inspection-0.4.1-py3-none-any.whl (14 kB)
Collecting pydantic-core==2.33.2
  Downloading pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Collecting six>=1.5
  Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)
Collecting imageio!=2.35.0,>=2.33
  Downloading imageio-2.37.0-py3-none-any.whl (315 kB)
Collecting tifffile>=2022.8.12
  Downloading tifffile-2025.9.20-py3-none-any.whl (230 kB)
Collecting lazy-loader>=0.4
  Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)
Collecting networkx>=3.0
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
Collecting packaging>=21
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
Collecting pillow>=10.1
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
Collecting fsspec>=2023.5.0
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting hf-xet<2.0.0,>=1.1.3
  Downloading hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)
Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 165.5 MB/s eta 0:00:00


Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 224.1 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 492.8 MB/s eta 0:00:00


Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 196.8 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 446.9 MB/s eta 0:00:00


Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 77.1 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB)
Collecting certifi>=2017.4.17
  Downloading certifi-2025.8.3-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 524.2 MB/s eta 0:00:00
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.8/129.8 KB 466.6 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Downloading idna-3.10-py3-none-any.whl (70 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 KB 450.7 MB/s eta 0:00:00


Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (150 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.3/150.3 KB 512.9 MB/s eta 0:00:00
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 539.6 MB/s eta 0:00:00


Installing collected packages: simsimd, pytz, mpmath, urllib3, tzdata, typing-extensions, tqdm, threadpoolctl, sympy, stringzilla, six, safetensors, pyyaml, pillow, packaging, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, joblib, idna, hf-xet, fsspec, filelock, charset_normalizer, certifi, annotated-types, typing-inspection, triton, tifffile, scipy, requests, python-dateutil, pydantic-core, opencv-python-headless, nvidia-cusparse-cu12, nvidia-cudnn-cu12, lazy-loader, jinja2, imageio, scikit-learn, scikit-image, pydantic, pandas, nvidia-cusolver-cu12, huggingface_hub, albucore, torch, albumentations, torchvision, timm


Successfully installed MarkupSafe-3.0.3 albucore-0.0.33 albumentations-1.4.10 annotated-types-0.7.0 certifi-2025.8.3 charset_normalizer-3.4.3 filelock-3.19.1 fsspec-2025.9.0 hf-xet-1.1.10 huggingface_hub-0.35.1 idna-3.10 imageio-2.37.0 jinja2-3.1.6 joblib-1.5.2 lazy-loader-0.4 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 opencv-python-headless-4.10.0.84 packaging-25.0 pandas-2.3.2 pillow-11.3.0 pydantic-2.11.9 pydantic-core-2.33.2 python-dateutil-2.9.0.post0 pytz-2025.2 pyyaml-6.0.3 requests-2.32.5 safetensors-0.6.2 scikit-image-0.25.2 scikit-learn-1.7.2 scipy-1.16.2 simsimd-6.5.3 six-1.17.0 stringzilla-4.0.14 sympy-1.14.0 threadpoolctl-3.6.0 







torch: 2.4.1+cu121 built CUDA: 12.1


CUDA available: True


GPU: NVIDIA A10-24Q


Torch stack installed and verified.


In [11]:
# Build train/test DataFrames with file paths and label mapping; 80/20 stratified split
import pandas as pd
from collections import defaultdict
from pathlib import Path
from sklearn.model_selection import StratifiedShuffleSplit
import json, time

t0 = time.time()
print('Building train dataframe from metadata...')

# Image info map: image_id -> (path, width, height)
img_map = {}
for rec in train_meta['images']:
    img_map[rec['id']] = (train_dir / rec['file_name'], rec['width'], rec['height'])

paths = []
widths = []
heights = []
cat_ids = []
img_ids = []
inst_ids = []

ann = train_meta['annotations']
for a in ann:
    iid = a['image_id']
    pwh = img_map.get(iid)
    if pwh is None:
        continue
    p, w, h = pwh
    paths.append(str(p))
    widths.append(w)
    heights.append(h)
    cat_ids.append(a['category_id'])
    img_ids.append(iid)
    inst_ids.append(a.get('institution_id', -1))

train_df = pd.DataFrame({
    'image_id': img_ids,
    'file_path': paths,
    'width': widths,
    'height': heights,
    'category_id': cat_ids,
    'institution_id': inst_ids,
})
print('train_df shape:', train_df.shape, 'unique cats:', train_df['category_id'].nunique())
print(train_df.head())

# Label mapping cat_id -> idx and inverse
unique_cats = sorted(train_df['category_id'].unique())
cat2idx = {c:i for i,c in enumerate(unique_cats)}
idx2cat = {i:c for c,i in cat2idx.items()}
train_df['label_idx'] = train_df['category_id'].map(cat2idx)
print('Label space size C =', len(unique_cats))

# Save mappings
with open('cat2idx.json','w') as f:
    json.dump({str(k): int(v) for k,v in cat2idx.items()}, f)
with open('idx2cat.json','w') as f:
    json.dump({str(k): int(v) for k,v in idx2cat.items()}, f)
print('Saved cat2idx.json and idx2cat.json')

print('Building test dataframe...')
t_paths = []
t_widths = []
t_heights = []
t_ids = []
for rec in test_meta['images']:
    t_paths.append(str(test_dir / rec['file_name']))
    t_widths.append(rec['width'])
    t_heights.append(rec['height'])
    t_ids.append(int(rec['id']))
test_df = pd.DataFrame({
    'Id': t_ids,
    'file_path': t_paths,
    'width': t_widths,
    'height': t_heights,
})
test_df = test_df.sort_values('Id').reset_index(drop=True)
print('test_df shape:', test_df.shape)
print(test_df.head())

# 80/20 stratified split
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(sss.split(train_df, train_df['label_idx']))
trn_split = train_df.iloc[train_idx].reset_index(drop=True)
val_split = train_df.iloc[val_idx].reset_index(drop=True)
print('Split sizes:', trn_split.shape, val_split.shape)
print('Class counts (train split) top5:\n', trn_split['label_idx'].value_counts().head())
print('Class counts (val split) top5:\n', val_split['label_idx'].value_counts().head())

# Persist splits to speed up future runs
trn_split.to_pickle('train_split.pkl')
val_split.to_pickle('val_split.pkl')
test_df.to_pickle('test_df.pkl')
print('Saved splits to train_split.pkl, val_split.pkl, test_df.pkl')
print('Done. Elapsed: %.1f s' % (time.time()-t0))

Building train dataframe from metadata...


train_df shape: (1779953, 6) unique cats: 64500
   image_id                                          file_path  width  height  \
0   1608432  /var/lib/simon/agent_run_states/herbarium-2021...    671    1000   
1    796948  /var/lib/simon/agent_run_states/herbarium-2021...    678    1000   
2    994447  /var/lib/simon/agent_run_states/herbarium-2021...    671    1000   
3   1110611  /var/lib/simon/agent_run_states/herbarium-2021...    678    1000   
4   1769046  /var/lib/simon/agent_run_states/herbarium-2021...    678    1000   

   category_id  institution_id  
0        60492               0  
1        60492               0  
2        60492               0  
3        60492               0  
4        60492               0  
Label space size C = 64500
Saved cat2idx.json and idx2cat.json
Building test dataframe...


test_df shape: (477806, 4)
   Id                                          file_path  width  height
0   0  /var/lib/simon/agent_run_states/herbarium-2021...    666    1000
1   1  /var/lib/simon/agent_run_states/herbarium-2021...    672    1000
2   2  /var/lib/simon/agent_run_states/herbarium-2021...    669    1000
3   3  /var/lib/simon/agent_run_states/herbarium-2021...    666    1000
4   4  /var/lib/simon/agent_run_states/herbarium-2021...    675    1000


ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

In [12]:
# Safe 80/20 split handling rare classes (freq < 2 go to train only), then persist
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit
import numpy as np
import time

t0 = time.time()
print('Creating safe stratified split (classes with <2 samples kept in train only)...')

# Compute class frequencies
freq = train_df['label_idx'].value_counts()
mask_strat = train_df['label_idx'].map(freq) >= 2
df_strat = train_df[mask_strat].reset_index(drop=True)
df_rare = train_df[~mask_strat].reset_index(drop=True)
print('Stratifiable samples:', df_strat.shape[0], ' Rare samples:', df_rare.shape[0])

# Stratified split on classes with at least 2 samples
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
tr_idx, va_idx = next(sss.split(df_strat, df_strat['label_idx']))
tr_strat = df_strat.iloc[tr_idx].reset_index(drop=True)
va_strat = df_strat.iloc[va_idx].reset_index(drop=True)

# Combine: put all rare classes into train only
trn_split = pd.concat([tr_strat, df_rare], axis=0).reset_index(drop=True)
val_split = va_strat

print('Final split sizes:', trn_split.shape, val_split.shape)
print('Train unique classes:', trn_split['label_idx'].nunique(), ' Val unique classes:', val_split['label_idx'].nunique())

# Persist splits
trn_split.to_pickle('train_split.pkl')
val_split.to_pickle('val_split.pkl')
test_df.to_pickle('test_df.pkl')
print('Saved splits to train_split.pkl, val_split.pkl, test_df.pkl')
print('Done. Elapsed: %.1f s' % (time.time()-t0))

Creating safe stratified split (classes with <2 samples kept in train only)...
Stratifiable samples: 1772208  Rare samples: 7745


Final split sizes: (1425511, 7) (354442, 7)
Train unique classes: 64500  Val unique classes: 51308


Saved splits to train_split.pkl, val_split.pkl, test_df.pkl
Done. Elapsed: 1.6 s
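
The log above reports 51308 classes in val against 64500 in train. Only 7745 samples were set aside as singletons, so some classes with two or more samples also ended up entirely in train under proportional allocation. The sketch below is one possible alternative, not the split used in this notebook: a per-class hold-out that puts at least one sample of every class with two or more samples into val. The helper name `per_class_holdout` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def per_class_holdout(labels, val_frac=0.2, seed=42):
    """Return (train_idx, val_idx). Hypothetical helper, not the notebook's split.

    Classes with a single sample stay in train; every class with >= 2 samples
    contributes at least one sample to val and keeps at least one in train.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, val_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        if len(idx) < 2:
            train_idx.extend(idx)
            continue
        # Proportional share, clamped so both sides get at least one sample
        n_val = max(1, round(val_frac * len(idx)))
        n_val = min(n_val, len(idx) - 1)
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy labels: class 0 has 10 samples, class 1 has 3, class 2 has 2, class 3 has 1.
toy_labels = [0] * 10 + [1] * 3 + [2] * 2 + [3]
tr, va = per_class_holdout(toy_labels)
print('train classes:', sorted({toy_labels[i] for i in tr}))
print('val classes:  ', sorted({toy_labels[i] for i in va}))
```

The trade-off is that small classes give up a larger share of their samples than `val_frac`, so the val set is tilted toward the tail compared with a proportional split.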


In [15]:
# Dataset, model, and training loop (ConvNeXt-Base @384, CE+LS, AMP, balanced sampler) - torchvision transforms
import math, time, json, os, random
import numpy as np
import pandas as pd
import cv2
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from sklearn.metrics import f1_score
from PIL import Image
import torchvision.transforms as T

torch.backends.cudnn.benchmark = True
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load splits
trn_split = pd.read_pickle('train_split.pkl')
val_split = pd.read_pickle('val_split.pkl')
test_df = pd.read_pickle('test_df.pkl')
with open('idx2cat.json','r') as f:
    idx2cat = {int(k): int(v) for k,v in json.load(f).items()}

IMG_SIZE = 384
BATCH_SIZE = 16  # safer for A10 with convnext_base@384
EPOCHS = 1  # smoke test first
MAX_STEPS_PER_EPOCH = 1000  # faster smoke epoch
MAX_VAL_STEPS = 200  # limit validation batches for speed
LR = 2e-4
WD = 1e-4
LS_EPS = 0.1
SEED = 42
MODEL_NAME = 'convnext_base.fb_in22k_ft_in1k'
NUM_WORKERS = min(8, os.cpu_count() or 8)

random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED); torch.cuda.manual_seed_all(SEED)

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

train_tfms = T.Compose([
    T.RandomResizedCrop(size=IMG_SIZE, scale=(0.7, 1.0), ratio=(0.7, 1.4)),
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    T.ToTensor(),
    T.Normalize(mean, std),
])
val_tfms = T.Compose([
    T.Resize(IMG_SIZE, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(IMG_SIZE),
    T.ToTensor(),
    T.Normalize(mean, std),
])

class ImgDS(Dataset):
    def __init__(self, df, tfm, label_col='label_idx'):
        self.df = df.reset_index(drop=True)
        self.tfm = tfm
        self.has_label = label_col in df.columns
        self.label_col = label_col
    def __len__(self):
        return len(self.df)
    def __getitem__(self, i):
        r = self.df.iloc[i]
        p = r['file_path']
        try:
            img = Image.open(p).convert('RGB')
        except Exception:
            img = Image.fromarray(np.ones((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)*255)
        x = self.tfm(img)
        if self.has_label:
            y = int(r[self.label_col])
            return x, y
        else:
            return x, int(r['Id'])

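# Square-root rebalancing: per-sample weight 1/sqrt(class freq), so each class is drawn
# in proportion to sqrt(freq) (softer than a fully balanced 1/freq sampler)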
cls_counts = trn_split['label_idx'].value_counts().to_dict()
weights = trn_split['label_idx'].map(lambda c: 1.0/ math.sqrt(cls_counts.get(c,1))).astype('float64').values
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)

train_ds = ImgDS(trn_split, train_tfms)
val_ds = ImgDS(val_split, val_tfms)
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, sampler=sampler, num_workers=NUM_WORKERS, pin_memory=True, persistent_workers=True)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKERS, pin_memory=True, persistent_workers=True)

num_classes = len(idx2cat)
model = timm.create_model(MODEL_NAME, pretrained=True, num_classes=num_classes)
model.to(device)

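class LabelSmoothingCE(nn.Module):
    """Cross-entropy with uniform label smoothing: (1 - eps) * NLL + eps * mean_c(-log p_c)."""
    def __init__(self, eps=0.1):
        super().__init__()
        self.eps = eps
    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=-1)
        # Negative log-likelihood of the target class, per sample
        nll = -log_probs.gather(dim=-1, index=target.unsqueeze(1)).squeeze(1)
        # Uniform-smoothing term: average negative log-probability over all classes
        smooth = -log_probs.mean(dim=-1)
        return ((1 - self.eps) * nll + self.eps * smooth).mean()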

criterion = LabelSmoothingCE(LS_EPS)
optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WD)
num_steps = EPOCHS * min(MAX_STEPS_PER_EPOCH, math.ceil(len(train_loader)))
warmup_steps = int(0.05 * num_steps)

def cosine_warmup(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    p = (step - warmup_steps) / max(1, (num_steps - warmup_steps))
    return 0.5 * (1 + math.cos(math.pi * p))
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, cosine_warmup)

scaler = torch.amp.GradScaler('cuda', enabled=True)  # torch.cuda.amp.GradScaler is deprecated in torch 2.4

def validate(model, loader, max_steps=None):
    model.eval()
    preds = []
    targs = []
    t0 = time.time()
    with torch.no_grad():
        for it, (xb, yb) in enumerate(loader):
            xb = xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)
            with torch.amp.autocast('cuda'):  # replaces deprecated torch.cuda.amp.autocast
                logits = model(xb)
            pred = logits.argmax(dim=1)
            preds.append(pred.detach().cpu().numpy())
            targs.append(yb.detach().cpu().numpy())
            if max_steps is not None and (it+1) >= max_steps:
                break
    preds = np.concatenate(preds)
    targs = np.concatenate(targs)
    f1 = f1_score(targs, preds, average='macro')
    print(f'Validate: macro-F1={f1:.6f}  n={len(targs)}  batches={it+1}  elapsed={time.time()-t0:.1f}s', flush=True)
    return f1

best_f1 = -1.0
best_path = 'model_convnext_base_384_best.pth'
global_step = 0
t_epoch0 = time.time()
for epoch in range(EPOCHS):
    model.train()
    running = 0.0
    t0 = time.time()
    for it, (xb, yb) in enumerate(train_loader):
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)
        optimizer.zero_grad(set_to_none=True)
        with torch.amp.autocast('cuda'):  # replaces deprecated torch.cuda.amp.autocast
            logits = model(xb)
            loss = criterion(logits, yb)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()
        running += loss.item()
        global_step += 1
        if (it+1) % 200 == 0:
            print(f'E{epoch} It{it+1} loss={running/(it+1):.4f} lr={scheduler.get_last_lr()[0]:.6f} elapsed={time.time()-t0:.1f}s', flush=True)
        if (it+1) >= MAX_STEPS_PER_EPOCH:
            print(f'Reached MAX_STEPS_PER_EPOCH={MAX_STEPS_PER_EPOCH}, breaking epoch early.', flush=True)
            break
    print(f'Epoch {epoch} train_loss={running/max(1,(it+1)):.4f} epoch_time={time.time()-t0:.1f}s', flush=True)
    f1 = validate(model, val_loader, max_steps=MAX_VAL_STEPS)
    if f1 > best_f1:
        best_f1 = f1
        torch.save({'model': model.state_dict(), 'f1': best_f1}, best_path)
        print('Saved best checkpoint:', best_path, 'F1=', best_f1, flush=True)

# Safety: save checkpoint even if no improvement recorded
if best_f1 < 0:
    torch.save({'model': model.state_dict(), 'f1': best_f1}, best_path)
    print('Saved checkpoint without val improvement:', best_path, flush=True)
print(f'Training done. Best val macro-F1={best_f1:.6f}. Total elapsed={time.time()-t_epoch0:.1f}s', flush=True)

INFO:timm.models._builder:Loading pretrained weights from Hugging Face hub (timm/convnext_base.fb_in22k_ft_in1k)


INFO:timm.models._hub:[timm/convnext_base.fb_in22k_ft_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.


INFO:timm.models._builder:Missing keys (head.fc.weight, head.fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.


E0 It200 loss=11.1348 lr=0.000188 elapsed=45.6s


E0 It400 loss=11.1062 lr=0.000140 elapsed=90.7s


E0 It600 loss=11.0909 lr=0.000075 elapsed=136.0s


E0 It800 loss=11.0841 lr=0.000021 elapsed=181.5s


E0 It1000 loss=11.0754 lr=0.000000 elapsed=227.2s


Reached MAX_STEPS_PER_EPOCH=1000, breaking epoch early.


Epoch 0 train_loss=11.0754 epoch_time=227.2s


Validate: macro-F1=0.000000  n=3200  batches=200  elapsed=14.2s


Saved best checkpoint: model_convnext_base_384_best.pth F1= 0.0


Training done. Best val macro-F1=0.000000. Total elapsed=241.9s
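
The numbers in the smoke-run log can be checked by hand. The sketch below recomputes two things under the settings from the cell above (C = 64500, eps = 0.1, LR = 2e-4, 1000 steps with 5% warmup): the label-smoothed loss of a model that predicts a uniform distribution, and the learning rate at each logged step. The helper names `uniform_ls_loss` and `lr_at` are introduced here for illustration only.

```python
import math

NUM_CLASSES = 64500      # "Label space size C" from the log above
LS_EPS = 0.1
LR = 2e-4
NUM_STEPS = 1000         # EPOCHS * MAX_STEPS_PER_EPOCH in the smoke run
WARMUP_STEPS = int(0.05 * NUM_STEPS)

def uniform_ls_loss(num_classes, eps):
    """Label-smoothed CE when the model predicts a uniform distribution."""
    log_p = -math.log(num_classes)   # log-probability assigned to every class
    nll = -log_p                     # target-class term
    smooth = -log_p                  # mean over classes of -log p
    return (1 - eps) * nll + eps * smooth

def lr_at(step):
    """Same warmup + cosine multiplier as cosine_warmup in the cell, times LR."""
    if step < WARMUP_STEPS:
        mult = step / max(1, WARMUP_STEPS)
    else:
        p = (step - WARMUP_STEPS) / max(1, NUM_STEPS - WARMUP_STEPS)
        mult = 0.5 * (1 + math.cos(math.pi * p))
    return LR * mult

chance = uniform_ls_loss(NUM_CLASSES, LS_EPS)
print(f'chance-level loss: {chance:.4f}')
for step in (200, 400, 600, 800, 1000):
    print(f'step {step}: lr={lr_at(step):.6f}')
```

The chance-level value is ln(64500), about 11.074, which is where the logged running loss (11.0754) sits. That is consistent with the macro-F1 of 0: the cosine schedule decays the learning rate to zero within 1000 steps, so the run exercises the pipeline without showing that the model has started to learn.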


In [16]:
# Inference: fast single-view + tau tuning on subset -> submission.csv
import math, time, json, os
import numpy as np
import pandas as pd
import torch
import torchvision.transforms as T
from PIL import Image
from sklearn.metrics import f1_score
import timm

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load artifacts
val_split = pd.read_pickle('val_split.pkl')
trn_split = pd.read_pickle('train_split.pkl')
test_df = pd.read_pickle('test_df.pkl')
with open('idx2cat.json','r') as f:
    idx2cat = {int(k): int(v) for k,v in json.load(f).items()}
num_classes = len(idx2cat)

# Faster inference size
IMG_SIZE = 320  # faster than 384; convnext can accept variable sizes
MODEL_NAME = 'convnext_base.fb_in22k_ft_in1k'
ckpt_path = 'model_convnext_base_384_best.pth'

# Transforms
val_tfms = T.Compose([
    T.Resize(IMG_SIZE, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(IMG_SIZE),
    T.ToTensor(),
    T.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
])

def load_model():
    model = timm.create_model(MODEL_NAME, pretrained=False, num_classes=num_classes)
    ck = torch.load(ckpt_path, map_location='cpu')
    model.load_state_dict(ck['model'], strict=True)
    model.to(device)
    model.eval()
    return model

def infer_logits_paths(model, paths, batch_size=64, log_every=500):
    logits_all = []
    n = len(paths)
    for i in range(0, n, batch_size):
        batch_paths = paths[i:i+batch_size]
        imgs = []
        for p in batch_paths:
            try:
                img = Image.open(p).convert('RGB')
            except Exception:
                img = Image.fromarray(np.ones((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)*255)
            imgs.append(val_tfms(img))
        xb = torch.stack(imgs).to(device, non_blocking=True)
        with torch.no_grad(), torch.amp.autocast('cuda'):  # replaces deprecated torch.cuda.amp.autocast
            l = model(xb)  # single-view (no TTA) for speed
        logits_all.append(l.float().cpu().numpy())
        if ((i//batch_size)+1) % log_every == 0:
            print(f'Infer progress: {i+len(batch_paths)}/{n}', flush=True)
    return np.concatenate(logits_all, axis=0)

# Compute class priors from training split
counts = trn_split['label_idx'].value_counts().sort_index()
priors = counts.reindex(range(num_classes), fill_value=1).values.astype(np.float64)
priors = priors / priors.sum()
log_priors = np.log(priors + 1e-12)

# Validate tau on a subset of val split for speed
model = load_model()
VAL_CAP = 5000  # limit val samples
val_paths = val_split['file_path'].tolist()[:VAL_CAP]
val_y = val_split['label_idx'].values[:VAL_CAP]
print(f'Running validation inference to tune tau on {len(val_paths)} samples...', flush=True)
val_logits = infer_logits_paths(model, val_paths, batch_size=64, log_every=50)

best_tau, best_f1 = 0.0, -1.0
for tau in [0.5, 1.0, 1.5]:
    adj = val_logits - tau * log_priors[None, :]
    pred = adj.argmax(axis=1)
    f1 = f1_score(val_y, pred, average='macro')
    print(f'tau={tau:.2f} val macro-F1={f1:.6f}', flush=True)
    if f1 > best_f1:
        best_f1, best_tau = f1, tau
print('Best tau:', best_tau, 'Best val F1:', best_f1, flush=True)

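# Test inference (may take time; single-view, same batch size as validation).
# NOTE: infer_logits_paths keeps every logit row in host memory; at 477,806 x 64,500 float32
# that is ~123 GB, so the full test pass may not fit in RAM.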
print('Running test inference...', flush=True)
test_paths = test_df['file_path'].tolist()
test_logits = infer_logits_paths(model, test_paths, batch_size=64, log_every=500)
test_adj = test_logits - best_tau * log_priors[None, :]
test_pred_idx = test_adj.argmax(axis=1)
test_pred_cat = [idx2cat[int(i)] for i in test_pred_idx]

sub = pd.DataFrame({'Id': test_df['Id'].values, 'Predicted': test_pred_cat})
sub = sub.sort_values('Id').reset_index(drop=True)
sub.to_csv('submission.csv', index=False)
print('submission.csv written. Shape:', sub.shape, 'Head:\n', sub.head())

Running validation inference to tune tau on 5000 samples...


Infer progress: 3200/5000


tau=0.50 val macro-F1=0.000000


tau=1.00 val macro-F1=0.000000


tau=1.50 val macro-F1=0.000000


Best tau: 0.5 Best val F1: 0.0


Running test inference...


Infer progress: 32000/477806


Infer progress: 64000/477806


Infer progress: 96000/477806


Infer progress: 128000/477806


Infer progress: 160000/477806


Infer progress: 192000/477806


Infer progress: 224000/477806


Infer progress: 256000/477806


Infer progress: 288000/477806


Infer progress: 320000/477806


Infer progress: 352000/477806


Infer progress: 384000/477806


Infer progress: 416000/477806


Infer progress: 448000/477806
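
The log above stops before any "submission.csv written" line. It does not record why, but the cell accumulates the full test logit matrix in memory before taking the argmax. A minimal sketch of an alternative is shown below: apply the prior adjustment and argmax batch by batch and keep only the predicted indices. The function name `streaming_adjusted_argmax` and the random toy logits are hypothetical stand-ins, not output from the model.

```python
import numpy as np

def streaming_adjusted_argmax(logit_batches, log_priors, tau):
    """Consume an iterable of [B, C] logit arrays and return predicted indices.

    Only the per-batch argmax is kept, so peak memory is one batch of logits
    rather than the full [N, C] matrix.
    """
    bias = tau * log_priors[None, :]
    preds = [(batch - bias).argmax(axis=1) for batch in logit_batches]
    return np.concatenate(preds)

# Back-of-envelope size of the dense test logit matrix from the log above.
n_test, n_classes = 477806, 64500
dense_gb = n_test * n_classes * 4 / 1e9   # float32 = 4 bytes per value
print(f'dense float32 test logits: {dense_gb:.1f} GB')

# Toy check with random stand-in logits (hypothetical data, not model output).
rng = np.random.default_rng(0)
toy_logits = rng.normal(size=(10, 5)).astype(np.float32)
toy_log_priors = np.log(np.array([0.5, 0.2, 0.15, 0.1, 0.05]))
batches = (toy_logits[i:i + 4] for i in range(0, 10, 4))
streamed = streaming_adjusted_argmax(batches, toy_log_priors, tau=1.0)
dense = (toy_logits - 1.0 * toy_log_priors[None, :]).argmax(axis=1)
print('streamed == dense:', bool((streamed == dense).all()))
```

In the notebook, the generator would yield `model(xb)` outputs instead of slices of a toy array. If raw logits are needed later for blending, saving a top-k subset per image is one way to keep the footprint manageable.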


In [17]:
# NCC (Nearest Class Centroid) frozen-feature baseline with tau-prior adjustment (K-per-class sampling)
import os, time, json, math, gc
from pathlib import Path
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
from PIL import Image
from sklearn.metrics import f1_score
import timm

torch.backends.cudnn.benchmark = True
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load splits & label maps
trn_split = pd.read_pickle('train_split.pkl')
val_split = pd.read_pickle('val_split.pkl')
test_df   = pd.read_pickle('test_df.pkl')
with open('idx2cat.json','r') as f:
    idx2cat = {int(k): int(v) for k,v in json.load(f).items()}
with open('cat2idx.json','r') as f:
    cat2idx = {int(k): int(v) for k,v in json.load(f).items()}
num_classes = len(idx2cat)

# Model / transforms
MODEL_NAME = 'convnext_base.fb_in22k_ft_in1k'
IMG_SIZE = 320  # 320 for throughput; consider 384 later
BATCH = 128     # adjust if OOM; A10 should handle ~128 at 320 for features
NUM_WORKERS = min(8, os.cpu_count() or 8)
TWO_VIEW = False  # start single-view for speed; can enable later
K_PER_CLASS = 4   # sample up to K images per class for centroids (fast build)

base_tfms = T.Compose([
    T.Resize(IMG_SIZE, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(IMG_SIZE),
    T.ToTensor(),
    T.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
])

class ImgDS(Dataset):
    def __init__(self, df, has_label, flip=False):
        self.df = df.reset_index(drop=True)
        self.has_label = has_label
        self.flip = flip
    def __len__(self):
        return len(self.df)
    def __getitem__(self, i):
        r = self.df.iloc[i]
        p = r['file_path']
        try:
            img = Image.open(p).convert('RGB')
        except Exception:
            img = Image.fromarray(np.ones((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)*255)
        if self.flip:
            img = img.transpose(Image.FLIP_LEFT_RIGHT)
        x = base_tfms(img)
        if self.has_label:
            return x, int(r['label_idx'])
        else:
            return x, int(r['Id'])

def build_feature_model():
    # num_classes=0 returns pooled features in timm
    model = timm.create_model(MODEL_NAME, pretrained=True, num_classes=0, global_pool='avg')
    model.eval().to(device)
    return model

@torch.inference_mode()
def extract_features(model, df, batch=BATCH, two_view=TWO_VIEW, log_every=5000):
    ds0 = ImgDS(df, has_label=('label_idx' in df.columns), flip=False)
    dl0 = DataLoader(ds0, batch_size=batch, shuffle=False, num_workers=NUM_WORKERS, pin_memory=True, persistent_workers=True)
    feats = []
    labels = [] if ('label_idx' in df.columns) else None
    t0 = time.time()
    for it, (xb0, ids0) in enumerate(dl0):
        xb0 = xb0.to(device, non_blocking=True)
        f0 = model(xb0)
        f = f0  # single view only; the two_view argument is accepted but not used yet
        f = F.normalize(f, dim=1)
        feats.append(f.detach().cpu())
        if labels is not None:
            labels.append(ids0.detach().cpu())
        done = min((it+1)*dl0.batch_size, len(ds0))
        if done % log_every < dl0.batch_size:
            print(f'Feat progress: {done}/{len(ds0)}  elapsed={time.time()-t0:.1f}s', flush=True)
    feats = torch.cat(feats, dim=0)
    if labels is not None:
        labels = torch.cat(labels, dim=0).long()
    return feats, labels

def compute_centroids(feats, labels):
    # Accumulate per-class feature sums in fp32 (vectorized; no Python loop over samples)
    feat_dim = feats.shape[1]
    sums = torch.zeros((num_classes, feat_dim), dtype=torch.float32)
    sums.index_add_(0, labels, feats.float())
    counts = torch.bincount(labels, minlength=num_classes)
    # Mean per class; classes without samples keep an all-zero centroid (avoids div by zero)
    mask = counts > 0
    sums[mask] = sums[mask] / counts[mask].unsqueeze(1).float()
    # L2-normalize
    centroids = F.normalize(sums, dim=1).half()  # store fp16
    return centroids, counts

def tune_tau(val_feats, val_labels, centroids, priors, grid=(0.5, 0.8, 1.0, 1.2, 1.5)):
    with torch.no_grad():
        cent_t = centroids.to(device).t().float()  # [D,C]
        sims = torch.matmul(val_feats.to(device), cent_t).cpu().numpy()  # [N,C]
    log_priors = np.log(priors + 1e-12)
    best_tau, best_f1 = 0.0, -1.0
    y_true = val_labels.numpy()
    for tau in grid:
        adj = sims - tau * log_priors[None, :]
        pred = adj.argmax(axis=1)
        f1 = f1_score(y_true, pred, average='macro')
        print(f'tau={tau:.2f} val macro-F1={f1:.6f}', flush=True)
        if f1 > best_f1:
            best_f1, best_tau = f1, tau
    return best_tau, best_f1

# Pipeline
t_all = time.time()
print('Building NCC centroids with K-per-class sampling...')
feat_model = build_feature_model()

# Sample up to K images per class for centroid building
t0 = time.time()
def sample_k_per_class(df, k):
    # df must contain 'label_idx'
    return (df.groupby('label_idx', group_keys=False)
              .apply(lambda g: g.sample(n=min(len(g), k), random_state=42))
              .reset_index(drop=True))

trn_k = sample_k_per_class(trn_split[['file_path','label_idx']], K_PER_CLASS)
print('Sampled for centroids:', trn_k.shape, ' (K_PER_CLASS=', K_PER_CLASS, ')', flush=True)

train_feats, train_labels = extract_features(feat_model, trn_k, batch=BATCH, two_view=TWO_VIEW, log_every=32768)
centroids, cls_counts = compute_centroids(train_feats, train_labels)
np.save('centroids_fp16.npy', centroids.cpu().numpy())
np.save('centroids_counts.npy', cls_counts.cpu().numpy())
print('Centroids built:', centroids.shape, ' time=%.1fs' % (time.time()-t0), flush=True)
del train_feats; del train_labels; gc.collect()

# Priors from full training split
counts = trn_split['label_idx'].value_counts().sort_index()
priors = counts.reindex(range(num_classes), fill_value=1).values.astype(np.float64)
priors = priors / priors.sum()

# Val features (cap for speed)
VAL_CAP = 20000
val_sub = val_split.iloc[:VAL_CAP].reset_index(drop=True)
v_feats, v_labels = extract_features(feat_model, val_sub[['file_path','label_idx']], batch=BATCH, two_view=TWO_VIEW, log_every=32768)
best_tau, best_f1 = tune_tau(v_feats, v_labels, centroids, priors, grid=(0.5, 0.8, 1.0, 1.2, 1.5))
print('Best tau:', best_tau, ' Best val F1:', best_f1, flush=True)
del v_feats; del v_labels; gc.collect()

# Test features
print('Extracting test features...')
t1 = time.time()
t_feats, t_ids = extract_features(feat_model, test_df[['file_path','Id']], batch=BATCH, two_view=TWO_VIEW, log_every=65536)
print('Test feats:', t_feats.shape, ' time=%.1fs' % (time.time()-t1), flush=True)

# Score in chunks on GPU
print('Scoring test...')
cent_t = centroids.to(device).t().float()  # [D,C]
log_priors = torch.from_numpy(np.log(priors + 1e-12)).to(device).float()
N = t_feats.shape[0]
CH = 4096
pred_idx = np.empty((N,), dtype=np.int32)
for s in range(0, N, CH):
    e = min(s + CH, N)
    feat_chunk = t_feats[s:e].to(device)
    sims = torch.matmul(feat_chunk, cent_t)  # [ch, C]
    adj = sims - best_tau * log_priors[None, :]
    pred = adj.argmax(dim=1).int().cpu().numpy()
    pred_idx[s:e] = pred
    if ((s // CH) + 1) % 20 == 0:
        print(f'Scored {e}/{N}', flush=True)

# Map back to original category ids
pred_cat = [idx2cat[int(i)] for i in pred_idx]
sub = pd.DataFrame({'Id': test_df['Id'].values, 'Predicted': pred_cat})
sub = sub.sort_values('Id').reset_index(drop=True)
sub.to_csv('submission.csv', index=False)
print('NCC submission.csv written. Shape:', sub.shape, ' head:\n', sub.head(), flush=True)
print('Total NCC pipeline time: %.1fs' % (time.time()-t_all), flush=True)

Building NCC centroids with K-per-class sampling...


Sampled for centroids: (209447, 2)  (K_PER_CLASS= 4 )


Feat progress: 32768/209447  elapsed=267.8s


Feat progress: 65536/209447  elapsed=534.7s


Feat progress: 98304/209447  elapsed=803.5s


Feat progress: 131072/209447  elapsed=1072.4s


Feat progress: 163840/209447  elapsed=1341.8s


Feat progress: 196608/209447  elapsed=1610.3s


Centroids built: torch.Size([64500, 1024])  time=1736.7s


tau=0.50 val macro-F1=0.000000


tau=0.80 val macro-F1=0.000000


tau=1.00 val macro-F1=0.000000


tau=1.20 val macro-F1=0.000000


tau=1.50 val macro-F1=0.000000


Best tau: 0.5  Best val F1: 0.0


Extracting test features...


Feat progress: 65536/477806  elapsed=537.1s


Feat progress: 131072/477806  elapsed=1075.0s


Feat progress: 196608/477806  elapsed=1614.0s


Feat progress: 262144/477806  elapsed=2152.4s


Feat progress: 327680/477806  elapsed=2690.7s


Feat progress: 393216/477806  elapsed=3230.3s


Feat progress: 458752/477806  elapsed=3767.8s


Test feats: torch.Size([477806, 1024])  time=3930.1s


Scoring test...


Scored 81920/477806


Scored 163840/477806


Scored 245760/477806


Scored 327680/477806


Scored 409600/477806


NCC submission.csv written. Shape: (477806, 2)  head:
    Id  Predicted
0   0      43278
1   1      25876
2   2      23067
3   3       7824
4   4      20895


Total NCC pipeline time: 5851.2s
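
Every tau in the grid above scores macro-F1 = 0.000000. One likely cause (an assumption, since `extract_features` is not shown in this section) is a scale mismatch: if the features are L2-normalized like the centroids, `sims` are cosine similarities in [-1, 1], while `log_priors` over roughly 64.5k classes span several units, so `sims - tau * log_priors` is decided almost entirely by the prior and the argmax lands on the rarest class. The toy sketch below reproduces that effect and shows a hypothetical `scale` factor (not part of the original code) restoring the similarity signal.

```python
import math

def adjusted_argmax(sims, priors, tau, scale=1.0):
    """Return argmax over classes of scale * sims[c] - tau * log(priors[c])."""
    scores = [scale * s - tau * math.log(p + 1e-12) for s, p in zip(sims, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy setup: class 0 is common, class 2 is rare; the sample truly belongs to class 0.
counts = [1000, 100, 1]
total = sum(counts)
priors = [c / total for c in counts]
sims = [0.80, 0.30, 0.10]  # cosine similarities, bounded in [-1, 1]

# Unscaled cosine similarity: the prior term swamps the similarity signal.
pred_unscaled = adjusted_argmax(sims, priors, tau=0.5)

# Hypothetical remedy: multiply similarities by a scale (inverse temperature) first.
pred_scaled = adjusted_argmax(sims, priors, tau=0.5, scale=30.0)

print(pred_unscaled, pred_scaled)
```

Adding tau = 0 to the grid would be a cheap way to tell a dominating prior term apart from genuinely uninformative features; the value 30.0 here is arbitrary and would need tuning alongside tau.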


In [18]:
# Short fine-tune: ConvNeXt-Base @320, CE+LS, class-balanced sampling, AMP (expert-tuned)
import math, time, json, os, random
import numpy as np
import pandas as pd
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from sklearn.metrics import f1_score
from PIL import Image
import torchvision.transforms as T

torch.backends.cudnn.benchmark = True
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load splits and mappings
trn_split = pd.read_pickle('train_split.pkl')
val_split = pd.read_pickle('val_split.pkl')
with open('idx2cat.json','r') as f:
    idx2cat = {int(k): int(v) for k,v in json.load(f).items()}
num_classes = len(idx2cat)

# Config (per expert advice)
SEED = 42
IMG_SIZE = 320
BATCH_SIZE = 48  # fallback to 40 if OOM; try 64 if it fits
EPOCHS = 2
MAX_STEPS_PER_EPOCH = 15000  # ~30k total steps
MAX_VAL_STEPS = 1000
LR = 2e-4
WD = 1e-4
LS_EPS = 0.1
MODEL_NAME = 'convnext_base.fb_in22k_ft_in1k'
NUM_WORKERS = min(8, os.cpu_count() or 8)

random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED); torch.cuda.manual_seed_all(SEED)

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

train_tfms = T.Compose([
    T.RandomResizedCrop(size=IMG_SIZE, scale=(0.75, 1.0), ratio=(0.8, 1.25)),
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.15, contrast=0.15, saturation=0.15, hue=0.03),
    T.ToTensor(),
    T.Normalize(mean, std),
    T.RandomErasing(p=0.1, scale=(0.02, 0.1), ratio=(0.3, 3.3)),
])
val_tfms = T.Compose([
    T.Resize(IMG_SIZE, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(IMG_SIZE),
    T.ToTensor(),
    T.Normalize(mean, std),
])

class ImgDS(Dataset):
    def __init__(self, df, tfm, label_col='label_idx'):
        self.df = df.reset_index(drop=True)
        self.tfm = tfm
        self.has_label = label_col in df.columns
        self.label_col = label_col
    def __len__(self):
        return len(self.df)
    def __getitem__(self, i):
        r = self.df.iloc[i]
        p = r['file_path']
        try:
            img = Image.open(p).convert('RGB')
        except Exception:
            img = Image.fromarray(np.ones((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)*255)
        x = self.tfm(img)
        if self.has_label:
            y = int(r[self.label_col])
            return x, y
        else:
            return x, int(r['Id'])

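# Partially class-balanced sampler: each image gets weight 1/sqrt(class freq),
# so a class with n images is drawn with probability proportional to sqrt(n).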
cls_counts = trn_split['label_idx'].value_counts().to_dict()
weights = trn_split['label_idx'].map(lambda c: 1.0/ math.sqrt(cls_counts.get(c,1))).astype('float64').values
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)

train_ds = ImgDS(trn_split, train_tfms)
val_ds = ImgDS(val_split, val_tfms)
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, sampler=sampler, num_workers=NUM_WORKERS, pin_memory=True, persistent_workers=True)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKERS, pin_memory=True, persistent_workers=True)

model = timm.create_model(MODEL_NAME, pretrained=True, num_classes=num_classes)
model.to(device)

class LabelSmoothingCE(nn.Module):
    def __init__(self, eps=0.1):
        super().__init__()
        self.eps = eps
    def forward(self, logits, target):
        n = logits.size(-1)
        log_probs = F.log_softmax(logits, dim=-1)
        loss = -(log_probs.gather(dim=-1, index=target.unsqueeze(1)).squeeze(1))
        loss = (1 - self.eps) * loss - self.eps * log_probs.mean(dim=-1)
        return loss.mean()

criterion = LabelSmoothingCE(LS_EPS)
optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WD)

# Steps & scheduler
num_steps = EPOCHS * min(MAX_STEPS_PER_EPOCH, len(train_loader))
warmup_steps = int(0.05 * num_steps)
def cosine_warmup(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    p = (step - warmup_steps) / max(1, (num_steps - warmup_steps))
    return 0.5 * (1 + math.cos(math.pi * p))
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, cosine_warmup)

scaler = torch.amp.GradScaler('cuda', enabled=True)

def validate(model, loader, max_steps=None):
    model.eval()
    preds = []
    targs = []
    t0 = time.time()
    with torch.no_grad():
        for it, (xb, yb) in enumerate(loader):
            xb = xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)
            with torch.amp.autocast('cuda', enabled=True):
                logits = model(xb)
            pred = logits.argmax(dim=1)
            preds.append(pred.detach().cpu().numpy())
            targs.append(yb.detach().cpu().numpy())
            if max_steps is not None and (it+1) >= max_steps:
                break
    preds = np.concatenate(preds)
    targs = np.concatenate(targs)
    f1 = f1_score(targs, preds, average='macro')
    print(f'Validate: macro-F1={f1:.6f}  n={len(targs)}  batches={it+1}  elapsed={time.time()-t0:.1f}s', flush=True)
    return f1

best_f1 = -1.0
best_path = 'model_convnext_base_320_ft_best.pth'
global_step = 0
t_start = time.time()
for epoch in range(EPOCHS):
    model.train()
    running = 0.0
    t0 = time.time()
    for it, (xb, yb) in enumerate(train_loader):
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)
        optimizer.zero_grad(set_to_none=True)
        with torch.amp.autocast('cuda', enabled=True):
            logits = model(xb)
            loss = criterion(logits, yb)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()
        running += loss.item()
        global_step += 1
        if (it+1) % 200 == 0:
            print(f'E{epoch} It{it+1} loss={running/(it+1):.4f} lr={scheduler.get_last_lr()[0]:.6f} elapsed={time.time()-t0:.1f}s', flush=True)
        if (it+1) >= MAX_STEPS_PER_EPOCH:
            print(f'Reached MAX_STEPS_PER_EPOCH={MAX_STEPS_PER_EPOCH}, breaking epoch early.', flush=True)
            break
    print(f'Epoch {epoch} train_loss={running/max(1,(it+1)):.4f} epoch_time={time.time()-t0:.1f}s', flush=True)
    f1 = validate(model, val_loader, max_steps=MAX_VAL_STEPS)
    if f1 > best_f1:
        best_f1 = f1
        torch.save({'model': model.state_dict(), 'f1': best_f1}, best_path)
        print('Saved best checkpoint:', best_path, 'F1=', best_f1, flush=True)

if best_f1 < 0:
    torch.save({'model': model.state_dict(), 'f1': best_f1}, best_path)
    print('Saved checkpoint without val improvement:', best_path, flush=True)
print(f'Training done. Best val macro-F1={best_f1:.6f}. Total elapsed={time.time()-t_start:.1f}s', flush=True)

E0 It200 loss=11.1912 lr=0.000027 elapsed=96.3s


E0 It400 loss=11.1239 lr=0.000053 elapsed=181.2s


E0 It600 loss=11.0947 lr=0.000080 elapsed=266.9s


E0 It800 loss=11.0737 lr=0.000107 elapsed=353.2s


E0 It1000 loss=11.0497 lr=0.000133 elapsed=439.7s


E0 It1200 loss=11.0091 lr=0.000160 elapsed=526.4s


E0 It1400 loss=10.9575 lr=0.000187 elapsed=613.3s


E0 It1600 loss=10.8916 lr=0.000200 elapsed=700.0s


E0 It1800 loss=10.8134 lr=0.000200 elapsed=786.6s


E0 It2000 loss=10.7237 lr=0.000200 elapsed=873.3s


E0 It2200 loss=10.6285 lr=0.000200 elapsed=960.1s


E0 It2400 loss=10.5354 lr=0.000200 elapsed=1046.8s


E0 It2600 loss=10.4350 lr=0.000199 elapsed=1133.5s


E0 It2800 loss=10.3315 lr=0.000199 elapsed=1220.2s


E0 It3000 loss=10.2285 lr=0.000199 elapsed=1307.0s


E0 It3200 loss=10.1251 lr=0.000198 elapsed=1393.7s


E0 It3400 loss=10.0250 lr=0.000198 elapsed=1480.5s


E0 It3600 loss=9.9237 lr=0.000197 elapsed=1567.2s


E0 It3800 loss=9.8235 lr=0.000197 elapsed=1653.9s


E0 It4000 loss=9.7245 lr=0.000196 elapsed=1740.6s


E0 It4200 loss=9.6267 lr=0.000196 elapsed=1827.3s


E0 It4400 loss=9.5304 lr=0.000195 elapsed=1914.1s


E0 It4600 loss=9.4374 lr=0.000194 elapsed=2000.8s


E0 It4800 loss=9.3458 lr=0.000193 elapsed=2087.6s


E0 It5000 loss=9.2544 lr=0.000193 elapsed=2174.3s


E0 It5200 loss=9.1642 lr=0.000192 elapsed=2261.0s


E0 It5400 loss=9.0767 lr=0.000191 elapsed=2347.8s


E0 It5600 loss=8.9905 lr=0.000190 elapsed=2434.5s


E0 It5800 loss=8.9067 lr=0.000189 elapsed=2521.2s


E0 It6000 loss=8.8262 lr=0.000188 elapsed=2608.1s


E0 It6200 loss=8.7465 lr=0.000187 elapsed=2694.8s


E0 It6400 loss=8.6686 lr=0.000186 elapsed=2781.5s


E0 It6600 loss=8.5916 lr=0.000185 elapsed=2868.2s


E0 It6800 loss=8.5158 lr=0.000183 elapsed=2954.9s


E0 It7000 loss=8.4414 lr=0.000182 elapsed=3041.6s


E0 It7200 loss=8.3694 lr=0.000181 elapsed=3128.2s


E0 It7400 loss=8.2979 lr=0.000180 elapsed=3215.1s


E0 It7600 loss=8.2279 lr=0.000178 elapsed=3301.8s


E0 It7800 loss=8.1595 lr=0.000177 elapsed=3388.5s


E0 It8000 loss=8.0933 lr=0.000175 elapsed=3475.1s


E0 It8200 loss=8.0277 lr=0.000174 elapsed=3561.9s


E0 It8400 loss=7.9641 lr=0.000172 elapsed=3648.6s


E0 It8600 loss=7.9006 lr=0.000171 elapsed=3735.3s


E0 It8800 loss=7.8383 lr=0.000169 elapsed=3822.0s


E0 It9000 loss=7.7773 lr=0.000168 elapsed=3908.8s


E0 It9200 loss=7.7168 lr=0.000166 elapsed=3995.5s


E0 It9400 loss=7.6585 lr=0.000164 elapsed=4082.2s


E0 It9600 loss=7.6001 lr=0.000163 elapsed=4168.8s


E0 It9800 loss=7.5425 lr=0.000161 elapsed=4255.5s


E0 It10000 loss=7.4871 lr=0.000159 elapsed=4342.2s


E0 It10200 loss=7.4337 lr=0.000157 elapsed=4428.9s


E0 It10400 loss=7.3801 lr=0.000156 elapsed=4515.7s


E0 It10600 loss=7.3281 lr=0.000154 elapsed=4602.5s


E0 It10800 loss=7.2769 lr=0.000152 elapsed=4689.3s


E0 It11000 loss=7.2269 lr=0.000150 elapsed=4776.0s


E0 It11200 loss=7.1772 lr=0.000148 elapsed=4862.6s


E0 It11400 loss=7.1289 lr=0.000146 elapsed=4949.3s


E0 It11600 loss=7.0812 lr=0.000144 elapsed=5035.9s


E0 It11800 loss=7.0348 lr=0.000142 elapsed=5122.5s


E0 It12000 loss=6.9894 lr=0.000140 elapsed=5209.3s


E0 It12200 loss=6.9440 lr=0.000138 elapsed=5296.0s


E0 It12400 loss=6.8992 lr=0.000136 elapsed=5382.6s


E0 It12600 loss=6.8558 lr=0.000134 elapsed=5469.3s


E0 It12800 loss=6.8128 lr=0.000132 elapsed=5556.0s


E0 It13000 loss=6.7701 lr=0.000130 elapsed=5642.6s


E0 It13200 loss=6.7284 lr=0.000128 elapsed=5729.3s


E0 It13400 loss=6.6881 lr=0.000126 elapsed=5816.0s


E0 It13600 loss=6.6480 lr=0.000123 elapsed=5902.7s


E0 It13800 loss=6.6078 lr=0.000121 elapsed=5989.3s


E0 It14000 loss=6.5689 lr=0.000119 elapsed=6076.0s


E0 It14200 loss=6.5299 lr=0.000117 elapsed=6162.7s


E0 It14400 loss=6.4923 lr=0.000115 elapsed=6249.3s


E0 It14600 loss=6.4549 lr=0.000113 elapsed=6336.0s


E0 It14800 loss=6.4174 lr=0.000110 elapsed=6422.7s


E0 It15000 loss=6.3809 lr=0.000108 elapsed=6509.4s


Reached MAX_STEPS_PER_EPOCH=15000, breaking epoch early.


Epoch 0 train_loss=6.3809 epoch_time=6509.4s


Validate: macro-F1=0.323169  n=48000  batches=1000  elapsed=139.0s


Saved best checkpoint: model_convnext_base_320_ft_best.pth F1= 0.3231688390264109


E1 It200 loss=3.6406 lr=0.000106 elapsed=87.5s


E1 It400 loss=3.6331 lr=0.000104 elapsed=174.4s


E1 It600 loss=3.5975 lr=0.000102 elapsed=261.3s


E1 It800 loss=3.5827 lr=0.000099 elapsed=347.9s


E1 It1000 loss=3.5746 lr=0.000097 elapsed=434.8s


E1 It1200 loss=3.5695 lr=0.000095 elapsed=521.6s


E1 It1400 loss=3.5615 lr=0.000093 elapsed=608.3s


E1 It1600 loss=3.5478 lr=0.000091 elapsed=695.3s


E1 It1800 loss=3.5357 lr=0.000088 elapsed=782.1s


E1 It2000 loss=3.5222 lr=0.000086 elapsed=868.8s


E1 It2200 loss=3.5087 lr=0.000084 elapsed=955.6s


E1 It2400 loss=3.4937 lr=0.000082 elapsed=1042.3s


E1 It2600 loss=3.4791 lr=0.000080 elapsed=1129.0s


E1 It2800 loss=3.4667 lr=0.000078 elapsed=1215.6s


E1 It3000 loss=3.4558 lr=0.000075 elapsed=1302.3s


E1 It3200 loss=3.4458 lr=0.000073 elapsed=1389.2s


E1 It3400 loss=3.4355 lr=0.000071 elapsed=1475.9s


E1 It3600 loss=3.4238 lr=0.000069 elapsed=1562.5s


E1 It3800 loss=3.4130 lr=0.000067 elapsed=1649.2s


E1 It4000 loss=3.4009 lr=0.000065 elapsed=1735.9s


E1 It4200 loss=3.3908 lr=0.000063 elapsed=1822.6s


E1 It4400 loss=3.3810 lr=0.000061 elapsed=1909.3s


E1 It4600 loss=3.3703 lr=0.000059 elapsed=1996.2s


E1 It4800 loss=3.3602 lr=0.000057 elapsed=2082.9s


E1 It5000 loss=3.3498 lr=0.000055 elapsed=2169.6s


E1 It5200 loss=3.3393 lr=0.000053 elapsed=2256.3s


E1 It5400 loss=3.3300 lr=0.000051 elapsed=2342.9s


E1 It5600 loss=3.3192 lr=0.000049 elapsed=2429.6s


E1 It5800 loss=3.3097 lr=0.000047 elapsed=2516.3s


E1 It6000 loss=3.3005 lr=0.000045 elapsed=2603.0s


E1 It6200 loss=3.2909 lr=0.000043 elapsed=2689.9s


E1 It6400 loss=3.2825 lr=0.000042 elapsed=2776.6s


E1 It6600 loss=3.2732 lr=0.000040 elapsed=2863.3s


E1 It6800 loss=3.2646 lr=0.000038 elapsed=2949.9s


E1 It7000 loss=3.2557 lr=0.000036 elapsed=3036.6s


E1 It7200 loss=3.2472 lr=0.000035 elapsed=3123.3s


E1 It7400 loss=3.2395 lr=0.000033 elapsed=3210.3s


E1 It7600 loss=3.2315 lr=0.000031 elapsed=3297.1s


E1 It7800 loss=3.2236 lr=0.000030 elapsed=3383.7s


E1 It8000 loss=3.2157 lr=0.000028 elapsed=3470.4s


E1 It8200 loss=3.2090 lr=0.000027 elapsed=3557.1s


E1 It8400 loss=3.2013 lr=0.000025 elapsed=3643.9s


E1 It8600 loss=3.1937 lr=0.000024 elapsed=3730.7s


E1 It8800 loss=3.1871 lr=0.000022 elapsed=3817.6s


E1 It9000 loss=3.1798 lr=0.000021 elapsed=3904.5s


E1 It9200 loss=3.1725 lr=0.000020 elapsed=3991.3s


E1 It9400 loss=3.1662 lr=0.000018 elapsed=4078.2s


E1 It9600 loss=3.1600 lr=0.000017 elapsed=4165.1s


E1 It9800 loss=3.1540 lr=0.000016 elapsed=4251.9s


E1 It10000 loss=3.1479 lr=0.000015 elapsed=4338.7s


E1 It10200 loss=3.1414 lr=0.000014 elapsed=4425.5s


E1 It10400 loss=3.1353 lr=0.000013 elapsed=4512.2s


E1 It10600 loss=3.1295 lr=0.000012 elapsed=4599.0s


E1 It10800 loss=3.1233 lr=0.000011 elapsed=4685.9s


E1 It11000 loss=3.1182 lr=0.000010 elapsed=4772.6s


E1 It11200 loss=3.1126 lr=0.000009 elapsed=4859.4s


E1 It11400 loss=3.1063 lr=0.000008 elapsed=4946.1s


E1 It11600 loss=3.1014 lr=0.000007 elapsed=5032.9s


E1 It11800 loss=3.0963 lr=0.000006 elapsed=5119.6s


E1 It12000 loss=3.0911 lr=0.000005 elapsed=5206.4s


E1 It12200 loss=3.0863 lr=0.000005 elapsed=5293.3s


E1 It12400 loss=3.0816 lr=0.000004 elapsed=5380.1s


E1 It12600 loss=3.0776 lr=0.000003 elapsed=5466.9s


E1 It12800 loss=3.0730 lr=0.000003 elapsed=5553.7s


E1 It13000 loss=3.0686 lr=0.000002 elapsed=5640.5s


E1 It13200 loss=3.0644 lr=0.000002 elapsed=5727.3s


E1 It13400 loss=3.0606 lr=0.000002 elapsed=5814.1s


E1 It13600 loss=3.0561 lr=0.000001 elapsed=5900.8s


E1 It13800 loss=3.0527 lr=0.000001 elapsed=5987.7s


E1 It14000 loss=3.0489 lr=0.000001 elapsed=6074.5s


E1 It14200 loss=3.0455 lr=0.000000 elapsed=6161.3s


E1 It14400 loss=3.0419 lr=0.000000 elapsed=6248.0s


E1 It14600 loss=3.0382 lr=0.000000 elapsed=6334.8s


E1 It14800 loss=3.0345 lr=0.000000 elapsed=6421.6s


E1 It15000 loss=3.0309 lr=0.000000 elapsed=6508.3s


Reached MAX_STEPS_PER_EPOCH=15000, breaking epoch early.


Epoch 1 train_loss=3.0309 epoch_time=6508.3s


Validate: macro-F1=0.451064  n=48000  batches=1000  elapsed=138.7s


Saved best checkpoint: model_convnext_base_320_ft_best.pth F1= 0.45106387697791467


Training done. Best val macro-F1=0.451064. Total elapsed=13297.4s
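
The logged values above can be cross-checked against the cell's configuration. The sketch below recomputes the `cosine_warmup` multiplier for 2 x 15000 = 30000 steps and prints the learning rate at a few logged iterations. It also computes the loss that a uniform prediction gives under the label-smoothed cross-entropy, ln(num_classes), which is roughly where a freshly initialized head should start. `lr_at` is an illustrative helper, and the class count of 64500 is an assumption taken from the centroid shape printed earlier.

```python
import math

LR = 2e-4
EPOCHS = 2
MAX_STEPS_PER_EPOCH = 15000
NUM_CLASSES = 64500  # assumption: taken from the centroid shape [64500, 1024]

num_steps = EPOCHS * MAX_STEPS_PER_EPOCH
warmup_steps = int(0.05 * num_steps)

def lr_at(step):
    """Learning rate after `step` scheduler steps (linear warmup, then cosine decay)."""
    if step < warmup_steps:
        mult = step / max(1, warmup_steps)
    else:
        p = (step - warmup_steps) / max(1, num_steps - warmup_steps)
        mult = 0.5 * (1 + math.cos(math.pi * p))
    return LR * mult

for step in (200, 1600, 15000, 15200, 30000):
    print(f"step={step:5d} lr={lr_at(step):.6f}")

# Label-smoothed CE on a uniform prediction: both terms reduce to ln(C).
eps = 0.1
log_p = -math.log(NUM_CLASSES)  # log-probability assigned to every class
uniform_loss = (1 - eps) * (-log_p) - eps * log_p
print(f"uniform-prediction loss = {uniform_loss:.4f}")
```

The recomputed rates agree with the log at E0 It200, E0 It15000 and E1 It200 (global step 15200), and the early running loss of about 11.19 sits just above ln(64500), as expected for a randomly initialized head.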


In [19]:
# Inference with 2x TTA (orig + HFlip) at 320 and tau tuning on large val subset
import os, time, json, math
import numpy as np
import pandas as pd
import torch
import torchvision.transforms as T
from PIL import Image
from sklearn.metrics import f1_score
import timm

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

val_split = pd.read_pickle('val_split.pkl')
trn_split = pd.read_pickle('train_split.pkl')
test_df = pd.read_pickle('test_df.pkl')
with open('idx2cat.json','r') as f:
    idx2cat = {int(k): int(v) for k,v in json.load(f).items()}
num_classes = len(idx2cat)

IMG_SIZE = 320
MODEL_NAME = 'convnext_base.fb_in22k_ft_in1k'
ckpt_path = 'model_convnext_base_320_ft_best.pth'

val_tfms = T.Compose([
    T.Resize(IMG_SIZE, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(IMG_SIZE),
    T.ToTensor(),
    T.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
])

def load_model_for_infer():
    model = timm.create_model(MODEL_NAME, pretrained=False, num_classes=num_classes)
    ck = torch.load(ckpt_path, map_location='cpu')
    model.load_state_dict(ck['model'], strict=True)
    model.to(device).eval()
    return model

def infer_logits_paths_tta(model, paths, batch_size=96, log_every=100):
    # try larger batch, fallback if OOM
    bs_try = batch_size
    logits_out = []
    n = len(paths)
    i = 0
    t0 = time.time()
    while i < n:
        cur_bs = min(bs_try, n - i)
        batch_paths = paths[i:i+cur_bs]
        try:
            # Decode each image once, then build the original and h-flipped views
            imgs0, imgs1 = [], []
            for p in batch_paths:
                try:
                    img = Image.open(p).convert('RGB')
                except Exception:
                    # unreadable file: fall back to a blank white image
                    img = Image.fromarray(np.ones((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)*255)
                imgs0.append(val_tfms(img))
                imgs1.append(val_tfms(img.transpose(Image.FLIP_LEFT_RIGHT)))
            xb0 = torch.stack(imgs0).to(device, non_blocking=True)
            xb1 = torch.stack(imgs1).to(device, non_blocking=True)
            with torch.no_grad(), torch.amp.autocast('cuda', enabled=True):
                l0 = model(xb0)
                l1 = model(xb1)
                l = (l0 + l1) * 0.5
            logits_out.append(l.float().cpu().numpy())
            i += cur_bs
            if ((i // bs_try) % log_every) == 0:
                print(f'Infer TTA progress: {i}/{n} elapsed={time.time()-t0:.1f}s', flush=True)
        except RuntimeError as e:
            if 'CUDA out of memory' in str(e) and bs_try > 32:
                torch.cuda.empty_cache()
                bs_try = max(32, bs_try // 2)
                print('OOM encountered. Reducing batch_size to', bs_try, flush=True)
                continue
            else:
                raise
    return np.concatenate(logits_out, axis=0)

# Priors from training split
counts = trn_split['label_idx'].value_counts().sort_index()
priors = counts.reindex(range(num_classes), fill_value=1).values.astype(np.float64)
priors = priors / priors.sum()
log_priors = np.log(priors + 1e-12)

def tune_tau_on_val(model, cap=30000):
    n = min(cap, len(val_split))
    paths = val_split['file_path'].tolist()[:n]
    y_true = val_split['label_idx'].values[:n]
    print(f'Running 2x TTA validation on {n} samples for tau tuning...', flush=True)
    val_logits = infer_logits_paths_tta(model, paths, batch_size=96, log_every=10)
    best_tau, best_f1 = None, -1.0
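    # NOTE: the grid starts at 0.6 and has no tau=0 control (no prior adjustment),
    # so an optimum below 0.6 cannot be detected by this search.
    for tau in [0.6, 0.8, 1.0, 1.2]: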
        adj = val_logits - tau * log_priors[None, :]
        pred = adj.argmax(axis=1)
        f1 = f1_score(y_true, pred, average='macro')
        print(f'tau={tau:.2f} val macro-F1={f1:.6f}', flush=True)
        if f1 > best_f1:
            best_f1, best_tau = f1, tau
    print('Best tau:', best_tau, 'Best val F1:', best_f1, flush=True)
    return best_tau, best_f1

def run_test_inference(model, tau, batch_size=96):
    paths = test_df['file_path'].tolist()
    print('Running 2x TTA test inference...', flush=True)
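    # NOTE: this keeps all test logits in host RAM as one dense float32
    # [len(paths), num_classes] array until the argmax below.
    test_logits = infer_logits_paths_tta(model, paths, batch_size=batch_size, log_every=50)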
    adj = test_logits - tau * log_priors[None, :]
    pred_idx = adj.argmax(axis=1)
    pred_cat = [idx2cat[int(i)] for i in pred_idx]
    sub = pd.DataFrame({'Id': test_df['Id'].values, 'Predicted': pred_cat}).sort_values('Id').reset_index(drop=True)
    sub.to_csv('submission.csv', index=False)
    print('submission.csv written. Shape:', sub.shape, 'Head:\n', sub.head(), flush=True)

print('Loading model checkpoint:', ckpt_path, flush=True)
model_for_infer = load_model_for_infer()
print('Model loaded. Ready to tune tau and run test after training completes.', flush=True)

Loading model checkpoint: model_convnext_base_320_ft_best.pth


Model loaded. Ready to tune tau and run test after training completes.
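
The `priors` in the cell above are raw training frequencies, but the fine-tune cell sampled images with weight 1/sqrt(class count), so a class with n images was drawn with probability proportional to sqrt(n) rather than n. The sketch below uses toy counts and a hypothetical helper, `sampling_prior`, to compute that effective class distribution. Under the assumption that the network absorbs the distribution it was sampled from, the log-prior it learned is about half as steep as `log_priors`, which would favor tau values around 0.5 or lower over tau = 1.

```python
import math

def sampling_prior(counts, power=0.5):
    """Class distribution induced by per-sample weights 1 / count**power.

    A class with n images carries total weight n * n**(-power) = n**(1 - power).
    power=0 gives the raw frequencies; power=1 gives uniform classes.
    """
    mass = [n ** (1.0 - power) for n in counts]
    total = sum(mass)
    return [m / total for m in mass]

counts = [10000, 100, 1]           # toy long-tailed class counts
raw = sampling_prior(counts, 0.0)  # what `priors` in the cell above measures
eff = sampling_prior(counts, 0.5)  # what a 1/sqrt(freq) weighted sampler draws

# Log-ratio between the head class and the tail class under each distribution.
raw_gap = math.log(raw[0] / raw[2])
eff_gap = math.log(eff[0] / eff[2])
print(f"raw gap={raw_gap:.3f}  effective gap={eff_gap:.3f}  ratio={eff_gap / raw_gap:.2f}")
```

This is only a heuristic for choosing the tau search range; the best value still has to be measured on validation data.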


In [20]:
# Execute 2x TTA tau tuning (30k val) and full test inference
print('Starting tau tuning on val (cap=30000)...', flush=True)
best_tau, best_f1 = tune_tau_on_val(model_for_infer, cap=30000)
print(f'Chosen tau={best_tau} with val F1={best_f1:.6f}', flush=True)
print('Running full test inference with 2x TTA...', flush=True)
run_test_inference(model_for_infer, best_tau, batch_size=96)
print('Done: submission.csv written.', flush=True)

Starting tau tuning on val (cap=30000)...


Running 2x TTA validation on 30000 samples for tau tuning...


Infer TTA progress: 960/30000 elapsed=25.2s


Infer TTA progress: 1920/30000 elapsed=47.1s


Infer TTA progress: 2880/30000 elapsed=68.6s


Infer TTA progress: 3840/30000 elapsed=90.1s


Infer TTA progress: 4800/30000 elapsed=111.8s


Infer TTA progress: 5760/30000 elapsed=133.4s


Infer TTA progress: 6720/30000 elapsed=154.8s


Infer TTA progress: 7680/30000 elapsed=176.2s


Infer TTA progress: 8640/30000 elapsed=197.8s


Infer TTA progress: 9600/30000 elapsed=219.3s


Infer TTA progress: 10560/30000 elapsed=240.6s


Infer TTA progress: 11520/30000 elapsed=261.9s


Infer TTA progress: 12480/30000 elapsed=283.3s


Infer TTA progress: 13440/30000 elapsed=304.8s


Infer TTA progress: 14400/30000 elapsed=326.2s


Infer TTA progress: 15360/30000 elapsed=347.6s


Infer TTA progress: 16320/30000 elapsed=369.0s


Infer TTA progress: 17280/30000 elapsed=390.5s


Infer TTA progress: 18240/30000 elapsed=412.0s


Infer TTA progress: 19200/30000 elapsed=433.3s


Infer TTA progress: 20160/30000 elapsed=454.8s


Infer TTA progress: 21120/30000 elapsed=476.2s


Infer TTA progress: 22080/30000 elapsed=497.7s


Infer TTA progress: 23040/30000 elapsed=519.3s


Infer TTA progress: 24000/30000 elapsed=540.7s


Infer TTA progress: 24960/30000 elapsed=562.0s


Infer TTA progress: 25920/30000 elapsed=583.5s


Infer TTA progress: 26880/30000 elapsed=606.1s


Infer TTA progress: 27840/30000 elapsed=627.6s


Infer TTA progress: 28800/30000 elapsed=649.0s


Infer TTA progress: 29760/30000 elapsed=670.5s


tau=0.60 val macro-F1=0.421119


tau=0.80 val macro-F1=0.392910


tau=1.00 val macro-F1=0.358813


tau=1.20 val macro-F1=0.320986


Best tau: 0.6 Best val F1: 0.4211189742274307


Chosen tau=0.6 with val F1=0.421119


Running full test inference with 2x TTA...


Running 2x TTA test inference...


Infer TTA progress: 4800/477806 elapsed=107.6s


Infer TTA progress: 9600/477806 elapsed=215.0s


Infer TTA progress: 14400/477806 elapsed=321.9s


Infer TTA progress: 19200/477806 elapsed=429.0s


Infer TTA progress: 24000/477806 elapsed=537.0s


Infer TTA progress: 28800/477806 elapsed=644.3s


Infer TTA progress: 33600/477806 elapsed=753.0s


Infer TTA progress: 38400/477806 elapsed=860.3s


Infer TTA progress: 43200/477806 elapsed=967.7s


Infer TTA progress: 48000/477806 elapsed=1074.9s


Infer TTA progress: 52800/477806 elapsed=1183.0s


Infer TTA progress: 57600/477806 elapsed=1290.5s


Infer TTA progress: 62400/477806 elapsed=1398.3s


Infer TTA progress: 67200/477806 elapsed=1506.1s


Infer TTA progress: 72000/477806 elapsed=1615.5s


Infer TTA progress: 76800/477806 elapsed=1724.8s


Infer TTA progress: 81600/477806 elapsed=1832.7s


Infer TTA progress: 86400/477806 elapsed=1942.3s


Infer TTA progress: 91200/477806 elapsed=2050.0s


Infer TTA progress: 96000/477806 elapsed=2158.5s


Infer TTA progress: 100800/477806 elapsed=2266.6s


Infer TTA progress: 105600/477806 elapsed=2374.7s


Infer TTA progress: 110400/477806 elapsed=2482.6s


Infer TTA progress: 115200/477806 elapsed=2590.6s


Infer TTA progress: 120000/477806 elapsed=2698.8s


Infer TTA progress: 124800/477806 elapsed=2806.5s


Infer TTA progress: 129600/477806 elapsed=2914.1s


Infer TTA progress: 134400/477806 elapsed=3021.6s


Infer TTA progress: 139200/477806 elapsed=3129.6s


Infer TTA progress: 144000/477806 elapsed=3238.0s


Infer TTA progress: 148800/477806 elapsed=3345.4s


Infer TTA progress: 153600/477806 elapsed=3453.1s


Infer TTA progress: 158400/477806 elapsed=3561.0s


Infer TTA progress: 163200/477806 elapsed=3668.2s


Infer TTA progress: 168000/477806 elapsed=3776.5s


Infer TTA progress: 172800/477806 elapsed=3884.7s


Infer TTA progress: 177600/477806 elapsed=3992.5s


Infer TTA progress: 182400/477806 elapsed=4100.0s


Infer TTA progress: 187200/477806 elapsed=4208.2s


Infer TTA progress: 192000/477806 elapsed=4315.8s


Infer TTA progress: 196800/477806 elapsed=4425.0s


Infer TTA progress: 201600/477806 elapsed=4536.5s


Infer TTA progress: 206400/477806 elapsed=4646.0s


Infer TTA progress: 211200/477806 elapsed=4754.1s


Infer TTA progress: 216000/477806 elapsed=4863.9s


Infer TTA progress: 220800/477806 elapsed=4972.9s


Infer TTA progress: 225600/477806 elapsed=5082.6s


Infer TTA progress: 230400/477806 elapsed=5191.0s


Infer TTA progress: 235200/477806 elapsed=5298.9s


Infer TTA progress: 240000/477806 elapsed=5408.4s


Infer TTA progress: 244800/477806 elapsed=5517.2s


Infer TTA progress: 249600/477806 elapsed=5625.3s


Infer TTA progress: 254400/477806 elapsed=5734.5s


Infer TTA progress: 259200/477806 elapsed=5841.5s


Infer TTA progress: 264000/477806 elapsed=5950.9s


Infer TTA progress: 268800/477806 elapsed=6059.5s


Infer TTA progress: 273600/477806 elapsed=6167.5s


Infer TTA progress: 278400/477806 elapsed=6275.4s


Infer TTA progress: 283200/477806 elapsed=6384.3s


Infer TTA progress: 288000/477806 elapsed=6492.6s


Infer TTA progress: 292800/477806 elapsed=6603.1s


Infer TTA progress: 297600/477806 elapsed=6710.7s


Infer TTA progress: 302400/477806 elapsed=6817.8s


Infer TTA progress: 307200/477806 elapsed=6926.3s


Infer TTA progress: 312000/477806 elapsed=7034.4s


Infer TTA progress: 316800/477806 elapsed=7142.0s


Infer TTA progress: 321600/477806 elapsed=7250.1s


Infer TTA progress: 326400/477806 elapsed=7357.5s


Infer TTA progress: 331200/477806 elapsed=7464.7s


Infer TTA progress: 336000/477806 elapsed=7572.8s


Infer TTA progress: 340800/477806 elapsed=7681.1s


Infer TTA progress: 345600/477806 elapsed=7788.9s


Infer TTA progress: 350400/477806 elapsed=7896.6s


Infer TTA progress: 355200/477806 elapsed=8004.0s


Infer TTA progress: 360000/477806 elapsed=8113.3s


Infer TTA progress: 364800/477806 elapsed=8221.9s


Infer TTA progress: 369600/477806 elapsed=8329.1s


Infer TTA progress: 374400/477806 elapsed=8438.1s


Infer TTA progress: 379200/477806 elapsed=8546.0s


Infer TTA progress: 384000/477806 elapsed=8653.3s


Infer TTA progress: 388800/477806 elapsed=8760.9s


Infer TTA progress: 393600/477806 elapsed=8869.8s


Infer TTA progress: 398400/477806 elapsed=8978.0s


Infer TTA progress: 403200/477806 elapsed=9085.3s


Infer TTA progress: 408000/477806 elapsed=9194.2s


Infer TTA progress: 412800/477806 elapsed=9302.3s


Infer TTA progress: 417600/477806 elapsed=9410.9s


Infer TTA progress: 422400/477806 elapsed=9519.2s


Infer TTA progress: 427200/477806 elapsed=9627.8s


Infer TTA progress: 432000/477806 elapsed=9736.3s


Infer TTA progress: 436800/477806 elapsed=9844.2s


Infer TTA progress: 441600/477806 elapsed=9952.2s


Infer TTA progress: 446400/477806 elapsed=10060.3s


Infer TTA progress: 451200/477806 elapsed=10169.4s


Infer TTA progress: 456000/477806 elapsed=10279.0s


Infer TTA progress: 460800/477806 elapsed=10386.4s


Infer TTA progress: 465600/477806 elapsed=10494.2s


Infer TTA progress: 470400/477806 elapsed=10603.0s


Infer TTA progress: 475200/477806 elapsed=10712.1s
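
The progress lines above are enough to estimate TTA throughput and the time left. The following is a minimal sketch, not part of the original notebook: `parse_progress` and `throughput_and_eta` are hypothetical helper names, and the regex assumes the exact `progress: done/total elapsed=...s` format printed in this log.

```python
import re

# Matches lines such as "Infer TTA progress: 134400/477806 elapsed=3021.6s"
PROGRESS_RE = re.compile(r"progress:\s*(\d+)/(\d+)\s+elapsed=([\d.]+)s")

def parse_progress(line):
    """Return (done, total, elapsed_seconds), or None if the line does not match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))

def throughput_and_eta(first_line, last_line):
    """Average images/second between two log lines, and seconds left after the last one."""
    d0, total, t0 = parse_progress(first_line)
    d1, _, t1 = parse_progress(last_line)
    rate = (d1 - d0) / (t1 - t0)
    eta = (total - d1) / rate
    return rate, eta

# First and last progress lines from the log excerpt above
rate, eta = throughput_and_eta(
    "Infer TTA progress: 134400/477806 elapsed=3021.6s",
    "Infer TTA progress: 475200/477806 elapsed=10712.1s",
)
print(f"{rate:.1f} img/s, ~{eta:.0f}s remaining")
```

Measuring between two log lines rather than from time zero keeps model loading and DataLoader warm-up out of the estimate.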


In [None]:
# Fast, reliable single-view inference with fixed tau=0.6 (prior-based logit adjustment) and periodic progress logging
import os, time, json, math, gc
import numpy as np
import pandas as pd
import torch
import torchvision.transforms as T
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import timm

torch.backends.cudnn.benchmark = True
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load artifacts
val_split = pd.read_pickle('val_split.pkl')  # not strictly needed, but kept for consistency
trn_split = pd.read_pickle('train_split.pkl')
test_df = pd.read_pickle('test_df.pkl')
with open('idx2cat.json','r') as f:
    idx2cat = {int(k): int(v) for k,v in json.load(f).items()}
num_classes = len(idx2cat)

IMG_SIZE = 320
MODEL_NAME = 'convnext_base.fb_in22k_ft_in1k'
CKPT_PATH = 'model_convnext_base_320_ft_best.pth'
TAU = 0.6  # fixed from prior tuning

# Priors from training split
counts = trn_split['label_idx'].value_counts().sort_index()
priors = counts.reindex(range(num_classes), fill_value=1).values.astype(np.float64)
priors = priors / priors.sum()
log_priors = torch.from_numpy(np.log(priors + 1e-12)).to(device).float()

val_tfms = T.Compose([
    T.Resize(IMG_SIZE, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(IMG_SIZE),
    T.ToTensor(),
    T.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
])

class TestDS(Dataset):
    def __init__(self, df):
        self.df = df.reset_index(drop=True)
    def __len__(self):
        return len(self.df)
    def __getitem__(self, i):
        r = self.df.iloc[i]
        p = r['file_path']
        try:
            img = Image.open(p).convert('RGB')
        except Exception:
            # Fall back to a blank white image so one unreadable file does not abort the whole run
            img = Image.fromarray(np.ones((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)*255)
        x = val_tfms(img)
        return x, int(r['Id'])

def load_model():
    model = timm.create_model(MODEL_NAME, pretrained=False, num_classes=num_classes)
    ck = torch.load(CKPT_PATH, map_location='cpu')
    model.load_state_dict(ck['model'], strict=True)
    model.to(device).eval()
    return model

@torch.inference_mode()
def run_single_view_infer():
    ds = TestDS(test_df)
    # Use conservative batch for stability; OOM-safe backoff not needed for single-view
    bs = 128
    dl = DataLoader(ds, batch_size=bs, shuffle=False, num_workers=min(8, os.cpu_count() or 8), pin_memory=True, persistent_workers=True)
    model = load_model()
    preds_idx = np.empty((len(ds),), dtype=np.int32)
    t0 = time.time()
    done = 0
    for it, (xb, ids) in enumerate(dl):
        xb = xb.to(device, non_blocking=True)
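        # Enable autocast only when actually running on CUDA
        with torch.amp.autocast('cuda', enabled=(device.type == 'cuda')):
            logits = model(xb)
        # Logit adjustment in fp32: subtracting tau * log(prior) shifts predictions toward tail classes
        adj = logits.float() - TAU * log_priors[None, :]
        pred = adj.argmax(dim=1).int().cpu().numpy()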
        s = done
        e = done + len(ids)
        preds_idx[s:e] = pred
        done = e
        if (it+1) % 300 == 0 or e == len(ds):
            print(f'Test infer progress: {e}/{len(ds)} elapsed={time.time()-t0:.1f}s', flush=True)
    # Map to category ids and write submission
    pred_cat = [idx2cat[int(i)] for i in preds_idx]
    sub = pd.DataFrame({'Id': test_df['Id'].values, 'Predicted': pred_cat}).sort_values('Id').reset_index(drop=True)
    sub.to_csv('submission.csv', index=False)
    print('submission.csv written. Shape:', sub.shape, ' Head:\n', sub.head(), flush=True)
    return sub

print('Starting fast single-view inference with tau=0.6 ...', flush=True)
sub = run_single_view_infer()
print('Done.')

Starting fast single-view inference with tau=0.6 ...


  ck = torch.load(CKPT_PATH, map_location='cpu')
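
To make the `tau` adjustment in the cell above concrete, here is a small NumPy sketch of the same formula, `logits - tau * log(priors)`, on toy numbers. The class counts and logits are invented for illustration, and `logit_adjust` is a hypothetical helper rather than a function from the notebook.

```python
import numpy as np

def logit_adjust(logits, class_counts, tau):
    """Subtract tau * log(prior) from each class logit (same formula as the cell above)."""
    priors = class_counts / class_counts.sum()
    return logits - tau * np.log(priors + 1e-12)

# Toy long-tailed setup (assumed values): one head class, one mid class, one tail class
counts = np.array([900.0, 90.0, 10.0])
logits = np.array([
    [2.0, 1.9, 0.5],   # near tie between head and mid class
    [5.0, 0.5, 1.0],   # confident head-class prediction
])

raw_pred = logits.argmax(axis=1)
adj_pred = logit_adjust(logits, counts, tau=0.6).argmax(axis=1)
print("raw:", raw_pred, "adjusted:", adj_pred)
```

In this toy case the near tie flips to the rarer class, while the confident head-class prediction is unchanged. That is the intended effect when the metric is macro-F1 over a long-tailed label set. Larger `tau` pushes harder toward rare classes, so it is best tuned on held-out data, as was done for the fixed 0.6 used here.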
