# Plan: iWildCam 2020 - FGVC7 (Goal: Medal)

Objectives:
- Establish GPU-enabled environment; verify CUDA.
- Load/inspect provided artifacts: train/test dirs, annotations JSONs, megadetector results, test info, sample submission.
- Build fast, correct baseline: single strong pretrained CNN (e.g., timm resnet50/efficientnet), 224-384px, focal loss or weighted CE, standard aug.
- Robust CV mirroring test distribution: GroupKFold by location/camera_id if available; otherwise stratified with site-aware split from metadata.
- Ship working submission ASAP; iterate to medal with improvements and ensembling.
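The code cells below use scikit-learn's `GroupKFold` with `location` as the group. Here is a dependency-free sketch of the same idea, useful for checking the leakage property by hand. It places the largest groups first, each into the currently lightest fold, which is similar in spirit to what `GroupKFold` does. The helper names and the toy location ids are hypothetical.

```python
from collections import Counter

def greedy_group_folds(groups, n_splits=5):
    """Assign every sample to a fold so that each group (e.g. a camera location)
    lands in exactly one fold. Largest groups go first, each into the currently
    smallest fold (a greedy size-balancing heuristic)."""
    sizes = Counter(groups)
    fold_of_group = {}
    fold_sizes = [0] * n_splits
    # Sort by group size descending; tie-break on the group key for determinism
    for g, n in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        f = min(range(n_splits), key=lambda i: fold_sizes[i])
        fold_of_group[g] = f
        fold_sizes[f] += n
    return [fold_of_group[g] for g in groups]

def check_no_group_overlap(groups, folds):
    """True if no group appears in more than one fold (no site leakage)."""
    seen = {}
    for g, f in zip(groups, folds):
        if seen.setdefault(g, f) != f:
            return False
    return True

locations = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5]
folds = greedy_group_folds(locations, n_splits=3)
print(folds)                                     # → [1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 1]
print(check_no_group_overlap(locations, folds))  # → True
```

The same overlap check applies directly to the `fold` column built later (`train_df['location']` vs `train_df['fold']`).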

Initial Baseline Roadmap:
1) Environment:
   - Install PyTorch cu121 stack and timm; confirm GPU via nvidia-smi and torch.cuda.is_available().
2) Data pipeline:
   - Parse iwildcam2020_train_annotations.json to extract image_id, file_name, category_id, location/camera if present.
   - Map categories to contiguous labels; build DataFrame.
   - For test, read iwildcam2020_test_information.json (file_name list, potentially location).
3) Validation:
   - If annotations provide location/site/camera, use GroupKFold by location to simulate domain shift.
   - Else: StratifiedKFold with careful leakage checks; fix random_state for determinism.
4) Modeling:
   - Start with timm models: efficientnet_b0 or convnext_tiny at 320px, pretrained=True.
   - Augs: RandomResizedCrop, HFlip, ColorJitter, Normalize; light mixup/cutmix.
   - Optim: AdamW, cosine schedule, warmup, label smoothing 0.1; epochs: 5-8 for smoke baseline, early stop on OOF.
   - Loss: CrossEntropy with class weights or Focal if imbalance severe.
5) Inference:
   - TTA x3 (scales/flips) if time allows.
   - Save submission.csv with columns [Id,Category].
6) Iterations to Medal:
   - Scale model/resolution (efficientnet_b3/b4, convnext_base, swin_t).
   - Use MegaDetector crops to focus on detected boxes (fallback to full image).
   - Pseudo-labeling on confident test predictions if CV aligns; blend full-image and crop models.
   - Class-balanced sampling; per-location batch sampling.
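Step 4 lists focal loss as the fallback for severe class imbalance. Below is a minimal single-sample sketch of multi-class focal loss, -α_t (1 - p_t)^γ log p_t (Lin et al., 2017), written in plain Python to keep the arithmetic visible. A PyTorch version would apply the same formula to batched `log_softmax` outputs. The function names and logits are illustrative.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).
    With gamma=0 and alpha=None it reduces to ordinary cross-entropy."""
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t

logits = [2.0, 0.5, -1.0]
ce_easy = focal_loss(logits, target=0, gamma=0.0)  # confident, correct prediction
fl_easy = focal_loss(logits, target=0, gamma=2.0)
ce_hard = focal_loss(logits, target=2, gamma=0.0)  # low-probability target
fl_hard = focal_loss(logits, target=2, gamma=2.0)
print(f"easy: CE={ce_easy:.4f} focal={fl_easy:.4f}")
print(f"hard: CE={ce_hard:.4f} focal={fl_hard:.4f}")
```

The (1 - p_t)^γ factor shrinks the loss of well-classified (typically majority-class) examples much more than that of hard ones. This is why focal loss is an alternative to explicit class weights.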

Risks & Mitigations:
- Data leakage via site overlap across folds → enforce GroupKFold by location if available.
- Long training times → run smoke tests first, log per-epoch times, cache datasets, tune num_workers.
- File I/O bottleneck (160k train images) → use pillow-simd if available, persistent workers, prefetch_factor.

Next Actions:
A) Add and run GPU/Env check cell.
B) Quick EDA: load JSONs; count classes, sites, images; verify file paths.
C) Implement training script train.py for clean subprocess runs.
D) Ship baseline model with 1-2 folds and generate first submission.

Expert Questions:
- Recommended CV protocol for iWildCam 2020: group by location vs sequence_id vs camera trap id?
- Baseline model/resolution that reliably gets >0.60 accuracy on LB?
- Best use of MegaDetector: single crop per image or multi-box with NMS/ensembling?
- Any pitfalls with class mapping or missing classes in test?

We will request expert review after env check + EDA and again after first baseline OOF/LB.

In [None]:
# GPU / Environment check
import os, subprocess, sys, time, json
print("=== nvidia-smi ===", flush=True)
subprocess.run(["bash","-lc","nvidia-smi || true"], check=False)
print("=== CUDA env ===", flush=True)
print("CUDA_VISIBLE_DEVICES=", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("=== GPU query ===", flush=True)
subprocess.run(["bash","-lc","nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader || true"], check=False)
print("=== Done ===", flush=True)
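The GPU query above only prints raw `nvidia-smi` output. If a later cell needs the values (for example, to choose a batch size from available memory), a small parser like this hypothetical `parse_gpu_query` helper could turn the `csv,noheader` lines into dicts. The sample line is illustrative and does not come from this run.

```python
import subprocess

def parse_gpu_query(csv_text):
    """Parse `nvidia-smi --query-gpu=name,driver_version,memory.total
    --format=csv,noheader` output: one 'name, driver, memory' line per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps any comma inside the GPU name attached to the name field
        name, driver, memory = (field.strip() for field in line.rsplit(",", 2))
        value, _, unit = memory.partition(" ")
        gpus.append({"name": name, "driver": driver,
                     "memory_mib": int(value) if unit == "MiB" else None})
    return gpus

def query_gpus():
    """Run the same query as the cell above; returns [] if nvidia-smi is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_query(out)

# Illustrative output line (not from this environment)
sample = "NVIDIA A100-SXM4-80GB, 535.104.05, 81920 MiB\n"
print(parse_gpu_query(sample))
```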

In [None]:
# Install PyTorch cu121 stack and core deps (avoid re-installing torch during later installs)
import os, sys, subprocess, shutil
from pathlib import Path
def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

# Uninstall any stray torch stacks (idempotent)
for pkg in ('torch','torchvision','torchaudio'):
    subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', pkg], check=False)

# Clean potential shadow dirs
for d in (
    '/app/.pip-target/torch',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
    '/app/.pip-target/torchgen',
    '/app/.pip-target/functorch',
):
    if os.path.exists(d):
        print('Removing', d)
        shutil.rmtree(d, ignore_errors=True)

# Install exact cu121 torch stack
pip('install',
    '--index-url','https://download.pytorch.org/whl/cu121',
    '--extra-index-url','https://pypi.org/simple',
    'torch==2.4.1','torchvision==0.19.1','torchaudio==2.4.1')

# Freeze constraints for later installs
Path('constraints.txt').write_text('torch==2.4.1\ntorchvision==0.19.1\ntorchaudio==2.4.1\n')

# Install timm without deps to avoid pulling CPU torch wheels
pip('install', 'timm==1.0.9', '--no-deps')

# Install non-torch deps safely (none of these depend on torch)
pip('install','-c','constraints.txt',
    'albumentations==1.4.14','opencv-python-headless==4.10.0.84',
    'pandas','numpy','scikit-learn',
    'tensorboard','einops','pyyaml','matplotlib',
    '--upgrade-strategy','only-if-needed')

# Sanity check
import torch
print('torch:', torch.__version__, 'CUDA build:', getattr(torch.version,'cuda',None))
print('CUDA available:', torch.cuda.is_available())
assert str(getattr(torch.version,'cuda','')).startswith('12.1'), f"Wrong CUDA build: {torch.version.cuda}"
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
else:
    print('WARNING: CUDA not available')
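`constraints.txt` only protects the torch pins if later installs honor it. A minimal stdlib sketch for re-checking the pins after any `pip install` follows. The helper names are hypothetical. Wheels from the cu121 index report versions such as `2.4.1+cu121`, so the local `+...` label is ignored in the comparison.

```python
from importlib import metadata

def parse_constraints(text):
    """Parse 'package==version' lines (the format written to constraints.txt)."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins

def check_pins(pins, get_version=metadata.version):
    """Return {package: (pinned, installed)} for pins that are missing or differ.
    The local version label is ignored, so '2.4.1+cu121' satisfies '2.4.1'."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+", 1)[0] != pinned:
            problems[name] = (pinned, installed)
    return problems
```

In the notebook, `check_pins(parse_constraints(Path('constraints.txt').read_text()))` should return an empty dict when the torch stack is intact.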

In [None]:
# EDA: Load JSONs, build DataFrames, sanity checks, MD coverage
import json, os, pandas as pd, numpy as np, math, time
from pathlib import Path

data_dir = Path('.')
train_dir = data_dir / 'train'
test_dir = data_dir / 'test'
ann_path = data_dir / 'iwildcam2020_train_annotations.json'
test_info_path = data_dir / 'iwildcam2020_test_information.json'
md_path = data_dir / 'iwildcam2020_megadetector_results.json'
sample_sub_path = data_dir / 'sample_submission.csv'

t0=time.time()
print('Loading train annotations...')
with open(ann_path, 'r') as f:
    train_json = json.load(f)
print('Keys:', list(train_json.keys()))

# Expect COCO-like structure
images = pd.DataFrame(train_json.get('images', []))
ann = pd.DataFrame(train_json.get('annotations', []))
cats = pd.DataFrame(train_json.get('categories', []))
print('images:', images.shape, 'annotations:', ann.shape, 'categories:', cats.shape)
print('images columns:', images.columns.tolist()[:20])
print('annotations columns:', ann.columns.tolist())
print('categories head:\n', cats.head(3))

# Basic integrity
assert 'id' in images.columns and 'file_name' in images.columns, 'images must contain id and file_name'
assert 'image_id' in ann.columns and 'category_id' in ann.columns, 'annotations must contain image_id and category_id'
assert 'id' in cats.columns, 'categories must contain id'

# Merge labels (support seq_id if present)
merge_cols = ['id','file_name']
if 'location' in images.columns:
    merge_cols.append('location')
if 'seq_id' in images.columns:
    merge_cols.append('seq_id')
df = ann.merge(images[merge_cols].rename(columns={'id':'image_id'}), on='image_id', how='left')
print('Labeled records:', df.shape, 'unique images in labels:', df['image_id'].nunique())

# Location/sequence availability
has_location = 'location' in images.columns
has_sequence = 'seq_id' in images.columns
print('has_location:', has_location, 'has_sequence:', has_sequence)
if has_location:
    print('unique locations:', images['location'].nunique())
    print('location nulls:', images['location'].isna().sum())
if has_sequence:
    print('unique sequences:', images['seq_id'].nunique())

# Category mapping checks
cat_ids = cats['id'].tolist()
train_cat_min, train_cat_max = min(cat_ids), max(cat_ids)
print('Category id range:', train_cat_min, 'to', train_cat_max, 'count:', len(cat_ids))
missing_cats = sorted(set(df['category_id'].unique()) - set(cat_ids))
print('Missing categories referenced by annotations:', missing_cats[:10], '... count', len(missing_cats))

# Verify file paths exist for a small sample
exists_sample = df[['file_name']].drop_duplicates().sample(n=min(10, df['file_name'].nunique()), random_state=42)['file_name'].tolist()
missing_files = []
for fn in exists_sample:
    p = train_dir / fn
    if not p.exists():
        missing_files.append(fn)
print('Sample path missing count (train):', len(missing_files))
if missing_files[:3]:
    print('Missing examples:', missing_files[:3])

# Load test info
print('\nLoading test info...')
with open(test_info_path, 'r') as f:
    test_info = json.load(f)
test_images = pd.DataFrame(test_info.get('images', test_info.get('images_info', []))) if isinstance(test_info, dict) else pd.DataFrame(test_info)
assert not test_images.empty, 'No image records found in the test information JSON'
print('test_images shape:', test_images.shape, 'columns:', test_images.columns.tolist()[:20])
assert 'file_name' in test_images.columns or 'id' in test_images.columns, 'test info must contain file_name or id'

# Sample submission checks
sample_sub = pd.read_csv(sample_sub_path)
print('sample_submission shape:', sample_sub.shape, 'columns:', sample_sub.columns.tolist())
sub_id_col = sample_sub.columns[0]
sub_target_col = sample_sub.columns[1]
print('Submission Id column:', sub_id_col, 'Target column:', sub_target_col)

# Align test id key
test_key_col = 'file_name' if 'file_name' in test_images.columns else (sub_id_col if sub_id_col in test_images.columns else None)
print('Test key column determined as:', test_key_col)
if test_key_col is None:
    # try to infer
    for c in ['Id','id','image_id','name','file']:
        if c in test_images.columns:
            test_key_col = c
            break
print('Final test key column:', test_key_col)
assert test_key_col is not None, 'Could not determine test key column'

# Verify a few test files exist
test_keys = test_images[test_key_col].drop_duplicates()
test_exists_sample = test_keys.sample(n=min(10, len(test_keys)), random_state=42).tolist()
missing_test = []
for fn in test_exists_sample:
    p = test_dir / fn
    if not p.exists():
        missing_test.append(fn)
print('Sample path missing count (test):', len(missing_test))
if missing_test[:3]:
    print('Missing test examples:', missing_test[:3])

# Load MegaDetector results and compute coverage
print('\nLoading MegaDetector results...')
with open(md_path, 'r') as f:
    md = json.load(f)
md_images = md.get('images', md)
md_df = pd.DataFrame(md_images)
print('MD entries:', md_df.shape, 'columns:', md_df.columns.tolist())

# Determine rel_name for MD either via file path present or by joining on id->file_name
file_col = 'file' if 'file' in md_df.columns else ('image_path' if 'image_path' in md_df.columns else None)
if file_col is not None:
    def rel_name(p):
        p = str(p)
        if p.startswith('train/') or p.startswith('test/'):
            return p.split('/',1)[1]
        return os.path.basename(p)
    md_df['rel_name'] = md_df[file_col].apply(rel_name)
else:
    # Build id->file_name map from train and test metadata
    id_map_cols = ['id','file_name']
    id_map = images[id_map_cols].copy()
    if ('id' in test_images.columns) and ('file_name' in test_images.columns):
        id_map = pd.concat([id_map, test_images[id_map_cols]], ignore_index=True)
    md_df = md_df.merge(id_map, on='id', how='left')
    assert 'file_name' in md_df.columns, 'MD id could not be mapped to file_name; check schemas'
    md_df['rel_name'] = md_df['file_name']

# Keep only animal category boxes if present
def best_animal_box(recs):
    if not isinstance(recs, list) or len(recs)==0:
        return None
    best = None
    for d in recs:
        cat = str(d.get('category', ''))
        if cat in ('1','animal','animal_person_vehicle'):
            if (best is None) or (d.get('conf',0) > best.get('conf',0)):
                best = d
    return best
if 'detections' in md_df.columns:
    md_df['best_det'] = md_df['detections'].apply(best_animal_box)
else:
    md_df['best_det'] = None
md_df['has_animal'] = md_df['best_det'].notna()
md_cov = md_df.groupby('rel_name')['has_animal'].max().rename('md_has_animal').reset_index()
print('MD animal coverage (unique files):', md_cov['md_has_animal'].mean().round(4))

# Join MD coverage to train and test samples (by file_name)
train_files_unique = images[['file_name']].drop_duplicates().copy()
train_md = train_files_unique.merge(md_cov, left_on='file_name', right_on='rel_name', how='left')
train_md['md_has_animal'] = train_md['md_has_animal'].eq(True)  # NaN (no MD entry) -> False, keeps bool dtype
print('Train MD coverage:', train_md['md_has_animal'].mean().round(4))
test_files_unique = test_images[[test_key_col]].drop_duplicates().copy()
test_files_unique.columns = ['file_name_key']
test_md = test_files_unique.merge(md_cov, left_on='file_name_key', right_on='rel_name', how='left')
test_md['md_has_animal'] = test_md['md_has_animal'].eq(True)  # NaN (no MD entry) -> False, keeps bool dtype
print('Test MD coverage:', test_md['md_has_animal'].mean().round(4))

print(f'EDA done in {time.time()-t0:.2f}s')
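The coverage numbers above depend on how each MegaDetector record is reduced to a single box. Here is a small, self-contained illustration of that reduction. It assumes the MegaDetector batch-output shape: each image has a `detections` list of `{category, conf, bbox}`, `bbox` is `[x_min, y_min, width, height]` normalized to [0, 1], and category `"1"` means animal. The records themselves are made up.

```python
def best_animal_box(detections, animal_categories=("1",), min_conf=0.0):
    """Highest-confidence animal detection at or above min_conf, or None."""
    animals = [d for d in detections or []
               if str(d.get("category")) in animal_categories
               and d.get("conf", 0.0) >= min_conf]
    return max(animals, key=lambda d: d.get("conf", 0.0), default=None)

md_images = [  # illustrative records in the MegaDetector batch-output shape
    {"id": "a", "detections": [{"category": "1", "conf": 0.91, "bbox": [0.1, 0.2, 0.3, 0.4]},
                               {"category": "1", "conf": 0.40, "bbox": [0.6, 0.1, 0.2, 0.2]}]},
    {"id": "b", "detections": [{"category": "2", "conf": 0.88, "bbox": [0.0, 0.0, 0.5, 0.9]}]},
    {"id": "c", "detections": []},
]
best = {r["id"]: best_animal_box(r["detections"]) for r in md_images}
coverage = sum(b is not None for b in best.values()) / len(best)
print(best["a"]["conf"], best["b"], round(coverage, 4))  # → 0.91 None 0.3333
```

Raising `min_conf` (the prep cell below drops boxes with conf < 0.2) lowers coverage, so the full-image fallback matters more.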

In [None]:
# Precompute splits, label maps, and MD best boxes; install missing deps
import os, json, math, time, pickle
import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.model_selection import GroupKFold
import subprocess, sys

# Ensure timm deps present (per expert advice)
def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)
try:
    import huggingface_hub, safetensors
except Exception:
    pip('install','-c','constraints.txt','huggingface_hub','safetensors','--upgrade-strategy','only-if-needed')

data_dir = Path('.')
ann_path = data_dir / 'iwildcam2020_train_annotations.json'
test_info_path = data_dir / 'iwildcam2020_test_information.json'
md_path = data_dir / 'iwildcam2020_megadetector_results.json'

with open(ann_path,'r') as f:
    train_json = json.load(f)
images = pd.DataFrame(train_json['images'])
ann = pd.DataFrame(train_json['annotations'])
cats = pd.DataFrame(train_json['categories'])
train_df = ann.merge(images[['id','file_name','location','seq_id']].rename(columns={'id':'image_id'}), on='image_id', how='left')

# Build label maps: category_id -> idx (contiguous) and inverse
unique_cat_ids = np.sort(train_df['category_id'].unique())
catid2idx = {int(c):i for i,c in enumerate(unique_cat_ids)}
idx2catid = {i:int(c) for i,c in enumerate(unique_cat_ids)}
print('Num classes:', len(unique_cat_ids))

# Class frequencies for weights
cls_counts = train_df['category_id'].value_counts().reindex(unique_cat_ids, fill_value=0).values.astype(np.float64)
w = 1.0/np.sqrt(np.maximum(1.0, cls_counts))
w = w * (len(w)/w.sum())
class_weights = w.astype(np.float32)
print('Class weight stats min/mean/max:', float(w.min()), float(w.mean()), float(w.max()))

# Build 5-fold GroupKFold by location
gkf = GroupKFold(n_splits=5)
folds = np.full(len(train_df), -1, dtype=np.int16)
for fold, (_, val_idx) in enumerate(gkf.split(train_df, groups=train_df['location'])):
    folds[val_idx] = fold
assert (folds>=0).all()
train_df['fold'] = folds
fold_sizes = train_df.groupby('fold')['image_id'].nunique().to_dict()
print('Fold image counts:', fold_sizes)

# Load MegaDetector and compute best box per file (expanded) with guards
with open(md_path,'r') as f:
    md = json.load(f)
md_df = pd.DataFrame(md.get('images', md))
def best_animal_box(recs):
    if not isinstance(recs, list) or len(recs)==0: return None
    best=None
    for d in recs:
        if str(d.get('category','')) in ('1','animal','animal_person_vehicle'):
            if (best is None) or (d.get('conf',0) > best.get('conf',0)): best=d
    return best
md_df['best_det'] = md_df['detections'].apply(best_animal_box) if 'detections' in md_df.columns else None

# Map md id -> file_name via train metadata only; MD entries for test images stay
# unmapped (NaN) here, and their boxes are added later in the training cell
id_map = images[['id','file_name']]
md_df = md_df.merge(id_map, on='id', how='left') if 'id' in md_df.columns else md_df
assert 'file_name' in md_df.columns, 'Could not map MD entries to file_name'

# Build dict: file_name -> {'bbox':[x,y,w,h], 'conf':c} using guards
def expand_and_clamp(box, pad, W, H):
    x,y,w,h = box
    x0 = max(0.0, x - pad*w)
    y0 = max(0.0, y - pad*h)
    x1 = min(1.0, x + w + pad*w)
    y1 = min(1.0, y + h + pad*h)
    return [x0, y0, x1-x0, y1-y0]

md_best = {}
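for _, row in md_df.iterrows():
    fn = row['file_name']
    if pd.isna(fn):
        continue  # MD entry not in train metadata (e.g. a test image); otherwise all would collide on a NaN key
    det = row['best_det'] if isinstance(row.get('best_det',None), dict) else None
    if det is None:
        continue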
    conf = float(det.get('conf',0.0))
    bbox = det.get('bbox', None)
    if not isinstance(bbox, (list,tuple)) or len(bbox)!=4:
        continue
    x,y,w,h = [float(v) for v in bbox]
    area = max(0.0, min(1.0, w))*max(0.0, min(1.0, h))
    if (conf < 0.2) or (area < 0.02) or (area > 0.9):
        continue
    # adaptive padding
    pad = 0.35 if conf < 0.3 else (0.25 if conf < 0.7 else 0.15)
    eb = expand_and_clamp([x,y,w,h], pad, 1.0, 1.0)
    md_best[fn] = {'bbox': eb, 'conf': conf}
print('MD best boxes computed:', len(md_best))

# Save artifacts
art_dir = Path('artifacts')
art_dir.mkdir(exist_ok=True)
with open(art_dir/'catid2idx.json','w') as f: json.dump({str(k):int(v) for k,v in catid2idx.items()}, f)
with open(art_dir/'idx2catid.json','w') as f: json.dump({int(k):int(v) for k,v in idx2catid.items()}, f)
np.save(art_dir/'class_weights.npy', class_weights)
train_df.to_parquet(art_dir/'train_df.parquet', index=False)
with open(art_dir/'md_best.pkl','wb') as f: pickle.dump(md_best, f)
print('Saved artifacts to', art_dir)
print('Prep done.')
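Two numeric choices in this cell are easier to judge with concrete numbers: the inverse-square-root class weights, rescaled to mean 1, and the confidence-dependent padding of MegaDetector boxes. The sketch below restates both in plain Python with made-up counts and boxes. Under 1/√n the rarest class is weighted 100x the most frequent one, whereas plain 1/n weighting would give 10000x.

```python
import math

def inv_sqrt_class_weights(counts):
    """w_c = 1 / sqrt(max(1, n_c)), rescaled so the weights average to 1."""
    raw = [1.0 / math.sqrt(max(1.0, n)) for n in counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def adaptive_pad(conf):
    """Padding fraction per side: less confident boxes get more context."""
    return 0.35 if conf < 0.3 else (0.25 if conf < 0.7 else 0.15)

def expand_and_clamp(box, pad):
    """Grow a normalized [x, y, w, h] box by pad*w / pad*h per side, clipped to [0, 1]."""
    x, y, w, h = box
    x0, y0 = max(0.0, x - pad * w), max(0.0, y - pad * h)
    x1, y1 = min(1.0, x + w + pad * w), min(1.0, y + h + pad * h)
    return [x0, y0, x1 - x0, y1 - y0]

weights = inv_sqrt_class_weights([10000, 100, 1])
print([round(w, 4) for w in weights])  # → [0.027, 0.2703, 2.7027]

box = expand_and_clamp([0.1, 0.2, 0.4, 0.4], adaptive_pad(0.9))  # pad = 0.15
print([round(v, 2) for v in box])  # → [0.04, 0.14, 0.52, 0.52]
```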

In [None]:
# Train fold0 convnext_tiny@320 with MD crops; infer test with HFlip TTA + seq averaging; write submission.csv
import os, time, math, json, pickle, random
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image, ImageOps, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
import timm
from timm.utils import ModelEmaV2
from timm.data import Mixup
from timm.loss import SoftTargetCrossEntropy

SEED = 42
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)
torch.backends.cudnn.benchmark = True
try:
    torch.set_float32_matmul_precision('high')
except Exception:
    pass
os.environ.setdefault('PYTORCH_CUDA_ALLOC_CONF','expandable_segments:True')

data_dir = Path('.')
train_dir = data_dir/'train'
test_dir = data_dir/'test'
art_dir = data_dir/'artifacts'

# Load artifacts
train_df = pd.read_parquet(art_dir/'train_df.parquet')
with open(art_dir/'catid2idx.json') as f: catid2idx = {int(k):int(v) for k,v in json.load(f).items()}
with open(art_dir/'idx2catid.json') as f: idx2catid = {int(k):int(v) for k,v in json.load(f).items()}
class_weights = torch.tensor(np.load(art_dir/'class_weights.npy'), dtype=torch.float32)
with open(art_dir/'md_best.pkl','rb') as f: md_best = pickle.load(f)

# Load test info and sample submission for keys and seq_id
with open(data_dir/'iwildcam2020_test_information.json','r') as f: test_info = json.load(f)
test_images = pd.DataFrame(test_info.get('images', test_info.get('images_info', [])))
sample_sub = pd.read_csv(data_dir/'sample_submission.csv')
sub_id_col, sub_target_col = sample_sub.columns[0], sample_sub.columns[1]

# Robustly build test_meta with [Id, file_name, seq_id] using id->file_name mapping
if 'id' in test_images.columns and 'file_name' in test_images.columns:
    id2file = dict(zip(test_images['id'].tolist(), test_images['file_name'].tolist()))
    id2seq = dict(zip(test_images['id'].tolist(), test_images['seq_id'].tolist())) if 'seq_id' in test_images.columns else {}
    test_meta = sample_sub[[sub_id_col]].copy()
    test_meta = test_meta.rename(columns={sub_id_col: 'Id'})
    test_meta['file_name'] = test_meta['Id'].map(id2file)
    test_meta['seq_id'] = test_meta['Id'].map(id2seq) if id2seq else -1
elif 'file_name' in test_images.columns:
    test_meta = sample_sub[[sub_id_col]].copy().rename(columns={sub_id_col: 'Id'})
    # Assume Id already equals file_name in this rare schema
    test_meta['file_name'] = test_meta['Id']
    test_meta['seq_id'] = -1
else:
    raise AssertionError('Test info must contain id and/or file_name')
assert test_meta['file_name'].notna().all(), 'Could not align sample Ids to file_name'

# Augment md_best with test crops from MD JSON (guards + adaptive padding)
try:
    with open(data_dir/'iwildcam2020_megadetector_results.json','r') as f:
        md_all = json.load(f)
    md_imgs = md_all.get('images', md_all)
    test_id2file = dict(zip(test_images['id'].tolist(), test_images['file_name'].tolist())) if 'id' in test_images.columns else {}
    def best_animal_box(recs):
        if not isinstance(recs, list) or len(recs)==0: return None
        best=None
        for d in recs:
            if str(d.get('category','')) in ('1','animal','animal_person_vehicle'):
                if (best is None) or (d.get('conf',0) > best.get('conf',0)): best=d
        return best
    def expand_and_clamp(box, pad):
        x,y,w,h = box
        x0 = max(0.0, x - pad*w); y0 = max(0.0, y - pad*h)
        x1 = min(1.0, x + w + pad*w); y1 = min(1.0, y + h + pad*h)
        return [x0, y0, x1-x0, y1-y0]
    added=0
    for rec in md_imgs:
        rid = rec.get('id', None)
        if rid is None or rid not in test_id2file: continue
        det = best_animal_box(rec.get('detections', []))
        if det is None: continue
        conf = float(det.get('conf', 0.0))
        bbox = det.get('bbox', None)
        if not isinstance(bbox, (list,tuple)) or len(bbox)!=4: continue
        x,y,w,h = [float(v) for v in bbox]
        area = max(0.0, min(1.0, w))*max(0.0, min(1.0, h))
        if (conf < 0.2) or (area < 0.02) or (area > 0.9): continue
        pad = 0.35 if conf < 0.3 else (0.25 if conf < 0.7 else 0.15)
        eb = expand_and_clamp([x,y,w,h], pad)
        fn = test_id2file[rid]
        if fn not in md_best:
            md_best[fn] = {'bbox': eb, 'conf': conf}
            added += 1
    print('Augmented md_best with test crops:', added)
except Exception as e:
    print('MD test augmentation skipped due to error:', e)

# Dataset
def load_image(path: Path):
    with Image.open(path) as im:
        im = ImageOps.exif_transpose(im.convert('RGB'))
        return im

def crop_by_norm_box(im: Image.Image, box):
    w, h = im.size
    x, y, bw, bh = box
    x0 = int(max(0, min(w, x * w)))
    y0 = int(max(0, min(h, y * h)))
    x1 = int(max(0, min(w, (x + bw) * w)))
    y1 = int(max(0, min(h, (y + bh) * h)))
    if x1 <= x0 or y1 <= y0:
        return im
    return im.crop((x0, y0, x1, y1))

class IWildCamDataset(Dataset):
    def __init__(self, df, root_dir, catid2idx, md_best, is_train=True, img_size=320, md_ignore_p=0.2):
        self.df = df.reset_index(drop=True)
        self.root = Path(root_dir)
        self.catid2idx = catid2idx
        self.md_best = md_best
        self.is_train = is_train
        self.img_size = img_size
        self.md_ignore_p = md_ignore_p
        mean=(0.485,0.456,0.406); std=(0.229,0.224,0.225)
        if is_train:
            self.tf = T.Compose([
                T.RandomResizedCrop(self.img_size, scale=(0.8, 1.0), ratio=(0.75, 1.333)),
                T.RandomHorizontalFlip(p=0.5),
                T.ColorJitter(0.2,0.2,0.2,0.0),
                T.ToTensor(),
                T.Normalize(mean, std),
                # RandomErasing after Normalize so 'random' N(0,1) fill matches normalized pixel statistics
                T.RandomErasing(p=0.1, scale=(0.02, 0.33), ratio=(0.3, 3.3), value='random'),
            ])
        else:
            self.tf = T.Compose([
                T.Resize((self.img_size, self.img_size)),
                T.ToTensor(),
                T.Normalize(mean, std),
            ])
    def __len__(self):
        return len(self.df)
    def __getitem__(self, idx):
        r = self.df.iloc[idx]
        fn = r['file_name']
        path = (self.root/fn)
        try:
            im = load_image(path)
            # MD crop logic: crop to the MD box when one exists; during training,
            # skip the crop with probability md_ignore_p so the model also sees full frames
            has_box = fn in self.md_best
            use_crop = has_box and not (self.is_train and random.random() < self.md_ignore_p)
            if use_crop:
                im = crop_by_norm_box(im, self.md_best[fn]['bbox'])
        except Exception:
            # Fallback: use a solid gray image if the file is corrupted/unreadable
            im = Image.new('RGB', (self.img_size, self.img_size), (128,128,128))
        out = self.tf(im)
        if 'category_id' in r:
            y = self.catid2idx[int(r['category_id'])]
            return out, torch.tensor(y, dtype=torch.long)
        else:
            return out, fn  # for test

# Model
def create_model(num_classes):
    model = timm.create_model('convnext_tiny', pretrained=True, num_classes=num_classes, drop_path_rate=0.1)
    return model

# Train one fold
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
num_classes = len(idx2catid)
fold = 0
train_idx = train_df.index[train_df['fold'] != fold].tolist()
val_idx = train_df.index[train_df['fold'] == fold].tolist()
df_tr = train_df.loc[train_idx, ['file_name','category_id']].copy()
df_va = train_df.loc[val_idx, ['file_name','category_id']].copy()
print(f'Fold {fold}: train images {df_tr.shape[0]} val images {df_va.shape[0]}')

img_size = 320
bs = 64
epochs = 10
use_channels_last = True

ds_tr = IWildCamDataset(df_tr, train_dir, catid2idx, md_best, is_train=True, img_size=img_size, md_ignore_p=0.2)
ds_va = IWildCamDataset(df_va, train_dir, catid2idx, md_best, is_train=False, img_size=img_size, md_ignore_p=0.0)
dl_tr = DataLoader(ds_tr, batch_size=bs, shuffle=True, num_workers=8, pin_memory=True, persistent_workers=True, prefetch_factor=2, drop_last=True)
dl_va = DataLoader(ds_va, batch_size=bs, shuffle=False, num_workers=8, pin_memory=True, persistent_workers=True, prefetch_factor=2)

model = create_model(num_classes).to(device)
if use_channels_last:
    model = model.to(memory_format=torch.channels_last)
if hasattr(model, 'set_grad_checkpointing'):
    model.set_grad_checkpointing(True)
ema = ModelEmaV2(model, decay=0.999)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
num_steps = max(1, epochs * math.ceil(len(dl_tr)))
warmup_steps = max(1, int(0.2 * math.ceil(len(dl_tr))))
def lr_schedule(step):
    if step < warmup_steps:
        return step / warmup_steps
    t = (step - warmup_steps) / max(1, (num_steps - warmup_steps))
    return 0.5 * (1 + math.cos(math.pi * t))
scaler = torch.amp.GradScaler('cuda', enabled=True)
weight = class_weights.to(device)

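# Mixup: timm's Mixup returns soft targets for every batch (smoothed by its default
# label_smoothing=0.1, even when mixing is skipped), so crit_soft is the loss actually
# used below; crit_hard and its class weights only apply if mixup_fn is set to None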
mixup_fn = Mixup(mixup_alpha=0.2, cutmix_alpha=0.0, prob=0.5, switch_prob=0.0, mode='batch', num_classes=num_classes)
crit_soft = SoftTargetCrossEntropy()
crit_hard = nn.CrossEntropyLoss(weight=weight, label_smoothing=0.1)

best_acc = 0.0
best_path = art_dir/f'convnext_tiny_fold{fold}.pt'
global_step = 0
t_start = time.time()

if best_path.exists():
    print(f'Found existing model at {best_path}, skipping training.')
else:
    for ep in range(1, epochs+1):
        model.train()
        running = 0.0
        t0 = time.time()
        for i,(x,y) in enumerate(dl_tr):
            x = x.to(device, non_blocking=True)
            if use_channels_last:
                x = x.to(memory_format=torch.channels_last)
            y = y.to(device, non_blocking=True)
            lr = 1e-3 * lr_schedule(global_step)
            for pg in optimizer.param_groups: pg['lr'] = lr
            optimizer.zero_grad(set_to_none=True)
            with torch.amp.autocast('cuda', dtype=torch.float16):
                if mixup_fn is not None:
                    x, y_mix = mixup_fn(x, y)
                    logits = model(x)
                    loss = crit_soft(logits, y_mix)
                else:
                    logits = model(x)
                    loss = crit_hard(logits, y)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            ema.update(model)
            running += loss.item() * x.size(0)
            global_step += 1
            if (i % 100)==0:
                print(f'[Ep {ep}] step {i}/{len(dl_tr)} loss {loss.item():.4f} lr {lr:.6f}', flush=True)
        tr_loss = running / max(1, len(dl_tr) * bs)  # drop_last=True: every batch holds exactly bs samples
        # Validate
        model.eval()
        ema_model = ema.module
        correct=0; total=0
        with torch.no_grad():
            for x,y in dl_va:
                x = x.to(device, non_blocking=True)
                if use_channels_last:
                    x = x.to(memory_format=torch.channels_last)
                y = y.to(device, non_blocking=True)
                with torch.amp.autocast('cuda', dtype=torch.float16):
                    logits = ema_model(x)
                preds = logits.argmax(1)
                correct += (preds==y).sum().item()
                total += y.numel()
        val_acc = correct/total if total>0 else 0.0
        print(f'Epoch {ep}/{epochs} tr_loss {tr_loss:.4f} val_acc {val_acc:.4f} elapsed {(time.time()-t0):.1f}s total {(time.time()-t_start)/60:.1f}m', flush=True)
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save({'model': ema_model.state_dict(), 'acc': best_acc}, best_path)
            print(f'New best acc {best_acc:.4f}; saved {best_path}')
    print('Best val_acc:', best_acc)

assert best_path.exists(), 'No model saved'

# Inference on test with HFlip TTA and sequence averaging
state = torch.load(best_path, map_location='cpu', weights_only=True)  # checkpoint holds only tensors and a float
model.load_state_dict(state['model'])
model.eval()

test_ds = IWildCamDataset(test_meta[['file_name']].copy(), test_dir, catid2idx, md_best, is_train=False, img_size=img_size, md_ignore_p=0.0)
test_dl = DataLoader(test_ds, batch_size=bs, shuffle=False, num_workers=8, pin_memory=True, persistent_workers=True, prefetch_factor=2)

all_logits = []
all_files = []
with torch.no_grad():
    for x, fns in test_dl:
        x = x.to(device, non_blocking=True)
        if use_channels_last:
            x = x.to(memory_format=torch.channels_last)
        with torch.amp.autocast('cuda', dtype=torch.float16):
            logit = model(x)
            # HFlip TTA
            x_flip = torch.flip(x, dims=[3])
            logit_flip = model(x_flip)
            logit = (logit + logit_flip) / 2.0
        all_logits.append(logit.float().cpu())  # upcast fp16 autocast logits before averaging
        all_files.extend(list(fns))
all_logits = torch.cat(all_logits, dim=0).numpy()  # N x C float32
test_pred_df = pd.DataFrame({'file_name': all_files})
for i in range(num_classes):
    test_pred_df[f'c{i}'] = all_logits[:, i]

# Sequence-level averaging (only average valid seqs with count > 1)
seq_map = test_meta[['file_name','seq_id']].copy()
test_pred_df = test_pred_df.merge(seq_map, on='file_name', how='left')
mask = test_pred_df['seq_id'].notna() & (test_pred_df['seq_id'] != -1)
logit_cols = [f'c{i}' for i in range(num_classes)]
if mask.any():
    counts = test_pred_df.loc[mask, 'seq_id'].value_counts()
    valid = test_pred_df['seq_id'].isin(counts[counts > 1].index)
    if valid.any():
        seq_mean = test_pred_df.loc[valid].groupby('seq_id')[logit_cols].mean()
        # Match dtype to avoid pandas incompatible dtype warnings (our cols may be float16)
        seq_mean = seq_mean.astype(test_pred_df[logit_cols].dtypes.iloc[0])
        upd = test_pred_df.loc[valid, ['file_name', 'seq_id']].merge(seq_mean, on='seq_id', how='left').set_index('file_name')[logit_cols]
        test_pred_df = test_pred_df.set_index('file_name')
        test_pred_df.update(upd)
        test_pred_df = test_pred_df.reset_index()

# Argmax and map back to original category_id
pred_idx = test_pred_df[logit_cols].values.argmax(1)
pred_cat = [idx2catid[int(i)] for i in pred_idx]
pred_df = pd.DataFrame({'file_name': test_pred_df['file_name'], sub_target_col: pred_cat})

# Build submission in sample order (Id from sample_sub, map via test_meta file_name)
# Start from Id only to avoid duplicate 'Category' columns
sub = sample_sub[[sub_id_col]].copy().rename(columns={sub_id_col: 'Id'})
sub = sub.merge(test_meta[['Id','file_name']], on='Id', how='left')
sub = sub.merge(pred_df, on='file_name', how='left')
sub = sub[['Id', sub_target_col]]
assert sub.shape[0] == sample_sub.shape[0], 'Submission row count mismatch'
assert sub[sub_target_col].notna().all(), 'Missing predictions in submission'
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv with rows:', len(sub))
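As written, `class_weights` never reaches the loss: Mixup always yields soft targets, so only the unweighted `SoftTargetCrossEntropy` runs. If weighting is wanted together with mixup, one option (a swapped-in variant, not what this notebook currently does) is a class-weighted soft-target cross-entropy. Below is a plain-Python sketch of the formula with illustrative logits, targets and weights.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def weighted_soft_target_ce(batch_logits, batch_targets, class_weights):
    """Batch mean of -sum_c w_c * y_c * log softmax(z)_c.
    y may be a soft (mixup / label-smoothed) target; for a one-hot y this equals
    w_target times ordinary cross-entropy for that sample."""
    losses = []
    for logits, target in zip(batch_logits, batch_targets):
        logp = log_softmax(logits)
        losses.append(-sum(w * y * lp for w, y, lp in zip(class_weights, target, logp)))
    return sum(losses) / len(losses)  # plain mean over the batch

# Two samples, three classes; the rarer class 2 gets a larger weight
logits = [[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]]
mixed_targets = [[0.7, 0.3, 0.0], [0.0, 0.5, 0.5]]
print(weighted_soft_target_ce(logits, mixed_targets, [0.5, 1.0, 3.0]))
```

In the training loop the equivalent PyTorch expression would be `(-(y_mix * weight) * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()`. `nn.CrossEntropyLoss(weight=...)` also accepts class-probability targets, but keeping its `label_smoothing=0.1` on top of Mixup's own smoothing would smooth twice.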

Augmented md_best with test crops: 23901
Fold 0: train images 125759 val images 31440


Found existing model at artifacts/convnext_tiny_fold0.pt, skipping training.


