# iWildCam 2020 (FGVC7) – Medal Plan

Objectives:
- Build a robust, GPU-accelerated image classification pipeline.
- Establish trustworthy CV mirroring test distribution (camera/location-wise).
- Ship a strong baseline fast; iterate with augmentations, sampler, and better backbones.
- Cache OOF/logits and ensemble diverse seeds/backbones to reach medal.

Milestones:
1) Environment & GPU gate
   - Verify CUDA 12.1, install torch 2.4.1/cu121 + torchvision 0.19.1.
   - Sanity-check nvidia-smi, torch.cuda, GPU name.

2) Data audit & EDA
   - Load JSONs: train annotations, test info, megadetector detections.
   - Inspect fields: image_id → file, category_id, location/camera/site, sequence_id, empty images.
   - Count classes, class imbalance, per-location distribution.
   - Verify train/ test paths and file existence.

3) Validation protocol
   - Primary: GroupKFold by location/camera_id to simulate domain shift.
   - Alternative fallback: StratifiedKFold on category with group on sequence if location missing.
   - Fix seed and persist folds to disk.

4) Baseline model (deliver ASAP)
   - TorchVision pretrained backbone (e.g., resnet50 or efficientnet_b3).
   - 224→384 short-side resize with RandAugment/AutoAug, RandomResizedCrop, horizontal flip.
   - Class-balanced sampler or weighted CE; label smoothing.
   - Mixed precision, EMA, cosine LR with warmup; early stopping.
   - Save best by val accuracy per fold.

5) Iteration for gains
   - Backbones: ConvNeXt-T/S, EfficientNetV2-S, ResNet101, NFNet-F0 (if available).
   - Resolutions: 380→448; CutMix/MixUp; stronger aug.
   - Use MegaDetector crops (animal boxes) vs full-image; blend logits.
   - TTA (flips, multi-scale).

6) Ensembling
   - Weighted average of fold and seed logits across backbones and crop/full variants.
   - Calibrate (temperature scaling) on OOF if helpful.

7) Submission & Error analysis
   - Verify submission.csv format: image_id, category_id prediction.
   - Analyze OOF by class/location; address top error buckets.

Checkpoints with Expert Review:
- After plan (this cell).
- After EDA/validation lock-in.
- After baseline OOF results.
- After adding MD crops/ensembling.

Risks & Mitigations:
- Heavy training time: start with fast 224 baseline; log progress and allow early stop.
- CV mismatch: enforce location-grouped folds; verify distribution.
- Class imbalance: use per-class weights and/or sampler.

Next action:
- Run environment/GPU checks, list data, and parse JSON schemas; then set up folds and baseline dataloader.

In [1]:
# GPU/Env gate + quick data sanity
import os, sys, json, time, shutil, subprocess
from pathlib import Path

def run(cmd):
    print("$", " ".join(cmd), flush=True)
    return subprocess.run(cmd, check=False)

t0 = time.time()
print("=== NVIDIA-SMI ===", flush=True)
run(['bash','-lc','nvidia-smi || true'])

# Clean any preinstalled torch stack to avoid CUDA mismatch
for pkg in ("torch","torchvision","torchaudio"):
    subprocess.run([sys.executable, "-m", "pip", "uninstall", "-y", pkg], check=False)

for d in (
    "/app/.pip-target/torch",
    "/app/.pip-target/torchvision",
    "/app/.pip-target/torchaudio",
):
    if os.path.exists(d):
        print("Removing", d); shutil.rmtree(d, ignore_errors=True)

def pip(*args):
    print("> pip", *args, flush=True)
    subprocess.run([sys.executable, "-m", "pip", *args], check=True)

print("=== Installing PyTorch cu121 stack ===", flush=True)
pip("install",
    "--index-url", "https://download.pytorch.org/whl/cu121",
    "--extra-index-url", "https://pypi.org/simple",
    "torch==2.4.1", "torchvision==0.19.1", "torchaudio==2.4.1")

Path("constraints.txt").write_text("torch==2.4.1\ntorchvision==0.19.1\ntorchaudio==2.4.1\n")

import torch
print("torch:", torch.__version__, "built CUDA:", getattr(torch.version, "cuda", None))
print("CUDA available:", torch.cuda.is_available())
assert str(getattr(torch.version,'cuda','')).startswith('12.1'), f"Wrong CUDA build: {torch.version.cuda}"
assert torch.cuda.is_available(), "CUDA not available"
print("GPU:", torch.cuda.get_device_name(0))

print("=== Quick data sanity ===", flush=True)
base = Path('.')
train_dir = base/"train"
test_dir = base/"test"
ann_path = base/"iwildcam2020_train_annotations.json"
test_info_path = base/"iwildcam2020_test_information.json"
sample_sub_path = base/"sample_submission.csv"

n_train = len(os.listdir(train_dir))
n_test = len(os.listdir(test_dir))
print(f"Train images: {n_train}")
print(f"Test images:  {n_test}")

with open(ann_path,'r') as f:
    ann = json.load(f)
print("Annotation keys:", list(ann.keys()))
images = ann.get('images', [])
annotations = ann.get('annotations', [])
categories = ann.get('categories', [])
print(f"Images in JSON: {len(images)} | Annotations: {len(annotations)} | Categories: {len(categories)}")
cat_ids = [c.get('id') for c in categories]
print("Min/Max category_id:", min(cat_ids), max(cat_ids))

with open(test_info_path,'r') as f:
    test_info = json.load(f)
test_images = test_info.get('images', [])
print(f"Test JSON images: {len(test_images)}")

import pandas as pd
samp = pd.read_csv(sample_sub_path)
print("Sample submission head:\n", samp.head())
print("Sample submission shape:", samp.shape)

print(f"Setup done in {time.time()-t0:.1f}s", flush=True)

=== NVIDIA-SMI ===


$ bash -lc nvidia-smi || true


Wed Sep 24 04:12:14 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     182MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                





=== Installing PyTorch cu121 stack ===


> pip install --index-url https://download.pytorch.org/whl/cu121 --extra-index-url https://pypi.org/simple torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1




Looking in indexes: https://download.pytorch.org/whl/cu121, https://pypi.org/simple


Collecting torch==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torch-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (799.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 799.0/799.0 MB 423.7 MB/s eta 0:00:00


Collecting torchvision==0.19.1
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.19.1%2Bcu121-cp311-cp311-linux_x86_64.whl (7.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 457.2 MB/s eta 0:00:00


Collecting torchaudio==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 408.1 MB/s eta 0:00:00


Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 66.7 MB/s eta 0:00:00


Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 86.7 MB/s eta 0:00:00


Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 151.9 MB/s eta 0:00:00


Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 348.9 MB/s eta 0:00:00


Collecting fsspec
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 481.7 MB/s eta 0:00:00


Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 435.3 MB/s eta 0:00:00


Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 55.7 MB/s eta 0:00:00


Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 60.7 MB/s eta 0:00:00


Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 65.7 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 77.2 MB/s eta 0:00:00


Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 390.7 MB/s eta 0:00:00


Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 231.6 MB/s eta 0:00:00


Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 39.2 MB/s eta 0:00:00


Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 133.2 MB/s eta 0:00:00


Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 507.8 MB/s eta 0:00:00


Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 374.5 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 72.6 MB/s eta 0:00:00


Collecting numpy
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 177.6 MB/s eta 0:00:00


Collecting pillow!=8.3.*,>=5.3.0
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 144.1 MB/s eta 0:00:00
Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 75.4 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-3.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB)


Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 525.5 MB/s eta 0:00:00


Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio


Successfully installed MarkupSafe-3.0.2 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.3.0 sympy-1.14.0 torch-2.4.1+cu121 torchaudio-2.4.1+cu121 torchvision-0.19.1+cu121 triton-3.0.0 typing-extensions-4.15.0


torch: 2.4.1+cu121 built CUDA: 12.1
CUDA available: True
GPU: NVIDIA A10-24Q
=== Quick data sanity ===


Train images: 157199
Test images:  60760


Annotation keys: ['annotations', 'images', 'categories', 'info']
Images in JSON: 157199 | Annotations: 157199 | Categories: 267
Min/Max category_id: 0 675
Test JSON images: 60760


Sample submission head:
                                      Id  Category
0  879d74d8-21bc-11ea-a13a-137349068a90       559
1  90243894-21bc-11ea-a13a-137349068a90       629
2  944adb30-21bc-11ea-a13a-137349068a90       192
3  8ced2424-21bc-11ea-a13a-137349068a90       359
4  8aac3a4c-21bc-11ea-a13a-137349068a90         9
Sample submission shape: (60760, 2)
Setup done in 76.6s


In [3]:
# Build grouped 5-fold CV (group = location + sequence), persist folds and label mapping
import json, pandas as pd, numpy as np
from pathlib import Path
from sklearn.model_selection import GroupKFold

base = Path('.')
ann_path = base/"iwildcam2020_train_annotations.json"
with open(ann_path, 'r') as f:
    ann = json.load(f)

images = pd.DataFrame(ann["images"]).copy()
annots = pd.DataFrame(ann["annotations"]).copy()
cats = pd.DataFrame(ann["categories"]).copy()
print("images columns:", images.columns.tolist())
print("annotations columns:", annots.columns.tolist())
print("categories columns:", cats.columns.tolist())

# Merge annotation to image rows (1:1 here, annotation per image)
df = images.merge(annots, left_on='id', right_on='image_id', how='inner')

# Derive grouping keys robustly
def pick(d, keys, default=None):
    for k in keys:
        if k in d and pd.notna(d[k]):
            return d[k]
    return default

loc_key_candidates = ['location','location_id','loc']
seq_key_candidates = ['sequence_id','seq_id','sequence','seq']
cam_key_candidates = ['camera_id','camera','cam_id']

# Apply to dataframe
df['__location'] = df.apply(lambda r: pick(r, loc_key_candidates, 'NA_LOC'), axis=1)
df['__sequence'] = df.apply(lambda r: pick(r, seq_key_candidates, None), axis=1)
df['__camera'] = df.apply(lambda r: pick(r, cam_key_candidates, None), axis=1)

def make_group(r):
    loc = str(r['__location'])
    if r['__sequence'] is not None:
        return f"{loc}_{r['__sequence']}"
    elif r['__camera'] is not None:
        return f"{loc}_{r['__camera']}"
    else:
        return f"{loc}_NASEQ"

df['group'] = df.apply(make_group, axis=1)

# Build category_id -> index mapping using the order in categories list
cat_ids_ordered = cats['id'].tolist()
id2index = {int(cid): i for i, cid in enumerate(cat_ids_ordered)}
index2id = {i: int(cid) for i, cid in enumerate(cat_ids_ordered)}
df['label_index'] = df['category_id'].map(id2index).astype(int)
assert df['label_index'].notnull().all(), "Some category_ids not in mapping"
num_classes = len(cat_ids_ordered)
print("Num classes:", num_classes)

# GroupKFold split
gkf = GroupKFold(n_splits=5)
df['fold'] = -1
for fold, (trn_idx, val_idx) in enumerate(gkf.split(df, groups=df['group'])):
    df.loc[df.index[val_idx], 'fold'] = fold
assert (df['fold']>=0).all(), "Fold assignment failed"

# Quick diagnostics
print(df.groupby('fold').size())
print("Unique groups per fold:", df.groupby('fold')['group'].nunique().to_dict())

# Persist
out_folds = base/"folds.csv"
# Harmonize IDs: use image_id as the canonical image identifier; include file path fields
df['img_id'] = df['image_id']
cols = ['img_id','file_name','image_id','category_id','label_index','group','fold','__location','__sequence','__camera']
df[cols].to_csv(out_folds, index=False)
print("Saved folds to", out_folds.resolve())

# Save label mapping
mapping = {"id2index": id2index, "index2id": index2id, "num_classes": num_classes}
with open(base/"label_mapping.json", 'w') as f:
    json.dump(mapping, f)
print("Saved label_mapping.json")

images columns: ['seq_num_frames', 'location', 'datetime', 'id', 'frame_num', 'seq_id', 'width', 'height', 'file_name']
annotations columns: ['count', 'image_id', 'id', 'category_id']
categories columns: ['count', 'id', 'name']


Num classes: 267
fold
0    31440
1    31440
2    31440
3    31440
4    31439
dtype: int64
Unique groups per fold: {0: 3031, 1: 3032, 2: 3032, 3: 3032, 4: 3030}


Saved folds to /var/lib/simon/agent_run_states/iwildcam-2020-fgvc7-20250924-031313/folds.csv
Saved label_mapping.json


In [5]:
# Strict location-only CV + diagnostics; persist as folds_location.csv
import json, pandas as pd, numpy as np
from pathlib import Path
base = Path('.')

df = pd.read_csv(base/"folds.csv")

# Recreate label mapping for safety
with open(base/"label_mapping.json", 'r') as f:
    mapping = json.load(f)
num_classes = mapping["num_classes"]

# Strict folds: groups = __location; prefer StratifiedGroupKFold if available
use_sgkf = False
try:
    from sklearn.model_selection import StratifiedGroupKFold
    use_sgkf = True
except Exception:
    from sklearn.model_selection import GroupKFold

df_strict = df.copy()
df_strict['fold'] = -1
if use_sgkf:
    sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
    for f, (_, val_idx) in enumerate(sgkf.split(X=df_strict, y=df_strict['label_index'], groups=df_strict['__location'])):
        df_strict.loc[val_idx, 'fold'] = f
else:
    gkf = GroupKFold(n_splits=5)
    for f, (_, val_idx) in enumerate(gkf.split(df_strict, groups=df_strict['__location'])):
        df_strict.loc[df_strict.index[val_idx], 'fold'] = f
assert (df_strict['fold']>=0).all(), "Strict fold assignment failed"

# Leakage guards
# Ensure locations are not split across folds
max_loc_folds = df_strict.groupby('__location')['fold'].nunique().max()
print("Max folds per location (should be 1):", max_loc_folds)
assert max_loc_folds == 1, "Location split across folds!"

# Within each location, ensure sequences are not split across folds
if df_strict['__sequence'].notna().any():
    grp = df_strict[df_strict['__sequence'].notna()].groupby(['__location','__sequence'])['fold'].nunique()
    max_seq_loc = grp.max() if len(grp) else 1
    print("Max folds per (location,sequence) (should be 1):", max_seq_loc)
    assert max_seq_loc == 1, "A (location,sequence) group split across folds!"

# Class balance diagnostics
ct = pd.crosstab(df_strict['fold'], df_strict['label_index'])
per_class = (ct.max()-ct.min())/ct.replace(0, np.nan).mean()
print("Per-class fold count deviation (summary):")
print(per_class.describe())
missing_any = (ct==0).any()
print("Any class missing in a fold?", bool(missing_any.any()))

# Location vs fold distribution summary
loc_dist = pd.crosstab(df_strict['__location'], df_strict['fold']).sum().to_dict()
print("Counts per fold (strict):", loc_dist)

# Persist strict folds
out_strict = base/"folds_location.csv"
df_strict.to_csv(out_strict, index=False)
print("Saved strict location folds to", out_strict.resolve())

# Note: We'll train with strict location-only folds moving forward for honest CV. Keep folds.csv as secondary split.



Max folds per location (should be 1): 1
Max folds per (location,sequence) (should be 1): 1
Per-class fold count deviation (summary):
count    185.000000
mean       1.527365
std        0.642848
min        0.342321
25%        1.000000
50%        1.400000
75%        1.871287
max        3.862434
dtype: float64
Any class missing in a fold? True
Counts per fold (strict): {0: 20033, 1: 33416, 2: 42820, 3: 35755, 4: 25175}


Saved strict location folds to /var/lib/simon/agent_run_states/iwildcam-2020-fgvc7-20250924-031313/folds_location.csv


In [6]:
# Write baseline training script: ConvNeXt-Tiny @320 with strict folds
import json, os, sys, time
from pathlib import Path

script = r'''
import os, json, time, math, random, argparse
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
import torchvision.transforms as T
import timm

def set_seed(seed=42):
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)

class IWCDataset(Dataset):
    def __init__(self, df, img_dir, label_mapping, img_size=320, train=True):
        self.df = df.reset_index(drop=True)
        self.img_dir = Path(img_dir)
        self.id2index = label_mapping['id2index']
        self.train = train
        self.size = img_size
        self.mean = (0.485,0.456,0.406); self.std = (0.229,0.224,0.225)
        if train:
            self.tf = T.Compose([
                T.RandomResizedCrop(self.size, scale=(0.2,1.0), interpolation=T.InterpolationMode.BICUBIC),
                T.RandomHorizontalFlip(),
                T.ColorJitter(0.2,0.2,0.2,0.1),
                T.RandAugment(num_ops=2, magnitude=8),
                T.ToTensor(),
                T.Normalize(self.mean, self.std),
            ])
        else:
            self.tf = T.Compose([
                T.Resize(int(self.size*1.15), interpolation=T.InterpolationMode.BICUBIC),
                T.CenterCrop(self.size),
                T.ToTensor(),
                T.Normalize(self.mean, self.std),
            ])
    def __len__(self): return len(self.df)
    def __getitem__(self, i):
        r = self.df.iloc[i]
        img_path = self.img_dir / r['file_name']
        with Image.open(img_path) as im:
            im = im.convert('RGB')
            x = self.tf(im)
        y = int(r['label_index']) if 'label_index' in r and not pd.isna(r['label_index']) else -1
        return x, y, r['img_id']

def build_model(num_classes):
    model = timm.create_model('convnext_tiny.in12k', pretrained=True, num_classes=num_classes)
    return model

def get_class_weights(df, num_classes):
    counts = df['label_index'].value_counts().reindex(range(num_classes), fill_value=0).values.astype(np.float32)
    counts[counts==0] = 1.0
    weights = 1.0 / np.sqrt(counts)
    weights = weights / weights.mean()
    return torch.tensor(weights, dtype=torch.float32)

def train_fold(args, fold, df, mapping, device):
    num_classes = mapping['num_classes']
    df_tr = df[df['fold']!=fold].copy()
    df_va = df[df['fold']==fold].copy()
    print(f"Fold {fold}: train {len(df_tr)} | val {len(df_va)}", flush=True)

    train_ds = IWCDataset(df_tr, args.train_dir, mapping, img_size=args.img_size, train=True)
    val_ds   = IWCDataset(df_va, args.train_dir, mapping, img_size=args.img_size, train=False)

    # Use standard loader; optionally weighted CE
    train_loader = DataLoader(train_ds, batch_size=args.batch_size, shuffle=True, num_workers=args.workers, pin_memory=True, drop_last=True)
    val_loader   = DataLoader(val_ds, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)

    model = build_model(num_classes).to(device)
    ema_model = None
    if args.ema:
        ema_model = timm.utils.ModelEmaV2(model, decay=0.9998)

    lr = args.lr
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=args.weight_decay)
    lf = lambda x: 0.5*(1+math.cos(math.pi*x/args.epochs))  # cosine
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lf)
    criterion = nn.CrossEntropyLoss(label_smoothing=args.label_smoothing).to(device)

    scaler = torch.cuda.amp.GradScaler(enabled=args.amp)
    best_acc = 0.0
    oof_logits = np.zeros((len(df_va), num_classes), dtype=np.float32)

    for epoch in range(args.epochs):
        t0 = time.time()
        model.train()
        total, correct, loss_sum = 0, 0, 0.0
        for it,(xb,yb,_) in enumerate(train_loader):
            xb = xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)
            opt.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast(enabled=args.amp):
                logits = model(xb)
                loss = criterion(logits, yb)
            scaler.scale(loss).backward()
            scaler.step(opt); scaler.update()
            if ema_model is not None:
                ema_model.update(model)
            loss_sum += loss.item()*xb.size(0)
            preds = logits.argmax(1)
            correct += (preds==yb).sum().item()
            total += xb.size(0)
            if (it+1)%100==0:
                print(f"Epoch {epoch+1}/{args.epochs} It {it+1} Train acc={correct/total:.4f} loss={loss_sum/total:.4f}", flush=True)
        sched.step()
        # Val
        model.eval()
        if ema_model is not None:
            eval_model = ema_model.module
        else:
            eval_model = model
        val_total, val_correct = 0, 0
        val_logits = []
        with torch.no_grad():
            for xb,yb,_ids in val_loader:
                xb = xb.to(device, non_blocking=True); yb = yb.to(device, non_blocking=True)
                with torch.cuda.amp.autocast(enabled=args.amp):
                    lg = eval_model(xb)
                val_logits.append(lg.float().cpu().numpy())
                preds = lg.argmax(1)
                val_correct += (preds==yb).sum().item()
                val_total += xb.size(0)
        val_acc = val_correct/val_total if val_total else 0.0
        val_logits = np.concatenate(val_logits, axis=0) if len(val_logits) else np.zeros((0, num_classes), dtype=np.float32)
        # Store OOF in original val order
        oof_logits = val_logits
        dt = time.time()-t0
        print(f"Fold {fold} Epoch {epoch+1}: val_acc={val_acc:.4f} time={dt:.1f}s", flush=True)
        if val_acc>best_acc:
            best_acc=val_acc
            ckpt = {'state_dict': model.state_dict(), 'acc': best_acc, 'epoch': epoch+1}
            torch.save(ckpt, Path(args.out_dir)/f"ckpt_fold{fold}.pt")
    # Save OOF logits
    np.save(Path(args.out_dir)/f"oof_logits_fold{fold}.npy", oof_logits)
    print(f"Saved OOF logits for fold {fold}", flush=True)

def infer_test(args, df_test, mapping, ckpt_paths, device, tta_flip=True):
    num_classes = mapping['num_classes']
    class TestDS(Dataset):
        def __init__(self, df, img_dir, size):
            self.df=df.reset_index(drop=True); self.dir=Path(img_dir); self.size=size
            self.mean=(0.485,0.456,0.406); self.std=(0.229,0.224,0.225)
            self.tf = T.Compose([
                T.Resize(int(size*1.15), interpolation=T.InterpolationMode.BICUBIC),
                T.CenterCrop(size), T.ToTensor(), T.Normalize(self.mean,self.std) ])
            self.tf_h = T.Compose([
                T.Resize(int(size*1.15), interpolation=T.InterpolationMode.BICUBIC),
                T.CenterCrop(size), T.functional.hflip, T.ToTensor(), T.Normalize(self.mean,self.std) ])
        def __len__(self): return len(self.df)
        def __getitem__(self, i):
            r=self.df.iloc[i]
            p=self.dir/r['file_name']
            with Image.open(p) as im:
                im=im.convert('RGB')
                x=self.tf(im);
                if {tta_flip}:
                    xh=T.functional.hflip(im)
                    xh=self.tf.transforms[0](xh) if False else None
            return x, r['img_id']
    # Simpler: do only no-flip to avoid complexity in script; flip TTA can be added later
    ds = IWCDataset(df_test, args.test_dir, mapping, img_size=args.img_size, train=False)
    loader = DataLoader(ds, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)
    logits_sum = np.zeros((len(df_test), num_classes), dtype=np.float32)
    for ckpt in ckpt_paths:
        model = build_model(num_classes).to(device)
        sd = torch.load(ckpt, map_location='cpu')['state_dict']
        model.load_state_dict(sd, strict=True)
        model.eval()
        all_logits=[]
        with torch.no_grad():
            for xb,_,_ids in loader:
                xb = xb.to(device, non_blocking=True)
                with torch.cuda.amp.autocast(enabled=args.amp):
                    lg = model(xb)
                all_logits.append(lg.float().cpu().numpy())
        logits = np.concatenate(all_logits, axis=0)
        logits_sum += logits
    logits_avg = logits_sum/len(ckpt_paths)
    np.save(Path(args.out_dir)/"test_logits.npy", logits_avg)
    print("Saved test logits", flush=True)

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--train_dir', default='train')
    ap.add_argument('--test_dir', default='test')
    ap.add_argument('--ann_train', default='iwildcam2020_train_annotations.json')
    ap.add_argument('--test_info', default='iwildcam2020_test_information.json')
    ap.add_argument('--folds_csv', default='folds_location.csv')
    ap.add_argument('--mapping_json', default='label_mapping.json')
    ap.add_argument('--out_dir', default='out_baseline')
    ap.add_argument('--img_size', type=int, default=320)
    ap.add_argument('--batch_size', type=int, default=64)
    ap.add_argument('--workers', type=int, default=8)
    ap.add_argument('--epochs', type=int, default=10)
    ap.add_argument('--lr', type=float, default=3e-4)
    ap.add_argument('--weight_decay', type=float, default=5e-2)
    ap.add_argument('--label_smoothing', type=float, default=0.1)
    ap.add_argument('--amp', action='store_true')
    ap.add_argument('--ema', action='store_true')
    ap.add_argument('--folds', type=str, default='0')
    ap.add_argument('--seed', type=int, default=42)
    args = ap.parse_args()

    set_seed(args.seed)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    Path(args.out_dir).mkdir(parents=True, exist_ok=True)

    with open(args.mapping_json,'r') as f:
        mapping = json.load(f)

    df = pd.read_csv(args.folds_csv)
    # Prepare test df
    test_info = json.load(open(args.test_info,'r'))
    test_images = pd.DataFrame(test_info['images'])
    test_df = test_images[['id','file_name']].copy()
    test_df.rename(columns={'id':'img_id'}, inplace=True)
    # Add dummy label_index for Dataset
    test_df['label_index'] = 0

    folds = [int(f) for f in args.folds.split(',')]
    for f in folds:
        print(f"=== Training fold {f} ===", flush=True)
        train_fold(args, f, df, mapping, device)

    # Collect ckpts for provided folds
    ckpts = [str(Path(args.out_dir)/f"ckpt_fold{f}.pt") for f in folds if Path(args.out_dir)/f"ckpt_fold{f}.pt" ]
    if len(ckpts)>0:
        infer_test(args, test_df, mapping, ckpts, device)

if __name__ == '__main__':
    main()
'''

Path('train_full_baseline.py').write_text(script)
print('Wrote train_full_baseline.py')

# Install deps for training
import subprocess, sys
def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)
pip('install','-c','constraints.txt','timm==1.0.9','albumentations==1.4.14','opencv-python-headless==4.10.0.84','--upgrade-strategy','only-if-needed')
print('Deps installed')

Wrote train_full_baseline.py
> pip install -c constraints.txt timm==1.0.9 albumentations==1.4.14 opencv-python-headless==4.10.0.84 --upgrade-strategy only-if-needed


Collecting timm==1.0.9
  Downloading timm-1.0.9-py3-none-any.whl (2.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 30.9 MB/s eta 0:00:00


Collecting albumentations==1.4.14
  Downloading albumentations-1.4.14-py3-none-any.whl (177 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.0/178.0 KB 365.2 MB/s eta 0:00:00
Collecting opencv-python-headless==4.10.0.84
  Downloading opencv_python_headless-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.9/49.9 MB 64.5 MB/s eta 0:00:00
Collecting huggingface_hub
  Downloading huggingface_hub-0.35.1-py3-none-any.whl (563 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 563.3/563.3 KB 455.6 MB/s eta 0:00:00


Collecting torchvision
  Downloading torchvision-0.19.1-cp311-cp311-manylinux1_x86_64.whl (7.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 124.7 MB/s eta 0:00:00
Collecting pyyaml


  Downloading PyYAML-6.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (762 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 763.0/763.0 KB 110.8 MB/s eta 0:00:00
Collecting safetensors


  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 485.8/485.8 KB 495.6 MB/s eta 0:00:00
Collecting torch


  Downloading torch-2.4.1-cp311-cp311-manylinux1_x86_64.whl (797.1 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 797.1/797.1 MB 55.8 MB/s eta 0:00:00


Collecting eval-type-backport
  Downloading eval_type_backport-0.2.2-py3-none-any.whl (5.8 kB)
Collecting scipy>=1.10.0


  Downloading scipy-1.16.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (35.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.9/35.9 MB 171.1 MB/s eta 0:00:00
Collecting typing-extensions>=4.9.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 318.9 MB/s eta 0:00:00


Collecting numpy>=1.24.4
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 32.1 MB/s eta 0:00:00
Collecting scikit-image>=0.21.0
  Downloading scikit_image-0.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.8/14.8 MB 162.4 MB/s eta 0:00:00
Collecting albucore>=0.0.13
  Downloading albucore-0.0.33-py3-none-any.whl (18 kB)


Collecting pydantic>=2.7.0
  Downloading pydantic-2.11.9-py3-none-any.whl (444 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 444.9/444.9 KB 523.9 MB/s eta 0:00:00


Collecting simsimd>=5.9.2
  Downloading simsimd-6.5.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 91.2 MB/s eta 0:00:00


Collecting stringzilla>=3.10.4
  Downloading stringzilla-4.0.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (496 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 496.5/496.5 KB 303.5 MB/s eta 0:00:00
Collecting annotated-types>=0.6.0
  Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)


Collecting pydantic-core==2.33.2
  Downloading pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 147.6 MB/s eta 0:00:00
Collecting typing-inspection>=0.4.0
  Downloading typing_inspection-0.4.1-py3-none-any.whl (14 kB)
Collecting tifffile>=2022.8.12
  Downloading tifffile-2025.9.20-py3-none-any.whl (230 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 230.1/230.1 KB 490.2 MB/s eta 0:00:00


Collecting packaging>=21
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 411.1 MB/s eta 0:00:00
Collecting lazy-loader>=0.4
  Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)


Collecting pillow>=10.1
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 60.6 MB/s eta 0:00:00
Collecting networkx>=3.0
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 283.3 MB/s eta 0:00:00
Collecting imageio!=2.35.0,>=2.33


  Downloading imageio-2.37.0-py3-none-any.whl (315 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.8/315.8 KB 78.3 MB/s eta 0:00:00
Collecting hf-xet<2.0.0,>=1.1.3
  Downloading hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 288.8 MB/s eta 0:00:00


Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.7/64.7 KB 347.3 MB/s eta 0:00:00
Collecting tqdm>=4.42.1
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 KB 463.4 MB/s eta 0:00:00
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting fsspec>=2023.5.0
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 470.2 MB/s eta 0:00:00


Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 150.5 MB/s eta 0:00:00
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 502.8 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 118.1 MB/s eta 0:00:00


Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 105.7 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 95.7 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 63.8 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 111.9 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1


  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 47.6 MB/s eta 0:00:00


Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 451.1 MB/s eta 0:00:00
Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 34.0 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 325.0 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106


  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 133.1 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 103.4 MB/s eta 0:00:00
Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 62.9 MB/s eta 0:00:00
Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 121.4 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-3.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB)
Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (150 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.3/150.3 KB 502.1 MB/s eta 0:00:00
Collecting urllib3<3,>=1.21.1


  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.8/129.8 KB 427.7 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Downloading idna-3.10-py3-none-any.whl (70 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 KB 445.1 MB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Downloading certifi-2025.8.3-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 459.5 MB/s eta 0:00:00
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 511.0 MB/s eta 0:00:00


Installing collected packages: simsimd, mpmath, urllib3, typing-extensions, tqdm, sympy, stringzilla, safetensors, pyyaml, pillow, packaging, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, idna, hf-xet, fsspec, filelock, eval-type-backport, charset_normalizer, certifi, annotated-types, typing-inspection, triton, tifffile, scipy, requests, pydantic-core, opencv-python-headless, nvidia-cusparse-cu12, nvidia-cudnn-cu12, lazy-loader, jinja2, imageio, scikit-image, pydantic, nvidia-cusolver-cu12, huggingface_hub, albucore, torch, albumentations, torchvision, timm


Successfully installed MarkupSafe-3.0.2 albucore-0.0.33 albumentations-1.4.14 annotated-types-0.7.0 certifi-2025.8.3 charset_normalizer-3.4.3 eval-type-backport-0.2.2 filelock-3.19.1 fsspec-2025.9.0 hf-xet-1.1.10 huggingface_hub-0.35.1 idna-3.10 imageio-2.37.0 jinja2-3.1.6 lazy-loader-0.4 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 opencv-python-headless-4.10.0.84 packaging-25.0 pillow-11.3.0 pydantic-2.11.9 pydantic-core-2.33.2 pyyaml-6.0.2 requests-2.32.5 safetensors-0.6.2 scikit-image-0.25.2 scipy-1.16.2 simsimd-6.5.3 stringzilla-4.0.14 sympy-1.14.0 tifffile-2025.9.20 timm-1.0.9 torch-2.4.1 torchvision-0.19.1 tqdm-4.67.1 triton-3.0.0 typin





Deps installed


In [11]:
# Run a quick sanity training on fold 0 (1 epoch) with AMP+EMA
import subprocess, time, sys
cmd = [sys.executable, 'train_full_baseline.py',
       '--folds', '0',
       '--epochs', '1',
       '--img_size', '320',
       '--batch_size', '64',
       '--workers', '8',
       '--out_dir', 'out_baseline_f0',
       '--amp',
       '--ema'
      ]
print('Running:', ' '.join(cmd), flush=True)
t0=time.time()
subprocess.run(cmd, check=False)
print(f'Done in {time.time()-t0:.1f}s', flush=True)

Running: /usr/bin/python3.11 train_full_baseline.py --folds 0 --epochs 1 --img_size 320 --batch_size 64 --workers 8 --out_dir out_baseline_f0 --amp --ema


=== Training fold 0 ===
Fold 0: train 137166 | val 20033


  scaler = torch.cuda.amp.GradScaler(enabled=args.amp)


Epoch 1/1 It 100 Train acc=0.3278 loss=3.7881


Epoch 1/1 It 200 Train acc=0.3442 loss=3.6236


Epoch 1/1 It 300 Train acc=0.3573 loss=3.4779


Epoch 1/1 It 400 Train acc=0.3735 loss=3.3362


Epoch 1/1 It 500 Train acc=0.3844 loss=3.2345


Epoch 1/1 It 600 Train acc=0.3944 loss=3.1548


Epoch 1/1 It 700 Train acc=0.4041 loss=3.0852


Epoch 1/1 It 800 Train acc=0.4129 loss=3.0225


Epoch 1/1 It 900 Train acc=0.4211 loss=2.9721


Epoch 1/1 It 1000 Train acc=0.4277 loss=2.9291


Epoch 1/1 It 1100 Train acc=0.4354 loss=2.8857


Epoch 1/1 It 1200 Train acc=0.4430 loss=2.8444


Epoch 1/1 It 1300 Train acc=0.4497 loss=2.8083


Epoch 1/1 It 1400 Train acc=0.4563 loss=2.7745


Epoch 1/1 It 1500 Train acc=0.4629 loss=2.7417


Epoch 1/1 It 1600 Train acc=0.4697 loss=2.7101


Epoch 1/1 It 1700 Train acc=0.4760 loss=2.6809


Epoch 1/1 It 1800 Train acc=0.4828 loss=2.6524


Epoch 1/1 It 1900 Train acc=0.4895 loss=2.6251


Epoch 1/1 It 2000 Train acc=0.4964 loss=2.5974


Epoch 1/1 It 2100 Train acc=0.5025 loss=2.5730


  with torch.cuda.amp.autocast(enabled=args.amp):


Fold 0 Epoch 1: val_acc=0.1819 time=576.7s
Saved OOF logits for fold 0


  sd = torch.load(ckpt, map_location='cpu')['state_dict']


  with torch.cuda.amp.autocast(enabled=args.amp):


Saved test logits


Done in 801.9s


In [8]:
# Patch train_full_baseline.py to handle truncated images
from pathlib import Path
p = Path('train_full_baseline.py')
txt = p.read_text()
if 'ImageFile.LOAD_TRUNCATED_IMAGES = True' not in txt:
    txt = txt.replace('from PIL import Image', 'from PIL import Image, ImageFile\nImageFile.LOAD_TRUNCATED_IMAGES = True')
    p.write_text(txt)
    print('Patched train_full_baseline.py to enable LOAD_TRUNCATED_IMAGES')
else:
    print('Patch already applied')

Patched train_full_baseline.py to enable LOAD_TRUNCATED_IMAGES


In [12]:
# Build submission.csv from test logits (out_baseline_f0/test_logits.npy) mapped to sample_submission order
import json, numpy as np, pandas as pd
from pathlib import Path

logits_path = Path('out_baseline_f0')/'test_logits.npy'
mapping_path = Path('label_mapping.json')
test_info_path = Path('iwildcam2020_test_information.json')
sample_path = Path('sample_submission.csv')

assert logits_path.exists(), f"Missing logits at {logits_path}"
logits = np.load(logits_path)
with open(mapping_path,'r') as f:
    mapping = json.load(f)
index2id = {int(k): int(v) for k,v in mapping['index2id'].items()}

test_info = json.load(open(test_info_path,'r'))
test_images = pd.DataFrame(test_info['images'])
test_ids = test_images['id'].tolist()
id_to_pos = {tid:i for i,tid in enumerate(test_ids)}
assert logits.shape[0] == len(test_ids), f"Logits rows {logits.shape[0]} != test images {len(test_ids)}"

pred_idx = logits.argmax(axis=1).astype(int)
pred_cat = [index2id[int(i)] for i in pred_idx]

samp = pd.read_csv(sample_path)
def map_pred(row):
    pos = id_to_pos.get(row['Id'], None)
    if pos is None:
        return pred_cat[0]
    return pred_cat[pos]
samp['Category'] = samp.apply(map_pred, axis=1)
samp.to_csv('submission.csv', index=False)
print('Wrote submission.csv with shape', samp.shape)

Wrote submission.csv with shape (60760, 2)


In [10]:
# Patch train_full_baseline.py to robustly handle unreadable images in __getitem__
from pathlib import Path
p = Path('train_full_baseline.py')
txt = p.read_text()
old = "with Image.open(img_path) as im:\n            im = im.convert('RGB')\n            x = self.tf(im)"
new = ("try:\n            im = Image.open(img_path).convert('RGB')\n        except Exception as e:\n            im = Image.new('RGB', (self.size, self.size))\n        x = self.tf(im)")
if old in txt:
    txt = txt.replace(old, new)
    p.write_text(txt)
    print('Patched __getitem__ with try/except fallback to black image')
else:
    print('Expected snippet not found; no patch applied')

Patched __getitem__ with try/except fallback to black image


In [13]:
# Write improved training script v2: convnext_tiny.fb_in22k @384, warmup+cosine, class weights, MixUp, RandErasing, EMA, hflip TTA, seq-avg
from pathlib import Path
script = r'''
import os, json, time, math, random, argparse
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
import timm
from timm.data.mixup import Mixup
from timm.loss import SoftTargetCrossEntropy

def set_seed(seed=42):
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)

class IWCDataset(Dataset):
    def __init__(self, df, img_dir, img_size=384, train=True):
        self.df = df.reset_index(drop=True)
        self.img_dir = Path(img_dir)
        self.train = train
        self.size = img_size
        self.mean = (0.485,0.456,0.406); self.std = (0.229,0.224,0.225)
        if train:
            self.tf = T.Compose([
                T.RandomResizedCrop(self.size, scale=(0.2,1.0), interpolation=T.InterpolationMode.BICUBIC),
                T.RandomHorizontalFlip(),
                T.ColorJitter(0.2,0.2,0.2,0.1),
                T.RandAugment(num_ops=2, magnitude=8),
                T.ToTensor(),
                T.RandomErasing(p=0.1, scale=(0.02,0.2), ratio=(0.3,3.3)),
                T.Normalize(self.mean, self.std),
            ])
        else:
            self.tf = T.Compose([
                T.Resize(int(self.size*1.15), interpolation=T.InterpolationMode.BICUBIC),
                T.CenterCrop(self.size),
                T.ToTensor(),
                T.Normalize(self.mean, self.std),
            ])
    def __len__(self): return len(self.df)
    def __getitem__(self, i):
        r = self.df.iloc[i]
        img_path = self.img_dir / r['file_name']
        try:
            im = Image.open(img_path).convert('RGB')
        except Exception:
            im = Image.new('RGB', (self.size, self.size))
        x = self.tf(im)
        y = int(r['label_index']) if 'label_index' in r and not pd.isna(r['label_index']) else -1
        return x, y, r['img_id']

def build_model(num_classes):
    model = timm.create_model('convnext_tiny.fb_in22k', pretrained=True, num_classes=num_classes)
    return model

def cosine_warmup_scheduler(optimizer, warmup_steps, total_steps):
    def lr_lambda(step):
        if step < warmup_steps:
            return float(step) / float(max(1, warmup_steps))
        prog = float(step - warmup_steps) / float(max(1, total_steps - warmup_steps))
        return 0.5 * (1.0 + math.cos(math.pi * prog))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

def compute_class_weights(df, num_classes):
    cnt = df['label_index'].value_counts().reindex(range(num_classes), fill_value=0).values.astype(np.float32)
    cnt[cnt==0] = 1.0
    w = 1.0 / np.sqrt(cnt)
    w = w / w.mean()
    return torch.tensor(w, dtype=torch.float32)

def train_fold(args, fold, df, num_classes, device):
    df_tr = df[df['fold']!=fold].copy()
    df_va = df[df['fold']==fold].copy()
    print(f"Fold {fold}: train {len(df_tr)} | val {len(df_va)}", flush=True)

    train_ds = IWCDataset(df_tr, args.train_dir, img_size=args.img_size, train=True)
    val_ds   = IWCDataset(df_va, args.train_dir, img_size=args.img_size, train=False)
    train_loader = DataLoader(train_ds, batch_size=args.batch_size, shuffle=True, num_workers=args.workers, pin_memory=True, drop_last=True)
    val_loader   = DataLoader(val_ds, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)

    model = build_model(num_classes).to(device)
    ema_model = timm.utils.ModelEmaV2(model, decay=0.9998) if args.ema else None

    opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
    total_steps = args.epochs * max(1, len(train_loader))
    warmup_steps = max(1, int(0.1 * total_steps))
    sched = cosine_warmup_scheduler(opt, warmup_steps, total_steps)

    mixup_fn = Mixup(mixup_alpha=args.mixup_alpha, cutmix_alpha=args.mixup_alpha, prob=0.3, label_smoothing=args.label_smoothing, num_classes=num_classes) if args.mixup_alpha>0 else None
    if mixup_fn is not None:
        criterion = SoftTargetCrossEntropy().to(device)
    else:
        criterion = nn.CrossEntropyLoss(label_smoothing=args.label_smoothing).to(device)

    class_weights = compute_class_weights(df_tr, num_classes).to(device)
    if mixup_fn is None and args.use_class_weights:
        criterion = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=args.label_smoothing).to(device)

    scaler = torch.amp.GradScaler('cuda', enabled=args.amp)
    best_acc = 0.0
    oof_logits = np.zeros((len(df_va), num_classes), dtype=np.float32)
    oof_ids = df_va['img_id'].values

    global_step = 0
    for epoch in range(args.epochs):
        t0 = time.time()
        model.train()
        total, correct, loss_sum = 0, 0, 0.0
        for it,(xb,yb,_) in enumerate(train_loader):
            xb = xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)
            opt.zero_grad(set_to_none=True)
            with torch.amp.autocast('cuda', enabled=args.amp):
                if mixup_fn is not None:
                    xb, yb_smooth = mixup_fn(xb, yb)
                    logits = model(xb)
                    loss = criterion(logits, yb_smooth)
                else:
                    logits = model(xb)
                    loss = criterion(logits, yb)
            scaler.scale(loss).backward()
            scaler.step(opt); scaler.update()
            if ema_model is not None:
                ema_model.update(model)
            loss_sum += loss.item()*xb.size(0)
            if mixup_fn is None:
                preds = logits.argmax(1)
                correct += (preds==yb).sum().item()
                total += xb.size(0)
            global_step += 1
            sched.step()
            if (it+1)%200==0:
                tr_acc = (correct/total) if total>0 else 0.0
                print(f"Epoch {epoch+1}/{args.epochs} It {it+1} acc={tr_acc:.4f} loss={loss_sum/max(1,total):.4f}", flush=True)
        # Val
        model.eval(); eval_model = ema_model.module if ema_model is not None else model
        val_total, val_correct = 0, 0
        val_logits = []
        with torch.no_grad():
            for xb,yb,_ids in val_loader:
                xb = xb.to(device, non_blocking=True); yb = yb.to(device, non_blocking=True)
                with torch.amp.autocast('cuda', enabled=args.amp):
                    lg = eval_model(xb)
                val_logits.append(lg.float().cpu().numpy())
                preds = lg.argmax(1)
                val_correct += (preds==yb).sum().item()
                val_total += xb.size(0)
        val_acc = val_correct/val_total if val_total else 0.0
        val_logits = np.concatenate(val_logits, axis=0) if len(val_logits) else np.zeros((0, num_classes), dtype=np.float32)
        oof_logits = val_logits
        dt = time.time()-t0
        print(f"Fold {fold} Epoch {epoch+1}: val_acc={val_acc:.4f} time={dt:.1f}s", flush=True)
        if val_acc>best_acc:
            best_acc=val_acc
            ckpt = {'state_dict': model.state_dict(), 'acc': best_acc, 'epoch': epoch+1}
            torch.save(ckpt, Path(args.out_dir)/f"ckpt_fold{fold}.pt")
    np.save(Path(args.out_dir)/f"oof_logits_fold{fold}.npy", oof_logits)
    pd.DataFrame({'img_id':oof_ids}).to_csv(Path(args.out_dir)/f"oof_ids_fold{fold}.csv", index=False)
    print(f"Saved OOF for fold {fold}", flush=True)

def infer_test(args, df_test, num_classes, ckpt_paths, device, tta_flip=True):
    ds = IWCDataset(df_test, args.test_dir, img_size=args.img_size, train=False)
    loader = DataLoader(ds, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)
    def run_model(model):
        all_logits=[]
        with torch.no_grad():
            for xb,_,_ids in loader:
                xb = xb.to(device, non_blocking=True)
                with torch.amp.autocast('cuda', enabled=args.amp):
                    lg = model(xb)
                if tta_flip:
                    with torch.amp.autocast('cuda', enabled=args.amp):
                        lg_h = model(torch.flip(xb, dims=[3]))
                    lg = (lg + lg_h) * 0.5
                all_logits.append(lg.float().cpu().numpy())
        return np.concatenate(all_logits, axis=0)
    logits_sum = np.zeros((len(ds), num_classes), dtype=np.float32)
    for ckpt in ckpt_paths:
        model = build_model(num_classes).to(device)
        sd = torch.load(ckpt, map_location='cpu')['state_dict']
        model.load_state_dict(sd, strict=True)
        model.eval()
        logits_sum += run_model(model)
    logits_avg = logits_sum/len(ckpt_paths)
    np.save(Path(args.out_dir)/"test_logits.npy", logits_avg)
    print("Saved test logits", flush=True)

def seq_average_logits(df_items, logits, seq_col):
    arr = logits.copy()
    if seq_col not in df_items.columns: return arr
    seq = df_items[seq_col].fillna(df_items.get('img_id', None)).values
    dfL = pd.DataFrame(arr)
    dfL['__seq'] = seq
    grp = dfL.groupby('__seq').mean()
    arr2 = dfL['__seq'].map(grp.to_dict(orient='index')).apply(lambda d: np.array(list(d.values()))).values
    return np.stack(arr2, axis=0)

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--train_dir', default='train')
    ap.add_argument('--test_dir', default='test')
    ap.add_argument('--folds_csv', default='folds_location.csv')
    ap.add_argument('--test_info', default='iwildcam2020_test_information.json')
    ap.add_argument('--out_dir', default='out_full_v2')
    ap.add_argument('--img_size', type=int, default=384)
    ap.add_argument('--batch_size', type=int, default=48)
    ap.add_argument('--workers', type=int, default=8)
    ap.add_argument('--epochs', type=int, default=12)
    ap.add_argument('--lr', type=float, default=3e-4)
    ap.add_argument('--weight_decay', type=float, default=5e-2)
    ap.add_argument('--label_smoothing', type=float, default=0.1)
    ap.add_argument('--mixup_alpha', type=float, default=0.2)
    ap.add_argument('--use_class_weights', action='store_true')
    ap.add_argument('--amp', action='store_true')
    ap.add_argument('--ema', action='store_true')
    ap.add_argument('--folds', type=str, default='0,1,2,3,4')
    ap.add_argument('--seed', type=int, default=42)
    args = ap.parse_args()

    set_seed(args.seed)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    Path(args.out_dir).mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(args.folds_csv)
    num_classes = int(df['label_index'].max())+1

    # Train folds
    folds = [int(f) for f in args.folds.split(',')]
    for f in folds:
        print(f"=== Training fold {f} ===", flush=True)
        train_fold(args, f, df, num_classes, device)

    # Prepare test df
    test_info = json.load(open(args.test_info,'r'))
    test_images = pd.DataFrame(test_info['images'])
    test_df = test_images[['id','file_name','seq_id']].copy() if 'seq_id' in test_images.columns else test_images[['id','file_name']].copy()
    test_df.rename(columns={'id':'img_id'}, inplace=True)
    test_df['label_index'] = 0

    ckpts = [str(Path(args.out_dir)/f"ckpt_fold{f}.pt") for f in folds if (Path(args.out_dir)/f"ckpt_fold{f}.pt").exists()]
    if len(ckpts)>0:
        infer_test(args, test_df, num_classes, ckpts, device, tta_flip=True)
        # Sequence average test logits if seq_id present
        test_logits = np.load(Path(args.out_dir)/"test_logits.npy")
        if 'seq_id' in test_df.columns:
            test_logits = seq_average_logits(test_df.rename(columns={'seq_id':'__sequence'}), test_logits, '__sequence')
            np.save(Path(args.out_dir)/"test_logits_seqavg.npy", test_logits)
            print('Saved test_logits_seqavg.npy', flush=True)

if __name__ == '__main__':
    main()
'''
Path('train_full_v2.py').write_text(script)
print('Wrote train_full_v2.py')

Wrote train_full_v2.py


In [14]:
# Launch full-image training v2: convnext_tiny.fb_in22k @384 on fold 0 (12 epochs) with AMP+EMA
import subprocess, sys, time
cmd = [sys.executable, 'train_full_v2.py',
       '--folds', '0',
       '--epochs', '12',
       '--img_size', '384',
       '--batch_size', '48',
       '--workers', '8',
       '--out_dir', 'out_full_v2_f0',
       '--amp',
       '--ema',
       '--use_class_weights'
      ]
print('Running:', ' '.join(cmd), flush=True)
t0=time.time()
subprocess.run(cmd, check=False)
print(f'Done in {time.time()-t0:.1f}s', flush=True)

Running: /usr/bin/python3.11 train_full_v2.py --folds 0 --epochs 12 --img_size 384 --batch_size 48 --workers 8 --out_dir out_full_v2_f0 --amp --ema --use_class_weights


=== Training fold 0 ===
Fold 0: train 137166 | val 20033




Epoch 1/12 It 200 acc=0.0000 loss=40399.8433


Epoch 1/12 It 400 acc=0.0000 loss=66597.1120


Epoch 1/12 It 600 acc=0.0000 loss=89126.9188


Epoch 1/12 It 800 acc=0.0000 loss=109322.6824


Epoch 1/12 It 1000 acc=0.0000 loss=128365.5903


Epoch 1/12 It 1200 acc=0.0000 loss=146219.9182


Epoch 1/12 It 1400 acc=0.0000 loss=164297.7572


Epoch 1/12 It 1600 acc=0.0000 loss=181888.8980


Epoch 1/12 It 1800 acc=0.0000 loss=199129.2058


Epoch 1/12 It 2000 acc=0.0000 loss=215967.0547


Epoch 1/12 It 2200 acc=0.0000 loss=232636.3737


Epoch 1/12 It 2400 acc=0.0000 loss=248878.9198


Epoch 1/12 It 2600 acc=0.0000 loss=265261.9527


Epoch 1/12 It 2800 acc=0.0000 loss=281754.8540


Fold 0 Epoch 1: val_acc=0.3568 time=807.8s


Epoch 2/12 It 200 acc=0.0000 loss=16363.2322


Epoch 2/12 It 400 acc=0.0000 loss=32706.6270


Epoch 2/12 It 600 acc=0.0000 loss=48629.1064


Epoch 2/12 It 800 acc=0.0000 loss=64740.3722


Epoch 2/12 It 1000 acc=0.0000 loss=80550.5109


Epoch 2/12 It 1200 acc=0.0000 loss=96330.2317


Epoch 2/12 It 1400 acc=0.0000 loss=112070.4427


Epoch 2/12 It 1600 acc=0.0000 loss=127784.3943


Epoch 2/12 It 1800 acc=0.0000 loss=143731.3352


Epoch 2/12 It 2000 acc=0.0000 loss=159108.6871


Epoch 2/12 It 2200 acc=0.0000 loss=174185.7473


Epoch 2/12 It 2400 acc=0.0000 loss=189322.2568


Epoch 2/12 It 2600 acc=0.0000 loss=204673.4063


Epoch 2/12 It 2800 acc=0.0000 loss=219861.5538


Fold 0 Epoch 2: val_acc=0.6470 time=809.8s


Epoch 3/12 It 200 acc=0.0000 loss=14736.2666


Epoch 3/12 It 400 acc=0.0000 loss=29657.8535


Epoch 3/12 It 600 acc=0.0000 loss=44015.3720


Epoch 3/12 It 800 acc=0.0000 loss=59018.8258


Epoch 3/12 It 1000 acc=0.0000 loss=73288.0639


Epoch 3/12 It 1200 acc=0.0000 loss=87523.1626


Epoch 3/12 It 1400 acc=0.0000 loss=101628.0560


Epoch 3/12 It 1600 acc=0.0000 loss=115771.6632


Epoch 3/12 It 1800 acc=0.0000 loss=130519.4548


Epoch 3/12 It 2000 acc=0.0000 loss=144486.8973


Epoch 3/12 It 2200 acc=0.0000 loss=158613.0127


Epoch 3/12 It 2400 acc=0.0000 loss=172903.3754


Epoch 3/12 It 2600 acc=0.0000 loss=187212.7174


Epoch 3/12 It 2800 acc=0.0000 loss=200818.6659


Fold 0 Epoch 3: val_acc=0.6506 time=808.9s


Epoch 4/12 It 200 acc=0.0000 loss=13645.1121


Epoch 4/12 It 400 acc=0.0000 loss=27144.1021


Epoch 4/12 It 600 acc=0.0000 loss=41040.1738


Epoch 4/12 It 800 acc=0.0000 loss=54503.9124


Epoch 4/12 It 1000 acc=0.0000 loss=67978.7010


Epoch 4/12 It 1200 acc=0.0000 loss=81697.5413


Epoch 4/12 It 1400 acc=0.0000 loss=95231.1192


Epoch 4/12 It 1600 acc=0.0000 loss=108833.5964


Epoch 4/12 It 1800 acc=0.0000 loss=122232.0518


Epoch 4/12 It 2000 acc=0.0000 loss=135908.1179


Epoch 4/12 It 2200 acc=0.0000 loss=149053.9619


Epoch 4/12 It 2400 acc=0.0000 loss=162308.3228


Epoch 4/12 It 2600 acc=0.0000 loss=175191.2190


Epoch 4/12 It 2800 acc=0.0000 loss=188706.1630


Fold 0 Epoch 4: val_acc=0.6461 time=805.8s


Epoch 5/12 It 200 acc=0.0000 loss=12615.7880


Epoch 5/12 It 400 acc=0.0000 loss=25589.5087


Epoch 5/12 It 600 acc=0.0000 loss=38363.3616


Epoch 5/12 It 800 acc=0.0000 loss=51701.0082


Epoch 5/12 It 1000 acc=0.0000 loss=64781.0636


Epoch 5/12 It 1200 acc=0.0000 loss=77739.5638


Epoch 5/12 It 1400 acc=0.0000 loss=90303.8545


Epoch 5/12 It 1600 acc=0.0000 loss=102927.2162


Epoch 5/12 It 1800 acc=0.0000 loss=115673.9833


Epoch 5/12 It 2000 acc=0.0000 loss=128036.5956


Epoch 5/12 It 2200 acc=0.0000 loss=140895.7407


Epoch 5/12 It 2400 acc=0.0000 loss=153694.6379


Epoch 5/12 It 2600 acc=0.0000 loss=166264.1060


Epoch 5/12 It 2800 acc=0.0000 loss=178765.6713


Fold 0 Epoch 5: val_acc=0.6427 time=808.3s


Epoch 6/12 It 200 acc=0.0000 loss=12520.1302


Epoch 6/12 It 400 acc=0.0000 loss=24907.0555


Epoch 6/12 It 600 acc=0.0000 loss=37102.3930


Epoch 6/12 It 800 acc=0.0000 loss=49737.2121


Epoch 6/12 It 1000 acc=0.0000 loss=61652.9624


Epoch 6/12 It 1200 acc=0.0000 loss=73780.5848


Epoch 6/12 It 1400 acc=0.0000 loss=86051.9642


Epoch 6/12 It 1600 acc=0.0000 loss=98069.5959


Epoch 6/12 It 1800 acc=0.0000 loss=109496.9642


Epoch 6/12 It 2000 acc=0.0000 loss=121596.1948


Epoch 6/12 It 2200 acc=0.0000 loss=134282.6419


Epoch 6/12 It 2400 acc=0.0000 loss=146152.3110


Epoch 6/12 It 2600 acc=0.0000 loss=158247.6754


Epoch 6/12 It 2800 acc=0.0000 loss=170262.5437


Fold 0 Epoch 6: val_acc=0.6418 time=807.9s


Epoch 7/12 It 200 acc=0.0000 loss=11769.1642


Epoch 7/12 It 400 acc=0.0000 loss=23719.3452


Epoch 7/12 It 600 acc=0.0000 loss=35215.5675


Epoch 7/12 It 800 acc=0.0000 loss=46705.9218


Epoch 7/12 It 1000 acc=0.0000 loss=58456.4828


Epoch 7/12 It 1200 acc=0.0000 loss=70560.5385


Epoch 7/12 It 1400 acc=0.0000 loss=82483.7172


Epoch 7/12 It 1600 acc=0.0000 loss=93693.1681


Epoch 7/12 It 1800 acc=0.0000 loss=105945.8140


Epoch 7/12 It 2000 acc=0.0000 loss=117675.4165


Epoch 7/12 It 2200 acc=0.0000 loss=129094.0863


Epoch 7/12 It 2400 acc=0.0000 loss=141139.3606


Epoch 7/12 It 2600 acc=0.0000 loss=152511.1553


Epoch 7/12 It 2800 acc=0.0000 loss=164464.1305


Fold 0 Epoch 7: val_acc=0.6324 time=809.7s


Epoch 8/12 It 200 acc=0.0000 loss=11251.9852


Epoch 8/12 It 400 acc=0.0000 loss=22782.4123


Epoch 8/12 It 600 acc=0.0000 loss=34661.7415


Epoch 8/12 It 800 acc=0.0000 loss=46127.8521


Epoch 8/12 It 1000 acc=0.0000 loss=57493.7116


Epoch 8/12 It 1200 acc=0.0000 loss=69354.3163


Epoch 8/12 It 1400 acc=0.0000 loss=81081.3546


Epoch 8/12 It 1600 acc=0.0000 loss=92726.2169


Epoch 8/12 It 1800 acc=0.0000 loss=103963.8173


Epoch 8/12 It 2000 acc=0.0000 loss=115681.8390


Epoch 8/12 It 2200 acc=0.0000 loss=126989.9050


Epoch 8/12 It 2400 acc=0.0000 loss=137730.0566


Epoch 8/12 It 2600 acc=0.0000 loss=148960.3206


Epoch 8/12 It 2800 acc=0.0000 loss=159933.4481


Fold 0 Epoch 8: val_acc=0.6257 time=804.4s


Epoch 9/12 It 200 acc=0.0000 loss=11140.9686


Epoch 9/12 It 400 acc=0.0000 loss=22379.6783


Epoch 9/12 It 600 acc=0.0000 loss=33716.8451


Epoch 9/12 It 800 acc=0.0000 loss=44505.7173


Epoch 9/12 It 1000 acc=0.0000 loss=55598.5424


Epoch 9/12 It 1200 acc=0.0000 loss=66132.2448


Epoch 9/12 It 1400 acc=0.0000 loss=76761.1670


Epoch 9/12 It 1600 acc=0.0000 loss=87997.9391


Epoch 9/12 It 1800 acc=0.0000 loss=98665.8707


Epoch 9/12 It 2000 acc=0.0000 loss=110009.3875


Epoch 9/12 It 2200 acc=0.0000 loss=121196.0563


Epoch 9/12 It 2400 acc=0.0000 loss=132256.9939


Epoch 9/12 It 2600 acc=0.0000 loss=143305.7166


Epoch 9/12 It 2800 acc=0.0000 loss=154178.5914


Fold 0 Epoch 9: val_acc=0.6227 time=807.1s


Epoch 10/12 It 200 acc=0.0000 loss=11023.9889


Epoch 10/12 It 400 acc=0.0000 loss=21731.9409


Epoch 10/12 It 600 acc=0.0000 loss=32876.9200


Epoch 10/12 It 800 acc=0.0000 loss=43332.0505


Epoch 10/12 It 1000 acc=0.0000 loss=54071.7115


Epoch 10/12 It 1200 acc=0.0000 loss=64991.7568


Epoch 10/12 It 1400 acc=0.0000 loss=76066.4077


Epoch 10/12 It 1600 acc=0.0000 loss=86834.3710


Epoch 10/12 It 1800 acc=0.0000 loss=98020.7873


Epoch 10/12 It 2000 acc=0.0000 loss=108968.3738


Epoch 10/12 It 2200 acc=0.0000 loss=119516.6926


Epoch 10/12 It 2400 acc=0.0000 loss=130215.9614


Epoch 10/12 It 2600 acc=0.0000 loss=140843.6445


Epoch 10/12 It 2800 acc=0.0000 loss=151094.2758


Fold 0 Epoch 10: val_acc=0.6204 time=807.2s


Epoch 11/12 It 200 acc=0.0000 loss=10838.7582


Epoch 11/12 It 400 acc=0.0000 loss=21507.1923


Epoch 11/12 It 600 acc=0.0000 loss=32292.0309


Epoch 11/12 It 800 acc=0.0000 loss=42899.2608


Epoch 11/12 It 1000 acc=0.0000 loss=53551.7698


Epoch 11/12 It 1200 acc=0.0000 loss=64037.2302


Epoch 11/12 It 1400 acc=0.0000 loss=74661.1837


Epoch 11/12 It 1600 acc=0.0000 loss=85766.8362


Epoch 11/12 It 1800 acc=0.0000 loss=96437.7258


Epoch 11/12 It 2000 acc=0.0000 loss=106936.7424


Epoch 11/12 It 2200 acc=0.0000 loss=117726.9780


Epoch 11/12 It 2400 acc=0.0000 loss=128477.8598


Epoch 11/12 It 2600 acc=0.0000 loss=138948.0814


Epoch 11/12 It 2800 acc=0.0000 loss=149354.2162


Fold 0 Epoch 11: val_acc=0.6192 time=807.3s


Epoch 12/12 It 200 acc=0.0000 loss=10112.5092


Epoch 12/12 It 400 acc=0.0000 loss=20754.8592


Epoch 12/12 It 600 acc=0.0000 loss=31276.2369


Epoch 12/12 It 800 acc=0.0000 loss=41628.1333


Epoch 12/12 It 1000 acc=0.0000 loss=51982.5222


Epoch 12/12 It 1200 acc=0.0000 loss=62298.0956


Epoch 12/12 It 1400 acc=0.0000 loss=72787.1997


Epoch 12/12 It 1600 acc=0.0000 loss=83439.8224


Epoch 12/12 It 1800 acc=0.0000 loss=93963.8391


Epoch 12/12 It 2000 acc=0.0000 loss=104436.7317


Epoch 12/12 It 2200 acc=0.0000 loss=114994.7463


Epoch 12/12 It 2400 acc=0.0000 loss=125064.8363


Epoch 12/12 It 2600 acc=0.0000 loss=135757.0178


Epoch 12/12 It 2800 acc=0.0000 loss=146711.8227


Fold 0 Epoch 12: val_acc=0.6183 time=805.6s
Saved OOF for fold 0


  sd = torch.load(ckpt, map_location='cpu')['state_dict']


Saved test logits


Saved test_logits_seqavg.npy


Done in 9947.7s


In [18]:
# Parse MegaDetector results using image 'id' -> file_name mapping; output per-image detections
import json
from pathlib import Path
import pandas as pd

base = Path('.')
md_path = base/'iwildcam2020_megadetector_results.json'

# Load train/test metadata
ann_train = json.load(open(base/'iwildcam2020_train_annotations.json','r'))
test_info = json.load(open(base/'iwildcam2020_test_information.json','r'))

# Build mappings by image id
train_img_df = pd.DataFrame(ann_train['images'])
test_img_df = pd.DataFrame(test_info['images'])

train_by_id = {str(r['id']): r for r in ann_train['images']}
test_by_id = {str(r['id']): r for r in test_info['images']}

def get_wh(meta):
    w = meta.get('width', None); h = meta.get('height', None)
    return (int(w), int(h)) if w is not None and h is not None else (None, None)

# Load MD
md = json.load(open(md_path,'r'))
items = md.get('images', md)

det_train = {}
det_test = {}
miss = 0

for it in items:
    img_id = str(it.get('id', ''))
    if not img_id:
        miss += 1
        continue
    # Determine split and metadata
    meta = train_by_id.get(img_id)
    split = 'train'
    if meta is None:
        meta = test_by_id.get(img_id)
        split = 'test' if meta is not None else None
    if meta is None:
        miss += 1
        continue
    name = meta['file_name']
    W, H = get_wh(meta)
    dets = it.get('detections', []) or []
    out = []
    for d in dets:
        cat = str(d.get('category', '1'))
        if cat != '1':
            continue
        conf = float(d.get('conf', d.get('confidence', 0.0)))
        bbox = d.get('bbox', d.get('bbox_xywh', None))
        if bbox is None or len(bbox) != 4:
            continue
        x, y, w, h = [float(v) for v in bbox]
        record = {'conf': conf, 'bbox_norm': [x, y, w, h]}
        if W is not None and H is not None:
            record['bbox_px'] = [x*W, y*H, w*W, h*H]
        out.append(record)
    if split == 'train':
        det_train[name] = out
    elif split == 'test':
        det_test[name] = out

out_path = base/'md_detections.json'
with open(out_path, 'w') as f:
    json.dump({'train': det_train, 'test': det_test}, f)
print('Saved md_detections.json', 'train imgs:', len(det_train), 'test imgs:', len(det_test), 'miss items:', miss)

Saved md_detections.json train imgs: 157181 test imgs: 60759 miss items: 62870


In [21]:
# Write MD-crop training script (multi-box averaging, thresholds, NMS, hflip TTA)
from pathlib import Path
script = r'''
import os, json, math, time, random, argparse
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
import timm

def set_seed(seed=42):
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)

def nms(boxes, scores, iou_thr=0.5):
    if len(boxes) == 0: return []
    boxes = np.array(boxes, dtype=np.float32)
    scores = np.array(scores, dtype=np.float32)
    x1,y1,w,h = boxes[:,0], boxes[:,1], boxes[:,2], boxes[:,3]
    x2 = x1 + w; y2 = y1 + h
    order = scores.argsort()[::-1]
    keep = []
    areas = w*h
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2-xx1) * np.maximum(0.0, yy2-yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        inds = np.where(iou <= iou_thr)[0]
        order = order[inds+1]
    return keep

class MDCropDataset(Dataset):
    def __init__(self, df, img_dir, md_json, img_size=320, train=True, conf_hi=0.8, conf_lo=0.6, pad=0.15, min_side=32, min_area_frac=0.005, topk=3):
        self.df = df.reset_index(drop=True)
        self.dir = Path(img_dir)
        self.train = train
        self.size = img_size
        self.conf_hi = conf_hi; self.conf_lo = conf_lo; self.pad = pad
        self.min_side = min_side; self.min_area_frac = min_area_frac; self.topk = topk
        self.mean=(0.485,0.456,0.406); self.std=(0.229,0.224,0.225)
        self.md = json.load(open(md_json,'r'))
        self.md_map = self.md.get('train',{}) if 'train' in self.md else self.md
        if not train:
            self.md_map = self.md.get('test', self.md_map)
        self.tf_tr = T.Compose([
            T.RandomResizedCrop(self.size, scale=(0.3,1.0), interpolation=T.InterpolationMode.BICUBIC),
            T.RandomHorizontalFlip(),
            T.ColorJitter(0.1,0.1,0.1,0.05),
            T.ToTensor(),
            T.Normalize(self.mean,self.std),
        ])
        self.tf_te = T.Compose([
            T.Resize(int(self.size*1.15), interpolation=T.InterpolationMode.BICUBIC),
            T.CenterCrop(self.size),
            T.ToTensor(),
            T.Normalize(self.mean,self.std),
        ])
    def __len__(self): return len(self.df)
    def get_boxes(self, name, W, H):
        dets = self.md_map.get(name, [])
        boxes_hi, scores_hi = [], []
        boxes_lo, scores_lo = [], []
        min_area = max(self.min_side*self.min_side, self.min_area_frac*W*H)
        for d in dets:
            conf = float(d.get('conf',0.0))
            b = d.get('bbox_px')
            if b is None:
                bn = d.get('bbox_norm',[0,0,0,0]); b = [bn[0]*W,bn[1]*H,bn[2]*W,bn[3]*H]
            x,y,w,h = b
            if w < self.min_side or h < self.min_side or (w*h) < min_area:
                continue
            cx = x + w/2; cy = y + h/2
            w2 = w*(1+self.pad*2); h2 = h*(1+self.pad*2)
            x2 = max(0, cx - w2/2); y2 = max(0, cy - h2/2)
            w2 = min(w2, W - x2); h2 = min(h2, H - y2)
            if conf >= self.conf_hi:
                boxes_hi.append([x2,y2,w2,h2]); scores_hi.append(conf)
            elif conf >= self.conf_lo:
                boxes_lo.append([x2,y2,w2,h2]); scores_lo.append(conf)
        sel = []
        if boxes_hi:
            keep = nms(boxes_hi, scores_hi, iou_thr=0.5)
            idx = sorted(keep, key=lambda i: scores_hi[i], reverse=True)[:self.topk]
            sel = [boxes_hi[i] for i in idx]
        elif boxes_lo:
            keep = nms(boxes_lo, scores_lo, iou_thr=0.5)
            idx = sorted(keep, key=lambda i: scores_lo[i], reverse=True)[:self.topk]
            sel = [boxes_lo[i] for i in idx]
        return sel
    def crop_img(self, im, box):
        x,y,w,h = box
        return im.crop((x,y,x+w,y+h))
    def __getitem__(self, i):
        r = self.df.iloc[i]
        name = r['file_name']
        p = self.dir / name
        try:
            im = Image.open(p).convert('RGB')
        except Exception:
            im = Image.new('RGB', (self.size, self.size))
        W,H = im.size
        boxes = self.get_boxes(name, W, H)
        if len(boxes)==0:
            if self.train:
                crop = im
                x = self.tf_tr(crop)
            else:
                x = self.tf_te(im)
        else:
            if self.train:
                crop = self.crop_img(im, random.choice(boxes))
                x = self.tf_tr(crop)
            else:
                # For test, we will ignore x and re-crop/transform per-box in infer loop
                x = self.tf_te(self.crop_img(im, boxes[0]))
        y = int(r['label_index']) if 'label_index' in r and not pd.isna(r['label_index']) else -1
        return x, y, r['img_id'], name, (W,H), boxes

def build_model(num_classes):
    return timm.create_model('convnext_tiny.fb_in22k', pretrained=True, num_classes=num_classes)

# Custom collate to avoid collating variable-length metadata (boxes) in batches
def collate_xy(batch):
    xs, ys = [], []
    for b in batch:
        xs.append(b[0])
        ys.append(b[1])
    xs = torch.stack(xs, dim=0)
    ys = torch.tensor(ys, dtype=torch.long)
    return xs, ys

def train_fold(args, fold, df, num_classes, device):
    df_tr = df[df['fold']!=fold].copy(); df_va = df[df['fold']==fold].copy()
    print(f"Fold {fold}: train {len(df_tr)} | val {len(df_va)}", flush=True)
    tr_ds = MDCropDataset(df_tr, args.train_dir, args.md_json, img_size=args.img_size, train=True, conf_hi=args.conf_hi, conf_lo=args.conf_lo, pad=args.pad, topk=args.topk)
    va_ds = MDCropDataset(df_va, args.train_dir, args.md_json, img_size=args.img_size, train=False, conf_hi=args.conf_hi, conf_lo=args.conf_lo, pad=args.pad, topk=args.topk)
    tr_ld = DataLoader(tr_ds, batch_size=args.batch_size, shuffle=True, num_workers=args.workers, pin_memory=True, drop_last=True, collate_fn=collate_xy)
    va_ld = DataLoader(va_ds, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True, collate_fn=collate_xy)
    model = build_model(num_classes).to(device)
    ema = timm.utils.ModelEmaV2(model, decay=0.9998) if args.ema else None
    opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1).to(device)
    scaler = torch.amp.GradScaler('cuda', enabled=args.amp)
    best = 0.0
    for ep in range(args.epochs):
        t0=time.time(); model.train(); tot=cor=ls=0.0
        for it,(xb,yb) in enumerate(tr_ld):
            xb=xb.to(device); yb=yb.to(device); opt.zero_grad(set_to_none=True)
            with torch.amp.autocast('cuda', enabled=args.amp):
                lg=model(xb); loss=criterion(lg,yb)
            scaler.scale(loss).backward(); scaler.step(opt); scaler.update()
            if ema: ema.update(model)
            ls += loss.item()*xb.size(0);
            pr=lg.argmax(1); cor += (pr==yb).sum().item(); tot += xb.size(0)
            if (it+1)%200==0:
                print(f"Ep {ep+1}/{args.epochs} It {it+1} acc={cor/max(1,tot):.4f} loss={ls/max(1,tot):.4f}", flush=True)
        # val
        model.eval(); m=ema.module if ema else model; vtot=vcor=0;
        with torch.no_grad():
            for xb,yb in va_ld:
                xb=xb.to(device); yb=yb.to(device)
                with torch.amp.autocast('cuda', enabled=args.amp):
                    lg=m(xb)
                vcor += (lg.argmax(1)==yb).sum().item(); vtot += xb.size(0)
        vacc=vcor/max(1,vtot); dt=time.time()-t0
        print(f"Fold {fold} Ep {ep+1}: val_acc={vacc:.4f} time={dt:.1f}s", flush=True)
        if vacc>best:
            best=vacc; torch.save({'state_dict':model.state_dict(),'acc':best,'epoch':ep+1}, Path(args.out_dir)/f"ckpt_fold{fold}.pt")

def infer_test(args, df_test, num_classes, ckpt_paths, device):
    # We'll do per-image multi-box averaging with hflip TTA
    ds = MDCropDataset(df_test, args.test_dir, args.md_json, img_size=args.img_size, train=False, conf_hi=args.conf_hi, conf_lo=args.conf_lo, pad=args.pad, topk=args.topk)
    ld = DataLoader(ds, batch_size=1, shuffle=False, num_workers=args.workers, pin_memory=True)
    # Define test transform locally to re-crop per box
    mean=(0.485,0.456,0.406); std=(0.229,0.224,0.225)
    tf_te = T.Compose([
        T.Resize(int(args.img_size*1.15), interpolation=T.InterpolationMode.BICUBIC),
        T.CenterCrop(args.img_size),
        T.ToTensor(),
        T.Normalize(mean,std),
    ])
    def run_model(model):
        outs=[]
        with torch.no_grad():
            for _,_,_id,name,wh,boxes in ld:
                # reopen image
                p = Path(args.test_dir)/name[0]
                try:
                    im = Image.open(p).convert('RGB')
                except Exception:
                    im = Image.new('RGB', (args.img_size, args.img_size))
                if len(boxes[0]) == 0:
                    crops = [im]
                else:
                    crops = [im.crop((x,y,x+w,y+h)) for (x,y,w,h) in boxes[0]]
                logits_sum = None; n=0
                for crop in crops:
                    x = tf_te(crop).unsqueeze(0).to(device)
                    with torch.amp.autocast('cuda', enabled=args.amp):
                        lg = model(x)
                        if args.tta_flip:
                            lg_h = model(torch.flip(x, dims=[3]))
                            lg = 0.5*(lg+lg_h)
                    lg = lg.float()
                    logits_sum = lg if logits_sum is None else (logits_sum + lg)
                    n += 1
                logits = (logits_sum / max(1,n)).squeeze(0).cpu().numpy()
                outs.append(logits)
        return np.stack(outs,0)
    agg = np.zeros((len(ds), num_classes), dtype=np.float32)
    for ck in ckpt_paths:
        m = build_model(num_classes).to(device); sd=torch.load(ck,map_location='cpu')['state_dict']; m.load_state_dict(sd, strict=True); m.eval()
        agg += run_model(m)
    agg /= max(1,len(ckpt_paths))
    np.save(Path(args.out_dir)/'test_logits.npy', agg); print('Saved MD test logits', flush=True)

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--train_dir', default='train')
    ap.add_argument('--test_dir', default='test')
    ap.add_argument('--folds_csv', default='folds_location.csv')
    ap.add_argument('--md_json', default='md_detections.json')
    ap.add_argument('--out_dir', default='out_md_v1')
    ap.add_argument('--img_size', type=int, default=320)
    ap.add_argument('--batch_size', type=int, default=64)
    ap.add_argument('--workers', type=int, default=8)
    ap.add_argument('--epochs', type=int, default=8)
    ap.add_argument('--lr', type=float, default=3e-4)
    ap.add_argument('--weight_decay', type=float, default=5e-2)
    ap.add_argument('--amp', action='store_true')
    ap.add_argument('--ema', action='store_true')
    ap.add_argument('--tta_flip', action='store_true')
    ap.add_argument('--folds', type=str, default='0,1,2,3,4')
    ap.add_argument('--conf_hi', type=float, default=0.8)
    ap.add_argument('--conf_lo', type=float, default=0.6)
    ap.add_argument('--pad', type=float, default=0.15)
    ap.add_argument('--topk', type=int, default=3)
    args = ap.parse_args()
    set_seed(42); device='cuda' if torch.cuda.is_available() else 'cpu'; Path(args.out_dir).mkdir(parents=True, exist_ok=True)
    df = pd.read_csv(args.folds_csv); num_classes = int(df['label_index'].max())+1
    folds = [int(f) for f in args.folds.split(',')]
    for f in folds:
        print(f'=== MD train fold {f} ===', flush=True)
        train_fold(args, f, df, num_classes, device)
    # test
    test_info = json.load(open('iwildcam2020_test_information.json','r'))
    test_df = pd.DataFrame(test_info['images'])[['id','file_name']].rename(columns={'id':'img_id'}); test_df['label_index']=0
    ckpts = [str(Path(args.out_dir)/f"ckpt_fold{f}.pt") for f in folds if (Path(args.out_dir)/f"ckpt_fold{f}.pt").exists()]
    if len(ckpts)>0:
        infer_test(args, test_df, num_classes, ckpts, device)

if __name__=='__main__':
    main()
'''
Path('train_md_v1.py').write_text(script)
print('Wrote train_md_v1.py')

Wrote train_md_v1.py


In [24]:
# Blending and submission builder: full-image + MD-crops with rules, then sequence averaging
import json, numpy as np, pandas as pd
from pathlib import Path

full_dir = Path('out_full_v2_f0')  # adjust later if averaging multiple folds
md_dir = Path('out_md_v1_f0')  # updated to use fold-0 MD output
mapping_path = Path('label_mapping.json')
ann_train_path = Path('iwildcam2020_train_annotations.json')
test_info_path = Path('iwildcam2020_test_information.json')
sample_path = Path('sample_submission.csv')
md_det_path = Path('md_detections.json')

assert full_dir.exists(), 'Full-image out dir missing'
with open(mapping_path,'r') as f: mapping = json.load(f)
index2id = {int(k): int(v) for k,v in mapping['index2id'].items()}

# Locate empty class index if available
empty_idx = None
try:
    ann = json.load(open(ann_train_path,'r'))
    cats = pd.DataFrame(ann['categories'])
    if 'name' in cats.columns:
        empty_rows = cats[cats['name'].str.lower()=='empty']
        if len(empty_rows)>0:
            empty_cid = int(empty_rows.iloc[0]['id'])
            empty_idx = int({int(cid):i for i,cid in enumerate(cats['id'].tolist())}[empty_cid])
except Exception:
    pass

# Load test info
test_info = json.load(open(test_info_path,'r'))
test_df = pd.DataFrame(test_info['images'])
has_seq = 'seq_id' in test_df.columns

def seq_average_logits(df_items, logits):
    if not has_seq: return logits
    seq = df_items['seq_id'].values
    dfL = pd.DataFrame(logits)
    dfL['__seq'] = seq
    grp = dfL.groupby('__seq').mean()
    mapper = grp.to_dict(orient='index')
    arr = np.stack(dfL['__seq'].map(lambda s: np.array(list(mapper[s].values()))).values, axis=0)
    return arr

# Load logits
full_logits_path = full_dir/'test_logits.npy'
assert full_logits_path.exists(), f'Missing full-image logits at {full_logits_path}'
full_logits = np.load(full_logits_path)

md_logits = None
if (md_dir/'test_logits.npy').exists():
    md_logits = np.load(md_dir/'test_logits.npy')
    assert md_logits.shape == full_logits.shape, f'MD logits shape {md_logits.shape} != full {full_logits.shape}'

# MD-based confidence flags per image
hi_conf = np.zeros(len(test_df), dtype=bool)
lo_conf = np.zeros(len(test_df), dtype=bool)
if md_det_path.exists():
    md = json.load(open(md_det_path,'r'))
    det_map = md.get('test', {}) if 'test' in md else md
    name_to_idx = {n:i for i,n in enumerate(test_df['file_name'].tolist())}
    for name, dets in det_map.items():
        i = name_to_idx.get(name, None)
        if i is None: continue
        confs = [float(d.get('conf',0.0)) for d in dets]
        if any(c>=0.8 for c in confs):
            hi_conf[i] = True
        elif any(c>=0.6 for c in confs):
            lo_conf[i] = True

# Blend logits per rules
blend = full_logits.copy()
if md_logits is not None:
    # hi-conf: 0.7*md + 0.3*full
    mask_hi = hi_conf
    blend[mask_hi] = 0.7*md_logits[mask_hi] + 0.3*full_logits[mask_hi]
    # low-conf: 0.5/0.5
    mask_lo = (~mask_hi) & lo_conf
    blend[mask_lo] = 0.5*md_logits[mask_lo] + 0.5*full_logits[mask_lo]
else:
    mask_hi = np.zeros(len(test_df), dtype=bool)
    mask_lo = np.zeros(len(test_df), dtype=bool)

# No-crop fallback: optionally add small bias to empty class
if empty_idx is not None:
    none_mask = (~mask_hi) & (~mask_lo)
    if none_mask.any():
        bias = np.zeros_like(blend[0])
        bias[empty_idx] = 0.1
        blend[none_mask] = blend[none_mask] + bias

# Sequence averaging after blending
blend_seq = seq_average_logits(test_df, blend)

# Map to submission
pred_idx = blend_seq.argmax(axis=1).astype(int)
pred_cat = [index2id[int(i)] for i in pred_idx]
samp = pd.read_csv(sample_path)
id_to_pos = {tid:i for i,tid in enumerate(test_df['id'].tolist())}
def map_pred(row):
    pos = id_to_pos.get(row['Id'], None)
    return pred_cat[pos] if pos is not None else pred_cat[0]
samp['Category'] = samp.apply(map_pred, axis=1)
samp.to_csv('submission.csv', index=False)
print('Wrote submission.csv with shape', samp.shape, 'has_seq' if has_seq else 'no_seq')

Wrote submission.csv with shape (60760, 2) has_seq


In [17]:
# Debug MegaDetector JSON: inspect paths and basename matching to train/test
import json, os
from pathlib import Path
import pandas as pd

md_path = Path('iwildcam2020_megadetector_results.json')

# Load train/test names via json.load (pd.read_json expects tabular structure)
train_ann = json.load(open('iwildcam2020_train_annotations.json','r'))
test_info = json.load(open('iwildcam2020_test_information.json','r'))
train_names = set([img['file_name'] for img in train_ann['images']])
test_names = set([img['file_name'] for img in test_info['images']])

def fname_only(p):
    return Path(p).name if p else None

md = json.load(open(md_path,'r'))
items = md.get('images', md)
print('Total MD items:', len(items))
print('MD first item keys:', list(items[0].keys()) if len(items)>0 else [])

hits_train = hits_test = 0
sample = []
for it in items[:50]:
    f = it.get('file') or it.get('image_path') or it.get('filename')
    name = fname_only(f)
    in_train = (name in train_names)
    in_test = (name in test_names)
    if in_train: hits_train += 1
    if in_test: hits_test += 1
    sample.append((f, name, in_train, in_test, len(it.get('detections', []))))
print('First 5 samples (orig_path, basename, in_train, in_test, ndets):')
for row in sample[:5]:
    print(row)
print('Basename matches -> train:', hits_train, '| test:', hits_test)

# Also check if MD filenames include subdirs identical to our dirs
subdir_hits = {'train/':0,'test/':0}
for it in items[:5000]:
    f = (it.get('file') or it.get('image_path') or it.get('filename') or '')
    for k in subdir_hits:
        if k in f: subdir_hits[k]+=1
print('Subdir markers in MD JSON:', subdir_hits)

# Count how many MD basenames exist on disk in train/ and test/ dirs
from os.path import exists
on_disk_train = on_disk_test = 0
for it in items[:2000]:
    f = it.get('file') or it.get('image_path') or it.get('filename')
    name = fname_only(f)
    if name and (Path('train')/name).exists(): on_disk_train += 1
    if name and (Path('test')/name).exists(): on_disk_test += 1
print('On-disk basename matches -> train:', on_disk_train, '| test:', on_disk_test)

Total MD items: 280810
MD first item keys: ['detections', 'id', 'max_detection_conf']
First 5 samples (orig_path, basename, in_train, in_test, ndets):
(None, None, False, False, 1)
(None, None, False, False, 1)
(None, None, False, False, 0)
(None, None, False, False, 7)
(None, None, False, False, 1)
Basename matches -> train: 0 | test: 0
Subdir markers in MD JSON: {'train/': 0, 'test/': 0}
On-disk basename matches -> train: 0 | test: 0


In [22]:
# Launch MD-crop training (fold 0) with AMP+EMA and flip-TTA; img_size=320
import subprocess, sys, time
cmd = [sys.executable, 'train_md_v1.py',
       '--folds', '0',
       '--img_size', '320',
       '--batch_size', '64',
       '--workers', '8',
       '--epochs', '8',
       '--out_dir', 'out_md_v1_f0',
       '--amp',
       '--ema',
       '--tta_flip'
      ]
print('Running:', ' '.join(cmd), flush=True)
t0=time.time()
subprocess.run(cmd, check=False)
print(f'Done in {time.time()-t0:.1f}s', flush=True)

Running: /usr/bin/python3.11 train_md_v1.py --folds 0 --img_size 320 --batch_size 64 --workers 8 --epochs 8 --out_dir out_md_v1_f0 --amp --ema --tta_flip


=== MD train fold 0 ===
Fold 0: train 137166 | val 20033


Ep 1/8 It 200 acc=0.6178 loss=2.2649


Ep 1/8 It 400 acc=0.6864 loss=1.9755


Ep 1/8 It 600 acc=0.7182 loss=1.8467


Ep 1/8 It 800 acc=0.7380 loss=1.7680


Ep 1/8 It 1000 acc=0.7516 loss=1.7132


Ep 1/8 It 1200 acc=0.7628 loss=1.6694


Ep 1/8 It 1400 acc=0.7712 loss=1.6351


Ep 1/8 It 1600 acc=0.7779 loss=1.6082


Ep 1/8 It 1800 acc=0.7842 loss=1.5839


Ep 1/8 It 2000 acc=0.7894 loss=1.5642


Fold 0 Ep 1: val_acc=0.4461 time=578.3s


Ep 2/8 It 200 acc=0.8632 loss=1.2916


Ep 2/8 It 400 acc=0.8616 loss=1.2986


Ep 2/8 It 600 acc=0.8630 loss=1.2963


Ep 2/8 It 800 acc=0.8637 loss=1.2929


Ep 2/8 It 1000 acc=0.8649 loss=1.2875


Ep 2/8 It 1200 acc=0.8658 loss=1.2842


Ep 2/8 It 1400 acc=0.8667 loss=1.2815


Ep 2/8 It 1600 acc=0.8677 loss=1.2773


Ep 2/8 It 1800 acc=0.8687 loss=1.2738


Ep 2/8 It 2000 acc=0.8698 loss=1.2703


Fold 0 Ep 2: val_acc=0.5754 time=578.6s


Ep 3/8 It 200 acc=0.8984 loss=1.1769


Ep 3/8 It 400 acc=0.9012 loss=1.1738


Ep 3/8 It 600 acc=0.9006 loss=1.1738


Ep 3/8 It 800 acc=0.9003 loss=1.1736


Ep 3/8 It 1000 acc=0.9007 loss=1.1734


Ep 3/8 It 1200 acc=0.9008 loss=1.1737


Ep 3/8 It 1400 acc=0.9009 loss=1.1727


Ep 3/8 It 1600 acc=0.9012 loss=1.1712


Ep 3/8 It 1800 acc=0.9014 loss=1.1706


Ep 3/8 It 2000 acc=0.9017 loss=1.1697


Fold 0 Ep 3: val_acc=0.5835 time=579.9s


Ep 4/8 It 200 acc=0.9223 loss=1.1101


Ep 4/8 It 400 acc=0.9202 loss=1.1159


Ep 4/8 It 600 acc=0.9189 loss=1.1203


Ep 4/8 It 800 acc=0.9189 loss=1.1196


Ep 4/8 It 1000 acc=0.9193 loss=1.1183


Ep 4/8 It 1200 acc=0.9190 loss=1.1194


Ep 4/8 It 1400 acc=0.9193 loss=1.1193


Ep 4/8 It 1600 acc=0.9189 loss=1.1200


Ep 4/8 It 1800 acc=0.9187 loss=1.1202


Ep 4/8 It 2000 acc=0.9185 loss=1.1197


Fold 0 Ep 4: val_acc=0.5713 time=579.2s


Ep 5/8 It 200 acc=0.9302 loss=1.0820


Ep 5/8 It 400 acc=0.9301 loss=1.0829


Ep 5/8 It 600 acc=0.9295 loss=1.0836


Ep 5/8 It 800 acc=0.9287 loss=1.0849


Ep 5/8 It 1000 acc=0.9272 loss=1.0901


Ep 5/8 It 1200 acc=0.9278 loss=1.0897


Ep 5/8 It 1400 acc=0.9272 loss=1.0918


Ep 5/8 It 1600 acc=0.9275 loss=1.0906


Ep 5/8 It 1800 acc=0.9279 loss=1.0892


Ep 5/8 It 2000 acc=0.9279 loss=1.0893


Fold 0 Ep 5: val_acc=0.5565 time=580.8s


Ep 6/8 It 200 acc=0.9373 loss=1.0618


Ep 6/8 It 400 acc=0.9370 loss=1.0630


Ep 6/8 It 600 acc=0.9371 loss=1.0632


Ep 6/8 It 800 acc=0.9366 loss=1.0630


Ep 6/8 It 1000 acc=0.9370 loss=1.0625


Ep 6/8 It 1200 acc=0.9371 loss=1.0623


Ep 6/8 It 1400 acc=0.9369 loss=1.0629


Ep 6/8 It 1600 acc=0.9363 loss=1.0643


Ep 6/8 It 1800 acc=0.9359 loss=1.0654


Ep 6/8 It 2000 acc=0.9355 loss=1.0664


Fold 0 Ep 6: val_acc=0.5435 time=576.8s


Ep 7/8 It 200 acc=0.9413 loss=1.0471


Ep 7/8 It 400 acc=0.9422 loss=1.0437


Ep 7/8 It 600 acc=0.9411 loss=1.0471


Ep 7/8 It 800 acc=0.9411 loss=1.0473


Ep 7/8 It 1000 acc=0.9396 loss=1.0519


Ep 7/8 It 1200 acc=0.9394 loss=1.0519


Ep 7/8 It 1400 acc=0.9398 loss=1.0518


Ep 7/8 It 1600 acc=0.9396 loss=1.0525


Ep 7/8 It 1800 acc=0.9398 loss=1.0519


Ep 7/8 It 2000 acc=0.9397 loss=1.0526


Fold 0 Ep 7: val_acc=0.5353 time=581.5s


Ep 8/8 It 200 acc=0.9481 loss=1.0221


Ep 8/8 It 400 acc=0.9460 loss=1.0288


Ep 8/8 It 600 acc=0.9460 loss=1.0313


Ep 8/8 It 800 acc=0.9456 loss=1.0323


Ep 8/8 It 1000 acc=0.9449 loss=1.0347


Ep 8/8 It 1200 acc=0.9446 loss=1.0361


Ep 8/8 It 1400 acc=0.9444 loss=1.0371


Ep 8/8 It 1600 acc=0.9440 loss=1.0384


Ep 8/8 It 1800 acc=0.9437 loss=1.0389


Ep 8/8 It 2000 acc=0.9433 loss=1.0400


Fold 0 Ep 8: val_acc=0.5265 time=577.0s


  m = build_model(num_classes).to(device); sd=torch.load(ck,map_location='cpu')['state_dict']; m.load_state_dict(sd, strict=True); m.eval()


Traceback (most recent call last):
  File "/var/lib/simon/agent_run_states/iwildcam-2020-fgvc7-20250924-031313/train_md_v1.py", line 260, in <module>
    main()
  File "/var/lib/simon/agent_run_states/iwildcam-2020-fgvc7-20250924-031313/train_md_v1.py", line 257, in main
    infer_test(args, test_df, num_classes, ckpts, device)
  File "/var/lib/simon/agent_run_states/iwildcam-2020-fgvc7-20250924-031313/train_md_v1.py", line 220, in infer_test
    agg += run_model(m)
           ^^^^^^^^^^^^
  File "/var/lib/simon/agent_run_states/iwildcam-2020-fgvc7-20250924-031313/train_md_v1.py", line 199, in run_model
    if len(boxes[0]) == 0:
           ~~~~~^^^
IndexError: list index out of range


Done in 4642.3s


In [23]:
# Run MD-crop inference only (epochs=0) then blend with full-image logits to make submission
import os, sys, time, json, numpy as np, pandas as pd
from pathlib import Path
import subprocess

def run(cmd):
    print('Running:', ' '.join(cmd), flush=True)
    return subprocess.run(cmd, check=False)

# 1) MD inference (reuse existing ckpt_fold0.pt), skip training by epochs=0
md_out = Path('out_md_v1_f0'); md_out.mkdir(exist_ok=True, parents=True)
cmd = [sys.executable, 'train_md_v1.py',
       '--folds','0',
       '--epochs','0',
       '--img_size','320',
       '--batch_size','1',
       '--workers','8',
       '--out_dir','out_md_v1_f0',
       '--amp',
       '--tta_flip'
]
t0=time.time(); run(cmd); print(f'MD inference done in {time.time()-t0:.1f}s', flush=True)
assert (md_out/'ckpt_fold0.pt').exists(), 'Missing MD ckpt_fold0.pt (ensure prior training finished)'
assert (md_out/'test_logits.npy').exists(), 'MD test logits not found after inference'

# 2) Blend MD + Full logits and build submission
full_dir = Path('out_full_v2_f0')
md_dir = md_out
mapping_path = Path('label_mapping.json')
ann_train_path = Path('iwildcam2020_train_annotations.json')
test_info_path = Path('iwildcam2020_test_information.json')
sample_path = Path('sample_submission.csv')
md_det_path = Path('md_detections.json')

with open(mapping_path,'r') as f: mapping = json.load(f)
index2id = {int(k): int(v) for k,v in mapping['index2id'].items()}

test_info = json.load(open(test_info_path,'r'))
test_df = pd.DataFrame(test_info['images'])
has_seq = 'seq_id' in test_df.columns

full_logits = np.load(full_dir/'test_logits.npy')
md_logits = np.load(md_dir/'test_logits.npy')
assert md_logits.shape == full_logits.shape, f'MD {md_logits.shape} != Full {full_logits.shape}'

# empty class index if available
empty_idx = None
try:
    ann = json.load(open(ann_train_path,'r'))
    cats = pd.DataFrame(ann['categories'])
    if 'name' in cats.columns:
        row = cats[cats['name'].str.lower()=='empty']
        if len(row)>0:
            empty_cid = int(row.iloc[0]['id'])
            id2index = {int(cid):i for i,cid in enumerate(cats['id'].tolist())}
            empty_idx = int(id2index[empty_cid])
except Exception:
    pass

# MD confidence flags
hi_conf = np.zeros(len(test_df), dtype=bool)
lo_conf = np.zeros(len(test_df), dtype=bool)
if md_det_path.exists():
    md = json.load(open(md_det_path,'r'))
    det_map = md.get('test', {}) if 'test' in md else md
    name_to_idx = {n:i for i,n in enumerate(test_df['file_name'].tolist())}
    for name, dets in det_map.items():
        i = name_to_idx.get(name, None)
        if i is None: continue
        confs = [float(d.get('conf',0.0)) for d in dets]
        if any(c>=0.8 for c in confs):
            hi_conf[i] = True
        elif any(c>=0.6 for c in confs):
            lo_conf[i] = True

# Blend per rules
blend = full_logits.copy()
mask_hi = hi_conf
blend[mask_hi] = 0.7*md_logits[mask_hi] + 0.3*full_logits[mask_hi]
mask_lo = (~mask_hi) & lo_conf
blend[mask_lo] = 0.5*md_logits[mask_lo] + 0.5*full_logits[mask_lo]

if empty_idx is not None:
    none_mask = (~mask_hi) & (~mask_lo)
    if none_mask.any():
        bias = np.zeros_like(blend[0]); bias[empty_idx]=0.1
        blend[none_mask] = blend[none_mask] + bias

def seq_average_logits(df_items, logits):
    if not has_seq: return logits
    seq = df_items['seq_id'].values
    dfL = pd.DataFrame(logits)
    dfL['__seq'] = seq
    grp = dfL.groupby('__seq').mean()
    mapper = grp.to_dict(orient='index')
    arr = np.stack(dfL['__seq'].map(lambda s: np.array(list(mapper[s].values()))).values, axis=0)
    return arr

blend_seq = seq_average_logits(test_df, blend)
pred_idx = blend_seq.argmax(axis=1).astype(int)
pred_cat = [index2id[int(i)] for i in pred_idx]
samp = pd.read_csv(sample_path)
id_to_pos = {tid:i for i,tid in enumerate(test_df['id'].tolist())}
samp['Category'] = samp['Id'].map(lambda tid: pred_cat[id_to_pos.get(tid, 0)])
samp.to_csv('submission.csv', index=False)
print('Wrote submission.csv', samp.shape, flush=True)

Running: /usr/bin/python3.11 train_md_v1.py --folds 0 --epochs 0 --img_size 320 --batch_size 1 --workers 8 --out_dir out_md_v1_f0 --amp --tta_flip


=== MD train fold 0 ===
Fold 0: train 137166 | val 20033


  m = build_model(num_classes).to(device); sd=torch.load(ck,map_location='cpu')['state_dict']; m.load_state_dict(sd, strict=True); m.eval()
