# Plan

Goals:
- Win a medal (macro-F1). Baseline quickly, then iterate.

High-level approach:
1) Environment checks and setup
   - Verify GPU availability; install correct Torch stack (cu121).
   - Set up core deps: timm, albumentations, torchvision, pandas, scikit-learn.

2) Data understanding
   - Load train.csv/test.csv; inspect schema, counts, unique category_id, class imbalance.
   - Verify image paths and existence after extracting zips.

3) Validation protocol
   - Stratified k-fold by category_id.
   - Consider group-aware split if location or sequence metadata exists in CSV to reduce leakage; otherwise stick to stratified KFold with multiple seeds.

4) Baseline model (fast)
   - Image-only baseline with pretrained CNN (e.g., tf_efficientnet_b0/b3, resnet50) in PyTorch + timm.
   - 224px, lightweight augmentations, label-smoothing cross-entropy, cosine schedule, early stopping.
   - 5-fold, save OOF and test logits.

5) Improve
   - Increase res (256/320), stronger augs (RandomResizedCrop, ColorJitter, RandomErasing), mixup/cutmix.
   - Try better backbones (convnext_tiny, eva02_tiny if available), EMA, balanced sampler/focal loss for imbalance.
   - Calibrate thresholds per-class from OOF if needed for macro-F1.

6) Ensembling
   - Blend diverse backbones/seeds/resolutions via OOF-driven weights.

7) Error analysis
   - Mine OOF by class; apply class-weighting, focal loss, TTA for weak classes.

Risk controls / efficiency:
- Always log progress with elapsed time.
- Subsample smoke runs (1 fold, 2-3 epochs) to validate pipeline before full training.
- Cache datasets/transforms; avoid re-reading images unnecessarily.

Requests for expert review:
- Confirm best medal-winning backbones and input sizes for this dataset.
- Recommended validation (any grouping like location_id/sequence_id in CSV?).
- Loss/augmentation settings that historically worked (focal vs LSCE, mixup/cutmix ratios).
- Optimal TTA and ensembling strategy.

---

Next: run environment and data peek cell.

In [2]:
# Environment and quick data peek
import os, sys, subprocess, platform
import pandas as pd

print('Python', sys.version)
print('Platform', platform.platform())

def run(cmd):
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True).stdout

print('nvidia-smi:')
print(run(['bash','-lc','nvidia-smi || true']))

train_path = 'train.csv'
test_path = 'test.csv'
assert os.path.exists(train_path) and os.path.exists(test_path), 'CSV files missing'
train = pd.read_csv(train_path)
test = pd.read_csv(test_path)
print('Train shape:', train.shape)
print('Test shape:', test.shape)
print('Train columns:', train.columns.tolist())
print('Test columns:', test.columns.tolist())
print('\nTrain head:')
print(train.head())
print('\nTest head:')
print(test.head())

target_col = 'category_id'
if target_col in train.columns:
    n_classes = train[target_col].nunique()
    print('Unique classes:', n_classes)
    print('Class distribution (top 20):')
    print(train[target_col].value_counts().head(20))
else:
    print('Target column not found!')

img_cols = [c for c in train.columns if any(k in c.lower() for k in ['image','file','path'])]
print('Possible image columns:', img_cols)

Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
Platform Linux-6.8.0-1031-azure-x86_64-with-glibc2.35
nvidia-smi:
Sat Sep 27 02:53:51 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     182MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-------------

Train shape: (179422, 11)
Test shape: (16877, 10)
Train columns: ['category_id', 'date_captured', 'file_name', 'frame_num', 'id', 'location', 'rights_holder', 'seq_id', 'seq_num_frames', 'width', 'height']
Test columns: ['date_captured', 'file_name', 'frame_num', 'id', 'location', 'rights_holder', 'seq_id', 'seq_num_frames', 'width', 'height']

Train head:
   category_id        date_captured                                 file_name  \
0           19  2012-03-17 03:48:44  588a679f-23d2-11e8-a6a3-ec086b02610b.jpg   
1            0  2014-05-11 11:56:46  59279ce3-23d2-11e8-a6a3-ec086b02610b.jpg   
2            0  2013-10-06 02:00:00  5a2af4ab-23d2-11e8-a6a3-ec086b02610b.jpg   
3            3  2011-06-28 15:29:42  593d68d7-23d2-11e8-a6a3-ec086b02610b.jpg   
4            0  2014-07-31 14:00:00  58782b45-23d2-11e8-a6a3-ec086b02610b.jpg   

   frame_num                                    id  location  rights_holder  \
0          2  588a679f-23d2-11e8-a6a3-ec086b02610b       115   Justin Brown

In [3]:
# Setup Torch cu121 stack, deps, and unzip images
import os, sys, subprocess, shutil, time, glob, random, pathlib, json
from pathlib import Path

def run(cmd):
    print('>', ' '.join(cmd), flush=True)
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, check=False).stdout

# 0) Hard reset prior torch stacks
for pkg in ("torch","torchvision","torchaudio"):
    subprocess.run([sys.executable, "-m", "pip", "uninstall", "-y", pkg], check=False)

# Clean stray site dirs that can shadow correct wheels (idempotent)
for d in (
    "/app/.pip-target/torch",
    "/app/.pip-target/torchvision",
    "/app/.pip-target/torchaudio",
    "/app/.pip-target/torchgen",
    "/app/.pip-target/functorch",
):
    if os.path.exists(d):
        print("Removing", d); shutil.rmtree(d, ignore_errors=True)

# 1) Install exact cu121 torch stack
print('Installing torch/cu121 stack...', flush=True)
out = run([sys.executable, "-m", "pip", "install",
           "--index-url", "https://download.pytorch.org/whl/cu121",
           "--extra-index-url", "https://pypi.org/simple",
           "torch==2.4.1", "torchvision==0.19.1", "torchaudio==2.4.1"])
print(out[-1000:])

Path("constraints.txt").write_text("torch==2.4.1\ntorchvision==0.19.1\ntorchaudio==2.4.1\n")

# 2) Non-torch deps
print('Installing deps...', flush=True)
out = run([sys.executable, "-m", "pip", "install", "-c", "constraints.txt",
           "timm==1.0.9", "albumentations==1.4.14", "opencv-python-headless==4.10.0.84",
           "scikit-learn", "pandas", "numpy", "Pillow", "tqdm",
           "--upgrade-strategy", "only-if-needed"])
print(out[-1000:])

import torch
print("torch:", torch.__version__, "CUDA:", getattr(torch.version, "cuda", None), "CUDA available:", torch.cuda.is_available())
assert str(getattr(torch.version,'cuda','')).startswith('12.1'), f"Wrong CUDA build: {torch.version.cuda}"

# 3) Unzip images if needed
def unzip_if_needed(zip_path, out_dir):
    out = Path(out_dir); out.mkdir(parents=True, exist_ok=True)
    # Heuristic: if directory has > 10 jpgs, assume extracted
    existing = list(out.glob('*.jpg'))
    if len(existing) > 10:
        print(f"{out_dir} already extracted with {len(existing)} jpgs.")
        return
    assert Path(zip_path).exists(), f"Missing {zip_path}"
    print(f"Extracting {zip_path} -> {out_dir} ...", flush=True)
    t0 = time.time()
    cmd = ["bash","-lc", f"unzip -q -o {zip_path} -d {out_dir}"]
    print(run(cmd)[-1000:])
    print(f"Done in {time.time()-t0:.1f}s. Files: {len(list(Path(out_dir).glob('*.jpg')))}")

unzip_if_needed('train_images.zip', 'train_images')
unzip_if_needed('test_images.zip', 'test_images')

# 4) Quick check a few random images exist
train_imgs = sorted(glob.glob('train_images/*.jpg'))
test_imgs = sorted(glob.glob('test_images/*.jpg'))
print('Train images:', len(train_imgs), 'Test images:', len(test_imgs))
print('Sample train:', train_imgs[:3])
print('Sample test:', test_imgs[:3])

print('Setup complete.')





Installing torch/cu121 stack...


> /usr/bin/python3.11 -m pip install --index-url https://download.pytorch.org/whl/cu121 --extra-index-url https://pypi.org/simple torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1




pmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio
Successfully installed MarkupSafe-3.0.2 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.3.0 sympy-1.14.0 torch-2.4.1+cu121 torchaudio-2.4.1+cu121 torchvision-0.19.1+cu121 triton-3.0.0 typing-extensions-4.15.0


> /usr/bin/python3.11 -m pip install -c constraints.txt timm==1.0.9 albumentations==1.4.14 opencv-python-headless==4.10.0.84 scikit-learn pandas numpy Pillow tqdm --upgrade-strategy only-if-needed


 directory /app/.pip-target/sympy already exists. Specify --upgrade to force replacement.



torch: 2.4.1+cu121 CUDA: 12.1 CUDA available: True
Extracting train_images.zip -> train_images ...


> bash -lc unzip -q -o train_images.zip -d train_images


bash: line 1: unzip: command not found

Done in 0.0s. Files: 0
Extracting test_images.zip -> test_images ...


> bash -lc unzip -q -o test_images.zip -d test_images


bash: line 1: unzip: command not found

Done in 0.0s. Files: 0
Train images: 0 Test images: 0
Sample train: []
Sample test: []
Setup complete.


In [4]:
# Fallback extraction with Python's zipfile (no system unzip)
import zipfile, os, glob, time
from pathlib import Path

def py_unzip(zip_path, out_dir):
    out = Path(out_dir); out.mkdir(parents=True, exist_ok=True)
    existing = list(out.glob('*.jpg'))
    if len(existing) > 10:
        print(f"{out_dir} already has {len(existing)} jpgs; skipping.")
        return
    assert Path(zip_path).exists(), f"Missing {zip_path}"
    t0 = time.time()
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.namelist()
        print(f"Extracting {len(members)} files from {zip_path} ...", flush=True)
        zf.extractall(out_dir)
    njpg = len(list(Path(out_dir).glob('*.jpg')))
    print(f"Done in {time.time()-t0:.1f}s. JPGs: {njpg}")

py_unzip('train_images.zip', 'train_images')
py_unzip('test_images.zip', 'test_images')
print('Post-extract counts:', len(glob.glob('train_images/*.jpg')), len(glob.glob('test_images/*.jpg')))

Extracting 179224 files from train_images.zip ...


Done in 98.5s. JPGs: 179224
Extracting 16862 files from test_images.zip ...


Done in 7.9s. JPGs: 16862


Post-extract counts: 179224 16862


In [8]:
# CV split, label mapping, and dataset/transforms setup
import pandas as pd, numpy as np, os
from sklearn.model_selection import StratifiedGroupKFold
from collections import Counter
from pathlib import Path
import albumentations as A
from albumentations.pytorch import ToTensorV2
from PIL import Image
import torch
from torch.utils.data import Dataset

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Label mapping: model indices [0..C-1] <-> original category_id
cats = np.sort(train['category_id'].unique())
cat2idx = {c:i for i,c in enumerate(cats)}
idx2cat = {i:c for c,i in cat2idx.items()}
train['label'] = train['category_id'].map(cat2idx).astype(int)
num_classes = len(cats)
print('Num classes:', num_classes)

# Save mapping for later use
pd.Series(idx2cat).to_csv('idx2cat.csv', header=['category_id'])
pd.Series(cat2idx).to_csv('cat2idx.csv', header=['idx'])

# StratifiedGroupKFold by location (primary per expert advice)
n_splits = 5
sgkf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=42)
folds = np.full(len(train), -1, dtype=int)
y = train['label'].values
groups = train['location'].values
for f,(tr,va) in enumerate(sgkf.split(train, y, groups)):
    folds[va] = f
assert (folds>=0).all()
train['fold'] = folds
train.to_csv('train_folds.csv', index=False)
print('Fold sizes:', Counter(folds))

# Basic transforms @320
IMG_SIZE = 320
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)
train_tfms = A.Compose([
    A.RandomResizedCrop(IMG_SIZE, IMG_SIZE, scale=(0.6,1.0), interpolation=Image.BICUBIC),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(0.2,0.2,0.1,0.0,p=0.5),
    A.Normalize(mean=mean, std=std),
    A.CoarseDropout(max_holes=1, max_height=int(IMG_SIZE*0.2), max_width=int(IMG_SIZE*0.2), p=0.3),
    ToTensorV2(),
])
val_tfms = A.Compose([
    A.Resize(IMG_SIZE, IMG_SIZE, interpolation=Image.BICUBIC),
    A.Normalize(mean=mean, std=std),
    ToTensorV2(),
])

class ImgDataset(Dataset):
    def __init__(self, df, img_dir, transform=None):
        self.df = df.reset_index(drop=True)
        self.img_dir = Path(img_dir)
        self.transform = transform
        self.has_label = 'label' in df.columns
    def __len__(self):
        return len(self.df)
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img_path = self.img_dir / row['file_name']
        try:
            img = Image.open(img_path).convert('RGB')
        except Exception:
            # Fallback: black image if missing/corrupt
            img = Image.fromarray(np.zeros((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8))
        img = np.array(img)
        if self.transform:
            img = self.transform(image=img)['image']
        if self.has_label:
            return img, int(row['label'])
        else:
            return img, row['id']

print('CV and dataset setup complete. Ready for smoke training next.')

Num classes: 14


Fold sizes: Counter({4: 77411, 3: 46425, 1: 35471, 0: 13121, 2: 6994})
CV and dataset setup complete. Ready for smoke training next.


In [7]:
# Fix albumentations/albucore mismatch by forcing a clean reinstall of albumentations==1.3.1
import sys, subprocess
def run(cmd):
    print('>', ' '.join(cmd), flush=True)
    out = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True).stdout
    print(out[-2000:])
    return out

# Uninstall conflicting packages
run([sys.executable, '-m', 'pip', 'uninstall', '-y', 'albumentations', 'albucore'])
# Install pinned albumentations without albucore dependency
run([sys.executable, '-m', 'pip', 'install', '-c', 'constraints.txt', 'albumentations==1.3.1', '--no-cache-dir', '--upgrade'])

import albumentations as A
from albumentations.pytorch import ToTensorV2
print('Albumentations version:', A.__version__)

> /usr/bin/python3.11 -m pip uninstall -y albumentations albucore


Found existing installation: albumentations 1.4.14
Uninstalling albumentations-1.4.14:
  Successfully uninstalled albumentations-1.4.14
Found existing installation: albucore 0.0.33
Uninstalling albucore-0.0.33:
  Successfully uninstalled albucore-0.0.33

> /usr/bin/python3.11 -m pip install -c constraints.txt albumentations==1.3.1 --no-cache-dir --upgrade


━━━━━━━━ 44.6/44.6 KB 417.5 MB/s eta 0:00:00
Collecting scikit-learn>=0.19.1
  Downloading scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (9.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.7/9.7 MB 209.8 MB/s eta 0:00:00
Collecting pillow>=10.1
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 192.1 MB/s eta 0:00:00
Collecting lazy-loader>=0.4
  Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)
Collecting networkx>=3.0
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 302.7 MB/s eta 0:00:00
Collecting packaging>=21
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 457.6 MB/s eta 0:00:00
Collecting imageio!=2.35.0,>=2.33
  Downloading imageio-2.37.0-py3-none-any.whl (315 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.8/31

Albumentations version: 1.3.1


In [9]:
# Smoke training: convnext_tiny @320, 1 fold, 1 epoch, AMP + sampler
import time, math, numpy as np, pandas as pd, torch, timm
from torch.utils.data import DataLoader, WeightedRandomSampler
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from sklearn.metrics import f1_score

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_folds = pd.read_csv('train_folds.csv')
num_classes = train_folds['label'].nunique()
print('Classes:', num_classes, 'Device:', device)

# Fold selection (smoke: use fold==0 as val)
VAL_FOLD = 0
df_tr = train_folds[train_folds.fold != VAL_FOLD].copy()
df_va = train_folds[train_folds.fold == VAL_FOLD].copy()
print('Train/Val sizes:', len(df_tr), len(df_va))

# Datasets
train_ds = ImgDataset(df_tr, 'train_images', transform=train_tfms)
val_ds = ImgDataset(df_va, 'train_images', transform=val_tfms)

# Sampler: class weights ~ 1/sqrt(count); downweight class 0
counts = df_tr['label'].value_counts().to_dict()
cls_w = {c: 1.0 / math.sqrt(counts.get(c, 1)) for c in range(num_classes)}
empty_idx = int((pd.Series(idx2cat).sort_index().to_dict()).get(0, 0))  # original cat 0 -> model idx
if empty_idx in cls_w:
    cls_w[empty_idx] *= 0.3
weights = df_tr['label'].map(cls_w).values
sampler = WeightedRandomSampler(weights=weights, num_samples=len(weights), replacement=True)

# Loaders
BATCH_SIZE = 32
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, sampler=sampler, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=4, pin_memory=True)

# Model
model = timm.create_model('convnext_tiny', pretrained=True, num_classes=num_classes)
model.to(device)

# Loss (label smoothing CE) and optimizer/scheduler
criterion = nn.CrossEntropyLoss(label_smoothing=0.05)
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.03)
EPOCHS = 1  # smoke
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS, eta_min=1e-6)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

# Simple EMA
ema_decay = 0.999
ema_params = [p.detach().clone() for p in model.parameters() if p.requires_grad]
def ema_update():
    with torch.no_grad():
        i = 0
        for p in model.parameters():
            if not p.requires_grad: continue
            ema_params[i].mul_(ema_decay).add_(p.detach(), alpha=1-ema_decay)
            i += 1
def swap_to_ema(store):
    i = 0
    for p in model.parameters():
        if not p.requires_grad: continue
        store.append(p.detach().clone())
        p.data.copy_(ema_params[i])
        i += 1

def evaluate(loader):
    model.eval()
    all_preds, all_tgts = [], []
    with torch.no_grad():
        for xb, yb in loader:
            xb = xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)
            with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
                logits = model(xb)
            preds = logits.argmax(1).detach().cpu().numpy().tolist()
            all_preds.extend(preds)
            all_tgts.extend(yb.detach().cpu().numpy().tolist())
    return f1_score(all_tgts, all_preds, average='macro')

# Train loop
t0 = time.time()
for epoch in range(EPOCHS):
    model.train()
    running_loss, n, t_ep = 0.0, 0, time.time()
    for it, (xb, yb) in enumerate(train_loader):
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
            logits = model(xb)
            loss = criterion(logits, yb)
        scaler.scale(loss).backward()
        nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
        ema_update()
        running_loss += loss.item() * xb.size(0)
        n += xb.size(0)
        if (it+1) % 100 == 0:
            print(f"Epoch {epoch+1} Iter {it+1} | loss {running_loss/max(1,n):.4f} | elapsed {time.time()-t_ep:.1f}s", flush=True)
    scheduler.step()
    # Eval with EMA weights
    backup = []
    swap_to_ema(backup)
    val_f1 = evaluate(val_loader)
    # restore weights
    i = 0
    for p in model.parameters():
        if not p.requires_grad: continue
        p.data.copy_(backup[i]); i += 1
    print(f"Epoch {epoch+1}/{EPOCHS} | train_loss {running_loss/max(1,n):.4f} | val_macroF1 {val_f1:.4f} | epoch_time {time.time()-t_ep:.1f}s", flush=True)

print(f"Total time: {time.time()-t0:.1f}s")
print('Smoke training complete.')

  from .autonotebook import tqdm as notebook_tqdm


Classes: 14 Device: cuda
Train/Val sizes: 166301 13121


  scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())


  with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):


Epoch 1 Iter 100 | loss 2.5875 | elapsed 13.5s


Epoch 1 Iter 200 | loss 2.4527 | elapsed 26.2s


Epoch 1 Iter 300 | loss 2.3807 | elapsed 38.8s


Epoch 1 Iter 400 | loss 2.3192 | elapsed 51.5s


Epoch 1 Iter 500 | loss 2.2736 | elapsed 64.3s


Epoch 1 Iter 600 | loss 2.2387 | elapsed 77.0s


Epoch 1 Iter 700 | loss 2.2095 | elapsed 89.7s


Epoch 1 Iter 800 | loss 2.1896 | elapsed 102.5s


Epoch 1 Iter 900 | loss 2.1707 | elapsed 115.3s


Epoch 1 Iter 1000 | loss 2.1485 | elapsed 128.2s


Epoch 1 Iter 1100 | loss 2.1328 | elapsed 141.1s


Epoch 1 Iter 1200 | loss 2.1187 | elapsed 153.9s


Epoch 1 Iter 1300 | loss 2.1063 | elapsed 166.8s


Epoch 1 Iter 1400 | loss 2.0928 | elapsed 179.7s


Epoch 1 Iter 1500 | loss 2.0816 | elapsed 192.5s


Epoch 1 Iter 1600 | loss 2.0716 | elapsed 205.4s


Epoch 1 Iter 1700 | loss 2.0606 | elapsed 218.3s


Epoch 1 Iter 1800 | loss 2.0513 | elapsed 231.3s


Epoch 1 Iter 1900 | loss 2.0423 | elapsed 244.2s


Epoch 1 Iter 2000 | loss 2.0315 | elapsed 257.2s


Epoch 1 Iter 2100 | loss 2.0196 | elapsed 270.1s


Epoch 1 Iter 2200 | loss 2.0058 | elapsed 283.1s


Epoch 1 Iter 2300 | loss 1.9933 | elapsed 296.1s


Epoch 1 Iter 2400 | loss 1.9815 | elapsed 309.1s


Epoch 1 Iter 2500 | loss 1.9706 | elapsed 322.1s


Epoch 1 Iter 2600 | loss 1.9593 | elapsed 335.1s


Epoch 1 Iter 2700 | loss 1.9487 | elapsed 348.1s


Epoch 1 Iter 2800 | loss 1.9385 | elapsed 361.1s


Epoch 1 Iter 2900 | loss 1.9273 | elapsed 374.1s


Epoch 1 Iter 3000 | loss 1.9172 | elapsed 387.1s


Epoch 1 Iter 3100 | loss 1.9072 | elapsed 400.2s


Epoch 1 Iter 3200 | loss 1.8971 | elapsed 413.2s


Epoch 1 Iter 3300 | loss 1.8879 | elapsed 426.2s


Epoch 1 Iter 3400 | loss 1.8794 | elapsed 439.2s


Epoch 1 Iter 3500 | loss 1.8707 | elapsed 452.3s


Epoch 1 Iter 3600 | loss 1.8622 | elapsed 465.3s


Epoch 1 Iter 3700 | loss 1.8528 | elapsed 478.3s


Epoch 1 Iter 3800 | loss 1.8439 | elapsed 491.3s


Epoch 1 Iter 3900 | loss 1.8345 | elapsed 504.3s


Epoch 1 Iter 4000 | loss 1.8249 | elapsed 517.3s


Epoch 1 Iter 4100 | loss 1.8137 | elapsed 530.4s


Epoch 1 Iter 4200 | loss 1.8037 | elapsed 543.5s


Epoch 1 Iter 4300 | loss 1.7938 | elapsed 556.5s


Epoch 1 Iter 4400 | loss 1.7839 | elapsed 569.6s


Epoch 1 Iter 4500 | loss 1.7748 | elapsed 582.6s


Epoch 1 Iter 4600 | loss 1.7654 | elapsed 595.6s


Epoch 1 Iter 4700 | loss 1.7568 | elapsed 608.7s


Epoch 1 Iter 4800 | loss 1.7476 | elapsed 621.7s


Epoch 1 Iter 4900 | loss 1.7391 | elapsed 634.7s


Epoch 1 Iter 5000 | loss 1.7312 | elapsed 647.7s


Epoch 1 Iter 5100 | loss 1.7229 | elapsed 660.7s


  with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):


Epoch 1/1 | train_loss 1.7153 | val_macroF1 0.1710 | epoch_time 697.2s


Total time: 697.2s
Smoke training complete.


In [10]:
# Inference: TTA(hflip), sequence averaging, and submission
import pandas as pd, numpy as np, torch
from torch.utils.data import DataLoader
from pathlib import Path

test_df = pd.read_csv('test.csv').copy()

class TestDataset(torch.utils.data.Dataset):
    def __init__(self, df, img_dir, transform):
        self.df = df.reset_index(drop=True)
        self.img_dir = Path(img_dir)
        self.transform = transform
    def __len__(self): return len(self.df)
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img_path = self.img_dir / row['file_name']
        try:
            img = Image.open(img_path).convert('RGB')
        except Exception:
            img = Image.fromarray(np.zeros((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8))
        img = np.array(img)
        x = self.transform(image=img)['image']
        return x, row['id'], row['seq_id']

model.eval()
test_ds = TestDataset(test_df, 'test_images', transform=val_tfms)
test_loader = DataLoader(test_ds, batch_size=64, shuffle=False, num_workers=4, pin_memory=True)

all_logits = []
all_ids = []
all_seqs = []
with torch.no_grad():
    for xb, ids, seqs in test_loader:
        xb = xb.to(device, non_blocking=True)
        with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
            logits = model(xb)
            # TTA: horizontal flip
            xb_flip = torch.flip(xb, dims=[-1])
            logits_flip = model(xb_flip)
            logits = 0.5 * (logits + logits_flip)
        all_logits.append(logits.detach().cpu())
        all_ids.extend(ids)
        all_seqs.extend(seqs)

all_logits = torch.cat(all_logits, dim=0).numpy()
preds_df = pd.DataFrame({
    'id': all_ids,
    'seq_id': all_seqs
})
for j in range(all_logits.shape[1]):
    preds_df[f'logit_{j}'] = all_logits[:, j]

# Sequence-level averaging of logits
logit_cols = [c for c in preds_df.columns if c.startswith('logit_')]
seq_avg = preds_df.groupby('seq_id')[logit_cols].mean().reset_index()
preds_df = preds_df.drop(columns=logit_cols).merge(seq_avg, on='seq_id', how='left')

# Argmax to model label, then map back to original category_id
logits_mat = preds_df[logit_cols].values
model_preds = logits_mat.argmax(axis=1)
idx2cat_map = pd.read_csv('idx2cat.csv', index_col=0)['category_id'].to_dict()
cat_preds = [idx2cat_map[int(i)] for i in model_preds]

submission = pd.DataFrame({'id': preds_df['id'], 'category_id': cat_preds})
submission.to_csv('submission.csv', index=False)
print('Saved submission.csv with shape:', submission.shape)
print(submission.head())

  with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):


Saved submission.csv with shape: (16877, 2)
                                     id  category_id
0  5998cfa4-23d2-11e8-a6a3-ec086b02610b           13
1  599fbd89-23d2-11e8-a6a3-ec086b02610b           11
2  59fae563-23d2-11e8-a6a3-ec086b02610b           16
3  5a24a741-23d2-11e8-a6a3-ec086b02610b           18
4  59eab924-23d2-11e8-a6a3-ec086b02610b           16


In [11]:
# Train fold with Balanced Softmax, EMA, seq-avg val, checkpoint, and OOF/test logits saving
import os, math, time, json
import numpy as np, pandas as pd, torch, timm
import torch.nn as nn
from torch.utils.data import DataLoader, WeightedRandomSampler, Dataset
from sklearn.metrics import f1_score
from pathlib import Path
from collections import Counter
from albumentations.pytorch import ToTensorV2
import albumentations as A
from PIL import Image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load folds and mappings
df_all = pd.read_csv('train_folds.csv')
cat2idx_map = pd.read_csv('cat2idx.csv', index_col=0)['idx'].to_dict()
idx2cat_map = pd.read_csv('idx2cat.csv', index_col=0)['category_id'].to_dict()
num_classes = df_all['label'].nunique()
empty_idx = cat2idx_map.get(0, None)  # correct empty mapping
print('num_classes:', num_classes, 'empty_idx:', empty_idx)

# Compute dataset priors for Balanced Softmax
class_counts = df_all['label'].value_counts().reindex(range(num_classes)).fillna(0).astype(int).values
priors = class_counts / class_counts.sum()
log_priors = torch.log(torch.tensor(priors + 1e-12, dtype=torch.float32, device=device))

class ValDataset(Dataset):
    def __init__(self, df, img_dir, transform=None):
        self.df = df.reset_index(drop=True)
        self.img_dir = Path(img_dir)
        self.transform = transform
    def __len__(self): return len(self.df)
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img_path = self.img_dir / row['file_name']
        try:
            img = Image.open(img_path).convert('RGB')
        except Exception:
            img = Image.fromarray(np.zeros((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8))
        img = np.array(img)
        x = self.transform(image=img)['image'] if self.transform else img
        return x, int(row['label']), row['seq_id'], idx  # return local idx for OOF alignment

def make_sampler(df_tr):
    counts = df_tr['label'].value_counts().to_dict()
    w = {c: 1.0 / math.sqrt(counts.get(c, 1)) for c in range(num_classes)}
    if empty_idx is not None and empty_idx in w:
        w[empty_idx] *= 0.3  # downweight empty
    weights = df_tr['label'].map(w).astype('float32').values
    return WeightedRandomSampler(weights=weights, num_samples=len(weights), replacement=True)

def balanced_softmax_ce(logits, targets):
    # logits: (B, C), targets: (B,)
    return nn.functional.cross_entropy(logits + log_priors, targets)

def seq_avg_macro_f1(pred_logits, true_labels, seq_ids):
    # pred_logits: (N, C) numpy; true_labels: (N,), seq_ids: (N,) strings
    dfp = pd.DataFrame({'seq_id': seq_ids})
    logit_cols = [f'l{i}' for i in range(pred_logits.shape[1])]
    for i in range(pred_logits.shape[1]): dfp[logit_cols[i]] = pred_logits[:, i]
    dfp['label'] = true_labels
    seq_mean = dfp.groupby('seq_id')[logit_cols + ['label']].mean().reset_index()
    y_true = seq_mean['label'].astype(int).values
    y_pred = seq_mean[logit_cols].values.argmax(1)
    return f1_score(y_true, y_pred, average='macro')

def train_one_fold(fold=0, epochs=3, batch_size=32, lr=1e-3, wd=0.03, ema_decay=0.9996, save_dir='artifacts'):
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    df_tr = df_all[df_all.fold != fold].copy()
    df_va = df_all[df_all.fold == fold].copy()
    print(f'Fold {fold}: train {len(df_tr)} val {len(df_va)}')

    train_ds = ImgDataset(df_tr, 'train_images', transform=train_tfms)
    val_ds = ValDataset(df_va, 'train_images', transform=val_tfms)
    sampler = make_sampler(df_tr)
    train_loader = DataLoader(train_ds, batch_size=batch_size, sampler=sampler, num_workers=4, pin_memory=True)
    val_loader = DataLoader(val_ds, batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True)

    model = timm.create_model('convnext_tiny', pretrained=True, num_classes=num_classes).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    # cosine with warmup (1 epoch warmup simple linear)
    total_steps = epochs * max(1, len(train_loader))
    warmup_steps = max(1, len(train_loader))  # ~1 epoch
    def lr_lambda(step):
        if step < warmup_steps:
            return float(step + 1) / warmup_steps
        t = (step - warmup_steps) / max(1, (total_steps - warmup_steps))
        return 0.5 * (1 + math.cos(math.pi * t))
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)
    scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

    # EMA
    ema_params = [p.detach().clone() for p in model.parameters() if p.requires_grad]
    def ema_update():
        with torch.no_grad():
            i = 0
            for p in model.parameters():
                if not p.requires_grad: continue
                ema_params[i].mul_(ema_decay).add_(p.detach(), alpha=1-ema_decay)
                i += 1
    def swap_to_ema(store):
        i = 0
        for p in model.parameters():
            if not p.requires_grad: continue
            store.append(p.detach().clone())
            p.data.copy_(ema_params[i])
            i += 1

    best_f1, best_path = -1.0, str(Path(save_dir) / f'convnext_tiny_fold{fold}.pth')
    global_step = 0
    t0 = time.time()
    for ep in range(epochs):
        model.train()
        ep_loss, seen, t_ep = 0.0, 0, time.time()
        for it, (xb, yb) in enumerate(train_loader):
            xb = xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)
            optimizer.zero_grad(set_to_none=True)
            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                logits = model(xb)
                loss = balanced_softmax_ce(logits, yb)
            scaler.scale(loss).backward()
            nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            scaler.step(optimizer)
            scaler.update()
            scheduler.step()
            ema_update()
            bs = xb.size(0); ep_loss += loss.item() * bs; seen += bs; global_step += 1
            if (it+1) % 200 == 0:
                print(f'E{ep+1} It{it+1} | lr {scheduler.get_last_lr()[0]:.6f} | loss {ep_loss/max(1,seen):.4f} | {time.time()-t_ep:.1f}s', flush=True)

        # Eval with EMA
        backup = []; swap_to_ema(backup); model.eval()
        va_logits_list, va_labels, va_seq = [], [], []
        with torch.no_grad():
            for xb, yb, seqs, idxs in val_loader:
                xb = xb.to(device, non_blocking=True)
                with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                    lg = model(xb).detach().cpu().float()
                va_logits_list.append(lg)
                va_labels.extend(yb.numpy().tolist())
                va_seq.extend(seqs)
        # restore params
        i = 0
        for p in model.parameters():
            if not p.requires_grad: continue
            p.data.copy_(backup[i]); i += 1

        va_logits = torch.cat(va_logits_list, dim=0).numpy()
        va_f1 = seq_avg_macro_f1(va_logits, np.array(va_labels), np.array(va_seq))
        print(f'Epoch {ep+1}/{epochs} | train_loss {ep_loss/max(1,seen):.4f} | val_macroF1_seqavg {va_f1:.4f} | ep_time {time.time()-t_ep:.1f}s')

        # Save best
        if va_f1 > best_f1:
            best_f1 = va_f1
            torch.save({'state_dict': model.state_dict(), 'ema_params': [p.cpu() for p in ema_params], 'val_f1': best_f1}, best_path)
            # Save OOF logits for this fold
            np.save(Path(save_dir)/f'oof_logits_fold{fold}.npy', va_logits)
            np.save(Path(save_dir)/f'oof_labels_fold{fold}.npy', np.array(va_labels))
            pd.Series(va_seq).to_csv(Path(save_dir)/f'oof_seq_fold{fold}.csv', index=False, header=['seq_id'])
    print(f'Fold {fold} done. Best val_macroF1_seqavg={best_f1:.4f}. ckpt={best_path} | total {time.time()-t0:.1f}s')
    return best_f1, best_path

# Run a first improved fold-0 training (3 epochs) to validate BSCE setup quickly
best_f1, best_path = train_one_fold(fold=0, epochs=3, batch_size=32, lr=1e-3, wd=0.03, ema_decay=0.9996, save_dir='artifacts')
print('Finished fold0 quick run. Best F1:', best_f1)

num_classes: 14 empty_idx: 0
Fold 0: train 166301 val 13121


  scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())




E1 It200 | lr 0.000039 | loss 1.4582 | 25.8s


E1 It400 | lr 0.000077 | loss 1.1201 | 51.3s


E1 It600 | lr 0.000116 | loss 0.9761 | 76.9s


E1 It800 | lr 0.000154 | loss 0.9032 | 102.5s


E1 It1000 | lr 0.000193 | loss 0.8536 | 128.3s


E1 It1200 | lr 0.000231 | loss 0.8227 | 154.0s


E1 It1400 | lr 0.000270 | loss 0.8027 | 179.9s


E1 It1600 | lr 0.000308 | loss 0.7878 | 205.7s


E1 It1800 | lr 0.000347 | loss 0.7815 | 231.6s


E1 It2000 | lr 0.000385 | loss 0.7783 | 257.4s


E1 It2200 | lr 0.000424 | loss 0.7620 | 283.3s


E1 It2400 | lr 0.000462 | loss 0.7449 | 309.2s


E1 It2600 | lr 0.000500 | loss 0.7321 | 335.2s


E1 It2800 | lr 0.000539 | loss 0.7213 | 361.1s


E1 It3000 | lr 0.000577 | loss 0.7146 | 387.1s


E1 It3200 | lr 0.000616 | loss 0.7111 | 413.0s


E1 It3400 | lr 0.000654 | loss 0.7077 | 439.0s


E1 It3600 | lr 0.000693 | loss 0.7061 | 465.0s


E1 It3800 | lr 0.000731 | loss 0.7082 | 491.1s


E1 It4000 | lr 0.000770 | loss 0.7094 | 517.1s


E1 It4200 | lr 0.000808 | loss 0.7049 | 543.1s


E1 It4400 | lr 0.000847 | loss 0.6975 | 569.1s


E1 It4600 | lr 0.000885 | loss 0.6908 | 595.2s


E1 It4800 | lr 0.000924 | loss 0.6860 | 621.2s


E1 It5000 | lr 0.000962 | loss 0.6816 | 647.2s


Epoch 1/3 | train_loss 0.6791 | val_macroF1_seqavg 0.4424 | ep_time 693.5s


E2 It200 | lr 0.000999 | loss 0.6032 | 26.4s


E2 It400 | lr 0.000996 | loss 0.6150 | 52.4s


E2 It600 | lr 0.000992 | loss 0.6138 | 78.3s


E2 It800 | lr 0.000985 | loss 0.6147 | 104.3s


E2 It1000 | lr 0.000977 | loss 0.5988 | 130.3s


E2 It1200 | lr 0.000967 | loss 0.5715 | 156.3s


E2 It1400 | lr 0.000956 | loss 0.5534 | 182.3s


E2 It1600 | lr 0.000943 | loss 0.5357 | 208.2s


E2 It1800 | lr 0.000928 | loss 0.5241 | 234.2s


E2 It2000 | lr 0.000911 | loss 0.5127 | 260.2s


E2 It2200 | lr 0.000893 | loss 0.5047 | 286.1s


E2 It2400 | lr 0.000874 | loss 0.4997 | 312.1s


E2 It2600 | lr 0.000853 | loss 0.4933 | 338.0s


E2 It2800 | lr 0.000831 | loss 0.4874 | 363.8s


E2 It3000 | lr 0.000808 | loss 0.4797 | 389.8s


E2 It3200 | lr 0.000784 | loss 0.4706 | 415.8s


E2 It3400 | lr 0.000758 | loss 0.4614 | 441.7s


E2 It3600 | lr 0.000732 | loss 0.4528 | 467.7s


E2 It3800 | lr 0.000705 | loss 0.4453 | 493.7s


E2 It4000 | lr 0.000677 | loss 0.4390 | 519.6s


E2 It4200 | lr 0.000648 | loss 0.4324 | 545.5s


E2 It4400 | lr 0.000619 | loss 0.4265 | 571.5s


E2 It4600 | lr 0.000590 | loss 0.4207 | 597.3s


E2 It4800 | lr 0.000560 | loss 0.4146 | 623.2s


E2 It5000 | lr 0.000530 | loss 0.4073 | 649.2s


Epoch 2/3 | train_loss 0.4007 | val_macroF1_seqavg 0.4244 | ep_time 695.8s


E3 It200 | lr 0.000470 | loss 0.2073 | 26.3s


E3 It400 | lr 0.000440 | loss 0.2092 | 52.2s


E3 It600 | lr 0.000410 | loss 0.2081 | 78.1s


E3 It800 | lr 0.000380 | loss 0.2082 | 104.1s


E3 It1000 | lr 0.000351 | loss 0.2063 | 130.1s


E3 It1200 | lr 0.000323 | loss 0.2048 | 156.1s


E3 It1400 | lr 0.000295 | loss 0.2027 | 182.1s


E3 It1600 | lr 0.000268 | loss 0.2013 | 208.0s


E3 It1800 | lr 0.000241 | loss 0.1978 | 234.0s


E3 It2000 | lr 0.000216 | loss 0.1961 | 259.9s


E3 It2200 | lr 0.000191 | loss 0.1936 | 285.8s


E3 It2400 | lr 0.000168 | loss 0.1915 | 311.8s


E3 It2600 | lr 0.000146 | loss 0.1899 | 337.7s


E3 It2800 | lr 0.000126 | loss 0.1877 | 363.6s


E3 It3000 | lr 0.000106 | loss 0.1853 | 389.6s


E3 It3200 | lr 0.000088 | loss 0.1834 | 415.5s


E3 It3400 | lr 0.000072 | loss 0.1810 | 441.3s


E3 It3600 | lr 0.000057 | loss 0.1783 | 467.2s


E3 It3800 | lr 0.000044 | loss 0.1759 | 493.1s


E3 It4000 | lr 0.000032 | loss 0.1738 | 519.1s


E3 It4200 | lr 0.000023 | loss 0.1719 | 545.0s


E3 It4400 | lr 0.000014 | loss 0.1692 | 570.9s


E3 It4600 | lr 0.000008 | loss 0.1679 | 596.7s


E3 It4800 | lr 0.000004 | loss 0.1663 | 622.7s


E3 It5000 | lr 0.000001 | loss 0.1647 | 648.6s


Epoch 3/3 | train_loss 0.1634 | val_macroF1_seqavg 0.4403 | ep_time 693.8s
Fold 0 done. Best val_macroF1_seqavg=0.4424. ckpt=artifacts/convnext_tiny_fold0.pth | total 2083.2s
Finished fold0 quick run. Best F1: 0.4424157099420091


In [12]:
# EMA-based inference from saved checkpoint: hflip TTA + seq-avg + save logits and submission
import torch, pandas as pd, numpy as np
from torch.utils.data import DataLoader
from pathlib import Path

def load_model_with_ema(ckpt_path, num_classes):
    model = timm.create_model('convnext_tiny', pretrained=False, num_classes=num_classes).to(device)
    ckpt = torch.load(ckpt_path, map_location=device)
    model.load_state_dict(ckpt['state_dict'])
    # swap to EMA params
    ema_list = ckpt.get('ema_params', None)
    if ema_list is not None:
        i = 0
        with torch.no_grad():
            for p in model.parameters():
                if not p.requires_grad: continue
                p.data.copy_(ema_list[i].to(device)); i += 1
    model.eval()
    return model

def infer_test_and_save(ckpt_path, out_dir='artifacts', alpha_prior: float = 0.0):
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    test_df = pd.read_csv('test.csv')
    model = load_model_with_ema(ckpt_path, num_classes)
    test_ds = TestDataset(test_df, 'test_images', transform=val_tfms)
    loader = DataLoader(test_ds, batch_size=64, shuffle=False, num_workers=4, pin_memory=True)
    all_logits, ids, seqs = [], [], []
    with torch.no_grad():
        for xb, idb, seqb in loader:
            xb = xb.to(device, non_blocking=True)
            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                lg = model(xb)
                lg_flip = model(torch.flip(xb, dims=[-1]))
                lg = 0.5 * (lg + lg_flip)
                if alpha_prior != 0.0:
                    # apply optional logit adjustment by priors
                    lg = lg + alpha_prior * log_priors
            all_logits.append(lg.detach().cpu())
            ids.extend(idb); seqs.extend(seqb)
    logits = torch.cat(all_logits, dim=0).numpy()
    np.save(Path(out_dir)/'test_logits_fold0.npy', logits)
    pred_df = pd.DataFrame({'id': ids, 'seq_id': seqs})
    for j in range(logits.shape[1]):
        pred_df[f'logit_{j}'] = logits[:, j]
    # sequence-average
    logit_cols = [c for c in pred_df.columns if c.startswith('logit_')]
    seq_avg = pred_df.groupby('seq_id')[logit_cols].mean().reset_index()
    pred_df = pred_df.drop(columns=logit_cols).merge(seq_avg, on='seq_id', how='left')
    mat = pred_df[logit_cols].values
    model_idx = mat.argmax(1)
    idx2cat_map = pd.read_csv('idx2cat.csv', index_col=0)['category_id'].to_dict()
    cat_preds = [idx2cat_map[int(i)] for i in model_idx]
    sub = pd.DataFrame({'id': pred_df['id'], 'category_id': cat_preds})
    sub.to_csv('submission.csv', index=False)
    print('Saved submission.csv', sub.shape, 'and test logits to', out_dir)

# After training completes, run:
# infer_test_and_save('artifacts/convnext_tiny_fold0.pth', out_dir='artifacts', alpha_prior=0.0)

In [13]:
# Run EMA-based inference for fold0 and write submission.csv
infer_test_and_save('artifacts/convnext_tiny_fold0.pth', out_dir='artifacts', alpha_prior=0.0)

  ckpt = torch.load(ckpt_path, map_location=device)


Saved submission.csv (16877, 2) and test logits to artifacts


In [14]:
# Train folds 1-2 (8 epochs) then ensemble fold0-2 EMA checkpoints for test submission
from pathlib import Path
import numpy as np, pandas as pd, torch
from torch.utils.data import DataLoader

# 1) Train additional folds (1 and 2) with the improved recipe
folds_to_train = [1, 2]
ckpt_paths = [Path('artifacts')/f'convnext_tiny_fold{f}.pth' for f in folds_to_train]
for f, ck in zip(folds_to_train, ckpt_paths):
    if ck.exists():
        print(f'Skipping fold {f}, checkpoint exists:', ck)
        continue
    print(f'Training fold {f} for 8 epochs...')
    train_one_fold(fold=f, epochs=8, batch_size=32, lr=1e-3, wd=0.03, ema_decay=0.9996, save_dir='artifacts')

# 2) Ensemble EMA checkpoints from folds [0,1,2] if available
all_ckpts = [Path('artifacts')/f'convnext_tiny_fold{i}.pth' for i in [0,1,2]]
all_ckpts = [str(p) for p in all_ckpts if p.exists()]
print('Ensembling checkpoints:', all_ckpts)
assert len(all_ckpts) >= 1, 'No checkpoints found to ensemble'

test_df = pd.read_csv('test.csv')
test_ds = TestDataset(test_df, 'test_images', transform=val_tfms)
loader = DataLoader(test_ds, batch_size=64, shuffle=False, num_workers=4, pin_memory=True)

logits_list = []
with torch.no_grad():
    for ck in all_ckpts:
        print('Infer with', ck)
        model_e = load_model_with_ema(ck, num_classes)
        part_logits = []
        for xb, idb, seqb in loader:
            xb = xb.to(device, non_blocking=True)
            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                lg = model_e(xb)
                lgf = model_e(torch.flip(xb, dims=[-1]))
                lg = 0.5*(lg+lgf)
            part_logits.append(lg.detach().cpu())
        logits_list.append(torch.cat(part_logits, dim=0).numpy())

blended = np.mean(np.stack(logits_list, axis=0), axis=0)
np.save(Path('artifacts')/'test_logits_fold0_2_blend.npy', blended)

# Sequence average and write submission
pred_df = pd.DataFrame({'id': test_df['id'].values, 'seq_id': test_df['seq_id'].values})
for j in range(blended.shape[1]):
    pred_df[f'logit_{j}'] = blended[:, j]
logit_cols = [c for c in pred_df.columns if c.startswith('logit_')]
seq_avg = pred_df.groupby('seq_id')[logit_cols].mean().reset_index()
pred_df = pred_df.drop(columns=logit_cols).merge(seq_avg, on='seq_id', how='left')
mat = pred_df[logit_cols].values
model_idx = mat.argmax(1)
idx2cat_map = pd.read_csv('idx2cat.csv', index_col=0)['category_id'].to_dict()
cat_preds = [idx2cat_map[int(i)] for i in model_idx]
sub = pd.DataFrame({'id': pred_df['id'], 'category_id': cat_preds})
sub.to_csv('submission.csv', index=False)
print('Wrote blended submission.csv', sub.shape, 'using', len(all_ckpts), 'checkpoints')

Training fold 1 for 8 epochs...
Fold 1: train 143951 val 35471


  scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())




E1 It200 | lr 0.000045 | loss 1.5544 | 26.0s


E1 It400 | lr 0.000089 | loss 1.1627 | 51.7s


E1 It600 | lr 0.000134 | loss 1.0189 | 77.4s


E1 It800 | lr 0.000178 | loss 0.9387 | 103.1s


E1 It1000 | lr 0.000222 | loss 0.8885 | 128.8s


E1 It1200 | lr 0.000267 | loss 0.8580 | 154.6s


E1 It1400 | lr 0.000311 | loss 0.8383 | 180.5s


E1 It1600 | lr 0.000356 | loss 0.8297 | 206.3s


E1 It1800 | lr 0.000400 | loss 0.8229 | 232.1s


E1 It2000 | lr 0.000445 | loss 0.8178 | 258.0s


E1 It2200 | lr 0.000489 | loss 0.8048 | 283.9s


E1 It2400 | lr 0.000534 | loss 0.7884 | 309.8s


E1 It2600 | lr 0.000578 | loss 0.7749 | 335.7s


E1 It2800 | lr 0.000623 | loss 0.7648 | 361.5s


E1 It3000 | lr 0.000667 | loss 0.7583 | 387.5s


E1 It3200 | lr 0.000711 | loss 0.7546 | 413.5s


E1 It3400 | lr 0.000756 | loss 0.7510 | 439.4s


E1 It3600 | lr 0.000800 | loss 0.7502 | 465.3s


E1 It3800 | lr 0.000845 | loss 0.7500 | 491.3s


E1 It4000 | lr 0.000889 | loss 0.7528 | 517.3s


E1 It4200 | lr 0.000934 | loss 0.7489 | 543.3s


E1 It4400 | lr 0.000978 | loss 0.7413 | 569.2s


Epoch 1/8 | train_loss 0.7379 | val_macroF1_seqavg 0.3295 | ep_time 639.8s


E2 It200 | lr 0.001000 | loss 0.5937 | 26.3s


E2 It400 | lr 0.001000 | loss 0.5762 | 52.2s


E2 It600 | lr 0.000999 | loss 0.5797 | 78.2s


E2 It800 | lr 0.000998 | loss 0.5811 | 104.2s


E2 It1000 | lr 0.000998 | loss 0.5851 | 130.2s


E2 It1200 | lr 0.000996 | loss 0.5889 | 156.1s


E2 It1400 | lr 0.000995 | loss 0.5913 | 182.1s


E2 It1600 | lr 0.000994 | loss 0.5864 | 208.1s


E2 It1800 | lr 0.000992 | loss 0.5716 | 234.1s


E2 It2000 | lr 0.000990 | loss 0.5610 | 260.0s


E2 It2200 | lr 0.000988 | loss 0.5523 | 286.0s


E2 It2400 | lr 0.000986 | loss 0.5428 | 312.0s


E2 It2600 | lr 0.000983 | loss 0.5360 | 338.0s


E2 It2800 | lr 0.000981 | loss 0.5298 | 363.9s


E2 It3000 | lr 0.000978 | loss 0.5260 | 389.9s


E2 It3200 | lr 0.000975 | loss 0.5235 | 415.9s


E2 It3400 | lr 0.000972 | loss 0.5206 | 442.0s


E2 It3600 | lr 0.000968 | loss 0.5167 | 468.0s


E2 It3800 | lr 0.000965 | loss 0.5083 | 494.0s


E2 It4000 | lr 0.000961 | loss 0.5006 | 520.0s


E2 It4200 | lr 0.000957 | loss 0.4937 | 546.0s


E2 It4400 | lr 0.000953 | loss 0.4876 | 571.9s


Epoch 2/8 | train_loss 0.4859 | val_macroF1_seqavg 0.2819 | ep_time 643.4s


E3 It200 | lr 0.000946 | loss 0.3818 | 26.3s


E3 It400 | lr 0.000941 | loss 0.3710 | 52.2s


E3 It600 | lr 0.000937 | loss 0.3708 | 78.1s


E3 It800 | lr 0.000932 | loss 0.3675 | 104.1s


E3 It1000 | lr 0.000927 | loss 0.3730 | 130.1s


E3 It1200 | lr 0.000921 | loss 0.3660 | 156.1s


E3 It1400 | lr 0.000916 | loss 0.3547 | 182.1s


E3 It1600 | lr 0.000910 | loss 0.3478 | 208.0s


E3 It1800 | lr 0.000904 | loss 0.3412 | 233.9s


E3 It2000 | lr 0.000899 | loss 0.3385 | 259.8s


E3 It2200 | lr 0.000892 | loss 0.3354 | 285.8s


E3 It2400 | lr 0.000886 | loss 0.3328 | 311.7s


E3 It2600 | lr 0.000880 | loss 0.3300 | 337.5s


E3 It2800 | lr 0.000873 | loss 0.3286 | 363.4s


E3 It3000 | lr 0.000867 | loss 0.3274 | 389.4s


E3 It3200 | lr 0.000860 | loss 0.3239 | 415.4s


E3 It3400 | lr 0.000853 | loss 0.3198 | 441.5s


E3 It3600 | lr 0.000846 | loss 0.3154 | 467.5s


E3 It3800 | lr 0.000838 | loss 0.3112 | 493.4s


E3 It4000 | lr 0.000831 | loss 0.3079 | 519.3s


E3 It4200 | lr 0.000823 | loss 0.3050 | 545.2s


E3 It4400 | lr 0.000816 | loss 0.3025 | 571.1s


Epoch 3/8 | train_loss 0.3012 | val_macroF1_seqavg 0.2733 | ep_time 642.9s


E4 It200 | lr 0.000804 | loss 0.2475 | 26.3s


E4 It400 | lr 0.000796 | loss 0.2535 | 52.2s


E4 It600 | lr 0.000788 | loss 0.2518 | 78.1s


E4 It800 | lr 0.000780 | loss 0.2399 | 104.1s


E4 It1000 | lr 0.000771 | loss 0.2322 | 130.0s


E4 It1200 | lr 0.000763 | loss 0.2280 | 156.0s


E4 It1400 | lr 0.000754 | loss 0.2251 | 182.0s


E4 It1600 | lr 0.000746 | loss 0.2236 | 208.0s


E4 It1800 | lr 0.000737 | loss 0.2203 | 234.0s


E4 It2000 | lr 0.000728 | loss 0.2171 | 260.0s


E4 It2200 | lr 0.000719 | loss 0.2162 | 286.0s


E4 It2400 | lr 0.000710 | loss 0.2176 | 311.9s


E4 It2600 | lr 0.000701 | loss 0.2215 | 338.0s


E4 It2800 | lr 0.000692 | loss 0.2242 | 363.9s


E4 It3000 | lr 0.000683 | loss 0.2260 | 389.9s


E4 It3200 | lr 0.000673 | loss 0.2281 | 415.9s


E4 It3400 | lr 0.000664 | loss 0.2300 | 442.0s


E4 It3600 | lr 0.000654 | loss 0.2314 | 468.0s


E4 It3800 | lr 0.000645 | loss 0.2319 | 494.0s


E4 It4000 | lr 0.000635 | loss 0.2319 | 520.0s


E4 It4200 | lr 0.000626 | loss 0.2312 | 546.0s


E4 It4400 | lr 0.000616 | loss 0.2315 | 571.9s


Epoch 4/8 | train_loss 0.2306 | val_macroF1_seqavg 0.2735 | ep_time 644.0s


E5 It200 | lr 0.000602 | loss 0.1827 | 26.2s


E5 It400 | lr 0.000592 | loss 0.1855 | 52.1s


E5 It600 | lr 0.000582 | loss 0.1837 | 77.9s


E5 It800 | lr 0.000572 | loss 0.1832 | 103.8s


E5 It1000 | lr 0.000562 | loss 0.1826 | 129.6s


E5 It1200 | lr 0.000552 | loss 0.1809 | 155.5s


E5 It1400 | lr 0.000542 | loss 0.1807 | 181.4s


E5 It1600 | lr 0.000532 | loss 0.1801 | 207.4s


E5 It1800 | lr 0.000522 | loss 0.1797 | 233.4s


E5 It2000 | lr 0.000512 | loss 0.1772 | 259.3s


E5 It2200 | lr 0.000502 | loss 0.1772 | 285.3s


E5 It2400 | lr 0.000492 | loss 0.1764 | 311.3s


E5 It2600 | lr 0.000483 | loss 0.1787 | 337.3s


E5 It2800 | lr 0.000473 | loss 0.1825 | 363.3s


E5 It3000 | lr 0.000463 | loss 0.1849 | 389.2s


E5 It3200 | lr 0.000453 | loss 0.1859 | 415.1s


E5 It3400 | lr 0.000443 | loss 0.1865 | 441.1s


E5 It3600 | lr 0.000433 | loss 0.1860 | 467.0s


E5 It3800 | lr 0.000423 | loss 0.1853 | 492.8s


E5 It4000 | lr 0.000413 | loss 0.1849 | 518.7s


E5 It4200 | lr 0.000403 | loss 0.1846 | 544.5s


E5 It4400 | lr 0.000394 | loss 0.1847 | 570.3s


Epoch 5/8 | train_loss 0.1840 | val_macroF1_seqavg 0.2738 | ep_time 641.8s


E6 It200 | lr 0.000379 | loss 0.1358 | 26.2s


E6 It400 | lr 0.000369 | loss 0.1335 | 52.1s


E6 It600 | lr 0.000360 | loss 0.1374 | 78.0s


E6 It800 | lr 0.000350 | loss 0.1382 | 103.8s


E6 It1000 | lr 0.000341 | loss 0.1372 | 129.7s


E6 It1200 | lr 0.000331 | loss 0.1377 | 155.6s


E6 It1400 | lr 0.000322 | loss 0.1412 | 181.5s


E6 It1600 | lr 0.000313 | loss 0.1436 | 207.5s


E6 It1800 | lr 0.000303 | loss 0.1448 | 233.5s


E6 It2000 | lr 0.000294 | loss 0.1453 | 259.6s


E6 It2200 | lr 0.000285 | loss 0.1458 | 285.6s


E6 It2400 | lr 0.000276 | loss 0.1468 | 311.6s


E6 It2600 | lr 0.000267 | loss 0.1464 | 337.4s


E6 It2800 | lr 0.000259 | loss 0.1462 | 363.3s


E6 It3000 | lr 0.000250 | loss 0.1455 | 389.1s


E6 It3200 | lr 0.000241 | loss 0.1446 | 414.9s


E6 It3400 | lr 0.000233 | loss 0.1436 | 440.8s


E6 It3600 | lr 0.000225 | loss 0.1435 | 466.6s


E6 It3800 | lr 0.000216 | loss 0.1432 | 492.4s


E6 It4000 | lr 0.000208 | loss 0.1419 | 518.1s


E6 It4200 | lr 0.000200 | loss 0.1409 | 543.9s


E6 It4400 | lr 0.000192 | loss 0.1403 | 569.7s


Epoch 6/8 | train_loss 0.1399 | val_macroF1_seqavg 0.2868 | ep_time 640.8s


E7 It200 | lr 0.000181 | loss 0.1269 | 26.2s


E7 It400 | lr 0.000173 | loss 0.1202 | 52.0s


E7 It600 | lr 0.000165 | loss 0.1258 | 77.8s


E7 It800 | lr 0.000158 | loss 0.1238 | 103.8s


E7 It1000 | lr 0.000151 | loss 0.1199 | 129.7s


E7 It1200 | lr 0.000144 | loss 0.1183 | 155.6s


E7 It1400 | lr 0.000137 | loss 0.1161 | 181.4s


E7 It1600 | lr 0.000130 | loss 0.1152 | 207.3s


E7 It1800 | lr 0.000123 | loss 0.1147 | 233.1s


E7 It2000 | lr 0.000117 | loss 0.1137 | 258.9s


E7 It2200 | lr 0.000111 | loss 0.1138 | 284.7s


E7 It2400 | lr 0.000104 | loss 0.1129 | 310.6s


E7 It2600 | lr 0.000098 | loss 0.1122 | 336.4s


E7 It2800 | lr 0.000093 | loss 0.1127 | 362.2s


E7 It3000 | lr 0.000087 | loss 0.1120 | 388.0s


E7 It3200 | lr 0.000081 | loss 0.1106 | 413.8s


E7 It3400 | lr 0.000076 | loss 0.1094 | 439.6s


E7 It3600 | lr 0.000071 | loss 0.1085 | 465.4s


E7 It3800 | lr 0.000066 | loss 0.1076 | 491.2s


E7 It4000 | lr 0.000061 | loss 0.1071 | 517.0s


E7 It4200 | lr 0.000056 | loss 0.1074 | 542.8s


E7 It4400 | lr 0.000052 | loss 0.1070 | 568.6s


Epoch 7/8 | train_loss 0.1073 | val_macroF1_seqavg 0.3245 | ep_time 641.0s


E8 It200 | lr 0.000045 | loss 0.1083 | 26.0s


E8 It400 | lr 0.000041 | loss 0.1029 | 51.7s


E8 It600 | lr 0.000037 | loss 0.1019 | 77.4s


E8 It800 | lr 0.000034 | loss 0.0977 | 103.2s


E8 It1000 | lr 0.000030 | loss 0.0943 | 128.8s


E8 It1200 | lr 0.000027 | loss 0.0957 | 154.4s


E8 It1400 | lr 0.000024 | loss 0.0965 | 180.1s


E8 It1600 | lr 0.000021 | loss 0.0960 | 205.8s


E8 It1800 | lr 0.000018 | loss 0.0971 | 231.5s


E8 It2000 | lr 0.000015 | loss 0.0968 | 257.2s


E8 It2200 | lr 0.000013 | loss 0.0960 | 283.0s


E8 It2400 | lr 0.000011 | loss 0.0971 | 308.7s


E8 It2600 | lr 0.000009 | loss 0.0964 | 334.5s


E8 It2800 | lr 0.000007 | loss 0.0972 | 360.2s


E8 It3000 | lr 0.000006 | loss 0.0964 | 386.0s


E8 It3200 | lr 0.000004 | loss 0.0959 | 411.6s


E8 It3400 | lr 0.000003 | loss 0.0954 | 437.3s


E8 It3600 | lr 0.000002 | loss 0.0951 | 463.0s


E8 It3800 | lr 0.000001 | loss 0.0953 | 488.7s


E8 It4000 | lr 0.000001 | loss 0.0951 | 514.3s


E8 It4200 | lr 0.000000 | loss 0.0949 | 539.9s


E8 It4400 | lr 0.000000 | loss 0.0953 | 565.6s


Epoch 8/8 | train_loss 0.0955 | val_macroF1_seqavg 0.3449 | ep_time 635.0s


Fold 1 done. Best val_macroF1_seqavg=0.3449. ckpt=artifacts/convnext_tiny_fold1.pth | total 5129.6s
Training fold 2 for 8 epochs...
Fold 2: train 172428 val 6994


  scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())




E1 It200 | lr 0.000037 | loss 1.5829 | 26.3s


E1 It400 | lr 0.000074 | loss 1.1827 | 52.4s


E1 It600 | lr 0.000112 | loss 1.0142 | 78.5s


E1 It800 | lr 0.000149 | loss 0.9252 | 104.6s


E1 It1000 | lr 0.000186 | loss 0.8737 | 130.7s


E1 It1200 | lr 0.000223 | loss 0.8352 | 156.7s


E1 It1400 | lr 0.000260 | loss 0.8140 | 182.7s


E1 It1600 | lr 0.000297 | loss 0.7989 | 208.7s


E1 It1800 | lr 0.000334 | loss 0.7900 | 234.7s


E1 It2000 | lr 0.000371 | loss 0.7854 | 260.7s


E1 It2200 | lr 0.000408 | loss 0.7757 | 286.7s


E1 It2400 | lr 0.000446 | loss 0.7569 | 312.6s


E1 It2600 | lr 0.000483 | loss 0.7419 | 338.6s


E1 It2800 | lr 0.000520 | loss 0.7288 | 364.7s


E1 It3000 | lr 0.000557 | loss 0.7203 | 390.7s


E1 It3200 | lr 0.000594 | loss 0.7155 | 416.7s


E1 It3400 | lr 0.000631 | loss 0.7139 | 442.7s


E1 It3600 | lr 0.000668 | loss 0.7135 | 468.7s


E1 It3800 | lr 0.000705 | loss 0.7143 | 494.7s


E1 It4000 | lr 0.000742 | loss 0.7151 | 520.8s


E1 It4200 | lr 0.000780 | loss 0.7164 | 546.9s


E1 It4400 | lr 0.000817 | loss 0.7084 | 573.0s


E1 It4600 | lr 0.000854 | loss 0.7012 | 599.0s


E1 It4800 | lr 0.000891 | loss 0.6948 | 625.1s


E1 It5000 | lr 0.000928 | loss 0.6895 | 651.1s


E1 It5200 | lr 0.000965 | loss 0.6853 | 677.1s


Epoch 1/8 | train_loss 0.6832 | val_macroF1_seqavg 0.4715 | ep_time 713.2s


E2 It200 | lr 0.001000 | loss 0.6560 | 26.3s


E2 It400 | lr 0.001000 | loss 0.6487 | 52.2s


E2 It600 | lr 0.000999 | loss 0.6398 | 78.1s


E2 It800 | lr 0.000999 | loss 0.6415 | 104.0s


E2 It1000 | lr 0.000998 | loss 0.6073 | 130.0s


E2 It1200 | lr 0.000998 | loss 0.5819 | 155.9s


E2 It1400 | lr 0.000997 | loss 0.5682 | 181.9s


E2 It1600 | lr 0.000996 | loss 0.5572 | 207.9s


E2 It1800 | lr 0.000994 | loss 0.5485 | 233.9s


E2 It2000 | lr 0.000993 | loss 0.5432 | 260.0s


E2 It2200 | lr 0.000992 | loss 0.5392 | 286.1s


E2 It2400 | lr 0.000990 | loss 0.5368 | 312.2s


E2 It2600 | lr 0.000988 | loss 0.5326 | 338.1s


E2 It2800 | lr 0.000986 | loss 0.5298 | 364.1s


E2 It3000 | lr 0.000984 | loss 0.5214 | 390.0s


E2 It3200 | lr 0.000982 | loss 0.5132 | 416.0s


E2 It3400 | lr 0.000980 | loss 0.5045 | 442.1s


E2 It3600 | lr 0.000978 | loss 0.4979 | 468.1s


E2 It3800 | lr 0.000975 | loss 0.4915 | 493.9s


E2 It4000 | lr 0.000973 | loss 0.4872 | 519.8s


E2 It4200 | lr 0.000970 | loss 0.4831 | 545.7s


E2 It4400 | lr 0.000967 | loss 0.4791 | 571.6s


E2 It4600 | lr 0.000964 | loss 0.4755 | 597.6s


E2 It4800 | lr 0.000961 | loss 0.4727 | 623.5s


E2 It5000 | lr 0.000957 | loss 0.4664 | 649.5s


E2 It5200 | lr 0.000954 | loss 0.4606 | 675.6s


Epoch 2/8 | train_loss 0.4555 | val_macroF1_seqavg 0.4710 | ep_time 710.9s


E3 It200 | lr 0.000947 | loss 0.3151 | 26.5s


E3 It400 | lr 0.000943 | loss 0.3033 | 52.5s


E3 It600 | lr 0.000939 | loss 0.3082 | 78.5s


E3 It800 | lr 0.000935 | loss 0.3117 | 104.5s


E3 It1000 | lr 0.000931 | loss 0.3178 | 130.5s


E3 It1200 | lr 0.000927 | loss 0.3178 | 156.5s


E3 It1400 | lr 0.000922 | loss 0.3195 | 182.4s


E3 It1600 | lr 0.000918 | loss 0.3145 | 208.3s


E3 It1800 | lr 0.000913 | loss 0.3089 | 234.2s


E3 It2000 | lr 0.000908 | loss 0.3044 | 260.2s


E3 It2200 | lr 0.000903 | loss 0.3000 | 286.2s


E3 It2400 | lr 0.000898 | loss 0.2953 | 312.1s


E3 It2600 | lr 0.000893 | loss 0.2942 | 338.1s


E3 It2800 | lr 0.000888 | loss 0.2925 | 364.2s


E3 It3000 | lr 0.000883 | loss 0.2905 | 390.2s


E3 It3200 | lr 0.000877 | loss 0.2893 | 416.4s


E3 It3400 | lr 0.000872 | loss 0.2883 | 442.5s


E3 It3600 | lr 0.000866 | loss 0.2844 | 468.5s


E3 It3800 | lr 0.000861 | loss 0.2811 | 494.5s


E3 It4000 | lr 0.000855 | loss 0.2780 | 520.5s


E3 It4200 | lr 0.000849 | loss 0.2747 | 546.5s


E3 It4400 | lr 0.000843 | loss 0.2718 | 572.3s


E3 It4600 | lr 0.000837 | loss 0.2691 | 598.2s


E3 It4800 | lr 0.000831 | loss 0.2685 | 624.1s


E3 It5000 | lr 0.000824 | loss 0.2693 | 650.1s


E3 It5200 | lr 0.000818 | loss 0.2705 | 676.2s


Epoch 3/8 | train_loss 0.2714 | val_macroF1_seqavg 0.4751 | ep_time 712.1s


E4 It200 | lr 0.000805 | loss 0.3020 | 26.4s


E4 It400 | lr 0.000799 | loss 0.3049 | 52.4s


E4 It600 | lr 0.000792 | loss 0.2992 | 78.4s


E4 It800 | lr 0.000785 | loss 0.2948 | 104.4s


E4 It1000 | lr 0.000778 | loss 0.2946 | 130.5s


E4 It1200 | lr 0.000771 | loss 0.2928 | 156.6s


E4 It1400 | lr 0.000764 | loss 0.2880 | 182.6s


E4 It1600 | lr 0.000757 | loss 0.2806 | 208.7s


E4 It1800 | lr 0.000750 | loss 0.2738 | 234.9s


E4 It2000 | lr 0.000743 | loss 0.2679 | 261.0s


E4 It2200 | lr 0.000735 | loss 0.2675 | 287.0s


E4 It2400 | lr 0.000728 | loss 0.2668 | 312.9s


E4 It2600 | lr 0.000720 | loss 0.2666 | 338.8s


E4 It2800 | lr 0.000713 | loss 0.2664 | 364.8s


E4 It3000 | lr 0.000705 | loss 0.2672 | 390.8s


E4 It3200 | lr 0.000698 | loss 0.2664 | 416.8s


E4 It3400 | lr 0.000690 | loss 0.2667 | 442.8s


E4 It3600 | lr 0.000682 | loss 0.2661 | 468.9s


E4 It3800 | lr 0.000675 | loss 0.2644 | 494.9s


E4 It4000 | lr 0.000667 | loss 0.2635 | 520.9s


E4 It4200 | lr 0.000659 | loss 0.2607 | 546.9s


E4 It4400 | lr 0.000651 | loss 0.2577 | 573.0s


E4 It4600 | lr 0.000643 | loss 0.2551 | 599.0s


E4 It4800 | lr 0.000635 | loss 0.2525 | 625.2s


E4 It5000 | lr 0.000627 | loss 0.2498 | 651.3s


E4 It5200 | lr 0.000619 | loss 0.2472 | 677.3s


Epoch 4/8 | train_loss 0.2447 | val_macroF1_seqavg 0.4878 | ep_time 713.2s


E5 It200 | lr 0.000603 | loss 0.1735 | 26.4s


E5 It400 | lr 0.000595 | loss 0.1793 | 52.4s


E5 It600 | lr 0.000587 | loss 0.1770 | 78.3s


E5 It800 | lr 0.000579 | loss 0.1714 | 104.3s


E5 It1000 | lr 0.000570 | loss 0.1728 | 130.2s


E5 It1200 | lr 0.000562 | loss 0.1740 | 156.1s


E5 It1400 | lr 0.000554 | loss 0.1736 | 182.0s


E5 It1600 | lr 0.000546 | loss 0.1725 | 207.8s


E5 It1800 | lr 0.000537 | loss 0.1733 | 233.7s


E5 It2000 | lr 0.000529 | loss 0.1727 | 259.5s


E5 It2200 | lr 0.000521 | loss 0.1722 | 285.5s


E5 It2400 | lr 0.000512 | loss 0.1747 | 311.4s


E5 It2600 | lr 0.000504 | loss 0.1772 | 337.3s


E5 It2800 | lr 0.000496 | loss 0.1801 | 363.2s


E5 It3000 | lr 0.000487 | loss 0.1820 | 389.2s


E5 It3200 | lr 0.000479 | loss 0.1826 | 415.1s


E5 It3400 | lr 0.000471 | loss 0.1829 | 441.0s


E5 It3600 | lr 0.000462 | loss 0.1836 | 466.8s


E5 It3800 | lr 0.000454 | loss 0.1841 | 492.7s


E5 It4000 | lr 0.000446 | loss 0.1843 | 518.7s


E5 It4200 | lr 0.000437 | loss 0.1837 | 544.6s


E5 It4400 | lr 0.000429 | loss 0.1829 | 570.5s


E5 It4600 | lr 0.000421 | loss 0.1813 | 596.5s


E5 It4800 | lr 0.000413 | loss 0.1801 | 622.3s


E5 It5000 | lr 0.000405 | loss 0.1784 | 648.2s


E5 It5200 | lr 0.000396 | loss 0.1773 | 674.1s


Epoch 5/8 | train_loss 0.1760 | val_macroF1_seqavg 0.4947 | ep_time 709.3s


E6 It200 | lr 0.000381 | loss 0.1429 | 26.4s


E6 It400 | lr 0.000373 | loss 0.1386 | 52.3s


E6 It600 | lr 0.000365 | loss 0.1347 | 78.3s


E6 It800 | lr 0.000357 | loss 0.1332 | 104.3s


E6 It1000 | lr 0.000349 | loss 0.1341 | 130.3s


E6 It1200 | lr 0.000341 | loss 0.1322 | 156.3s


E6 It1400 | lr 0.000333 | loss 0.1325 | 182.3s


E6 It1600 | lr 0.000325 | loss 0.1334 | 208.3s


E6 It1800 | lr 0.000317 | loss 0.1321 | 234.1s


E6 It2000 | lr 0.000309 | loss 0.1321 | 260.0s


E6 It2200 | lr 0.000302 | loss 0.1319 | 285.8s


E6 It2400 | lr 0.000294 | loss 0.1310 | 311.7s


E6 It2600 | lr 0.000287 | loss 0.1315 | 337.5s


E6 It2800 | lr 0.000279 | loss 0.1315 | 363.4s


E6 It3000 | lr 0.000272 | loss 0.1323 | 389.2s


E6 It3200 | lr 0.000264 | loss 0.1319 | 415.1s


E6 It3400 | lr 0.000257 | loss 0.1327 | 441.0s


E6 It3600 | lr 0.000250 | loss 0.1335 | 466.9s


E6 It3800 | lr 0.000243 | loss 0.1340 | 492.9s


E6 It4000 | lr 0.000235 | loss 0.1336 | 518.8s


E6 It4200 | lr 0.000228 | loss 0.1334 | 544.8s


E6 It4400 | lr 0.000221 | loss 0.1328 | 570.8s


E6 It4600 | lr 0.000215 | loss 0.1322 | 596.7s


E6 It4800 | lr 0.000208 | loss 0.1320 | 622.6s


E6 It5000 | lr 0.000201 | loss 0.1318 | 648.5s


E6 It5200 | lr 0.000194 | loss 0.1318 | 674.5s


Epoch 6/8 | train_loss 0.1311 | val_macroF1_seqavg 0.4970 | ep_time 710.2s


E7 It200 | lr 0.000182 | loss 0.1275 | 26.2s


E7 It400 | lr 0.000175 | loss 0.1374 | 52.0s


E7 It600 | lr 0.000169 | loss 0.1350 | 77.7s


E7 It800 | lr 0.000163 | loss 0.1355 | 103.5s


E7 It1000 | lr 0.000157 | loss 0.1374 | 129.2s


E7 It1200 | lr 0.000151 | loss 0.1349 | 155.0s


E7 It1400 | lr 0.000145 | loss 0.1330 | 180.7s


E7 It1600 | lr 0.000139 | loss 0.1300 | 206.5s


E7 It1800 | lr 0.000133 | loss 0.1277 | 232.4s


E7 It2000 | lr 0.000128 | loss 0.1268 | 258.2s


E7 It2200 | lr 0.000122 | loss 0.1251 | 284.1s


E7 It2400 | lr 0.000117 | loss 0.1227 | 309.9s


E7 It2600 | lr 0.000112 | loss 0.1211 | 335.8s


E7 It2800 | lr 0.000106 | loss 0.1186 | 361.5s


E7 It3000 | lr 0.000101 | loss 0.1168 | 387.3s


E7 It3200 | lr 0.000096 | loss 0.1164 | 413.0s


E7 It3400 | lr 0.000091 | loss 0.1150 | 438.6s


E7 It3600 | lr 0.000087 | loss 0.1135 | 464.3s


E7 It3800 | lr 0.000082 | loss 0.1125 | 490.0s


E7 It4000 | lr 0.000078 | loss 0.1114 | 515.7s


E7 It4200 | lr 0.000073 | loss 0.1105 | 541.5s


E7 It4400 | lr 0.000069 | loss 0.1098 | 567.2s


E7 It4600 | lr 0.000065 | loss 0.1093 | 592.9s


E7 It4800 | lr 0.000061 | loss 0.1091 | 618.6s


E7 It5000 | lr 0.000057 | loss 0.1080 | 644.3s


E7 It5200 | lr 0.000053 | loss 0.1078 | 670.0s


Epoch 7/8 | train_loss 0.1076 | val_macroF1_seqavg 0.5142 | ep_time 705.5s


E8 It200 | lr 0.000046 | loss 0.1077 | 26.1s


E8 It400 | lr 0.000043 | loss 0.0926 | 51.9s


E8 It600 | lr 0.000039 | loss 0.0957 | 77.7s


E8 It800 | lr 0.000036 | loss 0.0932 | 103.4s


E8 It1000 | lr 0.000033 | loss 0.0957 | 129.1s


E8 It1200 | lr 0.000030 | loss 0.0981 | 154.7s


E8 It1400 | lr 0.000027 | loss 0.0950 | 180.3s


E8 It1600 | lr 0.000025 | loss 0.0941 | 205.9s


E8 It1800 | lr 0.000022 | loss 0.0953 | 231.5s


E8 It2000 | lr 0.000020 | loss 0.0938 | 257.2s


E8 It2200 | lr 0.000018 | loss 0.0940 | 282.9s


E8 It2400 | lr 0.000015 | loss 0.0940 | 308.6s


E8 It2600 | lr 0.000013 | loss 0.0940 | 334.3s


E8 It2800 | lr 0.000012 | loss 0.0941 | 360.0s


E8 It3000 | lr 0.000010 | loss 0.0941 | 385.8s


E8 It3200 | lr 0.000008 | loss 0.0949 | 411.5s


E8 It3400 | lr 0.000007 | loss 0.0952 | 437.1s


E8 It3600 | lr 0.000006 | loss 0.0953 | 462.8s


E8 It3800 | lr 0.000004 | loss 0.0940 | 488.6s


E8 It4000 | lr 0.000003 | loss 0.0937 | 514.4s


E8 It4200 | lr 0.000002 | loss 0.0935 | 540.1s


E8 It4400 | lr 0.000002 | loss 0.0931 | 565.7s


E8 It4600 | lr 0.000001 | loss 0.0929 | 591.4s


E8 It4800 | lr 0.000001 | loss 0.0932 | 617.1s


E8 It5000 | lr 0.000000 | loss 0.0933 | 642.8s


E8 It5200 | lr 0.000000 | loss 0.0933 | 668.5s


Epoch 8/8 | train_loss 0.0929 | val_macroF1_seqavg 0.5316 | ep_time 704.1s


Fold 2 done. Best val_macroF1_seqavg=0.5316. ckpt=artifacts/convnext_tiny_fold2.pth | total 5682.8s
Ensembling checkpoints: ['artifacts/convnext_tiny_fold0.pth', 'artifacts/convnext_tiny_fold1.pth', 'artifacts/convnext_tiny_fold2.pth']
Infer with artifacts/convnext_tiny_fold0.pth


  ckpt = torch.load(ckpt_path, map_location=device)


Infer with artifacts/convnext_tiny_fold1.pth


Infer with artifacts/convnext_tiny_fold2.pth


Wrote blended submission.csv (16877, 2) using 3 checkpoints


In [15]:
# Fix submission.csv to match required headers: Id, Category
import pandas as pd
sub = pd.read_csv('submission.csv')
rename_map = {}
if 'id' in sub.columns: rename_map['id'] = 'Id'
if 'category_id' in sub.columns: rename_map['category_id'] = 'Category'
if rename_map:
    sub = sub.rename(columns=rename_map)
# Ensure correct column order and dtypes
expected_cols = ['Id', 'Category']
missing = [c for c in expected_cols if c not in sub.columns]
assert not missing, f'Missing required columns: {missing}'
sub = sub[expected_cols]
sub['Category'] = sub['Category'].astype(int)
sub.to_csv('submission.csv', index=False)
print('Fixed submission.csv. Columns:', sub.columns.tolist(), 'Shape:', sub.shape)
print(sub.head())

Fixed submission.csv. Columns: ['Id', 'Category'] Shape: (16877, 2)
                                     Id  Category
0  5998cfa4-23d2-11e8-a6a3-ec086b02610b        19
1  599fbd89-23d2-11e8-a6a3-ec086b02610b         0
2  59fae563-23d2-11e8-a6a3-ec086b02610b         4
3  5a24a741-23d2-11e8-a6a3-ec086b02610b         0
4  59eab924-23d2-11e8-a6a3-ec086b02610b         0
