# Cassava Leaf Disease Classification – Plan

Goal: Win a medal (>= 0.8978 accuracy).

Data available:
- train_images/ (~18.7k imgs in this copy), train.csv with labels 0..4
- test_images/ (~2.7k imgs) to predict
- label map json
- TFRecords exist but we can start from raw JPEGs for simplicity; may switch if throughput needed

Metric: Accuracy (LB).

High-level approach:
1) Establish strong, fast baseline using pretrained ImageNet models from timm on GPU with mixed precision.
   - Start with convnext_base or efficientnet_b3/b4, image_size=512 (or 380/448 first for speed).
   - Augmentations: flips, light affine, color jitter, CutMix/MixUp (prob ~0.2), RandomResizedCrop, Normalize.
   - Loss: cross-entropy with label smoothing (0.05) or focal for class imbalance; track which wins on CV.
   - Optimizer: AdamW, lr ~2e-4, cosine schedule, warmup.
   - Training epochs: short 5-8 epochs for baseline with early stopping (patience 2).
   - StratifiedKFold (5 folds), deterministic seed; AMP + gradient accumulation if needed.
   - Balanced sampler or class weights for imbalance.
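
A minimal sketch of the class-weight idea, assuming plain inverse-frequency weighting. The counts are the `label value_counts` printed by the setup cell; `labels` is a small illustrative array, not real data.

```python
import numpy as np

# Label counts as printed by the setup cell (classes 0..4).
counts = np.array([939, 1901, 2091, 11523, 2267], dtype=np.float64)

# Inverse-frequency weights, scaled so a perfectly balanced dataset
# would give every class a weight of 1.0.
class_weights = counts.sum() / (len(counts) * counts)

# Per-sample weights for a weighted sampler: each sample inherits the
# weight of its class. `labels` is a toy example.
labels = np.array([3, 3, 0, 1, 3, 4, 2])
sample_weights = class_weights[labels]

print(np.round(class_weights, 3))
```

The class weights could be passed as `nn.CrossEntropyLoss(weight=...)`, and the per-sample weights to `torch.utils.data.WeightedRandomSampler`. Neither combines directly with the soft targets produced by MixUp/CutMix, so it is probably best to test them with mixing disabled first.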

2) Validation rigor:
   - Single fixed StratifiedKFold(5, shuffle=True, random_state=42).
   - Use same preproc inside each fold; no leakage.
   - Save OOF predictions and per-fold metrics; analyze confusion matrix and per-class recall.
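
A small sketch of the OOF analysis, assuming the OOF predictions are stored as an `(n_samples, 5)` array of logits aligned with the label column. `oof_report` and the toy arrays are illustrative names, not part of the notebook.

```python
import numpy as np

def oof_report(logits: np.ndarray, labels: np.ndarray, num_classes: int = 5):
    """Return accuracy, confusion matrix (rows = true, cols = predicted), per-class recall."""
    preds = logits.argmax(axis=1)
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (labels, preds), 1)  # accumulate one count per (true, pred) pair
    acc = np.trace(cm) / max(cm.sum(), 1)
    support = cm.sum(axis=1)
    # Classes with no samples get recall 0 instead of a division-by-zero warning.
    recall = np.divide(np.diag(cm), support, out=np.zeros(num_classes), where=support > 0)
    return acc, cm, recall

# Toy example: four samples, one class-4 sample misclassified as class 3.
toy_logits = np.array([
    [2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.5],
    [0.0, 1.0, 0.0, 0.0, 0.0],
])
toy_labels = np.array([0, 3, 4, 1])
acc, cm, recall = oof_report(toy_logits, toy_labels)
print(acc, recall)
```

With the heavy skew toward class 3, per-class recall is likely more informative than overall accuracy when comparing losses or samplers.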

3) Inference:
   - EMA model or best ckpt per fold.
   - TTA (horizontal flip and 3-scale crop or resize): start with 2-4 TTAs; ensure speed.
   - Blend folds by averaging logits (or softmax probabilities) across folds.
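
The two blending options can disagree, so here is a hedged sketch of both. `blend_folds` and the toy logits are illustrative and not taken from the notebook.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def blend_folds(fold_logits, mode: str = "prob") -> np.ndarray:
    """Average per-fold predictions, either as probabilities or as raw logits."""
    stacked = np.stack(fold_logits, axis=0)  # (n_folds, n_samples, n_classes)
    if mode == "prob":
        return softmax(stacked, axis=-1).mean(axis=0)
    return stacked.mean(axis=0)

# One very confident fold versus two moderately confident folds.
fold_a = np.array([[10.0, 0.0, 0.0]])
fold_b = np.array([[0.0, 2.0, 0.0]])
fold_c = np.array([[0.0, 2.0, 0.0]])
folds = [fold_a, fold_b, fold_c]

pred_from_logits = blend_folds(folds, mode="logit").argmax(axis=1)
pred_from_probs = blend_folds(folds, mode="prob").argmax(axis=1)
print(pred_from_logits, pred_from_probs)
```

Logit averaging lets a single overconfident fold dominate, while probability averaging behaves more like a soft vote. Which one scores better here is an empirical question to settle on OOF.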

4) Iteration path:
   - Baseline A: effnet_b3 380px, 5-fold, 5 epochs -> sanity OOF and first submission.
   - Baseline B: convnext_base 448/512px, 5-8 epochs -> should reach medal zone.
   - Tune aug and loss; test CutMix/MixUp vs none.
   - Train second diverse model (e.g., tf_efficientnetv2_s or tf_efficientnet_b4) and blend.
   - If time: add SWA or EMA, calibrate thresholds (not needed for accuracy).
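
A quick sketch of how slowly an EMA moves, assuming the standard update rule `ema = decay * ema + (1 - decay) * new`. The step count of 468 matches the per-epoch iteration count in the training log; the scalar "weights" are purely illustrative.

```python
def ema_update(ema_value: float, new_value: float, decay: float) -> float:
    """One EMA step: keep `decay` of the old average and blend in the rest."""
    return decay * ema_value + (1.0 - decay) * new_value

def initial_weight_fraction(decay: float, num_updates: int) -> float:
    """Share of the EMA still owed to its starting value after num_updates steps."""
    return decay ** num_updates

decay, steps = 0.999, 468
ema = 0.0  # stand-in for the pretrained starting weights
for _ in range(steps):
    ema = ema_update(ema, 1.0, decay)  # the "trained" weights are held at 1.0

print(f"EMA after {steps} updates: {ema:.3f}")
print(f"Initial-weight share: {initial_weight_fraction(decay, steps):.3f}")
```

With decay 0.999 the EMA still carries roughly 63% of its starting value after one epoch, which is one plausible reason EMA validation accuracy lags in the first epochs. A lower decay or a decay warmup may help for short runs.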

5) Efficiency:
   - Verify GPU usable; use num_workers and prefetch; cache resized images if helpful.
   - Log epoch times, fold index, memory usage.

6) Risk checks:
   - Ensure label mapping consistent, submission format matches sample.
   - Handle EXIF orientation.
   - Deterministic seeds; track experiments.

Questions for experts (next step):
- Which backbones and input sizes typically medal on Cassava? convnext_base 512 vs. effnet_b4 512 vs. nfnet_l0 448?
- Best augmentation recipes proven on this dataset? (color jitter strength, CutMix/MixUp probs, RandAugment?)
- Preferred TTA set and count for optimal LB vs. speed?
- Any pitfalls with this dataset (duplicates, leakage, domain shifts) and best CV protocol?
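
On the duplicates question, a hedged sketch of grouping near-duplicate images by Hamming distance between perceptual-hash hex strings, using union-find. The `max_dist=4` threshold and the toy hashes are assumptions for illustration. The pairwise loop is O(n^2), so it would need bucketing or an index to run on the full training set.

```python
from itertools import combinations

def hamming_hex(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hash strings."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")

def group_by_hamming(hashes, max_dist: int = 4):
    """Assign a group id to each hash; hashes within max_dist bits share a group."""
    parent = list(range(len(hashes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(hashes)), 2):
        if hamming_hex(hashes[i], hashes[j]) <= max_dist:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)  # smallest index becomes the group id
    return [find(i) for i in range(len(hashes))]

# Toy hashes: items 0, 1 and 3 differ by one or two bits; item 2 is unrelated.
toy_hashes = ["ffff0000", "fffe0000", "0000ffff", "7fff0000"]
groups = group_by_hamming(toy_hashes)
print(groups)
```

Note that `7fff0000` differs from `ffff0000` by a single bit but has a different leading hex character. Grouping by hash prefix would therefore separate the two, whereas a distance threshold keeps them together.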

Next actions:
- Verify GPU, packages (timm, albumentations, torch).
- Create training notebook: dataset, transforms, model, 5-fold loop, OOF, and inference + submission.
- Run quick smoke (1 epoch, 1 fold) to validate pipeline, then scale to 5 folds.

In [1]:
# Setup: install deps, verify GPU, quick data peek
import sys, subprocess, json, os, time, glob, platform
from pathlib import Path

def pip_install(packages):
    cmd = [sys.executable, '-m', 'pip', 'install', '--upgrade'] + packages
    print('Installing:', ' '.join(packages)); sys.stdout.flush()
    return subprocess.run(cmd, check=True)

def pip_uninstall(packages):
    cmd = [sys.executable, '-m', 'pip', 'uninstall', '-y'] + packages
    print('Uninstalling:', ' '.join(packages)); sys.stdout.flush()
    return subprocess.run(cmd, check=False)

# Install PyTorch (CUDA 12.1) + libs if missing
try:
    import torch
    import torchvision
except Exception as e:
    print('Installing torch/torchvision for CUDA 12.1...')
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--index-url', 'https://download.pytorch.org/whl/cu121', 'torch', 'torchvision'], check=True)
    import torch, torchvision

try:
    import timm
except ImportError:
    pip_install(['timm'])
    import timm

# Prefer headless OpenCV to avoid libGL issues
cv2_ready = False
try:
    import albumentations as A
    import cv2
    cv2_ready = True
except Exception as e:
    print('Albumentations/cv2 import failed, switching to headless OpenCV. Error:', e)
    pip_uninstall(['opencv-python'])
    pip_install(['albumentations', 'opencv-python-headless'])
    import albumentations as A
    import cv2
    cv2_ready = True

import pandas as pd, numpy as np

print('Python:', platform.python_version())
print('Torch:', torch.__version__)
print('Torchvision:', torchvision.__version__)
print('timm:', timm.__version__)
print('Albumentations:', A.__version__)
print('cv2 headless OK:', cv2_ready)

# GPU check
print('GPU Available:', torch.cuda.is_available())
print('GPU Count:', torch.cuda.device_count())
if torch.cuda.is_available():
    print('GPU Name:', torch.cuda.get_device_name(0))
    props = torch.cuda.get_device_properties(0)
    print(f'GPU Memory: {props.total_memory / 1024**3:.1f} GB')

BASE = Path('.')
data_dir = BASE
train_csv = data_dir/'train.csv'
test_dir = data_dir/'test_images'
train_dir = data_dir/'train_images'
label_map_path = data_dir/'label_num_to_disease_map.json'

print('Files present:', os.listdir(data_dir))
df = pd.read_csv(train_csv)
print('train.csv shape:', df.shape)
print(df.head())
print('label value_counts:\n', df['label'].value_counts().sort_index())

with open(label_map_path) as f:
    label_map = json.load(f)
print('Label map keys:', list(label_map.keys()))

test_images = sorted([p.name for p in Path(test_dir).glob('*.jpg')])
train_images = sorted([p.name for p in Path(train_dir).glob('*.jpg')])
print('Train images:', len(train_images), 'Test images:', len(test_images))
print('Sample train images:', train_images[:5])
print('Sample test images:', test_images[:5])

# Sanity: sample_submission format
ss = pd.read_csv(data_dir/'sample_submission.csv')
print('sample_submission columns:', ss.columns.tolist(), 'shape:', ss.shape)

print('Setup complete.')

Python: 3.11.0rc1
Torch: 2.5.1+cu121
Torchvision: 0.20.1+cu121
timm: 1.0.19
Albumentations: 2.0.8
cv2 headless OK: True
GPU Available: True
GPU Count: 1
GPU Name: NVIDIA A10-24Q
GPU Memory: 23.7 GB
Files present: ['sample_submission.csv', 'test_tfrecords', 'docker_run.log', 'requirements.txt', 'test_images', 'train_with_groups.csv', 'submission.csv', 'train_images', 'train.csv', 'agent_metadata', 'task.txt', '00_eda_and_planning.ipynb', 'label_num_to_disease_map.json', 'train_tfrecords', 'description.md']
train.csv shape: (18721, 2)
         image_id  label
0  1000015157.jpg      0
1  1000201771.jpg      3
2   100042118.jpg      1
3  1000723321.jpg      1
4  1000812911.jpg      3
label value_counts:
 label
0      939
1     1901
2     2091
3    11523
4     2267
Name: count, dtype: int64
Label map keys: ['0', '1', '2', '3', '4']
Train images: 18721 Test images: 2676
Sample train images: ['1000015157.jpg', '1000201771.jpg', '100042118.jpg', '1000723321.jpg', '1000812911.jpg']
Sample test 

In [2]:
# Dataset, transforms, and utilities
import math, random
from PIL import Image, ImageOps
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
import albumentations as A
from albumentations.pytorch import ToTensorV2
import cv2

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD  = (0.229, 0.224, 0.225)

def seed_everything(seed: int = 42):
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def get_train_transforms(size=512):
    return A.Compose([
        A.RandomResizedCrop(size=(size, size), scale=(0.7, 1.0), ratio=(0.9, 1.1), p=1.0),
        A.HorizontalFlip(p=0.5),
        # Albumentations 2.x takes `fill` instead of `value`/`fill_value`; a reflect border needs no fill
        A.Affine(
            scale=(0.9, 1.1), translate_percent=(-0.1, 0.1), rotate=(-15, 15), shear=(-5, 5),
            border_mode=cv2.BORDER_REFLECT_101, p=0.7
        ),
        A.HueSaturationValue(10, 15, 10, p=0.5),
        A.RandomBrightnessContrast(0.2, 0.2, p=0.5),
        A.GaussianBlur(blur_limit=(3, 5), p=0.1),
        # Albumentations 2.x: hole count and size are given as ranges (floats = fraction of image size)
        A.CoarseDropout(
            num_holes_range=(1, 1),
            hole_height_range=(0.05, 0.2), hole_width_range=(0.05, 0.2),
            fill=0, p=0.15
        ),
        A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
        ToTensorV2(),
    ])

def get_valid_transforms(size=512):
    return A.Compose([
        A.LongestMaxSize(max_size=size),
        A.PadIfNeeded(min_height=size, min_width=size, border_mode=cv2.BORDER_CONSTANT, fill=0),  # `fill` replaces `value` in Albumentations 2.x
        A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
        ToTensorV2(),
    ])

def load_image_rgb(path: str) -> Image.Image:
    # Apply the EXIF orientation first, then convert; the context manager closes the file handle
    with Image.open(path) as img:
        return ImageOps.exif_transpose(img).convert('RGB')

class CassavaDataset(Dataset):
    def __init__(self, df, img_dir, transforms=None):
        self.df = df.reset_index(drop=True)
        self.img_dir = Path(img_dir)
        self.transforms = transforms
        self.has_label = 'label' in self.df.columns
    def __len__(self):
        return len(self.df)
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img_path = self.img_dir / row['image_id']
        img = load_image_rgb(str(img_path))
        img_np = np.array(img)
        if self.transforms is not None:
            img_np = self.transforms(image=img_np)['image']
        if self.has_label:
            label = int(row['label'])
            return img_np, label
        else:
            return img_np, row['image_id']

def make_loader(df, img_dir, transforms, batch_size=32, shuffle=False, num_workers=4):
    ds = CassavaDataset(df, img_dir, transforms)
    return DataLoader(ds, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers, pin_memory=True, drop_last=shuffle, persistent_workers=num_workers>0)

def check_submission_format(sub_path='submission.csv', required_cols=('image_id','label')):
    if not os.path.exists(sub_path):
        print('submission.csv not found')
        return False
    sub = pd.read_csv(sub_path)
    ok = list(sub.columns)==list(required_cols)
    labs_ok = sub['label'].dtype.kind in 'iu' and sub['label'].between(0,4).all()
    print('Submission cols OK:', ok, 'Labels int[0..4]:', labs_ok, 'Shape:', sub.shape)
    print('Label value_counts:', sub['label'].value_counts().to_dict())
    return ok and labs_ok

seed_everything(42)
print('Utils ready.')

Utils ready.


In [3]:
# Training scaffold: model, loop, CV, and inference helpers (not executed yet)
import time
from sklearn.model_selection import StratifiedKFold
import timm
from timm.data import Mixup
from timm.utils import ModelEmaV2
from timm.loss import SoftTargetCrossEntropy

class CFG:
    seed = 42
    model_name = 'convnext_base'
    img_size = 448
    epochs = 8
    batch_size = 40
    lr = 2e-4
    weight_decay = 1e-4
    num_workers = 8
    n_splits = 5
    ls = 0.1
    mixup_alpha = 1.0
    cutmix_alpha = 1.0
    mixup_prob = 0.3
    mixup_switch_prob = 0.5
    use_mixup = True
    use_ema = True
    ema_decay = 0.999
    tta_hflip = True
    tta_scales = []  # e.g., [0.95, 1.05] later
    smoke = False  # set True for quick debug

def build_model(num_classes=5):
    model = timm.create_model(CFG.model_name, pretrained=True, num_classes=num_classes)
    model = model.to('cuda' if torch.cuda.is_available() else 'cpu')
    if torch.cuda.is_available():
        model = model.to(memory_format=torch.channels_last)
    return model

def get_mixup_fn():
    if not CFG.use_mixup:
        return None
    return Mixup(mixup_alpha=CFG.mixup_alpha, cutmix_alpha=CFG.cutmix_alpha, prob=CFG.mixup_prob, switch_prob=CFG.mixup_switch_prob, label_smoothing=CFG.ls, num_classes=5)

def train_one_epoch(model, loader, optimizer, scaler, mixup_fn=None, ema=None, scheduler=None):
    model.train()
    device = next(model.parameters()).device
    total_loss, total_cnt = 0.0, 0
    if mixup_fn is not None:
        criterion = SoftTargetCrossEntropy().to(device)
    else:
        criterion = nn.CrossEntropyLoss(label_smoothing=CFG.ls).to(device)
    start = time.time()
    for it, (x, y) in enumerate(loader):
        x = x.to(device, non_blocking=True)
        if torch.cuda.is_available():
            x = x.to(memory_format=torch.channels_last)
        y = y.to(device, non_blocking=True)
        if mixup_fn is not None:
            x, y = mixup_fn(x, y)
        optimizer.zero_grad(set_to_none=True)
        with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
            logits = model(x)
            loss = criterion(logits, y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        if ema is not None:
            ema.update(model)
        if scheduler is not None:
            scheduler.step()
        total_loss += loss.item() * x.size(0)
        total_cnt += x.size(0)
        if it % 50 == 0:
            elapsed = time.time() - start
            print(f'  iter {it}/{len(loader)} loss {loss.item():.4f} elapsed {elapsed:.1f}s')
            start = time.time()
    return total_loss / max(total_cnt,1)

def validate(model, loader):
    model.eval()
    device = next(model.parameters()).device
    total = 0
    correct = 0
    criterion = nn.CrossEntropyLoss().to(device)
    loss_sum = 0.0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device, non_blocking=True)
            if torch.cuda.is_available():
                x = x.to(memory_format=torch.channels_last)
            y = y.to(device, non_blocking=True)
            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                logits = model(x)
                loss = criterion(logits, y)
            loss_sum += loss.item() * x.size(0)
            prob = logits.softmax(dim=1)
            pred = prob.argmax(dim=1)
            correct += (pred == y).sum().item()
            total += x.size(0)
    acc = correct / max(total,1)
    return loss_sum / max(total,1), acc

def get_scheduler(optimizer, steps_per_epoch):
    # Cosine schedule with warmup of 1 epoch
    warmup_steps = steps_per_epoch * 1
    total_steps = steps_per_epoch * CFG.epochs
    def lr_lambda(step):
        if step < warmup_steps:
            return max(1e-8, step / max(1, warmup_steps))
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1 + math.cos(math.pi * progress))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

def run_cv_and_train(df, train_dir):
    seed_everything(CFG.seed)
    skf = StratifiedKFold(n_splits=CFG.n_splits, shuffle=True, random_state=CFG.seed)
    oof = np.zeros((len(df), 5), dtype=np.float32)
    fold_indices = list(skf.split(df['image_id'], df['label']))
    if CFG.smoke:
        fold_indices = fold_indices[:1]
        print('SMOKE RUN: 1 fold only, 1 epoch');
        orig_epochs = CFG.epochs; CFG.epochs = 1
    for fold, (tr_idx, va_idx) in enumerate(fold_indices):
        print(f'Fold {fold} train {len(tr_idx)} valid {len(va_idx)}')
        df_tr = df.iloc[tr_idx].reset_index(drop=True)
        df_va = df.iloc[va_idx].reset_index(drop=True)
        train_tfms = get_train_transforms(CFG.img_size)
        valid_tfms = get_valid_transforms(CFG.img_size)
        train_loader = make_loader(df_tr, train_dir, train_tfms, batch_size=CFG.batch_size, shuffle=True, num_workers=CFG.num_workers)
        valid_loader = make_loader(df_va, train_dir, valid_tfms, batch_size=CFG.batch_size, shuffle=False, num_workers=CFG.num_workers)
        model = build_model(num_classes=5)
        optimizer = torch.optim.AdamW(model.parameters(), lr=CFG.lr, weight_decay=CFG.weight_decay)
        steps_per_epoch = max(1, len(train_loader))
        scheduler = get_scheduler(optimizer, steps_per_epoch)
        scaler = torch.amp.GradScaler('cuda', enabled=torch.cuda.is_available())
        mixup_fn = get_mixup_fn()
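        # Keep the EMA copy on the model's device: with device='cpu', validation of ema.module also runs on the CPU
        ema = ModelEmaV2(model, decay=CFG.ema_decay) if CFG.use_ema else None
        best_acc, best_state = -1.0, None
        for epoch in range(CFG.epochs):
            print(f'Epoch {epoch+1}/{CFG.epochs}')
            t0 = time.time()
            train_loss = train_one_epoch(model, train_loader, optimizer, scaler, mixup_fn, ema=ema, scheduler=scheduler)
            val_loss, val_acc = validate(ema.module if ema is not None else model, valid_loader)
            print(f'  train_loss {train_loss:.4f} val_loss {val_loss:.4f} val_acc {val_acc:.4f} epoch_time {time.time()-t0:.1f}s')
            if val_acc > best_acc:
                best_acc = val_acc
                # state_dict() returns live tensor references; snapshot to CPU so later epochs cannot overwrite it
                src = ema.module if ema is not None else model
                best_state = {k: v.detach().cpu().clone() for k, v in src.state_dict().items()}
        # NOTE: `oof` is never filled in this scaffold; the full-training cell computes OOF logits itself
        if CFG.smoke:
            CFG.epochs = orig_epochs
    return oof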

def infer_test(model, df_test, test_dir, size=None, tta_hflip=True, tta_scales=None, batch_size=32):
    size = size or CFG.img_size
    dev = next(model.parameters()).device
    model.eval()
    logits_sum = []
    # Base transform
    def make_tfms(sz):
        return A.Compose([
            A.LongestMaxSize(max_size=sz),
            A.PadIfNeeded(min_height=sz, min_width=sz, border_mode=cv2.BORDER_CONSTANT, fill=0),  # `fill` replaces `value` in Albumentations 2.x
            A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
            ToTensorV2()
        ])
    tfms_list = [(make_tfms(size), False)]
    if tta_hflip:
        tfms_list.append((make_tfms(size), True))
    if tta_scales:
        for s in tta_scales:
            sz = int(round(size * s))
            tfms_list.append((make_tfms(sz), False))
    for (tfms, do_flip) in tfms_list:
        ds = CassavaDataset(df_test[['image_id']].copy(), test_dir, transforms=tfms)
        dl = DataLoader(ds, batch_size=batch_size, shuffle=False, num_workers=CFG.num_workers, pin_memory=True)
        part_logits = []
        with torch.no_grad():
            for x, ids in dl:
                if do_flip:
                    x = torch.flip(x, dims=[-1])
                x = x.to(dev, non_blocking=True)
                if torch.cuda.is_available():
                    x = x.to(memory_format=torch.channels_last)
                with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                    logits = model(x)
                part_logits.append(logits.float().cpu().numpy())
        logits_sum.append(np.concatenate(part_logits, axis=0))
    logits_mean = np.mean(logits_sum, axis=0)
    return logits_mean

print('Training scaffold ready. Configure CFG and call run_cv_and_train(df, train_dir) when ready.')

Training scaffold ready. Configure CFG and call run_cv_and_train(df, train_dir) when ready.


In [None]:
# Smoke run: 1 fold x 1 epoch, then inference to submission.csv
from sklearn.model_selection import StratifiedKFold
import torch
import pandas as pd
import numpy as np

seed_everything(42)
CFG.model_name = 'convnext_tiny'
CFG.img_size = 384
CFG.batch_size = 32
CFG.epochs = 1
CFG.num_workers = 6
CFG.use_ema = True
CFG.use_mixup = True

print('Starting SMOKE training (1 fold, 1 epoch) ...')
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=CFG.seed)
tr_idx, va_idx = list(skf.split(df['image_id'], df['label']))[0]
df_tr = df.iloc[tr_idx].reset_index(drop=True)
df_va = df.iloc[va_idx].reset_index(drop=True)
train_loader = make_loader(df_tr, train_dir, get_train_transforms(CFG.img_size), batch_size=CFG.batch_size, shuffle=True, num_workers=CFG.num_workers)
valid_loader = make_loader(df_va, train_dir, get_valid_transforms(CFG.img_size), batch_size=CFG.batch_size, shuffle=False, num_workers=CFG.num_workers)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = build_model(num_classes=5)
optimizer = torch.optim.AdamW(model.parameters(), lr=CFG.lr, weight_decay=CFG.weight_decay)
scaler = torch.amp.GradScaler('cuda', enabled=torch.cuda.is_available())
mixup_fn = get_mixup_fn()
ema = ModelEmaV2(model, decay=CFG.ema_decay) if CFG.use_ema else None

best_acc, best_state = -1.0, None
print('Epoch 1/1')
# Pass ema so it is updated after every optimizer step; a single update at the end of the
# epoch would leave the EMA weights almost identical to the pretrained initialization.
train_loss = train_one_epoch(model, train_loader, optimizer, scaler, mixup_fn, ema=ema)
val_loss, val_acc = validate(ema.module if ema is not None else model, valid_loader)
print(f'  train_loss {train_loss:.4f} val_loss {val_loss:.4f} val_acc {val_acc:.4f}')
best_acc = val_acc
best_state = (ema.module if ema is not None else model).state_dict()

# Load best and run inference on test
if ema is not None:
    ema.module.load_state_dict(best_state)
    best_model = ema.module
else:
    model.load_state_dict(best_state)
    best_model = model

df_test = pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})
logits = infer_test(best_model, df_test, test_dir, size=CFG.img_size, tta_hflip=True, tta_scales=None, batch_size=CFG.batch_size)
preds = logits.argmax(1).astype(int)
sub = pd.DataFrame({'image_id': df_test['image_id'], 'label': preds})
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv with shape:', sub.shape)
check_submission_format('submission.csv')

In [4]:
# Full training: phash groups + 5-fold convnext_base@448 with EMA + hflip+scale TTA; generate submission.csv
import time, sys, subprocess, os, torch
from sklearn.model_selection import StratifiedGroupKFold
from pathlib import Path

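# Mitigate CUDA memory fragmentation. NOTE: the allocator reads this variable when it is first
# initialized, so it may have no effect if an earlier cell already used the GPU; set it before
# the first CUDA call (or restart the kernel) to be sure it applies.
os.environ.setdefault('PYTORCH_CUDA_ALLOC_CONF', 'expandable_segments:True,max_split_size_mb:128')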

# Ensure imagehash installed
try:
    import imagehash
except Exception as e:
    print('Installing imagehash...'); sys.stdout.flush()
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'ImageHash'], check=True)
    import imagehash

from PIL import ImageOps

def compute_phash_hex(path, hash_size=16):
    img = ImageOps.exif_transpose(Image.open(path).convert('RGB'))
    return str(imagehash.phash(img, hash_size=hash_size))

t0 = time.time()
# Fail-fast group merge: never silently recompute unless file missing
groups_path = Path('train_with_groups.csv')
if groups_path.exists():
    print('Loading groups from train_with_groups.csv'); sys.stdout.flush()
    gdf = pd.read_csv(groups_path)
    assert {'image_id','label','group'}.issubset(gdf.columns), 'Bad groups CSV'
    # Drop any pre-existing group-related cols to avoid suffix conflicts from prior runs
    dup_cols = [c for c in df.columns if c.startswith('group')]
    if dup_cols:
        print('Dropping pre-existing columns:', dup_cols); sys.stdout.flush()
        df = df.drop(columns=dup_cols)
    df = df.merge(gdf[['image_id','label','group']], on=['image_id','label'], how='left', validate='one_to_one')
    assert df['group'].notna().all(), 'Missing group after merge'
else:
    print('No saved groups; computing...'); sys.stdout.flush()
    df['phash'] = [compute_phash_hex(Path(train_dir)/iid) for iid in df['image_id']]
    df['group'] = df['phash'].str[:10]
    df[['image_id','label','group']].to_csv('train_with_groups.csv', index=False)

print('Unique groups:', df['group'].nunique(), 'elapsed:', f'{time.time()-t0:.1f}s')

# Configure training per expert advice
CFG.seed = 42
CFG.model_name = 'convnext_base'
CFG.img_size = 448
CFG.batch_size = 32  # A10-24GB fits 32 @448 with AMP
CFG.epochs = 10
CFG.lr = 2e-4
CFG.weight_decay = 1e-4
CFG.num_workers = 8
CFG.use_mixup = True
CFG.mixup_prob = 0.5
CFG.ls = 0.05
CFG.use_ema = True
CFG.ema_decay = 0.999
CFG.tta_scales = [0.95, 1.05]

sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=CFG.seed)
folds = list(sgkf.split(df['image_id'], df['label'], groups=df['group']))
print('Prepared StratifiedGroupKFold with 5 folds')

df_test = pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})
test_logits_folds = []
oof_logits = np.zeros((len(df), 5), dtype=np.float32)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
for fold, (tr_idx, va_idx) in enumerate(folds):
    fold_start = time.time()
    print(f'===== Fold {fold} start: train {len(tr_idx)} valid {len(va_idx)} ====='); sys.stdout.flush()
    df_tr = df.iloc[tr_idx].reset_index(drop=True)
    df_va = df.iloc[va_idx].reset_index(drop=True)

    # Free any stray CUDA allocations before building the model
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()

    train_loader = make_loader(df_tr, train_dir, get_train_transforms(CFG.img_size), batch_size=CFG.batch_size, shuffle=True, num_workers=CFG.num_workers)
    valid_loader = make_loader(df_va, train_dir, get_valid_transforms(CFG.img_size), batch_size=CFG.batch_size, shuffle=False, num_workers=CFG.num_workers)

    model = build_model(num_classes=5)
    optimizer = torch.optim.AdamW(model.parameters(), lr=CFG.lr, weight_decay=CFG.weight_decay)
    steps_per_epoch = max(1, len(train_loader))
    scheduler = get_scheduler(optimizer, steps_per_epoch)
    scaler = torch.amp.GradScaler('cuda', enabled=torch.cuda.is_available())
    mixup_fn = get_mixup_fn()
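    # Keep the EMA copy on the model's device: with device='cpu', validation and test inference also run on the CPU.
    # If GPU memory is tight, keep the CPU copy but move ema.module to the GPU before evaluating.
    ema = ModelEmaV2(model, decay=CFG.ema_decay) if CFG.use_ema else None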

    best_acc, best_state = -1.0, None
    for epoch in range(CFG.epochs):
        ep_start = time.time()
        print(f'Fold {fold} Epoch {epoch+1}/{CFG.epochs}'); sys.stdout.flush()
        train_loss = train_one_epoch(model, train_loader, optimizer, scaler, mixup_fn, ema=ema, scheduler=scheduler)
        val_loss, val_acc = validate(ema.module if ema is not None else model, valid_loader)
        print(f'  train_loss {train_loss:.4f} val_loss {val_loss:.4f} val_acc {val_acc:.4f} epoch_time {time.time()-ep_start:.1f}s'); sys.stdout.flush()
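        if val_acc > best_acc:
            best_acc = val_acc
            # state_dict() returns live tensor references; snapshot to CPU so later epochs cannot overwrite the best weights
            src = ema.module if ema is not None else model
            best_state = {k: v.detach().cpu().clone() for k, v in src.state_dict().items()}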

    # Load best and compute OOF logits
    with torch.no_grad():
        target_model = ema.module if ema is not None else model
        target_model.load_state_dict(best_state)
        target_model.eval()
        dev = next(target_model.parameters()).device
        logits_all = []
        for x, y in valid_loader:
            x = x.to(dev, non_blocking=True)
            if torch.cuda.is_available():
                x = x.to(memory_format=torch.channels_last)
            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                logits = target_model(x)
            logits_all.append(logits.float().cpu().numpy())
        logits_all = np.concatenate(logits_all, axis=0)
        oof_logits[va_idx] = logits_all
    print(f'Fold {fold} best_acc {best_acc:.4f} fold_time {time.time()-fold_start:.1f}s'); sys.stdout.flush()

    # Test inference for this fold
    fold_logits = infer_test(target_model, df_test, test_dir, size=CFG.img_size, tta_hflip=True, tta_scales=CFG.tta_scales, batch_size=CFG.batch_size)
    test_logits_folds.append(fold_logits)

# Average test logits across folds and save submission
test_logits_mean = np.mean(test_logits_folds, axis=0)
test_preds = test_logits_mean.argmax(1).astype(int)
submission = pd.DataFrame({'image_id': df_test['image_id'], 'label': test_preds})
submission.to_csv('submission.csv', index=False)
print('Saved submission.csv with shape:', submission.shape)
check_submission_format('submission.csv')

# Save OOF logits for future ensembling if needed
np.save('oof_logits_convnext_base_448.npy', oof_logits)
print('Saved oof logits to oof_logits_convnext_base_448.npy')

Loading groups from train_with_groups.csv


Unique groups: 18721 elapsed: 0.0s


Prepared StratifiedGroupKFold with 5 folds
===== Fold 0 start: train 14976 valid 3745 =====


  A.Affine(
  A.CoarseDropout(
  A.PadIfNeeded(min_height=size, min_width=size, border_mode=cv2.BORDER_CONSTANT, value=0),




  iter 0/468 loss 1.5367 elapsed 1.5s


  iter 50/468 loss 1.0321 elapsed 28.6s


  iter 100/468 loss 0.6981 elapsed 28.6s


  iter 150/468 loss 0.4477 elapsed 28.7s


  iter 200/468 loss 0.5799 elapsed 28.9s


  iter 250/468 loss 1.0340 elapsed 29.0s


  iter 300/468 loss 0.6085 elapsed 29.0s


  iter 350/468 loss 0.5979 elapsed 29.1s


  iter 400/468 loss 0.6233 elapsed 29.1s


  iter 450/468 loss 0.5710 elapsed 28.8s


  train_loss 0.8223 val_loss 1.0516 val_acc 0.6646 epoch_time 790.8s


Fold 0 Epoch 2/10


  iter 0/468 loss 0.3995 elapsed 1.3s


  iter 50/468 loss 0.7936 elapsed 28.9s


  iter 100/468 loss 1.1590 elapsed 29.3s


  iter 150/468 loss 0.5768 elapsed 29.1s


  iter 200/468 loss 0.6780 elapsed 29.1s


  iter 250/468 loss 0.4036 elapsed 29.1s


  iter 300/468 loss 0.4357 elapsed 29.2s


  iter 350/468 loss 0.3832 elapsed 29.2s


  iter 400/468 loss 0.4983 elapsed 29.2s


  iter 450/468 loss 0.5889 elapsed 29.2s


  train_loss 0.6760 val_loss 0.5386 val_acc 0.8395 epoch_time 780.7s


Fold 0 Epoch 3/10


  iter 0/468 loss 0.4829 elapsed 0.9s


  iter 50/468 loss 0.7056 elapsed 28.8s


  iter 100/468 loss 0.7329 elapsed 29.0s


  iter 150/468 loss 0.6492 elapsed 29.1s


  iter 200/468 loss 0.7102 elapsed 29.1s


  iter 250/468 loss 0.5210 elapsed 29.1s


  iter 300/468 loss 0.7812 elapsed 29.2s


  iter 350/468 loss 0.5211 elapsed 29.2s


  iter 400/468 loss 0.5725 elapsed 29.3s


  iter 450/468 loss 0.8068 elapsed 29.5s


  train_loss 0.6315 val_loss 0.3993 val_acc 0.8820 epoch_time 793.5s


Fold 0 Epoch 4/10


  iter 0/468 loss 0.5525 elapsed 1.0s


  iter 50/468 loss 0.3907 elapsed 28.8s


  iter 100/468 loss 0.3446 elapsed 29.0s


  iter 150/468 loss 0.8252 elapsed 29.0s


  iter 200/468 loss 0.5374 elapsed 29.1s


  iter 250/468 loss 0.4397 elapsed 29.2s


  iter 300/468 loss 0.4322 elapsed 29.0s


  iter 350/468 loss 0.8737 elapsed 29.1s


  iter 400/468 loss 1.0401 elapsed 29.3s


  iter 450/468 loss 0.3867 elapsed 29.4s


  train_loss 0.6167 val_loss 0.3582 val_acc 0.8948 epoch_time 787.3s


Fold 0 Epoch 5/10


  iter 0/468 loss 0.3211 elapsed 1.0s


  iter 50/468 loss 0.6218 elapsed 28.9s


  iter 100/468 loss 0.3378 elapsed 29.1s


  iter 150/468 loss 0.8480 elapsed 29.1s


  iter 200/468 loss 0.4309 elapsed 29.4s


  iter 250/468 loss 0.9204 elapsed 29.2s


  iter 300/468 loss 0.7789 elapsed 29.1s


  iter 350/468 loss 0.9147 elapsed 29.2s


  iter 400/468 loss 0.3828 elapsed 29.2s


  iter 450/468 loss 0.6873 elapsed 29.3s


  train_loss 0.5724 val_loss 0.3413 val_acc 0.9001 epoch_time 778.0s


Fold 0 Epoch 6/10


  iter 0/468 loss 0.3778 elapsed 1.0s


  iter 50/468 loss 0.3090 elapsed 28.9s


  iter 100/468 loss 0.7235 elapsed 29.0s


  iter 150/468 loss 0.9256 elapsed 29.1s


  iter 200/468 loss 0.3960 elapsed 29.1s


  iter 250/468 loss 0.2870 elapsed 29.0s


  iter 300/468 loss 0.3992 elapsed 29.1s


  iter 350/468 loss 0.6664 elapsed 29.1s


  iter 400/468 loss 0.6471 elapsed 29.1s


  iter 450/468 loss 0.6862 elapsed 29.1s


  train_loss 0.5339 val_loss 0.3332 val_acc 0.9004 epoch_time 774.6s


Fold 0 Epoch 7/10


  iter 0/468 loss 0.2391 elapsed 1.0s


  iter 50/468 loss 0.5413 elapsed 29.0s


  iter 100/468 loss 0.3509 elapsed 29.0s


  iter 150/468 loss 0.7935 elapsed 29.0s


  iter 200/468 loss 0.3331 elapsed 28.9s


  iter 250/468 loss 0.7754 elapsed 29.0s


  iter 300/468 loss 0.6885 elapsed 29.1s


  iter 350/468 loss 0.8981 elapsed 29.1s


  iter 400/468 loss 0.5828 elapsed 29.2s


  iter 450/468 loss 0.7726 elapsed 29.2s


  train_loss 0.4953 val_loss 0.3338 val_acc 0.8991 epoch_time 800.3s


Fold 0 Epoch 8/10


  iter 0/468 loss 1.0462 elapsed 0.9s


  iter 50/468 loss 0.5698 elapsed 28.9s


  iter 100/468 loss 0.2485 elapsed 29.2s


  iter 150/468 loss 0.2389 elapsed 29.1s


  iter 200/468 loss 0.2684 elapsed 29.2s


  iter 250/468 loss 0.5888 elapsed 29.3s


  iter 300/468 loss 0.2368 elapsed 29.2s


  iter 350/468 loss 0.2366 elapsed 29.6s


  iter 400/468 loss 0.3288 elapsed 29.3s


  iter 450/468 loss 0.3716 elapsed 29.4s


  train_loss 0.4997 val_loss 0.3396 val_acc 0.8999 epoch_time 794.0s


Fold 0 Epoch 9/10


  iter 0/468 loss 0.8933 elapsed 1.0s


  iter 50/468 loss 0.2288 elapsed 29.0s


  iter 100/468 loss 0.8007 elapsed 29.2s


  iter 150/468 loss 0.6244 elapsed 29.1s


  iter 200/468 loss 0.5770 elapsed 29.2s


  iter 250/468 loss 0.2320 elapsed 29.4s


  iter 300/468 loss 0.2384 elapsed 29.2s


  iter 350/468 loss 0.2510 elapsed 29.3s


  iter 400/468 loss 0.2320 elapsed 29.3s


  iter 450/468 loss 0.5324 elapsed 29.4s


  train_loss 0.4633 val_loss 0.3458 val_acc 0.8977 epoch_time 788.9s


Fold 0 Epoch 10/10


  iter 0/468 loss 0.2292 elapsed 1.0s


  iter 50/468 loss 0.2392 elapsed 29.2s


  iter 100/468 loss 0.6687 elapsed 29.1s


  iter 150/468 loss 0.6282 elapsed 29.4s


  iter 200/468 loss 0.5367 elapsed 29.2s


  iter 250/468 loss 0.2667 elapsed 29.1s


  iter 300/468 loss 0.5311 elapsed 29.2s


  iter 350/468 loss 0.7962 elapsed 29.2s


  iter 400/468 loss 0.7056 elapsed 29.3s


  iter 450/468 loss 0.2313 elapsed 29.3s


  train_loss 0.4359 val_loss 0.3524 val_acc 0.8961 epoch_time 794.6s


Fold 0 best_acc 0.9004 fold_time 8404.1s


===== Fold 1 start: train 14977 valid 3744 =====


Fold 1 Epoch 1/10




  iter 0/468 loss 1.8690 elapsed 1.6s


  iter 50/468 loss 1.0877 elapsed 28.9s


  iter 100/468 loss 0.7024 elapsed 29.1s


  iter 150/468 loss 0.9673 elapsed 29.0s


  iter 200/468 loss 0.6021 elapsed 29.2s


  iter 250/468 loss 0.5486 elapsed 29.5s


  iter 300/468 loss 0.6844 elapsed 29.1s


  iter 350/468 loss 0.5025 elapsed 29.0s


  iter 400/468 loss 0.6060 elapsed 29.0s


  iter 450/468 loss 0.4347 elapsed 29.1s


  train_loss 0.8104 val_loss 1.0542 val_acc 0.6934 epoch_time 771.4s


Fold 1 Epoch 2/10


  iter 0/468 loss 0.5338 elapsed 1.4s


  iter 50/468 loss 1.1539 elapsed 28.9s


  iter 100/468 loss 0.5830 elapsed 29.2s


  iter 150/468 loss 0.5000 elapsed 29.6s


  iter 200/468 loss 0.4599 elapsed 29.8s


  iter 250/468 loss 0.7310 elapsed 29.5s


  iter 300/468 loss 0.5351 elapsed 29.2s


  iter 350/468 loss 0.5363 elapsed 29.8s


  iter 400/468 loss 0.9072 elapsed 29.9s


  iter 450/468 loss 0.6734 elapsed 29.8s


  train_loss 0.6786 val_loss 0.5205 val_acc 0.8427 epoch_time 800.2s


Fold 1 Epoch 3/10


  iter 0/468 loss 0.5714 elapsed 1.0s


  iter 50/468 loss 0.9735 elapsed 28.9s


  iter 100/468 loss 0.8064 elapsed 29.0s


  iter 150/468 loss 0.5430 elapsed 29.3s


  iter 200/468 loss 0.4665 elapsed 29.1s


  iter 250/468 loss 0.4620 elapsed 29.1s


  iter 300/468 loss 0.4789 elapsed 29.2s


  iter 350/468 loss 0.9497 elapsed 29.3s


  iter 400/468 loss 0.7293 elapsed 29.4s


  iter 450/468 loss 0.3766 elapsed 29.4s


  train_loss 0.6124 val_loss 0.3973 val_acc 0.8758 epoch_time 795.6s


Fold 1 Epoch 4/10


  iter 0/468 loss 0.5191 elapsed 0.9s


  iter 50/468 loss 0.8462 elapsed 28.8s


  iter 100/468 loss 0.6794 elapsed 29.0s


  iter 150/468 loss 0.4063 elapsed 29.0s


  iter 200/468 loss 0.9621 elapsed 29.0s


  iter 250/468 loss 0.6409 elapsed 29.5s


  iter 300/468 loss 0.5321 elapsed 29.2s


  iter 350/468 loss 0.4166 elapsed 29.2s


  iter 400/468 loss 0.5044 elapsed 29.2s


  iter 450/468 loss 0.8901 elapsed 29.3s


  train_loss 0.6025 val_loss 0.3612 val_acc 0.8894 epoch_time 796.5s


Fold 1 Epoch 5/10


  iter 0/468 loss 0.7586 elapsed 1.0s


  iter 50/468 loss 0.8768 elapsed 29.0s


  iter 100/468 loss 0.3041 elapsed 29.1s


  iter 150/468 loss 0.3984 elapsed 29.0s


  iter 200/468 loss 0.6940 elapsed 29.2s


  iter 250/468 loss 0.4592 elapsed 29.2s


  iter 300/468 loss 0.6052 elapsed 29.1s


  iter 350/468 loss 0.7120 elapsed 29.2s


  iter 400/468 loss 0.5343 elapsed 29.2s


  iter 450/468 loss 0.8020 elapsed 29.2s


  train_loss 0.5811 val_loss 0.3454 val_acc 0.8897 epoch_time 784.8s


Fold 1 Epoch 6/10


  iter 0/468 loss 1.0021 elapsed 0.9s


  iter 50/468 loss 0.9311 elapsed 28.9s


  iter 100/468 loss 0.5641 elapsed 29.0s


  iter 150/468 loss 0.4014 elapsed 29.1s


  iter 200/468 loss 0.4355 elapsed 29.2s


  iter 250/468 loss 0.7659 elapsed 29.1s


  iter 300/468 loss 0.8365 elapsed 29.1s


  iter 350/468 loss 0.3087 elapsed 29.2s


  iter 400/468 loss 0.2766 elapsed 29.2s


  iter 450/468 loss 0.2382 elapsed 29.4s


  train_loss 0.5177 val_loss 0.3409 val_acc 0.8929 epoch_time 785.3s


Fold 1 Epoch 7/10


  iter 0/468 loss 0.5333 elapsed 0.9s


  iter 50/468 loss 0.6261 elapsed 28.9s


  iter 100/468 loss 0.3469 elapsed 29.0s


  iter 150/468 loss 0.7499 elapsed 29.1s


  iter 200/468 loss 0.2399 elapsed 29.0s


  iter 250/468 loss 0.3155 elapsed 29.1s


  iter 300/468 loss 0.2413 elapsed 29.2s


  iter 350/468 loss 0.2747 elapsed 29.2s


  iter 400/468 loss 0.6209 elapsed 29.2s


  iter 450/468 loss 0.8320 elapsed 29.2s


  train_loss 0.4795 val_loss 0.3407 val_acc 0.8945 epoch_time 791.0s


Fold 1 Epoch 8/10


  iter 0/468 loss 0.4561 elapsed 1.0s


  iter 50/468 loss 0.8006 elapsed 28.9s


  iter 100/468 loss 0.8919 elapsed 29.1s


  iter 150/468 loss 0.4893 elapsed 29.1s


  iter 200/468 loss 0.2340 elapsed 29.4s


  iter 250/468 loss 0.2560 elapsed 29.3s


  iter 300/468 loss 0.2767 elapsed 29.3s


  iter 350/468 loss 0.3019 elapsed 29.2s


  iter 400/468 loss 0.2329 elapsed 29.7s


  iter 450/468 loss 1.1203 elapsed 29.4s


  train_loss 0.4522 val_loss 0.3473 val_acc 0.8948 epoch_time 804.9s


Fold 1 Epoch 9/10


  iter 0/468 loss 0.2362 elapsed 0.9s


  iter 50/468 loss 0.2375 elapsed 28.9s


  iter 100/468 loss 0.2299 elapsed 29.1s


  iter 150/468 loss 0.5720 elapsed 29.1s


  iter 200/468 loss 0.4401 elapsed 29.2s


  iter 250/468 loss 0.6536 elapsed 29.2s


  iter 300/468 loss 0.7299 elapsed 29.2s


  iter 350/468 loss 0.3075 elapsed 29.3s


  iter 400/468 loss 0.8167 elapsed 29.4s


  iter 450/468 loss 0.2513 elapsed 29.5s


  train_loss 0.4300 val_loss 0.3552 val_acc 0.8918 epoch_time 788.8s


Fold 1 Epoch 10/10


  iter 0/468 loss 0.3645 elapsed 0.8s


  iter 50/468 loss 0.7260 elapsed 28.9s


  iter 100/468 loss 0.2395 elapsed 29.0s


  iter 150/468 loss 0.4508 elapsed 29.1s


  iter 200/468 loss 0.7033 elapsed 29.2s


  iter 250/468 loss 0.2285 elapsed 29.1s


  iter 300/468 loss 0.2280 elapsed 29.1s


  iter 350/468 loss 0.2340 elapsed 29.2s


  iter 400/468 loss 0.2278 elapsed 29.2s


  iter 450/468 loss 0.2290 elapsed 29.3s


  train_loss 0.4505 val_loss 0.3629 val_acc 0.8905 epoch_time 790.8s


Fold 1 best_acc 0.8948 fold_time 8420.0s


===== Fold 2 start: train 14977 valid 3744 =====


Fold 2 Epoch 1/10




  iter 0/468 loss 1.9680 elapsed 1.6s


  iter 50/468 loss 1.0804 elapsed 27.5s


  iter 100/468 loss 0.4900 elapsed 27.6s


  iter 150/468 loss 1.1049 elapsed 27.5s


  iter 200/468 loss 1.2697 elapsed 27.9s


  iter 250/468 loss 0.5418 elapsed 27.5s


  iter 300/468 loss 1.0029 elapsed 27.6s


  iter 350/468 loss 0.6484 elapsed 27.7s


  iter 400/468 loss 0.8050 elapsed 27.8s


  iter 450/468 loss 0.6144 elapsed 27.9s


  train_loss 0.8326 val_loss 1.0777 val_acc 0.6792 epoch_time 779.3s


Fold 2 Epoch 2/10


  iter 0/468 loss 1.0272 elapsed 1.4s


  iter 50/468 loss 1.1429 elapsed 27.5s


  iter 100/468 loss 1.2293 elapsed 27.7s


  iter 150/468 loss 0.4143 elapsed 27.6s


  iter 200/468 loss 1.0211 elapsed 27.7s


  iter 250/468 loss 0.9700 elapsed 27.7s


  iter 300/468 loss 0.5844 elapsed 27.8s


  iter 350/468 loss 0.5873 elapsed 27.9s


  iter 400/468 loss 0.6468 elapsed 28.0s


  iter 450/468 loss 0.5218 elapsed 27.9s


  train_loss 0.6721 val_loss 0.5490 val_acc 0.8435 epoch_time 769.6s


Fold 2 Epoch 3/10


  iter 0/468 loss 0.8089 elapsed 0.8s


  iter 50/468 loss 0.9406 elapsed 27.9s


  iter 100/468 loss 0.3306 elapsed 27.7s


  iter 150/468 loss 0.9132 elapsed 27.7s


  iter 200/468 loss 0.9107 elapsed 27.8s


  iter 250/468 loss 0.4216 elapsed 27.7s


  iter 300/468 loss 0.3999 elapsed 27.7s


  iter 350/468 loss 0.7165 elapsed 27.8s


  iter 400/468 loss 0.5454 elapsed 27.8s


  iter 450/468 loss 0.4841 elapsed 27.9s


  train_loss 0.6196 val_loss 0.4199 val_acc 0.8758 epoch_time 765.5s


Fold 2 Epoch 4/10


  iter 0/468 loss 0.4651 elapsed 0.9s


  iter 50/468 loss 1.0650 elapsed 27.5s


  iter 100/468 loss 0.6957 elapsed 27.7s


  iter 150/468 loss 0.4225 elapsed 27.9s


  iter 200/468 loss 0.5795 elapsed 28.0s


  iter 250/468 loss 0.5551 elapsed 27.8s


  iter 300/468 loss 0.5931 elapsed 28.2s


  iter 350/468 loss 0.2929 elapsed 28.0s


  iter 400/468 loss 0.4016 elapsed 27.9s


  iter 450/468 loss 0.4263 elapsed 28.0s


  train_loss 0.5995 val_loss 0.3758 val_acc 0.8846 epoch_time 766.4s


Fold 2 Epoch 5/10


  iter 0/468 loss 0.3171 elapsed 0.9s


  iter 50/468 loss 1.0600 elapsed 27.4s


  iter 100/468 loss 0.2581 elapsed 27.7s


  iter 150/468 loss 0.6271 elapsed 27.7s


  iter 200/468 loss 0.8229 elapsed 27.9s


  iter 250/468 loss 0.3596 elapsed 27.9s


  iter 300/468 loss 0.4538 elapsed 28.0s


  iter 350/468 loss 0.2504 elapsed 28.0s


  iter 400/468 loss 0.4425 elapsed 28.1s


  iter 450/468 loss 0.4747 elapsed 28.1s


  train_loss 0.5542 val_loss 0.3586 val_acc 0.8876 epoch_time 783.5s


Fold 2 Epoch 6/10


  iter 0/468 loss 0.2813 elapsed 0.9s


  iter 50/468 loss 0.2884 elapsed 27.6s


  iter 100/468 loss 0.7850 elapsed 27.9s


  iter 150/468 loss 0.2791 elapsed 27.7s


  iter 200/468 loss 0.6453 elapsed 27.9s


  iter 250/468 loss 0.2821 elapsed 28.1s


  iter 300/468 loss 0.3649 elapsed 28.2s


  iter 350/468 loss 0.2878 elapsed 28.1s


  iter 400/468 loss 0.4313 elapsed 28.0s


  iter 450/468 loss 0.2545 elapsed 28.0s


  train_loss 0.5156 val_loss 0.3533 val_acc 0.8897 epoch_time 789.6s


Fold 2 Epoch 7/10


  iter 0/468 loss 0.7105 elapsed 1.0s


  iter 50/468 loss 0.8528 elapsed 27.6s


  iter 100/468 loss 0.3253 elapsed 27.6s


  iter 150/468 loss 0.2498 elapsed 27.7s


  iter 200/468 loss 0.2988 elapsed 27.7s


  iter 250/468 loss 0.4587 elapsed 27.7s


  iter 300/468 loss 0.2562 elapsed 27.9s


  iter 350/468 loss 0.8111 elapsed 27.8s


  iter 400/468 loss 0.3190 elapsed 27.9s


  iter 450/468 loss 0.6338 elapsed 28.1s


  train_loss 0.4907 val_loss 0.3564 val_acc 0.8905 epoch_time 786.7s


Fold 2 Epoch 8/10


  iter 0/468 loss 0.2344 elapsed 1.0s


  iter 50/468 loss 0.7303 elapsed 27.6s


  iter 100/468 loss 0.2345 elapsed 27.8s


  iter 150/468 loss 0.6817 elapsed 27.8s


  iter 200/468 loss 0.2446 elapsed 27.8s


  iter 250/468 loss 0.9410 elapsed 28.0s


  iter 300/468 loss 0.2376 elapsed 28.1s


  iter 350/468 loss 0.7673 elapsed 28.0s


  iter 400/468 loss 0.7205 elapsed 28.1s


  iter 450/468 loss 0.2711 elapsed 28.2s


  train_loss 0.4683 val_loss 0.3647 val_acc 0.8889 epoch_time 783.5s


Fold 2 Epoch 9/10


  iter 0/468 loss 0.3392 elapsed 1.0s


  iter 50/468 loss 0.2376 elapsed 27.5s


  iter 100/468 loss 0.7284 elapsed 27.7s


  iter 150/468 loss 0.5258 elapsed 27.8s


  iter 200/468 loss 0.2301 elapsed 28.0s


  iter 250/468 loss 0.5198 elapsed 28.1s


  iter 300/468 loss 0.5629 elapsed 27.9s


  iter 350/468 loss 0.2439 elapsed 27.9s


  iter 400/468 loss 0.2682 elapsed 27.9s


  iter 450/468 loss 0.2863 elapsed 28.1s


  train_loss 0.4401 val_loss 0.3738 val_acc 0.8873 epoch_time 794.2s


Fold 2 Epoch 10/10


  iter 0/468 loss 0.3613 elapsed 0.8s


  iter 50/468 loss 0.2311 elapsed 27.7s


  iter 100/468 loss 0.5492 elapsed 27.9s


  iter 150/468 loss 0.3375 elapsed 27.9s


  iter 200/468 loss 0.6139 elapsed 27.9s


  iter 250/468 loss 0.3535 elapsed 27.9s


  iter 300/468 loss 0.2746 elapsed 27.9s


  iter 350/468 loss 0.7781 elapsed 28.0s


  iter 400/468 loss 0.2413 elapsed 28.2s


  iter 450/468 loss 0.2318 elapsed 28.3s


  train_loss 0.4376 val_loss 0.3814 val_acc 0.8857 epoch_time 750.3s


Fold 2 best_acc 0.8905 fold_time 8254.5s


===== Fold 3 start: train 14977 valid 3744 =====


Fold 3 Epoch 1/10




  iter 0/468 loss 1.5427 elapsed 1.8s


  iter 50/468 loss 1.0406 elapsed 28.0s


  iter 100/468 loss 0.7954 elapsed 27.7s


  iter 150/468 loss 0.8775 elapsed 27.7s


  iter 200/468 loss 1.1098 elapsed 27.7s


  iter 250/468 loss 1.2200 elapsed 27.7s


  iter 300/468 loss 0.5984 elapsed 27.8s


  iter 350/468 loss 0.3502 elapsed 28.0s


  iter 400/468 loss 0.4499 elapsed 28.2s


  iter 450/468 loss 0.3560 elapsed 28.2s


  train_loss 0.8276 val_loss 0.9069 val_acc 0.7623 epoch_time 758.2s


Fold 3 Epoch 2/10


  iter 0/468 loss 0.8343 elapsed 1.3s


  iter 50/468 loss 0.4772 elapsed 27.6s


  iter 100/468 loss 0.4181 elapsed 27.8s


  iter 150/468 loss 0.7097 elapsed 27.8s


  iter 200/468 loss 0.5040 elapsed 27.9s


  iter 250/468 loss 0.7386 elapsed 27.9s


  iter 300/468 loss 0.4845 elapsed 28.1s


  iter 350/468 loss 1.3986 elapsed 27.9s


  iter 400/468 loss 0.6985 elapsed 28.0s


  iter 450/468 loss 0.7129 elapsed 28.0s


  train_loss 0.6689 val_loss 0.4898 val_acc 0.8571 epoch_time 768.6s


Fold 3 Epoch 3/10


  iter 0/468 loss 1.0222 elapsed 1.0s


  iter 50/468 loss 0.6280 elapsed 27.5s


  iter 100/468 loss 0.5468 elapsed 27.7s


  iter 150/468 loss 0.9323 elapsed 27.7s


  iter 200/468 loss 0.5212 elapsed 27.8s


  iter 250/468 loss 0.8982 elapsed 27.8s


  iter 300/468 loss 1.0038 elapsed 27.9s


  iter 350/468 loss 0.5088 elapsed 27.9s


  iter 400/468 loss 0.4009 elapsed 28.0s


  iter 450/468 loss 0.7219 elapsed 28.0s


  train_loss 0.6118 val_loss 0.3799 val_acc 0.8932 epoch_time 807.8s


Fold 3 Epoch 4/10


  iter 0/468 loss 0.3166 elapsed 1.0s


  iter 50/468 loss 0.8422 elapsed 27.5s


  iter 100/468 loss 0.4132 elapsed 27.7s


  iter 150/468 loss 0.7615 elapsed 27.9s


  iter 200/468 loss 0.6394 elapsed 27.9s


  iter 250/468 loss 0.4476 elapsed 27.8s


  iter 300/468 loss 0.3084 elapsed 28.0s


  iter 350/468 loss 0.6361 elapsed 28.0s


  iter 400/468 loss 0.9828 elapsed 28.0s


  iter 450/468 loss 0.3775 elapsed 28.1s


  train_loss 0.5909 val_loss 0.3426 val_acc 0.9004 epoch_time 778.1s


Fold 3 Epoch 5/10


  iter 0/468 loss 0.2947 elapsed 1.1s


  iter 50/468 loss 0.5377 elapsed 27.5s


  iter 100/468 loss 0.8759 elapsed 27.8s


  iter 150/468 loss 0.3786 elapsed 27.8s


  iter 200/468 loss 0.3315 elapsed 27.8s


  iter 250/468 loss 0.7311 elapsed 27.8s


  iter 300/468 loss 0.9279 elapsed 28.0s


  iter 350/468 loss 0.2799 elapsed 28.1s


  iter 400/468 loss 0.4878 elapsed 28.2s


  iter 450/468 loss 0.4946 elapsed 28.2s


  train_loss 0.5868 val_loss 0.3287 val_acc 0.9020 epoch_time 776.9s


Fold 3 Epoch 6/10


  iter 0/468 loss 0.2580 elapsed 1.1s


  iter 50/468 loss 0.9546 elapsed 27.9s


  iter 100/468 loss 0.6914 elapsed 27.9s


  iter 150/468 loss 0.7061 elapsed 27.8s


  iter 200/468 loss 0.9120 elapsed 27.9s


  iter 250/468 loss 0.5080 elapsed 28.0s


  iter 300/468 loss 0.7520 elapsed 27.9s


  iter 350/468 loss 0.2432 elapsed 28.0s


  iter 400/468 loss 0.8567 elapsed 28.0s


  iter 450/468 loss 0.3522 elapsed 28.0s


  train_loss 0.5241 val_loss 0.3234 val_acc 0.9028 epoch_time 776.2s


Fold 3 Epoch 7/10


  iter 0/468 loss 1.3202 elapsed 1.0s


  iter 50/468 loss 0.9567 elapsed 27.6s


  iter 100/468 loss 0.3275 elapsed 27.8s


  iter 150/468 loss 1.1502 elapsed 27.7s


  iter 200/468 loss 0.7737 elapsed 27.9s


  iter 250/468 loss 0.2459 elapsed 27.8s


  iter 300/468 loss 0.4855 elapsed 28.1s


  iter 350/468 loss 0.4667 elapsed 28.0s


  iter 400/468 loss 0.2558 elapsed 28.0s


  iter 450/468 loss 0.2302 elapsed 28.1s


  train_loss 0.4880 val_loss 0.3274 val_acc 0.9022 epoch_time 782.8s


Fold 3 Epoch 8/10


  iter 0/468 loss 0.2362 elapsed 0.9s


  iter 50/468 loss 0.4751 elapsed 27.6s


  iter 100/468 loss 1.0241 elapsed 27.8s


  iter 150/468 loss 0.2678 elapsed 27.8s


  iter 200/468 loss 0.3336 elapsed 27.8s


  iter 250/468 loss 0.6269 elapsed 27.9s


  iter 300/468 loss 0.2439 elapsed 28.0s


  iter 350/468 loss 0.3067 elapsed 28.0s


  iter 400/468 loss 0.4784 elapsed 28.0s


  iter 450/468 loss 0.2422 elapsed 28.0s


  train_loss 0.4789 val_loss 0.3373 val_acc 0.9001 epoch_time 777.0s


Fold 3 Epoch 9/10


  iter 0/468 loss 0.5318 elapsed 1.0s


  iter 50/468 loss 0.3808 elapsed 27.5s


  iter 100/468 loss 0.2347 elapsed 27.7s


  iter 150/468 loss 0.4361 elapsed 27.9s


  iter 200/468 loss 0.3058 elapsed 27.7s


  iter 250/468 loss 0.2481 elapsed 27.7s


  iter 300/468 loss 0.3449 elapsed 27.9s


  iter 350/468 loss 0.8525 elapsed 28.0s


  iter 400/468 loss 0.2489 elapsed 28.1s


  iter 450/468 loss 0.2277 elapsed 28.1s


  train_loss 0.4456 val_loss 0.3481 val_acc 0.8972 epoch_time 775.2s


Fold 3 Epoch 10/10


  iter 0/468 loss 0.2390 elapsed 0.9s


  iter 50/468 loss 0.6033 elapsed 27.7s


  iter 100/468 loss 0.2317 elapsed 28.0s


  iter 150/468 loss 0.2764 elapsed 27.9s


  iter 200/468 loss 0.3120 elapsed 28.0s


  iter 250/468 loss 0.9385 elapsed 28.0s


  iter 300/468 loss 0.2370 elapsed 27.9s


  iter 350/468 loss 0.5527 elapsed 27.9s


  iter 400/468 loss 0.6858 elapsed 28.3s


  iter 450/468 loss 0.7512 elapsed 28.1s


  train_loss 0.4543 val_loss 0.3568 val_acc 0.8950 epoch_time 779.3s


Fold 3 best_acc 0.9028 fold_time 8289.9s


===== Fold 4 start: train 14977 valid 3744 =====


Fold 4 Epoch 1/10




  iter 0/468 loss 1.4997 elapsed 1.8s


  iter 50/468 loss 1.1493 elapsed 27.6s


  iter 100/468 loss 0.5925 elapsed 27.8s


  iter 150/468 loss 0.7875 elapsed 27.8s


  iter 200/468 loss 0.6275 elapsed 27.8s


  iter 250/468 loss 0.4559 elapsed 27.9s


  iter 300/468 loss 1.0060 elapsed 27.9s


  iter 350/468 loss 0.4749 elapsed 28.0s


  iter 400/468 loss 0.4701 elapsed 28.1s


  iter 450/468 loss 0.5895 elapsed 28.1s


  train_loss 0.8311 val_loss 0.9803 val_acc 0.6696 epoch_time 769.6s


Fold 4 Epoch 2/10


  iter 0/468 loss 0.6326 elapsed 1.4s


  iter 50/468 loss 0.4269 elapsed 27.5s


  iter 100/468 loss 0.8327 elapsed 27.7s


  iter 150/468 loss 1.0337 elapsed 28.3s


  iter 200/468 loss 0.5378 elapsed 27.9s


  iter 250/468 loss 0.4679 elapsed 27.8s


  iter 300/468 loss 0.8173 elapsed 27.8s


  iter 350/468 loss 0.7478 elapsed 27.9s


  iter 400/468 loss 0.4728 elapsed 28.0s


  iter 450/468 loss 0.5180 elapsed 28.1s


  train_loss 0.6747 val_loss 0.5321 val_acc 0.8360 epoch_time 766.1s


Fold 4 Epoch 3/10


  iter 0/468 loss 0.5296 elapsed 0.9s


  iter 50/468 loss 0.6890 elapsed 27.6s


  iter 100/468 loss 0.5165 elapsed 27.8s


  iter 150/468 loss 0.5805 elapsed 27.8s


  iter 200/468 loss 0.9415 elapsed 27.7s


  iter 250/468 loss 0.5095 elapsed 27.8s


  iter 300/468 loss 0.4550 elapsed 27.9s


  iter 350/468 loss 0.3966 elapsed 28.0s


  iter 400/468 loss 0.6447 elapsed 28.0s


  iter 450/468 loss 0.6494 elapsed 28.3s


  train_loss 0.6428 val_loss 0.4025 val_acc 0.8763 epoch_time 776.8s


Fold 4 Epoch 4/10


  iter 0/468 loss 0.4180 elapsed 0.9s


  iter 50/468 loss 0.3190 elapsed 27.8s


  iter 100/468 loss 0.3873 elapsed 28.0s


  iter 150/468 loss 0.3316 elapsed 28.0s


  iter 200/468 loss 0.5347 elapsed 28.0s


  iter 250/468 loss 0.4868 elapsed 28.0s


  iter 300/468 loss 0.6618 elapsed 28.0s


  iter 350/468 loss 0.3562 elapsed 28.1s


  iter 400/468 loss 0.5831 elapsed 28.1s


  iter 450/468 loss 0.5111 elapsed 28.2s


  train_loss 0.6008 val_loss 0.3539 val_acc 0.8918 epoch_time 786.8s


Fold 4 Epoch 5/10


  iter 0/468 loss 0.2988 elapsed 0.8s


  iter 50/468 loss 0.7886 elapsed 27.4s


  iter 100/468 loss 0.5899 elapsed 27.7s


  iter 150/468 loss 0.5670 elapsed 27.6s


  iter 200/468 loss 0.8261 elapsed 27.8s


  iter 250/468 loss 0.2729 elapsed 28.1s


  iter 300/468 loss 0.3366 elapsed 27.9s


  iter 350/468 loss 0.7649 elapsed 28.0s


  iter 400/468 loss 0.3279 elapsed 28.4s


  iter 450/468 loss 0.2903 elapsed 28.2s


  train_loss 0.5549 val_loss 0.3389 val_acc 0.8974 epoch_time 752.9s


Fold 4 Epoch 6/10


  iter 0/468 loss 0.5385 elapsed 1.0s


  iter 50/468 loss 0.2692 elapsed 27.6s


  iter 100/468 loss 0.5923 elapsed 27.9s


  iter 150/468 loss 0.6578 elapsed 27.8s


  iter 200/468 loss 0.4942 elapsed 27.9s


  iter 250/468 loss 0.3821 elapsed 27.8s


  iter 300/468 loss 0.3145 elapsed 27.8s


  iter 350/468 loss 0.7320 elapsed 27.9s


  iter 400/468 loss 0.8496 elapsed 28.0s


  iter 450/468 loss 0.8545 elapsed 28.0s


  train_loss 0.5340 val_loss 0.3355 val_acc 0.8958 epoch_time 748.9s


Fold 4 Epoch 7/10


  iter 0/468 loss 0.7195 elapsed 0.9s


  iter 50/468 loss 0.9747 elapsed 27.8s


  iter 100/468 loss 1.0020 elapsed 27.7s


  iter 150/468 loss 0.3192 elapsed 27.8s


  iter 200/468 loss 0.5558 elapsed 27.8s


  iter 250/468 loss 0.3938 elapsed 27.8s


  iter 300/468 loss 0.3293 elapsed 27.9s


  iter 350/468 loss 0.7112 elapsed 28.0s


  iter 400/468 loss 0.2852 elapsed 28.0s


  iter 450/468 loss 0.3550 elapsed 28.1s


  train_loss 0.4759 val_loss 0.3402 val_acc 0.8953 epoch_time 758.6s


Fold 4 Epoch 8/10


  iter 0/468 loss 0.9176 elapsed 1.0s


  iter 50/468 loss 0.5636 elapsed 27.5s


  iter 100/468 loss 0.7243 elapsed 27.7s


  iter 150/468 loss 0.2509 elapsed 27.6s


  iter 200/468 loss 0.4762 elapsed 27.7s


  iter 250/468 loss 0.2852 elapsed 27.6s


  iter 300/468 loss 0.4474 elapsed 27.8s


  iter 350/468 loss 0.6655 elapsed 28.1s


  iter 400/468 loss 0.2448 elapsed 28.0s


  iter 450/468 loss 0.5777 elapsed 28.0s


  train_loss 0.4634 val_loss 0.3490 val_acc 0.8918 epoch_time 752.4s


Fold 4 Epoch 9/10


  iter 0/468 loss 0.6575 elapsed 0.9s


  iter 50/468 loss 0.2371 elapsed 27.5s


  iter 100/468 loss 0.2367 elapsed 27.8s


  iter 150/468 loss 0.5991 elapsed 27.7s


  iter 200/468 loss 0.7190 elapsed 27.8s


  iter 250/468 loss 0.2515 elapsed 27.9s


  iter 300/468 loss 0.2410 elapsed 27.9s


  iter 350/468 loss 0.2341 elapsed 27.9s


  iter 400/468 loss 0.6485 elapsed 28.0s


  iter 450/468 loss 0.8504 elapsed 28.0s


  train_loss 0.4560 val_loss 0.3607 val_acc 0.8913 epoch_time 757.5s


Fold 4 Epoch 10/10


  iter 0/468 loss 0.3429 elapsed 1.0s


  iter 50/468 loss 0.5728 elapsed 27.5s


  iter 100/468 loss 0.2411 elapsed 27.7s


  iter 150/468 loss 0.2379 elapsed 27.8s


  iter 200/468 loss 0.9721 elapsed 28.1s


  iter 250/468 loss 0.3040 elapsed 27.9s


  iter 300/468 loss 0.2307 elapsed 27.9s


  iter 350/468 loss 0.2307 elapsed 28.0s


  iter 400/468 loss 0.2308 elapsed 28.0s


  iter 450/468 loss 0.3130 elapsed 28.0s


  train_loss 0.4560 val_loss 0.3708 val_acc 0.8889 epoch_time 774.8s


Fold 4 best_acc 0.8974 fold_time 8156.4s


Saved submission.csv with shape: (2676, 2)
Submission cols OK: True Labels int[0..4]: True Shape: (2676, 2)
Label value_counts: {3: 1673, 4: 327, 1: 281, 2: 257, 0: 138}
Saved oof logits to oof_logits_convnext_base_448.npy
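
As a quick cross-check of the ConvNeXt run above, the per-fold results printed in the log can be summarized in a few lines. This is a minimal sketch: the five `best_acc` and `fold_time` values are copied by hand from the log, and the threshold is the 0.8978 target from the plan. The mean of per-fold best-epoch accuracies is selected on the validation folds themselves, so it is probably a slightly optimistic estimate of leaderboard accuracy.

```python
from statistics import mean, pstdev

# Values copied from the "Fold k best_acc ... fold_time ..." lines in the log above.
fold_best_acc = [0.9004, 0.8948, 0.8905, 0.9028, 0.8974]
fold_time_s = [8404.1, 8420.0, 8254.5, 8289.9, 8156.4]
MEDAL_THRESHOLD = 0.8978  # target accuracy stated in the plan

mean_acc = mean(fold_best_acc)          # unweighted mean over folds
std_acc = pstdev(fold_best_acc)         # population std across the 5 folds
total_hours = sum(fold_time_s) / 3600.0

print(f"mean best_acc {mean_acc:.4f} +/- {std_acc:.4f}")
print(f"gap to threshold {mean_acc - MEDAL_THRESHOLD:+.4f}")
print(f"total training time {total_hours:.2f} h")
```

On these numbers the single model sits just under the target on its own, which is consistent with the plan's decision to train a second, diverse model and blend.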


In [None]:
# Hard reset GPU context: save groups then SIGKILL kernel
import os, signal
if 'group' in df.columns:
    df[['image_id','label','group']].to_csv('train_with_groups.csv', index=False)
    print('Saved train_with_groups.csv')
else:
    print('No group column to save; skipping')
print('Killing kernel to hard-reset CUDA context...')
os.kill(os.getpid(), signal.SIGKILL)

In [5]:
# Save convnext_base test logits after Cell 5 completes
import numpy as np, os
if 'test_logits_mean' in globals():
    np.save('test_logits_convnext_base_448.npy', test_logits_mean)
    print('Saved test logits to test_logits_convnext_base_448.npy with shape:', test_logits_mean.shape)
else:
    print('test_logits_mean not found in globals; run after Cell 5 finished.')

Saved test logits to test_logits_convnext_base_448.npy with shape: (2676, 5)
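
With test logits saved per model, a later ensembling step could average class probabilities across models and take the argmax. Below is a minimal sketch in plain Python, assuming each model's logits are rows of 5 values per image. The toy numbers and the `blend_probs` helper are hypothetical; a real run would load the saved `.npy` files with NumPy and apply the same arithmetic to full arrays.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def blend_probs(logit_sets, weights=None):
    """Weighted average of per-model softmax probabilities (hypothetical helper)."""
    n_models = len(logit_sets)
    weights = weights or [1.0 / n_models] * n_models
    blended = []
    for i in range(len(logit_sets[0])):
        probs = [softmax(ls[i]) for ls in logit_sets]
        n_cls = len(probs[0])
        blended.append([sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_cls)])
    return blended

# Toy logits for two images from two models (made-up values, 5 classes).
model_a = [[0.1, 0.2, 0.0, 2.5, 0.3], [1.9, 0.1, 0.2, 1.0, 0.0]]
model_b = [[0.0, 0.1, 0.2, 1.8, 0.9], [0.2, 0.1, 0.3, 2.2, 0.1]]

blended = blend_probs([model_a, model_b])
labels = [max(range(5), key=row.__getitem__) for row in blended]
print(labels)
```

Averaging probabilities rather than raw logits keeps models with differently scaled outputs from dominating the blend; the weights could be tuned on OOF predictions if the two models differ much in accuracy.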


In [6]:
# Train tf_efficientnet_b4_ns@512 with StratifiedGroupKFold (pHash groups), early stopping, and EMA; save fold checkpoints and test logits
import math, time, os, sys, subprocess
from sklearn.model_selection import StratifiedGroupKFold
import torch
import torch.nn as nn
import timm
from timm.utils import ModelEmaV2
from timm.data import Mixup
from timm.loss import SoftTargetCrossEntropy
import numpy as np
import pandas as pd
from pathlib import Path

# Speedup per expert advice
torch.backends.cudnn.benchmark = True

class CFG_B4:
    seed = 42
    model_name = 'tf_efficientnet_b4_ns'
    img_size = 512
    batch_size = 24
    epochs = 12
    min_epochs = 6
    patience = 2
    lr = 1e-4
    weight_decay = 1e-4
    num_workers = 8
    mixup_alpha = 1.0
    cutmix_alpha = 1.0
    mixup_prob = 0.5
    mixup_switch_prob = 0.5
    ls = 0.05
    use_mixup = True
    use_ema = True
    ema_decay = 0.999
    tta_scales = [0.95, 1.05]

def build_model_b4(num_classes=5):
    m = timm.create_model(CFG_B4.model_name, pretrained=True, num_classes=num_classes)
    m = m.to('cuda' if torch.cuda.is_available() else 'cpu')
    if torch.cuda.is_available():
        m = m.to(memory_format=torch.channels_last)
    return m

def get_mixup_fn_b4():
    if not CFG_B4.use_mixup:
        return None
    return Mixup(mixup_alpha=CFG_B4.mixup_alpha, cutmix_alpha=CFG_B4.cutmix_alpha, prob=CFG_B4.mixup_prob, switch_prob=CFG_B4.mixup_switch_prob, label_smoothing=CFG_B4.ls, num_classes=5)

def get_scheduler_b4(optimizer, steps_per_epoch):
    warmup = steps_per_epoch * 1
    total = steps_per_epoch * CFG_B4.epochs
    def lr_lambda(step):
        if step < warmup:
            return max(1e-8, step / max(1, warmup))
        prog = (step - warmup) / max(1, total - warmup)
        return 0.5 * (1 + math.cos(math.pi * prog))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

def train_epoch_b4(model, loader, optimizer, scaler, mixup_fn=None, ema=None, scheduler=None):
    model.train()
    dev = next(model.parameters()).device
    crit = SoftTargetCrossEntropy().to(dev) if mixup_fn is not None else nn.CrossEntropyLoss(label_smoothing=CFG_B4.ls).to(dev)
    tot, cnt = 0.0, 0
    t0 = time.time()
    for it, (x, y) in enumerate(loader):
        x = x.to(dev, non_blocking=True)
        if torch.cuda.is_available(): x = x.to(memory_format=torch.channels_last)
        y = y.to(dev, non_blocking=True)
        if mixup_fn is not None:
            x, y = mixup_fn(x, y)
        optimizer.zero_grad(set_to_none=True)
        with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
            logits = model(x)
            loss = crit(logits, y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        if ema is not None: ema.update(model)
        if scheduler is not None: scheduler.step()
        tot += loss.item() * x.size(0)
        cnt += x.size(0)
        if it % 50 == 0:
            print(f'  iter {it}/{len(loader)} loss {loss.item():.4f}')
    return tot / max(cnt,1)

def validate_b4(model, loader):
    model.eval()
    dev = next(model.parameters()).device
    crit = nn.CrossEntropyLoss().to(dev)
    tot, cnt, correct = 0.0, 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(dev, non_blocking=True)
            if torch.cuda.is_available(): x = x.to(memory_format=torch.channels_last)
            y = y.to(dev, non_blocking=True)
            with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                logits = model(x)
                loss = crit(logits, y)
            tot += loss.item() * x.size(0); cnt += x.size(0)
            # argmax over logits equals argmax over softmax, so the softmax is skipped
            correct += (logits.argmax(1) == y).sum().item()
    return tot / max(cnt,1), correct / max(cnt,1)

def train_b4_with_groups():
    seed_everything(CFG_B4.seed)
    groups_path = Path('train_with_groups.csv')
    assert groups_path.exists(), 'train_with_groups.csv missing; run Cell 5 or precompute groups first.'
    gdf = pd.read_csv(groups_path)
    # ensure df has group merged
    base_cols = ['image_id','label']
    if 'group' not in df.columns:
        mdf = df[base_cols].merge(gdf[base_cols+['group']], on=base_cols, how='left', validate='one_to_one')
    else:
        mdf = df.copy()
    assert mdf['group'].notna().all(), 'Group merge failed'
    sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=CFG_B4.seed)
    folds = list(sgkf.split(mdf['image_id'], mdf['label'], groups=mdf['group']))
    df_test = pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})
    test_logits_folds = []
    oof_logits = np.zeros((len(mdf), 5), dtype=np.float32)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    for fold, (tr_idx, va_idx) in enumerate(folds):
        print(f'===== B4 Fold {fold} start: train {len(tr_idx)} valid {len(va_idx)} =====')
        df_tr = mdf.iloc[tr_idx].reset_index(drop=True)
        df_va = mdf.iloc[va_idx].reset_index(drop=True)
        if torch.cuda.is_available():
            torch.cuda.empty_cache(); torch.cuda.reset_peak_memory_stats()
        train_loader = make_loader(df_tr, train_dir, get_train_transforms(CFG_B4.img_size), batch_size=CFG_B4.batch_size, shuffle=True, num_workers=CFG_B4.num_workers)
        valid_loader = make_loader(df_va, train_dir, get_valid_transforms(CFG_B4.img_size), batch_size=CFG_B4.batch_size, shuffle=False, num_workers=CFG_B4.num_workers)
        model = build_model_b4(num_classes=5)
        optimizer = torch.optim.AdamW(model.parameters(), lr=CFG_B4.lr, weight_decay=CFG_B4.weight_decay)
        steps_per_epoch = max(1, len(train_loader))
        scheduler = get_scheduler_b4(optimizer, steps_per_epoch)
        scaler = torch.amp.GradScaler('cuda', enabled=torch.cuda.is_available())
        mixup_fn = get_mixup_fn_b4()
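        # EMA copy lives on CPU to save GPU memory, so validation and OOF inference on ema.module also run on CPU
        ema = ModelEmaV2(model, decay=CFG_B4.ema_decay, device='cpu') if CFG_B4.use_ema else None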
        best_acc, best_state = -1.0, None
        no_improve = 0
        for epoch in range(CFG_B4.epochs):
            print(f'B4 Fold {fold} Epoch {epoch+1}/{CFG_B4.epochs}')
            tr_loss = train_epoch_b4(model, train_loader, optimizer, scaler, mixup_fn, ema=ema, scheduler=scheduler)
            val_loss, val_acc = validate_b4(ema.module if ema is not None else model, valid_loader)
            print(f'  train_loss {tr_loss:.4f} val_loss {val_loss:.4f} val_acc {val_acc:.4f}')
            improved = val_acc > best_acc + 1e-6
            if improved:
                best_acc = val_acc
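                # Clone the tensors: state_dict() returns live references that later updates would overwrite
                best_state = {k: v.detach().clone() for k, v in (ema.module if ema is not None else model).state_dict().items()}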
                no_improve = 0
                torch.save(best_state, f'ckpt_{CFG_B4.model_name}_{CFG_B4.img_size}_fold{fold}.pth')
            else:
                no_improve += 1
            # Note: with '>' training stops after patience + 1 consecutive non-improving epochs
            if (epoch + 1) >= CFG_B4.min_epochs and no_improve > CFG_B4.patience:
                print('  Early stopping triggered')
                break
        # OOF logits
        with torch.no_grad():
            target = ema.module if ema is not None else model
            target.load_state_dict(best_state)
            target.eval()
            dev = next(target.parameters()).device
            fold_logits = []
            for x, y in valid_loader:
                x = x.to(dev, non_blocking=True)
                if torch.cuda.is_available(): x = x.to(memory_format=torch.channels_last)
                with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                    lg = target(x)
                fold_logits.append(lg.float().cpu().numpy())
            fold_logits = np.concatenate(fold_logits, axis=0)
            oof_logits[va_idx] = fold_logits
        # Test logits for this fold
        fold_test_logits = infer_test(target, df_test, test_dir, size=CFG_B4.img_size, tta_hflip=True, tta_scales=CFG_B4.tta_scales, batch_size=CFG_B4.batch_size)
        test_logits_folds.append(fold_test_logits)
        np.save(f'test_logits_{CFG_B4.model_name}_{CFG_B4.img_size}_fold{fold}.npy', fold_test_logits)
        print(f'B4 Fold {fold} best_acc {best_acc:.4f}')
    test_logits_mean = np.mean(test_logits_folds, axis=0)
    np.save(f'test_logits_{CFG_B4.model_name}_{CFG_B4.img_size}.npy', test_logits_mean)
    np.save(f'oof_logits_{CFG_B4.model_name}_{CFG_B4.img_size}.npy', oof_logits)
    print(f'Saved test logits to test_logits_{CFG_B4.model_name}_{CFG_B4.img_size}.npy with shape {test_logits_mean.shape}')
    return test_logits_mean

print('B4 training cell ready. After Cell 5 finishes, run train_b4_with_groups() to produce test logits for ensembling.')

# Auto-start training for B4
test_logits_mean_b4 = train_b4_with_groups()

B4 training cell ready. After Cell 5 finishes, run train_b4_with_groups() to produce test logits for ensembling.


===== B4 Fold 0 start: train 14976 valid 3745 =====




B4 Fold 0 Epoch 1/12




  iter 0/624 loss 2.8598


  iter 50/624 loss 2.4599


  iter 100/624 loss 1.4088


  iter 150/624 loss 0.9404


  iter 200/624 loss 1.2813


  iter 250/624 loss 1.2813


  iter 300/624 loss 0.7952


  iter 350/624 loss 1.0220


  iter 400/624 loss 1.4439


  iter 450/624 loss 0.7891


  iter 500/624 loss 0.6243


  iter 550/624 loss 0.5917


  iter 600/624 loss 1.0702


  train_loss 1.2195 val_loss 1.8542 val_acc 0.2534
B4 Fold 0 Epoch 2/12


  iter 0/624 loss 0.5664


  iter 50/624 loss 0.8589


  iter 100/624 loss 0.4920


  iter 150/624 loss 0.7561


  iter 200/624 loss 0.7658


  iter 250/624 loss 0.8898


  iter 300/624 loss 0.5036


  iter 350/624 loss 0.6787


  iter 400/624 loss 0.9509


  iter 450/624 loss 0.5592


  iter 500/624 loss 0.9967


  iter 550/624 loss 0.9918


  iter 600/624 loss 0.5757


  train_loss 0.8079 val_loss 0.8106 val_acc 0.7268


B4 Fold 0 Epoch 3/12


  iter 0/624 loss 0.4146


  iter 50/624 loss 0.4813


  iter 100/624 loss 0.7312


  iter 150/624 loss 0.4637


  iter 200/624 loss 0.5062


  iter 250/624 loss 1.0919


  iter 300/624 loss 0.9653


  iter 350/624 loss 0.5733


  iter 400/624 loss 0.8218


  iter 450/624 loss 0.8217


  iter 500/624 loss 0.6980


  iter 550/624 loss 0.5797


  iter 600/624 loss 0.9332


  train_loss 0.7409 val_loss 0.5200 val_acc 0.8347


B4 Fold 0 Epoch 4/12


  iter 0/624 loss 0.6099


  iter 50/624 loss 0.8243


  iter 100/624 loss 0.3787


  iter 150/624 loss 0.6832


  iter 200/624 loss 0.6594


  iter 250/624 loss 0.9408


  iter 300/624 loss 0.4680


  iter 350/624 loss 0.5281


  iter 400/624 loss 0.8851


  iter 450/624 loss 1.1163


  iter 500/624 loss 0.3961


  iter 550/624 loss 0.6182


  iter 600/624 loss 0.6206


  train_loss 0.7130 val_loss 0.4598 val_acc 0.8563


B4 Fold 0 Epoch 5/12


  iter 0/624 loss 0.8376


  iter 50/624 loss 0.8608


  iter 100/624 loss 1.1773


  iter 150/624 loss 0.4363


  iter 200/624 loss 0.7144


  iter 250/624 loss 0.9424


  iter 300/624 loss 0.7422


  iter 350/624 loss 0.4064


  iter 400/624 loss 0.6730


  iter 450/624 loss 0.3810


  iter 500/624 loss 0.4688


  iter 550/624 loss 0.6547


  iter 600/624 loss 0.7008


  train_loss 0.6674 val_loss 0.4334 val_acc 0.8654


B4 Fold 0 Epoch 6/12


  iter 0/624 loss 0.6004


  iter 50/624 loss 0.3366


  iter 100/624 loss 0.7866


  iter 150/624 loss 0.6192


  iter 200/624 loss 0.9437


  iter 250/624 loss 0.9667


  iter 300/624 loss 0.3435


  iter 350/624 loss 0.7923


  iter 400/624 loss 0.9501


  iter 450/624 loss 0.4466


  iter 500/624 loss 0.2741


  iter 550/624 loss 1.0021


  iter 600/624 loss 0.4242


  train_loss 0.6465 val_loss 0.4227 val_acc 0.8697


B4 Fold 0 Epoch 7/12


  iter 0/624 loss 0.8698


  iter 50/624 loss 0.7297


  iter 100/624 loss 0.4523


  iter 150/624 loss 0.7297


  iter 200/624 loss 0.3671


  iter 250/624 loss 0.9921


  iter 300/624 loss 0.6289


  iter 350/624 loss 0.5449


  iter 400/624 loss 1.0670


  iter 450/624 loss 1.0288


  iter 500/624 loss 0.4036


  iter 550/624 loss 1.0997


  iter 600/624 loss 0.3263


  train_loss 0.6173 val_loss 0.4182 val_acc 0.8737


B4 Fold 0 Epoch 8/12


  iter 0/624 loss 0.3520


  iter 50/624 loss 1.1669


  iter 100/624 loss 0.4215


  iter 150/624 loss 0.3414


  iter 200/624 loss 0.8351


  iter 250/624 loss 0.6939


  iter 300/624 loss 0.8193


  iter 350/624 loss 0.4453


  iter 400/624 loss 1.0917


  iter 450/624 loss 0.5275


  iter 500/624 loss 0.5614


  iter 550/624 loss 0.4341


  iter 600/624 loss 0.7814


  train_loss 0.5838 val_loss 0.4174 val_acc 0.8745


B4 Fold 0 Epoch 9/12


  iter 0/624 loss 0.3479


  iter 50/624 loss 1.1903


  iter 100/624 loss 0.6718


  iter 150/624 loss 1.3395


  iter 200/624 loss 0.3003


  iter 250/624 loss 0.5904


  iter 300/624 loss 0.6687


  iter 350/624 loss 0.6713


  iter 400/624 loss 0.8683


  iter 450/624 loss 0.4642


  iter 500/624 loss 0.2994


  iter 550/624 loss 0.5921


  iter 600/624 loss 0.3333


  train_loss 0.5586 val_loss 0.4161 val_acc 0.8753


B4 Fold 0 Epoch 10/12


  iter 0/624 loss 0.4258


  iter 50/624 loss 1.0018


  iter 100/624 loss 0.3299


  iter 150/624 loss 0.3580


  iter 200/624 loss 0.3104


  iter 250/624 loss 0.5861


  iter 300/624 loss 0.3732


  iter 350/624 loss 0.8930


  iter 400/624 loss 1.0718


  iter 450/624 loss 0.7962


  iter 500/624 loss 0.3453


  iter 550/624 loss 0.3291


  iter 600/624 loss 1.0048


  train_loss 0.5339 val_loss 0.4218 val_acc 0.8732
B4 Fold 0 Epoch 11/12


  iter 0/624 loss 0.3225


  iter 50/624 loss 0.7798


  iter 100/624 loss 0.9691


  iter 150/624 loss 0.2894


  iter 200/624 loss 0.3564


  iter 250/624 loss 0.6277


  iter 300/624 loss 0.2550


  iter 350/624 loss 0.2729


  iter 400/624 loss 0.9995


  iter 450/624 loss 0.3263


  iter 500/624 loss 0.3136


  iter 550/624 loss 0.7349


  iter 600/624 loss 1.0206


  train_loss 0.5395 val_loss 0.4288 val_acc 0.8710
B4 Fold 0 Epoch 12/12


  iter 0/624 loss 0.7337


  iter 50/624 loss 0.4280


  iter 100/624 loss 1.0594


  iter 150/624 loss 0.2916


  iter 200/624 loss 0.2981


  iter 250/624 loss 0.4132


  iter 300/624 loss 0.3268


  iter 350/624 loss 0.2585


  iter 400/624 loss 0.4346


  iter 450/624 loss 0.2966


  iter 500/624 loss 0.3134


  iter 550/624 loss 0.7766


  iter 600/624 loss 0.4478


  train_loss 0.5464 val_loss 0.4308 val_acc 0.8705
  Early stopping triggered




B4 Fold 0 best_acc 0.8753
===== B4 Fold 1 start: train 14977 valid 3744 =====




B4 Fold 1 Epoch 1/12




  iter 0/624 loss 2.5807


  iter 50/624 loss 2.5666


  iter 100/624 loss 1.7274


  iter 150/624 loss 1.1595


  iter 200/624 loss 1.1549


  iter 250/624 loss 1.2383


  iter 300/624 loss 1.0668


  iter 350/624 loss 0.7454


  iter 400/624 loss 0.6361


  iter 450/624 loss 1.0835


  iter 500/624 loss 0.8163


  iter 550/624 loss 0.7569


  iter 600/624 loss 0.7244


  train_loss 1.2485 val_loss 1.6423 val_acc 0.3381
B4 Fold 1 Epoch 2/12


  iter 0/624 loss 0.6675


  iter 50/624 loss 0.6895


  iter 100/624 loss 0.7291


  iter 150/624 loss 0.7679


  iter 200/624 loss 0.9554


  iter 250/624 loss 0.7861


  iter 300/624 loss 1.1115


  iter 350/624 loss 1.6729


  iter 400/624 loss 0.5150


  iter 450/624 loss 0.7879


  iter 500/624 loss 0.9961


  iter 550/624 loss 0.8547


  iter 600/624 loss 0.4588


  train_loss 0.8025 val_loss 0.8350 val_acc 0.7067


B4 Fold 1 Epoch 3/12


  iter 0/624 loss 0.6747


  iter 50/624 loss 0.5128


  iter 100/624 loss 0.9864


  iter 150/624 loss 1.5100


  iter 200/624 loss 0.7673


  iter 250/624 loss 0.5119


  iter 300/624 loss 0.6937


  iter 350/624 loss 1.0602


  iter 400/624 loss 0.9791


  iter 450/624 loss 0.7194


  iter 500/624 loss 1.0391


  iter 550/624 loss 1.0140


  iter 600/624 loss 0.7802


  train_loss 0.7359 val_loss 0.5410 val_acc 0.8267
B4 Fold 1 Epoch 4/12


  iter 0/624 loss 0.5051


  iter 50/624 loss 0.7131


  iter 100/624 loss 1.1132


  iter 150/624 loss 0.4124


  iter 200/624 loss 0.9462


  iter 250/624 loss 0.8354


  iter 300/624 loss 0.9247


  iter 350/624 loss 0.4785


  iter 400/624 loss 0.3591


  iter 450/624 loss 1.0075


  iter 500/624 loss 0.4789


  iter 550/624 loss 0.7313


  iter 600/624 loss 1.1887


  train_loss 0.6988 val_loss 0.4708 val_acc 0.8443


B4 Fold 1 Epoch 5/12


  iter 0/624 loss 0.3830


  iter 50/624 loss 0.9133


  iter 100/624 loss 0.4759


  iter 150/624 loss 0.4308


  iter 200/624 loss 0.4623


  iter 250/624 loss 0.4139


  iter 300/624 loss 0.6126


  iter 350/624 loss 0.7333


  iter 400/624 loss 0.6609


  iter 450/624 loss 0.6689


  iter 500/624 loss 0.9543


  iter 550/624 loss 0.6913


  iter 600/624 loss 0.3245


  train_loss 0.6642 val_loss 0.4486 val_acc 0.8563


B4 Fold 1 Epoch 6/12


  iter 0/624 loss 0.5952


  iter 50/624 loss 0.4331


  iter 100/624 loss 0.9494


  iter 150/624 loss 1.3290


  iter 200/624 loss 0.3954


  iter 250/624 loss 0.8231


  iter 300/624 loss 0.4579


  iter 350/624 loss 0.5735


  iter 400/624 loss 0.5629


  iter 450/624 loss 0.3510


  iter 500/624 loss 0.4100


  iter 550/624 loss 0.7675


  iter 600/624 loss 0.4052


  train_loss 0.6404 val_loss 0.4409 val_acc 0.8643


B4 Fold 1 Epoch 7/12


  iter 0/624 loss 0.8547


  iter 50/624 loss 0.5066


  iter 100/624 loss 0.4056


  iter 150/624 loss 0.3370


  iter 200/624 loss 0.3837


  iter 250/624 loss 0.8546


  iter 300/624 loss 0.3773


  iter 350/624 loss 0.3659


  iter 400/624 loss 0.3523


  iter 450/624 loss 0.9304


  iter 500/624 loss 0.4791


  iter 550/624 loss 0.8368


  iter 600/624 loss 0.7717


  train_loss 0.6267 val_loss 0.4278 val_acc 0.8681


B4 Fold 1 Epoch 8/12


  iter 0/624 loss 0.4596


  iter 50/624 loss 0.4646


  iter 100/624 loss 0.4608


  iter 150/624 loss 1.4210


  iter 200/624 loss 0.7096


  iter 250/624 loss 0.5464


  iter 300/624 loss 0.5303


  iter 350/624 loss 0.4812


  iter 400/624 loss 0.6138


  iter 450/624 loss 0.8766


  iter 500/624 loss 0.2797


  iter 550/624 loss 0.5106


  iter 600/624 loss 0.4312


  train_loss 0.5913 val_loss 0.4201 val_acc 0.8705


B4 Fold 1 Epoch 9/12


  iter 0/624 loss 1.0323


  iter 50/624 loss 0.3573


  iter 100/624 loss 0.9234


  iter 150/624 loss 1.0706


  iter 200/624 loss 0.3177


  iter 250/624 loss 0.5252


  iter 300/624 loss 0.4162


  iter 350/624 loss 0.3973


  iter 400/624 loss 0.5768


  iter 450/624 loss 0.6433


  iter 500/624 loss 0.6428


  iter 550/624 loss 0.3337


  iter 600/624 loss 1.2804


  train_loss 0.5769 val_loss 0.4200 val_acc 0.8681
B4 Fold 1 Epoch 10/12


  iter 0/624 loss 0.9897


  iter 50/624 loss 0.3495


  iter 100/624 loss 0.8251


  iter 150/624 loss 0.7892


  iter 200/624 loss 0.2654


  iter 250/624 loss 0.9011


  iter 300/624 loss 0.5009


  iter 350/624 loss 0.4515


  iter 400/624 loss 0.4798


  iter 450/624 loss 0.2966


  iter 500/624 loss 0.2827


  iter 550/624 loss 0.2716


  iter 600/624 loss 0.2697


  train_loss 0.5604 val_loss 0.4271 val_acc 0.8673
B4 Fold 1 Epoch 11/12


  iter 0/624 loss 1.3657


  iter 50/624 loss 0.4716


  iter 100/624 loss 1.0095


  iter 150/624 loss 0.6460


  iter 200/624 loss 0.2720


  iter 250/624 loss 1.2069


  iter 300/624 loss 0.3108


  iter 350/624 loss 0.2967


  iter 400/624 loss 0.3045


  iter 450/624 loss 0.9712


  iter 500/624 loss 0.4657


  iter 550/624 loss 0.6574


  iter 600/624 loss 0.5938


  train_loss 0.5445 val_loss 0.4294 val_acc 0.8649
  Early stopping triggered




B4 Fold 1 best_acc 0.8705
===== B4 Fold 2 start: train 14977 valid 3744 =====




B4 Fold 2 Epoch 1/12




  iter 0/624 loss 3.2823


  iter 50/624 loss 2.4373


  iter 100/624 loss 2.0645


  iter 150/624 loss 1.1671


  iter 200/624 loss 1.2609


  iter 250/624 loss 1.2238


  iter 300/624 loss 1.1322


  iter 350/624 loss 1.1191


  iter 400/624 loss 1.1931


  iter 450/624 loss 0.9101


  iter 500/624 loss 0.9517


  iter 550/624 loss 1.0420


  iter 600/624 loss 0.7863


  train_loss 1.2444 val_loss 1.4976 val_acc 0.5067
B4 Fold 2 Epoch 2/12


  iter 0/624 loss 0.7143


  iter 50/624 loss 1.4037


  iter 100/624 loss 0.9903


  iter 150/624 loss 0.7561


  iter 200/624 loss 0.7297


  iter 250/624 loss 0.7224


  iter 300/624 loss 1.2630


  iter 350/624 loss 0.6235


  iter 400/624 loss 0.7955


  iter 450/624 loss 1.0479


  iter 500/624 loss 0.6435


  iter 550/624 loss 1.1647


  iter 600/624 loss 0.9737


  train_loss 0.7971 val_loss 0.8293 val_acc 0.7179


B4 Fold 2 Epoch 3/12


  iter 0/624 loss 0.6678


  iter 50/624 loss 0.7156


  iter 100/624 loss 0.7257


  iter 150/624 loss 0.9279


  iter 200/624 loss 1.2992


  iter 250/624 loss 1.5059


  iter 300/624 loss 0.8524


  iter 350/624 loss 0.5373


  iter 400/624 loss 0.6833


  iter 450/624 loss 0.7235


  iter 500/624 loss 0.4643


  iter 550/624 loss 0.7866


  iter 600/624 loss 0.4438


  train_loss 0.7200 val_loss 0.5433 val_acc 0.8178


B4 Fold 2 Epoch 4/12


  iter 0/624 loss 0.3961


  iter 50/624 loss 0.5953


  iter 100/624 loss 0.6044


  iter 150/624 loss 0.9088


  iter 200/624 loss 0.3569


  iter 250/624 loss 0.4337


  iter 300/624 loss 0.4815


  iter 350/624 loss 0.4172


  iter 400/624 loss 0.6568


  iter 450/624 loss 0.4804


  iter 500/624 loss 0.6747


  iter 550/624 loss 0.4406


  iter 600/624 loss 1.2808


  train_loss 0.6716 val_loss 0.4476 val_acc 0.8536


B4 Fold 2 Epoch 5/12


  iter 0/624 loss 0.4572


  iter 50/624 loss 0.4440


  iter 100/624 loss 1.1431


  iter 150/624 loss 0.8025


  iter 200/624 loss 0.4993


  iter 250/624 loss 0.5181


  iter 300/624 loss 0.5710


  iter 350/624 loss 0.5035


  iter 400/624 loss 0.5279


  iter 450/624 loss 0.5081


  iter 500/624 loss 0.4357


  iter 550/624 loss 0.9372


  iter 600/624 loss 1.0638


  train_loss 0.6591 val_loss 0.4244 val_acc 0.8598


B4 Fold 2 Epoch 6/12


  iter 0/624 loss 0.7193


  iter 50/624 loss 0.3204


  iter 100/624 loss 0.9456


  iter 150/624 loss 0.3776


  iter 200/624 loss 1.1510


  iter 250/624 loss 1.2231


  iter 300/624 loss 0.3289


  iter 350/624 loss 0.8368


  iter 400/624 loss 0.3791


  iter 450/624 loss 0.7906


  iter 500/624 loss 1.1951


  iter 550/624 loss 0.9936


  iter 600/624 loss 0.4046


  train_loss 0.6315 val_loss 0.4135 val_acc 0.8643


B4 Fold 2 Epoch 7/12


  iter 0/624 loss 0.4195


  iter 50/624 loss 0.4636


  iter 100/624 loss 0.4405


  iter 150/624 loss 1.0249


  iter 200/624 loss 0.9926


  iter 250/624 loss 0.3559


  iter 300/624 loss 0.8504


  iter 350/624 loss 0.9142


  iter 400/624 loss 0.8698


  iter 450/624 loss 0.5853


  iter 500/624 loss 0.7553


  iter 550/624 loss 0.4606


  iter 600/624 loss 0.6592


  train_loss 0.6098 val_loss 0.4096 val_acc 0.8662


B4 Fold 2 Epoch 8/12


  iter 0/624 loss 0.6131


  iter 50/624 loss 0.5608


  iter 100/624 loss 0.3689


  iter 150/624 loss 0.5882


  iter 200/624 loss 0.4204


  iter 250/624 loss 0.7373


  iter 300/624 loss 0.4251


  iter 350/624 loss 0.4275


  iter 400/624 loss 0.6697


  iter 450/624 loss 0.7727


  iter 500/624 loss 1.2562


  iter 550/624 loss 0.3885


  iter 600/624 loss 0.3613


  train_loss 0.5940 val_loss 0.4133 val_acc 0.8657
B4 Fold 2 Epoch 9/12


  iter 0/624 loss 0.7012


  iter 50/624 loss 0.9961


  iter 100/624 loss 0.3149


  iter 150/624 loss 0.3199


  iter 200/624 loss 0.3948


  iter 250/624 loss 0.7641


  iter 300/624 loss 0.3053


  iter 350/624 loss 0.4013


  iter 400/624 loss 0.7414


  iter 450/624 loss 0.4388


  iter 500/624 loss 1.1235


  iter 550/624 loss 0.9661


  iter 600/624 loss 0.3185


  train_loss 0.5560 val_loss 0.4191 val_acc 0.8659
B4 Fold 2 Epoch 10/12


  iter 0/624 loss 0.4488


  iter 50/624 loss 0.9419


  iter 100/624 loss 0.3004


  iter 150/624 loss 0.2998


  iter 200/624 loss 0.4527


  iter 250/624 loss 0.7149


  iter 300/624 loss 0.7855


  iter 350/624 loss 0.3917


  iter 400/624 loss 0.3685


  iter 450/624 loss 0.9699


  iter 500/624 loss 0.3162


  iter 550/624 loss 0.6308


  iter 600/624 loss 0.5526


  train_loss 0.5330 val_loss 0.4237 val_acc 0.8635
  Early stopping triggered




B4 Fold 2 best_acc 0.8662
===== B4 Fold 3 start: train 14977 valid 3744 =====




B4 Fold 3 Epoch 1/12




  iter 0/624 loss 2.9978


  iter 50/624 loss 2.3268


  iter 100/624 loss 1.8944


  iter 150/624 loss 1.0351


  iter 200/624 loss 1.4867


  iter 250/624 loss 1.0475


  iter 300/624 loss 1.1041


  iter 350/624 loss 1.3236


  iter 400/624 loss 0.9282


  iter 450/624 loss 1.3146


  iter 500/624 loss 1.0877


  iter 550/624 loss 0.9570


  iter 600/624 loss 0.7905


  train_loss 1.2675 val_loss 2.2064 val_acc 0.1774
B4 Fold 3 Epoch 2/12


  iter 0/624 loss 0.6991


  iter 50/624 loss 0.7812


  iter 100/624 loss 1.2914


  iter 150/624 loss 0.4866


  iter 200/624 loss 0.5422


  iter 250/624 loss 0.4796


  iter 300/624 loss 0.7737


  iter 350/624 loss 0.6159


  iter 400/624 loss 1.2298


  iter 450/624 loss 0.9528


  iter 500/624 loss 0.6732


  iter 550/624 loss 0.6675


  iter 600/624 loss 0.4538


  train_loss 0.8103 val_loss 0.8667 val_acc 0.6939


B4 Fold 3 Epoch 3/12


  iter 0/624 loss 0.8488


  iter 50/624 loss 0.5250


  iter 100/624 loss 0.5296


  iter 150/624 loss 1.2711


  iter 200/624 loss 0.4882


  iter 250/624 loss 1.1991


  iter 300/624 loss 0.7848


  iter 350/624 loss 1.1209


  iter 400/624 loss 0.6631


  iter 450/624 loss 0.4935


  iter 500/624 loss 0.6834


  iter 550/624 loss 0.8495


  iter 600/624 loss 0.4530


  train_loss 0.7383 val_loss 0.5184 val_acc 0.8435
B4 Fold 3 Epoch 4/12


  iter 0/624 loss 0.5885


  iter 50/624 loss 0.5532


  iter 100/624 loss 0.6033


  iter 150/624 loss 1.1110


  iter 200/624 loss 1.1230


  iter 250/624 loss 0.4114


  iter 300/624 loss 1.0042


  iter 350/624 loss 0.7050


  iter 400/624 loss 0.4624


  iter 450/624 loss 0.4710


  iter 500/624 loss 0.4370


  iter 550/624 loss 0.4296


  iter 600/624 loss 0.7027


  train_loss 0.7001 val_loss 0.4410 val_acc 0.8665


B4 Fold 3 Epoch 5/12


  iter 0/624 loss 0.5852


  iter 50/624 loss 0.6956


  iter 100/624 loss 0.7647


  iter 150/624 loss 0.6936


  iter 200/624 loss 0.8224


  iter 250/624 loss 0.4598


  iter 300/624 loss 0.4403


  iter 350/624 loss 0.4846


  iter 400/624 loss 1.0856


  iter 450/624 loss 0.5693


  iter 500/624 loss 0.7948


  iter 550/624 loss 0.9199


  iter 600/624 loss 0.5740


  train_loss 0.6771 val_loss 0.4235 val_acc 0.8731


B4 Fold 3 Epoch 6/12


  iter 0/624 loss 0.6542


  iter 50/624 loss 0.4021


  iter 100/624 loss 0.9111


  iter 150/624 loss 0.4878


  iter 200/624 loss 0.4572


  iter 250/624 loss 0.3878


  iter 300/624 loss 0.6263


  iter 350/624 loss 0.7617


  iter 400/624 loss 0.3173


  iter 450/624 loss 0.9207


  iter 500/624 loss 0.5378


  iter 550/624 loss 0.3961


  iter 600/624 loss 1.0258


  train_loss 0.6504 val_loss 0.4169 val_acc 0.8750


B4 Fold 3 Epoch 7/12


  iter 0/624 loss 0.3050


  iter 50/624 loss 0.7794


  iter 100/624 loss 0.4437


  iter 150/624 loss 0.8183


  iter 200/624 loss 0.5579


  iter 250/624 loss 0.4638


  iter 300/624 loss 0.4266


  iter 350/624 loss 0.4494


  iter 400/624 loss 0.3563


  iter 450/624 loss 0.4812


  iter 500/624 loss 0.4563


  iter 550/624 loss 0.5301


  iter 600/624 loss 0.3933


  train_loss 0.6044 val_loss 0.4117 val_acc 0.8718
B4 Fold 3 Epoch 8/12


  iter 0/624 loss 0.2826


  iter 50/624 loss 0.4906


  iter 100/624 loss 0.3689


  iter 150/624 loss 0.8096


  iter 200/624 loss 0.9808


  iter 250/624 loss 0.3584


  iter 300/624 loss 0.4312


  iter 350/624 loss 0.8199


  iter 400/624 loss 0.9001


  iter 450/624 loss 0.4140


  iter 500/624 loss 0.5555


  iter 550/624 loss 0.9521


  iter 600/624 loss 0.5451


  train_loss 0.6042 val_loss 0.4135 val_acc 0.8723
B4 Fold 3 Epoch 9/12


  iter 0/624 loss 0.3114


  iter 50/624 loss 0.4029


  iter 100/624 loss 1.1603


  iter 150/624 loss 0.3250


  iter 200/624 loss 0.4313


  iter 250/624 loss 0.6652


  iter 300/624 loss 0.9580


  iter 350/624 loss 0.3752


  iter 400/624 loss 0.4482


  iter 450/624 loss 0.8609


  iter 500/624 loss 0.3155


  iter 550/624 loss 0.2688


  iter 600/624 loss 1.3286


  train_loss 0.5632 val_loss 0.4162 val_acc 0.8715
  Early stopping triggered




B4 Fold 3 best_acc 0.8750
===== B4 Fold 4 start: train 14977 valid 3744 =====




B4 Fold 4 Epoch 1/12


  iter 0/624 loss 3.1940


  iter 50/624 loss 2.5770


  iter 100/624 loss 1.4517


  iter 150/624 loss 0.9734


  iter 200/624 loss 1.4398


  iter 250/624 loss 1.1588


  iter 300/624 loss 0.8997


  iter 350/624 loss 0.6635


  iter 400/624 loss 1.1189


  iter 450/624 loss 1.5604


  iter 500/624 loss 0.8363


  iter 550/624 loss 0.8168


  iter 600/624 loss 0.7419


  train_loss 1.2465 val_loss 1.1654 val_acc 0.5855
B4 Fold 4 Epoch 2/12


  iter 0/624 loss 0.7122


  iter 50/624 loss 0.5386


  iter 100/624 loss 0.9397


  iter 150/624 loss 0.5838


  iter 200/624 loss 0.8413


  iter 250/624 loss 0.5485


  iter 300/624 loss 1.3687


  iter 350/624 loss 1.3984


  iter 400/624 loss 1.1732


  iter 450/624 loss 1.1624


  iter 500/624 loss 0.8518


  iter 550/624 loss 0.6621


  iter 600/624 loss 0.9896


  train_loss 0.8149 val_loss 0.7389 val_acc 0.7372


B4 Fold 4 Epoch 3/12


  iter 0/624 loss 0.9738


  iter 50/624 loss 0.9241


  iter 100/624 loss 0.9023


  iter 150/624 loss 0.6743


  iter 200/624 loss 0.5874


  iter 250/624 loss 0.8252


  iter 300/624 loss 0.5360


  iter 350/624 loss 0.3551


  iter 400/624 loss 0.7765


  iter 450/624 loss 0.4645


  iter 500/624 loss 1.0762


  iter 550/624 loss 0.8032


  iter 600/624 loss 0.5363


  train_loss 0.7182 val_loss 0.5028 val_acc 0.8355


B4 Fold 4 Epoch 4/12


  iter 0/624 loss 0.7358


  iter 50/624 loss 0.6965


  iter 100/624 loss 0.7655


  iter 150/624 loss 0.4263


  iter 200/624 loss 0.4994


  iter 250/624 loss 0.5813


  iter 300/624 loss 0.7240


  iter 350/624 loss 0.6159


  iter 400/624 loss 0.5846


  iter 450/624 loss 0.5165


  iter 500/624 loss 0.9528


  iter 550/624 loss 0.7258


  iter 600/624 loss 0.7191


  train_loss 0.7015 val_loss 0.4405 val_acc 0.8576
B4 Fold 4 Epoch 5/12


  iter 0/624 loss 0.3216


  iter 50/624 loss 0.8441


  iter 100/624 loss 1.0773


  iter 150/624 loss 0.3798


  iter 200/624 loss 1.2502


  iter 250/624 loss 0.9454


  iter 300/624 loss 0.4166


  iter 350/624 loss 1.0986


  iter 400/624 loss 0.8147


  iter 450/624 loss 1.1123


  iter 500/624 loss 0.7952


  iter 550/624 loss 0.4023


  iter 600/624 loss 0.7655


  train_loss 0.6543 val_loss 0.4191 val_acc 0.8683


B4 Fold 4 Epoch 6/12


  iter 0/624 loss 0.4174


  iter 50/624 loss 0.4470


  iter 100/624 loss 0.5015


  iter 150/624 loss 0.3117


  iter 200/624 loss 0.9164


  iter 250/624 loss 1.3831


  iter 300/624 loss 0.4397


  iter 350/624 loss 0.3939


  iter 400/624 loss 0.7984


  iter 450/624 loss 1.0101


  iter 500/624 loss 0.3139


  iter 550/624 loss 0.4252


  iter 600/624 loss 0.3518


  train_loss 0.6263 val_loss 0.4075 val_acc 0.8683
B4 Fold 4 Epoch 7/12


  iter 0/624 loss 0.3002


  iter 50/624 loss 0.5934


  iter 100/624 loss 0.4732


  iter 150/624 loss 1.3081


  iter 200/624 loss 0.6225


  iter 250/624 loss 0.5077


  iter 300/624 loss 0.3407


  iter 350/624 loss 0.5997


  iter 400/624 loss 1.1783


  iter 450/624 loss 0.3712


  iter 500/624 loss 0.6992


  iter 550/624 loss 0.5698


  iter 600/624 loss 0.3891


  train_loss 0.6268 val_loss 0.4077 val_acc 0.8710


B4 Fold 4 Epoch 8/12


  iter 0/624 loss 0.8565


  iter 50/624 loss 0.6147


  iter 100/624 loss 0.4916


  iter 150/624 loss 0.4221


  iter 200/624 loss 0.9570


  iter 250/624 loss 0.7501


  iter 300/624 loss 0.8455


  iter 350/624 loss 0.2706


  iter 400/624 loss 0.9076


  iter 450/624 loss 0.5072


  iter 500/624 loss 0.3204


  iter 550/624 loss 0.8560


  iter 600/624 loss 0.4101


  train_loss 0.5871 val_loss 0.4133 val_acc 0.8702
B4 Fold 4 Epoch 9/12


  iter 0/624 loss 0.2667


  iter 50/624 loss 1.1506


  iter 100/624 loss 0.8952


  iter 150/624 loss 0.8380


  iter 200/624 loss 0.6669


  iter 250/624 loss 0.8083


  iter 300/624 loss 0.4114


  iter 350/624 loss 1.1431


  iter 400/624 loss 1.0856


  iter 450/624 loss 0.8886


  iter 500/624 loss 0.4460


  iter 550/624 loss 0.3116


  iter 600/624 loss 1.1394


  train_loss 0.5790 val_loss 0.4241 val_acc 0.8721
B4 Fold 4 Epoch 10/12


  iter 0/624 loss 0.3049


  iter 50/624 loss 0.3819


  iter 100/624 loss 0.8468


  iter 150/624 loss 0.2959


  iter 200/624 loss 0.6345


  iter 250/624 loss 0.5052


  iter 300/624 loss 0.7154


  iter 350/624 loss 0.5091


  iter 400/624 loss 0.2679


  iter 450/624 loss 0.6548


  iter 500/624 loss 0.4368


  iter 550/624 loss 0.8087


  iter 600/624 loss 0.3114


  train_loss 0.5545 val_loss 0.4319 val_acc 0.8713
B4 Fold 4 Epoch 11/12


  iter 0/624 loss 1.1243


  iter 50/624 loss 0.3806


  iter 100/624 loss 0.2676


  iter 150/624 loss 0.2830


  iter 200/624 loss 0.7178


  iter 250/624 loss 0.2713


  iter 300/624 loss 0.3465


  iter 350/624 loss 0.9792


  iter 400/624 loss 0.3344


  iter 450/624 loss 0.8262


  iter 500/624 loss 0.4426


  iter 550/624 loss 0.3436


  iter 600/624 loss 0.2514


  train_loss 0.5505 val_loss 0.4370 val_acc 0.8694
B4 Fold 4 Epoch 12/12


  iter 0/624 loss 0.8564


  iter 50/624 loss 0.8863


  iter 100/624 loss 0.3461


  iter 150/624 loss 0.3093


  iter 200/624 loss 0.3103


  iter 250/624 loss 0.6685


  iter 300/624 loss 0.9020


  iter 350/624 loss 0.7570


  iter 400/624 loss 0.2982


  iter 450/624 loss 0.6664


  iter 500/624 loss 0.2918


  iter 550/624 loss 0.6268


  iter 600/624 loss 0.9666


  train_loss 0.5509 val_loss 0.4418 val_acc 0.8691
  Early stopping triggered




B4 Fold 4 best_acc 0.8721
Saved test logits to test_logits_tf_efficientnet_b4_ns_512.npy with shape (2676, 5)
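
The early-stopping rule in the training loop can be replayed on the validation accuracies logged above. The following is a minimal sketch, not the notebook's own code: `replay_early_stopping` is a hypothetical helper, `patience=2` is inferred from the logs (the `CFG_B4` values are not shown in this part of the notebook), and `min_epochs=1` is a placeholder. Because the test is `no_improve > patience`, training stops only after the third consecutive epoch without improvement.

```python
def replay_early_stopping(val_accs, patience=2, min_epochs=1, eps=1e-6):
    """Return (stop_epoch, best_epoch, best_acc) using 1-based epoch numbers."""
    best_acc, best_epoch, no_improve = -1.0, 0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc + eps:
            # New best: remember it and reset the stale-epoch counter
            best_acc, best_epoch, no_improve = acc, epoch, 0
        else:
            no_improve += 1
        # Same test as the training loop: strictly more than `patience` stale epochs
        if epoch >= min_epochs and no_improve > patience:
            return epoch, best_epoch, best_acc
    return len(val_accs), best_epoch, best_acc


# val_acc values copied from the Fold 3 log above
fold3 = [0.1774, 0.6939, 0.8435, 0.8665, 0.8731, 0.8750, 0.8718, 0.8723, 0.8715]
stop_epoch, best_epoch, best_acc = replay_early_stopping(fold3)
print(f'stopped after epoch {stop_epoch}, best epoch {best_epoch}, best_acc {best_acc:.4f}')
```

Under these assumptions the replay matches the Fold 3 log: the best accuracy comes from epoch 6 and the stop fires after epoch 9.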


In [None]:
# Ensemble utility: blend convnext and effnet logits to submission
import numpy as np, pandas as pd, shutil
from pathlib import Path

def make_df_test():
    return pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})

def blend_and_write_submission(weights=(0.5, 0.5),
                               cnx_path='test_logits_convnext_base_448.npy',
                               b4_path='test_logits_tf_efficientnet_b4_ns_512.npy',
                               out_csv='submission_blend.csv'):
    w0, w1 = weights
    assert abs(w0 + w1 - 1.0) < 1e-6, 'weights must sum to 1'
    paths = [cnx_path, b4_path]
    avail = [Path(p).exists() for p in paths]
    if not all(avail):
        missing = [p for p, ok in zip(paths, avail) if not ok]
        print('Missing logits:', missing)
        return None
    L0 = np.load(cnx_path)
    L1 = np.load(b4_path)
    assert L0.shape == L1.shape, f'Logit shapes differ: {L0.shape} vs {L1.shape}'
    logits = w0 * L0 + w1 * L1
    preds = logits.argmax(1).astype(int)
    df_test = make_df_test()
    assert len(df_test) == len(preds), f'Test length mismatch: {len(df_test)} vs {len(preds)}'
    sub = pd.DataFrame({'image_id': df_test['image_id'], 'label': preds})
    sub.to_csv(out_csv, index=False)
    print(f'Saved {out_csv} (weights={weights}) shape:', sub.shape)
    return out_csv

print('Ensembling cell ready. After both logits .npy exist, call blend_and_write_submission().')

# Auto-blend when both logits are present; also copy to submission.csv for grading
cnx_p = 'test_logits_convnext_base_448.npy'
b4_p = 'test_logits_tf_efficientnet_b4_ns_512.npy'
if Path(cnx_p).exists() and Path(b4_p).exists():
    out = blend_and_write_submission((0.5, 0.5), cnx_path=cnx_p, b4_path=b4_p, out_csv='submission_blend.csv')
    if out is not None:
        shutil.copyfile(out, 'submission.csv')
        print('Copied blend to submission.csv')
else:
    print('Blend not run yet: waiting for both logits .npy files to exist.')
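
`blend_and_write_submission` averages raw logits, while the OOF-tuned cell that follows averages softmax probabilities. The two rules can pick different classes for the same inputs. The sketch below shows this with made-up logits for two hypothetical models (the numbers are illustrative, not taken from ConvNeXt or B4) and uses only the standard library.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Illustrative 3-class logits for two hypothetical models
logits_a = [2.0, 0.0, 0.0]    # model A favours class 0
logits_b = [-10.0, 0.2, 0.0]  # model B strongly rules out class 0

w = 0.5
blend_logits = [w * a + (1 - w) * b for a, b in zip(logits_a, logits_b)]
blend_probs = [w * pa + (1 - w) * pb
               for pa, pb in zip(softmax(logits_a), softmax(logits_b))]

print('logit blend picks class', argmax(blend_logits))
print('probability blend picks class', argmax(blend_probs))
```

With logit averaging, one model's very negative logit can veto a class outright. With probability averaging, each model's contribution per class is bounded to [0, 1], so a confident model is harder to overrule.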

In [7]:
# OOF-based probability blend with optional temperature scaling; writes final submission.csv
import numpy as np, pandas as pd, shutil, os
from pathlib import Path
from sklearn.metrics import log_loss, accuracy_score

def softmax_np(x):
    x = x - x.max(axis=1, keepdims=True)
    ex = np.exp(x)
    return ex / ex.sum(axis=1, keepdims=True)

def nll_loss(probs, y_true):
    probs = np.clip(probs, 1e-9, 1.0)
    return log_loss(y_true, probs, labels=[0,1,2,3,4])

def temperature_scale_logits(logits, T):
    return logits / float(T)

def find_temperature_on_oof(logits, y_true, T_grid=None):
    if T_grid is None:
        T_grid = [round(t,2) for t in np.arange(0.7, 1.81, 0.1)]
    best_T, best_nll = 1.0, 1e9
    for T in T_grid:
        probs = softmax_np(temperature_scale_logits(logits, T))
        nll = nll_loss(probs, y_true)
        if nll < best_nll:
            best_nll, best_T = nll, T
    return best_T, best_nll

def grid_search_weight(P0, P1, y_true, w_grid=None):
    if w_grid is None:
        # Include endpoints 0.0 (B4-only) and 1.0 (ConvNeXt-only)
        w_grid = [round(w,2) for w in np.arange(0.0, 1.0001, 0.01)]
    best_w, best_acc = 0.5, -1
    for w in w_grid:
        P = w * P0 + (1 - w) * P1
        preds = P.argmax(1)
        acc = accuracy_score(y_true, preds)
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_w, best_acc

def run_oof_tuned_blend_and_submit(
    cnx_oof_path='oof_logits_convnext_base_448.npy',
    b4_oof_path='oof_logits_tf_efficientnet_b4_ns_512.npy',
    cnx_test_path='test_logits_convnext_base_448.npy',
    b4_test_path='test_logits_tf_efficientnet_b4_ns_512.npy',
    do_temperature=True,
    out_csv='submission_blend_optimized.csv'
):
    paths = [cnx_oof_path, b4_oof_path, cnx_test_path, b4_test_path]
    missing = [p for p in paths if not Path(p).exists()]
    if missing:
        print('Missing artifacts, cannot blend yet:', missing)
        return None
    # Load
    L0_oof = np.load(cnx_oof_path)
    L1_oof = np.load(b4_oof_path)
    L0_test = np.load(cnx_test_path)
    L1_test = np.load(b4_test_path)
    assert L0_oof.shape == L1_oof.shape, f'OOF shapes mismatch: {L0_oof.shape} vs {L1_oof.shape}'
    assert L0_test.shape == L1_test.shape, f'Test shapes mismatch: {L0_test.shape} vs {L1_test.shape}'
    # True labels from current df (aligned with training order)
    y_true = df['label'].to_numpy().astype(int)
    assert len(y_true) == L0_oof.shape[0], 'OOF rows must match df length'
    # Optional temperature scaling per model (fit on OOF NLL)
    if do_temperature:
        T0, nll0 = find_temperature_on_oof(L0_oof, y_true)
        T1, nll1 = find_temperature_on_oof(L1_oof, y_true)
        print(f'ConvNeXt T*={T0} OOF-NLL={nll0:.4f}; B4 T*={T1} OOF-NLL={nll1:.4f}')
        L0_oof, L0_test = L0_oof / T0, L0_test / T0
        L1_oof, L1_test = L1_oof / T1, L1_test / T1
    # Convert to probabilities
    P0_oof = softmax_np(L0_oof)
    P1_oof = softmax_np(L1_oof)
    # Weight search on OOF accuracy
    w_best, acc_best = grid_search_weight(P0_oof, P1_oof, y_true)
    print(f'Best OOF blend weight w (convnext) = {w_best:.2f}, OOF-acc={acc_best:.5f}')
    # Apply to test
    P0_test = softmax_np(L0_test)
    P1_test = softmax_np(L1_test)
    P_test = w_best * P0_test + (1 - w_best) * P1_test
    preds = P_test.argmax(1).astype(int)
    df_test = pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})
    assert len(df_test) == len(preds), 'Test length mismatch'
    sub = pd.DataFrame({'image_id': df_test['image_id'], 'label': preds})
    sub.to_csv(out_csv, index=False)
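    # shutil.copyfile raises SameFileError when source and destination are the same file
    if Path(out_csv).resolve() != Path('submission.csv').resolve():
        shutil.copyfile(out_csv, 'submission.csv')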
    print('Saved', out_csv, 'and copied to submission.csv. Shape:', sub.shape)
    return out_csv, w_best, acc_best

print('OOF-tuned blending cell ready. After B4 finishes and npys exist, call:')
print("run_oof_tuned_blend_and_submit(out_csv='submission.csv')")

OOF-tuned blending cell ready. After B4 finishes and npys exist, call:
run_oof_tuned_blend_and_submit(out_csv='submission.csv')


In [10]:
# Execute OOF-tuned blend and write submission.csv
res = run_oof_tuned_blend_and_submit(out_csv='submission_blend_optimized.csv')
print('Blend result:', res)
check_submission_format('submission.csv')

ConvNeXt T*=1.0 OOF-NLL=0.3648; B4 T*=0.8 OOF-NLL=0.4138
Best OOF blend weight w (convnext) = 0.72, OOF-acc=0.89397
Saved submission_blend_optimized.csv and copied to submission.csv. Shape: (2676, 2)
Blend result: ('submission_blend_optimized.csv', 0.72, 0.8939693392446985)
Submission cols OK: True Labels int[0..4]: True Shape: (2676, 2)
Label value_counts: {3: 1669, 4: 337, 1: 279, 2: 257, 0: 134}


True
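
Note: the blend above relies on `softmax_np`, `find_temperature_on_oof` and `grid_search_weight`, which are defined in an earlier cell. For readers without that cell, here is a minimal sketch of what such helpers could look like. The function names, grid ranges and step sizes below are assumptions chosen for illustration, not the notebook's actual definitions, and the demo runs on synthetic logits rather than the competition data.

```python
import numpy as np

def softmax_rows(logits):
    """Numerically stable row-wise softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, y_true, grid=np.arange(0.5, 2.01, 0.1)):
    """Pick the temperature T (from an assumed grid) minimizing mean NLL of softmax(logits / T)."""
    rows = np.arange(len(y_true))
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        p = softmax_rows(logits / T)
        nll = -np.log(np.clip(p[rows, y_true], 1e-12, 1.0)).mean()
        if nll < best_nll:
            best_T, best_nll = float(round(T, 2)), float(nll)
    return best_T, best_nll

def search_blend_weight(p0, p1, y_true, step=0.01):
    """Pick w in [0, 1] maximizing accuracy of w * p0 + (1 - w) * p1."""
    best_w, best_acc = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + 1e-9, step):
        acc = ((w * p0 + (1.0 - w) * p1).argmax(1) == y_true).mean()
        if acc > best_acc:
            best_w, best_acc = float(w), float(acc)
    return best_w, best_acc

# Toy demo on synthetic logits (not the competition data)
rng = np.random.default_rng(0)
y = rng.integers(0, 5, size=500)
signal = np.eye(5)[y]
logits_a = 3.0 * signal + rng.normal(size=(500, 5))
logits_b = 2.0 * signal + rng.normal(size=(500, 5))
T_a, nll_a = fit_temperature(logits_a, y)
T_b, nll_b = fit_temperature(logits_b, y)
w, acc = search_blend_weight(softmax_rows(logits_a / T_a), softmax_rows(logits_b / T_b), y)
print(f'T_a={T_a} T_b={T_b} w={w:.2f} acc={acc:.4f}')
```

Because the weight grid includes both endpoints, the blended OOF accuracy can never be lower than that of the better single model on the same rows. Dividing logits by a temperature does not change a single model's argmax, so temperature only affects the result through the blend.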

In [11]:
# Try multiple ensemble options (no-temp vs temp) and convnext-only; pick best by OOF acc; write submission.csv
import numpy as np, pandas as pd, shutil
from pathlib import Path

def eval_oof_acc_from_logits(logits, y_true):
    preds = logits.argmax(1)
    return (preds == y_true).mean()

def run_ensemble_search_and_submit():
    # Load artifacts
    L0_oof = np.load('oof_logits_convnext_base_448.npy')
    L1_oof = np.load('oof_logits_tf_efficientnet_b4_ns_512.npy')
    L0_test = np.load('test_logits_convnext_base_448.npy')
    L1_test = np.load('test_logits_tf_efficientnet_b4_ns_512.npy')
    y_true = df['label'].to_numpy().astype(int)
    # Baselines: single-model OOF accuracies
    acc_cnx = eval_oof_acc_from_logits(L0_oof, y_true)
    acc_b4  = eval_oof_acc_from_logits(L1_oof, y_true)
    print(f'OOF acc single: convnext={acc_cnx:.5f}, b4={acc_b4:.5f}')
    # Option A: no temperature
    P0_oof_nt = softmax_np(L0_oof); P1_oof_nt = softmax_np(L1_oof)
    wA, accA = grid_search_weight(P0_oof_nt, P1_oof_nt, y_true)
    print(f'No-temp best w={wA:.2f} OOF-acc={accA:.5f}')
    # Option B: with temperature (refit)
    T0, _ = find_temperature_on_oof(L0_oof, y_true)
    T1, _ = find_temperature_on_oof(L1_oof, y_true)
    P0_oof_t = softmax_np(L0_oof / T0); P1_oof_t = softmax_np(L1_oof / T1)
    wB, accB = grid_search_weight(P0_oof_t, P1_oof_t, y_true)
    print(f'Temp best w={wB:.2f} (T0={T0}, T1={T1}) OOF-acc={accB:.5f}')
    # Decide best option by OOF acc among: convnext-only, no-temp blend, temp blend
    choices = [
        ('convnext_only', acc_cnx, 1.0, False, 1.0, 1.0),
        ('blend_no_temp', accA, wA, False, 1.0, 1.0),
        ('blend_temp', accB, wB, True, T0, T1)
    ]
    choices.sort(key=lambda x: x[1], reverse=True)
    name_best, acc_best, w_best, use_temp, t0, t1 = choices[0]
    print(f'Chosen {name_best} with OOF-acc={acc_best:.5f}, w={w_best:.2f}, use_temp={use_temp}')
    # Build test probabilities per choice
    if name_best == 'convnext_only':
        preds = L0_test.argmax(1).astype(int)
        out_csv = 'submission_convnext_only.csv'
    else:
        if use_temp:
            P0_test = softmax_np(L0_test / t0)
            P1_test = softmax_np(L1_test / t1)
        else:
            P0_test = softmax_np(L0_test)
            P1_test = softmax_np(L1_test)
        P_test = w_best * P0_test + (1 - w_best) * P1_test
        preds = P_test.argmax(1).astype(int)
        out_csv = f'submission_{name_best}.csv'
    df_test = pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})
    sub = pd.DataFrame({'image_id': df_test['image_id'], 'label': preds})
    sub.to_csv(out_csv, index=False)
    shutil.copyfile(out_csv, 'submission.csv')
    print('Wrote', out_csv, 'and copied to submission.csv')
    check_submission_format('submission.csv')
    return {'choice': name_best, 'oof_acc': float(acc_best), 'w': float(w_best), 'use_temp': use_temp, 'T0': float(t0), 'T1': float(t1)}

print('Ensemble search cell ready. Call run_ensemble_search_and_submit() to regenerate submission.csv with the best OOF option.')

Ensemble search cell ready. Call run_ensemble_search_and_submit() to regenerate submission.csv with the best OOF option.


In [12]:
# Run ensemble search and regenerate submission.csv with best OOF option
res_search = run_ensemble_search_and_submit()
print('Search result:', res_search)

OOF acc single: convnext=0.89125, b4=0.86790
No-temp best w=0.66 OOF-acc=0.89386
Temp best w=0.72 (T0=1.0, T1=0.8) OOF-acc=0.89397
Chosen blend_temp with OOF-acc=0.89397, w=0.72, use_temp=True
Wrote submission_blend_temp.csv and copied to submission.csv
Submission cols OK: True Labels int[0..4]: True Shape: (2676, 2)
Label value_counts: {3: 1669, 4: 337, 1: 279, 2: 257, 0: 134}
Search result: {'choice': 'blend_temp', 'oof_acc': 0.8939693392446985, 'w': 0.72, 'use_temp': True, 'T0': 1.0, 'T1': 0.8}


In [13]:
# Generate stronger B4 test logits with extended TTA (base, hflip, vflip, rot±10) across 5 folds
import numpy as np, torch, timm, os, math, time
from pathlib import Path
import albumentations as A
from albumentations.pytorch import ToTensorV2
import cv2

def build_b4_for_infer(num_classes=5):
    m = timm.create_model('tf_efficientnet_b4_ns', pretrained=False, num_classes=num_classes)
    m = m.to('cuda' if torch.cuda.is_available() else 'cpu')
    if torch.cuda.is_available():
        m = m.to(memory_format=torch.channels_last)
    m.eval()
    return m

def make_infer_tfms(sz=512, angle=0):
    # Resize and pad to a square first, then rotate around the center, then normalize
    return A.Compose([
        A.LongestMaxSize(max_size=sz),
        A.PadIfNeeded(min_height=sz, min_width=sz, border_mode=cv2.BORDER_CONSTANT, value=0),
        A.Rotate(limit=(angle, angle), border_mode=cv2.BORDER_CONSTANT, value=0, p=1.0) if angle != 0 else A.NoOp(),
        A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
        ToTensorV2()
    ])

def infer_model_with_tta(model, df_test, img_dir, batch_size=32, size=512):
    device = next(model.parameters()).device
    # TTA plan: (angle, hflip, vflip) - add -10 with hflip for symmetry
    tta_specs = [
        (0, False, False),
        (0, True,  False),
        (0, False, True),
        (+10, False, False),
        (-10, False, False),
        (+10, True,  False),
        (-10, True,  False),
    ]
    logits_accum = None
    for (ang, hf, vf) in tta_specs:
        tfms = make_infer_tfms(size, angle=ang)
        ds = CassavaDataset(df_test[['image_id']].copy(), img_dir, transforms=tfms)
        dl = DataLoader(ds, batch_size=batch_size, shuffle=False, num_workers=CFG_B4.num_workers, pin_memory=True)
        part_logits = []
        with torch.no_grad():
            for x, ids in dl:
                if hf:
                    x = torch.flip(x, dims=[-1])
                if vf:
                    x = torch.flip(x, dims=[-2])
                x = x.to(device, non_blocking=True)
                if torch.cuda.is_available():
                    x = x.to(memory_format=torch.channels_last)
                with torch.amp.autocast('cuda', enabled=torch.cuda.is_available()):
                    lg = model(x)
                part_logits.append(lg.float().cpu().numpy())
        logits_tta = np.concatenate(part_logits, axis=0)
        if logits_accum is None:
            logits_accum = logits_tta
        else:
            logits_accum += logits_tta
    logits_mean = logits_accum / len(tta_specs)
    return logits_mean

def build_df_test():
    return pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})

def compute_b4_test_logits_extended_tta():
    df_test_local = build_df_test()
    fold_paths = [f'ckpt_tf_efficientnet_b4_ns_512_fold{i}.pth' for i in range(5)]
    for p in fold_paths:
        assert Path(p).exists(), f'Missing {p}'
    fold_logits = []
    start_all = time.time()
    for i, ckpt_path in enumerate(fold_paths):
        t0 = time.time()
        model = build_b4_for_infer(num_classes=5)
        state = torch.load(ckpt_path, map_location='cpu')
        model.load_state_dict(state, strict=True)
        lg = infer_model_with_tta(model, df_test_local, test_dir, batch_size=CFG_B4.batch_size, size=CFG_B4.img_size)
        fold_logits.append(lg)
        print(f'Fold {i} TTA inference done. Shape {lg.shape}. Elapsed {time.time()-t0:.1f}s')
        del model; torch.cuda.empty_cache()
    logits_mean = np.mean(fold_logits, axis=0)
    np.save('test_logits_tf_efficientnet_b4_ns_512_tta7.npy', logits_mean)
    # Also overwrite the default path so downstream blend uses improved logits without code changes
    np.save('test_logits_tf_efficientnet_b4_ns_512.npy', logits_mean)
    print('Saved enhanced B4 test logits to test_logits_tf_efficientnet_b4_ns_512_tta7.npy and updated default .npy. Total elapsed', f'{time.time()-start_all:.1f}s')
    return logits_mean

print('Extended TTA cell ready. Call compute_b4_test_logits_extended_tta() to refresh B4 test logits and then re-run the ensemble cell.')

Extended TTA cell ready. Call compute_b4_test_logits_extended_tta() to refresh B4 test logits and then re-run the ensemble cell.


In [17]:
# Run extended B4 TTA inference, then re-run ensemble search and regenerate submission.csv
res_tta = compute_b4_test_logits_extended_tta()
print('Enhanced B4 test logits shape:', res_tta.shape)
res_search2 = run_ensemble_search_and_submit()
print('Updated search result:', res_search2)

Fold 0 TTA inference done. Shape (2676, 5). Elapsed 63.3s


Fold 1 TTA inference done. Shape (2676, 5). Elapsed 63.2s


Fold 2 TTA inference done. Shape (2676, 5). Elapsed 63.2s


Fold 3 TTA inference done. Shape (2676, 5). Elapsed 63.3s


Fold 4 TTA inference done. Shape (2676, 5). Elapsed 63.5s
Saved enhanced B4 test logits to test_logits_tf_efficientnet_b4_ns_512_tta7.npy and updated default .npy. Total elapsed 316.7s
Enhanced B4 test logits shape: (2676, 5)
OOF acc single: convnext=0.89125, b4=0.86790
No-temp best w=0.66 OOF-acc=0.89386


Temp best w=0.72 (T0=1.0, T1=0.8) OOF-acc=0.89397
Chosen blend_temp with OOF-acc=0.89397, w=0.72, use_temp=True
Wrote submission_blend_temp.csv and copied to submission.csv
Submission cols OK: True Labels int[0..4]: True Shape: (2676, 2)
Label value_counts: {3: 1675, 4: 337, 1: 279, 2: 250, 0: 135}
Updated search result: {'choice': 'blend_temp', 'oof_acc': 0.8939693392446985, 'w': 0.72, 'use_temp': True, 'T0': 1.0, 'T1': 0.8}
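
Note: the flip part of the TTA loop above can be illustrated in isolation. The sketch below averages logits over a list of `(hflip, vflip)` views of an NCHW batch. The `toy_model` is a hypothetical stand-in (a fixed linear map) rather than the B4 network, and the rotation views are left out because they need the albumentations pipeline.

```python
import numpy as np

def tta_mean_logits(predict_fn, images, specs):
    """Average logits over views; each spec is (hflip, vflip) applied to NCHW arrays."""
    total = None
    for hflip, vflip in specs:
        view = images
        if hflip:
            view = view[..., ::-1]      # flip the width axis
        if vflip:
            view = view[..., ::-1, :]   # flip the height axis
        logits = predict_fn(np.ascontiguousarray(view))
        total = logits if total is None else total + logits
    return total / len(specs)

# Hypothetical stand-in for a trained network: a fixed linear map on flattened pixels
rng = np.random.default_rng(0)
W = rng.normal(size=(3 * 8 * 8, 5))
toy_model = lambda x: x.reshape(len(x), -1) @ W

images = rng.normal(size=(4, 3, 8, 8))
specs = [(False, False), (True, False), (False, True)]
logits = tta_mean_logits(toy_model, images, specs)
print(logits.shape)  # (4, 5)
```

Averaging is done in logit space here, matching `infer_model_with_tta`. Averaging softmax probabilities instead is a common alternative and can give slightly different argmax results.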


In [15]:
# Stacking: multinomial logistic regression on concatenated OOF logits; apply to test
import numpy as np, pandas as pd, shutil
from pathlib import Path
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

def stack_lr_on_oof_and_submit(
    cnx_oof='oof_logits_convnext_base_448.npy',
    b4_oof='oof_logits_tf_efficientnet_b4_ns_512.npy',
    cnx_test='test_logits_convnext_base_448.npy',
    b4_test='test_logits_tf_efficientnet_b4_ns_512.npy',
    C_grid=(0.2, 0.5, 1.0),
    max_iter=300,
    out_csv='submission_stack_lr.csv'
):
    if not (Path(cnx_oof).exists() and Path(b4_oof).exists() and Path(cnx_test).exists() and Path(b4_test).exists()):
        print('Missing npy artifacts for stacking.'); return None
    L0_oof = np.load(cnx_oof); L1_oof = np.load(b4_oof)
    L0_test = np.load(cnx_test); L1_test = np.load(b4_test)
    assert L0_oof.shape == L1_oof.shape and L0_test.shape == L1_test.shape, 'Shape mismatch'
    y = df['label'].to_numpy().astype(int)
    X_oof = np.hstack([L0_oof, L1_oof])
    X_test = np.hstack([L0_test, L1_test])
    # Standardize features
    scaler = StandardScaler(with_mean=True, with_std=True)
    X_oof_std = scaler.fit_transform(X_oof)
    X_test_std = scaler.transform(X_test)
    # CV to pick C
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    best_C, best_cv = None, -1.0
    for C in C_grid:
        accs = []
        for tr_idx, va_idx in skf.split(X_oof_std, y):
            Xtr, Xva = X_oof_std[tr_idx], X_oof_std[va_idx]
            ytr, yva = y[tr_idx], y[va_idx]
            clf = LogisticRegression(multi_class='multinomial', solver='lbfgs', C=C, max_iter=max_iter, n_jobs=None)
            clf.fit(Xtr, ytr)
            pva = clf.predict(Xva)
            accs.append(accuracy_score(yva, pva))
        cv_mean = float(np.mean(accs))
        print(f'C={C} CV-acc={cv_mean:.5f} fold-accs={accs}')
        if cv_mean > best_cv:
            best_cv, best_C = cv_mean, C
    print(f'Chosen C={best_C} with CV-acc={best_cv:.5f}')
    # Fit on all OOF and predict test
    clf = LogisticRegression(multi_class='multinomial', solver='lbfgs', C=best_C, max_iter=max_iter, n_jobs=None)
    clf.fit(X_oof_std, y)
    proba_test = clf.predict_proba(X_test_std)
    preds = proba_test.argmax(1).astype(int)
    df_test = pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})
    sub = pd.DataFrame({'image_id': df_test['image_id'], 'label': preds})
    sub.to_csv(out_csv, index=False)
    shutil.copyfile(out_csv, 'submission.csv')
    print('Stacking wrote', out_csv, 'and copied to submission.csv; shape:', sub.shape)
    check_submission_format('submission.csv')
    return {'C': best_C, 'cv_acc': best_cv}

print('Stacking cell ready. Call stack_lr_on_oof_and_submit() to try LR meta-ensemble and regenerate submission.csv.')

Stacking cell ready. Call stack_lr_on_oof_and_submit() to try LR meta-ensemble and regenerate submission.csv.


In [16]:
# Run stacking LR meta-ensemble and regenerate submission.csv
res_stack = stack_lr_on_oof_and_submit()
print('Stack result:', res_stack)











C=0.2 CV-acc=0.89376 fold-accs=[0.8878504672897196, 0.8987713675213675, 0.9019764957264957, 0.8966346153846154, 0.8835470085470085]












C=0.5 CV-acc=0.89349 fold-accs=[0.8878504672897196, 0.8985042735042735, 0.9017094017094017, 0.8961004273504274, 0.8832799145299145]












C=1.0 CV-acc=0.89376 fold-accs=[0.8881174899866489, 0.8987713675213675, 0.9022435897435898, 0.8961004273504274, 0.8835470085470085]
Chosen C=0.2 with CV-acc=0.89376




Stacking wrote submission_stack_lr.csv and copied to submission.csv; shape: (2676, 2)
Submission cols OK: True Labels int[0..4]: True Shape: (2676, 2)
Label value_counts: {3: 1675, 4: 320, 1: 282, 2: 264, 0: 135}
Stack result: {'C': 0.2, 'cv_acc': 0.8937559908938415}


In [19]:
# Final temp-scaled geometric-mean vs arithmetic blend with fine weight sweep; write submission.csv
import numpy as np, pandas as pd, shutil
from pathlib import Path

def row_normalize(P):
    P = np.clip(P, 1e-12, 1.0)
    P /= P.sum(axis=1, keepdims=True)
    return P

def geo_mean_blend(P0, P1, w):
    # P_final ∝ (P0^w) * (P1^(1-w))
    return row_normalize((P0 ** w) * (P1 ** (1.0 - w)))

def acc_from_probs(P, y_true):
    return (P.argmax(1) == y_true).mean()

def final_geo_arith_blend_and_submit(
    cnx_oof='oof_logits_convnext_base_448.npy',
    b4_oof='oof_logits_tf_efficientnet_b4_ns_512.npy',
    cnx_test='test_logits_convnext_base_448.npy',
    b4_test='test_logits_tf_efficientnet_b4_ns_512.npy',
    T0=1.0, T1=0.8,
    w_lo=0.70, w_hi=0.75, w_step=0.005,
    fallback_acc=0.89397,
    out_prefix='submission_final_blend'
):
    paths = [cnx_oof, b4_oof, cnx_test, b4_test]
    miss = [p for p in paths if not Path(p).exists()]
    if miss:
        print('Missing artifacts:', miss); return None
    L0_oof = np.load(cnx_oof); L1_oof = np.load(b4_oof)
    L0_test = np.load(cnx_test); L1_test = np.load(b4_test)
    y = df['label'].to_numpy().astype(int)
    # Temp-scaled probabilities
    P0_oof = softmax_np(L0_oof / T0); P1_oof = softmax_np(L1_oof / T1)
    P0_test = softmax_np(L0_test / T0); P1_test = softmax_np(L1_test / T1)
    # Fine grid
    W = np.arange(w_lo, w_hi + 1e-9, w_step)
    best_geo = (-1.0, None)  # (acc, w)
    best_lin = (-1.0, None)
    for w in W:
        P_geo = geo_mean_blend(P0_oof, P1_oof, w)
        acc_geo = (P_geo.argmax(1) == y).mean()
        if acc_geo > best_geo[0]: best_geo = (float(acc_geo), float(w))
        P_lin = row_normalize(w * P0_oof + (1.0 - w) * P1_oof)
        acc_lin = (P_lin.argmax(1) == y).mean()
        if acc_lin > best_lin[0]: best_lin = (float(acc_lin), float(w))
    print(f'Geo-best: acc={best_geo[0]:.5f} w={best_geo[1]:.3f}; Lin-best: acc={best_lin[0]:.5f} w={best_lin[1]:.3f}')
    # Choose method
    acc_geo, w_geo = best_geo; acc_lin, w_lin = best_lin
    # Pick the method with the higher OOF accuracy (ties go to geo); the previous nested
    # conditional reduced to this rule for every value of fallback_acc
    method = 'geo' if acc_geo >= acc_lin else 'lin'
    if method == 'geo':
        w_sel = w_geo
        # Tie-bias toward slightly lower w if extremely close
        if abs(acc_geo - acc_lin) < 1e-6 and w_sel > 0.72:
            w_sel = max(w_lo, w_sel - 0.01)
        P_test = geo_mean_blend(P0_test, P1_test, w_sel)
    else:
        w_sel = w_lin
        P_test = row_normalize(w_sel * P0_test + (1.0 - w_sel) * P1_test)
    preds = P_test.argmax(1).astype(int)
    df_test = pd.DataFrame({'image_id': sorted([p.name for p in Path(test_dir).glob('*.jpg')])})
    sub = pd.DataFrame({'image_id': df_test['image_id'], 'label': preds})
    out_csv = f'{out_prefix}_{method}_w{w_sel:.3f}.csv'
    sub.to_csv(out_csv, index=False)
    shutil.copyfile(out_csv, 'submission.csv')
    print(f'Wrote {out_csv} using method={method}, w={w_sel:.3f} (OOF geo={acc_geo:.5f}, lin={acc_lin:.5f}); copied to submission.csv')
    check_submission_format('submission.csv')
    return {'method': method, 'w': w_sel, 'acc_geo': acc_geo, 'acc_lin': acc_lin, 'out_csv': out_csv}

print('Final geo/arithmetic blend cell ready. Call final_geo_arith_blend_and_submit() after updating B4 TTA logits.')

Final geo/arithmetic blend cell ready. Call final_geo_arith_blend_and_submit() after updating B4 TTA logits.


In [20]:
# Final call: run geometric vs arithmetic blend with fine weight sweep and write submission.csv
res_final = final_geo_arith_blend_and_submit()
print('Final blend result:', res_final)

Geo-best: acc=0.89349 w=0.750; Lin-best: acc=0.89402 w=0.730
Wrote submission_final_blend_lin_w0.730.csv using method=lin, w=0.730 (OOF geo=0.89349, lin=0.89402); copied to submission.csv
Submission cols OK: True Labels int[0..4]: True Shape: (2676, 2)
Label value_counts: {3: 1674, 4: 337, 1: 280, 2: 250, 0: 135}
Final blend result: {'method': 'lin', 'w': 0.73, 'acc_geo': 0.8934885956946744, 'acc_lin': 0.8940227551947011, 'out_csv': 'submission_final_blend_lin_w0.730.csv'}
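
Note: geometric and arithmetic blends can disagree because the geometric mean penalizes a class heavily when either model assigns it a very low probability. The sketch below shows this on made-up probabilities for a single sample. The numbers are purely illustrative and are not taken from the OOF predictions.

```python
import numpy as np

def arith_blend(p0, p1, w):
    """Weighted arithmetic mean of two probability matrices."""
    return w * p0 + (1.0 - w) * p1

def geo_blend(p0, p1, w):
    """Weighted geometric mean, renormalized so each row sums to 1."""
    p = np.clip(p0, 1e-12, 1.0) ** w * np.clip(p1, 1e-12, 1.0) ** (1.0 - w)
    return p / p.sum(axis=1, keepdims=True)

# Made-up probabilities for one sample with three classes
p0 = np.array([[0.70, 0.25, 0.05]])   # model 0 favors class 0
p1 = np.array([[0.01, 0.60, 0.39]])   # model 1 nearly rules out class 0
w = 0.73

pred_arith = arith_blend(p0, p1, w).argmax(1)
pred_geo = geo_blend(p0, p1, w).argmax(1)
print('arithmetic ->', pred_arith, 'geometric ->', pred_geo)
# The arithmetic blend picks class 0; the geometric blend picks class 1
```

On this sample the arithmetic blend follows the more heavily weighted model, while the geometric blend lets the second model's near-zero probability for class 0 act as a veto.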
