# Plan: PetFinder Pawpularity (medal-oriented)

Objectives:
- Establish a reliable CV scheme and a fast baseline early.
- Progress to high-signal image models + metadata, then blend.
- Always log progress/time per fold; iterate with subsamples first.

Milestones:
1) Environment & Data Checks
   - Verify GPU (nvidia-smi) and install correct cu121 torch/vision if needed.
   - Read train.csv/test.csv; verify image paths; basic stats/target distribution.

2) Validation Protocol
   - 5-fold CV, stratified by Pawpularity bins (e.g., 10 quantile bins).
   - Single deterministic split saved to disk; reuse for all models.

3) Baselines
   - Tabular baseline: LightGBM/XGBoost on metadata (the 12 binary features) + simple image stats (width/height/aspect/brightness). Quick 5-fold RMSE.
   - CNN baseline: pretrained EfficientNet-B0 or ConvNeXt-Tiny at 256–320 px, RMSE loss, simple augmentation, 5-fold, no TTA at inference initially.

4) Improved Vision Model
   - Upgrade backbone (e.g., Swin-T/ConvNeXt-S/EfficientNetV2-S), 384–448 px, strong aug (RandomResizedCrop, HFlip, ColorJitter, CutMix/MixUp optional).
   - Cosine LR, warmup, EMA, label smoothing, early stopping on each fold's validation RMSE.
   - Checkpoint per fold; average predictions from the epochs near the best one.

5) Metadata Fusion
   - Late-fusion head: concat CNN pooled embedding + normalized metadata; small MLP head.
   - Compare against an image-only CNN and against blending the CNN with the tabular model.

6) Ensembling
   - Blend diverse backbones/resolutions/seeds; weight by OOF.
   - Keep TTA modest (e.g., 4 flips), and only if it helps.

7) Error Analysis
   - Bucket by target bins and brightness/aspect; adjust aug/resolution/center-crop policy.
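
The "Cosine LR, warmup" item in milestone 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the schedule of any training script in this notebook: `lr_at_step`, its parameters, and the step counts in the example are all hypothetical.

```python
import math

def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay toward min_lr (hypothetical helper)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Made-up example: 120 optimizer steps, the first 10 of them warmup.
schedule = [lr_at_step(s, total_steps=120, warmup_steps=10, base_lr=1e-3)
            for s in range(120)]
print(f"first={schedule[0]:.2e} peak={max(schedule):.2e} last={schedule[-1]:.2e}")
```

With separate backbone and head learning rates, the same function could be evaluated once per parameter group with a different `base_lr`.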

Targets:
- Baseline tabular: RMSE ~19–20
- Baseline CNN small: ~17.5–17.8
- Upgraded model + fusion/ensemble: ≤17.1 (bronze) with aim for ≤17.0 (silver).

Next Steps:
- Expert review on plan.
- Add GPU/env check cell; install PyTorch cu121 if needed.
- Implement CV splitter and quick tabular baseline (with simple image stats).
- Implement CNN baseline script with logging; run 5-fold at 256–320 px.
- Iterate toward fusion/ensembling.
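
For the "weight by OOF" idea in the ensembling milestone, a two-model blend has a closed-form weight. The sketch below assumes two aligned OOF prediction lists; `best_blend_weight`, `oof_cnn`, `oof_tab`, and the toy numbers are hypothetical and not taken from any run in this notebook.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def best_blend_weight(y_true, oof_a, oof_b):
    """Weight w in [0, 1] minimizing the RMSE of w*oof_a + (1 - w)*oof_b."""
    # Least-squares solution of (t - b) ~ w * (a - b), then clipped to [0, 1].
    num = sum((a - b) * (t - b) for t, a, b in zip(y_true, oof_a, oof_b))
    den = sum((a - b) ** 2 for a, b in zip(oof_a, oof_b))
    if den == 0.0:
        return 0.5  # identical predictions: every weight gives the same blend
    return min(1.0, max(0.0, num / den))

# Toy data only, to show the call pattern.
y_true = [10.0, 20.0, 30.0, 40.0]
oof_cnn = [12.0, 18.0, 33.0, 37.0]
oof_tab = [8.0, 23.0, 28.0, 44.0]

w = best_blend_weight(y_true, oof_cnn, oof_tab)
blend = [w * a + (1.0 - w) * b for a, b in zip(oof_cnn, oof_tab)]
print(f"w={w:.3f} blend_rmse={rmse(y_true, blend):.4f}")
```

Because the weight is fitted on the same OOF predictions it is scored on, the blended OOF RMSE is slightly optimistic; with only one free parameter the effect is usually small.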

In [1]:
# Environment and data sanity checks
import os, sys, subprocess
import pandas as pd

def run(cmd):
    print('$', ' '.join(cmd), flush=True)
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True).stdout

print('Python:', sys.version)
print('CWD:', os.getcwd())

# 1) GPU check
print('--- nvidia-smi ---')
print(run(['bash','-lc','nvidia-smi || true']))

# 2) Dataset presence and counts
train_csv = 'train.csv'
test_csv = 'test.csv'
train_dir = 'train'
test_dir = 'test'
assert os.path.exists(train_csv) and os.path.exists(test_csv), 'CSV files missing'
assert os.path.isdir(train_dir) and os.path.isdir(test_dir), 'Image directories missing'

df_train = pd.read_csv(train_csv)
df_test = pd.read_csv(test_csv)
print('train.csv shape:', df_train.shape)
print('test.csv shape:', df_test.shape)
print('train columns:', list(df_train.columns))
print('test columns:', list(df_test.columns))

# Verify image files exist for a quick sample (first 50 rows of each CSV)
def count_missing(df, img_dir, n=50):
    return sum(not os.path.exists(os.path.join(img_dir, f'{img_id}.jpg')) for img_id in df['Id'].head(n))

missing_train = count_missing(df_train, train_dir)
missing_test = count_missing(df_test, test_dir)
print('Sample check missing files -> train:', missing_train, 'test:', missing_test)

# Basic target stats
if 'Pawpularity' in df_train.columns:
    print('Pawpularity describe (first look):')
    print(df_train['Pawpularity'].describe())

print('Setup OK. Next: lock CV splits and install PyTorch cu121 stack if needed.')

Python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
CWD: /var/lib/simon/agent_run_states/petfinder-pawpularity-score-20250927-162350
--- nvidia-smi ---
$ bash -lc nvidia-smi || true


Sat Sep 27 16:32:07 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     182MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
# Install exact cu121 PyTorch stack and verify GPU
import os, sys, shutil, subprocess
from pathlib import Path

def pip(*args):
    print('>', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

# 0) Uninstall any existing torch stacks (ignore errors)
for pkg in ('torch','torchvision','torchaudio'):
    subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', pkg], check=False)

# Clean possible stray site dirs that can shadow correct wheels (idempotent)
for d in (
    '/app/.pip-target/torch',
    '/app/.pip-target/torch-2.8.0.dist-info',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchvision-0.23.0.dist-info',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torchaudio-2.8.0.dist-info',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
    '/app/.pip-target/torchgen',
    '/app/.pip-target/functorch',
):
    if os.path.exists(d):
        print('Removing', d)
        shutil.rmtree(d, ignore_errors=True)

# 1) Install the EXACT cu121 torch stack
pip('install',
    '--index-url', 'https://download.pytorch.org/whl/cu121',
    '--extra-index-url', 'https://pypi.org/simple',
    'torch==2.4.1', 'torchvision==0.19.1', 'torchaudio==2.4.1')

# 2) Freeze versions for later installs
Path('constraints.txt').write_text(
    'torch==2.4.1\n'
    'torchvision==0.19.1\n'
    'torchaudio==2.4.1\n'
)

# 3) Sanity check
import torch
print('torch:', torch.__version__, 'built CUDA:', getattr(torch.version, 'cuda', None))
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
assert str(getattr(torch.version,'cuda','')).startswith('12.1'), f'Wrong CUDA build: {torch.version.cuda}'
assert torch.cuda.is_available(), 'CUDA not available'
print('PyTorch cu121 stack installed and verified.')





> install --index-url https://download.pytorch.org/whl/cu121 --extra-index-url https://pypi.org/simple torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1




Looking in indexes: https://download.pytorch.org/whl/cu121, https://pypi.org/simple


Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio


Successfully installed MarkupSafe-3.0.2 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.3.0 sympy-1.14.0 torch-2.4.1+cu121 torchaudio-2.4.1+cu121 torchvision-0.19.1+cu121 triton-3.0.0 typing-extensions-4.15.0


torch: 2.4.1+cu121 built CUDA: 12.1
CUDA available: True
GPU: NVIDIA A10-24Q
PyTorch cu121 stack installed and verified.


In [3]:
# Create and save 5-fold stratified CV splits (qcut=10 bins)
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold

SEED = 42
N_FOLDS = 5
N_BINS = 10

df = pd.read_csv('train.csv')

# Create stratification bins on target
bins = pd.qcut(df['Pawpularity'], q=N_BINS, labels=False, duplicates='drop')
df['bin'] = bins

skf = StratifiedKFold(n_splits=N_FOLDS, shuffle=True, random_state=SEED)
folds = np.full(len(df), -1, dtype=int)

for fold, (trn_idx, val_idx) in enumerate(skf.split(df, df['bin'])):
    folds[val_idx] = fold

df['fold'] = folds
assert (df['fold']>=0).all(), 'Unassigned fold rows exist'

# Save splits to disk for reuse
cols_to_save = ['Id','Pawpularity','fold'] + [c for c in df.columns if c not in ['Id','Pawpularity','fold','bin']]
df[cols_to_save].to_csv('train_folds.csv', index=False)
df[['Id','fold']].to_csv('folds.csv', index=False)

# Log fold sizes and target means
print('Saved train_folds.csv and folds.csv')
for f in range(N_FOLDS):
    m = df.loc[df['fold']==f, 'Pawpularity'].mean()
    n = (df['fold']==f).sum()
    print(f'Fold {f}: n={n}, mean Pawpularity={m:.3f}')

# Clean up helper column
df.drop(columns=['bin'], inplace=True)

Saved train_folds.csv and folds.csv
Fold 0: n=1784, mean Pawpularity=38.078
Fold 1: n=1784, mean Pawpularity=38.007
Fold 2: n=1784, mean Pawpularity=38.016
Fold 3: n=1784, mean Pawpularity=38.105
Fold 4: n=1784, mean Pawpularity=37.917
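
Since every later model is meant to reuse this split, it may be worth validating the fold file each time it is reloaded. The following is a minimal sketch using only the standard library; `load_fold_map` and `fold_sizes` are hypothetical helpers, and the only assumption about the file is the `Id,fold` header written by the cell above.

```python
import csv
from collections import Counter

def load_fold_map(lines, n_folds=5):
    """Build an Id -> fold dict from CSV lines (an open file or a list of strings)."""
    fold_of = {}
    for row in csv.DictReader(lines):
        if row["Id"] in fold_of:
            raise ValueError(f"duplicate Id in fold file: {row['Id']}")
        fold = int(row["fold"])
        if not 0 <= fold < n_folds:
            raise ValueError(f"fold out of range for Id {row['Id']}: {fold}")
        fold_of[row["Id"]] = fold
    return fold_of

def fold_sizes(fold_of):
    """Number of rows per fold, ordered by fold index."""
    return dict(sorted(Counter(fold_of.values()).items()))

# Toy input standing in for the real file contents.
demo = load_fold_map(["Id,fold", "a,0", "b,1", "c,0"], n_folds=2)
print(fold_sizes(demo))
```

Against the real file this would be called as `with open('folds.csv', newline='') as fh: fold_of = load_fold_map(fh)`.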


In [4]:
# Tabular baseline: metadata + simple image stats with XGBoost (OOF RMSE + submission)
import os, sys, time, math, json, shutil, subprocess, gc
import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from PIL import Image, ImageOps

def ensure_pkg(pkg, pip_name=None):
    pip_name = pip_name or pkg
    try:
        __import__(pkg)
    except ImportError:
        print(f'Installing {pip_name} ...', flush=True)
        subprocess.run([sys.executable, '-m', 'pip', 'install', '-c', 'constraints.txt', pip_name, '--upgrade-strategy', 'only-if-needed'], check=True)

ensure_pkg('xgboost')
ensure_pkg('cv2', 'opencv-python-headless')
from tqdm import tqdm
import xgboost as xgb
import cv2

t0 = time.time()
train_dir, test_dir = 'train', 'test'
df_train = pd.read_csv('train_folds.csv')  # contains fold column
df_test = pd.read_csv('test.csv')

# Compute (or load cached) simple image stats
stats_path_train = Path('img_stats_train.parquet')
stats_path_test = Path('img_stats_test.parquet')

def compute_stats(df, img_dir, out_path):
    if out_path.exists():
        return pd.read_parquet(out_path)
    rows = []
    for _, r in tqdm(df.iterrows(), total=len(df), desc=f'img-stats {img_dir}'):
        img_fp = os.path.join(img_dir, f"{r['Id']}.jpg")
        try:
            img = Image.open(img_fp)
            img = ImageOps.exif_transpose(img)
            w, h = img.size
            arr = np.asarray(img.convert('RGB'))
            gray = cv2.cvtColor(arr, cv2.COLOR_RGB2GRAY)
            mean = float(gray.mean())
            std = float(gray.std())
            lap = cv2.Laplacian(gray, cv2.CV_64F)
            sharp = float(lap.var())
            rows.append({'Id': r['Id'], 'width': w, 'height': h, 'aspect': w/(h+1e-6), 'brightness_mean': mean, 'brightness_std': std, 'sharpness': sharp})
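        except Exception as e:
            # Keep the row so the later merge stays aligned; stats stay missing for unreadable images
            print(f'WARN: could not read {img_fp}: {e}', flush=True)
            rows.append({'Id': r['Id'], 'width': np.nan, 'height': np.nan, 'aspect': np.nan, 'brightness_mean': np.nan, 'brightness_std': np.nan, 'sharpness': np.nan})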
    out = pd.DataFrame(rows)
    out.to_parquet(out_path, index=False)
    return out

stats_tr = compute_stats(df_train, train_dir, stats_path_train)
stats_te = compute_stats(df_test, test_dir, stats_path_test)

# Merge features
meta_cols = ['Subject Focus','Eyes','Face','Near','Action','Accessory','Group','Collage','Human','Occlusion','Info','Blur']
feat_cols = meta_cols + ['width','height','aspect','brightness_mean','brightness_std','sharpness']
df_tr = df_train.merge(stats_tr, on='Id', how='left')
df_te = df_test.merge(stats_te, on='Id', how='left')

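# Scale numeric image stats; keep binaries as-is. NaNs are not filled: StandardScaler ignores them in fit and passes them through, and XGBoost treats NaN as missing.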
num_cols = ['width','height','aspect','brightness_mean','brightness_std','sharpness']
scaler = StandardScaler()
df_tr[num_cols] = scaler.fit_transform(df_tr[num_cols])
df_te[num_cols] = scaler.transform(df_te[num_cols])

X = df_tr[feat_cols].astype(float).values
y = df_tr['Pawpularity'].values.astype(float)
folds = df_tr['fold'].values
X_test = df_te[feat_cols].astype(float).values

# XGBoost params
use_gpu = True
params = {
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
    'learning_rate': 0.03,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'lambda': 1.0,
    'tree_method': 'hist',  # 'gpu_hist' is deprecated in XGBoost >= 2.0; select the GPU via 'device'
    'device': 'cuda' if use_gpu else 'cpu',
}

oof = np.zeros(len(df_tr), dtype=float)
pred_test = np.zeros(len(df_te), dtype=float)
nrounds = 2000
esr = 100

for f in range(5):
    tr_idx = np.where(folds != f)[0]
    va_idx = np.where(folds == f)[0]
    dtr = xgb.DMatrix(X[tr_idx], label=y[tr_idx])
    dva = xgb.DMatrix(X[va_idx], label=y[va_idx])
    dte = xgb.DMatrix(X_test)
    start = time.time()
    print(f'[Fold {f}] train={len(tr_idx)} val={len(va_idx)}', flush=True)
    booster = xgb.train(params, dtr, num_boost_round=nrounds, evals=[(dtr,'train'),(dva,'valid')], early_stopping_rounds=esr, verbose_eval=100)
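    # Predict with the best early-stopped iteration explicitly instead of relying on the default
    best_range = (0, booster.best_iteration + 1)
    oof[va_idx] = booster.predict(dva, iteration_range=best_range)
    pred_test += booster.predict(dte, iteration_range=best_range) / 5.0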
    elapsed = time.time() - start
    rmse = np.sqrt(mean_squared_error(y[va_idx], oof[va_idx]))  # squared=False is deprecated in recent scikit-learn
    print(f'[Fold {f}] RMSE={rmse:.4f} elapsed={elapsed:.1f}s', flush=True)
    del dtr, dva, dte, booster; gc.collect()

oof_rmse = np.sqrt(mean_squared_error(y, oof))  # squared=False is deprecated in recent scikit-learn
print(f'OOF RMSE: {oof_rmse:.4f}')

# Clip predictions and save artifacts
oof_clip = np.clip(oof, 0, 100)
pred_test_clip = np.clip(pred_test, 0, 100)
pd.DataFrame({'Id': df_tr['Id'], 'oof': oof_clip, 'target': y, 'fold': folds}).to_csv('oof_tabular_xgb.csv', index=False)
sub = pd.DataFrame({'Id': df_te['Id'], 'Pawpularity': pred_test_clip})
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv, shape:', sub.shape, 'head:', sub.head().to_dict('records'))
print('Total time:', f'{time.time()-t0:.1f}s')

Installing opencv-python-headless ...



Installing collected packages: numpy, opencv-python-headless


Successfully installed numpy-1.26.4 opencv-python-headless-4.11.0.86




img-stats train:   0%|          | 26/8920 [00:00<01:13, 120.29it/s]

[Fold 0] train=7136 val=1784


[0]	train-rmse:20.63620	valid-rmse:20.58708



    E.g. tree_method = "hist", device = "cuda"



[100]	train-rmse:19.05607	valid-rmse:20.55517


[134]	train-rmse:18.66589	valid-rmse:20.59224


[Fold 0] RMSE=20.5922 elapsed=0.4s


[Fold 1] train=7136 val=1784


[0]	train-rmse:20.62497	valid-rmse:20.65140






[100]	train-rmse:19.17051	valid-rmse:20.64301


[158]	train-rmse:18.49144	valid-rmse:20.65147


[Fold 1] RMSE=20.6573 elapsed=0.3s


[Fold 2] train=7136 val=1784


[0]	train-rmse:20.57302	valid-rmse:20.82708






[100]	train-rmse:19.11092	valid-rmse:20.88907


[105]	train-rmse:19.05176	valid-rmse:20.89002


[Fold 2] RMSE=20.8901 elapsed=0.2s


[Fold 3] train=7136 val=1784


[0]	train-rmse:20.56492	valid-rmse:20.85816






[100]	train-rmse:19.00888	valid-rmse:20.82457


[149]	train-rmse:18.47998	valid-rmse:20.85148


[Fold 3] RMSE=20.8515 elapsed=0.3s


[Fold 4] train=7136 val=1784


[0]	train-rmse:20.71184	valid-rmse:20.28531






[100]	train-rmse:19.15359	valid-rmse:20.34200


[118]	train-rmse:18.94411	valid-rmse:20.36454


[Fold 4] RMSE=20.3645 elapsed=0.3s


OOF RMSE: 20.6720
Saved submission.csv, shape: (992, 2) head: [{'Id': 'ee51b99832f1ba868f646df93d2b6b81', 'Pawpularity': 37.957308769226074}, {'Id': 'caddfb3f8bff9c4b95dbe022018eea21', 'Pawpularity': 38.96543836593628}, {'Id': '582eeabd4a448a53ebb79995888a4b0b', 'Pawpularity': 37.32875967025757}, {'Id': 'afc1ad7f0c5eea880759d09e77f7deee', 'Pawpularity': 38.45194959640503}, {'Id': 'd5bdf3446e86ce4ec67ce7a00f1cccc2', 'Pawpularity': 37.12950944900513}]
Total time: 74.6s
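
An OOF RMSE of 20.6720 is hard to judge without a reference point. In the log above, validation RMSE barely moves between iteration 0 and the early-stopping point (fold 0 goes from 20.587 to 20.592), which suggests the tabular features add little over a near-constant prediction. A minimal sketch of that comparison follows; `per_fold_rmse` and `mean_baseline_rmse` are hypothetical helpers and the numbers are toy data.

```python
import math
from collections import defaultdict

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def per_fold_rmse(y_true, y_pred, folds):
    """RMSE computed separately on each validation fold."""
    by_fold = defaultdict(lambda: ([], []))
    for t, p, f in zip(y_true, y_pred, folds):
        by_fold[f][0].append(t)
        by_fold[f][1].append(p)
    return {f: rmse(ts, ps) for f, (ts, ps) in sorted(by_fold.items())}

def mean_baseline_rmse(y_true, folds):
    """OOF RMSE of predicting, for each fold, the mean target of the other folds.

    Assumes at least two distinct folds.
    """
    total, n = sum(y_true), len(y_true)
    fold_sum, fold_n = defaultdict(float), defaultdict(int)
    for t, f in zip(y_true, folds):
        fold_sum[f] += t
        fold_n[f] += 1
    preds = [(total - fold_sum[f]) / (n - fold_n[f]) for f in folds]
    return rmse(y_true, preds)

# Toy data only, to show the call pattern.
target = [10.0, 20.0, 30.0, 40.0]
oof = [12.0, 18.0, 33.0, 37.0]
fold = [0, 0, 1, 1]
print(per_fold_rmse(target, oof, fold), mean_baseline_rmse(target, fold))
```

On the real run, the three lists would come from the `target`, `oof`, and `fold` columns of `oof_tabular_xgb.csv`; note that the saved `oof` column holds the clipped predictions.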




In [6]:
# Install CNN deps (pinned, honoring torch constraints) without touching torch stack
import sys, subprocess

def pip_install_no_deps(pkgs):
    cmd = [sys.executable, '-m', 'pip', 'install', '-c', 'constraints.txt', '--upgrade-strategy', 'only-if-needed', '--no-deps'] + pkgs
    print('>', ' '.join(cmd), flush=True)
    subprocess.run(cmd, check=True)

# Install timm and albumentations without pulling torch/torchvision again
pip_install_no_deps(['timm==1.0.9', 'albumentations==1.4.14'])
print('Installed timm and albumentations (no-deps). Next: implement train_cnn.py and run a 1-fold smoke.')

> /usr/bin/python3.11 -m pip install -c constraints.txt --upgrade-strategy only-if-needed --no-deps timm==1.0.9 albumentations==1.4.14


Installing collected packages: timm, albumentations


Successfully installed albumentations-1.4.14 timm-1.0.9
Installed timm and albumentations (no-deps). Next: implement train_cnn.py and run a 1-fold smoke.


In [7]:
# Install missing deps for albumentations
import sys, subprocess

def pip_install(pkgs):
    cmd = [sys.executable, '-m', 'pip', 'install', '-c', 'constraints.txt', '--upgrade-strategy', 'only-if-needed'] + pkgs
    print('>', ' '.join(cmd), flush=True)
    subprocess.run(cmd, check=True)

pip_install(['albucore==0.0.33', 'imageio==2.37.0', 'scikit-image==0.25.2'])
print('Installed albucore, imageio, scikit-image.')

> /usr/bin/python3.11 -m pip install -c constraints.txt --upgrade-strategy only-if-needed albucore==0.0.33 imageio==2.37.0 scikit-image==0.25.2



Installing collected packages: simsimd, stringzilla, pillow, packaging, numpy, networkx, tifffile, scipy, opencv-python-headless, lazy-loader, imageio, scikit-image, albucore


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
timm 1.0.9 requires huggingface_hub, which is not installed.
timm 1.0.9 requires safetensors, which is not installed.
albumentations 1.4.14 requires eval-type-backport, which is not installed.


Successfully installed albucore-0.0.33 imageio-2.37.0 lazy-loader-0.4 networkx-3.5 numpy-1.26.4 opencv-python-headless-4.11.0.86 packaging-25.0 pillow-11.3.0 scikit-image-0.25.2 scipy-1.16.2 simsimd-6.5.3 stringzilla-4.0.14 tifffile-2025.9.20




Installed albucore, imageio, scikit-image.


In [13]:
# Run 1-fold 1-epoch smoke for CNN pipeline
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'tf_efficientnetv2_s_in21k',
       '--img_size', '384', '--batch_size', '32',
       '--use_fusion', '--tta_hflip', '--smoke', '--fold', '0',
       '--num_workers', '4']
print('Launching:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{time.time()-t0:.1f}s', flush=True)
assert proc.returncode == 0, 'train_cnn.py smoke run failed'

Launching: /usr/bin/python3.11 train_cnn.py --model tf_efficientnetv2_s_in21k --img_size 384 --batch_size 32 --use_fusion --tta_hflip --smoke --fold 0 --num_workers 4


  model = create_fn(



  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Device: cuda
Fold 0 | train=7136 val=1784 epochs=1


  ckpt = torch.load(best_path, map_location=device)


Epoch 1/1 | loss=0.25296 | val_rmse=32.9759 | lr=[3e-05, 0.001] | time=47.1s
Fold 0 best RMSE: 32.9759 | elapsed 0.8 min | ckpt: work_cnn/tf_efficientnetv2_s_in21k_fold0_best.pth






Saved OOF to work_cnn/oof_tf_efficientnetv2_s_in21k_fusion1_sz384.csv OOF RMSE: 41.32097792064362
Saved test predictions to work_cnn/submission_tf_efficientnetv2_s_in21k_fusion1_sz384.csv shape: (992, 2)


Exit code: 0 Elapsed: 64.8s
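
The smoke run reports a fold 0 RMSE of 32.9759 but an overall "OOF RMSE" of 41.32. One plausible reading (`train_cnn.py` is not shown here, so this is an assumption) is that only fold 0 rows received predictions and the remaining rows kept an initial value such as 0. Under that assumption, scoring only the filled rows gives the meaningful number. A minimal sketch with a hypothetical `oof_rmse` helper and toy data:

```python
import math

def oof_rmse(y_true, oof_pred, filled):
    """RMSE over the rows whose OOF prediction was actually written."""
    sq_err = [(t - p) ** 2 for t, p, m in zip(y_true, oof_pred, filled) if m]
    if not sq_err:
        raise ValueError("no filled OOF rows to score")
    return math.sqrt(sum(sq_err) / len(sq_err))

# Toy data: the last two rows were never predicted and still hold 0.
y_true = [30.0, 40.0, 50.0, 60.0]
oof_pred = [28.0, 43.0, 0.0, 0.0]
filled = [True, True, False, False]

masked = oof_rmse(y_true, oof_pred, filled)
unmasked = oof_rmse(y_true, oof_pred, [True] * len(y_true))
print(f"masked={masked:.4f} unmasked={unmasked:.4f}")
```

In a partial run, the mask could be derived from the fold column, for example `filled = [f in trained_folds for f in folds]`, where `trained_folds` is a hypothetical set of the folds actually trained.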


In [12]:
# Install missing deps required by timm pretrained weights (HF hub + safetensors)
import sys, subprocess

def pip_install(pkgs):
    cmd = [sys.executable, '-m', 'pip', 'install', '-c', 'constraints.txt', '--upgrade-strategy', 'only-if-needed'] + pkgs
    print('>', ' '.join(cmd), flush=True)
    subprocess.run(cmd, check=True)

pip_install(['huggingface_hub==0.24.6', 'safetensors==0.4.5'])
print('Installed huggingface_hub and safetensors.')

> /usr/bin/python3.11 -m pip install -c constraints.txt --upgrade-strategy only-if-needed huggingface_hub==0.24.6 safetensors==0.4.5



Installing collected packages: urllib3, typing-extensions, tqdm, safetensors, pyyaml, packaging, idna, fsspec, filelock, charset_normalizer, certifi, requests, huggingface_hub


Successfully installed certifi-2025.8.3 charset_normalizer-3.4.3 filelock-3.19.1 fsspec-2025.9.0 huggingface_hub-0.24.6 idna-3.10 packaging-25.0 pyyaml-6.0.3 requests-2.32.5 safetensors-0.4.5 tqdm-4.67.1 typing-extensions-4.15.0 urllib3-2.5.0
Installed huggingface_hub and safetensors.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
albumentations 1.4.14 requires eval-type-backport, which is not installed.
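
The resolver message above reports that albumentations 1.4.14 declares a dependency on eval-type-backport, which is not installed. A minimal sketch for checking this kind of gap from Python with only the standard library; the helper name `missing_distributions` is hypothetical and not part of the notebook:

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Name taken from the pip message above.
print(missing_distributions(["eval-type-backport"]))
```

If the name is printed as missing, installing it with pip before importing albumentations is the usual remedy, although the source does not show whether this was needed here.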


In [14]:
# Full 5-fold training: EfficientNetV2-S (384px) with fusion, EMA, AMP
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'tf_efficientnetv2_s_in21k',
       '--img_size', '384',
       '--batch_size', '32',
       '--epochs', '12',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '3e-5',
       '--lr_head', '1e-3',
       '--weight_decay', '1e-4',
       '--num_workers', '4',
       '--seed', '42',
       '--tta_hflip',
       '--use_fusion'
]
print('Launching full run:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0)/60:.1f} min', flush=True)
assert proc.returncode == 0, 'EffNetV2-S full run failed'

Launching full run: /usr/bin/python3.11 train_cnn.py --model tf_efficientnetv2_s_in21k --img_size 384 --batch_size 32 --epochs 12 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 3e-5 --lr_head 1e-3 --weight_decay 1e-4 --num_workers 4 --seed 42 --tta_hflip --use_fusion


  model = create_fn(




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Device: cuda
Fold 0 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.25296 | val_rmse=32.9759 | lr=[3e-05, 0.001] | time=47.1s




Epoch 2/12 | loss=0.03185 | val_rmse=21.2210 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=47.2s




Epoch 3/12 | loss=0.02365 | val_rmse=21.7529 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=46.7s




Epoch 4/12 | loss=0.01577 | val_rmse=21.2814 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=47.0s




Epoch 5/12 | loss=0.01114 | val_rmse=20.4736 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=46.8s




Epoch 6/12 | loss=0.00855 | val_rmse=19.9937 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=47.0s




Epoch 7/12 | loss=0.00681 | val_rmse=19.7529 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=47.0s




Epoch 8/12 | loss=0.00548 | val_rmse=19.5623 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=47.2s




Epoch 9/12 | loss=0.00492 | val_rmse=19.4418 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=47.2s




Epoch 10/12 | loss=0.00448 | val_rmse=19.3542 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=47.2s




Epoch 11/12 | loss=0.00409 | val_rmse=19.3217 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=47.2s




  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00407 | val_rmse=19.3081 | lr=[0.0, 0.0] | time=47.1s
Fold 0 best RMSE: 19.3081 | elapsed 9.5 min | ckpt: work_cnn/tf_efficientnetv2_s_in21k_fold0_best.pth






Fold 1 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.24713 | val_rmse=30.3625 | lr=[3e-05, 0.001] | time=47.5s




Epoch 2/12 | loss=0.03112 | val_rmse=22.5394 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=47.8s




Epoch 3/12 | loss=0.02212 | val_rmse=24.5014 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=48.2s




Epoch 4/12 | loss=0.01527 | val_rmse=23.2856 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=48.0s




Epoch 5/12 | loss=0.01027 | val_rmse=21.8691 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=47.9s




Epoch 6/12 | loss=0.00803 | val_rmse=20.8948 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=47.8s




Epoch 7/12 | loss=0.00647 | val_rmse=20.3047 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=47.9s




Epoch 8/12 | loss=0.00549 | val_rmse=19.9234 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=47.8s




Epoch 9/12 | loss=0.00465 | val_rmse=19.7014 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=47.8s




Epoch 10/12 | loss=0.00446 | val_rmse=19.5674 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=48.1s




Epoch 11/12 | loss=0.00407 | val_rmse=19.4945 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=48.2s






Epoch 12/12 | loss=0.00417 | val_rmse=19.4575 | lr=[0.0, 0.0] | time=47.7s
Fold 1 best RMSE: 19.4575 | elapsed 9.6 min | ckpt: work_cnn/tf_efficientnetv2_s_in21k_fold1_best.pth






Fold 2 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.24227 | val_rmse=33.1864 | lr=[3e-05, 0.001] | time=47.3s




Epoch 2/12 | loss=0.03182 | val_rmse=20.5112 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=47.6s




Epoch 3/12 | loss=0.02315 | val_rmse=21.3036 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=47.5s




Epoch 4/12 | loss=0.01503 | val_rmse=20.7755 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=47.7s




Epoch 5/12 | loss=0.01041 | val_rmse=20.0621 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=47.8s




Epoch 6/12 | loss=0.00838 | val_rmse=19.6355 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=47.4s




Epoch 7/12 | loss=0.00642 | val_rmse=19.4487 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=48.0s




Epoch 8/12 | loss=0.00558 | val_rmse=19.3533 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=47.7s




Epoch 9/12 | loss=0.00427 | val_rmse=19.3188 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=48.1s




Epoch 10/12 | loss=0.00423 | val_rmse=19.3165 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=47.7s




Epoch 11/12 | loss=0.00414 | val_rmse=19.3243 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=47.6s




Epoch 12/12 | loss=0.00372 | val_rmse=19.3413 | lr=[0.0, 0.0] | time=47.5s
Fold 2 best RMSE: 19.3165 | elapsed 9.6 min | ckpt: work_cnn/tf_efficientnetv2_s_in21k_fold2_best.pth






Fold 3 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.23663 | val_rmse=33.9669 | lr=[3e-05, 0.001] | time=47.4s




Epoch 2/12 | loss=0.03163 | val_rmse=20.5249 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=47.9s




Epoch 3/12 | loss=0.02307 | val_rmse=20.8392 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=47.9s




Epoch 4/12 | loss=0.01497 | val_rmse=20.4786 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=47.9s




Epoch 5/12 | loss=0.01034 | val_rmse=19.9334 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=47.6s




Epoch 6/12 | loss=0.00827 | val_rmse=19.6357 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=47.9s




Epoch 7/12 | loss=0.00649 | val_rmse=19.4861 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=48.0s




Epoch 8/12 | loss=0.00550 | val_rmse=19.3628 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=47.2s




Epoch 9/12 | loss=0.00468 | val_rmse=19.3036 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=47.7s




Epoch 10/12 | loss=0.00424 | val_rmse=19.2630 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=47.4s




Epoch 11/12 | loss=0.00391 | val_rmse=19.2323 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=47.5s






Epoch 12/12 | loss=0.00403 | val_rmse=19.2201 | lr=[0.0, 0.0] | time=47.5s
Fold 3 best RMSE: 19.2201 | elapsed 9.6 min | ckpt: work_cnn/tf_efficientnetv2_s_in21k_fold3_best.pth






Fold 4 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.26413 | val_rmse=32.7868 | lr=[3e-05, 0.001] | time=47.7s




Epoch 2/12 | loss=0.03208 | val_rmse=20.5088 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=47.4s




Epoch 3/12 | loss=0.02356 | val_rmse=20.0679 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=47.9s




Epoch 4/12 | loss=0.01537 | val_rmse=19.7557 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=47.9s




Epoch 5/12 | loss=0.01095 | val_rmse=19.4509 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=47.9s




Epoch 6/12 | loss=0.00877 | val_rmse=19.2358 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=47.7s




Epoch 7/12 | loss=0.00700 | val_rmse=19.1104 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=47.5s




Epoch 8/12 | loss=0.00585 | val_rmse=19.0249 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=47.4s




Epoch 9/12 | loss=0.00484 | val_rmse=18.9706 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=47.8s




Epoch 10/12 | loss=0.00462 | val_rmse=18.9674 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=47.9s




Epoch 11/12 | loss=0.00409 | val_rmse=18.9705 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=47.6s




Epoch 12/12 | loss=0.00426 | val_rmse=18.9809 | lr=[0.0, 0.0] | time=47.7s
Fold 4 best RMSE: 18.9674 | elapsed 9.6 min | ckpt: work_cnn/tf_efficientnetv2_s_in21k_fold4_best.pth






Saved OOF to work_cnn/oof_tf_efficientnetv2_s_in21k_fusion1_sz384.csv OOF RMSE: 19.231653782965047
Saved test predictions to work_cnn/submission_tf_efficientnetv2_s_in21k_fusion1_sz384.csv shape: (992, 2)


Exit code: 0 Elapsed: 48.7 min
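
The per-epoch `lr=[backbone, head]` values in the log above are consistent with a cosine decay from the base rates (`--lr_backbone 3e-5`, `--lr_head 1e-3`) down to zero at the last epoch. The sketch below reproduces the logged numbers under that assumption; the formula and the function name `cosine_lr` are inferred from the log, since the actual scheduler in `train_cnn.py` is not shown.

```python
import math


def cosine_lr(base_lr, epoch, total_epochs):
    """Learning rate for a 1-indexed epoch under cosine decay to zero.

    Assumption: lr = base_lr * 0.5 * (1 + cos(pi * (epoch - 1) / (total_epochs - 1))).
    This matches the logged values but is not taken from train_cnn.py.
    """
    progress = (epoch - 1) / (total_epochs - 1)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against the logged lr pairs for a few epochs of the 12-epoch run.
for epoch in (1, 2, 7, 12):
    print(epoch, [cosine_lr(3e-5, epoch, 12), cosine_lr(1e-3, epoch, 12)])
```

The same formula also gives half the base rate at epoch 2 of a 3-epoch run, which agrees with the `lr=[1.5e-05, 0.0005]` line in the later smoke test.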


In [15]:
# Full 5-fold training: ConvNeXt-Small (384px) with fusion, EMA, AMP
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_small.in12k_ft_in1k',
       '--img_size', '384',
       '--batch_size', '32',  # drop to 24 if OOM
       '--epochs', '12',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '3e-5',
       '--lr_head', '1e-3',
       '--weight_decay', '1e-4',
       '--num_workers', '4',
       '--seed', '42',
       '--tta_hflip',
       '--use_fusion'
]
print('Launching full run:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0)/60:.1f} min', flush=True)
assert proc.returncode == 0, 'ConvNeXt-S full run failed'

Launching full run: /usr/bin/python3.11 train_cnn.py --model convnext_small.in12k_ft_in1k --img_size 384 --batch_size 32 --epochs 12 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 3e-5 --lr_head 1e-3 --weight_decay 1e-4 --num_workers 4 --seed 42 --tta_hflip --use_fusion




Device: cuda
Fold 0 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.21422 | val_rmse=34.9970 | lr=[3e-05, 0.001] | time=71.6s




Epoch 2/12 | loss=0.03312 | val_rmse=22.5422 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=70.8s




Epoch 3/12 | loss=0.02459 | val_rmse=19.1612 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.0s




Epoch 4/12 | loss=0.01654 | val_rmse=18.4975 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.2s




Epoch 5/12 | loss=0.01014 | val_rmse=18.4345 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.6s




Epoch 6/12 | loss=0.00622 | val_rmse=18.5316 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.5s




Epoch 7/12 | loss=0.00379 | val_rmse=18.5977 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.5s




Epoch 8/12 | loss=0.00227 | val_rmse=18.5957 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.2s




Epoch 9/12 | loss=0.00150 | val_rmse=18.5631 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=71.4s




Epoch 10/12 | loss=0.00112 | val_rmse=18.5119 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.4s




Epoch 11/12 | loss=0.00094 | val_rmse=18.4723 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.5s




Epoch 12/12 | loss=0.00088 | val_rmse=18.4349 | lr=[0.0, 0.0] | time=71.7s
Fold 0 best RMSE: 18.4345 | elapsed 14.3 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_fold0_best.pth






Fold 1 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.10050 | val_rmse=24.5387 | lr=[3e-05, 0.001] | time=71.5s




Epoch 2/12 | loss=0.03381 | val_rmse=21.2244 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.3s




Epoch 3/12 | loss=0.02628 | val_rmse=19.4060 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.5s




Epoch 4/12 | loss=0.01736 | val_rmse=18.4298 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.4s




Epoch 5/12 | loss=0.01053 | val_rmse=18.2289 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.7s




Epoch 6/12 | loss=0.00652 | val_rmse=18.1384 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.9s




Epoch 7/12 | loss=0.00374 | val_rmse=18.0803 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.6s




Epoch 8/12 | loss=0.00228 | val_rmse=18.0646 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.6s




Epoch 9/12 | loss=0.00149 | val_rmse=18.0583 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=71.4s




Epoch 10/12 | loss=0.00106 | val_rmse=18.0613 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.5s




Epoch 11/12 | loss=0.00091 | val_rmse=18.0667 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.7s




Epoch 12/12 | loss=0.00088 | val_rmse=18.0742 | lr=[0.0, 0.0] | time=71.9s
Fold 1 best RMSE: 18.0583 | elapsed 14.4 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_fold1_best.pth






Fold 2 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.14950 | val_rmse=25.4308 | lr=[3e-05, 0.001] | time=71.5s




Epoch 2/12 | loss=0.03259 | val_rmse=20.7836 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.5s




Epoch 3/12 | loss=0.02627 | val_rmse=19.7689 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.4s




Epoch 4/12 | loss=0.01785 | val_rmse=19.0432 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.9s




Epoch 5/12 | loss=0.01053 | val_rmse=18.8110 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.7s




Epoch 6/12 | loss=0.00711 | val_rmse=18.7628 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.8s




Epoch 7/12 | loss=0.00427 | val_rmse=18.7824 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.6s




Epoch 8/12 | loss=0.00240 | val_rmse=18.7565 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.5s




Epoch 9/12 | loss=0.00165 | val_rmse=18.7050 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=71.3s




Epoch 10/12 | loss=0.00118 | val_rmse=18.6609 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.4s




Epoch 11/12 | loss=0.00103 | val_rmse=18.6324 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.7s






Epoch 12/12 | loss=0.00094 | val_rmse=18.6099 | lr=[0.0, 0.0] | time=71.7s
Fold 2 best RMSE: 18.6099 | elapsed 14.4 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_fold2_best.pth






Fold 3 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.16123 | val_rmse=24.0106 | lr=[3e-05, 0.001] | time=71.5s




Epoch 2/12 | loss=0.03304 | val_rmse=20.6541 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.6s




Epoch 3/12 | loss=0.02623 | val_rmse=19.2242 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.5s




Epoch 4/12 | loss=0.01714 | val_rmse=18.1544 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.7s




Epoch 5/12 | loss=0.01059 | val_rmse=18.0337 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.4s




Epoch 6/12 | loss=0.00653 | val_rmse=18.1369 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.9s




Epoch 7/12 | loss=0.00397 | val_rmse=18.2387 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.5s




Epoch 8/12 | loss=0.00237 | val_rmse=18.2997 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.4s




Epoch 9/12 | loss=0.00156 | val_rmse=18.3477 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=71.7s




Epoch 10/12 | loss=0.00118 | val_rmse=18.3811 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.7s




Epoch 11/12 | loss=0.00100 | val_rmse=18.4036 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.6s




Epoch 12/12 | loss=0.00094 | val_rmse=18.4255 | lr=[0.0, 0.0] | time=71.8s
Fold 3 best RMSE: 18.0337 | elapsed 14.4 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_fold3_best.pth






Fold 4 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.19668 | val_rmse=30.3309 | lr=[3e-05, 0.001] | time=71.6s




Epoch 2/12 | loss=0.03239 | val_rmse=21.6124 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.7s




Epoch 3/12 | loss=0.02548 | val_rmse=18.7628 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.8s




Epoch 4/12 | loss=0.01648 | val_rmse=18.0823 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.8s




Epoch 5/12 | loss=0.01025 | val_rmse=17.9405 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.8s




Epoch 6/12 | loss=0.00619 | val_rmse=17.9975 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.8s




Epoch 7/12 | loss=0.00378 | val_rmse=18.0567 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.9s




Epoch 8/12 | loss=0.00221 | val_rmse=18.0990 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.8s




Epoch 9/12 | loss=0.00143 | val_rmse=18.0929 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=71.7s




Epoch 10/12 | loss=0.00104 | val_rmse=18.0721 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.9s




Epoch 11/12 | loss=0.00091 | val_rmse=18.0466 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.8s




Epoch 12/12 | loss=0.00085 | val_rmse=18.0262 | lr=[0.0, 0.0] | time=71.8s
Fold 4 best RMSE: 17.9405 | elapsed 14.4 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_fold4_best.pth






Saved OOF to work_cnn/oof_convnext_small.in12k_ft_in1k_fusion1_sz384.csv OOF RMSE: 18.21609435749985
Saved test predictions to work_cnn/submission_convnext_small.in12k_ft_in1k_fusion1_sz384.csv shape: (992, 2)


Exit code: 0 Elapsed: 73.2 min
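
The per-fold best RMSEs and the single OOF RMSE in the log are related: pooling squared errors across folds, weighted by fold size, gives an overall RMSE. A minimal sketch using the ConvNeXt-Small fold values logged above; the helper name `pooled_rmse` is hypothetical.

```python
import math


def pooled_rmse(fold_rmses, fold_sizes):
    """Combine per-fold RMSEs into one RMSE over all validation rows."""
    total_sq_err = sum(n * r * r for r, n in zip(fold_rmses, fold_sizes))
    return math.sqrt(total_sq_err / sum(fold_sizes))


# "Fold k best RMSE" values and val=1784 per fold, copied from the log.
convnext_small_folds = [18.4345, 18.0583, 18.6099, 18.0337, 17.9405]
print(pooled_rmse(convnext_small_folds, [1784] * 5))
```

For this run the pooled value lands within about 0.01 of the logged OOF RMSE of 18.2161. The small gap is not explained by the source; it may come from how the OOF predictions are generated (for example with `--tta_hflip`), but that is an assumption.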


In [19]:
# Ensemble OOF-weighted blend of available CNN models in work_cnn/
import os, glob, pandas as pd, numpy as np
from sklearn.metrics import mean_squared_error

workdir = 'work_cnn'
oof_paths = sorted(glob.glob(os.path.join(workdir, 'oof_*.csv')))
sub_paths = sorted(glob.glob(os.path.join(workdir, 'submission_*.csv')))
print('Found OOF files:', oof_paths)
print('Found SUB files:', sub_paths)
assert len(oof_paths) > 0 and len(sub_paths) > 0, 'No model artifacts to blend yet.'

# Map tag -> oof and sub
def tag_from(path):
    base = os.path.basename(path)
    return base.replace('oof_', '').replace('submission_', '')

oofs = {}
subs = {}
for p in oof_paths:
    tag = tag_from(p)
    df = pd.read_csv(p)
    oofs[tag] = df
for p in sub_paths:
    tag = tag_from(p)
    df = pd.read_csv(p)
    subs[tag] = df

# Keep only tags that have both
tags = [t for t in oofs.keys() if t in subs]
assert len(tags) > 0, 'No matching OOF+SUB pairs'
print('Blend tags:', tags)

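# Compute OOF RMSE per tag and weights ~ 1/RMSE^2
rmses = {}
for t in tags:
    d = oofs[t]
    # Take the square root explicitly: the squared=False keyword of
    # mean_squared_error was deprecated in scikit-learn 1.4 and later removed.
    rmse = float(np.sqrt(mean_squared_error(d['target'], d['oof'])))
    rmses[t] = rmse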
weights = {t: 1.0/(rmses[t]**2 + 1e-9) for t in tags}
w_sum = sum(weights.values())
weights = {t: w/w_sum for t, w in weights.items()}
print('OOF RMSEs:', rmses)
print('Weights:', weights)

# Blend test submissions
blend = None
for t in tags:
    df = subs[t].copy()
    df['Pawpularity'] = df['Pawpularity'].clip(0, 100)
    if blend is None:
        blend = df.copy()
        blend.rename(columns={'Pawpularity': f'pred_{t}'}, inplace=True)
    else:
        blend = blend.merge(df.rename(columns={'Pawpularity': f'pred_{t}'}), on='Id', how='inner')

pred_cols = [c for c in blend.columns if c.startswith('pred_')]
pred = np.zeros(len(blend), dtype=float)
for t in tags:
    pred += weights[t] * blend[f'pred_{t}'].values
final_sub = pd.DataFrame({'Id': blend['Id'], 'Pawpularity': np.clip(pred, 0, 100)})
final_sub.to_csv('submission.csv', index=False)
print('Blended submission saved to submission.csv, shape:', final_sub.shape)
final_sub.head()

Found OOF files: ['work_cnn/oof_convnext_small.in12k_ft_in1k_fusion1_sz384.csv', 'work_cnn/oof_convnext_tiny_fusion1_sz384.csv', 'work_cnn/oof_tf_efficientnetv2_s_in21k_fusion1_sz384.csv']
Found SUB files: ['work_cnn/submission_convnext_small.in12k_ft_in1k_fusion1_sz384.csv', 'work_cnn/submission_convnext_tiny_fusion1_sz384.csv', 'work_cnn/submission_tf_efficientnetv2_s_in21k_fusion1_sz384.csv']
Blend tags: ['convnext_small.in12k_ft_in1k_fusion1_sz384.csv', 'convnext_tiny_fusion1_sz384.csv', 'tf_efficientnetv2_s_in21k_fusion1_sz384.csv']
OOF RMSEs: {'convnext_small.in12k_ft_in1k_fusion1_sz384.csv': 18.21609436551175, 'convnext_tiny_fusion1_sz384.csv': 18.144526453713713, 'tf_efficientnetv2_s_in21k_fusion1_sz384.csv': 19.23165377310838}
Weights: {'convnext_small.in12k_ft_in1k_fusion1_sz384.csv': 0.34422466711539135, 'convnext_tiny_fusion1_sz384.csv': 0.34694549047460777, 'tf_efficientnetv2_s_in21k_fusion1_sz384.csv': 0.3088298424100009}
Blended submission saved to submission.csv, shape:



                                 Id  Pawpularity
0  ee51b99832f1ba868f646df93d2b6b81    55.062599
1  caddfb3f8bff9c4b95dbe022018eea21    45.068822
2  582eeabd4a448a53ebb79995888a4b0b    35.200632
3  afc1ad7f0c5eea880759d09e77f7deee    26.694977
4  d5bdf3446e86ce4ec67ce7a00f1cccc2    20.958674
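
The weighting rule in the blending cell (weights proportional to 1/RMSE^2, then normalized) can be checked in isolation. The sketch below recomputes the logged weights from the logged OOF RMSEs; the short tags and the function name `inverse_mse_weights` are illustrative stand-ins, not names from the notebook.

```python
def inverse_mse_weights(rmses, eps=1e-9):
    """Normalize 1 / (rmse^2 + eps) so that the weights sum to 1."""
    raw = {tag: 1.0 / (rmse ** 2 + eps) for tag, rmse in rmses.items()}
    total = sum(raw.values())
    return {tag: w / total for tag, w in raw.items()}


# OOF RMSEs copied from the cell output above, with shortened tags.
rmses = {
    "convnext_small": 18.21609436551175,
    "convnext_tiny": 18.144526453713713,
    "effnetv2_s": 19.23165377310838,
}
weights = inverse_mse_weights(rmses)
print(weights)
```

Because the three RMSEs are close, the scheme stays near a uniform blend: the EfficientNetV2-S model is about 1.0 RMSE worse than the ConvNeXt models yet still receives roughly 31% of the weight. Scoring the blended OOF predictions directly would show whether that allocation helps; the notebook does not report that number.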


In [18]:
# Full 5-fold training: ConvNeXt-Tiny (in1k, 384px) with fusion, EMA, AMP
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_tiny',
       '--img_size', '384',
       '--batch_size', '32',  # drop to 24 if OOM
       '--epochs', '12',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '3e-5',
       '--lr_head', '1e-3',
       '--weight_decay', '1e-4',
       '--num_workers', '4',
       '--seed', '42',
       '--tta_hflip',
       '--use_fusion'
]
print('Launching full run:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0)/60:.1f} min', flush=True)
assert proc.returncode == 0, 'ConvNeXt-Tiny full run failed'

Launching full run: /usr/bin/python3.11 train_cnn.py --model convnext_tiny --img_size 384 --batch_size 32 --epochs 12 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 3e-5 --lr_head 1e-3 --weight_decay 1e-4 --num_workers 4 --seed 42 --tta_hflip --use_fusion




Device: cuda
Fold 0 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.21194 | val_rmse=23.1661 | lr=[3e-05, 0.001] | time=46.0s




Epoch 2/12 | loss=0.03229 | val_rmse=19.4646 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=45.8s




Epoch 3/12 | loss=0.02481 | val_rmse=18.2313 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=45.9s




Epoch 4/12 | loss=0.01722 | val_rmse=18.1319 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=46.0s




Epoch 5/12 | loss=0.01043 | val_rmse=18.2556 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=46.0s




Epoch 6/12 | loss=0.00652 | val_rmse=18.3457 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=46.3s




Epoch 7/12 | loss=0.00416 | val_rmse=18.4228 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=46.2s




Epoch 8/12 | loss=0.00261 | val_rmse=18.4709 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=46.3s




Epoch 9/12 | loss=0.00173 | val_rmse=18.5284 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=46.3s




Epoch 10/12 | loss=0.00128 | val_rmse=18.5746 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=46.4s




Epoch 11/12 | loss=0.00107 | val_rmse=18.6111 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=46.3s




Epoch 12/12 | loss=0.00104 | val_rmse=18.6430 | lr=[0.0, 0.0] | time=46.4s
Fold 0 best RMSE: 18.1319 | elapsed 9.3 min | ckpt: work_cnn/convnext_tiny_fold0_best.pth






Fold 1 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.22572 | val_rmse=22.4019 | lr=[3e-05, 0.001] | time=46.5s




Epoch 2/12 | loss=0.03208 | val_rmse=19.1113 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=46.5s




Epoch 3/12 | loss=0.02571 | val_rmse=18.1191 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=46.3s




Epoch 4/12 | loss=0.01695 | val_rmse=18.0152 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=46.3s




Epoch 5/12 | loss=0.01082 | val_rmse=18.0032 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=46.6s




Epoch 6/12 | loss=0.00666 | val_rmse=18.0001 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=46.6s




Epoch 7/12 | loss=0.00409 | val_rmse=18.0286 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=46.4s




Epoch 8/12 | loss=0.00251 | val_rmse=18.0424 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=46.4s




Epoch 9/12 | loss=0.00164 | val_rmse=18.0585 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=46.5s




Epoch 10/12 | loss=0.00120 | val_rmse=18.0759 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=46.9s




Epoch 11/12 | loss=0.00105 | val_rmse=18.1020 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=46.5s




Epoch 12/12 | loss=0.00099 | val_rmse=18.1277 | lr=[0.0, 0.0] | time=46.5s
Fold 1 best RMSE: 18.0001 | elapsed 9.3 min | ckpt: work_cnn/convnext_tiny_fold1_best.pth






Fold 2 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.13243 | val_rmse=23.4561 | lr=[3e-05, 0.001] | time=46.2s




Epoch 2/12 | loss=0.03281 | val_rmse=19.3253 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=46.6s




Epoch 3/12 | loss=0.02482 | val_rmse=18.5502 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=46.6s




Epoch 4/12 | loss=0.01671 | val_rmse=18.4864 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=46.6s




Epoch 5/12 | loss=0.01010 | val_rmse=18.5533 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=46.7s




Epoch 6/12 | loss=0.00656 | val_rmse=18.6248 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=46.9s




Epoch 7/12 | loss=0.00392 | val_rmse=18.6384 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=46.5s




Epoch 8/12 | loss=0.00258 | val_rmse=18.5723 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=46.8s




Epoch 9/12 | loss=0.00171 | val_rmse=18.5176 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=46.7s




Epoch 10/12 | loss=0.00132 | val_rmse=18.4819 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=46.5s




Epoch 11/12 | loss=0.00112 | val_rmse=18.4677 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=46.3s




Epoch 12/12 | loss=0.00107 | val_rmse=18.4683 | lr=[0.0, 0.0] | time=46.4s
Fold 2 best RMSE: 18.4677 | elapsed 9.3 min | ckpt: work_cnn/convnext_tiny_fold2_best.pth






Fold 3 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.20203 | val_rmse=27.1960 | lr=[3e-05, 0.001] | time=46.6s




Epoch 2/12 | loss=0.03426 | val_rmse=20.9116 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=46.6s




Epoch 3/12 | loss=0.02752 | val_rmse=19.3527 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=46.5s




Epoch 4/12 | loss=0.01876 | val_rmse=18.6164 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=46.6s




Epoch 5/12 | loss=0.01181 | val_rmse=18.3738 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=46.8s




Epoch 6/12 | loss=0.00779 | val_rmse=18.3657 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=46.5s




Epoch 7/12 | loss=0.00466 | val_rmse=18.3835 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=46.4s




Epoch 8/12 | loss=0.00289 | val_rmse=18.4014 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=46.1s




Epoch 9/12 | loss=0.00194 | val_rmse=18.4226 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=46.4s




Epoch 10/12 | loss=0.00150 | val_rmse=18.4520 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=46.5s




Epoch 11/12 | loss=0.00123 | val_rmse=18.4841 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=46.6s




Epoch 12/12 | loss=0.00118 | val_rmse=18.5201 | lr=[0.0, 0.0] | time=46.5s
Fold 3 best RMSE: 18.3657 | elapsed 9.3 min | ckpt: work_cnn/convnext_tiny_fold3_best.pth






Fold 4 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.20772 | val_rmse=29.8742 | lr=[3e-05, 0.001] | time=46.7s




Epoch 2/12 | loss=0.03385 | val_rmse=19.2445 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=46.5s




Epoch 3/12 | loss=0.02734 | val_rmse=18.0391 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=46.8s




Epoch 4/12 | loss=0.01921 | val_rmse=17.7564 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=46.7s




Epoch 5/12 | loss=0.01164 | val_rmse=17.8964 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=46.6s




Epoch 6/12 | loss=0.00762 | val_rmse=18.0217 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=46.6s




Epoch 7/12 | loss=0.00482 | val_rmse=18.0792 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=46.5s




Epoch 8/12 | loss=0.00285 | val_rmse=18.1091 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=46.5s




Epoch 9/12 | loss=0.00188 | val_rmse=18.1111 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=46.7s




Epoch 10/12 | loss=0.00138 | val_rmse=18.0999 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=46.7s




Epoch 11/12 | loss=0.00120 | val_rmse=18.0912 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=46.7s




Epoch 12/12 | loss=0.00117 | val_rmse=18.0919 | lr=[0.0, 0.0] | time=46.9s
Fold 4 best RMSE: 17.7564 | elapsed 9.3 min | ckpt: work_cnn/convnext_tiny_fold4_best.pth






Saved OOF to work_cnn/oof_convnext_tiny_fusion1_sz384.csv OOF RMSE: 18.144526442690495
Saved test predictions to work_cnn/submission_convnext_tiny_fusion1_sz384.csv shape: (992, 2)


Exit code: 0 Elapsed: 47.6 min
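
In the ConvNeXt-Tiny log above, folds 0 and 4 reach their best validation RMSE at epoch 4 while the training loss keeps falling through epoch 12, which is the pattern the plan's "early stopping by OOF" item targets. A small sketch for pulling the best epoch out of log lines in the format printed above; the regular expression and the helper name `best_epoch` are assumptions based on that format.

```python
import re

# Matches lines such as "Epoch 4/12 | loss=0.01921 | val_rmse=17.7564 | ..."
EPOCH_RE = re.compile(r"Epoch (\d+)/(\d+) \| loss=([\d.]+) \| val_rmse=([\d.]+)")


def best_epoch(log_lines):
    """Return (epoch, val_rmse) for the lowest val_rmse, or None if no line matches."""
    best = None
    for line in log_lines:
        match = EPOCH_RE.search(line)
        if match is None:
            continue  # skip warnings, blank lines, fold headers
        epoch, rmse = int(match.group(1)), float(match.group(4))
        if best is None or rmse < best[1]:
            best = (epoch, rmse)
    return best


# Three fold-4 lines copied from the ConvNeXt-Tiny log above.
fold4_lines = [
    "Epoch 3/12 | loss=0.02734 | val_rmse=18.0391 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=46.8s",
    "Epoch 4/12 | loss=0.01921 | val_rmse=17.7564 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=46.7s",
    "Epoch 5/12 | loss=0.01164 | val_rmse=17.8964 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=46.6s",
]
print(best_epoch(fold4_lines))
```

Running this over each fold's full log would give a per-fold view of where validation peaks, which could inform a shorter schedule or stronger regularization for these backbones.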


In [20]:
# Smoke test: convnext_small seed=43 with CutMix (fold 0, 3 epochs) and wd=0.05
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_small.in12k_ft_in1k',
       '--img_size', '384',
       '--batch_size', '32',
       '--epochs', '3',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '3e-5',
       '--lr_head', '1e-3',
       '--weight_decay', '0.05',
       '--num_workers', '4',
       '--seed', '43',
       '--tta_hflip',
       '--use_fusion',
       '--use_cutmix',
       '--cutmix_alpha', '0.5',
       '--cutmix_p', '0.5',
       '--fold', '0'
]
print('Launching smoke run:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0):.1f}s', flush=True)
assert proc.returncode == 0, 'Smoke run failed; check logs.'

Launching smoke run: /usr/bin/python3.11 train_cnn.py --model convnext_small.in12k_ft_in1k --img_size 384 --batch_size 32 --epochs 3 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 3e-5 --lr_head 1e-3 --weight_decay 0.05 --num_workers 4 --seed 43 --tta_hflip --use_fusion --use_cutmix --cutmix_alpha 0.5 --cutmix_p 0.5 --fold 0


  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Device: cuda
Fold 0 | train=7136 val=1784 epochs=3




Epoch 1/3 | loss=0.19000 | val_rmse=26.0774 | lr=[3e-05, 0.001] | time=70.2s




Epoch 2/3 | loss=0.03234 | val_rmse=20.0078 | lr=[1.5e-05, 0.0005] | time=70.1s




  ckpt = torch.load(best_path, map_location=device)


Epoch 3/3 | loss=0.02840 | val_rmse=18.6981 | lr=[0.0, 0.0] | time=70.4s
Fold 0 best RMSE: 18.6981 | elapsed 3.5 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed43_fold0_best.pth






Saved OOF to work_cnn/oof_convnext_small.in12k_ft_in1k_fusion1_sz384_seed43.csv OOF RMSE: 39.589261696852404
Saved test predictions to work_cnn/submission_convnext_small.in12k_ft_in1k_fusion1_sz384_seed43.csv shape: (992, 2)


Exit code: 0 Elapsed: 229.7s
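
Note on the smoke-run metric: the `OOF RMSE: 39.589...` reported above is far higher than the fold 0 validation RMSE of 18.6981. One plausible explanation is that rows from the four untrained folds keep a placeholder prediction of 0 in the OOF file and are still counted in the metric. This is an assumption, because `train_cnn.py` is not shown here. The sketch below uses a hypothetical `masked_rmse` helper and made-up toy values to show how restricting the metric to the evaluated fold avoids that inflation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over all rows."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def masked_rmse(y_true, y_pred, folds, evaluated_folds):
    """Hypothetical helper: RMSE restricted to rows whose fold was actually evaluated."""
    keep = [i for i, f in enumerate(folds) if f in evaluated_folds]
    if not keep:
        raise ValueError("no rows belong to the evaluated folds")
    return rmse([y_true[i] for i in keep], [y_pred[i] for i in keep])


# Toy data (not from the competition): only fold 0 has real predictions.
targets = [30.0, 50.0, 40.0, 20.0, 60.0, 35.0]
folds = [0, 0, 1, 1, 2, 2]
oof = [33.0, 46.0, 0.0, 0.0, 0.0, 0.0]

print("all rows   :", rmse(targets, oof))
print("fold 0 only:", masked_rmse(targets, oof, folds, {0}))
```

For single-fold smoke runs, the per-fold `best RMSE` line is therefore the number to read, not the OOF RMSE.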


In [21]:
# Full 5-fold: convnext_small.in12k_ft_in1k @384, seed=43, CutMix on, wd=0.05
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_small.in12k_ft_in1k',
       '--img_size', '384',
       '--batch_size', '32',
       '--epochs', '12',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '3e-5',
       '--lr_head', '1e-3',
       '--weight_decay', '0.05',
       '--num_workers', '4',
       '--seed', '43',
       '--tta_hflip',
       '--use_fusion',
       '--use_cutmix',
       '--cutmix_alpha', '0.5',
       '--cutmix_p', '0.5'
]
print('Launching full run:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0)/60:.1f} min', flush=True)
assert proc.returncode == 0, 'ConvNeXt-S seed=43 full run failed'

Launching full run: /usr/bin/python3.11 train_cnn.py --model convnext_small.in12k_ft_in1k --img_size 384 --batch_size 32 --epochs 12 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 3e-5 --lr_head 1e-3 --weight_decay 0.05 --num_workers 4 --seed 43 --tta_hflip --use_fusion --use_cutmix --cutmix_alpha 0.5 --cutmix_p 0.5


  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Device: cuda
Fold 0 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.19000 | val_rmse=26.0774 | lr=[3e-05, 0.001] | time=71.2s




Epoch 2/12 | loss=0.02934 | val_rmse=20.0438 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.1s




Epoch 3/12 | loss=0.02520 | val_rmse=18.6755 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.3s




Epoch 4/12 | loss=0.01979 | val_rmse=18.1499 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.3s




Epoch 5/12 | loss=0.01495 | val_rmse=18.0084 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.3s




Epoch 6/12 | loss=0.01107 | val_rmse=18.0522 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.0s




Epoch 7/12 | loss=0.00713 | val_rmse=18.1545 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.6s




Epoch 8/12 | loss=0.00552 | val_rmse=18.2442 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.5s




Epoch 9/12 | loss=0.00446 | val_rmse=18.2945 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=71.8s




Epoch 10/12 | loss=0.00389 | val_rmse=18.3259 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.5s




Epoch 11/12 | loss=0.00228 | val_rmse=18.3533 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.5s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00212 | val_rmse=18.3728 | lr=[0.0, 0.0] | time=71.2s
Fold 0 best RMSE: 18.0084 | elapsed 14.3 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed43_fold0_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 1 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.27015 | val_rmse=27.2931 | lr=[3e-05, 0.001] | time=71.6s




Epoch 2/12 | loss=0.03031 | val_rmse=21.9269 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.9s




Epoch 3/12 | loss=0.02599 | val_rmse=19.0122 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.9s




Epoch 4/12 | loss=0.02000 | val_rmse=17.9788 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.8s




Epoch 5/12 | loss=0.01439 | val_rmse=17.6586 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.5s




Epoch 6/12 | loss=0.00977 | val_rmse=17.6557 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.8s




Epoch 7/12 | loss=0.00695 | val_rmse=17.7173 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.6s




Epoch 8/12 | loss=0.00513 | val_rmse=17.7869 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.8s




Epoch 9/12 | loss=0.00403 | val_rmse=17.8519 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=72.0s




Epoch 10/12 | loss=0.00330 | val_rmse=17.9079 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.8s




Epoch 11/12 | loss=0.00190 | val_rmse=17.9592 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.4s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00181 | val_rmse=18.0106 | lr=[0.0, 0.0] | time=71.7s
Fold 1 best RMSE: 17.6557 | elapsed 14.4 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed43_fold1_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 2 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.23163 | val_rmse=26.0169 | lr=[3e-05, 0.001] | time=71.3s




Epoch 2/12 | loss=0.02771 | val_rmse=21.0963 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.7s




Epoch 3/12 | loss=0.02439 | val_rmse=19.0339 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.9s




Epoch 4/12 | loss=0.01888 | val_rmse=18.2990 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.4s




Epoch 5/12 | loss=0.01357 | val_rmse=18.1241 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.2s




Epoch 6/12 | loss=0.00985 | val_rmse=18.0998 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.4s




Epoch 7/12 | loss=0.00668 | val_rmse=18.1202 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.5s




Epoch 8/12 | loss=0.00499 | val_rmse=18.1249 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.7s




Epoch 9/12 | loss=0.00411 | val_rmse=18.1329 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=71.8s




Epoch 10/12 | loss=0.00339 | val_rmse=18.1543 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.7s




Epoch 11/12 | loss=0.00199 | val_rmse=18.1871 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.5s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00181 | val_rmse=18.2248 | lr=[0.0, 0.0] | time=71.6s
Fold 2 best RMSE: 18.0998 | elapsed 14.4 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed43_fold2_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 3 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.15099 | val_rmse=22.9060 | lr=[3e-05, 0.001] | time=71.6s




Epoch 2/12 | loss=0.02861 | val_rmse=19.9252 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.7s




Epoch 3/12 | loss=0.02459 | val_rmse=19.1599 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.8s




Epoch 4/12 | loss=0.01971 | val_rmse=18.4060 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.9s




Epoch 5/12 | loss=0.01402 | val_rmse=17.9604 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.7s




Epoch 6/12 | loss=0.00986 | val_rmse=17.8502 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=72.0s




Epoch 7/12 | loss=0.00694 | val_rmse=17.9095 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.8s




Epoch 8/12 | loss=0.00496 | val_rmse=17.9900 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.9s




Epoch 9/12 | loss=0.00377 | val_rmse=18.0684 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=72.0s




Epoch 10/12 | loss=0.00318 | val_rmse=18.1405 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.9s




Epoch 11/12 | loss=0.00192 | val_rmse=18.1944 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.9s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00184 | val_rmse=18.2344 | lr=[0.0, 0.0] | time=71.7s
Fold 3 best RMSE: 17.8502 | elapsed 14.4 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed43_fold3_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 4 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.24085 | val_rmse=26.1367 | lr=[3e-05, 0.001] | time=71.6s




Epoch 2/12 | loss=0.03054 | val_rmse=19.6972 | lr=[2.939239460421746e-05, 0.0009797464868072487] | time=71.6s




Epoch 3/12 | loss=0.02598 | val_rmse=17.9560 | lr=[2.7618802992467718e-05, 0.0009206267664155906] | time=71.7s




Epoch 4/12 | loss=0.01990 | val_rmse=17.7989 | lr=[2.482291100917928e-05, 0.0008274303669726426] | time=71.6s




Epoch 5/12 | loss=0.01479 | val_rmse=17.6802 | lr=[2.12312251950283e-05, 0.0007077075065009433] | time=71.7s




Epoch 6/12 | loss=0.01041 | val_rmse=17.6036 | lr=[1.713472257409928e-05, 0.0005711574191366427] | time=71.2s




Epoch 7/12 | loss=0.00764 | val_rmse=17.6083 | lr=[1.2865277425900725e-05, 0.0004288425808633575] | time=71.7s




Epoch 8/12 | loss=0.00567 | val_rmse=17.6268 | lr=[8.768774804971705e-06, 0.0002922924934990568] | time=71.9s




Epoch 9/12 | loss=0.00454 | val_rmse=17.6450 | lr=[5.177088990820725e-06, 0.0001725696330273575] | time=71.8s




Epoch 10/12 | loss=0.00358 | val_rmse=17.6374 | lr=[2.38119700753228e-06, 7.937323358440934e-05] | time=71.8s




Epoch 11/12 | loss=0.00221 | val_rmse=17.6294 | lr=[6.076053957825411e-07, 2.025351319275137e-05] | time=71.7s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00207 | val_rmse=17.6309 | lr=[0.0, 0.0] | time=71.7s
Fold 4 best RMSE: 17.6036 | elapsed 14.4 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed43_fold4_best.pth






Saved OOF to work_cnn/oof_convnext_small.in12k_ft_in1k_fusion1_sz384_seed43.csv OOF RMSE: 17.843361034410087
Saved test predictions to work_cnn/submission_convnext_small.in12k_ft_in1k_fusion1_sz384_seed43.csv shape: (992, 2)


Exit code: 0 Elapsed: 73.1 min
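
Note on fold scores vs. OOF score: all five validation folds have the same size (1784 rows), so the pooled RMSE of the per-fold best checkpoints is the root of the mean squared fold RMSE. The sketch below applies that to the five `best RMSE` values logged above. `pooled_rmse` is a hypothetical helper. The result need not match the reported `OOF RMSE` exactly, since the OOF predictions may be produced differently from the per-epoch validation pass (for example with `--tta_hflip`), which is an assumption about `train_cnn.py`.

```python
import math


def pooled_rmse(fold_rmses, fold_sizes):
    """Hypothetical helper: combine per-fold RMSEs into one RMSE over all rows."""
    total_sq_err = sum(n * r * r for r, n in zip(fold_rmses, fold_sizes))
    return math.sqrt(total_sq_err / sum(fold_sizes))


# Per-fold best RMSEs copied from the ConvNeXt-S seed=43 log above.
best = [18.0084, 17.6557, 18.0998, 17.8502, 17.6036]
sizes = [1784] * 5

print(f"pooled best-checkpoint RMSE: {pooled_rmse(best, sizes):.4f}")
print("logged OOF RMSE            : 17.8434")
```

The pooled value comes out at roughly 17.845, within about 0.002 of the logged OOF RMSE.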


In [35]:
# NNLS-optimized blend of all available models in work_cnn/
import os, glob, pandas as pd, numpy as np
from sklearn.metrics import mean_squared_error
from scipy.optimize import nnls

workdir = 'work_cnn'
oof_paths = sorted(glob.glob(os.path.join(workdir, 'oof_*.csv')))
sub_paths = sorted(glob.glob(os.path.join(workdir, 'submission_*.csv')))
print('Found OOF files:', len(oof_paths))
print('Found SUB files:', len(sub_paths))
assert len(oof_paths) > 0 and len(sub_paths) > 0, 'No model artifacts to blend yet.'

def tag_from(path):
    base = os.path.basename(path)
    return base.replace('oof_', '').replace('submission_', '')

# Load and align OOFs
oof_dfs = {}
for p in oof_paths:
    t = tag_from(p)
    df = pd.read_csv(p)
    assert {'Id','oof','target'}.issubset(df.columns), f'Bad OOF file cols: {p}'
    oof_dfs[t] = df[['Id','oof','target']].copy()  # 'fold' is not used by the blend, so do not require it

tags = sorted(oof_dfs.keys())
base = oof_dfs[tags[0]][['Id','target']].rename(columns={'target':'y'})
for t in tags[1:]:
    base = base.merge(oof_dfs[t][['Id','oof']].rename(columns={'oof': f'oof_{t}'}), on='Id', how='inner')
base = base.merge(oof_dfs[tags[0]][['Id','oof']].rename(columns={'oof': f'oof_{tags[0]}'}), on='Id', how='inner')

# Collect OOF prediction columns in merge order (tags[0] was merged last, so it appears last)
pred_cols = [c for c in base.columns if c.startswith('oof_')]
X = base[pred_cols].values.astype(float)
y = base['y'].values.astype(float)

# NNLS weights (non-negative least squares), then rescaled to sum to 1
w, _ = nnls(X, y)
if w.sum() > 0:
    w = w / w.sum()
weights = dict(zip([c.replace('oof_','') for c in pred_cols], w.tolist()))
blend_oof = X.dot(w)
rmse_blend = float(np.sqrt(mean_squared_error(y, blend_oof)))  # avoids the deprecated squared=False argument
print('NNLS Weights:', weights)
print('Blend OOF RMSE:', rmse_blend)

# Apply weights to SUB files
subs = {}
for p in sub_paths:
    t = tag_from(p)
    subs[t] = pd.read_csv(p)

# Merge predictions on Id
sub_merge = None
used = []
for t in weights.keys():
    if t in subs:
        df = subs[t][['Id','Pawpularity']].rename(columns={'Pawpularity': f'pred_{t}'})
        sub_merge = df if sub_merge is None else sub_merge.merge(df, on='Id', how='inner')
        used.append(t)
print('Using SUB tags:', used)
assert sub_merge is not None and len(used) > 0, 'No matching submissions for weights'

pred = np.zeros(len(sub_merge), dtype=float)
for t in used:
    pred += weights[t] * sub_merge[f'pred_{t}'].clip(0,100).values
final = pd.DataFrame({'Id': sub_merge['Id'], 'Pawpularity': np.clip(pred, 0, 100)})
final.to_csv('submission.csv', index=False)
print('Saved NNLS-blended submission.csv, shape:', final.shape)

Found OOF files: 8
Found SUB files: 8
NNLS Weights: {'convnext_small.in12k_ft_in1k_fusion1_sz384.csv': 0.0, 'convnext_small.in12k_ft_in1k_fusion1_sz384_seed43.csv': 0.12659622936652432, 'convnext_small.in12k_ft_in1k_fusion1_sz512_seed44.csv': 0.1962968613630638, 'convnext_small.in12k_ft_in1k_fusion1_sz512_seed45.csv': 0.1363520627481221, 'convnext_tiny_fusion1_sz384.csv': 0.0898123345552999, 'swin_base_patch4_window12_384_fusion1_sz384_seed42.csv': 0.21454231143062782, 'tf_efficientnetv2_s_in21k_fusion1_sz384.csv': 0.05477408877089537, 'convnext_base_fusion1_sz384_seed42.csv': 0.18162611176546678}
Blend OOF RMSE: 17.370572068035344
Using SUB tags: ['convnext_small.in12k_ft_in1k_fusion1_sz384.csv', 'convnext_small.in12k_ft_in1k_fusion1_sz384_seed43.csv', 'convnext_small.in12k_ft_in1k_fusion1_sz512_seed44.csv', 'convnext_small.in12k_ft_in1k_fusion1_sz512_seed45.csv', 'convnext_tiny_fusion1_sz384.csv', 'swin_base_patch4_window12_384_fusion1_sz384_seed42.csv', 'tf_efficientnetv2_s_in21k_fu
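
Note on the blend weights: as a quick sanity check, the weights printed above can be pasted into a small script to confirm that they sum to 1 after normalization and to list any model that NNLS dropped. This is only a sketch over values copied from the output; the name `logged_weights` is hypothetical.

```python
# Weights copied verbatim from the "NNLS Weights:" line above.
logged_weights = {
    'convnext_small.in12k_ft_in1k_fusion1_sz384.csv': 0.0,
    'convnext_small.in12k_ft_in1k_fusion1_sz384_seed43.csv': 0.12659622936652432,
    'convnext_small.in12k_ft_in1k_fusion1_sz512_seed44.csv': 0.1962968613630638,
    'convnext_small.in12k_ft_in1k_fusion1_sz512_seed45.csv': 0.1363520627481221,
    'convnext_tiny_fusion1_sz384.csv': 0.0898123345552999,
    'swin_base_patch4_window12_384_fusion1_sz384_seed42.csv': 0.21454231143062782,
    'tf_efficientnetv2_s_in21k_fusion1_sz384.csv': 0.05477408877089537,
    'convnext_base_fusion1_sz384_seed42.csv': 0.18162611176546678,
}

total = sum(logged_weights.values())
dropped = [tag for tag, w in logged_weights.items() if w == 0.0]
ranked = sorted(logged_weights.items(), key=lambda kv: kv[1], reverse=True)

print(f"sum of weights: {total:.6f}")
print("zero-weight models:", dropped)
for tag, w in ranked:
    print(f"{w:.4f}  {tag}")
```

A zero weight only means the model adds nothing beyond the others on these OOF predictions; it does not by itself indicate a faulty run.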



In [24]:
# Smoke test: convnext_base @384, seed=42, CutMix on (fold 0, 3 epochs)
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_base',
       '--img_size', '384',
       '--batch_size', '24',
       '--epochs', '3',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '2e-5',
       '--lr_head', '8e-4',
       '--weight_decay', '0.05',
       '--num_workers', '4',
       '--seed', '42',
       '--tta_hflip',
       '--use_fusion',
       '--use_cutmix',
       '--cutmix_alpha', '0.5',
       '--cutmix_p', '0.5',
       '--fold', '0'
]
print('Launching ConvNeXt-Base smoke:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0):.1f}s', flush=True)
assert proc.returncode == 0, 'ConvNeXt-Base smoke run failed (check tag/VRAM).'

Launching ConvNeXt-Base smoke: /usr/bin/python3.11 train_cnn.py --model convnext_base --img_size 384 --batch_size 24 --epochs 3 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 2e-5 --lr_head 8e-4 --weight_decay 0.05 --num_workers 4 --seed 42 --tta_hflip --use_fusion --use_cutmix --cutmix_alpha 0.5 --cutmix_p 0.5 --fold 0


  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Device: cuda
Fold 0 | train=7136 val=1784 epochs=3




Epoch 1/3 | loss=0.07621 | val_rmse=28.3327 | lr=[2e-05, 0.0008] | time=102.1s




Epoch 2/3 | loss=0.02822 | val_rmse=21.6977 | lr=[1e-05, 0.0004] | time=101.2s




  ckpt = torch.load(best_path, map_location=device)


Epoch 3/3 | loss=0.02215 | val_rmse=19.9412 | lr=[0.0, 0.0] | time=101.8s
Fold 0 best RMSE: 19.9412 | elapsed 5.1 min | ckpt: work_cnn/convnext_base_seed42_fold0_best.pth






Saved OOF to work_cnn/oof_convnext_base_fusion1_sz384_seed42.csv OOF RMSE: 39.71039751076835
Saved test predictions to work_cnn/submission_convnext_base_fusion1_sz384_seed42.csv shape: (992, 2)


Exit code: 0 Elapsed: 336.3s


In [25]:
# Full 5-fold: convnext_base @384, seed=42, CutMix on, wd=0.05
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_base',
       '--img_size', '384',
       '--batch_size', '24',
       '--epochs', '12',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '2e-5',
       '--lr_head', '8e-4',
       '--weight_decay', '0.05',
       '--num_workers', '4',
       '--seed', '42',
       '--tta_hflip',
       '--use_fusion',
       '--use_cutmix',
       '--cutmix_alpha', '0.5',
       '--cutmix_p', '0.5'
]
print('Launching ConvNeXt-Base full run:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0)/60:.1f} min', flush=True)
assert proc.returncode == 0, 'ConvNeXt-Base full run failed'

Launching ConvNeXt-Base full run: /usr/bin/python3.11 train_cnn.py --model convnext_base --img_size 384 --batch_size 24 --epochs 12 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 2e-5 --lr_head 8e-4 --weight_decay 0.05 --num_workers 4 --seed 42 --tta_hflip --use_fusion --use_cutmix --cutmix_alpha 0.5 --cutmix_p 0.5


  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Device: cuda
Fold 0 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.07621 | val_rmse=28.3327 | lr=[2e-05, 0.0008] | time=101.4s




Epoch 2/12 | loss=0.02624 | val_rmse=21.0925 | lr=[1.9594929736144978e-05, 0.000783797189445799] | time=101.5s




Epoch 3/12 | loss=0.02216 | val_rmse=19.2604 | lr=[1.8412535328311813e-05, 0.0007365014131324725] | time=102.0s




Epoch 4/12 | loss=0.01796 | val_rmse=18.8669 | lr=[1.6548607339452853e-05, 0.0006619442935781141] | time=101.7s




Epoch 5/12 | loss=0.01289 | val_rmse=18.7739 | lr=[1.4154150130018867e-05, 0.0005661660052007546] | time=101.8s




Epoch 6/12 | loss=0.00958 | val_rmse=18.6825 | lr=[1.1423148382732854e-05, 0.0004569259353093141] | time=101.9s




Epoch 7/12 | loss=0.00716 | val_rmse=18.6378 | lr=[8.576851617267151e-06, 0.000343074064690686] | time=102.3s




Epoch 8/12 | loss=0.00571 | val_rmse=18.6218 | lr=[5.845849869981137e-06, 0.00023383399479924547] | time=102.1s




Epoch 9/12 | loss=0.00452 | val_rmse=18.6166 | lr=[3.4513926605471504e-06, 0.000138055706421886] | time=101.9s




Epoch 10/12 | loss=0.00400 | val_rmse=18.6093 | lr=[1.587464671688187e-06, 6.349858686752748e-05] | time=102.1s




Epoch 11/12 | loss=0.00216 | val_rmse=18.6171 | lr=[4.0507026385502747e-07, 1.62028105542011e-05] | time=101.8s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00201 | val_rmse=18.6294 | lr=[0.0, 0.0] | time=102.2s
Fold 0 best RMSE: 18.6093 | elapsed 20.5 min | ckpt: work_cnn/convnext_base_seed42_fold0_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 1 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.06044 | val_rmse=23.7192 | lr=[2e-05, 0.0008] | time=102.1s




Epoch 2/12 | loss=0.02718 | val_rmse=19.1435 | lr=[1.9594929736144978e-05, 0.000783797189445799] | time=102.0s




Epoch 3/12 | loss=0.02267 | val_rmse=18.2770 | lr=[1.8412535328311813e-05, 0.0007365014131324725] | time=102.4s




Epoch 4/12 | loss=0.01779 | val_rmse=18.1733 | lr=[1.6548607339452853e-05, 0.0006619442935781141] | time=102.5s




Epoch 5/12 | loss=0.01318 | val_rmse=18.1386 | lr=[1.4154150130018867e-05, 0.0005661660052007546] | time=102.1s




Epoch 6/12 | loss=0.00945 | val_rmse=18.1370 | lr=[1.1423148382732854e-05, 0.0004569259353093141] | time=101.9s




Epoch 7/12 | loss=0.00722 | val_rmse=18.1419 | lr=[8.576851617267151e-06, 0.000343074064690686] | time=102.2s




Epoch 8/12 | loss=0.00555 | val_rmse=18.1421 | lr=[5.845849869981137e-06, 0.00023383399479924547] | time=102.6s




Epoch 9/12 | loss=0.00433 | val_rmse=18.1639 | lr=[3.4513926605471504e-06, 0.000138055706421886] | time=102.5s




Epoch 10/12 | loss=0.00405 | val_rmse=18.1950 | lr=[1.587464671688187e-06, 6.349858686752748e-05] | time=102.0s




Epoch 11/12 | loss=0.00208 | val_rmse=18.2387 | lr=[4.0507026385502747e-07, 1.62028105542011e-05] | time=102.2s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00194 | val_rmse=18.2739 | lr=[0.0, 0.0] | time=102.8s
Fold 1 best RMSE: 18.1370 | elapsed 20.5 min | ckpt: work_cnn/convnext_base_seed42_fold1_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 2 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.05720 | val_rmse=25.4739 | lr=[2e-05, 0.0008] | time=102.1s




Epoch 2/12 | loss=0.02691 | val_rmse=19.7049 | lr=[1.9594929736144978e-05, 0.000783797189445799] | time=102.1s




Epoch 3/12 | loss=0.02184 | val_rmse=18.9337 | lr=[1.8412535328311813e-05, 0.0007365014131324725] | time=102.1s




Epoch 4/12 | loss=0.01763 | val_rmse=18.6950 | lr=[1.6548607339452853e-05, 0.0006619442935781141] | time=102.4s




Epoch 5/12 | loss=0.01281 | val_rmse=18.6255 | lr=[1.4154150130018867e-05, 0.0005661660052007546] | time=102.3s




Epoch 6/12 | loss=0.00968 | val_rmse=18.5591 | lr=[1.1423148382732854e-05, 0.0004569259353093141] | time=101.9s




Epoch 7/12 | loss=0.00727 | val_rmse=18.5446 | lr=[8.576851617267151e-06, 0.000343074064690686] | time=101.8s




Epoch 8/12 | loss=0.00538 | val_rmse=18.5457 | lr=[5.845849869981137e-06, 0.00023383399479924547] | time=102.4s




Epoch 9/12 | loss=0.00450 | val_rmse=18.5724 | lr=[3.4513926605471504e-06, 0.000138055706421886] | time=102.4s




Epoch 10/12 | loss=0.00401 | val_rmse=18.5852 | lr=[1.587464671688187e-06, 6.349858686752748e-05] | time=102.3s




Epoch 11/12 | loss=0.00205 | val_rmse=18.6045 | lr=[4.0507026385502747e-07, 1.62028105542011e-05] | time=101.9s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00196 | val_rmse=18.6198 | lr=[0.0, 0.0] | time=102.2s
Fold 2 best RMSE: 18.5446 | elapsed 20.5 min | ckpt: work_cnn/convnext_base_seed42_fold2_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 3 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.06692 | val_rmse=21.1248 | lr=[2e-05, 0.0008] | time=101.9s




Epoch 2/12 | loss=0.02603 | val_rmse=19.1108 | lr=[1.9594929736144978e-05, 0.000783797189445799] | time=102.1s




Epoch 3/12 | loss=0.02189 | val_rmse=18.5304 | lr=[1.8412535328311813e-05, 0.0007365014131324725] | time=102.0s




Epoch 4/12 | loss=0.01774 | val_rmse=18.2988 | lr=[1.6548607339452853e-05, 0.0006619442935781141] | time=102.1s




Epoch 5/12 | loss=0.01284 | val_rmse=18.2397 | lr=[1.4154150130018867e-05, 0.0005661660052007546] | time=102.1s




Epoch 6/12 | loss=0.00972 | val_rmse=18.2527 | lr=[1.1423148382732854e-05, 0.0004569259353093141] | time=102.2s




Epoch 7/12 | loss=0.00702 | val_rmse=18.2751 | lr=[8.576851617267151e-06, 0.000343074064690686] | time=102.2s




Epoch 8/12 | loss=0.00562 | val_rmse=18.3345 | lr=[5.845849869981137e-06, 0.00023383399479924547] | time=102.3s




Epoch 9/12 | loss=0.00482 | val_rmse=18.3761 | lr=[3.4513926605471504e-06, 0.000138055706421886] | time=102.3s




Epoch 10/12 | loss=0.00407 | val_rmse=18.4107 | lr=[1.587464671688187e-06, 6.349858686752748e-05] | time=102.3s




Epoch 11/12 | loss=0.00206 | val_rmse=18.4470 | lr=[4.0507026385502747e-07, 1.62028105542011e-05] | time=102.2s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00193 | val_rmse=18.4742 | lr=[0.0, 0.0] | time=102.1s
Fold 3 best RMSE: 18.2397 | elapsed 20.5 min | ckpt: work_cnn/convnext_base_seed42_fold3_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 4 | train=7136 val=1784 epochs=12




Epoch 1/12 | loss=0.06777 | val_rmse=22.5978 | lr=[2e-05, 0.0008] | time=102.3s




Epoch 2/12 | loss=0.02682 | val_rmse=19.3215 | lr=[1.9594929736144978e-05, 0.000783797189445799] | time=102.1s




Epoch 3/12 | loss=0.02238 | val_rmse=18.5465 | lr=[1.8412535328311813e-05, 0.0007365014131324725] | time=102.1s




Epoch 4/12 | loss=0.01762 | val_rmse=18.2074 | lr=[1.6548607339452853e-05, 0.0006619442935781141] | time=102.0s




Epoch 5/12 | loss=0.01276 | val_rmse=18.0733 | lr=[1.4154150130018867e-05, 0.0005661660052007546] | time=102.4s




Epoch 6/12 | loss=0.00924 | val_rmse=17.9639 | lr=[1.1423148382732854e-05, 0.0004569259353093141] | time=102.4s




Epoch 7/12 | loss=0.00721 | val_rmse=17.9192 | lr=[8.576851617267151e-06, 0.000343074064690686] | time=102.3s




Epoch 8/12 | loss=0.00546 | val_rmse=17.9055 | lr=[5.845849869981137e-06, 0.00023383399479924547] | time=102.2s




Epoch 9/12 | loss=0.00470 | val_rmse=17.8919 | lr=[3.4513926605471504e-06, 0.000138055706421886] | time=102.4s




Epoch 10/12 | loss=0.00416 | val_rmse=17.9002 | lr=[1.587464671688187e-06, 6.349858686752748e-05] | time=102.5s




Epoch 11/12 | loss=0.00213 | val_rmse=17.9116 | lr=[4.0507026385502747e-07, 1.62028105542011e-05] | time=102.5s


  ckpt = torch.load(best_path, map_location=device)


Epoch 12/12 | loss=0.00206 | val_rmse=17.9220 | lr=[0.0, 0.0] | time=102.1s
Fold 4 best RMSE: 17.8919 | elapsed 20.6 min | ckpt: work_cnn/convnext_base_seed42_fold4_best.pth






Saved OOF to work_cnn/oof_convnext_base_fusion1_sz384_seed42.csv OOF RMSE: 18.2587059237385
Saved test predictions to work_cnn/submission_convnext_base_fusion1_sz384_seed42.csv shape: (992, 2)


Exit code: 0 Elapsed: 104.2 min
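
Note on reading these logs: the best epoch per fold can be pulled out of the captured output with a small parser, which helps when deciding whether later epochs still contribute. The sketch below is a minimal example; `best_epoch` and `EPOCH_RE` are hypothetical names, the regular expression assumes the exact `Epoch k/N | loss=... | val_rmse=...` format shown above, and the excerpt is copied from the Fold 4 log of this run.

```python
import re

# Matches lines such as: "Epoch 9/12 | loss=0.00470 | val_rmse=17.8919 | ..."
EPOCH_RE = re.compile(r"Epoch (\d+)/(\d+) \| loss=([\d.]+) \| val_rmse=([\d.]+)")


def best_epoch(log_text):
    """Hypothetical helper: return (epoch, val_rmse) for the lowest val_rmse in the text."""
    rows = [(int(m.group(1)), float(m.group(4))) for m in EPOCH_RE.finditer(log_text)]
    if not rows:
        raise ValueError("no epoch lines found")
    return min(rows, key=lambda row: row[1])


excerpt = """\
Epoch 8/12 | loss=0.00546 | val_rmse=17.9055 | lr=[5.845849869981137e-06, 0.00023383399479924547] | time=102.2s
Epoch 9/12 | loss=0.00470 | val_rmse=17.8919 | lr=[3.4513926605471504e-06, 0.000138055706421886] | time=102.4s
Epoch 10/12 | loss=0.00416 | val_rmse=17.9002 | lr=[1.587464671688187e-06, 6.349858686752748e-05] | time=102.5s
"""

print(best_epoch(excerpt))
```

To use it per fold, split the full log on the `Fold k | train=...` header lines and call `best_epoch` on each piece.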


In [27]:
# Launch Run 1 — ConvNeXt-S @512 (fusion), seed 44 per expert advice
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_small.in12k_ft_in1k',
       '--img_size', '512',
       '--batch_size', '24',
       '--epochs', '20',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '2.5e-5',
       '--lr_head', '8e-4',
       '--weight_decay', '0.05',
       '--num_workers', '4',
       '--seed', '44',
       '--tta_hflip',
       '--use_fusion',
       '--use_cutmix',
       '--cutmix_alpha', '0.5',
       '--cutmix_p', '0.5'
]
print('Launching:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0)/60:.1f} min', flush=True)
assert proc.returncode == 0, 'ConvNeXt-S @512 run failed'

Launching: /usr/bin/python3.11 train_cnn.py --model convnext_small.in12k_ft_in1k --img_size 512 --batch_size 24 --epochs 20 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 2.5e-5 --lr_head 8e-4 --weight_decay 0.05 --num_workers 4 --seed 44 --tta_hflip --use_fusion --use_cutmix --cutmix_alpha 0.5 --cutmix_p 0.5


  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Device: cuda
Fold 0 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.08418 | val_rmse=22.1432 | lr=[2.5e-05, 0.0008] | time=122.0s




Epoch 2/20 | loss=0.02907 | val_rmse=19.2236 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=119.6s




Epoch 3/20 | loss=0.02532 | val_rmse=18.6370 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=121.5s




Epoch 4/20 | loss=0.01984 | val_rmse=18.5060 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=121.7s




Epoch 5/20 | loss=0.01516 | val_rmse=18.3847 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=121.5s




Epoch 6/20 | loss=0.01065 | val_rmse=18.3122 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=121.9s




Epoch 7/20 | loss=0.00817 | val_rmse=18.3243 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=121.8s




Epoch 8/20 | loss=0.00598 | val_rmse=18.3054 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=121.9s




Epoch 9/20 | loss=0.00431 | val_rmse=18.3043 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=122.2s




Epoch 10/20 | loss=0.00333 | val_rmse=18.2976 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=121.8s




Epoch 11/20 | loss=0.00284 | val_rmse=18.3206 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=121.8s




Epoch 12/20 | loss=0.00227 | val_rmse=18.3330 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=122.0s




Epoch 13/20 | loss=0.00193 | val_rmse=18.3325 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=122.1s




Epoch 14/20 | loss=0.00180 | val_rmse=18.3374 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=121.9s




Epoch 15/20 | loss=0.00159 | val_rmse=18.3428 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=121.8s




Epoch 16/20 | loss=0.00148 | val_rmse=18.3439 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=122.0s




Epoch 17/20 | loss=0.00131 | val_rmse=18.3489 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=122.0s




Epoch 18/20 | loss=0.00120 | val_rmse=18.3471 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=122.0s




Epoch 19/20 | loss=0.00044 | val_rmse=18.3474 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=121.8s


  ckpt = torch.load(best_path, map_location=device)


Epoch 20/20 | loss=0.00043 | val_rmse=18.3457 | lr=[0.0, 0.0] | time=121.8s
Fold 0 best RMSE: 18.2976 | elapsed 40.7 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed44_fold0_best.pth




  scaler = torch.cuda.amp.GradScaler(enabled=(device=='cuda'))


Fold 1 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.06218 | val_rmse=22.9418 | lr=[2.5e-05, 0.0008] | time=121.7s




Epoch 2/20 | loss=0.03142 | val_rmse=20.1600 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=122.0s




Epoch 3/20 | loss=0.02652 | val_rmse=18.7091 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=121.9s




Epoch 4/20 | loss=0.02201 | val_rmse=17.9406 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=122.1s




Epoch 5/20 | loss=0.01541 | val_rmse=17.6456 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=121.8s




Epoch 6/20 | loss=0.01104 | val_rmse=17.6533 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=122.1s




Epoch 7/20 | loss=0.00854 | val_rmse=17.7524 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=121.8s




Epoch 8/20 | loss=0.00569 | val_rmse=17.8854 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=122.0s




Epoch 9/20 | loss=0.00436 | val_rmse=17.9865 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=122.2s




Epoch 10/20 | loss=0.00345 | val_rmse=18.0576 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=122.1s




Epoch 11/20 | loss=0.00276 | val_rmse=18.1147 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=121.7s




Epoch 12/20 | loss=0.00225 | val_rmse=18.1595 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=122.1s




Epoch 13/20 | loss=0.00200 | val_rmse=18.1961 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=122.3s




Epoch 14/20 | loss=0.00180 | val_rmse=18.2251 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=122.1s




Epoch 15/20 | loss=0.00158 | val_rmse=18.2447 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=121.8s




Epoch 16/20 | loss=0.00149 | val_rmse=18.2578 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=122.4s




Epoch 17/20 | loss=0.00131 | val_rmse=18.2667 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=122.2s




Epoch 18/20 | loss=0.00134 | val_rmse=18.2744 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=122.2s




Epoch 19/20 | loss=0.00048 | val_rmse=18.2801 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=121.7s


  ckpt = torch.load(best_path, map_location=device)


Epoch 20/20 | loss=0.00047 | val_rmse=18.2853 | lr=[0.0, 0.0] | time=122.2s
Fold 1 best RMSE: 17.6456 | elapsed 40.7 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed44_fold1_best.pth






Fold 2 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.12981 | val_rmse=22.6198 | lr=[2.5e-05, 0.0008] | time=121.6s




Epoch 2/20 | loss=0.03030 | val_rmse=20.5355 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=121.8s




Epoch 3/20 | loss=0.02534 | val_rmse=19.0677 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=121.9s




Epoch 4/20 | loss=0.01984 | val_rmse=18.5271 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=122.1s




Epoch 5/20 | loss=0.01510 | val_rmse=18.3460 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=122.1s




Epoch 6/20 | loss=0.01085 | val_rmse=18.3435 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=121.7s




Epoch 7/20 | loss=0.00809 | val_rmse=18.3202 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=122.0s




Epoch 8/20 | loss=0.00610 | val_rmse=18.3177 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=122.2s




Epoch 9/20 | loss=0.00461 | val_rmse=18.2866 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=121.9s




Epoch 10/20 | loss=0.00381 | val_rmse=18.2263 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=122.1s




Epoch 11/20 | loss=0.00323 | val_rmse=18.1586 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=122.1s




Epoch 12/20 | loss=0.00252 | val_rmse=18.1227 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=121.7s




Epoch 13/20 | loss=0.00221 | val_rmse=18.0994 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=121.7s




Epoch 14/20 | loss=0.00174 | val_rmse=18.0954 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=122.0s




Epoch 15/20 | loss=0.00167 | val_rmse=18.1065 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=122.0s




Epoch 16/20 | loss=0.00156 | val_rmse=18.1080 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=121.7s




Epoch 17/20 | loss=0.00148 | val_rmse=18.1088 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=121.8s




Epoch 18/20 | loss=0.00151 | val_rmse=18.1099 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=122.5s




Epoch 19/20 | loss=0.00046 | val_rmse=18.1115 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=121.9s




Epoch 20/20 | loss=0.00046 | val_rmse=18.1140 | lr=[0.0, 0.0] | time=121.6s
Fold 2 best RMSE: 18.0954 | elapsed 40.8 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed44_fold2_best.pth






Fold 3 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.11019 | val_rmse=21.1892 | lr=[2.5e-05, 0.0008] | time=121.8s




Epoch 2/20 | loss=0.02911 | val_rmse=18.5558 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=122.3s




Epoch 3/20 | loss=0.02450 | val_rmse=18.0063 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=122.2s




Epoch 4/20 | loss=0.01970 | val_rmse=17.8554 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=121.6s




Epoch 5/20 | loss=0.01469 | val_rmse=17.8945 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=122.3s




Epoch 6/20 | loss=0.01072 | val_rmse=18.0218 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=122.5s




Epoch 7/20 | loss=0.00777 | val_rmse=18.1261 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=121.9s




Epoch 8/20 | loss=0.00584 | val_rmse=18.2163 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=122.3s




Epoch 9/20 | loss=0.00446 | val_rmse=18.2886 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=122.5s




Epoch 10/20 | loss=0.00334 | val_rmse=18.3395 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=122.0s




Epoch 11/20 | loss=0.00280 | val_rmse=18.3505 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=121.8s




Epoch 12/20 | loss=0.00248 | val_rmse=18.3491 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=122.5s




Epoch 13/20 | loss=0.00209 | val_rmse=18.3479 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=122.4s




Epoch 14/20 | loss=0.00199 | val_rmse=18.3374 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=122.0s




Epoch 15/20 | loss=0.00155 | val_rmse=18.3338 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=121.8s




Epoch 16/20 | loss=0.00164 | val_rmse=18.3202 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=122.2s




Epoch 17/20 | loss=0.00156 | val_rmse=18.3179 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=122.6s




Epoch 18/20 | loss=0.00131 | val_rmse=18.3245 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=122.0s




Epoch 19/20 | loss=0.00047 | val_rmse=18.3299 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=121.6s




Epoch 20/20 | loss=0.00045 | val_rmse=18.3322 | lr=[0.0, 0.0] | time=122.4s
Fold 3 best RMSE: 17.8554 | elapsed 40.7 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed44_fold3_best.pth






Fold 4 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.10391 | val_rmse=21.1879 | lr=[2.5e-05, 0.0008] | time=121.9s




Epoch 2/20 | loss=0.03013 | val_rmse=18.9744 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=121.6s




Epoch 3/20 | loss=0.02576 | val_rmse=17.8719 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=122.0s




Epoch 4/20 | loss=0.02183 | val_rmse=17.4273 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=122.0s




Epoch 5/20 | loss=0.01549 | val_rmse=17.4285 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=121.6s




Epoch 6/20 | loss=0.01176 | val_rmse=17.5229 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=121.9s




Epoch 7/20 | loss=0.00840 | val_rmse=17.6540 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=122.2s




Epoch 8/20 | loss=0.00663 | val_rmse=17.7441 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=122.2s




Epoch 9/20 | loss=0.00468 | val_rmse=17.7892 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=122.0s




Epoch 10/20 | loss=0.00406 | val_rmse=17.7887 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=122.0s




Epoch 11/20 | loss=0.00300 | val_rmse=17.7777 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=122.3s




Epoch 12/20 | loss=0.00267 | val_rmse=17.7652 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=122.0s




Epoch 13/20 | loss=0.00237 | val_rmse=17.7323 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=122.0s




Epoch 14/20 | loss=0.00200 | val_rmse=17.7160 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=122.5s




Epoch 15/20 | loss=0.00179 | val_rmse=17.6998 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=122.3s




Epoch 16/20 | loss=0.00173 | val_rmse=17.7047 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=121.9s




Epoch 17/20 | loss=0.00164 | val_rmse=17.7090 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=121.6s




Epoch 18/20 | loss=0.00139 | val_rmse=17.7134 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=122.4s




Epoch 19/20 | loss=0.00053 | val_rmse=17.7191 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=122.1s




Epoch 20/20 | loss=0.00052 | val_rmse=17.7227 | lr=[0.0, 0.0] | time=121.7s
Fold 4 best RMSE: 17.4273 | elapsed 40.7 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed44_fold4_best.pth






Saved OOF to work_cnn/oof_convnext_small.in12k_ft_in1k_fusion1_sz512_seed44.csv OOF RMSE: 17.86528552004247
Saved test predictions to work_cnn/submission_convnext_small.in12k_ft_in1k_fusion1_sz512_seed44.csv shape: (992, 2)


Exit code: 0 Elapsed: 205.6 min
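
In the log above, several folds reach their best val_rmse by epoch 4 or 5 and drift upward afterwards. A small parser makes that easier to see across runs. This is a minimal sketch that assumes the log has been captured as text in the `Fold k | train=...` / `Epoch e/E | loss=... | val_rmse=...` format shown here; `best_epoch_per_fold` is a hypothetical helper, not part of train_cnn.py.

```python
import re

# Patterns match the log lines printed above; adjust if the format changes.
FOLD_RE = re.compile(r'^Fold (\d+) \| train=')
EPOCH_RE = re.compile(r'^Epoch (\d+)/\d+ \| loss=[\d.]+ \| val_rmse=([\d.]+)')

def best_epoch_per_fold(log_text):
    """Return {fold: (best_epoch, best_val_rmse)} parsed from a training log."""
    best = {}
    fold = None
    for line in log_text.splitlines():
        line = line.strip()
        m = FOLD_RE.match(line)
        if m:
            fold = int(m.group(1))  # start of a new fold section
            continue
        m = EPOCH_RE.match(line)
        if m and fold is not None:
            epoch, rmse = int(m.group(1)), float(m.group(2))
            # Keep the earliest epoch with the lowest validation RMSE
            if fold not in best or rmse < best[fold][1]:
                best[fold] = (epoch, rmse)
    return best
```

If most folds peak in the first quarter of training, a shorter schedule or earlier stopping may be worth testing before spending another 200 minutes per seed.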


In [32]:
# NNLS blend + isotonic calibration on OOF, then apply to test
import os, glob, numpy as np, pandas as pd
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import mean_squared_error
from scipy.optimize import nnls

workdir = 'work_cnn'
oof_paths = sorted(glob.glob(os.path.join(workdir, 'oof_*.csv')))
sub_paths = sorted(glob.glob(os.path.join(workdir, 'submission_*.csv')))
assert oof_paths and sub_paths, 'No artifacts found'

def tag_from(path):
    b = os.path.basename(path)
    return b.replace('oof_', '').replace('submission_', '')

# Load OOFs and align
oof_dfs = {tag_from(p): pd.read_csv(p)[['Id','oof','target']] for p in oof_paths}
tags = sorted(oof_dfs.keys())
base = oof_dfs[tags[0]][['Id','target']].rename(columns={'target':'y'})
for t in tags:
    base = base.merge(oof_dfs[t][['Id','oof']].rename(columns={'oof': f'oof_{t}'}), on='Id', how='inner')

pred_cols = [c for c in base.columns if c.startswith('oof_')]
X = base[pred_cols].values.astype(float)
y = base['y'].values.astype(float)

# NNLS only enforces non-negative weights; rescale afterwards so they sum to one
w, _ = nnls(X, y)
w = w / w.sum() if w.sum() > 0 else w
weights = dict(zip([c.replace('oof_','') for c in pred_cols], w.tolist()))
blend_oof = X.dot(w)
# np.sqrt(MSE) instead of squared=False, which newer scikit-learn releases no longer accept
rmse_blend = np.sqrt(mean_squared_error(y, blend_oof))
print('NNLS OOF RMSE:', rmse_blend, 'weights_n=', len(weights))

# Fit isotonic on OOF (the RMSE below is in-sample for the calibrator, so it is optimistic)
iso = IsotonicRegression(out_of_bounds='clip')
iso.fit(blend_oof, y)
blend_oof_cal = iso.transform(blend_oof)
rmse_cal = np.sqrt(mean_squared_error(y, blend_oof_cal))
print('Isotonic-calibrated OOF RMSE:', rmse_cal)

# Apply to test
subs = {tag_from(p): pd.read_csv(p)[['Id','Pawpularity']] for p in sub_paths}
sub_merge = None
used = []
for t in weights.keys():
    if t in subs:
        df = subs[t].rename(columns={'Pawpularity': f'pred_{t}'})
        sub_merge = df if sub_merge is None else sub_merge.merge(df, on='Id', how='inner')
        used.append(t)
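assert sub_merge is not None and used, 'No matching subs for weights'
# Every model with a non-zero weight needs test predictions, or the test blend no longer sums to one
missing = [t for t, wt in weights.items() if wt > 0 and t not in subs]
assert not missing, f'Missing test predictions for: {missing}'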
pred_test = np.zeros(len(sub_merge), dtype=float)
for t in used:
    pred_test += weights[t] * sub_merge[f'pred_{t}'].clip(0,100).values
pred_test_cal = iso.transform(pred_test)
sub_cal = pd.DataFrame({'Id': sub_merge['Id'], 'Pawpularity': np.clip(pred_test_cal, 0, 100)})
sub_cal.to_csv('submission_calibrated.csv', index=False)
print('Saved submission_calibrated.csv', sub_cal.shape)

NNLS OOF RMSE: 17.437700366849946 weights_n= 7
Isotonic-calibrated OOF RMSE: 17.27066089462739
Saved submission_calibrated.csv (992, 2)
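
The isotonic-calibrated OOF RMSE printed above is measured on the same rows the calibrator was fitted on, so part of the drop from 17.44 to 17.27 may be in-sample fit. A cross-fitted estimate is a fairer check. The sketch below is one way to do it, assuming `blend_oof` and `y` from the cell above; `crossfit_isotonic_rmse` and its arguments are hypothetical names, not part of the notebook.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import KFold

def crossfit_isotonic_rmse(pred, y, n_splits=5, seed=0):
    """Estimate out-of-sample RMSE of isotonic calibration by cross-fitting.

    Each row is calibrated by an isotonic model fitted on the other folds only.
    """
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(y, dtype=float)
    cal = np.empty_like(pred)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, hold_idx in kf.split(pred):
        iso = IsotonicRegression(out_of_bounds='clip')
        iso.fit(pred[fit_idx], y[fit_idx])
        cal[hold_idx] = iso.predict(pred[hold_idx])
    rmse = float(np.sqrt(np.mean((cal - y) ** 2)))
    return rmse, cal
```

Calling `crossfit_isotonic_rmse(blend_oof, y)` and comparing the result with the uncalibrated NNLS RMSE gives a better idea of how much of the calibration gain is likely to carry over to the test set.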




In [30]:
# Launch Run 2 — ConvNeXt-S @512 (fusion), seed 45
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_small.in12k_ft_in1k',
       '--img_size', '512',
       '--batch_size', '24',
       '--epochs', '20',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '2.5e-5',
       '--lr_head', '8e-4',
       '--weight_decay', '0.05',
       '--num_workers', '4',
       '--seed', '45',
       '--tta_hflip',
       '--use_fusion',
       '--use_cutmix',
       '--cutmix_alpha', '0.5',
       '--cutmix_p', '0.5'
]
print('Launching:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0)/60:.1f} min', flush=True)
assert proc.returncode == 0, 'ConvNeXt-S @512 seed45 run failed'

Launching: /usr/bin/python3.11 train_cnn.py --model convnext_small.in12k_ft_in1k --img_size 512 --batch_size 24 --epochs 20 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 2.5e-5 --lr_head 8e-4 --weight_decay 0.05 --num_workers 4 --seed 45 --tta_hflip --use_fusion --use_cutmix --cutmix_alpha 0.5 --cutmix_p 0.5




Device: cuda
Fold 0 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.16845 | val_rmse=20.9426 | lr=[2.5e-05, 0.0008] | time=124.2s




Epoch 2/20 | loss=0.03083 | val_rmse=19.4796 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=121.2s




Epoch 3/20 | loss=0.02573 | val_rmse=18.7754 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=121.1s




Epoch 4/20 | loss=0.02075 | val_rmse=18.2701 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=121.4s




Epoch 5/20 | loss=0.01639 | val_rmse=18.1313 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=121.8s




Epoch 6/20 | loss=0.01189 | val_rmse=18.2123 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=121.7s




Epoch 7/20 | loss=0.00883 | val_rmse=18.3224 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=121.6s




Epoch 8/20 | loss=0.00677 | val_rmse=18.3979 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=121.7s




Epoch 9/20 | loss=0.00528 | val_rmse=18.4412 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=122.1s




Epoch 10/20 | loss=0.00382 | val_rmse=18.4628 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=121.7s




Epoch 11/20 | loss=0.00337 | val_rmse=18.4782 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=121.6s




Epoch 12/20 | loss=0.00258 | val_rmse=18.4789 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=122.0s




Epoch 13/20 | loss=0.00219 | val_rmse=18.4718 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=121.7s




Epoch 14/20 | loss=0.00196 | val_rmse=18.4635 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=121.5s




Epoch 15/20 | loss=0.00162 | val_rmse=18.4638 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=121.7s




Epoch 16/20 | loss=0.00169 | val_rmse=18.4562 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=122.0s




Epoch 17/20 | loss=0.00160 | val_rmse=18.4520 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=121.6s




Epoch 18/20 | loss=0.00138 | val_rmse=18.4489 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=121.8s




Epoch 19/20 | loss=0.00052 | val_rmse=18.4488 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=121.5s




Epoch 20/20 | loss=0.00049 | val_rmse=18.4455 | lr=[0.0, 0.0] | time=121.9s
Fold 0 best RMSE: 18.1313 | elapsed 40.6 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed45_fold0_best.pth






Fold 1 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.10370 | val_rmse=35.8341 | lr=[2.5e-05, 0.0008] | time=121.4s




Epoch 2/20 | loss=0.02931 | val_rmse=19.9352 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=121.4s




Epoch 3/20 | loss=0.02686 | val_rmse=18.1480 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=121.7s




Epoch 4/20 | loss=0.01970 | val_rmse=17.6297 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=121.7s




Epoch 5/20 | loss=0.01399 | val_rmse=17.4913 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=121.8s




Epoch 6/20 | loss=0.01006 | val_rmse=17.5747 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=121.6s




Epoch 7/20 | loss=0.00735 | val_rmse=17.7232 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=121.9s




Epoch 8/20 | loss=0.00597 | val_rmse=17.8538 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=121.8s




Epoch 9/20 | loss=0.00415 | val_rmse=17.9498 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=121.6s




Epoch 10/20 | loss=0.00326 | val_rmse=18.0186 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=121.7s




Epoch 11/20 | loss=0.00275 | val_rmse=18.0650 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=121.8s




Epoch 12/20 | loss=0.00234 | val_rmse=18.1025 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=121.9s




Epoch 13/20 | loss=0.00198 | val_rmse=18.1357 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=121.8s




Epoch 14/20 | loss=0.00169 | val_rmse=18.1592 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=121.7s




Epoch 15/20 | loss=0.00164 | val_rmse=18.1832 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=121.8s




Epoch 16/20 | loss=0.00159 | val_rmse=18.1907 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=121.7s




Epoch 17/20 | loss=0.00138 | val_rmse=18.1930 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=122.2s




Epoch 18/20 | loss=0.00121 | val_rmse=18.1984 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=122.0s




Epoch 19/20 | loss=0.00047 | val_rmse=18.2024 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=121.9s




Epoch 20/20 | loss=0.00047 | val_rmse=18.2053 | lr=[0.0, 0.0] | time=121.7s
Fold 1 best RMSE: 17.4913 | elapsed 40.6 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed45_fold1_best.pth






Fold 2 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.09828 | val_rmse=28.6190 | lr=[2.5e-05, 0.0008] | time=121.9s




Epoch 2/20 | loss=0.02893 | val_rmse=19.1603 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=121.8s




Epoch 3/20 | loss=0.02470 | val_rmse=18.2623 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=121.6s




Epoch 4/20 | loss=0.02109 | val_rmse=17.8970 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=121.9s




Epoch 5/20 | loss=0.01406 | val_rmse=17.8554 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=121.8s




Epoch 6/20 | loss=0.01044 | val_rmse=17.9847 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=121.9s




Epoch 7/20 | loss=0.00765 | val_rmse=18.1165 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=121.9s




Epoch 8/20 | loss=0.00613 | val_rmse=18.2176 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=122.1s




Epoch 9/20 | loss=0.00435 | val_rmse=18.2782 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=122.4s




Epoch 10/20 | loss=0.00361 | val_rmse=18.2972 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=121.8s




Epoch 11/20 | loss=0.00266 | val_rmse=18.3068 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=122.2s




Epoch 12/20 | loss=0.00239 | val_rmse=18.2854 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=122.0s




Epoch 13/20 | loss=0.00202 | val_rmse=18.2660 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=121.8s




Epoch 14/20 | loss=0.00183 | val_rmse=18.2640 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=122.1s




Epoch 15/20 | loss=0.00174 | val_rmse=18.2591 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=122.1s




Epoch 16/20 | loss=0.00161 | val_rmse=18.2522 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=122.0s




Epoch 17/20 | loss=0.00133 | val_rmse=18.2461 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=121.6s




Epoch 18/20 | loss=0.00128 | val_rmse=18.2457 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=122.2s




Epoch 19/20 | loss=0.00049 | val_rmse=18.2439 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=122.1s




Epoch 20/20 | loss=0.00047 | val_rmse=18.2418 | lr=[0.0, 0.0] | time=121.9s
Fold 2 best RMSE: 17.8554 | elapsed 40.7 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed45_fold2_best.pth






Fold 3 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.08865 | val_rmse=22.0381 | lr=[2.5e-05, 0.0008] | time=121.7s




Epoch 2/20 | loss=0.03024 | val_rmse=19.6873 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=122.2s




Epoch 3/20 | loss=0.02532 | val_rmse=18.7446 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=122.0s




Epoch 4/20 | loss=0.02034 | val_rmse=18.3396 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=121.8s




Epoch 5/20 | loss=0.01609 | val_rmse=18.2684 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=122.2s




Epoch 6/20 | loss=0.01121 | val_rmse=18.2383 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=121.7s




Epoch 7/20 | loss=0.00821 | val_rmse=18.2104 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=122.0s




Epoch 8/20 | loss=0.00622 | val_rmse=18.1749 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=122.1s




Epoch 9/20 | loss=0.00482 | val_rmse=18.1528 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=122.3s




Epoch 10/20 | loss=0.00373 | val_rmse=18.1420 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=122.0s




Epoch 11/20 | loss=0.00297 | val_rmse=18.1576 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=121.9s




Epoch 12/20 | loss=0.00230 | val_rmse=18.1616 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=121.9s




Epoch 13/20 | loss=0.00201 | val_rmse=18.1922 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=122.2s




Epoch 14/20 | loss=0.00187 | val_rmse=18.2126 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=122.0s




Epoch 15/20 | loss=0.00165 | val_rmse=18.2286 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=122.2s




Epoch 16/20 | loss=0.00157 | val_rmse=18.2477 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=122.3s




Epoch 17/20 | loss=0.00131 | val_rmse=18.2524 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=122.2s




Epoch 18/20 | loss=0.00130 | val_rmse=18.2515 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=122.2s




Epoch 19/20 | loss=0.00046 | val_rmse=18.2545 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=122.2s




Epoch 20/20 | loss=0.00046 | val_rmse=18.2571 | lr=[0.0, 0.0] | time=122.2s
Fold 3 best RMSE: 18.1420 | elapsed 40.8 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed45_fold3_best.pth






Fold 4 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.10153 | val_rmse=19.9078 | lr=[2.5e-05, 0.0008] | time=121.9s




Epoch 2/20 | loss=0.02940 | val_rmse=19.0998 | lr=[2.482951629253403e-05, 0.000794544521361089] | time=122.0s




Epoch 3/20 | loss=0.02496 | val_rmse=18.0016 | lr=[2.4322715521257934e-05, 0.0007783268966802539] | time=122.3s




Epoch 4/20 | loss=0.02127 | val_rmse=17.4637 | lr=[2.349342189008111e-05, 0.0007517895004825956] | time=122.1s




Epoch 5/20 | loss=0.01515 | val_rmse=17.4369 | lr=[2.2364256367454922e-05, 0.0007156562037585575] | time=122.2s




Epoch 6/20 | loss=0.01133 | val_rmse=17.5184 | lr=[2.0966019645321765e-05, 0.0006709126286502965] | time=122.1s




Epoch 7/20 | loss=0.00849 | val_rmse=17.6049 | lr=[1.933685197653034e-05, 0.0006187792632489709] | time=122.2s




Epoch 8/20 | loss=0.00648 | val_rmse=17.6803 | lr=[1.752119280816212e-05, 0.0005606781698611878] | time=122.3s




Epoch 9/20 | loss=0.00510 | val_rmse=17.7143 | lr=[1.5568568589259992e-05, 0.0004981941948563198] | time=122.2s




Epoch 10/20 | loss=0.00405 | val_rmse=17.7216 | lr=[1.3532241818404156e-05, 0.000433031738188933] | time=122.4s




Epoch 11/20 | loss=0.00304 | val_rmse=17.7188 | lr=[1.1467758181595847e-05, 0.0003669682618110671] | time=122.3s




Epoch 12/20 | loss=0.00242 | val_rmse=17.7162 | lr=[9.431431410740012e-06, 0.0003018058051436804] | time=122.4s




Epoch 13/20 | loss=0.00222 | val_rmse=17.7081 | lr=[7.478807191837885e-06, 0.0002393218301388123] | time=122.3s




Epoch 14/20 | loss=0.00189 | val_rmse=17.6995 | lr=[5.6631480234696665e-06, 0.00018122073675102933] | time=122.0s




Epoch 15/20 | loss=0.00158 | val_rmse=17.6999 | lr=[4.033980354678239e-06, 0.00012908737134970365] | time=122.5s




Epoch 16/20 | loss=0.00166 | val_rmse=17.7028 | lr=[2.6357436325450817e-06, 8.434379624144261e-05] | time=122.3s




Epoch 17/20 | loss=0.00146 | val_rmse=17.7068 | lr=[1.506578109918888e-06, 4.821049951740442e-05] | time=122.2s




Epoch 18/20 | loss=0.00156 | val_rmse=17.7138 | lr=[6.772844787420671e-07, 2.1673103319746148e-05] | time=121.9s




Epoch 19/20 | loss=0.00049 | val_rmse=17.7167 | lr=[1.7048370746597098e-07, 5.4554786389110715e-06] | time=122.7s




Epoch 20/20 | loss=0.00046 | val_rmse=17.7177 | lr=[0.0, 0.0] | time=122.0s
Fold 4 best RMSE: 17.4369 | elapsed 40.8 min | ckpt: work_cnn/convnext_small.in12k_ft_in1k_seed45_fold4_best.pth






Saved OOF to work_cnn/oof_convnext_small.in12k_ft_in1k_fusion1_sz512_seed45.csv OOF RMSE: 17.813752510902557
Saved test predictions to work_cnn/submission_convnext_small.in12k_ft_in1k_fusion1_sz512_seed45.csv shape: (992, 2)


Exit code: 0 Elapsed: 205.5 min
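
The lr pairs logged per epoch are consistent with a cosine decay from the base rate to zero over the 19 epochs that follow the single warmup epoch. This is inferred from the logged numbers only; the scheduler code in train_cnn.py is not shown here, and `cosine_lr` is a hypothetical helper for sanity-checking a log against the intended schedule.

```python
import math

def cosine_lr(base_lr, epoch, total_epochs=20, warmup_epochs=1):
    """LR expected in the log at a 1-indexed epoch.

    Assumes cosine decay to zero over the epochs after warmup, which matches
    the values printed above (an inference, not the script's confirmed code).
    """
    steps = total_epochs - warmup_epochs   # 19 decay steps for these runs
    t = max(epoch - warmup_epochs, 0)      # decay steps taken so far
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * t / steps))

# Backbone and head rates for epoch 2 of the ConvNeXt-S runs
print(cosine_lr(2.5e-5, 2), cosine_lr(8e-4, 2))
```

Both parameter groups reach zero at epoch 20, so the last few epochs barely move the weights. That fits the nearly flat val_rmse from epoch 16 onward.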


In [34]:
# Launch Run 3 — Swin-Base @384 (fusion), seed 42
import sys, subprocess, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'swin_base_patch4_window12_384',
       '--img_size', '384',
       '--batch_size', '24',
       '--epochs', '20',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '3e-5',
       '--lr_head', '1e-3',
       '--weight_decay', '1e-4',
       '--num_workers', '4',
       '--seed', '42',
       '--tta_hflip',
       '--use_fusion'
]
print('Launching:', ' '.join(cmd), flush=True)
t0 = time.time()
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f'{(time.time()-t0)/60:.1f} min', flush=True)
assert proc.returncode == 0, 'Swin-Base @384 run failed'

Launching: /usr/bin/python3.11 train_cnn.py --model swin_base_patch4_window12_384 --img_size 384 --batch_size 24 --epochs 20 --warmup_epochs 1 --ema_decay 0.999 --lr_backbone 3e-5 --lr_head 1e-3 --weight_decay 1e-4 --num_workers 4 --seed 42 --tta_hflip --use_fusion




Device: cuda
Fold 0 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.04843 | val_rmse=24.7223 | lr=[3e-05, 0.001] | time=160.0s




Epoch 2/20 | loss=0.03074 | val_rmse=19.0243 | lr=[2.9795419551040836e-05, 0.0009931806517013613] | time=161.5s




Epoch 3/20 | loss=0.02471 | val_rmse=18.6149 | lr=[2.9187258625509518e-05, 0.0009729086208503173] | time=162.5s




Epoch 4/20 | loss=0.01846 | val_rmse=18.7765 | lr=[2.8192106268097336e-05, 0.0009397368756032445] | time=162.1s




Epoch 5/20 | loss=0.01306 | val_rmse=18.7351 | lr=[2.6837107640945904e-05, 0.0008945702546981969] | time=162.5s




Epoch 6/20 | loss=0.00918 | val_rmse=18.7033 | lr=[2.5159223574386117e-05, 0.0008386407858128706] | time=162.5s




Epoch 7/20 | loss=0.00703 | val_rmse=18.6507 | lr=[2.320422237183641e-05, 0.0007734740790612135] | time=162.5s




Epoch 8/20 | loss=0.00529 | val_rmse=18.5669 | lr=[2.1025431369794546e-05, 0.0007008477123264848] | time=162.7s




Epoch 9/20 | loss=0.00426 | val_rmse=18.5139 | lr=[1.8682282307111988e-05, 0.0006227427435703996] | time=162.6s




Epoch 10/20 | loss=0.00358 | val_rmse=18.4343 | lr=[1.623869018208499e-05, 0.0005412896727361663] | time=162.2s




Epoch 11/20 | loss=0.00280 | val_rmse=18.3780 | lr=[1.3761309817915017e-05, 0.0004587103272638339] | time=162.7s




Epoch 12/20 | loss=0.00233 | val_rmse=18.3318 | lr=[1.1317717692888014e-05, 0.00037725725642960046] | time=162.4s




Epoch 13/20 | loss=0.00189 | val_rmse=18.3082 | lr=[8.974568630205462e-06, 0.0002991522876735154] | time=162.5s




Epoch 14/20 | loss=0.00172 | val_rmse=18.2851 | lr=[6.795777628163599e-06, 0.00022652592093878665] | time=162.6s




Epoch 15/20 | loss=0.00149 | val_rmse=18.2569 | lr=[4.840776425613887e-06, 0.00016135921418712956] | time=162.6s




Epoch 16/20 | loss=0.00140 | val_rmse=18.2349 | lr=[3.162892359054098e-06, 0.00010542974530180327] | time=162.5s




Epoch 17/20 | loss=0.00129 | val_rmse=18.2214 | lr=[1.8078937319026655e-06, 6.026312439675552e-05] | time=162.6s




Epoch 18/20 | loss=0.00120 | val_rmse=18.2162 | lr=[8.127413744904805e-07, 2.709137914968268e-05] | time=162.7s




Epoch 19/20 | loss=0.00117 | val_rmse=18.2134 | lr=[2.0458044895916516e-07, 6.819348298638839e-06] | time=162.7s






Epoch 20/20 | loss=0.00114 | val_rmse=18.2102 | lr=[0.0, 0.0] | time=162.5s
Fold 0 best RMSE: 18.2102 | elapsed 54.3 min | ckpt: work_cnn/swin_base_patch4_window12_384_seed42_fold0_best.pth






Fold 1 | train=7136 val=1784 epochs=20




Epoch 1/20 | loss=0.05410 | val_rmse=23.7995 | lr=[3e-05, 0.001] | time=162.4s




Epoch 2/20 | loss=0.03135 | val_rmse=19.5161 | lr=[2.9795419551040836e-05, 0.0009931806517013613] | time=162.7s




Epoch 3/20 | loss=0.02493 | val_rmse=18.9111 | lr=[2.9187258625509518e-05, 0.0009729086208503173] | time=162.6s




Epoch 4/20 | loss=0.01910 | val_rmse=18.6999 | lr=[2.8192106268097336e-05, 0.0009397368756032445] | time=162.6s




Epoch 5/20 | loss=0.01339 | val_rmse=18.6493 | lr=[2.6837107640945904e-05, 0.0008945702546981969] | time=162.7s




Epoch 6/20 | loss=0.00922 | val_rmse=18.5520 | lr=[2.5159223574386117e-05, 0.0008386407858128706] | time=162.7s




Epoch 7/20 | loss=0.00724 | val_rmse=18.4613 | lr=[2.320422237183641e-05, 0.0007734740790612135] | time=162.7s




Epoch 8/20 | loss=0.00518 | val_rmse=18.4190 | lr=[2.1025431369794546e-05, 0.0007008477123264848] | time=162.6s




Epoch 9/20 | loss=0.00413 | val_rmse=18.3613 | lr=[1.8682282307111988e-05, 0.0006227427435703996] | time=162.6s




Epoch 10/20 | loss=0.00353 | val_rmse=18.3114 | lr=[1.623869018208499e-05, 0.0005412896727361663] | time=162.7s




Epoch 11/20 | loss=0.00284 | val_rmse=18.2712 | lr=[1.3761309817915017e-05, 0.0004587103272638339] | time=162.4s




Epoch 12/20 | loss=0.00241 | val_rmse=18.2472 | lr=[1.1317717692888014e-05, 0.00037725725642960046] | time=162.5s




Epoch 13/20 | loss=0.00200 | val_rmse=18.2452 | lr=[8.974568630205462e-06, 0.0002991522876735154] | time=162.6s




Epoch 14/20 | loss=0.00177 | val_rmse=18.2315 | lr=[6.795777628163599e-06, 0.00022652592093878665] | time=162.6s




Epoch 15/20 | loss=0.00159 | val_rmse=18.2297 | lr=[4.840776425613887e-06, 0.00016135921418712956] | time=162.8s




Epoch 16/20 | loss=0.00144 | val_rmse=18.2370 | lr=[3.162892359054098e-06, 0.00010542974530180327] | time=162.6s




Epoch 17/20 | loss=0.00133 | val_rmse=18.2392 | lr=[1.8078937319026655e-06, 6.026312439675552e-05] | time=162.2s




Epoch 18/20 | loss=0.00121 | val_rmse=18.2325 | lr=[8.127413744904805e-07, 2.709137914968268e-05] | time=162.9s




Epoch 19/20 | loss=0.00127 | val_rmse=18.2260 | lr=[2.0458044895916516e-07, 6.819348298638839e-06] | time=162.4s




Epoch 20/20 | loss=0.00121 | val_rmse=18.2266 | lr=[0.0, 0.0] | time=162.4s
Fold 1 best RMSE: 18.2260 | elapsed 54.4 min | ckpt: work_cnn/swin_base_patch4_window12_384_seed42_fold1_best.pth






Fold 2 | train=7136 val=1784 epochs=20
Epoch 1/20 | loss=0.06250 | val_rmse=21.8291 | lr=[3e-05, 0.001] | time=162.4s
Epoch 2/20 | loss=0.03049 | val_rmse=19.7463 | lr=[2.9795419551040836e-05, 0.0009931806517013613] | time=162.6s
Epoch 3/20 | loss=0.02531 | val_rmse=19.3123 | lr=[2.9187258625509518e-05, 0.0009729086208503173] | time=162.6s
Epoch 4/20 | loss=0.01900 | val_rmse=19.4429 | lr=[2.8192106268097336e-05, 0.0009397368756032445] | time=162.9s
Epoch 5/20 | loss=0.01334 | val_rmse=19.4828 | lr=[2.6837107640945904e-05, 0.0008945702546981969] | time=162.6s
Epoch 6/20 | loss=0.00938 | val_rmse=19.3179 | lr=[2.5159223574386117e-05, 0.0008386407858128706] | time=162.8s
Epoch 7/20 | loss=0.00706 | val_rmse=19.0645 | lr=[2.320422237183641e-05, 0.0007734740790612135] | time=162.4s
Epoch 8/20 | loss=0.00530 | val_rmse=18.8561 | lr=[2.1025431369794546e-05, 0.0007008477123264848] | time=162.7s
Epoch 9/20 | loss=0.00427 | val_rmse=18.6828 | lr=[1.8682282307111988e-05, 0.0006227427435703996] | time=162.5s
Epoch 10/20 | loss=0.00338 | val_rmse=18.5999 | lr=[1.623869018208499e-05, 0.0005412896727361663] | time=162.6s
Epoch 11/20 | loss=0.00279 | val_rmse=18.5307 | lr=[1.3761309817915017e-05, 0.0004587103272638339] | time=162.7s
Epoch 12/20 | loss=0.00236 | val_rmse=18.4847 | lr=[1.1317717692888014e-05, 0.00037725725642960046] | time=162.6s
Epoch 13/20 | loss=0.00210 | val_rmse=18.4357 | lr=[8.974568630205462e-06, 0.0002991522876735154] | time=162.7s
Epoch 14/20 | loss=0.00183 | val_rmse=18.4249 | lr=[6.795777628163599e-06, 0.00022652592093878665] | time=162.6s
Epoch 15/20 | loss=0.00164 | val_rmse=18.4171 | lr=[4.840776425613887e-06, 0.00016135921418712956] | time=162.8s
Epoch 16/20 | loss=0.00145 | val_rmse=18.4084 | lr=[3.162892359054098e-06, 0.00010542974530180327] | time=162.7s
Epoch 17/20 | loss=0.00128 | val_rmse=18.4017 | lr=[1.8078937319026655e-06, 6.026312439675552e-05] | time=162.4s
Epoch 18/20 | loss=0.00133 | val_rmse=18.3969 | lr=[8.127413744904805e-07, 2.709137914968268e-05] | time=162.7s
Epoch 19/20 | loss=0.00119 | val_rmse=18.3984 | lr=[2.0458044895916516e-07, 6.819348298638839e-06] | time=162.7s
Epoch 20/20 | loss=0.00119 | val_rmse=18.3976 | lr=[0.0, 0.0] | time=162.6s
Fold 2 best RMSE: 18.3969 | elapsed 54.4 min | ckpt: work_cnn/swin_base_patch4_window12_384_seed42_fold2_best.pth

Fold 3 | train=7136 val=1784 epochs=20
Epoch 1/20 | loss=0.06015 | val_rmse=25.1754 | lr=[3e-05, 0.001] | time=162.7s
Epoch 2/20 | loss=0.03004 | val_rmse=20.8729 | lr=[2.9795419551040836e-05, 0.0009931806517013613] | time=162.5s
Epoch 3/20 | loss=0.02394 | val_rmse=19.6652 | lr=[2.9187258625509518e-05, 0.0009729086208503173] | time=162.7s
Epoch 4/20 | loss=0.01685 | val_rmse=19.2284 | lr=[2.8192106268097336e-05, 0.0009397368756032445] | time=162.4s
Epoch 5/20 | loss=0.01236 | val_rmse=19.1082 | lr=[2.6837107640945904e-05, 0.0008945702546981969] | time=162.9s
Epoch 6/20 | loss=0.00904 | val_rmse=19.0438 | lr=[2.5159223574386117e-05, 0.0008386407858128706] | time=162.8s
Epoch 7/20 | loss=0.00689 | val_rmse=18.9392 | lr=[2.320422237183641e-05, 0.0007734740790612135] | time=162.4s
Epoch 8/20 | loss=0.00526 | val_rmse=18.8563 | lr=[2.1025431369794546e-05, 0.0007008477123264848] | time=162.8s
Epoch 9/20 | loss=0.00442 | val_rmse=18.7580 | lr=[1.8682282307111988e-05, 0.0006227427435703996] | time=162.7s
Epoch 10/20 | loss=0.00345 | val_rmse=18.6670 | lr=[1.623869018208499e-05, 0.0005412896727361663] | time=162.8s
Epoch 11/20 | loss=0.00289 | val_rmse=18.5953 | lr=[1.3761309817915017e-05, 0.0004587103272638339] | time=162.9s
Epoch 12/20 | loss=0.00257 | val_rmse=18.5621 | lr=[1.1317717692888014e-05, 0.00037725725642960046] | time=162.6s
Epoch 13/20 | loss=0.00221 | val_rmse=18.5351 | lr=[8.974568630205462e-06, 0.0002991522876735154] | time=162.5s
Epoch 14/20 | loss=0.00180 | val_rmse=18.5202 | lr=[6.795777628163599e-06, 0.00022652592093878665] | time=162.9s
Epoch 15/20 | loss=0.00159 | val_rmse=18.5234 | lr=[4.840776425613887e-06, 0.00016135921418712956] | time=162.8s
Epoch 16/20 | loss=0.00145 | val_rmse=18.5178 | lr=[3.162892359054098e-06, 0.00010542974530180327] | time=162.6s
Epoch 17/20 | loss=0.00137 | val_rmse=18.5060 | lr=[1.8078937319026655e-06, 6.026312439675552e-05] | time=163.0s
Epoch 18/20 | loss=0.00125 | val_rmse=18.4989 | lr=[8.127413744904805e-07, 2.709137914968268e-05] | time=162.5s
Epoch 19/20 | loss=0.00123 | val_rmse=18.4973 | lr=[2.0458044895916516e-07, 6.819348298638839e-06] | time=162.4s
Epoch 20/20 | loss=0.00128 | val_rmse=18.4961 | lr=[0.0, 0.0] | time=162.6s
Fold 3 best RMSE: 18.4961 | elapsed 54.5 min | ckpt: work_cnn/swin_base_patch4_window12_384_seed42_fold3_best.pth

Fold 4 | train=7136 val=1784 epochs=20
Epoch 1/20 | loss=0.04741 | val_rmse=27.5499 | lr=[3e-05, 0.001] | time=162.5s
Epoch 2/20 | loss=0.03118 | val_rmse=22.0043 | lr=[2.9795419551040836e-05, 0.0009931806517013613] | time=162.8s
Epoch 3/20 | loss=0.02451 | val_rmse=19.8556 | lr=[2.9187258625509518e-05, 0.0009729086208503173] | time=162.7s
Epoch 4/20 | loss=0.01805 | val_rmse=18.9902 | lr=[2.8192106268097336e-05, 0.0009397368756032445] | time=162.0s
Epoch 5/20 | loss=0.01296 | val_rmse=18.6529 | lr=[2.6837107640945904e-05, 0.0008945702546981969] | time=163.0s
Epoch 6/20 | loss=0.00883 | val_rmse=18.5014 | lr=[2.5159223574386117e-05, 0.0008386407858128706] | time=162.4s
Epoch 7/20 | loss=0.00649 | val_rmse=18.3798 | lr=[2.320422237183641e-05, 0.0007734740790612135] | time=162.6s
Epoch 8/20 | loss=0.00507 | val_rmse=18.3063 | lr=[2.1025431369794546e-05, 0.0007008477123264848] | time=163.0s
Epoch 9/20 | loss=0.00420 | val_rmse=18.2257 | lr=[1.8682282307111988e-05, 0.0006227427435703996] | time=162.3s
Epoch 10/20 | loss=0.00363 | val_rmse=18.1446 | lr=[1.623869018208499e-05, 0.0005412896727361663] | time=162.7s
Epoch 11/20 | loss=0.00272 | val_rmse=18.0816 | lr=[1.3761309817915017e-05, 0.0004587103272638339] | time=162.7s
Epoch 12/20 | loss=0.00229 | val_rmse=18.0480 | lr=[1.1317717692888014e-05, 0.00037725725642960046] | time=162.2s
Epoch 13/20 | loss=0.00209 | val_rmse=17.9992 | lr=[8.974568630205462e-06, 0.0002991522876735154] | time=162.8s
Epoch 14/20 | loss=0.00180 | val_rmse=17.9827 | lr=[6.795777628163599e-06, 0.00022652592093878665] | time=162.8s
Epoch 15/20 | loss=0.00154 | val_rmse=17.9672 | lr=[4.840776425613887e-06, 0.00016135921418712956] | time=162.6s
Epoch 16/20 | loss=0.00145 | val_rmse=17.9544 | lr=[3.162892359054098e-06, 0.00010542974530180327] | time=162.7s
Epoch 17/20 | loss=0.00130 | val_rmse=17.9427 | lr=[1.8078937319026655e-06, 6.026312439675552e-05] | time=162.8s
Epoch 18/20 | loss=0.00122 | val_rmse=17.9338 | lr=[8.127413744904805e-07, 2.709137914968268e-05] | time=162.5s
Epoch 19/20 | loss=0.00117 | val_rmse=17.9202 | lr=[2.0458044895916516e-07, 6.819348298638839e-06] | time=162.9s
Epoch 20/20 | loss=0.00119 | val_rmse=17.9056 | lr=[0.0, 0.0] | time=162.8s
Fold 4 best RMSE: 17.9056 | elapsed 54.5 min | ckpt: work_cnn/swin_base_patch4_window12_384_seed42_fold4_best.pth

Saved OOF to work_cnn/oof_swin_base_patch4_window12_384_fusion1_sz384_seed42.csv OOF RMSE: 18.24796057356082
Saved test predictions to work_cnn/submission_swin_base_patch4_window12_384_fusion1_sz384_seed42.csv shape: (992, 2)


Exit code: 0 Elapsed: 274.8 min
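
Reading the best epoch per fold out of a log like the one above is easier with a small parser. The following is a minimal sketch that assumes the `Fold N | train=... val=...` and `Epoch i/n | loss=... | val_rmse=...` line formats seen in this output; `best_epoch_per_fold` is a hypothetical helper and not part of `train_cnn.py`.

```python
import re

# Patterns follow the line formats printed in the training log above.
FOLD_RE = re.compile(r"^Fold (\d+) \| train=(\d+) val=(\d+)")
EPOCH_RE = re.compile(r"^Epoch (\d+)/(\d+) \| loss=([\d.]+) \| val_rmse=([\d.]+)")

def best_epoch_per_fold(log_text):
    """Return {fold: (best_epoch, best_val_rmse)} for each fold header in the log."""
    best = {}
    fold = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = FOLD_RE.match(line)
        if m:
            fold = int(m.group(1))
            continue
        m = EPOCH_RE.match(line)
        if m and fold is not None:
            epoch, rmse = int(m.group(1)), float(m.group(4))
            # Keep the earliest epoch that reaches the lowest validation RMSE.
            if fold not in best or rmse < best[fold][1]:
                best[fold] = (epoch, rmse)
    return best

# Lines copied from the Fold 2 output above.
sample = """\
Fold 2 | train=7136 val=1784 epochs=20
Epoch 17/20 | loss=0.00128 | val_rmse=18.4017 | lr=[1.8078937319026655e-06, 6.026312439675552e-05] | time=162.4s
Epoch 18/20 | loss=0.00133 | val_rmse=18.3969 | lr=[8.127413744904805e-07, 2.709137914968268e-05] | time=162.7s
Epoch 19/20 | loss=0.00119 | val_rmse=18.3984 | lr=[2.0458044895916516e-07, 6.819348298638839e-06] | time=162.7s
Epoch 20/20 | loss=0.00119 | val_rmse=18.3976 | lr=[0.0, 0.0] | time=162.6s
Fold 2 best RMSE: 18.3969 | elapsed 54.4 min | ckpt: work_cnn/swin_base_patch4_window12_384_seed42_fold2_best.pth
"""

best = best_epoch_per_fold(sample)
for fold, (epoch, rmse) in sorted(best.items()):
    print(f"fold {fold}: best epoch {epoch}, val_rmse {rmse:.4f}")
```

The `Fold N best RMSE` summary line is deliberately not matched by `FOLD_RE`, so it cannot reset the current fold; it can instead serve as a cross-check on the parsed minimum.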


In [None]:
# Launch Run 5: ConvNeXt-S @512 (image-only), seed 46
import shlex, subprocess, sys, time

cmd = [sys.executable, 'train_cnn.py',
       '--model', 'convnext_small.in12k_ft_in1k',
       '--img_size', '512',
       '--batch_size', '24',
       '--epochs', '20',
       '--warmup_epochs', '1',
       '--ema_decay', '0.999',
       '--lr_backbone', '2.5e-5',
       '--lr_head', '8e-4',
       '--weight_decay', '0.05',
       '--num_workers', '4',
       '--seed', '46',
       '--tta_hflip',
       # image-only: do NOT pass --use_fusion
       '--use_cutmix',
       '--cutmix_alpha', '0.5',
       '--cutmix_p', '0.5'
]
# shlex.join quotes each argument, so the printed command can be pasted into a shell
print('Launching:', shlex.join(cmd), flush=True)
t0 = time.time()
# check=False: report exit code and elapsed time first, then fail via the assert below
proc = subprocess.run(cmd, check=False)
print('Exit code:', proc.returncode, 'Elapsed:', f"{(time.time()-t0)/60:.1f} min", flush=True)
assert proc.returncode == 0, 'ConvNeXt-S @512 image-only seed46 run failed'
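
The `lr=[backbone, head]` pairs logged for the Swin run are consistent with a per-epoch cosine decay from the base rate to 0 over the 19 epochs that follow a one-epoch warmup. The sketch below reproduces those logged values and, assuming `train_cnn.py` applies the same schedule shape to this run (not confirmed here), previews the rates implied by `--lr_backbone 2.5e-5` and `--lr_head 8e-4`. `cosine_lr` is a hypothetical helper, not a function from the training script.

```python
import math

def cosine_lr(base_lr, epoch, total_epochs=20, warmup_epochs=1):
    """LR as logged at 1-indexed `epoch`, assuming cosine decay to 0 after warmup.

    Hypothetical helper: the warmup ramp itself is not modelled, because the
    log only shows the base rate at the end of the first epoch.
    """
    if epoch <= warmup_epochs:
        return base_lr
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Check against values logged for the Swin run (base rates 3e-05 and 0.001).
logged = {2: (2.9795419551040836e-05, 0.0009931806517013613),
          11: (1.3761309817915017e-05, 0.0004587103272638339)}
for epoch, (lr_backbone, lr_head) in logged.items():
    assert math.isclose(cosine_lr(3e-5, epoch), lr_backbone, rel_tol=1e-6)
    assert math.isclose(cosine_lr(1e-3, epoch), lr_head, rel_tol=1e-6)

# Preview for Run 5 under the same assumed schedule shape.
for epoch in (1, 5, 10, 15, 20):
    print(f"epoch {epoch:2d}: backbone={cosine_lr(2.5e-5, epoch):.3e} "
          f"head={cosine_lr(8e-4, epoch):.3e}")
```

Under this shape the last epoch trains at a rate of exactly 0, which matches the `lr=[0.0, 0.0]` entries at epoch 20 in the log.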