# Medal-push plan and checkpoints

Goal: Beat current best OOF mean=4.290 (worst=5.02) and secure a medal.

Milestones (request expert review at each):
- M0: Environment sanity (GPU check).
- M1: Cheap feature add-ons (v3+global motions):
  - Add g1[t]=sum(|v[t]|) across joints, g2[t]=EMA(g1).
  - Train 1 fold x 1 seed CE baseline on v3+g; compare ΔOOF vs v3 and v2.
  - If ≥0.05 OOF gain, train remaining folds/seeds; else abort.
- M2: Model diversity (MS-TCN++):
  - Train minimal MS-TCN++ (1 seed per fold) on best features (current v2/v3 or v3+g).
  - Ensemble with CE meta-blend using small weight (start w_ms∈{0.05,0.1,0.2}).
- M3: Decoder refinements:
  - Extend local-search with duration-aware penalty and temperature per class (reuse per-class temps from meta-blend).
  - Validate via OOF sweep; keep only if ≥0.02 gain.
- M4: v4 canonicalized features (root-centering, scale, yaw-align); add hand-face distances and hand speed norms.
  - Prototype on 1 fold x 1 seed; proceed only if ≥0.05 gain.
- M5: Multi-modality (RGB):
  - Extract lightweight per-frame CNN embeddings (e.g., MobileNetV2 at 112-160px).
  - Late-fuse probs with skeleton models; start with small weights; validate via OOF.

Discipline:
- No retraining unless OOF on small pilot justifies.
- Reuse cached OOF/test probs in probs_cache/; avoid recompute.
- Always log per-fold times and progress; interrupt long runs if stalled.
- Keep next.ipynb lean; one change per experiment; track ΔOOF.

Immediate next steps:
1) M0: Add a GPU check cell (nvidia-smi).
2) Implement v3+global motions feature join (reuse cached v3 and derive g1,g2 on-the-fly for a pilot).
3) Train CE on fold0 seed0 only; cache OOF/test probs; quick decoder sweep; compare OOF.
4) Request expert review on whether to scale to 3 folds x 2 seeds or pivot to MS-TCN++.

In [1]:
# M0: GPU sanity check
import subprocess, time
t0 = time.time()
print("Running nvidia-smi...", flush=True)
subprocess.run(['bash','-lc','nvidia-smi || true'], check=False)
try:
    import torch
    print("torch:", getattr(torch, '__version__', None), "CUDA build:", getattr(torch.version, 'cuda', None))
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
except Exception as e:
    print("torch not installed or error:", e)
print(f"Done in {time.time()-t0:.2f}s", flush=True)

Running nvidia-smi...


Mon Sep 29 15:01:10 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |    1551MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

torch: 2.4.1+cu121 CUDA build: 12.1
CUDA available: True
GPU: NVIDIA A10-24Q
Done in 0.92s


In [2]:
# Inspect features3d_v3 structure to design g1/g2 augmentation
import os, json, numpy as np, glob
from pathlib import Path

train_dir = Path('features3d_v3/train')
files = sorted(train_dir.glob('*.npz'))
print('num train files v3:', len(files))
sample = np.load(files[0])
print('keys:', list(sample.keys()))
for k in sample.files:
    arr = sample[k]
    try:
        shape = arr.shape
        dtype = arr.dtype
        print(f'  {k}: shape={shape}, dtype={dtype}')
    except Exception as e:
        print('  key error for', k, e)

# If a consolidated array exists (e.g., x or feat), show stats
for k in ('x','feat','features','data'):
    if k in sample.files:
        a = sample[k]
        print(k, 'C x T:', a.shape if a.ndim==2 else a.shape)
        print('mean/std:', float(a.mean()), float(a.std()))

# Persist a brief schema summary for reference
schema = {k: (sample[k].shape, str(sample[k].dtype)) for k in sample.files}
with open('features3d_v3_schema.json','w') as f:
    json.dump(schema, f, indent=2)
print('Saved schema to features3d_v3_schema.json')

num train files v3: 297
keys: ['X']
  X: shape=(1254, 1095), dtype=float32
Saved schema to features3d_v3_schema.json


In [3]:
# Inspect probs_cache and calibration to prepare duration-aware decoder pilot (no retraining)
import glob, json, os, re
from pathlib import Path

cache_dir = Path('probs_cache')
paths = sorted(glob.glob(str(cache_dir / '*')))[:20]
print('probs_cache sample (first 20):')
for p in paths:
    print(os.path.basename(p))

print('\nCounts by suffix:')
from collections import Counter
cnt = Counter(Path(p).suffix for p in glob.glob(str(cache_dir / '*')))
print(cnt)

# Show some representative filenames by pattern
samples = sorted([os.path.basename(p) for p in glob.glob(str(cache_dir / '*fold*'))])[:30]
print('\nfiles with fold in name (first 30):')
for s in samples:
    print(s)

# Load per-class meta calibration if present
calib_meta_path = Path('calib_all_v2v3_meta.json')
if calib_meta_path.exists():
    with open(calib_meta_path) as f:
        calib = json.load(f)
    print('\nLoaded calib_all_v2v3_meta.json with keys:', list(calib.keys()))
    # Print a small snippet of parameters
    for k in ('alpha','T2','T3'):
        if k in calib:
            if isinstance(calib[k], dict):
                # show first 5 classes
                items = sorted((int(c), v) for c, v in calib[k].items())[:5]
                print(k, 'sample:', items)
            else:
                print(k, calib[k])
else:
    print('\ncalib_all_v2v3_meta.json not found')

probs_cache sample (first 20):
101_ce.npy
101_ce_new.npy
101_ce_new_s1.npy
101_ce_v3.npy
101_ce_v3_s1.npy
101_ms.npy
101_tc.npy
101_tc_s1.npy
102_ce.npy
102_ce_new.npy
102_ce_new_s1.npy
102_ce_v3.npy
102_ce_v3_s1.npy
102_ms.npy
102_tc.npy
102_tc_s1.npy
103_ce.npy
103_ce_new.npy
103_ce_new_s1.npy
103_ce_v3.npy

Counts by suffix:
Counter({'.npy': 2376})

files with fold in name (first 30):

Loaded calib_all_v2v3_meta.json with keys: ['T2', 'T3', 'A']
T2 [1.0, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858]
T3 [1.0, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.100000023841858, 1.10000002384185

In [10]:
# Duration-aware post-process decoder pilot on OOF (fold0) using cached probs and meta calibration
import numpy as np, json, os, time
from pathlib import Path
from collections import defaultdict

t0=time.time()
cache = Path('probs_cache')
train_v3 = Path('features3d_v3/train')
labels_dir = Path('labels3d_v2/train')

# Load fold splits (list of dicts with keys: fold, train_ids, val_ids)
with open('folds_archive_cv.json') as f:
    folds_list = json.load(f)
fold0 = next(fd for fd in folds_list if int(fd.get('fold', -1)) == 0)
fold0_train_ids = set(map(int, fold0['train_ids']))
fold0_val_ids = set(map(int, fold0['val_ids']))

# Load calibration (per-class temperatures and blend weights)
calib_path = Path('calib_all_v2v3_meta.json')
assert calib_path.exists(), 'Missing calib_all_v2v3_meta.json'
calib = json.loads(calib_path.read_text())
T2 = np.array(calib.get('T2'), dtype=np.float32)
T3 = np.array(calib.get('T3'), dtype=np.float32)
A = np.array(calib.get('A'), dtype=np.float32) if isinstance(calib.get('A', None), list) else None
if A is None:
    # fallback alpha if not per-class provided
    A = np.full_like(T2, 0.7, dtype=np.float32)

def temp_scale(p, T):
    # Robust to orientation: accepts CxT or TxC; returns same orientation as input
    T = np.asarray(T, dtype=np.float32).reshape(-1)
    p = np.clip(p, 1e-8, 1.0)
    logp = np.log(p)
    if p.shape[0] == T.shape[0]:  # CxT
        logp = logp / np.maximum(T[:, None], 1e-6)
        out = np.exp(logp)
        out /= out.sum(axis=0, keepdims=True)
        return out
    elif p.shape[-1] == T.shape[0]:  # TxC
        logp = logp / np.maximum(T[None, :], 1e-6)
        out = np.exp(logp)
        out /= out.sum(axis=1, keepdims=True)
        return out
    else:
        raise ValueError(f'Temperature length {T.shape[0]} not matching probs shape {p.shape}')

def ensure_CxT(p, C):
    # Ensure output is CxT
    if p.shape[0] == C:
        return p
    if p.shape[1] == C:
        return p.T
    raise ValueError(f'Cannot ensure CxT; probs shape {p.shape}, C={C}')

def load_probs(seq_id):
    # Expect ensemble-level caches: <id>_ce.npy (v2) and <id>_ce_v3.npy (v3)
    p2 = np.load(cache / f"{seq_id}_ce.npy").astype(np.float32)  # CxT or TxC
    p3 = np.load(cache / f"{seq_id}_ce_v3.npy").astype(np.float32)
    # Apply per-class temperatures (keep native orientation for stability)
    p2 = temp_scale(p2, T2)
    p3 = temp_scale(p3, T3)
    # Ensure both are CxT before per-class blending
    C = int(T2.shape[0])
    p2 = ensure_CxT(p2, C)
    p3 = ensure_CxT(p3, C)
    # Time-align by cropping to the minimum T
    Tm = min(p2.shape[1], p3.shape[1])
    if p2.shape[1] != Tm:
        p2 = p2[:, :Tm]
    if p3.shape[1] != Tm:
        p3 = p3[:, :Tm]
    # Per-class blend
    alpha = A.reshape(-1,1)
    p = alpha*p2 + (1.0-alpha)*p3
    p /= p.sum(axis=0, keepdims=True)
    return p  # CxT

def load_frame_labels(seq_id):
    # labels3d_v2 contains per-frame integer labels, saved as .npy, shape (T,), classes 1..20
    y = np.load(labels_dir / f"{seq_id}.npy")
    return y.astype(np.int32)

def compress_to_sequence(y_frames):
    # remove repeats and zeros
    seq = []
    last = -1
    for c in y_frames:
        if c == 0:
            continue
        if c != last:
            seq.append(int(c))
            last = int(c)
    return seq

def levenshtein(a, b):
    # sequences of ints
    n, m = len(a), len(b)
    if n==0: return m
    if m==0: return n
    dp = list(range(m+1))
    for i in range(1, n+1):
        prev = dp[0]
        dp[0] = i
        for j in range(1, m+1):
            temp = dp[j]
            cost = 0 if a[i-1]==b[j-1] else 1
            dp[j] = min(dp[j]+1, dp[j-1]+1, prev+cost)
            prev = temp
    return dp[m]

def segment_lengths(y_frames):
    # returns dict class->list of lengths (excluding 0)
    lens = defaultdict(list)
    cur_c, run = None, 0
    for c in y_frames:
        if c==0:
            if cur_c is not None:
                lens[cur_c].append(run)
                cur_c, run = None, 0
            continue
        if cur_c is None:
            cur_c, run = int(c), 1
        elif c == cur_c:
            run += 1
        else:
            lens[cur_c].append(run)
            cur_c, run = int(c), 1
    if cur_c is not None:
        lens[cur_c].append(run)
    return lens

def compute_min_dur_stats(ids):
    # Use ground-truth frame labels on train folds to get median lens
    agg = defaultdict(list)
    for sid in ids:
        y = load_frame_labels(sid)
        lens = segment_lengths(y)
        for c, ls in lens.items():
            if c!=0:
                agg[c].extend(ls)
    med = np.zeros(21, dtype=np.float32)
    for c in range(21):
        if c==0:
            med[c]=0
            continue
        ls = agg.get(c, [])
        med[c] = float(np.median(ls)) if ls else 1.0
    return med

def decode_with_min_segment(p, min_dur):
    # p: CxT probs, min_dur: per-class minimal duration (frames)
    # initial argmax path
    y = p.argmax(axis=0).astype(np.int32)
    # post-process: merge segments shorter than threshold
    Tlen = y.shape[0]
    i = 0
    while i < Tlen:
        c = y[i]
        j = i+1
        while j<Tlen and y[j]==c:
            j+=1
        seg_len = j-i
        if c!=0 and seg_len < min_dur[c]:
            # decide merge direction by average alt prob in neighbors
            left_c = y[i-1] if i>0 else None
            right_c = y[j] if j<Tlen else None
            # compute average probability for left/right classes over this segment
            left_score = -np.inf
            right_score = -np.inf
            if left_c is not None:
                left_score = float(p[left_c, i:j].mean())
            if right_c is not None:
                right_score = float(p[right_c, i:j].mean())
            if right_score >= left_score:
                # merge into right
                y[i:j] = right_c if right_c is not None else 0
            else:
                y[i:j] = left_c if left_c is not None else 0
            # step back a bit to re-evaluate merges
            i = max(0, i-1)
            continue
        i = j
    return y

# Build train/val id sets for fold0 (already parsed above)
# Compute per-class median lengths from fold0 TRAIN ids (not val) to avoid leakage
med_lens = compute_min_dur_stats(sorted(fold0_train_ids))
print('Median segment lengths (sample):', med_lens[1:6])

def eval_oof_val(min_dur_mult=0.5):
    min_dur = np.floor(med_lens * min_dur_mult + 0.5).astype(np.int32)
    min_dur[0]=0
    dists = []
    base_dists = []
    n=0
    t_start=time.time()
    for npz_path in sorted(train_v3.glob('*.npz')):
        sid = int(npz_path.stem)
        if sid not in fold0_val_ids:
            continue
        # load probs and labels
        p = load_probs(sid)  # CxT
        y_true = load_frame_labels(sid)
        # baseline greedy
        y_base = p.argmax(axis=0).astype(np.int32)
        seq_base = compress_to_sequence(y_base)
        seq_true = compress_to_sequence(y_true)
        base_dists.append(levenshtein(seq_base, seq_true))
        # duration-aware merge
        y_hat = decode_with_min_segment(p, min_dur)
        seq_hat = compress_to_sequence(y_hat)
        dists.append(levenshtein(seq_hat, seq_true))
        n+=1
        if n%20==0:
            print(f'.. processed {n} seqs, elapsed {time.time()-t_start:.1f}s', flush=True)
    return float(np.mean(base_dists)), float(np.mean(dists)), n

for mult in [0.3, 0.5, 0.7, 1.0]:
    b, d, n = eval_oof_val(mult)
    print(f'min_dur_mult={mult}: baseline_mean={b:.3f} -> duraware_mean={d:.3f} over {n} seqs')

print('Done in', time.time()-t0, 's')

Median segment lengths (sample): [40. 40. 50. 46. 48.]
.. processed 20 seqs, elapsed 0.2s


.. processed 40 seqs, elapsed 0.4s


.. processed 60 seqs, elapsed 0.6s


.. processed 80 seqs, elapsed 0.9s


min_dur_mult=0.3: baseline_mean=27.510 -> duraware_mean=5.949 over 98 seqs


.. processed 20 seqs, elapsed 0.2s


.. processed 40 seqs, elapsed 0.5s


.. processed 60 seqs, elapsed 0.7s


.. processed 80 seqs, elapsed 1.0s


min_dur_mult=0.5: baseline_mean=27.510 -> duraware_mean=4.153 over 98 seqs


.. processed 20 seqs, elapsed 0.3s


.. processed 40 seqs, elapsed 0.5s


.. processed 60 seqs, elapsed 0.9s


.. processed 80 seqs, elapsed 1.3s


min_dur_mult=0.7: baseline_mean=27.510 -> duraware_mean=3.878 over 98 seqs


.. processed 20 seqs, elapsed 0.3s


.. processed 40 seqs, elapsed 0.7s


.. processed 60 seqs, elapsed 1.1s


.. processed 80 seqs, elapsed 1.6s


min_dur_mult=1.0: baseline_mean=27.510 -> duraware_mean=5.061 over 98 seqs
Done in 6.200224876403809 s


In [5]:
# Inspect folds_archive_cv.json structure to fix indexing
import json, pprint
with open('folds_archive_cv.json') as f:
    folds_raw = json.load(f)
print('type:', type(folds_raw))
if isinstance(folds_raw, dict):
    print('dict keys:', list(folds_raw.keys())[:10])
    # show a sample entry
    for k,v in list(folds_raw.items())[:1]:
        print('sample key:', k, 'type:', type(v))
        pprint.pprint(v if isinstance(v, (dict,list)) else str(v))
elif isinstance(folds_raw, list):
    print('list length:', len(folds_raw))
    if folds_raw:
        print('elem0 type:', type(folds_raw[0]))
        pprint.pprint(folds_raw[0])
else:
    print('Unknown structure')

type: <class 'list'>
list length: 3
elem0 type: <class 'dict'>
{'fold': 0,
 'train_ids': [101,
               102,
               103,
               104,
               105,
               106,
               107,
               108,
               109,
               110,
               111,
               112,
               113,
               114,
               115,
               116,
               117,
               118,
               119,
               120,
               121,
               122,
               123,
               124,
               125,
               126,
               127,
               128,
               129,
               130,
               131,
               132,
               133,
               134,
               135,
               136,
               137,
               138,
               139,
               140,
               141,
               142,
               143,
               144,
               145,
               146,
     

In [7]:
# Inspect labels3d_v2/train to determine file naming and format
import glob, os
from pathlib import Path
import numpy as np

labels_dir = Path('labels3d_v2/train')
lbl_files = sorted(labels_dir.glob('*'))
print('labels count:', len(lbl_files))
print('first 10:', [os.path.basename(p) for p in lbl_files[:10]])
if lbl_files:
    p0 = lbl_files[0]
    print('Sample file:', p0.name)
    try:
        if p0.suffix == '.npz':
            z = np.load(p0)
            print('npz keys:', list(z.keys()))
            for k in z.files:
                arr = z[k]
                print('  ', k, arr.shape, arr.dtype, 'min/max', float(arr.min()), float(arr.max()))
        elif p0.suffix == '.npy':
            a = np.load(p0)
            print('npy shape:', a.shape, a.dtype, 'min/max', float(a.min()), float(a.max()))
        else:
            print('Unknown suffix:', p0.suffix)
    except Exception as e:
        print('Error reading sample label file:', e)

labels count: 297
first 10: ['1.npy', '10.npy', '101.npy', '102.npy', '103.npy', '104.npy', '105.npy', '106.npy', '107.npy', '108.npy']
Sample file: 1.npy
npy shape: (1254,) int16 min/max 1.0 20.0


In [14]:
# Full OOF sweep across folds with guardrails, then test decode + submission (sweep rho too)
import numpy as np, json, time
from pathlib import Path
from collections import defaultdict

def compute_runlen_stats(ids):
    agg = defaultdict(list)
    for sid in ids:
        y = load_frame_labels(sid)
        cur, run = None, 0
        for c in y:
            if c==0:
                if cur is not None:
                    agg[cur].append(run)
                    cur, run = None, 0
                continue
            if cur is None:
                cur, run = int(c), 1
            elif c==cur:
                run += 1
            else:
                agg[cur].append(run)
                cur, run = int(c), 1
        if cur is not None:
            agg[cur].append(run)
    med = np.zeros(21, dtype=np.float32)
    q75 = np.zeros(21, dtype=np.float32)
    for c in range(21):
        if c==0:
            continue
        ls = agg.get(c, [])
        if ls:
            arr = np.array(ls, dtype=np.float32)
            med[c] = float(np.median(arr))
            q75[c] = float(np.percentile(arr, 75.0))
        else:
            med[c] = 1.0
            q75[c] = 2.0
    return med, q75

def build_min_dur(med, q75, mult):
    md = np.round(med * mult).astype(np.int32)
    md = np.clip(md, 2, np.maximum(q75.astype(np.int32), 2))
    md[0] = 0
    return md

def decode_minseg_guarded(p, min_dur, rho=None):
    # p: CxT probs
    y = p.argmax(axis=0).astype(np.int32)
    Tlen = y.shape[0]
    i = 0
    merges = 0
    blocked_rho = 0
    blocked_zero = 0
    while i < Tlen:
        c = y[i]
        j = i+1
        while j<Tlen and y[j]==c:
            j += 1
        seg_len = j-i
        if c!=0 and seg_len < min_dur[c]:
            left_c = y[i-1] if i>0 else None
            right_c = y[j] if j<Tlen else None
            # pick neighbor candidate (avoid 0 if possible)
            cand = None
            if left_c is not None and right_c is not None:
                L = left_c if left_c!=0 else None
                R = right_c if right_c!=0 else None
                if L is None and R is None:
                    cand = None
                elif L is None:
                    cand = R
                elif R is None:
                    cand = L
                else:
                    lscore = float(p[L, i:j].mean())
                    rscore = float(p[R, i:j].mean())
                    cand = R if rscore >= lscore else L
            else:
                only = left_c if right_c is None else right_c
                if only == 0:
                    cand = None
                else:
                    cand = only
            if cand is not None and cand!=0:
                mean_c = float(p[c, i:j].mean())
                mean_k = float(p[cand, i:j].mean())
                # if rho is None: always allow; else require mean_k >= rho*mean_c
                if (rho is None) or (mean_k >= rho * mean_c):
                    y[i:j] = cand
                    merges += 1
                    i = max(0, i-1)
                    continue
                else:
                    blocked_rho += 1
            else:
                blocked_zero += 1
        i = j
    return y, merges, blocked_rho, blocked_zero

# Load folds list
with open('folds_archive_cv.json') as f:
    folds_list = json.load(f)

def eval_all_folds(mult_list=(0.5,0.6,0.7,0.8), rho_list=(None,0.9,0.95,1.0)):
    results = []  # list of dict per fold
    for fd in folds_list:
        fidx = int(fd['fold'])
        tr_ids = list(map(int, fd['train_ids']))
        va_ids = set(map(int, fd['val_ids']))
        print(f'Fold {fidx}: computing run-length stats on train ({len(tr_ids)} seqs) ...', flush=True)
        med, q75 = compute_runlen_stats(tr_ids)
        print('  med[1:6]=', med[1:6], ' q75[1:6]=', q75[1:6])
        # baseline and per setting metrics
        base_d = []
        per_setting = {(m,r): {'d': [], 'merges':0, 'blocked_rho':0, 'blocked_zero':0} for m in mult_list for r in rho_list}
        n=0
        t0 = time.time()
        for npz_path in sorted(Path('features3d_v3/train').glob('*.npz')):
            sid = int(npz_path.stem)
            if sid not in va_ids:
                continue
            p = load_probs(sid)  # CxT
            if n==0:
                # sanity check normalization
                sums = p.sum(axis=0)
                print(f'  sanity: p shape {p.shape}, mean frame sum {float(sums.mean()):.4f}, min/max {float(sums.min()):.4f}/{float(sums.max()):.4f}')
            y_true = load_frame_labels(sid)
            y_base = p.argmax(axis=0).astype(np.int32)
            base_d.append(levenshtein(compress_to_sequence(y_base), compress_to_sequence(y_true)))
            for m in mult_list:
                md = build_min_dur(med, q75, m)
                for r in rho_list:
                    y_hat, merges, br, bz = decode_minseg_guarded(p, md, rho=r)
                    per_setting[(m,r)]['d'].append(levenshtein(compress_to_sequence(y_hat), compress_to_sequence(y_true)))
                    per_setting[(m,r)]['merges'] += merges
                    per_setting[(m,r)]['blocked_rho'] += br
                    per_setting[(m,r)]['blocked_zero'] += bz
            n+=1
            if n%20==0:
                print(f'  .. fold {fidx} processed {n} seqs, elapsed {time.time()-t0:.1f}s', flush=True)
        base_mean = float(np.mean(base_d)) if base_d else 0.0
        fold_rec = {'fold': fidx, 'baseline': base_mean, 'per_setting': {}}
        for m in mult_list:
            for r in rho_list:
                ds = per_setting[(m,r)]['d']
                fold_rec['per_setting'][f'{m}_{r}'] = {
                    'mean': float(np.mean(ds)) if ds else 0.0,
                    'merges': per_setting[(m,r)]['merges'],
                    'blocked_rho': per_setting[(m,r)]['blocked_rho'],
                    'blocked_zero': per_setting[(m,r)]['blocked_zero'],
                    'n': len(ds)
                }
        results.append(fold_rec)
        print(f"Fold {fidx} baseline={base_mean:.3f}")
        for m in mult_list:
            for r in rho_list:
                rec = fold_rec['per_setting'][f'{m}_{r}']
                print(f"  mult={m} rho={r}: mean={rec['mean']:.3f} merges={rec['merges']} blocked_rho={rec['blocked_rho']} blocked_zero={rec['blocked_zero']} n={rec['n']}")
    # choose global (mult, rho) by worst-fold then mean
    candidates = [(m,r) for m in mult_list for r in rho_list]
    worst_by = {}
    mean_by = {}
    for (m,r) in candidates:
        vals = []
        for fd in results:
            vals.append(fd['per_setting'][f'{m}_{r}']['mean'])
        worst_by[(m,r)] = max(vals)
        mean_by[(m,r)] = float(np.mean(vals))
    best_pair = min(candidates, key=lambda k: (worst_by[k], mean_by[k]))
    print('Selection summary (by worst then mean):')
    for (m,r) in candidates:
        print(f'  mult={m} rho={r}: worst-fold={worst_by[(m,r)]:.3f} mean={mean_by[(m,r)]:.3f}')
    print('Chosen (mult, rho):', best_pair)
    overall = {
        'results': results,
        'chosen': {'mult': best_pair[0], 'rho': best_pair[1]},
        'worst_by': {f'{m}_{r}': worst_by[(m,r)] for (m,r) in candidates},
        'mean_by': {f'{m}_{r}': mean_by[(m,r)] for (m,r) in candidates}
    }
    with open('cv_sweep_decoder_minseg.json','w') as f:
        json.dump(overall, f, indent=2)
    return best_pair, results

def decode_test_and_write(best_mult, rho=None, out_path='submission_primary_ce_v2v3_meta_minseg.csv'):
    # recompute med/q75 on all training IDs
    all_train_ids = []
    for fd in folds_list:
        all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids)))
    md = build_min_dur(med, q75, best_mult)
    test_dir = Path('features3d_v3/test')
    rows = []
    ids = []
    n=0; t0=time.time()
    for npz_path in sorted(test_dir.glob('*.npz')):
        sid = int(npz_path.stem)
        p2 = Path('probs_cache') / f"{sid}_ce.npy"
        p3 = Path('probs_cache') / f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()):
            # no cached test probs available; skip (will produce 0 rows)
            continue
        else:
            p = load_probs(sid)
        y_hat, _, _, _ = decode_minseg_guarded(p, md, rho=rho)
        seq = compress_to_sequence(y_hat)
        ids.append(sid)
        rows.append(' '.join(map(str, seq)))
        n+=1
        if n%20==0:
            print(f'.. decoded {n} test seqs, elapsed {time.time()-t0:.1f}s', flush=True)
    import pandas as pd
    sub = pd.DataFrame({'Id': ids, 'Predicted': rows}).sort_values('Id')
    sub.to_csv(out_path, index=False)
    print('Wrote', out_path, 'with', len(sub), 'rows')

print('Running full OOF sweep with guardrails (sweeping rho incl. None to allow merges)...', flush=True)
best_pair, res = eval_all_folds(mult_list=(0.5,0.6,0.7,0.8), rho_list=(None,0.9,0.95,1.0))
print('Best (mult, rho) from OOF:', best_pair)
print('Decoding test and writing submission...')
decode_test_and_write(best_pair[0], rho=best_pair[1], out_path='submission_primary_ce_v2v3_meta_minseg.csv')
print('Done.')

Running full OOF sweep with guardrails (sweeping rho incl. None to allow merges)...


Fold 0: computing run-length stats on train (199 seqs) ...


  med[1:6]= [40. 40. 50. 46. 48.]  q75[1:6]= [79.75 61.   60.   80.   60.  ]
  sanity: p shape (21, 1254), mean frame sum 1.0000, min/max 1.0000/1.0000


  .. fold 0 processed 20 seqs, elapsed 2.8s


  .. fold 0 processed 40 seqs, elapsed 5.9s


  .. fold 0 processed 60 seqs, elapsed 9.2s


  .. fold 0 processed 80 seqs, elapsed 13.5s


Fold 0 baseline=27.510
  mult=0.5 rho=None: mean=4.827 merges=31911 blocked_rho=0 blocked_zero=82 n=98
  mult=0.5 rho=0.9: mean=13.765 merges=12077 blocked_rho=1464 blocked_zero=57 n=98
  mult=0.5 rho=0.95: mean=18.184 merges=5822 blocked_rho=1889 blocked_zero=57 n=98
  mult=0.5 rho=1.0: mean=27.510 merges=0 blocked_rho=2509 blocked_zero=56 n=98
  mult=0.6 rho=None: mean=4.592 merges=40909 blocked_rho=0 blocked_zero=83 n=98
  mult=0.6 rho=0.9: mean=13.673 merges=14281 blocked_rho=1601 blocked_zero=57 n=98
  mult=0.6 rho=0.95: mean=18.184 merges=6759 blocked_rho=2005 blocked_zero=57 n=98
  mult=0.6 rho=1.0: mean=27.510 merges=0 blocked_rho=2614 blocked_zero=56 n=98
  mult=0.7 rho=None: mean=4.520 merges=51015 blocked_rho=0 blocked_zero=83 n=98
  mult=0.7 rho=0.9: mean=13.643 merges=16249 blocked_rho=1728 blocked_zero=57 n=98
  mult=0.7 rho=0.95: mean=18.163 merges=7644 blocked_rho=2119 blocked_zero=57 n=98
  mult=0.7 rho=1.0: mean=27.510 merges=0 blocked_rho=2720 blocked_zero=56 n=98
  

  med[1:6]= [40. 40. 49. 40. 42.]  q75[1:6]= [68. 60. 60. 60. 60.]
  sanity: p shape (21, 1286), mean frame sum 1.0000, min/max 1.0000/1.0000


  .. fold 1 processed 20 seqs, elapsed 2.0s


  .. fold 1 processed 40 seqs, elapsed 4.6s


  .. fold 1 processed 60 seqs, elapsed 7.9s


  .. fold 1 processed 80 seqs, elapsed 11.6s


Fold 1 baseline=22.758
  mult=0.5 rho=None: mean=3.101 merges=26678 blocked_rho=0 blocked_zero=94 n=99
  mult=0.5 rho=0.9: mean=12.242 merges=7533 blocked_rho=1458 blocked_zero=84 n=99
  mult=0.5 rho=0.95: mean=16.323 merges=3688 blocked_rho=1805 blocked_zero=83 n=99
  mult=0.5 rho=1.0: mean=22.758 merges=0 blocked_rho=2222 blocked_zero=82 n=99
  mult=0.6 rho=None: mean=3.091 merges=33127 blocked_rho=0 blocked_zero=94 n=99
  mult=0.6 rho=0.9: mean=12.242 merges=8825 blocked_rho=1558 blocked_zero=84 n=99
  mult=0.6 rho=0.95: mean=16.323 merges=4264 blocked_rho=1877 blocked_zero=83 n=99
  mult=0.6 rho=1.0: mean=22.758 merges=0 blocked_rho=2282 blocked_zero=82 n=99
  mult=0.7 rho=None: mean=3.303 merges=40478 blocked_rho=0 blocked_zero=95 n=99
  mult=0.7 rho=0.9: mean=12.232 merges=10095 blocked_rho=1673 blocked_zero=85 n=99
  mult=0.7 rho=0.95: mean=16.313 merges=4841 blocked_rho=1976 blocked_zero=84 n=99
  mult=0.7 rho=1.0: mean=22.758 merges=0 blocked_rho=2372 blocked_zero=83 n=99
  mu

  med[1:6]= [33. 35. 38. 37. 37.]  q75[1:6]= [40.   43.   47.25 44.   42.25]
  sanity: p shape (21, 1147), mean frame sum 1.0000, min/max 1.0000/1.0000


  .. fold 2 processed 20 seqs, elapsed 2.3s


  .. fold 2 processed 40 seqs, elapsed 4.6s


  .. fold 2 processed 60 seqs, elapsed 7.1s


  .. fold 2 processed 80 seqs, elapsed 9.5s


  .. fold 2 processed 100 seqs, elapsed 12.1s


Fold 2 baseline=21.070
  mult=0.5 rho=None: mean=4.960 merges=15656 blocked_rho=0 blocked_zero=128 n=100
  mult=0.5 rho=0.9: mean=11.290 merges=5735 blocked_rho=1148 blocked_zero=118 n=100
  mult=0.5 rho=0.95: mean=14.910 merges=2987 blocked_rho=1479 blocked_zero=118 n=100
  mult=0.5 rho=1.0: mean=21.070 merges=0 blocked_rho=1908 blocked_zero=116 n=100
  mult=0.6 rho=None: mean=4.710 merges=19837 blocked_rho=0 blocked_zero=130 n=100
  mult=0.6 rho=0.9: mean=11.210 merges=6792 blocked_rho=1252 blocked_zero=119 n=100
  mult=0.6 rho=0.95: mean=14.900 merges=3475 blocked_rho=1575 blocked_zero=119 n=100
  mult=0.6 rho=1.0: mean=21.070 merges=0 blocked_rho=1987 blocked_zero=117 n=100
  mult=0.7 rho=None: mean=4.610 merges=24359 blocked_rho=0 blocked_zero=134 n=100
  mult=0.7 rho=0.9: mean=11.190 merges=7750 blocked_rho=1337 blocked_zero=122 n=100
  mult=0.7 rho=0.95: mean=14.900 merges=3856 blocked_rho=1654 blocked_zero=121 n=100
  mult=0.7 rho=1.0: mean=21.070 merges=0 blocked_rho=2054 bloc

Wrote submission_primary_ce_v2v3_meta_minseg.csv with 0 rows
Done.


In [13]:
# Inspect test IDs and availability of cached probs for test
from pathlib import Path
import numpy as np, os
test_dir = Path('features3d_v3/test')
cache = Path('probs_cache')
test_files = sorted(test_dir.glob('*.npz'))
print('num test seqs:', len(test_files))
ids = [int(p.stem) for p in test_files]
print('first 15 test ids:', ids[:15])
missing_both = 0
have_v2 = 0
have_v3 = 0
for sid in ids[:50]:
    p2 = cache / f"{sid}_ce.npy"
    p3 = cache / f"{sid}_ce_v3.npy"
    have_v2 += int(p2.exists())
    have_v3 += int(p3.exists())
    if not p2.exists() and not p3.exists():
        missing_both += 1
print('among first 50 test ids: have_v2:', have_v2, 'have_v3:', have_v3, 'missing both:', missing_both)
# Also check a few random ids across all
miss_total = sum(1 for sid in ids if not (cache / f"{sid}_ce.npy").exists() and not (cache / f"{sid}_ce_v3.npy").exists())
print('total missing both across all test ids:', miss_total)

num test seqs: 95
first 15 test ids: [300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314]
among first 50 test ids: have_v2: 0 have_v3: 0 missing both: 50
total missing both across all test ids: 95


In [15]:
# Cache test per-frame probs for v2 and v3 CE models into probs_cache/<id>_ce.npy and <id>_ce_v3.npy
import os, json, time, math
from pathlib import Path
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('CUDA available:', torch.cuda.is_available(), flush=True)

probs_cache = Path('probs_cache'); probs_cache.mkdir(exist_ok=True)
folds = json.load(open('folds_archive_cv.json','r'))
test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())

def load_feat(split_dir: Path, sid: int):
    d = np.load(split_dir / f"{sid}.npz")
    # Support either 'X' or single array
    if 'X' in d.files:
        X = d['X'].astype(np.float32)
    else:
        # fall back: first array
        X = d[d.files[0]].astype(np.float32)
    return X

def compute_fold_scaler(id_list, feat_train_dir: Path):
    n = 0; mean=None; M2=None
    for sid in id_list:
        X = load_feat(feat_train_dir, int(sid))
        n_i = X.shape[0]
        if mean is None:
            mean = X.mean(axis=0); M2 = ((X - mean)**2).sum(axis=0); n = n_i
        else:
            mean_i = X.mean(axis=0); n_new = n + n_i; delta = mean_i - mean
            mean = mean + delta * (n_i / max(1, n_new))
            M2 = M2 + ((X - mean_i)**2).sum(axis=0) + (delta**2) * (n * n_i / max(1, n_new)); n = n_new
    var = M2 / max(1, (n - 1)); std = np.sqrt(np.clip(var, 1e-8, None))
    return mean.astype(np.float32), std.astype(np.float32)

def apply_tta_timewarp(p_t_c: torch.Tensor, factors=(0.9,1.0,1.1)) -> torch.Tensor:
    acc=None
    for s in factors:
        T, C = p_t_c.shape
        tgt_len = max(1, int(round(T*s)))
        x = p_t_c.T.unsqueeze(0)
        y = F.interpolate(x, size=tgt_len, mode='linear', align_corners=False)
        y2 = F.interpolate(y, size=T, mode='linear', align_corners=False)[0].T
        y2 = y2 / (y2.sum(dim=-1, keepdim=True) + 1e-8)
        acc = y2 if acc is None else (acc + y2)
    out = acc / float(len(factors))
    return out / (out.sum(dim=-1, keepdim=True) + 1e-8)

class DilatedResBlock(nn.Module):
    def __init__(self, ch, dilation, drop=0.35, groups=8, k=3):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, k, padding=dilation, dilation=dilation)
        self.gn1 = nn.GroupNorm(groups, ch)
        self.drop = nn.Dropout(drop)
        self.conv2 = nn.Conv1d(ch, ch, 1)
        self.gn2 = nn.GroupNorm(groups, ch)
    def forward(self, x):
        h = self.conv1(x); h = self.gn1(h); h = F.relu(h, inplace=True); h = self.drop(h)
        h = self.conv2(h); h = self.gn2(h); h = F.relu(h, inplace=True)
        return x + h

class DilatedTCN(nn.Module):
    def __init__(self, d_in, channels=128, layers=12, num_classes=21, dropout=0.35):
        super().__init__()
        self.inp = nn.Conv1d(d_in, channels, 1)
        blocks=[]; dil=1
        for _ in range(layers):
            blocks.append(DilatedResBlock(channels, dil, drop=dropout, groups=8, k=3));
            dil = min(dil*2, 512)
        self.blocks = nn.ModuleList(blocks)
        self.head = nn.Conv1d(channels, num_classes, 1)
    def forward(self, x_b_t_d):
        x = x_b_t_d.transpose(1,2)
        h = self.inp(x)
        for b in self.blocks:
            h = b(h)
        out = self.head(h)
        return out.transpose(1,2)  # B,T,C

def cache_test_probs_for_feature_set(tag: str, feat_train_dir: Path, feat_test_dir: Path, ckpt_tmpl: str, out_suffix: str):
    # tag: 'v2' or 'v3'
    # ckpt_tmpl: e.g., 'model_ce_fold{fi}{suf}.pth' or 'model_ce_v3_fold{fi}{suf}.pth'
    # out_suffix: '_ce.npy' or '_ce_v3.npy'
    # Determine input dim from train file
    sample_npz = next(iter(sorted(feat_train_dir.glob('*.npz'))))
    D_in = load_feat(feat_train_dir, int(sample_npz.stem)).shape[1]
    print(f'[{tag}] D_in={D_in}', flush=True)
    # Precompute per-fold scalers on TRAIN ids for this feature set
    fold_scalers = []
    for fd in folds:
        mean,std = compute_fold_scaler(fd['train_ids'], feat_train_dir)
        fold_scalers.append((torch.from_numpy(mean).float().to(device), torch.from_numpy(std).float().to(device)))
    t0 = time.time(); n_saved=0; n_skip=0
    for i, sid in enumerate(test_ids, 1):
        out_path = probs_cache / f"{sid}{out_suffix}"
        if out_path.exists():
            n_skip += 1
            if (i%20)==0 or i==len(test_ids):
                print(f'  [{tag}] skip {i}/{len(test_ids)} (exists) elapsed={(time.time()-t0):.1f}s', flush=True)
            continue
        X = load_feat(feat_test_dir, int(sid))
        acc=None; n_models=0
        with torch.no_grad(), torch.amp.autocast('cuda' if device.type=='cuda' else 'cpu'):
            for fi in range(3):
                mean_t, std_t = fold_scalers[fi]
                for suf in ['', '_s1']:
                    ckpt = Path(ckpt_tmpl.format(fi=fi, suf=suf))
                    if not ckpt.exists():
                        continue
                    model = DilatedTCN(d_in=D_in, channels=128, layers=12, num_classes=21, dropout=0.35).to(device)
                    model.load_state_dict(torch.load(ckpt, map_location=device)); model.eval()
                    xb = torch.from_numpy(X).float().to(device)
                    xb = (xb - mean_t) / (std_t + 1e-6)
                    xb = xb.unsqueeze(0)
                    p = model(xb)[0].softmax(dim=-1)  # T,C
                    p = apply_tta_timewarp(p, factors=(0.9,1.0,1.1))
                    acc = p if acc is None else (acc + p)
                    n_models += 1
                    del model
        if acc is None or n_models == 0:
            print(f'  [{tag}] WARNING: no models found for sid={sid}; skipping save', flush=True)
            continue
        probs = acc / float(n_models)  # T,C
        probs = probs / (probs.sum(dim=-1, keepdim=True) + 1e-8)
        # Save as CxT to be consistent with downstream ensure_CxT
        np.save(out_path, probs.transpose(0,1).cpu().numpy().astype(np.float32))  # CxT
        n_saved += 1
        if (i%20)==0 or i==len(test_ids):
            print(f'  [{tag}] saved {n_saved} (processed {i}/{len(test_ids)}) elapsed={(time.time()-t0):.1f}s', flush=True)
    print(f'[{tag}] Done. saved={n_saved} skipped_existing={n_skip} total={len(test_ids)} elapsed={(time.time()-t0):.1f}s', flush=True)

# Paths for v2 and v3
feat_v2_tr = Path('features3d_v2')/'train'
feat_v2_te = Path('features3d_v2')/'test'
feat_v3_tr = Path('features3d_v3')/'train'
feat_v3_te = Path('features3d_v3')/'test'

# Run caching for v2 and v3
print('Caching test probs for v2 CE models -> _ce.npy ...', flush=True)
cache_test_probs_for_feature_set('v2', feat_v2_tr, feat_v2_te, ckpt_tmpl='model_ce_fold{fi}{suf}.pth', out_suffix='_ce.npy')
print('Caching test probs for v3 CE models -> _ce_v3.npy ...', flush=True)
cache_test_probs_for_feature_set('v3', feat_v3_tr, feat_v3_te, ckpt_tmpl='model_ce_v3_fold{fi}{suf}.pth', out_suffix='_ce_v3.npy')
print('All test probs cached.')

CUDA available: True


Caching test probs for v2 CE models -> _ce.npy ...


[v2] D_in=219


  model.load_state_dict(torch.load(ckpt, map_location=device)); model.eval()


  [v2] saved 20 (processed 20/95) elapsed=3.2s


  [v2] saved 40 (processed 40/95) elapsed=6.3s


  [v2] saved 60 (processed 60/95) elapsed=9.3s


  [v2] saved 80 (processed 80/95) elapsed=12.3s


  [v2] saved 95 (processed 95/95) elapsed=14.6s


[v2] Done. saved=95 skipped_existing=0 total=95 elapsed=14.6s


Caching test probs for v3 CE models -> _ce_v3.npy ...


[v3] D_in=1095


  [v3] saved 20 (processed 20/95) elapsed=3.3s


  [v3] saved 40 (processed 40/95) elapsed=6.6s


  [v3] saved 60 (processed 60/95) elapsed=9.8s


  [v3] saved 80 (processed 80/95) elapsed=13.1s


  [v3] saved 95 (processed 95/95) elapsed=15.5s


[v3] Done. saved=95 skipped_existing=0 total=95 elapsed=15.5s


All test probs cached.


In [16]:
# Generate duration-aware minseg submission with uniqueness enforcement (perm20)
import numpy as np, json, time, pandas as pd
from pathlib import Path

def make_perm20(seq, p_c_t):
    # seq: list of ints (1..20), p_c_t: CxT probs (C=21 with index 0 unused)
    seen=set(); out=[]
    for c in seq:
        if 1<=c<=20 and c not in seen:
            seen.add(c); out.append(int(c))
    if len(out) < 20:
        # score missing classes by total prob mass over time
        C = p_c_t.shape[0]
        scores=[]
        for c in range(1,21):
            if c in seen: continue
            if c < C:
                s=float(p_c_t[c].sum())
            else:
                s=0.0
            scores.append((s, c))
        scores.sort(key=lambda x: -x[0])
        for _, c in scores:
            if len(out) >= 20: break
            out.append(int(c))
    return out[:20]

def decode_test_and_write_perm20(best_mult=0.7, rho=None, out_path='submission_primary_ce_v2v3_meta_minseg.csv', col_name='Sequence'):
    # Reuse helpers from Cell 7: folds_list, compute_runlen_stats, build_min_dur, load_probs, decode_minseg_guarded, compress_to_sequence
    all_train_ids=[]
    for fd in folds_list:
        all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids)))
    md = build_min_dur(med, q75, best_mult)
    test_dir = Path('features3d_v3/test')
    rows=[]; ids=[]; n=0; t0=time.time()
    for npz_path in sorted(test_dir.glob('*.npz')):
        sid = int(npz_path.stem)
        p2 = Path('probs_cache')/f"{sid}_ce.npy"
        p3 = Path('probs_cache')/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()):
            continue
        p = load_probs(sid)  # CxT
        y_hat, _, _, _ = decode_minseg_guarded(p, md, rho=rho)
        seq_raw = compress_to_sequence(y_hat)
        seq = make_perm20(seq_raw, p)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%20)==0 or n==95:
            print(f".. decoded {n} test seqs in {time.time()-t0:.1f}s", flush=True)
    sub = pd.DataFrame({'Id': ids, col_name: rows}).sort_values('Id')
    sub.to_csv(out_path, index=False)
    print('Wrote', out_path, 'with', len(sub), 'rows; head:\n', sub.head())
    # Basic sanity checks
    assert len(sub)==95, f'Expected 95 rows, got {len(sub)}'
    toks_ok = sub[col_name].apply(lambda s: len(s.split())==20 and set(map(int, s.split()))==set(range(1,21))).all()
    assert toks_ok, 'Each row must be a permutation of 1..20'
    # Copy to submission.csv
    sub.to_csv('submission.csv', index=False)
    print('submission.csv written ->', out_path)

print('Decoding test with minseg (mult=0.7, rho=None) and enforcing perm20...', flush=True)
decode_test_and_write_perm20(best_mult=0.7, rho=None, out_path='submission_primary_ce_v2v3_meta_minseg.csv', col_name='Sequence')

Decoding test with minseg (mult=0.7, rho=None) and enforcing perm20...


.. decoded 20 test seqs in 0.1s


.. decoded 40 test seqs in 0.3s


.. decoded 60 test seqs in 0.4s


.. decoded 80 test seqs in 0.5s


.. decoded 95 test seqs in 0.6s


Wrote submission_primary_ce_v2v3_meta_minseg.csv with 95 rows; head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 3 5 9 19 13 20 18 11 4 6 8 14 10 2 7 1...
3  303  17 18 13 4 3 10 14 6 5 19 20 7 11 16 8 2 9 15 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 15 1...
submission.csv written -> submission_primary_ce_v2v3_meta_minseg.csv


In [17]:
# HSMM (fixed K=20 segmental DP) decoder with downsampling and duration priors; quick test decode
import numpy as np, time, math, pandas as pd
from pathlib import Path

# Reuse helpers already defined in earlier cells: folds_list, compute_runlen_stats, load_probs, make_perm20

def avg_pool_time(p_c_t: np.ndarray, k: int) -> np.ndarray:
    # p_c_t: CxT
    C,T = p_c_t.shape
    if k <= 1: return p_c_t
    pad = k//2
    x = np.pad(p_c_t, ((0,0),(pad,pad)), mode='edge')
    y = np.empty((C,T), dtype=np.float32)
    kk = float(k)
    for t in range(T):
        y[:,t] = x[:, t:t+k].mean(axis=1)
    y /= (y.sum(axis=0, keepdims=True) + 1e-8)
    return y

def downsample_time(p_c_t: np.ndarray, s: int) -> np.ndarray:
    # average over non-overlapping windows of size s
    if s <= 1: return p_c_t
    C,T = p_c_t.shape
    T2 = T // s
    if T2 <= 0:
        return p_c_t
    x = p_c_t[:, :T2*s].reshape(C, T2, s).mean(axis=2)
    x /= (x.sum(axis=0, keepdims=True) + 1e-8)
    return x

def build_duration_bounds_from_stats(med: np.ndarray, q95: np.ndarray, a: float, b: float, cap_max: int = 150):
    # med, q95: arrays size >=21, index 0 unused
    lmin = np.zeros_like(med, dtype=np.int32)
    lmax = np.zeros_like(med, dtype=np.int32)
    for c in range(21):
        if c == 0:
            lmin[c] = 0; lmax[c] = 0; continue
        m = float(med[c]) if med[c] > 0 else 10.0
        q = float(q95[c]) if q95[c] > 0 else (m * 2.0)
        lmin[c] = max(3, int(math.floor(a * m)))
        lmax[c] = min(int(math.ceil(b * m)), int(q), int(cap_max))
        if lmax[c] < lmin[c]:
            lmax[c] = lmin[c]
    return lmin, lmax

def robust_q95_from_ids(ids):
    from collections import defaultdict
    agg = defaultdict(list)
    for sid in ids:
        y = load_frame_labels(int(sid))
        cur, run = None, 0
        for c in y:
            if c == 0:
                if cur is not None:
                    agg[cur].append(run); cur=None; run=0
                continue
            if cur is None:
                cur, run = int(c), 1
            elif c == cur:
                run += 1
            else:
                agg[cur].append(run); cur=int(c); run=1
        if cur is not None:
            agg[cur].append(run)
    med = np.zeros(21, dtype=np.float32)
    q95 = np.zeros(21, dtype=np.float32)
    for c in range(1,21):
        ls = agg.get(c, [])
        if ls:
            arr = np.array(ls, dtype=np.float32)
            med[c] = float(np.median(arr))
            q95[c] = float(np.percentile(arr, 95.0))
        else:
            med[c] = 10.0; q95[c] = 30.0
    return med, q95

def hsmm_decode_perm20(p_c_t: np.ndarray, med: np.ndarray, lmin: np.ndarray, lmax: np.ndarray, lambda_len: float = 0.4, mu: float = 0.1, smooth_k: int = 5, ds: int = 4):
    # p_c_t: CxT with C>=21 and index 0 unused
    p = np.clip(p_c_t, 1e-8, 1.0)
    p = p / (p.sum(axis=0, keepdims=True) + 1e-8)
    if smooth_k > 1:
        p = avg_pool_time(p, k=smooth_k)
    pds = downsample_time(p, s=ds) if ds > 1 else p
    C, T = pds.shape
    # precompute negative log probs cumulative per class
    neglog = -np.log(pds + 1e-8)
    cum = np.cumsum(neglog, axis=1)
    def seg_cost(c, t0, t1):
        # inclusive t0..t1
        if t0 > 0:
            s = cum[c, t1] - cum[c, t0-1]
        else:
            s = cum[c, t1]
        L = (t1 - t0 + 1)
        m = float(med[c]) if med[c] > 0 else 10.0
        phi = abs(math.log(max(1.0, L)) - math.log(max(1.0, m/ds)))  # log-scale penalty in ds domain
        return float(s) + lambda_len * phi
    # bounds in ds domain
    lmin_ds = np.maximum(1, (lmin // max(1, ds))).astype(np.int32)
    lmax_ds = np.maximum(lmin_ds, (lmax // max(1, ds)).astype(np.int32))
    K = 20
    # DP arrays: for current k segment end at t, best cost per class; and backpointers
    INF = 1e18
    best = np.full((K+1, T, C), INF, dtype=np.float32)
    prev_t = -np.ones((K+1, T, C), dtype=np.int32)
    prev_c = -np.ones((K+1, T, C), dtype=np.int16)
    # initialize k=1
    k = 1
    for t in range(T):
        # feasible l for segment ending at t given remaining segments
        for c in range(1,21):
            Lmin = lmin_ds[c]; Lmax = lmax_ds[c]
            # remaining time must allow K-k segments with at least min_l each; use global min over classes ~1
            for L in range(Lmin, Lmax+1):
                t0 = t - L + 1
                if t0 < 0: break
                # ensure remaining space for (K-1) segments
                if (T - 1 - t) < (K - k) * 1:  # min 1 per remaining segment
                    continue
                cost = seg_cost(c, t0, t)
                if cost < best[k, t, c]:
                    best[k, t, c] = cost
                    prev_t[k, t, c] = t0 - 1  # previous end index
                    prev_c[k, t, c] = 0      # marker for start
    # iterate k=2..K
    for k in range(2, K+1):
        # precompute for each u (prev end time) the best and second-best across classes for k-1
        # to implement 'best previous except same class' trick
        best1_val = np.full((T,), INF, dtype=np.float32)
        best1_c = -np.ones((T,), dtype=np.int16)
        best2_val = np.full((T,), INF, dtype=np.float32)
        for u in range(T):
            # prune: at least k-1 frames must be used before u; simple guard
            v = best[k-1, u, 1:21]
            if v.size == 0: continue
            i_min1 = int(np.argmin(v)) + 1
            val1 = float(v[i_min1-1])
            best1_val[u] = val1; best1_c[u] = i_min1
            if v.size >= 2:
                # mask out i_min1 to get second-best
                tmp = v.copy()
                tmp[i_min1-1] = INF
                val2 = float(tmp.min())
                best2_val[u] = val2
        for t in range(T):
            # At least k segments of length >=1 must fit in 0..t
            if t < k - 1:
                continue
            rem_after = T - 1 - t
            # feasible per-class lengths
            for c in range(1,21):
                Lmin = lmin_ds[c]; Lmax = lmax_ds[c]
                # loop lengths
                for L in range(Lmin, Lmax+1):
                    t0 = t - L + 1
                    if t0 < 0: break
                    # feasibility: remaining segments K-k must fit in rem_after with at least 1 each
                    if rem_after < (K - k) * 1:
                        continue
                    u = t0 - 1  # previous segment end index
                    if u < 0:
                        continue
                    # best prev except same class
                    prev_val = best1_val[u]
                    if best1_c[u] == c:
                        prev_val = best2_val[u]
                    if not np.isfinite(prev_val):
                        continue
                    cost = prev_val + mu + seg_cost(c, t0, t)
                    if cost < best[k, t, c]:
                        best[k, t, c] = cost
                        prev_t[k, t, c] = u
                        prev_c[k, t, c] = best1_c[u] if best1_c[u] != c else -1  # store some prev class info
    # termination: choose best end t for k=K
    end_t = -1; end_c = -1; val = INF
    for t in range(T):
        # all frames used by K segments is not required; choose best overall
        v = best[K, t, 1:21]
        if v.size == 0: continue
        i = int(np.argmin(v)) + 1
        if v[i-1] < val:
            val = float(v[i-1]); end_t = t; end_c = i
    # backtrack to get classes (lengths optional)
    classes = []
    k = K; t = end_t; c = end_c
    if end_t < 0 or end_c < 0:
        # fallback: peak order by center-of-mass
        C,T = pds.shape
        com = []
        idx = np.arange(T, dtype=np.float32)
        for cc in range(1,21):
            w = pds[cc]; s = float(w.sum()) + 1e-8; com_t = float((idx * w).sum() / s); com.append((com_t, cc))
        com.sort(key=lambda x: x[0])
        seq = [c for _,c in com][:20]
    else:
        while k >= 1 and t >= 0:
            classes.append(int(c))
            u = int(prev_t[k, t, c])
            # find prev class: among best[k-1, u, :] minimal, excluding current c ideally
            if k > 1 and u >= 0:
                v = best[k-1, u, 1:21]
                if v.size > 0:
                    i = int(np.argmin(v)) + 1
                else:
                    i = 1
                c = i; t = u; k -= 1
            else:
                break
        classes = classes[::-1]
        seq = classes if len(classes)==20 else (classes + [c for c in range(1,21) if c not in set(classes)])[:20]
    # Map to original timeline only for ordering; HSMM already yields order. Enforce perm20 with make_perm20 using original probs.
    return seq

def decode_test_hsmm_and_write(mu=0.1, lambda_len=0.4, a=0.7, b=1.5, smooth_k=5, ds=4, out_path='submission_hsmm_perm20.csv', col_name='Sequence'):
    # durations from ALL train ids (non-leaky for test)
    all_train_ids=[]
    for fd in folds_list:
        all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q95 = robust_q95_from_ids(sorted(set(all_train_ids)))
    lmin, lmax = build_duration_bounds_from_stats(med, q95, a=a, b=b, cap_max=150)
    test_dir = Path('features3d_v3/test')
    rows=[]; ids=[]; t0=time.time()
    n=0
    for npz_path in sorted(test_dir.glob('*.npz')):
        sid = int(npz_path.stem)
        p2 = Path('probs_cache')/f"{sid}_ce.npy"
        p3 = Path('probs_cache')/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()):
            continue
        p = load_probs(sid)  # CxT blended
        seq_h = hsmm_decode_perm20(p, med, lmin, lmax, lambda_len=lambda_len, mu=mu, smooth_k=smooth_k, ds=ds)
        seq = make_perm20(seq_h, p)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%10)==0 or n==95:
            print(f"  [HSMM test] {n}/95 elapsed={(time.time()-t0):.1f}s", flush=True)
    sub = pd.DataFrame({'Id': ids, col_name: rows}).sort_values('Id')
    sub.to_csv(out_path, index=False)
    print('Wrote', out_path, 'rows=', len(sub), 'head:\n', sub.head(), flush=True)
    # Sanity
    assert len(sub)==95, f'Expected 95 rows, got {len(sub)}'
    ok = sub[col_name].apply(lambda s: len(s.split())==20 and set(map(int, s.split()))==set(range(1,21))).all()
    assert ok, 'Each row must be permutation of 1..20'
    # also copy to submission.csv if desired (manual step later)
    return out_path

print('HSMM decoder ready. To run test decode quickly with default params:')
print("decode_test_hsmm_and_write(mu=0.1, lambda_len=0.4, a=0.7, b=1.5, smooth_k=5, ds=4, out_path='submission_hsmm_perm20.csv')")

HSMM decoder ready. To run test decode quickly with default params:
decode_test_hsmm_and_write(mu=0.1, lambda_len=0.4, a=0.7, b=1.5, smooth_k=5, ds=4, out_path='submission_hsmm_perm20.csv')


In [18]:
# Run HSMM decode on test and prepare submission
import pandas as pd, shutil, os, time
print('Running HSMM test decode (mu=0.1, lambda_len=0.4, a=0.7, b=1.5, smooth_k=5, ds=4)...', flush=True)
out_path = decode_test_hsmm_and_write(mu=0.1, lambda_len=0.4, a=0.7, b=1.5, smooth_k=5, ds=4, out_path='submission_hsmm_perm20.csv', col_name='Sequence')
assert os.path.exists(out_path), 'HSMM submission not written'
shutil.copyfile(out_path, 'submission.csv')
print('submission.csv updated ->', out_path)
print(pd.read_csv('submission.csv').head())

Running HSMM test decode (mu=0.1, lambda_len=0.4, a=0.7, b=1.5, smooth_k=5, ds=4)...


  [HSMM test] 10/95 elapsed=57.2s


  [HSMM test] 20/95 elapsed=115.8s


  [HSMM test] 30/95 elapsed=172.7s


  [HSMM test] 40/95 elapsed=243.8s


  [HSMM test] 50/95 elapsed=300.0s


  [HSMM test] 60/95 elapsed=359.5s


  [HSMM test] 70/95 elapsed=424.1s


  [HSMM test] 80/95 elapsed=481.8s


  [HSMM test] 90/95 elapsed=540.3s


  [HSMM test] 95/95 elapsed=569.6s


Wrote submission_hsmm_perm20.csv rows= 95 head:
     Id                                           Sequence
0  300  8 4 20 13 12 3 15 14 11 6 16 19 7 10 9 2 17 1 ...
1  301  1 7 5 4 20 6 2 11 15 13 19 9 18 3 17 8 14 10 1...
2  302  17 16 12 3 5 9 7 19 13 20 18 11 4 6 2 14 8 1 1...
3  303  5 19 15 20 17 1 11 7 16 8 18 9 3 6 2 14 4 13 1...
4  304  13 9 7 2 11 3 20 19 5 10 14 6 15 17 16 4 18 8 ...


submission.csv updated -> submission_hsmm_perm20.csv
    Id                                           Sequence
0  300  8 4 20 13 12 3 15 14 11 6 16 19 7 10 9 2 17 1 ...
1  301  1 7 5 4 20 6 2 11 15 13 19 9 18 3 17 8 14 10 1...
2  302  17 16 12 3 5 9 7 19 13 20 18 11 4 6 2 14 8 1 1...
3  303  5 19 15 20 17 1 11 7 16 8 18 9 3 6 2 14 4 13 1...
4  304  13 9 7 2 11 3 20 19 5 10 14 6 15 17 16 4 18 8 ...


In [19]:
# HSMM OOF sweep (leave-one-archive-out) with tight grid; select by worst-fold then mean
import numpy as np, json, time
from pathlib import Path
from collections import defaultdict

# Uses helpers/functions defined earlier in this notebook:
# - folds_list (loaded in Cell 7), load_probs, load_frame_labels, hsmm_decode_perm20, robust_q95_from_ids, build_duration_bounds_from_stats, make_perm20

def compress_to_sequence(y_frames):
    seq=[]; last=-1
    for c in y_frames:
        if c==0: continue
        if c!=last: seq.append(int(c)); last=int(c)
    return seq

def levenshtein(a,b):
    n,m=len(a),len(b)
    if n==0: return m
    if m==0: return n
    dp=list(range(m+1))
    for i in range(1,n+1):
        prev=dp[0]; dp[0]=i; ai=a[i-1]
        for j in range(1,m+1):
            tmp=dp[j]
            dp[j]=min(dp[j]+1, dp[j-1]+1, prev + (0 if ai==b[j-1] else 1))
            prev=tmp
    return dp[m]

def eval_hsmm_on_fold(fd, mu, lambda_len, a, b, smooth_k=5, ds=4):
    tr_ids = list(map(int, fd['train_ids']))
    va_ids = list(map(int, fd['val_ids']))
    med, q95 = robust_q95_from_ids(tr_ids)  # train-only (fold-pure)
    lmin, lmax = build_duration_bounds_from_stats(med, q95, a=a, b=b, cap_max=150)
    dists=[]; n=0
    t0=time.time()
    for sid in va_ids:
        p = load_probs(int(sid))  # CxT blended v2+v3
        seq_h = hsmm_decode_perm20(p, med, lmin, lmax, lambda_len=lambda_len, mu=mu, smooth_k=smooth_k, ds=ds)
        seq = make_perm20(seq_h, p)
        y_true = load_frame_labels(int(sid))
        seq_true = compress_to_sequence(y_true)
        dists.append(levenshtein(seq, seq_true)); n+=1
        if (n%30)==0 or n==len(va_ids):
            print(f"    fold {fd['fold']} {n}/{len(va_ids)} elapsed={time.time()-t0:.1f}s", flush=True)
    return float(np.mean(dists)) if dists else 0.0

def hsmm_oof_sweep(mu_list=(0.0,0.05,0.1,0.2), lam_list=(0.2,0.4,0.6), a_list=(0.6,0.7), b_list=(1.4,1.5), smooth_k=5, ds=4):
    cfgs=[]
    for mu in mu_list:
        for lam in lam_list:
            for a in a_list:
                for b in b_list:
                    cfgs.append((mu,lam,a,b))
    results=[]
    print(f"Sweeping {len(cfgs)} HSMM configs across {len(folds_list)} folds ...", flush=True)
    for (mu,lam,a,b) in cfgs:
        per_fold=[]
        print(f"  cfg mu={mu} lam={lam} a={a} b={b}", flush=True)
        for fd in folds_list:
            m = eval_hsmm_on_fold(fd, mu=mu, lambda_len=lam, a=a, b=b, smooth_k=smooth_k, ds=ds)
            per_fold.append(m)
        mean_v=float(np.mean(per_fold)); worst_v=float(np.max(per_fold))
        results.append((worst_v, mean_v, {'mu':mu,'lambda_len':lam,'a':a,'b':b,'smooth_k':smooth_k,'ds':ds}))
        print(f"    -> worst={worst_v:.3f} mean={mean_v:.3f}", flush=True)
    results.sort(key=lambda x: (x[0], x[1]))
    print('\nTop 5 by worst then mean:')
    for r in results[:5]:
        print(r)
    # save
    import pandas as pd
    pd.DataFrame([{'worst':w,'mean':m, **cfg} for (w,m,cfg) in results]).to_csv('cv_sweep_hsmm.csv', index=False)
    print('Saved cv_sweep_hsmm.csv', flush=True)
    best=results[0] if results else None
    return best, results

print('Running HSMM OOF sweep (tight grid)...', flush=True)
best, res = hsmm_oof_sweep()
print('Best (worst, mean, cfg)=', best)
print('Reference minseg OOF: worst~4.61 mean~4.144 (lower is better).')

Running HSMM OOF sweep (tight grid)...


Sweeping 48 HSMM configs across 3 folds ...


  cfg mu=0.0 lam=0.2 a=0.6 b=1.4


KeyboardInterrupt: 

In [20]:
# Set localsrch-meta submission as current submission.csv and preview head
import shutil, pandas as pd, os
src = 'submission_primary_ce_v2v3_meta_localsrch.csv'
assert os.path.exists(src), f'Missing {src}'
shutil.copyfile(src, 'submission.csv')
print('submission.csv ->', src)
print(pd.read_csv('submission.csv').head())

submission.csv -> submission_primary_ce_v2v3_meta_localsrch.csv
    Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 19 7 1...
1  301  10 12 1 5 4 20 6 2 11 15 13 19 7 9 8 18 14 3 1...
2  302  1 17 16 12 5 19 7 13 20 18 11 3 4 6 15 8 14 10...
3  303  13 4 12 10 5 19 15 20 17 11 16 8 18 7 3 1 6 2 ...
4  304  8 1 12 14 18 13 9 7 2 11 3 20 19 5 10 6 15 17 ...


In [21]:
# Quick alt decode: minseg perm20 with mult=0.6 (slightly better mean OOF), then set as submission.csv
print('Decoding test with minseg (mult=0.6, rho=None) and enforcing perm20...', flush=True)
alt_path = 'submission_primary_ce_v2v3_meta_minseg_m06.csv'
decode_test_and_write_perm20(best_mult=0.6, rho=None, out_path=alt_path, col_name='Sequence')
import shutil, os, pandas as pd
assert os.path.exists(alt_path)
shutil.copyfile(alt_path, 'submission.csv')
print('submission.csv ->', alt_path)
print(pd.read_csv('submission.csv').head())

Decoding test with minseg (mult=0.6, rho=None) and enforcing perm20...


.. decoded 20 test seqs in 0.1s


.. decoded 40 test seqs in 0.3s


.. decoded 60 test seqs in 0.4s


.. decoded 80 test seqs in 0.5s


.. decoded 95 test seqs in 0.5s


Wrote submission_primary_ce_v2v3_meta_minseg_m06.csv with 95 rows; head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 13 20 18 11 3 4 6 8 14 10 2 ...
3  303  17 18 13 4 3 7 10 14 6 5 19 20 2 11 16 8 9 15 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 10 6 17 16 4 1...
submission.csv written -> submission_primary_ce_v2v3_meta_minseg_m06.csv
submission.csv -> submission_primary_ce_v2v3_meta_minseg_m06.csv
    Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 13 20 18 11 3 4 6 8 14 10 2 ...
3  303  17 18 13 4 3 7 10 14 6 5 19 20 2 11 16 8 9 15 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 10 6 17 16 4 1...


In [22]:
# Time-align v2/v3 before blending, quick OOF check (mult in {0.6,0.7}), then test decode + perm20 submission
import numpy as np, json, time, os
from pathlib import Path

# Aligned loader: replaces previous load_probs by applying integer shift alignment via entropy corr (fallback: foreground mass)
def _entropy_series(p_c_t: np.ndarray) -> np.ndarray:
    # p: CxT (probabilities, normalized per frame). Return length-T entropy series.
    p = np.clip(p_c_t, 1e-8, 1.0)
    return (- (p * np.log(p)).sum(axis=0)).astype(np.float32)

def _fgmass_series(p_c_t: np.ndarray) -> np.ndarray:
    # 1 - prob of class 0 per frame
    return (1.0 - np.clip(p_c_t[0], 0.0, 1.0)).astype(np.float32)

def _best_shift_by_corr(a: np.ndarray, b: np.ndarray, max_shift: int = 15) -> int:
    # Find s in [-max_shift, +max_shift] maximizing Pearson corr between a and b shifted by s (b shifted relative to a)
    best_s = 0; best_r = -1e9
    T = int(min(a.shape[0], b.shape[0]))
    a = a[:T]; b = b[:T]
    for s in range(-max_shift, max_shift+1):
        if s >= 0:
            x = a[:T - s]
            y = b[s:T]
        else:
            x = a[-s:T]
            y = b[:T + s]
        if x.size < 8:
            continue
        sx = float(np.std(x)); sy = float(np.std(y))
        if sx < 1e-6 or sy < 1e-6:
            continue
        r = float(np.corrcoef(x, y)[0,1])
        if np.isfinite(r) and r > best_r:
            best_r = r; best_s = s
    return int(best_s)

def load_probs_aligned(seq_id: int):
    # robustly access global temps; else reload
    global T2, T3, A
    if 'T2' not in globals() or 'T3' not in globals() or 'A' not in globals():
        calib = json.loads(Path('calib_all_v2v3_meta.json').read_text())
        T2 = np.array(calib.get('T2'), dtype=np.float32)
        T3 = np.array(calib.get('T3'), dtype=np.float32)
        A = np.array(calib.get('A'), dtype=np.float32) if isinstance(calib.get('A', None), list) else np.full_like(T2, 0.7, dtype=np.float32)
    cache = Path('probs_cache')
    p2 = np.load(cache / f"{seq_id}_ce.npy").astype(np.float32)
    p3 = np.load(cache / f"{seq_id}_ce_v3.npy").astype(np.float32)
    # temp scale, ensure CxT
    p2 = temp_scale(p2, T2); p3 = temp_scale(p3, T3)
    C = int(T2.shape[0])
    p2 = ensure_CxT(p2, C); p3 = ensure_CxT(p3, C)
    # Normalize per frame for stability
    p2 /= (p2.sum(axis=0, keepdims=True) + 1e-8)
    p3 /= (p3.sum(axis=0, keepdims=True) + 1e-8)
    # Compute entropy series
    e2 = _entropy_series(p2); e3 = _entropy_series(p3)
    s = _best_shift_by_corr(e2, e3, max_shift=15)
    # If degenerate (no variance), fallback to foreground mass
    if s == 0:
        if (np.std(e2) < 1e-6) or (np.std(e3) < 1e-6):
            f2 = _fgmass_series(p2); f3 = _fgmass_series(p3)
            s = _best_shift_by_corr(f2, f3, max_shift=15)
    # Apply shift: shift p3 by s relative to p2 (s>0 means p3 lags -> drop first s frames of p3)
    if s > 0:
        p3s = p3[:, s:]
        p2s = p2[:, :p3s.shape[1]]
    elif s < 0:
        s2 = -s
        p2s = p2[:, s2:]
        p3s = p3[:, :p2s.shape[1]]
    else:
        Tm = min(p2.shape[1], p3.shape[1])
        p2s = p2[:, :Tm]; p3s = p3[:, :Tm]
    # Final crop to common length
    Tm = min(p2s.shape[1], p3s.shape[1])
    if Tm <= 0:
        # fallback to min-crop without shift
        Tm = min(p2.shape[1], p3.shape[1])
        p2s = p2[:, :Tm]; p3s = p3[:, :Tm]
    alpha = A.reshape(-1,1).astype(np.float32)
    p = alpha * p2s + (1.0 - alpha) * p3s
    p /= (p.sum(axis=0, keepdims=True) + 1e-8)
    return p  # CxT

# Override global loader used by decoders
load_probs = load_probs_aligned
print('Aligned load_probs installed (entropy-based, window +-15).', flush=True)

# Quick OOF check: mult in {0.6, 0.7}, rho=None
def quick_oof_check(mult_list=(0.6, 0.7)):
    with open('folds_archive_cv.json') as f:
        folds_list_local = json.load(f)
    from collections import defaultdict
    def compute_runlen_stats(ids):
        agg = defaultdict(list)
        for sid in ids:
            y = load_frame_labels(int(sid))
            cur, run = None, 0
            for c in y:
                if c==0:
                    if cur is not None:
                        agg[cur].append(run); cur=None; run=0
                    continue
                if cur is None:
                    cur, run = int(c), 1
                elif c==cur:
                    run += 1
                else:
                    agg[cur].append(run); cur=int(c); run=1
            if cur is not None:
                agg[cur].append(run)
        med = np.zeros(21, dtype=np.float32); q75 = np.zeros(21, dtype=np.float32)
        for c in range(1,21):
            ls = agg.get(c, [])
            if ls:
                arr = np.array(ls, np.float32); med[c] = float(np.median(arr)); q75[c] = float(np.percentile(arr, 75.0))
            else:
                med[c] = 1.0; q75[c] = 2.0
        return med, q75
    def build_min_dur(med, q75, mult):
        md = np.round(med * mult).astype(np.int32)
        md = np.clip(md, 2, np.maximum(q75.astype(np.int32), 2)); md[0]=0; return md
    def compress_to_sequence(y_frames):
        seq=[]; last=-1
        for c in y_frames:
            if c==0: continue
            if c!=last: seq.append(int(c)); last=int(c)
        return seq
    def levenshtein(a,b):
        n,m=len(a),len(b)
        if n==0: return m
        if m==0: return n
        dp=list(range(m+1))
        for i in range(1,n+1):
            prev=dp[0]; dp[0]=i; ai=a[i-1]
            for j in range(1,m+1):
                tmp=dp[j]; dp[j]=min(dp[j]+1, dp[j-1]+1, prev + (0 if ai==b[j-1] else 1)); prev=tmp
        return dp[m]
    print('Running quick OOF with aligned blending...', flush=True)
    worst_by={}; mean_by={}
    for mult in mult_list:
        per_fold=[]
        for fd in folds_list_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult)
            dists=[]; n=0; t0=time.time()
            for sid in va_ids:
                p = load_probs(int(sid))
                y_hat, _, _, _ = decode_minseg_guarded(p, md, rho=None)
                seq = compress_to_sequence(y_hat); seq_true = compress_to_sequence(load_frame_labels(int(sid)))
                dists.append(levenshtein(seq, seq_true)); n+=1
            mval = float(np.mean(dists)) if dists else 0.0
            per_fold.append(mval)
        worst_by[mult] = max(per_fold); mean_by[mult] = float(np.mean(per_fold))
    print('OOF (aligned) summary:')
    for mult in mult_list:
        print(f'  mult={mult}: worst={worst_by[mult]:.3f} mean={mean_by[mult]:.3f}')
    best_mult = min(mult_list, key=lambda m: (worst_by[m], mean_by[m]))
    return best_mult, worst_by, mean_by

t0=time.time()
best_mult, worst_by, mean_by = quick_oof_check(mult_list=(0.6, 0.7))
print('Chosen mult (by worst then mean):', best_mult, 'elapsed', f'{time.time()-t0:.1f}s', flush=True)

# Decode test with aligned loader and perm20
print('Decoding test with aligned blend + minseg perm20... (rho=None)')
out_path = f'submission_aligned_minseg_perm20_m{str(best_mult).replace(".","")}.csv'
decode_test_and_write_perm20(best_mult=best_mult, rho=None, out_path=out_path, col_name='Sequence')
print('Done aligned decode; wrote', out_path, flush=True)

Aligned load_probs installed (entropy-based, window +-15).


Running quick OOF with aligned blending...


OOF (aligned) summary:
  mult=0.6: worst=4.710 mean=4.090
  mult=0.7: worst=4.580 mean=4.114
Chosen mult (by worst then mean): 0.7 elapsed 13.0s


Decoding test with aligned blend + minseg perm20... (rho=None)


.. decoded 20 test seqs in 0.2s


.. decoded 40 test seqs in 0.4s


.. decoded 60 test seqs in 0.6s


.. decoded 80 test seqs in 0.7s


.. decoded 95 test seqs in 0.9s


Wrote submission_aligned_minseg_perm20_m07.csv with 95 rows; head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 3 5 9 19 13 20 18 11 4 6 8 14 10 2 7 1...
3  303  18 17 13 4 3 10 14 6 5 19 20 7 11 16 8 2 9 15 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 15 1...
submission.csv written -> submission_aligned_minseg_perm20_m07.csv
Done aligned decode; wrote submission_aligned_minseg_perm20_m07.csv


In [23]:
# Switch to CE+MS geometric blend submission as probe
import shutil, os, pandas as pd
src = 'submission_primary_ce_ms.csv'
assert os.path.exists(src), f'Missing {src}'
shutil.copyfile(src, 'submission.csv')
print('submission.csv ->', src)
print(pd.read_csv('submission.csv').head())

submission.csv -> submission_primary_ce_ms.csv
    Id                                           Sequence
0  300  5 9 7 1 2 18 3 8 4 20 13 12 15 14 11 6 16 19 1...
1  301  10 12 3 1 5 4 20 6 2 11 15 13 19 7 9 8 18 14 1...
2  302  1 17 16 12 3 5 19 13 20 18 11 4 6 15 8 14 10 9...
3  303  13 4 12 3 10 5 19 15 20 17 1 11 16 8 18 7 6 2 ...
4  304  8 1 7 12 18 13 9 2 11 3 20 19 5 14 6 15 17 16 ...


In [24]:
# Micro-tweak: pre-smooth probs (k=5) and collapse ABA islands (len<=2, ratio r=1.04); quick OOF pick mult in {0.65,0.7}; decode test
import numpy as np, json, time
from pathlib import Path

def smooth_probs_time(p: np.ndarray, k: int = 5) -> np.ndarray:
    # p: CxT, moving average per class; renormalize per frame
    if k <= 1: return p
    C, T = p.shape
    pad = k // 2
    x = np.pad(p, ((0,0),(pad,pad)), mode='edge')
    y = np.empty_like(p, dtype=np.float32)
    for t in range(T):
        y[:, t] = x[:, t:t+k].mean(axis=1)
    y = np.clip(y, 1e-8, None)
    y /= (y.sum(axis=0, keepdims=True) + 1e-8)
    return y

def collapse_ABA(y: np.ndarray, p: np.ndarray, max_len: int = 2, ratio: float = 1.04) -> np.ndarray:
    # y: int path length T; p: CxT; replace A-B-A where len(B)<=max_len and mean_p(A) >= ratio*mean_p(B)
    T = y.shape[0]
    i = 0
    while i < T:
        a = y[i];
        j = i + 1
        while j < T and y[j] == a:
            j += 1
        # now [i, j-1] is A
        k = j
        if k >= T:
            break
        b = y[k]
        m = k + 1
        while m < T and y[m] == b:
            m += 1
        # [k, m-1] is B
        if (m - k) <= max_len:
            n = m
            if n < T and y[n] == a:
                # have A-B-A
                mean_a = float(p[a, k:m].mean()) if a != 0 else 0.0
                mean_b = float(p[b, k:m].mean()) if b != 0 else 1e-8
                if mean_a >= ratio * mean_b:
                    y[k:m] = a
                    i = max(0, i - 1)
                    continue
        i = m
    return y

def decode_minseg_smooth_aba(p: np.ndarray, min_dur: np.ndarray, smooth_k: int = 5, aba_len: int = 2, aba_ratio: float = 1.04):
    # Pre-smooth
    ps = smooth_probs_time(p, k=smooth_k) if smooth_k and smooth_k > 1 else p
    # Argmax + min-seg merge (reuse existing guarded merge with rho=None)
    y_hat, _, _, _ = decode_minseg_guarded(ps, min_dur, rho=None)
    # ABA collapse
    y_hat = collapse_ABA(y_hat, ps, max_len=aba_len, ratio=aba_ratio)
    return y_hat

def quick_oof_smooth_aba(mult_list=(0.65, 0.7), smooth_k=5, aba_len=2, aba_ratio=1.04):
    with open('folds_archive_cv.json') as f:
        folds_local = json.load(f)
    from collections import defaultdict
    def compute_runlen_stats(ids):
        agg = defaultdict(list)
        for sid in ids:
            y = load_frame_labels(int(sid))
            cur, run = None, 0
            for c in y:
                if c == 0:
                    if cur is not None:
                        agg[cur].append(run); cur=None; run=0
                    continue
                if cur is None:
                    cur, run = int(c), 1
                elif c == cur:
                    run += 1
                else:
                    agg[cur].append(run); cur=int(c); run=1
            if cur is not None:
                agg[cur].append(run)
        med = np.zeros(21, dtype=np.float32); q75 = np.zeros(21, dtype=np.float32)
        for c in range(1,21):
            ls = agg.get(c, [])
            if ls:
                arr = np.array(ls, np.float32); med[c] = float(np.median(arr)); q75[c] = float(np.percentile(arr, 75.0))
            else:
                med[c] = 1.0; q75[c] = 2.0
        return med, q75
    def build_min_dur(med, q75, mult):
        md = np.round(med * mult).astype(np.int32)
        md = np.clip(md, 2, np.maximum(q75.astype(np.int32), 2)); md[0] = 0; return md
    def compress_to_sequence(y_frames):
        seq=[]; last=-1
        for c in y_frames:
            if c==0: continue
            if c!=last: seq.append(int(c)); last=int(c)
        return seq
    def levenshtein(a,b):
        n,m=len(a),len(b)
        if n==0: return m
        if m==0: return n
        dp=list(range(m+1))
        for i in range(1,n+1):
            prev=dp[0]; dp[0]=i; ai=a[i-1]
            for j in range(1,m+1):
                tmp=dp[j]; dp[j]=min(dp[j]+1, dp[j-1]+1, prev + (0 if ai==b[j-1] else 1)); prev=tmp
        return dp[m]
    worst_by={}; mean_by={}
    print('Running OOF (aligned) with smoothing+ABA...')
    for mult in mult_list:
        per_fold=[]
        for fd in folds_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult)
            dists=[]
            for sid in va_ids:
                p = load_probs(int(sid))  # aligned, CxT
                y_hat = decode_minseg_smooth_aba(p, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                seq = compress_to_sequence(y_hat); seq_true = compress_to_sequence(load_frame_labels(int(sid)))
                dists.append(levenshtein(seq, seq_true))
            per_fold.append(float(np.mean(dists)))
        worst_by[mult] = max(per_fold); mean_by[mult] = float(np.mean(per_fold))
    print('OOF smooth+ABA summary:')
    for m in mult_list:
        print(f'  mult={m}: worst={worst_by[m]:.3f} mean={mean_by[m]:.3f}')
    best_mult = min(mult_list, key=lambda m: (worst_by[m], mean_by[m]))
    return best_mult, worst_by, mean_by

# Run quick OOF for smoothing+ABA and decode test
t0=time.time()
best_mult_s, worst_by_s, mean_by_s = quick_oof_smooth_aba(mult_list=(0.65, 0.7), smooth_k=5, aba_len=2, aba_ratio=1.04)
print('Chosen mult (smooth+ABA):', best_mult_s, 'elapsed', f'{time.time()-t0:.1f}s')

print('Decoding test with aligned blend + smooth+ABA minseg perm20...')
def make_perm20(seq, p_c_t):
    seen=set(); out=[]
    for c in seq:
        if 1<=c<=20 and c not in seen:
            seen.add(c); out.append(int(c))
    if len(out) < 20:
        C = p_c_t.shape[0]; scores=[]
        for c in range(1,21):
            if c in seen: continue
            s=float(p_c_t[c].sum()) if c < C else 0.0
            scores.append((s, c))
        scores.sort(key=lambda x: -x[0])
        for _, c in scores:
            if len(out) >= 20: break
            out.append(int(c))
    return out[:20]

test_dir = Path('features3d_v3/test')
rows=[]; ids=[]; n=0; t1=time.time()
from collections import defaultdict
all_train_ids=[]
for fd in json.load(open('folds_archive_cv.json','r')):
    all_train_ids.extend(list(map(int, fd['train_ids'])))
def compute_runlen_stats_all(ids):
    agg=defaultdict(list)
    for sid in ids:
        y=load_frame_labels(int(sid)); cur=None; run=0
        for c in y:
            if c==0:
                if cur is not None: agg[cur].append(run); cur=None; run=0; continue
            if cur is None: cur=int(c); run=1
            elif c==cur: run+=1
            else: agg[cur].append(run); cur=int(c); run=1
        if cur is not None: agg[cur].append(run)
    med = np.zeros(21, np.float32); q75=np.zeros(21, np.float32)
    for c in range(1,21):
        ls=agg.get(c, [])
        if ls:
            arr=np.array(ls, np.float32); med[c]=float(np.median(arr)); q75[c]=float(np.percentile(arr, 75.0))
        else:
            med[c]=1.0; q75[c]=2.0
    return med, q75
med_all, q75_all = compute_runlen_stats_all(sorted(set(all_train_ids)))
md_all = np.clip(np.round(med_all * float(best_mult_s)).astype(np.int32), 2, np.maximum(q75_all.astype(np.int32), 2)); md_all[0]=0
for npz_path in sorted(test_dir.glob('*.npz')):
    sid = int(npz_path.stem)
    p2 = Path('probs_cache')/f"{sid}_ce.npy"
    p3 = Path('probs_cache')/f"{sid}_ce_v3.npy"
    if not (p2.exists() and p3.exists()):
        continue
    p = load_probs(int(sid))
    y_hat = decode_minseg_smooth_aba(p, md_all, smooth_k=5, aba_len=2, aba_ratio=1.04)
    # perm20
    seq_raw = []
    last=-1
    for c in y_hat:
        if c==0: continue
        if c!=last: seq_raw.append(int(c)); last=int(c)
    seq = make_perm20(seq_raw, p)
    ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
    if (n%20)==0 or n==95:
        print(f".. decoded {n} test seqs in {time.time()-t1:.1f}s", flush=True)
import pandas as pd
sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
out_path = f'submission_aligned_smoothABA_perm20_m{str(best_mult_s).replace(".","")}.csv'
sub.to_csv(out_path, index=False)
print('Wrote', out_path, 'rows=', len(sub))
assert len(sub)==95
ok = sub['Sequence'].apply(lambda s: len(s.split())==20 and set(map(int, s.split()))==set(range(1,21))).all()
assert ok, 'Permutation 1..20 check failed'
sub.to_csv('submission.csv', index=False)
print('submission.csv written ->', out_path)

Running OOF (aligned) with smoothing+ABA...


OOF smooth+ABA summary:
  mult=0.65: worst=4.560 mean=3.837
  mult=0.7: worst=4.500 mean=3.861
Chosen mult (smooth+ABA): 0.7 elapsed 15.0s
Decoding test with aligned blend + smooth+ABA minseg perm20...


.. decoded 20 test seqs in 1.0s


.. decoded 40 test seqs in 1.3s


.. decoded 60 test seqs in 1.6s


.. decoded 80 test seqs in 1.9s


.. decoded 95 test seqs in 2.1s


Wrote submission_aligned_smoothABA_perm20_m07.csv rows= 95
submission.csv written -> submission_aligned_smoothABA_perm20_m07.csv


In [25]:
# RGB modality: inspect archives and map IDs -> video paths
import os, tarfile, json, pandas as pd, numpy as np
from pathlib import Path

print('Preparing RGB pipeline: listing archive contents and mapping IDs...', flush=True)
id_map_path = Path('id_to_archive.csv')
assert id_map_path.exists(), 'id_to_archive.csv missing'
id_map = pd.read_csv(id_map_path)
print('id_to_archive head:')
print(id_map.head())

# Known archives
archives = {
    'training1.tar.gz': Path('training1.tar.gz'),
    'training2.tar.gz': Path('training2.tar.gz'),
    'training3.tar.gz': Path('training3.tar.gz'),
    'validation1.tar.gz': Path('validation1.tar.gz'),
    'validation2.tar.gz': Path('validation2.tar.gz'),
    'validation3.tar.gz': Path('validation3.tar.gz'),
    'test.tar.gz': Path('test.tar.gz'),
}

for k,p in archives.items():
    if not p.exists():
        print('MISSING', k)

# Peek inside each archive (sample first 10 members) to detect video file patterns
def list_archive_members(arc_path: Path, max_show: int = 20):
    names = []
    with tarfile.open(arc_path, 'r:gz') as tf:
        for i, m in enumerate(tf):
            if not m.isreg():
                continue
            names.append(m.name)
            if len(names) >= max_show:
                break
    return names

for name, arc in archives.items():
    if not arc.exists():
        continue
    print(f'\nArchive {name}:')
    try:
        sample = list_archive_members(arc, max_show=30)
        # Show a few and collect suspected video entries
        vids = [s for s in sample if any(s.lower().endswith(ext) for ext in ('.mp4', '.avi', '.mov', '.mkv'))]
        print('  sample members (first up to 10):')
        for s in sample[:10]:
            print('   ', s)
        print('  suspected video entries among sample:', vids[:5])
    except Exception as e:
        print('  error reading', name, e)

# Build per-ID candidate member prefixes by probing a few IDs to learn path pattern
probe_ids = []
if len(id_map) > 0:
    probe_ids = id_map['Id'].astype(int).tolist()[:5] + id_map['Id'].astype(int).tolist()[-5:]
probe_ids = sorted(set(probe_ids))
print('\nProbing member paths for IDs:', probe_ids)

def find_members_for_id(arc_path: Path, sid: int):
    hits = []
    with tarfile.open(arc_path, 'r:gz') as tf:
        for m in tf:
            if not m.isreg():
                continue
            nm = m.name
            # Heuristics: id embedded in folder or filename, e.g., /<sid>/ or _<sid> or <sid>.
            if f'/{sid}/' in nm or nm.endswith(f'/{sid}') or f'_{sid}_' in nm or nm.endswith(f'_{sid}.mp4') or nm.endswith(f'/{sid}.mp4') or nm.endswith(f'/{sid}.avi') or nm.endswith(f'/{sid}.mov') or nm.endswith(f'/{sid}.mkv') or f'/{sid}_' in nm:
                hits.append(nm)
    return hits

for _, row in id_map.iterrows():
    sid = int(row['Id'])
    arc_name = str(row['Archive']) if 'Archive' in row else None
    if not arc_name or arc_name not in archives:
        continue
    arc = archives[arc_name]
    if not arc.exists():
        continue
    hits = find_members_for_id(arc, sid)
    if hits:
        print(f'Id {sid} in {arc_name}:', hits[:5])
    if len(probe_ids) and sid in probe_ids:
        print(f'  (probe) first hits for id {sid}:', hits[:10])

print('\nNext: implement cached extraction -> rgb_videos/{split}/{id}.mp4 and embeddings -> rgb_embed/{split}/{id}.npy')

Preparing RGB pipeline: listing archive contents and mapping IDs...


id_to_archive head:
   Id  archive_group
0   1              1
1   3              1
2   4              1
3   5              1
4   6              1

Archive training1.tar.gz:


  sample members (first up to 10):
    ./Sample00001.zip
    ./Sample00003.zip
    ./Sample00004.zip
    ./Sample00005.zip
    ./Sample00006.zip
    ./Sample00007.zip
    ./Sample00008.zip
    ./Sample00009.zip
    ./Sample00010.zip
    ./Sample00011.zip
  suspected video entries among sample: []

Archive training2.tar.gz:


  sample members (first up to 10):
    ./Sample00101.zip
    ./Sample00102.zip
    ./Sample00103.zip
    ./Sample00104.zip
    ./Sample00105.zip
    ./Sample00106.zip
    ./Sample00107.zip
    ./Sample00108.zip
    ./Sample00109.zip
    ./Sample00110.zip
  suspected video entries among sample: []

Archive training3.tar.gz:


  sample members (first up to 10):
    ./Sample00200.zip
    ./Sample00201.zip
    ./Sample00202.zip
    ./Sample00203.zip
    ./Sample00204.zip
    ./Sample00205.zip
    ./Sample00206.zip
    ./Sample00207.zip
    ./Sample00208.zip
    ./Sample00209.zip
  suspected video entries among sample: []

Archive validation1.tar.gz:


  sample members (first up to 10):
    Sample00410.zip
    Sample00411.zip
    Sample00412.zip
    Sample00413.zip
    Sample00414.zip
    Sample00415.zip
    Sample00416.zip
    Sample00417.zip
    Sample00418.zip
    Sample00420.zip
  suspected video entries among sample: []

Archive validation2.tar.gz:


  sample members (first up to 10):
    Sample00510.zip
    Sample00516.zip
    Sample00517.zip
    Sample00518.zip
    Sample00519.zip
    Sample00520.zip
    Sample00521.zip
    Sample00522.zip
    Sample00523.zip
    Sample00524.zip
  suspected video entries among sample: []

Archive validation3.tar.gz:


  sample members (first up to 10):
    Sample00621.zip
    Sample00622.zip
    Sample00623.zip
    Sample00624.zip
    Sample00625.zip
    Sample00626.zip
    Sample00627.zip
    Sample00628.zip
    Sample00629.zip
    Sample00630.zip
  suspected video entries among sample: []

Archive test.tar.gz:


  sample members (first up to 10):
    ./Sample00300.zip
    ./Sample00301.zip
    ./Sample00302.zip
    ./Sample00303.zip
    ./Sample00304.zip
    ./Sample00305.zip
    ./Sample00306.zip
    ./Sample00307.zip
    ./Sample00308.zip
    ./Sample00309.zip
  suspected video entries among sample: []

Probing member paths for IDs: [1, 3, 4, 5, 6, 295, 296, 297, 298, 299]

Next: implement cached extraction -> rgb_videos/{split}/{id}.mp4 and embeddings -> rgb_embed/{split}/{id}.npy


In [26]:
# RGB modality: inspect inner ZIPs (SampleXXXXX.zip) to find RGB video filenames
import tarfile, zipfile, io, re
from pathlib import Path

def list_zip_members_from_tar(tar_path: Path, sample_zip_name: str, max_show: int = 50):
    print(f'Opening {tar_path} -> {sample_zip_name}', flush=True)
    with tarfile.open(tar_path, 'r:gz') as tf:
        m = next((m for m in tf if m.isreg() and Path(m.name).name == sample_zip_name), None)
        if m is None:
            print('  zip member not found')
            return []
        data = tf.extractfile(m).read()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        print('  zip contains', len(names), 'files; showing up to', max_show)
        for s in names[:max_show]:
            print('   ', s)
        # detect likely video files
        vids = [s for s in names if s.lower().endswith(('.mp4', '.avi', '.mov', '.mkv'))]
        print('  suspected video files:', vids)
        return names

def id_to_sample_name(sid: int) -> str:
    # Training IDs mapping observed from tar listing: 1->Sample00001.zip, 101->Sample00101.zip, 200->Sample00200.zip, 300->Sample00300.zip
    return f"Sample{sid:05d}.zip"

def id_to_tar(sid: int) -> Path | None:
    if 1 <= sid <= 99: return Path('training1.tar.gz')
    if 101 <= sid <= 199: return Path('training2.tar.gz')
    if 200 <= sid <= 299: return Path('training3.tar.gz')
    if 300 <= sid <= 399: return Path('test.tar.gz')
    # validation archives (4xx-6xx) exist but labels only for 1..299; skip for now
    return None

# Probe a few known IDs across groups
probe_ids = [1, 3, 10, 101, 150, 200, 250, 299, 300, 305]
for sid in probe_ids:
    tar_p = id_to_tar(sid)
    if tar_p is None or not tar_p.exists():
        print(f'ID {sid}: tar not found or unsupported ->', tar_p)
        continue
    zip_name = id_to_sample_name(sid)
    try:
        list_zip_members_from_tar(tar_p, zip_name, max_show=40)
    except Exception as e:
        print(f'Error reading {zip_name} from {tar_p}:', e)

print('Done ZIP inspection. Next: implement extraction of RGB video file from ZIP into cache and MobileNetV2 embedding.', flush=True)

Opening training1.tar.gz -> Sample00001.zip


  zip contains 5 files; showing up to 40
    Sample00001_color.mp4
    Sample00001_depth.mp4
    Sample00001_user.mp4
    Sample00001_data.mat
    Sample00001_audio.wav
  suspected video files: ['Sample00001_color.mp4', 'Sample00001_depth.mp4', 'Sample00001_user.mp4']
Opening training1.tar.gz -> Sample00003.zip


  zip contains 5 files; showing up to 40
    Sample00003_color.mp4
    Sample00003_depth.mp4
    Sample00003_user.mp4
    Sample00003_data.mat
    Sample00003_audio.wav
  suspected video files: ['Sample00003_color.mp4', 'Sample00003_depth.mp4', 'Sample00003_user.mp4']
Opening training1.tar.gz -> Sample00010.zip


  zip contains 5 files; showing up to 40
    Sample00010_color.mp4
    Sample00010_depth.mp4
    Sample00010_user.mp4
    Sample00010_data.mat
    Sample00010_audio.wav
  suspected video files: ['Sample00010_color.mp4', 'Sample00010_depth.mp4', 'Sample00010_user.mp4']
Opening training2.tar.gz -> Sample00101.zip


  zip contains 5 files; showing up to 40
    Sample00101_data.mat
    Sample00101_user.mp4
    Sample00101_color.mp4
    Sample00101_audio.wav
    Sample00101_depth.mp4
  suspected video files: ['Sample00101_user.mp4', 'Sample00101_color.mp4', 'Sample00101_depth.mp4']
Opening training2.tar.gz -> Sample00150.zip


  zip contains 5 files; showing up to 40
    Sample00150_color.mp4
    Sample00150_depth.mp4
    Sample00150_user.mp4
    Sample00150_data.mat
    Sample00150_audio.wav
  suspected video files: ['Sample00150_color.mp4', 'Sample00150_depth.mp4', 'Sample00150_user.mp4']
Opening training3.tar.gz -> Sample00200.zip


  zip contains 5 files; showing up to 40
    Sample00200_color.mp4
    Sample00200_depth.mp4
    Sample00200_user.mp4
    Sample00200_data.mat
    Sample00200_audio.wav
  suspected video files: ['Sample00200_color.mp4', 'Sample00200_depth.mp4', 'Sample00200_user.mp4']
Opening training3.tar.gz -> Sample00250.zip


  zip contains 5 files; showing up to 40
    Sample00250_color.mp4
    Sample00250_depth.mp4
    Sample00250_user.mp4
    Sample00250_data.mat
    Sample00250_audio.wav
  suspected video files: ['Sample00250_color.mp4', 'Sample00250_depth.mp4', 'Sample00250_user.mp4']
Opening training3.tar.gz -> Sample00299.zip


  zip contains 5 files; showing up to 40
    Sample00299_color.mp4
    Sample00299_depth.mp4
    Sample00299_user.mp4
    Sample00299_data.mat
    Sample00299_audio.wav
  suspected video files: ['Sample00299_color.mp4', 'Sample00299_depth.mp4', 'Sample00299_user.mp4']
Opening test.tar.gz -> Sample00300.zip


  zip contains 5 files; showing up to 40
    Sample00300_color.mp4
    Sample00300_depth.mp4
    Sample00300_user.mp4
    Sample00300_data.mat
    Sample00300_audio.wav
  suspected video files: ['Sample00300_color.mp4', 'Sample00300_depth.mp4', 'Sample00300_user.mp4']
Opening test.tar.gz -> Sample00305.zip


  zip contains 5 files; showing up to 40
    Sample00305_color.mp4
    Sample00305_depth.mp4
    Sample00305_user.mp4
    Sample00305_data.mat
    Sample00305_audio.wav
  suspected video files: ['Sample00305_color.mp4', 'Sample00305_depth.mp4', 'Sample00305_user.mp4']
Done ZIP inspection. Next: implement extraction of RGB video file from ZIP into cache and MobileNetV2 embedding.


In [32]:
# RGB modality: extract color.mp4 from nested archives and cache MobileNetV2 embeddings (stride=2, 112x112, fp16)
import os, io, time, tarfile, zipfile, math, shutil
from pathlib import Path
import numpy as np

import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

# Optional readers
try:
    import decord
    from decord import VideoReader, cpu
    HAS_DECORD = True
except Exception:
    HAS_DECORD = False
try:
    import cv2
    HAS_CV2 = True
except Exception:
    HAS_CV2 = False
try:
    import imageio.v3 as iio
    HAS_IMAGEIO = True
except Exception:
    HAS_IMAGEIO = False

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('CUDA available for RGB:', torch.cuda.is_available(), 'decord:', HAS_DECORD, 'cv2:', HAS_CV2, 'imageio:', HAS_IMAGEIO, flush=True)

# Paths
rgb_vid_dir = Path('rgb_videos'); (rgb_vid_dir/'train').mkdir(parents=True, exist_ok=True); (rgb_vid_dir/'test').mkdir(parents=True, exist_ok=True)
rgb_emb_dir = Path('rgb_embed'); (rgb_emb_dir/'train').mkdir(parents=True, exist_ok=True); (rgb_emb_dir/'test').mkdir(parents=True, exist_ok=True)

def id_to_sample_name(sid: int) -> str:
    return f"Sample{sid:05d}.zip"

def id_to_tar(sid: int) -> Path | None:
    if 1 <= sid <= 99: return Path('training1.tar.gz')
    if 101 <= sid <= 199: return Path('training2.tar.gz')
    if 200 <= sid <= 299: return Path('training3.tar.gz')
    if 300 <= sid <= 399: return Path('test.tar.gz')
    # validation sets not used for training labels here
    return None

def split_of_id(sid: int) -> str:
    return 'train' if sid < 300 else 'test'

def extract_color_mp4_to_cache(sid: int) -> Path | None:
    split = split_of_id(sid)
    out_path = rgb_vid_dir / split / f"{sid}.mp4"
    if out_path.exists():
        return out_path
    tar_p = id_to_tar(sid)
    if tar_p is None or not tar_p.exists():
        print(f'[extract] Missing tar for id={sid}:', tar_p); return None
    zip_name = id_to_sample_name(sid)
    color_member = f"Sample{sid:05d}_color.mp4"
    try:
        with tarfile.open(tar_p, 'r:gz') as tf:
            m = next((m for m in tf if m.isreg() and Path(m.name).name == zip_name), None)
            if m is None:
                print(f'[extract] zip {zip_name} not found in {tar_p}'); return None
            data = tf.extractfile(m).read()
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            if color_member not in zf.namelist():
                # fallback: find *_color.mp4
                cand = [n for n in zf.namelist() if n.lower().endswith('_color.mp4')]
                if not cand:
                    print(f'[extract] color mp4 not found for id={sid}')
                    return None
                member = cand[0]
            else:
                member = color_member
            # extract to temp then move
            tmp = out_path.with_suffix('.mp4.tmp')
            with zf.open(member) as fsrc, open(tmp, 'wb') as fdst:
                shutil.copyfileobj(fsrc, fdst)
            tmp.replace(out_path)
        return out_path
    except Exception as e:
        print(f'[extract] error id={sid}:', e); return None

# Preprocess and model setup
weights = MobileNet_V2_Weights.IMAGENET1K_V1
# Use standard ImageNet normalization constants (do not rely on weights.meta)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
preproc = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((112,112)),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD)
])
mb = mobilenet_v2(weights=weights).features.eval().to(device)
pool = nn.AdaptiveAvgPool2d((1,1)).to(device)
mb.requires_grad_(False)

def read_video_frames(path: Path, stride: int = 2):
    frames = []
    # Try decord first; if 0 frames, fallback to cv2 then imageio
    if HAS_DECORD:
        try:
            vr = VideoReader(str(path), ctx=cpu(0))
            nfr = len(vr)
            if nfr > 0:
                idxs = list(range(0, nfr, stride))
                for i in idxs:
                    img = vr[i].asnumpy()  # HWC RGB uint8
                    frames.append(img)
            else:
                print('[decord] zero frames for', path, '-> fallback')
        except Exception as e:
            print('[decord] fail, fallback readers:', e)
            frames = []
    if not frames and HAS_CV2:
        try:
            cap = cv2.VideoCapture(str(path))
            ok = cap.isOpened()
            if not ok:
                print('[cv2] cannot open', path)
            i = 0
            while ok:
                ret, frame = cap.read()
                if not ret: break
                if (i % stride)==0:
                    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                    frames.append(frame)
                i += 1
            cap.release()
            if len(frames)==0:
                print('[cv2] zero frames for', path, '-> fallback')
        except Exception as e:
            print('[cv2] fail, fallback readers:', e)
            frames = []
    if not frames and HAS_IMAGEIO:
        try:
            i = 0
            for frm in iio.imiter(str(path)):
                if (i % stride)==0:
                    frames.append(frm)  # already RGB HxWxC uint8
                i += 1
            if len(frames)==0:
                print('[imageio] zero frames for', path)
        except Exception as e:
            print('[imageio] fail:', e)
    if not frames:
        print('[read_video_frames] FAILED to read frames from', path)
    else:
        print(f'[read_video_frames] {path.name}: frames={len(frames)} stride={stride} first_shape={frames[0].shape}')
    return frames

def embed_frames(frames, batch_size: int = 128, use_fp16: bool = True):
    if not frames:
        return np.zeros((0, 1280), dtype=np.float16)
    embs = []
    with torch.no_grad(), torch.amp.autocast(device_type='cuda', enabled=(device.type=='cuda' and use_fp16)):
        batch = []
        for i, img in enumerate(frames, 1):
            x = preproc(img)  # C,H,W
            batch.append(x)
            if (len(batch) == batch_size) or (i == len(frames)):
                xb = torch.stack(batch, dim=0).to(device)
                feat = mb(xb)  # B,1280,H',W'
                feat = pool(feat).flatten(1)  # B,1280
                embs.append(feat.detach().float().cpu())
                batch.clear()
    E = torch.cat(embs, dim=0).numpy().astype(np.float16)
    return E

def upsample_to_T(emb: np.ndarray, T: int) -> np.ndarray:
    # emb: (T',D), linear interp to T along time
    if emb.shape[0] == 0:
        return np.zeros((T, emb.shape[1] if emb.ndim==2 else 1280), dtype=np.float16)
    if emb.shape[0] == T:
        return emb
    import torch.nn.functional as F
    x = torch.from_numpy(emb.astype(np.float32)).unsqueeze(0).transpose(1,2)  # 1, D, T'
    y = F.interpolate(x, size=T, mode='linear', align_corners=False)  # 1, D, T
    y = y.transpose(1,2).squeeze(0).cpu().numpy().astype(np.float16)  # T, D
    return y

def cache_rgb_embedding_for_id(sid: int, stride: int = 2, force: bool = False):
    split = split_of_id(sid)
    out = rgb_emb_dir / split / f"{sid}.npy"
    if out.exists() and not force:
        try:
            arr = np.load(out, mmap_mode='r')
            if arr.shape[0] > 0:
                return out
            else:
                print(f'[cache] existing empty embedding for id={sid}, recomputing...')
        except Exception:
            print(f'[cache] failed to read existing embedding for id={sid}, recomputing...')
    vpath = extract_color_mp4_to_cache(sid)
    if vpath is None:
        return None
    frames = read_video_frames(vpath, stride=stride)
    E = embed_frames(frames, batch_size=128, use_fp16=True)  # (T',1280)
    np.save(out, E.astype(np.float16))
    return out

# Pilot: extract and embed a few IDs to validate the pipeline (force recompute if empty)
pilot_ids = [1, 3, 10, 101, 200, 250, 299, 300]
t0=time.time()
ok, fail = 0, 0
for sid in pilot_ids:
    vpath = extract_color_mp4_to_cache(sid)
    if vpath is None:
        print('[pilot] FAIL id', sid); fail += 1; continue
    _ = read_video_frames(vpath, stride=2)
    p = cache_rgb_embedding_for_id(sid, stride=2, force=True)
    if p is None:
        print('[pilot] FAIL id', sid); fail += 1
    else:
        arr = np.load(p, mmap_mode='r')
        print('[pilot] id', sid, '->', p, 'shape', arr.shape, 'dtype', arr.dtype)
        ok += 1
print(f'[pilot] done ok={ok} fail={fail} elapsed={time.time()-t0:.1f}s')

print('Next steps: bulk-extract embeddings for all train/test IDs, then train per-frame linear head on frozen embeddings (fold-pure) and cache RGB probs.', flush=True)

CUDA available for RGB: True decord: True cv2: True imageio: True


[read_video_frames] 1.mp4: frames=627 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 1.mp4: frames=627 stride=2 first_shape=(480, 640, 3)


[pilot] id 1 -> rgb_embed/train/1.npy shape (627, 1280) dtype float16


[read_video_frames] 3.mp4: frames=559 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 3.mp4: frames=559 stride=2 first_shape=(480, 640, 3)


[pilot] id 3 -> rgb_embed/train/3.npy shape (559, 1280) dtype float16


[read_video_frames] 10.mp4: frames=613 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 10.mp4: frames=613 stride=2 first_shape=(480, 640, 3)


[pilot] id 10 -> rgb_embed/train/10.npy shape (613, 1280) dtype float16


[read_video_frames] 101.mp4: frames=643 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 101.mp4: frames=643 stride=2 first_shape=(480, 640, 3)


[pilot] id 101 -> rgb_embed/train/101.npy shape (643, 1280) dtype float16


[read_video_frames] 200.mp4: frames=574 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 200.mp4: frames=574 stride=2 first_shape=(480, 640, 3)


[pilot] id 200 -> rgb_embed/train/200.npy shape (574, 1280) dtype float16


[read_video_frames] 250.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 250.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[pilot] id 250 -> rgb_embed/train/250.npy shape (606, 1280) dtype float16


[read_video_frames] 299.mp4: frames=576 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 299.mp4: frames=576 stride=2 first_shape=(480, 640, 3)


[pilot] id 299 -> rgb_embed/train/299.npy shape (576, 1280) dtype float16


[read_video_frames] 300.mp4: frames=624 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 300.mp4: frames=624 stride=2 first_shape=(480, 640, 3)


[pilot] id 300 -> rgb_embed/test/300.npy shape (624, 1280) dtype float16
[pilot] done ok=8 fail=0 elapsed=15.4s
Next steps: bulk-extract embeddings for all train/test IDs, then train per-frame linear head on frozen embeddings (fold-pure) and cache RGB probs.


In [29]:
# Install video I/O deps (decord, opencv-headless, imageio-ffmpeg) with torch constraints
import sys, subprocess, time, shutil, os
def pip(*args):
    print('> pip', ' '.join(args), flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

print('Installing decord, opencv-python-headless, imageio-ffmpeg with constraints...', flush=True)
constr = 'constraints.txt' if os.path.exists('constraints.txt') else None
args = ['install']
if constr: args += ['-c', constr]
args += ['decord==0.6.0', 'opencv-python-headless==4.10.0.84', 'imageio-ffmpeg==0.5.1']
t0=time.time()
pip(*args)
print(f'Done installs in {time.time()-t0:.1f}s', flush=True)

# Sanity import checks
try:
    import decord; from decord import VideoReader, cpu
    print('decord:', decord.__version__)
except Exception as e:
    print('decord import FAIL:', e)
try:
    import cv2
    print('cv2:', cv2.__version__)
except Exception as e:
    print('cv2 import FAIL:', e)
try:
    import imageio.v3 as iio, imageio_ffmpeg
    print('imageio-ffmpeg OK:', getattr(imageio_ffmpeg, '__version__', 'unknown'))
except Exception as e:
    print('imageio-ffmpeg import FAIL:', e)

Installing decord, opencv-python-headless, imageio-ffmpeg with constraints...


> pip install -c constraints.txt decord==0.6.0 opencv-python-headless==4.10.0.84 imageio-ffmpeg==0.5.1


Collecting decord==0.6.0
  Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl (13.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.6/13.6 MB 92.8 MB/s eta 0:00:00


Collecting opencv-python-headless==4.10.0.84
  Downloading opencv_python_headless-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.9/49.9 MB 39.7 MB/s eta 0:00:00
Collecting imageio-ffmpeg==0.5.1
  Downloading imageio_ffmpeg-0.5.1-py3-none-manylinux2010_x86_64.whl (26.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 131.8 MB/s eta 0:00:00


Collecting numpy>=1.14.0
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 228.5 MB/s eta 0:00:00


Collecting setuptools
  Downloading setuptools-80.9.0-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 550.4 MB/s eta 0:00:00


Installing collected packages: setuptools, numpy, opencv-python-headless, imageio-ffmpeg, decord


Successfully installed decord-0.6.0 imageio-ffmpeg-0.5.1 numpy-1.26.4 opencv-python-headless-4.10.0.84 setuptools-80.9.0


Done installs in 6.1s




decord: 0.6.0
cv2: 4.10.0
imageio-ffmpeg OK: 0.5.1


In [33]:
# RGB modality: bulk extraction of MobileNetV2 embeddings for all train/test IDs
import json, time
from pathlib import Path
import numpy as np

# Reuse helpers from cell 21: cache_rgb_embedding_for_id, split_of_id

def list_ids_from_features(split: str):
    base = Path('features3d_v3')/split
    ids = sorted(int(p.stem) for p in base.glob('*.npz'))
    return ids

train_ids = list_ids_from_features('train')
test_ids = list_ids_from_features('test')
print('Found ids -> train:', len(train_ids), 'test:', len(test_ids))

def bulk_extract(ids, stride=2, split_hint=None):
    t0=time.time(); ok=0; skip=0; fail=0
    for i, sid in enumerate(ids, 1):
        out = (Path('rgb_embed')/(split_of_id(sid)) / f"{sid}.npy")
        if out.exists():
            try:
                arr = np.load(out, mmap_mode='r')
                if arr.shape[0] > 0:
                    skip += 1
                    if (i%20)==0 or i==len(ids):
                        print(f'  skip {i}/{len(ids)} elapsed={time.time()-t0:.1f}s', flush=True)
                    continue
            except Exception:
                pass
        p = cache_rgb_embedding_for_id(int(sid), stride=stride, force=False)
        if p is None:
            fail += 1
        else:
            ok += 1
        if (i%20)==0 or i==len(ids):
            print(f'  processed {i}/{len(ids)} ok={ok} skip={skip} fail={fail} elapsed={time.time()-t0:.1f}s', flush=True)
    print(f'Done: ok={ok} skip={skip} fail={fail} total={len(ids)} elapsed={time.time()-t0:.1f}s', flush=True)

print('Bulk extracting TRAIN embeddings (stride=2)...', flush=True)
bulk_extract(train_ids, stride=2)
print('Bulk extracting TEST embeddings (stride=2)...', flush=True)
bulk_extract(test_ids, stride=2)
print('RGB embedding cache complete. Next: train per-frame linear head fold-pure and cache RGB probs for fusion.', flush=True)

Found ids -> train: 297 test: 95
Bulk extracting TRAIN embeddings (stride=2)...


[read_video_frames] 4.mp4: frames=668 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 5.mp4: frames=667 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 6.mp4: frames=601 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 7.mp4: frames=562 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 8.mp4: frames=596 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 9.mp4: frames=610 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 11.mp4: frames=571 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 12.mp4: frames=592 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 13.mp4: frames=608 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 14.mp4: frames=623 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 15.mp4: frames=650 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 16.mp4: frames=585 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 17.mp4: frames=584 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 18.mp4: frames=593 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 19.mp4: frames=605 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 20.mp4: frames=589 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 21.mp4: frames=552 stride=2 first_shape=(480, 640, 3)


  processed 20/297 ok=17 skip=3 fail=0 elapsed=36.7s


[read_video_frames] 22.mp4: frames=787 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 23.mp4: frames=741 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 24.mp4: frames=745 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 25.mp4: frames=591 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 26.mp4: frames=560 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 27.mp4: frames=596 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 28.mp4: frames=637 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 29.mp4: frames=578 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 30.mp4: frames=632 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 31.mp4: frames=668 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 32.mp4: frames=646 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 33.mp4: frames=601 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 34.mp4: frames=612 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 35.mp4: frames=670 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 36.mp4: frames=599 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 37.mp4: frames=627 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 38.mp4: frames=573 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 39.mp4: frames=562 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 40.mp4: frames=549 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 41.mp4: frames=916 stride=2 first_shape=(480, 640, 3)


  processed 40/297 ok=37 skip=3 fail=0 elapsed=101.2s


[read_video_frames] 42.mp4: frames=816 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 43.mp4: frames=833 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 44.mp4: frames=768 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 45.mp4: frames=853 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 46.mp4: frames=824 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 47.mp4: frames=834 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 48.mp4: frames=823 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 49.mp4: frames=844 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 50.mp4: frames=804 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 51.mp4: frames=841 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 52.mp4: frames=781 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 53.mp4: frames=873 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 54.mp4: frames=804 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 55.mp4: frames=844 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 56.mp4: frames=773 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 57.mp4: frames=646 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 58.mp4: frames=636 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 59.mp4: frames=613 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 60.mp4: frames=663 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 61.mp4: frames=655 stride=2 first_shape=(480, 640, 3)


  processed 60/297 ok=57 skip=3 fail=0 elapsed=200.7s


[read_video_frames] 62.mp4: frames=653 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 63.mp4: frames=646 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 64.mp4: frames=644 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 65.mp4: frames=617 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 66.mp4: frames=629 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 67.mp4: frames=564 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 68.mp4: frames=565 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 69.mp4: frames=590 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 70.mp4: frames=668 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 71.mp4: frames=830 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 72.mp4: frames=810 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 73.mp4: frames=791 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 74.mp4: frames=805 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 75.mp4: frames=810 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 76.mp4: frames=745 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 77.mp4: frames=823 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 78.mp4: frames=817 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 79.mp4: frames=836 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 80.mp4: frames=858 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 81.mp4: frames=643 stride=2 first_shape=(480, 640, 3)


  processed 80/297 ok=77 skip=3 fail=0 elapsed=324.8s


[read_video_frames] 82.mp4: frames=685 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 83.mp4: frames=587 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 84.mp4: frames=717 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 85.mp4: frames=522 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 86.mp4: frames=643 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 87.mp4: frames=661 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 88.mp4: frames=623 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 89.mp4: frames=605 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 90.mp4: frames=601 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 91.mp4: frames=653 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 92.mp4: frames=599 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 93.mp4: frames=586 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 94.mp4: frames=641 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 95.mp4: frames=687 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 96.mp4: frames=541 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 97.mp4: frames=522 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 98.mp4: frames=622 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 99.mp4: frames=562 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 102.mp4: frames=640 stride=2 first_shape=(480, 640, 3)


  processed 100/297 ok=96 skip=4 fail=0 elapsed=446.5s


[read_video_frames] 103.mp4: frames=578 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 104.mp4: frames=644 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 105.mp4: frames=632 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 106.mp4: frames=586 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 107.mp4: frames=643 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 108.mp4: frames=630 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 109.mp4: frames=627 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 110.mp4: frames=647 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 111.mp4: frames=660 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 112.mp4: frames=632 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 113.mp4: frames=621 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 114.mp4: frames=652 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 115.mp4: frames=583 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 116.mp4: frames=560 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 117.mp4: frames=619 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 118.mp4: frames=625 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 119.mp4: frames=643 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 120.mp4: frames=610 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 121.mp4: frames=574 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 122.mp4: frames=582 stride=2 first_shape=(480, 640, 3)


  processed 120/297 ok=116 skip=4 fail=0 elapsed=478.0s


[read_video_frames] 123.mp4: frames=666 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 124.mp4: frames=651 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 125.mp4: frames=615 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 126.mp4: frames=621 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 127.mp4: frames=532 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 128.mp4: frames=602 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 129.mp4: frames=603 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 130.mp4: frames=674 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 131.mp4: frames=601 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 132.mp4: frames=643 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 133.mp4: frames=581 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 134.mp4: frames=638 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 135.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 136.mp4: frames=637 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 137.mp4: frames=663 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 138.mp4: frames=672 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 139.mp4: frames=671 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 140.mp4: frames=592 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 141.mp4: frames=591 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 142.mp4: frames=584 stride=2 first_shape=(480, 640, 3)


  processed 140/297 ok=136 skip=4 fail=0 elapsed=518.9s


[read_video_frames] 143.mp4: frames=613 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 144.mp4: frames=507 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 145.mp4: frames=630 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 146.mp4: frames=605 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 147.mp4: frames=603 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 148.mp4: frames=552 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 149.mp4: frames=609 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 150.mp4: frames=577 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 151.mp4: frames=602 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 152.mp4: frames=601 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 153.mp4: frames=598 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 154.mp4: frames=591 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 155.mp4: frames=596 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 156.mp4: frames=572 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 157.mp4: frames=603 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 158.mp4: frames=632 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 159.mp4: frames=545 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 160.mp4: frames=617 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 161.mp4: frames=596 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 162.mp4: frames=562 stride=2 first_shape=(480, 640, 3)


  processed 160/297 ok=156 skip=4 fail=0 elapsed=566.9s


[read_video_frames] 163.mp4: frames=560 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 164.mp4: frames=642 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 165.mp4: frames=568 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 166.mp4: frames=616 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 167.mp4: frames=577 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 168.mp4: frames=553 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 169.mp4: frames=605 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 170.mp4: frames=640 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 171.mp4: frames=615 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 172.mp4: frames=540 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 173.mp4: frames=611 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 174.mp4: frames=565 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 175.mp4: frames=566 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 176.mp4: frames=670 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 177.mp4: frames=670 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 178.mp4: frames=616 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 179.mp4: frames=522 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 180.mp4: frames=647 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 181.mp4: frames=577 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 182.mp4: frames=611 stride=2 first_shape=(480, 640, 3)


  processed 180/297 ok=176 skip=4 fail=0 elapsed=623.2s


[read_video_frames] 183.mp4: frames=650 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 184.mp4: frames=627 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 185.mp4: frames=637 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 186.mp4: frames=549 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 187.mp4: frames=598 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 188.mp4: frames=564 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 189.mp4: frames=581 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 190.mp4: frames=690 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 191.mp4: frames=655 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 192.mp4: frames=607 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 193.mp4: frames=590 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 194.mp4: frames=623 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 195.mp4: frames=593 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 196.mp4: frames=620 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 197.mp4: frames=591 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 198.mp4: frames=639 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 199.mp4: frames=615 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 201.mp4: frames=585 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 202.mp4: frames=595 stride=2 first_shape=(480, 640, 3)


  processed 200/297 ok=195 skip=5 fail=0 elapsed=681.4s


[read_video_frames] 203.mp4: frames=636 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 204.mp4: frames=628 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 205.mp4: frames=609 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 206.mp4: frames=588 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 207.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 208.mp4: frames=608 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 209.mp4: frames=689 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 210.mp4: frames=587 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 211.mp4: frames=692 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 212.mp4: frames=590 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 213.mp4: frames=601 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 214.mp4: frames=585 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 215.mp4: frames=613 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 216.mp4: frames=618 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 217.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 218.mp4: frames=548 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 219.mp4: frames=617 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 220.mp4: frames=573 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 221.mp4: frames=595 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 222.mp4: frames=627 stride=2 first_shape=(480, 640, 3)


  processed 220/297 ok=215 skip=5 fail=0 elapsed=713.8s


[read_video_frames] 223.mp4: frames=565 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 224.mp4: frames=567 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 225.mp4: frames=524 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 226.mp4: frames=563 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 227.mp4: frames=616 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 228.mp4: frames=573 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 229.mp4: frames=640 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 230.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 231.mp4: frames=622 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 232.mp4: frames=647 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 233.mp4: frames=684 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 234.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 235.mp4: frames=621 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 236.mp4: frames=579 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 237.mp4: frames=631 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 238.mp4: frames=645 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 239.mp4: frames=600 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 240.mp4: frames=665 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 241.mp4: frames=661 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 242.mp4: frames=590 stride=2 first_shape=(480, 640, 3)


  processed 240/297 ok=235 skip=5 fail=0 elapsed=766.1s


[read_video_frames] 243.mp4: frames=576 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 244.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 245.mp4: frames=613 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 246.mp4: frames=599 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 247.mp4: frames=603 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 248.mp4: frames=622 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 249.mp4: frames=625 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 251.mp4: frames=706 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 252.mp4: frames=717 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 253.mp4: frames=836 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 254.mp4: frames=616 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 255.mp4: frames=620 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 256.mp4: frames=619 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 257.mp4: frames=545 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 258.mp4: frames=549 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 259.mp4: frames=584 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 260.mp4: frames=587 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 261.mp4: frames=585 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 262.mp4: frames=660 stride=2 first_shape=(480, 640, 3)


  processed 260/297 ok=254 skip=6 fail=0 elapsed=827.1s


[read_video_frames] 263.mp4: frames=604 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 264.mp4: frames=605 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 265.mp4: frames=647 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 266.mp4: frames=603 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 267.mp4: frames=690 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 268.mp4: frames=645 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 269.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 270.mp4: frames=616 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 271.mp4: frames=530 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 272.mp4: frames=716 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 273.mp4: frames=808 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 274.mp4: frames=748 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 275.mp4: frames=577 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 276.mp4: frames=517 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 277.mp4: frames=577 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 278.mp4: frames=613 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 279.mp4: frames=573 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 280.mp4: frames=628 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 281.mp4: frames=577 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 282.mp4: frames=573 stride=2 first_shape=(480, 640, 3)


  processed 280/297 ok=274 skip=6 fail=0 elapsed=898.9s


[read_video_frames] 283.mp4: frames=632 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 284.mp4: frames=585 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 285.mp4: frames=593 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 286.mp4: frames=624 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 287.mp4: frames=586 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 288.mp4: frames=652 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 289.mp4: frames=603 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 290.mp4: frames=605 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 291.mp4: frames=569 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 292.mp4: frames=569 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 293.mp4: frames=575 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 294.mp4: frames=549 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 295.mp4: frames=610 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 296.mp4: frames=581 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 297.mp4: frames=530 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 298.mp4: frames=596 stride=2 first_shape=(480, 640, 3)


  skip 297/297 elapsed=962.8s


Done: ok=290 skip=7 fail=0 total=297 elapsed=962.8s


Bulk extracting TEST embeddings (stride=2)...


[read_video_frames] 301.mp4: frames=626 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 302.mp4: frames=661 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 303.mp4: frames=575 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 304.mp4: frames=614 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 305.mp4: frames=620 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 306.mp4: frames=612 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 307.mp4: frames=594 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 308.mp4: frames=565 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 309.mp4: frames=591 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 310.mp4: frames=604 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 311.mp4: frames=695 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 312.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 313.mp4: frames=607 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 314.mp4: frames=608 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 315.mp4: frames=662 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 316.mp4: frames=611 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 317.mp4: frames=631 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 318.mp4: frames=601 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 319.mp4: frames=579 stride=2 first_shape=(480, 640, 3)


  processed 20/95 ok=19 skip=1 fail=0 elapsed=30.1s


[read_video_frames] 320.mp4: frames=573 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 321.mp4: frames=589 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 322.mp4: frames=657 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 323.mp4: frames=580 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 324.mp4: frames=620 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 325.mp4: frames=621 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 326.mp4: frames=636 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 327.mp4: frames=599 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 328.mp4: frames=565 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 329.mp4: frames=609 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 330.mp4: frames=576 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 332.mp4: frames=781 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 333.mp4: frames=776 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 334.mp4: frames=704 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 335.mp4: frames=762 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 336.mp4: frames=787 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 337.mp4: frames=768 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 338.mp4: frames=854 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 339.mp4: frames=820 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 340.mp4: frames=602 stride=2 first_shape=(480, 640, 3)


  processed 40/95 ok=39 skip=1 fail=0 elapsed=73.7s


[read_video_frames] 341.mp4: frames=662 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 342.mp4: frames=529 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 343.mp4: frames=601 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 344.mp4: frames=634 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 345.mp4: frames=590 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 346.mp4: frames=615 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 347.mp4: frames=581 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 348.mp4: frames=530 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 351.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 352.mp4: frames=621 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 353.mp4: frames=604 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 354.mp4: frames=605 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 355.mp4: frames=701 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 356.mp4: frames=652 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 357.mp4: frames=608 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 358.mp4: frames=685 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 359.mp4: frames=632 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 360.mp4: frames=647 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 361.mp4: frames=596 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 362.mp4: frames=569 stride=2 first_shape=(480, 640, 3)


  processed 60/95 ok=59 skip=1 fail=0 elapsed=126.7s


[read_video_frames] 363.mp4: frames=599 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 364.mp4: frames=572 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 365.mp4: frames=655 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 366.mp4: frames=639 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 367.mp4: frames=598 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 368.mp4: frames=688 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 369.mp4: frames=602 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 370.mp4: frames=634 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 371.mp4: frames=933 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 372.mp4: frames=848 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 373.mp4: frames=639 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 374.mp4: frames=547 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 375.mp4: frames=629 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 376.mp4: frames=654 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 377.mp4: frames=602 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 378.mp4: frames=702 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 379.mp4: frames=564 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 380.mp4: frames=598 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 381.mp4: frames=606 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 383.mp4: frames=588 stride=2 first_shape=(480, 640, 3)


  processed 80/95 ok=79 skip=1 fail=0 elapsed=191.7s


[read_video_frames] 384.mp4: frames=684 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 385.mp4: frames=592 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 386.mp4: frames=578 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 389.mp4: frames=585 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 390.mp4: frames=637 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 391.mp4: frames=667 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 392.mp4: frames=618 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 393.mp4: frames=577 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 394.mp4: frames=641 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 395.mp4: frames=611 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 396.mp4: frames=628 stride=2 first_shape=(480, 640, 3)


[read_video_frames] 397.mp4: frames=654 stride=2 first_shape=(480, 640, 3)


[extract] Missing tar for id=401: None
[extract] Missing tar for id=402: None
[extract] Missing tar for id=403: None
  processed 95/95 ok=91 skip=1 fail=3 elapsed=234.4s


Done: ok=91 skip=1 fail=3 total=95 elapsed=234.4s


RGB embedding cache complete. Next: train per-frame linear head fold-pure and cache RGB probs for fusion.


In [35]:
# RGB modality: train fold-pure linear per-frame head on cached embeddings; cache OOF/test probs; fit scalar temperature per fold
import os, json, time, math, random
from pathlib import Path
from typing import List, Dict
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import pandas as pd
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.backends.cudnn.benchmark = True
random.seed(42); np.random.seed(42); torch.manual_seed(42)

rgb_emb_dir = Path('rgb_embed')
labels_dir = Path('labels3d_v2/train')
probs_cache = Path('probs_cache'); probs_cache.mkdir(exist_ok=True)

# Helper: load skeleton T for train/test to upsample RGB embeddings appropriately
def get_T_train(sid: int) -> int:
    y = np.load(labels_dir / f"{sid}.npy")
    return int(y.shape[0])
def get_T_test(sid: int) -> int:
    # use v3 CE probs length if available; else infer from features3d_v3/test npz 'X' first dim
    p3 = probs_cache / f"{sid}_ce_v3.npy"
    if p3.exists():
        return int(np.load(p3, mmap_mode='r').shape[1])  # CxT
    # fallback: features3d_v3
    d = np.load(Path('features3d_v3/test')/f"{sid}.npz")
    if 'X' in d.files:
        X = d['X']
        # X observed shape (T, D) in this repo schema (first dim matches labels length)
        return int(X.shape[0])
    return None

def upsample_to_T_np(E: np.ndarray, T: int) -> np.ndarray:
    if E.shape[0] == T:
        return E.astype(np.float32)
    if E.shape[0] == 0:
        return np.zeros((T, E.shape[1] if E.ndim==2 else 1280), dtype=np.float32)
    import torch.nn.functional as Fnn
    x = torch.from_numpy(E.astype(np.float32)).unsqueeze(0).transpose(1,2)  # 1,D,T'
    y = Fnn.interpolate(x, size=T, mode='linear', align_corners=False).transpose(1,2).squeeze(0).contiguous()
    return y.numpy().astype(np.float32)

class RGBSeqDataset(Dataset):
    def __init__(self, ids: List[int], split: str, chunk_len: int = 1024):
        self.ids = list(ids)
        self.split = split  # 'train' only here
        self.chunk_len = chunk_len
        # build index of (sid, start, end) chunks for efficient batching
        self.index = []
        for sid in self.ids:
            E = np.load(rgb_emb_dir/'train'/f"{sid}.npy", mmap_mode='r')  # (T',1280)
            T = get_T_train(sid)
            Eu = upsample_to_T_np(np.array(E), T)  # (T,1280)
            n = Eu.shape[0]
            # create chunks
            if n <= chunk_len:
                self.index.append((sid, 0, n))
            else:
                s = 0
                while s < n:
                    e = min(n, s + chunk_len)
                    self.index.append((sid, s, e))
                    s = e
        random.shuffle(self.index)
    def __len__(self):
        return len(self.index)
    def __getitem__(self, i):
        sid, s, e = self.index[i]
        E = np.load(rgb_emb_dir/'train'/f"{sid}.npy", mmap_mode='r')
        T = get_T_train(sid)
        Eu = upsample_to_T_np(np.array(E), T)  # (T,1280)
        y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)  # (T,)
        x = Eu[s:e].astype(np.float32)
        t = y[s:e]  # include background (0..20)
        return torch.from_numpy(x), torch.from_numpy(t)

class RGBLinearHead(nn.Module):
    def __init__(self, d_in=1280, n_classes=21, p_drop=0.5):
        super().__init__()
        self.drop = nn.Dropout(p_drop)
        self.fc = nn.Linear(d_in, n_classes)
    def forward(self, x):  # x: B, L, D
        x = self.drop(x)
        return self.fc(x)  # B, L, C

def train_rgb_fold(train_ids: List[int], val_ids: List[int],
                    epochs: int = 12, lr: float = 1e-3, wd: float = 1e-5,
                    chunk_len: int = 1024, batch_size: int = 1, patience: int = 3):
    model = RGBLinearHead().to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    best_val = 1e9; bad = 0
    for ep in range(1, epochs+1):
        t0 = time.time()
        model.train()
        ds = RGBSeqDataset(train_ids, split='train', chunk_len=chunk_len)
        dl = DataLoader(ds, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
        tr_loss = 0.0; n_tok = 0
        for xb, yb in dl:
            xb = xb.to(device, non_blocking=True)  # B,L,D
            yb = yb.to(device, non_blocking=True)  # B,L
            opt.zero_grad(set_to_none=True)
            logits = model(xb)  # B,L,C
            loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), yb.reshape(-1))
            loss.backward()
            opt.step()
            tr_loss += float(loss.item()) * yb.numel()
            n_tok += int(yb.numel())
        tr_loss = tr_loss / max(1, n_tok)
        # quick val NLL
        model.eval()
        val_loss = 0.0; n_tok = 0
        with torch.no_grad():
            for sid in val_ids:
                E = np.load(rgb_emb_dir/'train'/f"{sid}.npy", mmap_mode='r')
                T = get_T_train(sid)
                Eu = upsample_to_T_np(np.array(E), T)  # (T,1280)
                y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)  # (T,)
                xb = torch.from_numpy(Eu).unsqueeze(0).to(device)  # 1,T,D
                logits = model(xb)  # 1,T,C
                ll = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), torch.from_numpy(y).to(device))
                val_loss += float(ll.item()) * int(T)
                n_tok += int(T)
        val_loss = val_loss / max(1, n_tok)
        print(f"[RGB fold] ep {ep:02d} tr_nll={tr_loss:.4f} val_nll={val_loss:.4f} elapsed={time.time()-t0:.1f}s", flush=True)
        if val_loss < best_val - 1e-4:
            best_val = val_loss; bad = 0
            torch.save(model.state_dict(), 'rgb_head_tmp.pth')
        else:
            bad += 1
            if bad >= patience:
                break
    # load best
    model.load_state_dict(torch.load('rgb_head_tmp.pth', map_location=device))
    return model

def infer_probs_for_ids(model: nn.Module, ids: List[int], split: str, out_suffix: str):
    # Import embed cache helper from previous cell (21) if available
    from __main__ import cache_rgb_embedding_for_id  # notebook context
    model.eval()
    saved = 0; t0 = time.time()
    with torch.no_grad():
        for i, sid in enumerate(ids, 1):
            if split == 'train':
                T = get_T_train(sid)
                emb_path = rgb_emb_dir/'train'/f"{sid}.npy"
            else:
                T = get_T_test(sid)
                emb_path = rgb_emb_dir/'test'/f"{sid}.npy"
            if not emb_path.exists():
                # attempt on-the-fly embedding extraction (stride=2) if missing
                try:
                    cache_rgb_embedding_for_id(int(sid), stride=2, force=False)
                except Exception as e:
                    print(f"  [RGB infer] missing embedding for id={sid}, skip. err={e}")
                    continue
            if not emb_path.exists():
                print(f"  [RGB infer] still missing embedding for id={sid}, skipping.")
                continue
            E = np.load(emb_path, mmap_mode='r')
            Eu = upsample_to_T_np(np.array(E), T)  # (T,1280)
            xb = torch.from_numpy(Eu).unsqueeze(0).to(device)  # 1,T,D
            logits = model(xb)[0]  # T,C
            p = logits.softmax(dim=-1).cpu().numpy().astype(np.float32)  # T,C
            p = p / (p.sum(axis=1, keepdims=True) + 1e-8)
            np.save(probs_cache/f"{sid}{out_suffix}", p.T)  # save CxT
            saved += 1
            if (i%20)==0 or i==len(ids):
                print(f"  saved {saved}/{len(ids)} split={split} elapsed={time.time()-t0:.1f}s", flush=True)

def fit_scalar_temperature_on_val(val_ids: List[int], suffix: str) -> float:
    # find T in [0.8, 1.5] grid minimizing NLL on RGB val probs (pre-calibration)
    grid = [round(x,2) for x in np.linspace(0.8, 1.5, 15)]
    best_T = 1.0; best_nll = 1e18
    for Tval in grid:
        nll = 0.0; n_tok = 0
        for sid in val_ids:
            p = np.load(probs_cache/f"{sid}{suffix}")  # CxT
            y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)  # (T,)
            logp = np.log(np.clip(p, 1e-8, 1.0)) / float(Tval)
            q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)  # CxT
            # gather -log q[y_t, t]
            idx = (y >= 0) & (y < q.shape[0])
            yy = y[idx]
            nll += -float(np.log(q[yy, np.nonzero(idx)[0]] + 1e-8).sum())
            n_tok += int(idx.sum())
        if n_tok > 0:
            nll /= float(n_tok)
            if nll < best_nll:
                best_nll = nll; best_T = Tval
    print(f"[Temp] best scalar T={best_T} NLL={best_nll:.4f} on {len(val_ids)} val ids", flush=True)
    return float(best_T)

def apply_scalar_temperature(ids: List[int], suffix: str, Tscalar: float):
    for sid in ids:
        p = np.load(probs_cache/f"{sid}{suffix}")  # CxT
        logp = np.log(np.clip(p, 1e-8, 1.0)) / float(Tscalar)
        q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
        np.save(probs_cache/f"{sid}{suffix}", q.astype(np.float32))

# Main fold loop
with open('folds_archive_cv.json') as f:
    folds_list = json.load(f)
print('Training RGB linear head per fold, caching OOF and test probs...', flush=True)
# Use test.csv for test id order (avoid stray ids like 401..403) 
test_ids_list = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
# For test accumulation, save per-fold files as _rgb_f{fold}.npy; later average to _rgb.npy
for fd in folds_list:
    fidx = int(fd['fold'])
    tr_ids = list(map(int, fd['train_ids']))
    va_ids = list(map(int, fd['val_ids']))
    print(f'Fold {fidx}: train={len(tr_ids)} val={len(va_ids)}', flush=True)
    model = train_rgb_fold(tr_ids, va_ids, epochs=12, lr=1e-3, wd=1e-5, chunk_len=1024, batch_size=1, patience=3)
    # Inference OOF (val) -> save as {id}_rgb.npy (CxT)
    infer_probs_for_ids(model, va_ids, split='train', out_suffix='_rgb.npy')
    # Inference TEST -> save per-fold {id}_rgb_f{fidx}.npy, using test.csv ids
    infer_probs_for_ids(model, test_ids_list, split='test', out_suffix=f'_rgb_f{fidx}.npy')
    # Fit scalar temperature on val and apply to val probs
    Tbest = fit_scalar_temperature_on_val(va_ids, suffix='_rgb.npy')
    apply_scalar_temperature(va_ids, suffix='_rgb.npy', Tscalar=Tbest)
    # Save fold temperature for later applying to test fusion if needed
    Path(f'rgb_temp_fold{fidx}.json').write_text(json.dumps({'T': Tbest}))
print('Done RGB head training per fold.')
print('Next: average test per-fold RGB probs to probs_cache/{id}_rgb.npy and proceed to fusion (alpha grid) and decoding.', flush=True)

Training RGB linear head per fold, caching OOF and test probs...


Fold 0: train=199 val=98


[RGB fold] ep 01 tr_nll=3.2055 val_nll=3.7633 elapsed=4.1s


[RGB fold] ep 02 tr_nll=3.0332 val_nll=3.7617 elapsed=4.2s


[RGB fold] ep 03 tr_nll=2.9287 val_nll=3.6381 elapsed=4.2s


[RGB fold] ep 04 tr_nll=2.9361 val_nll=3.0754 elapsed=4.1s


[RGB fold] ep 05 tr_nll=2.8915 val_nll=3.9235 elapsed=4.1s


[RGB fold] ep 06 tr_nll=2.8633 val_nll=3.4826 elapsed=4.0s


[RGB fold] ep 07 tr_nll=2.8273 val_nll=3.5811 elapsed=3.9s


  saved 20/98 split=train elapsed=0.1s


  model.load_state_dict(torch.load('rgb_head_tmp.pth', map_location=device))


  saved 40/98 split=train elapsed=0.3s


  saved 60/98 split=train elapsed=0.4s


  saved 80/98 split=train elapsed=0.6s


  saved 98/98 split=train elapsed=0.7s


  saved 20/95 split=test elapsed=0.1s


  saved 40/95 split=test elapsed=0.3s


  saved 60/95 split=test elapsed=0.4s


  saved 80/95 split=test elapsed=0.6s


[extract] Missing tar for id=401: None
  [RGB infer] still missing embedding for id=401, skipping.
[extract] Missing tar for id=402: None
  [RGB infer] still missing embedding for id=402, skipping.
[extract] Missing tar for id=403: None
  [RGB infer] still missing embedding for id=403, skipping.


[Temp] best scalar T=1.5 NLL=2.9172 on 98 val ids


Fold 1: train=198 val=99


[RGB fold] ep 01 tr_nll=3.1472 val_nll=3.2586 elapsed=4.2s


[RGB fold] ep 02 tr_nll=2.9689 val_nll=3.5512 elapsed=4.1s


[RGB fold] ep 03 tr_nll=2.8852 val_nll=3.2042 elapsed=4.2s


[RGB fold] ep 04 tr_nll=2.8335 val_nll=3.3165 elapsed=4.1s


[RGB fold] ep 05 tr_nll=2.8508 val_nll=3.2487 elapsed=4.5s


[RGB fold] ep 06 tr_nll=2.8143 val_nll=3.5368 elapsed=4.2s


  saved 20/99 split=train elapsed=0.1s


  saved 40/99 split=train elapsed=0.3s


  saved 60/99 split=train elapsed=0.4s


  saved 80/99 split=train elapsed=0.5s


  saved 99/99 split=train elapsed=0.7s


  saved 20/95 split=test elapsed=0.1s


  saved 40/95 split=test elapsed=0.3s


  saved 60/95 split=test elapsed=0.4s


  saved 80/95 split=test elapsed=0.6s


[extract] Missing tar for id=401: None
  [RGB infer] still missing embedding for id=401, skipping.
[extract] Missing tar for id=402: None
  [RGB infer] still missing embedding for id=402, skipping.
[extract] Missing tar for id=403: None
  [RGB infer] still missing embedding for id=403, skipping.


[Temp] best scalar T=1.5 NLL=3.0672 on 99 val ids


Fold 2: train=197 val=100


[RGB fold] ep 01 tr_nll=2.7794 val_nll=3.7412 elapsed=4.1s


[RGB fold] ep 02 tr_nll=2.5739 val_nll=3.8235 elapsed=4.1s


[RGB fold] ep 03 tr_nll=2.5227 val_nll=4.3732 elapsed=4.0s


[RGB fold] ep 04 tr_nll=2.5053 val_nll=3.9944 elapsed=4.2s


  saved 20/100 split=train elapsed=0.1s


  saved 40/100 split=train elapsed=0.3s


  saved 60/100 split=train elapsed=0.4s


  saved 80/100 split=train elapsed=0.5s


  saved 100/100 split=train elapsed=0.7s


  saved 20/95 split=test elapsed=0.1s


  saved 40/95 split=test elapsed=0.3s


  saved 60/95 split=test elapsed=0.4s


  saved 80/95 split=test elapsed=0.6s


[extract] Missing tar for id=401: None
  [RGB infer] still missing embedding for id=401, skipping.
[extract] Missing tar for id=402: None
  [RGB infer] still missing embedding for id=402, skipping.
[extract] Missing tar for id=403: None
  [RGB infer] still missing embedding for id=403, skipping.


[Temp] best scalar T=1.5 NLL=3.3059 on 100 val ids


Done RGB head training per fold.
Next: average test per-fold RGB probs to probs_cache/{id}_rgb.npy and proceed to fusion (alpha grid) and decoding.


In [38]:
# RGB fusion: OOF alpha tuning (geometric PoE), align RGB->skeleton, decode, and test submission
import numpy as np, json, time, os
from pathlib import Path
import pandas as pd

probs_cache = Path('probs_cache'); probs_cache.mkdir(exist_ok=True)

# Utilities: ensure CxT, entropy/fg-mass, best shift, align
def ensure_CxT(p: np.ndarray, C: int) -> np.ndarray:
    if p.ndim != 2:
        raise ValueError(f'Expected 2D probs, got {p.shape}')
    if p.shape[0] == C:
        return p.astype(np.float32)
    if p.shape[1] == C:
        return p.T.astype(np.float32)
    raise ValueError(f'Cannot ensure CxT; probs shape {p.shape}, C={C}')

def entropy_series(p_c_t: np.ndarray) -> np.ndarray:
    p = np.clip(p_c_t, 1e-8, 1.0)
    return (- (p * np.log(p)).sum(axis=0)).astype(np.float32)

def fg_series(p_c_t: np.ndarray) -> np.ndarray:
    return (1.0 - np.clip(p_c_t[0], 0.0, 1.0)).astype(np.float32)

def best_shift_by_corr(a: np.ndarray, b: np.ndarray, max_shift: int = 15) -> int:
    best_s = 0; best_r = -1e9
    T = int(min(a.shape[0], b.shape[0])); a = a[:T]; b = b[:T]
    for s in range(-max_shift, max_shift+1):
        if s >= 0:
            x = a[:T - s]; y = b[s:T]
        else:
            x = a[-s:T]; y = b[:T + s]
        if x.size < 8: continue
        sx = float(np.std(x)); sy = float(np.std(y))
        if sx < 1e-6 or sy < 1e-6: continue
        r = float(np.corrcoef(x, y)[0,1])
        if np.isfinite(r) and r > best_r:
            best_r = r; best_s = s
    return int(best_s)

def align_rgb_to_skel(p_rgb: np.ndarray, p_skel: np.ndarray, max_shift: int = 15) -> np.ndarray:
    # Inputs CxT, already normalized per frame
    e_r = entropy_series(p_rgb); e_s = entropy_series(p_skel)
    s = best_shift_by_corr(e_r, e_s, max_shift=max_shift)
    if s == 0 and (np.std(e_r) < 1e-6 or np.std(e_s) < 1e-6):
        f_r = fg_series(p_rgb); f_s = fg_series(p_skel)
        s = best_shift_by_corr(f_r, f_s, max_shift=max_shift)
    if s > 0:
        pr = p_rgb[:, s:]; ps = p_skel[:, :pr.shape[1]]
    elif s < 0:
        s2 = -s; ps = p_skel[:, s2:]; pr = p_rgb[:, :ps.shape[1]]
    else:
        Tm = min(p_rgb.shape[1], p_skel.shape[1]); pr = p_rgb[:, :Tm]; ps = p_skel[:, :Tm]
    Tm = min(pr.shape[1], ps.shape[1])
    pr = pr[:, :Tm]; ps = ps[:, :Tm]
    # renorm
    pr = pr / (pr.sum(axis=0, keepdims=True) + 1e-8)
    ps = ps / (ps.sum(axis=0, keepdims=True) + 1e-8)
    return pr, ps

# Geometric fusion (product of experts) in log-space with weight alpha
def fuse_geometric(p_skel: np.ndarray, p_rgb: np.ndarray, alpha: float) -> np.ndarray:
    p_s = np.clip(p_skel, 1e-8, 1.0); p_r = np.clip(p_rgb, 1e-8, 1.0)
    logp = (1.0 - float(alpha)) * np.log(p_s) + float(alpha) * np.log(p_r)
    q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
    return q.astype(np.float32)

# Decoder pieces reused from earlier cells: compute_runlen_stats, build_min_dur, decode_minseg_smooth_aba, load_frame_labels, make_perm20
from collections import defaultdict
def compute_runlen_stats(ids):
    agg = defaultdict(list)
    for sid in ids:
        y = load_frame_labels(int(sid))
        cur=None; run=0
        for c in y:
            if c==0:
                if cur is not None: agg[cur].append(run); cur=None; run=0
                continue
            if cur is None: cur=int(c); run=1
            elif c==cur: run+=1
            else: agg[cur].append(run); cur=int(c); run=1
        if cur is not None: agg[cur].append(run)
    med = np.zeros(21, np.float32); q75 = np.zeros(21, np.float32)
    for c in range(1,21):
        ls = agg.get(c, [])
        if ls:
            arr = np.array(ls, np.float32); med[c]=float(np.median(arr)); q75[c]=float(np.percentile(arr, 75.0))
        else:
            med[c]=1.0; q75[c]=2.0
    return med, q75

def build_min_dur(med, q75, mult):
    md = np.round(med * float(mult)).astype(np.int32)
    md = np.clip(md, 2, np.maximum(q75.astype(np.int32), 2)); md[0]=0; return md

def compress_to_sequence(y_frames):
    seq=[]; last=-1
    for c in y_frames:
        if c==0: continue
        if c!=last: seq.append(int(c)); last=int(c)
    return seq

def levenshtein(a,b):
    n,m=len(a),len(b)
    if n==0: return m
    if m==0: return n
    dp=list(range(m+1))
    for i in range(1,n+1):
        prev=dp[0]; dp[0]=i; ai=a[i-1]
        for j in range(1,m+1):
            tmp=dp[j]; dp[j]=min(dp[j]+1, dp[j-1]+1, prev + (0 if ai==b[j-1] else 1)); prev=tmp
    return dp[m]

# Helper: temperature scale probs (CxT) with scalar T
def temp_scale_array(p: np.ndarray, T: float) -> np.ndarray:
    q = np.exp(np.log(np.clip(p, 1e-8, 1.0)) / float(T))
    q /= (q.sum(axis=0, keepdims=True) + 1e-8)
    return q.astype(np.float32)

# Average test per-fold RGB probs -> {id}_rgb.npy, applying per-fold temperature first
def average_test_rgb_folds():
    # Load per-fold temperatures
    Ts = []
    for f in range(3):
        jf = Path(f'rgb_temp_fold{f}.json')
        if jf.exists():
            try:
                Ts.append(float(json.loads(jf.read_text())['T']))
            except Exception:
                Ts.append(1.0)
        else:
            Ts.append(1.0)
    # Use test.csv to define the canonical id list (avoid stray 401..403 etc.)
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    n_avg=0
    for sid in test_ids:
        outs = [probs_cache/f"{sid}_rgb_f0.npy", probs_cache/f"{sid}_rgb_f1.npy", probs_cache/f"{sid}_rgb_f2.npy"]
        arrs=[]
        for idx, p in enumerate(outs):
            if p.exists():
                a = np.load(p, mmap_mode='r').astype(np.float32)
                a = temp_scale_array(a, Ts[idx])  # apply per-fold T
                arrs.append(a)
        if not arrs:
            continue
        # align lengths if small drift
        Tm = min(a.shape[1] for a in arrs)
        arrs = [a[:, :Tm].astype(np.float32) for a in arrs]
        m = np.mean(arrs, axis=0)
        m = m / (m.sum(axis=0, keepdims=True) + 1e-8)
        np.save(probs_cache/f"{sid}_rgb.npy", m.astype(np.float32))
        n_avg += 1
    print('Averaged test RGB per-fold files ->', n_avg, 'ids')

# OOF alpha tuning: use leave-one-archive-out folds, align RGB to skeleton per id, fuse, decode, score
def oof_alpha_tune(alpha_list=(0.22,0.24,0.25,0.26,0.28,0.30), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04):
    with open('folds_archive_cv.json') as f:
        folds_list_local = json.load(f)
    worst_by={}; mean_by={}
    for alpha in alpha_list:
        per_fold=[]
        print(f'[OOF] alpha={alpha}', flush=True)
        for fd in folds_list_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult=mult)
            dists=[]; n=0; t0=time.time()
            for sid in va_ids:
                # Load RGB OOF (already at skeleton T): CxT
                p_rgb = np.load(probs_cache/f"{sid}_rgb.npy").astype(np.float32)
                # Load skeleton probs (aligned v2+v3) via load_probs (already installed to aligned version)
                p_skel = load_probs(int(sid)).astype(np.float32)
                # Align RGB to skeleton
                pr, ps = align_rgb_to_skel(p_rgb, p_skel, max_shift=15)
                # Fuse
                pf = fuse_geometric(ps, pr, alpha=alpha)  # CxT
                # Decode
                y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                seq = compress_to_sequence(y_hat); seq_true = compress_to_sequence(load_frame_labels(int(sid)))
                dists.append(levenshtein(seq, seq_true)); n+=1
            mval = float(np.mean(dists)) if dists else 0.0
            per_fold.append(mval)
            print(f'  fold {fd["fold"]}: mean={mval:.3f} (n={n})', flush=True)
        worst_by[alpha] = max(per_fold); mean_by[alpha] = float(np.mean(per_fold))
        print(f'  -> worst={worst_by[alpha]:.3f} mean={mean_by[alpha]:.3f}', flush=True)
    print('OOF alpha tuning summary (lower better):')
    for a in alpha_list:
        print(f'  alpha={a}: worst={worst_by[a]:.3f} mean={mean_by[a]:.3f}')
    best_alpha = min(alpha_list, key=lambda a: (worst_by[a], mean_by[a]))
    print('Chosen alpha (by worst then mean):', best_alpha)
    return best_alpha, worst_by, mean_by

# Test fusion + decode + perm20 submission
def fuse_decode_test(alpha: float, mult: float = 0.7, smooth_k: int = 5, aba_len: int = 2, aba_ratio: float = 1.04, out_csv: str = 'submission_rgb_fused.csv'):
    # duration stats from all training ids
    all_train_ids=[]
    for fd in json.load(open('folds_archive_cv.json','r')):
        all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids))); md = build_min_dur(med, q75, mult=mult)
    rows=[]; ids=[]; n=0; t0=time.time()
    # Use test.csv canonical ids to ensure 95 rows even if some RGB missing
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    for sid in test_ids:
        # need skeleton probs and rgb probs (rgb optional)
        p2 = probs_cache/f"{sid}_ce.npy"; p3 = probs_cache/f"{sid}_ce_v3.npy"; prgb = probs_cache/f"{sid}_rgb.npy"
        if not (p2.exists() and p3.exists()):
            continue  # cannot decode without skeleton
        p_skel = load_probs(int(sid)).astype(np.float32)  # aligned v2+v3
        if prgb.exists():
            p_rgb = np.load(prgb).astype(np.float32)
            pr, ps = align_rgb_to_skel(p_rgb, p_skel, max_shift=15)
            pf = fuse_geometric(ps, pr, alpha=alpha)
        else:
            # fallback: no RGB -> use skeleton only
            pf = p_skel
        y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
        # perm20
        seq_raw=[]; last=-1
        for c in y_hat:
            if c==0: continue
            if c!=last: seq_raw.append(int(c)); last=int(c)
        seq = make_perm20(seq_raw, pf)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%20)==0 or n==95:
            print(f'  test fused decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False)
    print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95, f'Expected 95 rows, got {len(sub)}'
    # also mirror to submission.csv for convenience
    sub.to_csv('submission.csv', index=False)
    print('submission.csv written ->', out_csv)

print('Step 1: Calibrate per-fold TEST RGB probs and average -> _rgb.npy ...', flush=True)
average_test_rgb_folds()
print('Step 2: OOF alpha tuning (geometric fusion, tight grid) ...', flush=True)
best_alpha, worst_by, mean_by = oof_alpha_tune(alpha_list=(0.22,0.24,0.25,0.26,0.28,0.30), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04)
print('Step 3: Fuse + decode test with best alpha ...', flush=True)
out_csv = f'submission_fused_rgb_alpha{str(best_alpha).replace(".", "")}.csv'
fuse_decode_test(alpha=best_alpha, mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04, out_csv=out_csv)
print('RGB fusion pipeline complete.')

Step 1: Calibrate per-fold TEST RGB probs and average -> _rgb.npy ...


Averaged test RGB per-fold files -> 92 ids
Step 2: OOF alpha tuning (geometric fusion, tight grid) ...


[OOF] alpha=0.22


  fold 0: mean=4.041 (n=98)


  fold 1: mean=3.040 (n=99)


  fold 2: mean=4.460 (n=100)


  -> worst=4.460 mean=3.847


[OOF] alpha=0.24


  fold 0: mean=3.990 (n=98)


  fold 1: mean=3.051 (n=99)


  fold 2: mean=4.400 (n=100)


  -> worst=4.400 mean=3.813


[OOF] alpha=0.25


  fold 0: mean=3.949 (n=98)


  fold 1: mean=3.040 (n=99)


  fold 2: mean=4.350 (n=100)


  -> worst=4.350 mean=3.780


[OOF] alpha=0.26


  fold 0: mean=3.959 (n=98)


  fold 1: mean=3.071 (n=99)


  fold 2: mean=4.340 (n=100)


  -> worst=4.340 mean=3.790


[OOF] alpha=0.28


  fold 0: mean=3.990 (n=98)


  fold 1: mean=3.061 (n=99)


  fold 2: mean=4.410 (n=100)


  -> worst=4.410 mean=3.820


[OOF] alpha=0.3


  fold 0: mean=3.980 (n=98)


  fold 1: mean=3.162 (n=99)


  fold 2: mean=4.390 (n=100)


  -> worst=4.390 mean=3.844


OOF alpha tuning summary (lower better):
  alpha=0.22: worst=4.460 mean=3.847
  alpha=0.24: worst=4.400 mean=3.813
  alpha=0.25: worst=4.350 mean=3.780
  alpha=0.26: worst=4.340 mean=3.790
  alpha=0.28: worst=4.410 mean=3.820
  alpha=0.3: worst=4.390 mean=3.844
Chosen alpha (by worst then mean): 0.26
Step 3: Fuse + decode test with best alpha ...


  test fused decoded 20/95 elapsed=0.4s


  test fused decoded 40/95 elapsed=0.8s


  test fused decoded 60/95 elapsed=1.1s


  test fused decoded 80/95 elapsed=1.5s


  test fused decoded 95/95 elapsed=1.7s


Wrote submission_fused_rgb_alpha026.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 13 20 18 11 3 4 6 8 14 10 2 ...
3  303  18 13 4 3 10 14 6 5 19 20 17 2 11 16 9 7 12 1 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 12 1...
submission.csv written -> submission_fused_rgb_alpha026.csv
RGB fusion pipeline complete.


In [39]:
# RGB fusion alt: Arithmetic blend OOF tuning and submission
import numpy as np, json, time, os
from pathlib import Path
import pandas as pd

probs_cache = Path('probs_cache')

def fuse_arithmetic(p_skel: np.ndarray, p_rgb: np.ndarray, alpha: float) -> np.ndarray:
    q = (1.0 - float(alpha)) * np.clip(p_skel, 1e-8, 1.0) + float(alpha) * np.clip(p_rgb, 1e-8, 1.0)
    q /= (q.sum(axis=0, keepdims=True) + 1e-8)
    return q.astype(np.float32)

def oof_alpha_tune_arith(alpha_list=(0.22,0.24,0.25,0.26,0.28,0.30), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04):
    with open('folds_archive_cv.json') as f:
        folds_list_local = json.load(f)
    worst_by={}; mean_by={}
    for alpha in alpha_list:
        per_fold=[]
        print(f'[OOF-ARITH] alpha={alpha}', flush=True)
        for fd in folds_list_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult=mult)
            dists=[]; n=0
            for sid in va_ids:
                p_rgb = np.load(probs_cache/f"{sid}_rgb.npy").astype(np.float32)
                p_skel = load_probs(int(sid)).astype(np.float32)
                pr, ps = align_rgb_to_skel(p_rgb, p_skel, max_shift=15)
                pf = fuse_arithmetic(ps, pr, alpha=alpha)
                y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                seq = compress_to_sequence(y_hat); seq_true = compress_to_sequence(load_frame_labels(int(sid)))
                dists.append(levenshtein(seq, seq_true)); n+=1
            mval = float(np.mean(dists)) if dists else 0.0
            per_fold.append(mval)
            print(f'  fold {fd["fold"]}: mean={mval:.3f} (n={n})', flush=True)
        worst_by[alpha] = max(per_fold); mean_by[alpha] = float(np.mean(per_fold))
        print(f'  -> worst={worst_by[alpha]:.3f} mean={mean_by[alpha]:.3f}', flush=True)
    print('OOF-ARITH alpha tuning summary (lower better):')
    for a in alpha_list:
        print(f'  alpha={a}: worst={worst_by[a]:.3f} mean={mean_by[a]:.3f}')
    best_alpha = min(alpha_list, key=lambda a: (worst_by[a], mean_by[a]))
    print('Chosen alpha (ARITH, by worst then mean):', best_alpha)
    return best_alpha, worst_by, mean_by

def fuse_decode_test_arith(alpha: float, mult: float = 0.7, smooth_k: int = 5, aba_len: int = 2, aba_ratio: float = 1.04, out_csv: str = 'submission_fused_rgb_arith.csv'):
    all_train_ids=[]
    for fd in json.load(open('folds_archive_cv.json','r')):
        all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids))); md = build_min_dur(med, q75, mult=mult)
    rows=[]; ids=[]; n=0; t0=time.time()
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    for sid in test_ids:
        p2 = probs_cache/f"{sid}_ce.npy"; p3 = probs_cache/f"{sid}_ce_v3.npy"; prgb = probs_cache/f"{sid}_rgb.npy"
        if not (p2.exists() and p3.exists()):
            continue
        p_skel = load_probs(int(sid)).astype(np.float32)
        if prgb.exists():
            p_rgb = np.load(prgb).astype(np.float32)
            pr, ps = align_rgb_to_skel(p_rgb, p_skel, max_shift=15)
            pf = fuse_arithmetic(ps, pr, alpha=alpha)
        else:
            pf = p_skel
        y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
        seq_raw=[]; last=-1
        for c in y_hat:
            if c==0: continue
            if c!=last: seq_raw.append(int(c)); last=int(c)
        seq = make_perm20(seq_raw, pf)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%20)==0 or n==95:
            print(f'  [ARITH] test fused decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False)
    print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95, f'Expected 95 rows, got {len(sub)}'
    sub.to_csv('submission.csv', index=False)
    print('submission.csv written ->', out_csv)

print('ARITH Step 1: OOF alpha tuning (tight grid) ...', flush=True)
best_alpha_arith, worst_by_arith, mean_by_arith = oof_alpha_tune_arith(alpha_list=(0.22,0.24,0.25,0.26,0.28,0.30), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04)
print('ARITH Step 2: Fuse + decode test with best alpha ...', flush=True)
out_csv_arith = f'submission_fused_rgb_arith_alpha{str(best_alpha_arith).replace(".", "")}.csv'
fuse_decode_test_arith(alpha=best_alpha_arith, mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04, out_csv=out_csv_arith)
print('Arithmetic fusion pipeline complete.')

ARITH Step 1: OOF alpha tuning (tight grid) ...


[OOF-ARITH] alpha=0.22


  fold 0: mean=3.980 (n=98)


  fold 1: mean=3.020 (n=99)


  fold 2: mean=4.480 (n=100)


  -> worst=4.480 mean=3.827


[OOF-ARITH] alpha=0.24


  fold 0: mean=3.949 (n=98)


  fold 1: mean=3.010 (n=99)


  fold 2: mean=4.520 (n=100)


  -> worst=4.520 mean=3.826


[OOF-ARITH] alpha=0.25


  fold 0: mean=3.929 (n=98)


  fold 1: mean=2.990 (n=99)


  fold 2: mean=4.530 (n=100)


  -> worst=4.530 mean=3.816


[OOF-ARITH] alpha=0.26


  fold 0: mean=3.939 (n=98)


  fold 1: mean=2.980 (n=99)


  fold 2: mean=4.510 (n=100)


  -> worst=4.510 mean=3.810


[OOF-ARITH] alpha=0.28


  fold 0: mean=4.000 (n=98)


  fold 1: mean=2.970 (n=99)


  fold 2: mean=4.550 (n=100)


  -> worst=4.550 mean=3.840


[OOF-ARITH] alpha=0.3


  fold 0: mean=3.980 (n=98)


  fold 1: mean=3.000 (n=99)


  fold 2: mean=4.540 (n=100)


  -> worst=4.540 mean=3.840


OOF-ARITH alpha tuning summary (lower better):
  alpha=0.22: worst=4.480 mean=3.827
  alpha=0.24: worst=4.520 mean=3.826
  alpha=0.25: worst=4.530 mean=3.816
  alpha=0.26: worst=4.510 mean=3.810
  alpha=0.28: worst=4.550 mean=3.840
  alpha=0.3: worst=4.540 mean=3.840
Chosen alpha (ARITH, by worst then mean): 0.22
ARITH Step 2: Fuse + decode test with best alpha ...


  [ARITH] test fused decoded 20/95 elapsed=0.4s


  [ARITH] test fused decoded 40/95 elapsed=0.8s


  [ARITH] test fused decoded 60/95 elapsed=1.1s


  [ARITH] test fused decoded 80/95 elapsed=1.5s


  [ARITH] test fused decoded 95/95 elapsed=1.7s


Wrote submission_fused_rgb_arith_alpha022.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 13 20 18 11 3 4 6 8 14 10 2 ...
3  303  18 13 4 3 10 14 6 5 19 20 17 2 11 16 7 9 12 1 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 12 1...
submission.csv written -> submission_fused_rgb_arith_alpha022.csv
Arithmetic fusion pipeline complete.


In [40]:
# RGB fusion PoE with smooth_k=3 and mult sweep; retune alpha and write submission
import json, time, pandas as pd

def tune_mult_alpha_smooth3(mult_list=(0.65, 0.7), alpha_list=(0.22,0.24,0.25,0.26,0.28,0.30)):
    best = None
    recs = []
    for m in mult_list:
        print(f'[OOF smooth_k=3] mult={m}', flush=True)
        a, wb, mb = oof_alpha_tune(alpha_list=alpha_list, mult=m, smooth_k=3, aba_len=2, aba_ratio=1.04)
        recs.append((m, a, wb[a], mb[a]))
        cand = (wb[a], mb[a], m, a)
        if (best is None) or (cand < best):
            best = cand
    # best tuple: (worst, mean, mult, alpha)
    return best, recs

print('PoE fusion with smoothing k=3: tuning mult and alpha...', flush=True)
best_tuple, recs = tune_mult_alpha_smooth3(mult_list=(0.65,0.7), alpha_list=(0.22,0.24,0.25,0.26,0.28,0.30))
worst, mean_v, best_mult, best_alpha = best_tuple
print('Chosen (mult, alpha) by worst then mean:', best_mult, best_alpha, '-> worst=', worst, 'mean=', mean_v, flush=True)
out_csv = f'submission_fused_rgb_poe_s3_m{str(best_mult).replace(".", "")}_a{str(best_alpha).replace(".", "")}.csv'
print('Decoding test with best settings (smooth_k=3)...', flush=True)
fuse_decode_test(alpha=best_alpha, mult=best_mult, smooth_k=3, aba_len=2, aba_ratio=1.04, out_csv=out_csv)
print('Done PoE smooth_k=3 submission:', out_csv, flush=True)

PoE fusion with smoothing k=3: tuning mult and alpha...


[OOF smooth_k=3] mult=0.65


[OOF] alpha=0.22


  fold 0: mean=4.010 (n=98)


  fold 1: mean=2.909 (n=99)


  fold 2: mean=4.460 (n=100)


  -> worst=4.460 mean=3.793


[OOF] alpha=0.24


  fold 0: mean=4.082 (n=98)


  fold 1: mean=2.949 (n=99)


  fold 2: mean=4.470 (n=100)


  -> worst=4.470 mean=3.834


[OOF] alpha=0.25


  fold 0: mean=4.071 (n=98)


  fold 1: mean=2.980 (n=99)


  fold 2: mean=4.440 (n=100)


  -> worst=4.440 mean=3.830


[OOF] alpha=0.26


  fold 0: mean=4.092 (n=98)


  fold 1: mean=2.990 (n=99)


  fold 2: mean=4.450 (n=100)


  -> worst=4.450 mean=3.844


[OOF] alpha=0.28


  fold 0: mean=4.112 (n=98)


  fold 1: mean=3.020 (n=99)


  fold 2: mean=4.400 (n=100)


  -> worst=4.400 mean=3.844


[OOF] alpha=0.3


  fold 0: mean=4.143 (n=98)


  fold 1: mean=3.020 (n=99)


  fold 2: mean=4.440 (n=100)


  -> worst=4.440 mean=3.868


OOF alpha tuning summary (lower better):
  alpha=0.22: worst=4.460 mean=3.793
  alpha=0.24: worst=4.470 mean=3.834
  alpha=0.25: worst=4.440 mean=3.830
  alpha=0.26: worst=4.450 mean=3.844
  alpha=0.28: worst=4.400 mean=3.844
  alpha=0.3: worst=4.440 mean=3.868
Chosen alpha (by worst then mean): 0.28
[OOF smooth_k=3] mult=0.7


[OOF] alpha=0.22


  fold 0: mean=4.051 (n=98)


  fold 1: mean=3.101 (n=99)


  fold 2: mean=4.450 (n=100)


  -> worst=4.450 mean=3.867


[OOF] alpha=0.24


  fold 0: mean=4.102 (n=98)


  fold 1: mean=3.131 (n=99)


  fold 2: mean=4.450 (n=100)


  -> worst=4.450 mean=3.894


[OOF] alpha=0.25


  fold 0: mean=4.010 (n=98)


  fold 1: mean=3.162 (n=99)


  fold 2: mean=4.360 (n=100)


  -> worst=4.360 mean=3.844


[OOF] alpha=0.26


  fold 0: mean=4.010 (n=98)


  fold 1: mean=3.152 (n=99)


  fold 2: mean=4.360 (n=100)


  -> worst=4.360 mean=3.841


[OOF] alpha=0.28


  fold 0: mean=4.102 (n=98)


  fold 1: mean=3.192 (n=99)


  fold 2: mean=4.390 (n=100)


  -> worst=4.390 mean=3.895


[OOF] alpha=0.3


  fold 0: mean=4.092 (n=98)


  fold 1: mean=3.202 (n=99)


  fold 2: mean=4.400 (n=100)


  -> worst=4.400 mean=3.898


OOF alpha tuning summary (lower better):
  alpha=0.22: worst=4.450 mean=3.867
  alpha=0.24: worst=4.450 mean=3.894
  alpha=0.25: worst=4.360 mean=3.844
  alpha=0.26: worst=4.360 mean=3.841
  alpha=0.28: worst=4.390 mean=3.895
  alpha=0.3: worst=4.400 mean=3.898
Chosen alpha (by worst then mean): 0.26
Chosen (mult, alpha) by worst then mean: 0.7 0.26 -> worst= 4.36 mean= 3.8405730777159355


Decoding test with best settings (smooth_k=3)...


  test fused decoded 20/95 elapsed=0.4s


  test fused decoded 40/95 elapsed=0.8s


  test fused decoded 60/95 elapsed=1.2s


  test fused decoded 80/95 elapsed=1.5s


  test fused decoded 95/95 elapsed=1.8s


Wrote submission_fused_rgb_poe_s3_m07_a026.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 13 20 18 11 3 4 6 8 14 10 2 ...
3  303  18 13 4 3 10 14 6 5 19 20 17 2 11 16 9 7 12 1 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 12 1...
submission.csv written -> submission_fused_rgb_poe_s3_m07_a026.csv
Done PoE smooth_k=3 submission: submission_fused_rgb_poe_s3_m07_a026.csv


In [41]:
# Add depth/user streams: extract *_depth.mp4 and *_user.mp4, cache embeddings with MobileNetV2 (stride=2)
import os, io, time, tarfile, zipfile, shutil
from pathlib import Path
import numpy as np
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
try:
    import decord
    from decord import VideoReader, cpu
    HAS_DECORD = True
except Exception:
    HAS_DECORD = False
try:
    import cv2
    HAS_CV2 = True
except Exception:
    HAS_CV2 = False
try:
    import imageio.v3 as iio
    HAS_IMAGEIO = True
except Exception:
    HAS_IMAGEIO = False

# Reuse id->tar helpers
def id_to_tar(sid: int) -> Path | None:
    if 1 <= sid <= 99: return Path('training1.tar.gz')
    if 101 <= sid <= 199: return Path('training2.tar.gz')
    if 200 <= sid <= 299: return Path('training3.tar.gz')
    if 300 <= sid <= 399: return Path('test.tar.gz')
    return None
def split_of_id(sid: int) -> str:
    return 'train' if sid < 300 else 'test'

# Output dirs
vid_depth_dir = Path('rgb_videos_depth'); (vid_depth_dir/'train').mkdir(parents=True, exist_ok=True); (vid_depth_dir/'test').mkdir(parents=True, exist_ok=True)
vid_user_dir = Path('rgb_videos_user'); (vid_user_dir/'train').mkdir(parents=True, exist_ok=True); (vid_user_dir/'test').mkdir(parents=True, exist_ok=True)
emb_depth_dir = Path('rgb_embed_depth'); (emb_depth_dir/'train').mkdir(parents=True, exist_ok=True); (emb_depth_dir/'test').mkdir(parents=True, exist_ok=True)
emb_user_dir = Path('rgb_embed_user'); (emb_user_dir/'train').mkdir(parents=True, exist_ok=True); (emb_user_dir/'test').mkdir(parents=True, exist_ok=True)

def extract_stream_mp4_to_cache(sid: int, stream: str) -> Path | None:
    # stream in {'color','depth','user'}
    split = split_of_id(sid)
    out_base = {'color': Path('rgb_videos'), 'depth': vid_depth_dir, 'user': vid_user_dir}[stream]
    out_path = out_base / split / f"{sid}.mp4"
    if out_path.exists():
        return out_path
    tar_p = id_to_tar(sid)
    if tar_p is None or not tar_p.exists():
        print(f'[extract-{stream}] Missing tar for id={sid}:', tar_p); return None
    zip_name = f"Sample{sid:05d}.zip"
    member_name = f"Sample{sid:05d}_{stream}.mp4"
    try:
        with tarfile.open(tar_p, 'r:gz') as tf:
            m = next((m for m in tf if m.isreg() and Path(m.name).name == zip_name), None)
            if m is None:
                print(f'[extract-{stream}] zip {zip_name} not in {tar_p}'); return None
            data = tf.extractfile(m).read()
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
            cand = member_name if member_name in names else next((n for n in names if n.lower().endswith(f'_{stream}.mp4')), None)
            if cand is None:
                print(f'[extract-{stream}] mp4 not found for id={sid}')
                return None
            tmp = out_path.with_suffix('.mp4.tmp')
            with zf.open(cand) as fsrc, open(tmp, 'wb') as fdst:
                shutil.copyfileobj(fsrc, fdst)
            tmp.replace(out_path)
        return out_path
    except Exception as e:
        print(f'[extract-{stream}] error id={sid}:', e); return None

# Readers and embedding (reuse MobileNetV2 features head)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
preproc = transforms.Compose([transforms.ToPILImage(), transforms.Resize((112,112)), transforms.ToTensor(), transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD)])
mb = mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V1).features.eval().to(device)
pool = nn.AdaptiveAvgPool2d((1,1)).to(device)
mb.requires_grad_(False)

def read_video_frames(path: Path, stride: int = 2):
    frames = []
    if HAS_DECORD:
        try:
            vr = VideoReader(str(path), ctx=cpu(0))
            nfr = len(vr)
            if nfr > 0:
                for i in range(0, nfr, stride):
                    frames.append(vr[i].asnumpy())
            else:
                print('[decord] zero frames for', path)
        except Exception as e:
            print('[decord] fail', e); frames = []
    if not frames and HAS_CV2:
        try:
            cap = cv2.VideoCapture(str(path)); i = 0
            if not cap.isOpened():
                print('[cv2] cannot open', path)
            while cap.isOpened():
                ret, frame = cap.read()
                if not ret: break
                if (i % stride)==0:
                    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                i += 1
            cap.release()
        except Exception as e:
            print('[cv2] fail', e); frames = []
    if not frames and HAS_IMAGEIO:
        try:
            i = 0
            for frm in iio.imiter(str(path)):
                if (i % stride)==0:
                    frames.append(frm)
                i += 1
        except Exception as e:
            print('[imageio] fail', e)
    return frames

def embed_frames(frames, batch_size: int = 128, use_fp16: bool = True) -> np.ndarray:
    if not frames:
        return np.zeros((0,1280), dtype=np.float16)
    embs = []
    with torch.no_grad(), torch.amp.autocast(device_type='cuda', enabled=(device.type=='cuda' and use_fp16)):
        batch = []
        for i, img in enumerate(frames, 1):
            x = preproc(img); batch.append(x)
            if len(batch)==batch_size or i==len(frames):
                xb = torch.stack(batch, 0).to(device)
                feat = mb(xb); feat = pool(feat).flatten(1)
                embs.append(feat.float().cpu()); batch.clear()
    return torch.cat(embs, 0).numpy().astype(np.float16)

def cache_stream_embedding_for_id(sid: int, stream: str, stride: int = 2, force: bool = False) -> Path | None:
    emb_base = {'depth': emb_depth_dir, 'user': emb_user_dir}[stream]
    out = emb_base / split_of_id(sid) / f"{sid}.npy"
    if out.exists() and not force:
        try:
            arr = np.load(out, mmap_mode='r')
            if arr.shape[0] > 0:
                return out
        except Exception:
            pass
    vpath = extract_stream_mp4_to_cache(sid, stream=stream)
    if vpath is None:
        return None
    frames = read_video_frames(vpath, stride=stride)
    E = embed_frames(frames, batch_size=128, use_fp16=True)
    np.save(out, E.astype(np.float16))
    return out

def list_ids_from_features(split: str):
    base = Path('features3d_v3')/split
    return sorted(int(p.stem) for p in base.glob('*.npz'))

def bulk_extract_stream(stream: str, ids: list[int], stride: int = 2):
    t0=time.time(); ok=0; skip=0; fail=0
    for i, sid in enumerate(ids, 1):
        emb_base = {'depth': emb_depth_dir, 'user': emb_user_dir}[stream]
        out = emb_base / split_of_id(sid) / f"{sid}.npy"
        if out.exists():
            try:
                arr = np.load(out, mmap_mode='r')
                if arr.shape[0] > 0:
                    skip += 1
                    if (i%20)==0 or i==len(ids):
                        print(f'  [{stream}] skip {i}/{len(ids)} elapsed={time.time()-t0:.1f}s', flush=True)
                    continue
            except Exception:
                pass
        p = cache_stream_embedding_for_id(int(sid), stream=stream, stride=stride, force=False)
        if p is None:
            fail += 1
        else:
            ok += 1
        if (i%20)==0 or i==len(ids):
            print(f'  [{stream}] processed {i}/{len(ids)} ok={ok} skip={skip} fail={fail} elapsed={time.time()-t0:.1f}s', flush=True)
    print(f'[{stream}] Done: ok={ok} skip={skip} fail={fail} total={len(ids)} elapsed={time.time()-t0:.1f}s', flush=True)

train_ids = list_ids_from_features('train')
test_ids = list_ids_from_features('test')
print('Depth/User embedding extraction starting... train:', len(train_ids), 'test:', len(test_ids))
print('Extracting DEPTH embeddings (stride=2)...', flush=True)
bulk_extract_stream('depth', train_ids, stride=2)
bulk_extract_stream('depth', test_ids, stride=2)
print('Extracting USER embeddings (stride=2)...', flush=True)
bulk_extract_stream('user', train_ids, stride=2)
bulk_extract_stream('user', test_ids, stride=2)
print('Depth/User embedding caches complete.')

Depth/User embedding extraction starting... train: 297 test: 95
Extracting DEPTH embeddings (stride=2)...


  [depth] processed 20/297 ok=20 skip=0 fail=0 elapsed=39.8s


  [depth] processed 40/297 ok=40 skip=0 fail=0 elapsed=100.7s


  [depth] processed 60/297 ok=60 skip=0 fail=0 elapsed=197.9s


  [depth] processed 80/297 ok=80 skip=0 fail=0 elapsed=320.3s


  [depth] processed 100/297 ok=100 skip=0 fail=0 elapsed=441.9s


  [depth] processed 120/297 ok=120 skip=0 fail=0 elapsed=472.4s


  [depth] processed 140/297 ok=140 skip=0 fail=0 elapsed=512.0s


  [depth] processed 160/297 ok=160 skip=0 fail=0 elapsed=559.6s


  [depth] processed 180/297 ok=180 skip=0 fail=0 elapsed=615.5s


  [depth] processed 200/297 ok=200 skip=0 fail=0 elapsed=672.9s


  [depth] processed 220/297 ok=220 skip=0 fail=0 elapsed=704.0s


  [depth] processed 240/297 ok=240 skip=0 fail=0 elapsed=753.2s


  [depth] processed 260/297 ok=260 skip=0 fail=0 elapsed=815.3s


  [depth] processed 280/297 ok=280 skip=0 fail=0 elapsed=886.4s


  [depth] processed 297/297 ok=297 skip=0 fail=0 elapsed=953.6s


[depth] Done: ok=297 skip=0 fail=0 total=297 elapsed=953.6s


  [depth] processed 20/95 ok=20 skip=0 fail=0 elapsed=30.7s


  [depth] processed 40/95 ok=40 skip=0 fail=0 elapsed=74.6s


  [depth] processed 60/95 ok=60 skip=0 fail=0 elapsed=127.7s


  [depth] processed 80/95 ok=80 skip=0 fail=0 elapsed=192.6s


[extract-depth] Missing tar for id=401: None
[extract-depth] Missing tar for id=402: None
[extract-depth] Missing tar for id=403: None
  [depth] processed 95/95 ok=92 skip=0 fail=3 elapsed=235.8s


[depth] Done: ok=92 skip=0 fail=3 total=95 elapsed=235.8s


Extracting USER embeddings (stride=2)...


  [user] processed 20/297 ok=20 skip=0 fail=0 elapsed=37.6s


  [user] processed 40/297 ok=40 skip=0 fail=0 elapsed=96.9s


  [user] processed 60/297 ok=60 skip=0 fail=0 elapsed=190.5s


  [user] processed 80/297 ok=80 skip=0 fail=0 elapsed=309.9s


  [user] processed 100/297 ok=100 skip=0 fail=0 elapsed=431.2s


  [user] processed 120/297 ok=120 skip=0 fail=0 elapsed=461.8s


  [user] processed 140/297 ok=140 skip=0 fail=0 elapsed=501.5s


  [user] processed 160/297 ok=160 skip=0 fail=0 elapsed=548.5s


  [user] processed 180/297 ok=180 skip=0 fail=0 elapsed=603.8s


  [user] processed 200/297 ok=200 skip=0 fail=0 elapsed=661.1s


  [user] processed 220/297 ok=220 skip=0 fail=0 elapsed=692.3s


  [user] processed 240/297 ok=240 skip=0 fail=0 elapsed=739.7s


  [user] processed 260/297 ok=260 skip=0 fail=0 elapsed=801.8s


  [user] processed 280/297 ok=280 skip=0 fail=0 elapsed=872.7s


  [user] processed 297/297 ok=297 skip=0 fail=0 elapsed=940.0s


[user] Done: ok=297 skip=0 fail=0 total=297 elapsed=940.0s


  [user] processed 20/95 ok=20 skip=0 fail=0 elapsed=30.6s


  [user] processed 40/95 ok=40 skip=0 fail=0 elapsed=73.7s


  [user] processed 60/95 ok=60 skip=0 fail=0 elapsed=125.6s


  [user] processed 80/95 ok=80 skip=0 fail=0 elapsed=189.9s


[extract-user] Missing tar for id=401: None
[extract-user] Missing tar for id=402: None
[extract-user] Missing tar for id=403: None
  [user] processed 95/95 ok=92 skip=0 fail=3 elapsed=232.8s


[user] Done: ok=92 skip=0 fail=3 total=95 elapsed=232.8s


Depth/User embedding caches complete.


In [42]:
# Train depth/user linear heads on embeddings; cache OOF/test probs; per-fold temp; average TEST per-fold -> probs_cache/{id}_{depth,user}.npy
import os, json, time, random
from pathlib import Path
from typing import List
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.backends.cudnn.benchmark = True
random.seed(42); np.random.seed(42); torch.manual_seed(42)

probs_cache = Path('probs_cache'); probs_cache.mkdir(exist_ok=True)
labels_dir = Path('labels3d_v2/train')

# Generic helpers reused from RGB head cell
def get_T_train(sid: int) -> int:
    y = np.load(labels_dir / f"{sid}.npy"); return int(y.shape[0])
def get_T_test(sid: int) -> int:
    p3 = probs_cache / f"{sid}_ce_v3.npy"
    if p3.exists():
        return int(np.load(p3, mmap_mode='r').shape[1])
    d = np.load(Path('features3d_v3/test')/f"{sid}.npz")
    X = d['X'] if 'X' in d.files else d[d.files[0]]
    return int(X.shape[0])
def upsample_to_T_np(E: np.ndarray, T: int) -> np.ndarray:
    if E.shape[0] == T: return E.astype(np.float32)
    if E.shape[0] == 0: return np.zeros((T, E.shape[1] if E.ndim==2 else 1280), dtype=np.float32)
    import torch.nn.functional as Fnn
    x = torch.from_numpy(E.astype(np.float32)).unsqueeze(0).transpose(1,2)
    y = Fnn.interpolate(x, size=T, mode='linear', align_corners=False).transpose(1,2).squeeze(0).contiguous()
    return y.numpy().astype(np.float32)

class EmbSeqDataset(Dataset):
    def __init__(self, emb_dir: Path, ids: List[int], chunk_len: int = 1024):
        self.emb_dir = emb_dir; self.ids = list(ids); self.chunk_len = chunk_len; self.index = []
        for sid in self.ids:
            E = np.load(emb_dir/'train'/f"{sid}.npy", mmap_mode='r')
            T = get_T_train(sid); Eu = upsample_to_T_np(np.array(E), T); n = Eu.shape[0]
            if n <= chunk_len: self.index.append((sid, 0, n))
            else:
                s = 0
                while s < n:
                    e = min(n, s + chunk_len); self.index.append((sid, s, e)); s = e
        random.shuffle(self.index)
    def __len__(self): return len(self.index)
    def __getitem__(self, i):
        sid, s, e = self.index[i]
        E = np.load(self.emb_dir/'train'/f"{sid}.npy", mmap_mode='r')
        T = get_T_train(sid); Eu = upsample_to_T_np(np.array(E), T)
        y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
        x = Eu[s:e].astype(np.float32); t = y[s:e]
        return torch.from_numpy(x), torch.from_numpy(t)

class LinearHead(nn.Module):
    def __init__(self, d_in=1280, n_classes=21, p_drop=0.5):
        super().__init__(); self.drop = nn.Dropout(p_drop); self.fc = nn.Linear(d_in, n_classes)
    def forward(self, x): return self.fc(self.drop(x))

def train_stream_fold(emb_dir: Path, train_ids: List[int], val_ids: List[int], epochs: int = 12, lr: float = 1e-3, wd: float = 1e-5, chunk_len: int = 1024, batch_size: int = 1, patience: int = 3):
    model = LinearHead().to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    best = 1e9; bad=0
    for ep in range(1, epochs+1):
        t0=time.time(); model.train()
        ds = EmbSeqDataset(emb_dir, train_ids, chunk_len=chunk_len)
        dl = DataLoader(ds, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
        tr_loss=0.0; n_tok=0
        for xb, yb in dl:
            xb=xb.to(device, non_blocking=True); yb=yb.to(device, non_blocking=True)
            opt.zero_grad(set_to_none=True); logits = model(xb)
            loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), yb.reshape(-1))
            loss.backward(); opt.step()
            tr_loss += float(loss.item()) * yb.numel(); n_tok += int(yb.numel())
        tr_loss /= max(1, n_tok)
        # val
        model.eval(); val_loss=0.0; n_tok=0
        with torch.no_grad():
            for sid in val_ids:
                E = np.load(emb_dir/'train'/f"{sid}.npy", mmap_mode='r')
                T = get_T_train(sid); Eu = upsample_to_T_np(np.array(E), T)
                y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
                xb = torch.from_numpy(Eu).unsqueeze(0).to(device); logits = model(xb)
                ll = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), torch.from_numpy(y).to(device))
                val_loss += float(ll.item()) * int(T); n_tok += int(T)
        val_loss /= max(1, n_tok)
        print(f"[STREAM fold] ep {ep:02d} tr_nll={tr_loss:.4f} val_nll={val_loss:.4f} elapsed={time.time()-t0:.1f}s", flush=True)
        if val_loss < best - 1e-4: best = val_loss; bad=0; torch.save(model.state_dict(), 'stream_head_tmp.pth')
        else:
            bad += 1
            if bad >= patience: break
    model.load_state_dict(torch.load('stream_head_tmp.pth', map_location=device)); return model

def infer_probs_for_ids(model: nn.Module, emb_dir: Path, ids: List[int], split: str, out_suffix: str):
    model.eval(); saved=0; t0=time.time()
    with torch.no_grad():
        for i, sid in enumerate(ids, 1):
            if split=='train':
                T = get_T_train(sid); emb_path = emb_dir/'train'/f"{sid}.npy"
            else:
                T = get_T_test(sid); emb_path = emb_dir/'test'/f"{sid}.npy"
            if not emb_path.exists():
                print(f"  [infer {out_suffix}] missing emb for id={sid}, skip"); continue
            E = np.load(emb_path, mmap_mode='r'); Eu = upsample_to_T_np(np.array(E), T)
            xb = torch.from_numpy(Eu).unsqueeze(0).to(device); logits = model(xb)[0]
            p = logits.softmax(dim=-1).cpu().numpy().astype(np.float32).T  # CxT
            p /= (p.sum(axis=0, keepdims=True) + 1e-8)
            np.save(probs_cache/f"{sid}{out_suffix}", p)
            saved += 1
            if (i%20)==0 or i==len(ids): print(f"  saved {saved}/{len(ids)} split={split} {out_suffix} elapsed={time.time()-t0:.1f}s", flush=True)

def fit_scalar_temperature_on_val(val_ids: List[int], suffix: str) -> float:
    grid = [round(x,2) for x in np.linspace(0.8, 1.5, 15)]
    best_T=1.0; best_nll=1e18
    for Tval in grid:
        nll=0.0; n_tok=0
        for sid in val_ids:
            p = np.load(probs_cache/f"{sid}{suffix}")
            y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
            logp = np.log(np.clip(p, 1e-8, 1.0)) / float(Tval)
            q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
            idx = (y >= 0) & (y < q.shape[0]); yy = y[idx]
            nll += -float(np.log(q[yy, np.nonzero(idx)[0]] + 1e-8).sum()); n_tok += int(idx.sum())
        if n_tok>0:
            nll /= float(n_tok)
            if nll < best_nll: best_nll = nll; best_T = Tval
    print(f"[Temp] best T={best_T} NLL={best_nll:.4f} on {len(val_ids)} val ids for {suffix}", flush=True)
    return float(best_T)

def apply_scalar_temperature(ids: List[int], suffix: str, Tscalar: float):
    for sid in ids:
        p = np.load(probs_cache/f"{sid}{suffix}")
        logp = np.log(np.clip(p, 1e-8, 1.0)) / float(Tscalar)
        q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
        np.save(probs_cache/f"{sid}{suffix}", q.astype(np.float32))

def average_test_per_fold_with_temps(stream: str, temp_prefix: str):
    # temp files saved as {temp_prefix}_fold{f}.json with key 'T'
    Ts=[]
    for f in range(3):
        jf = Path(f'{temp_prefix}_fold{f}.json')
        if jf.exists():
            try: Ts.append(float(json.loads(jf.read_text())['T']))
            except Exception: Ts.append(1.0)
        else: Ts.append(1.0)
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    n_avg=0
    for sid in test_ids:
        arrs=[]
        for f in range(3):
            p = probs_cache/f"{sid}_{stream}_f{f}.npy"
            if p.exists():
                a = np.load(p, mmap_mode='r').astype(np.float32)
                # apply T_f
                logp = np.log(np.clip(a, 1e-8, 1.0)) / float(Ts[f])
                q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
                arrs.append(q)
        if not arrs: continue
        Tm = min(a.shape[1] for a in arrs); arrs = [a[:, :Tm] for a in arrs]
        m = np.mean(arrs, axis=0); m /= (m.sum(axis=0, keepdims=True) + 1e-8)
        np.save(probs_cache/f"{sid}_{stream}.npy", m.astype(np.float32)); n_avg+=1
    print(f"Averaged TEST per-fold -> {stream}.npy for {n_avg} ids")

def list_ids_from_features(split: str):
    base = Path('features3d_v3')/split; return sorted(int(p.stem) for p in base.glob('*.npz'))

print('Training DEPTH and USER linear heads per fold...', flush=True)
emb_dirs = {'depth': Path('rgb_embed_depth'), 'user': Path('rgb_embed_user')}
suffix_val = {'depth': '_depth.npy', 'user': '_user.npy'}
suffix_test_fold = {'depth': '_depth_f{f}.npy', 'user': '_user_f{f}.npy'}
temp_prefix = {'depth': 'depth_temp', 'user': 'user_temp'}

with open('folds_archive_cv.json') as f:
    folds_list = json.load(f)
test_ids_list = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())

for stream in ('depth','user'):
    emb_dir = emb_dirs[stream]
    print(f'== Stream {stream.upper()} ==', flush=True)
    for fd in folds_list:
        fidx = int(fd['fold'])
        tr_ids = list(map(int, fd['train_ids']))
        va_ids = list(map(int, fd['val_ids']))
        print(f'Fold {fidx}: train={len(tr_ids)} val={len(va_ids)}', flush=True)
        model = train_stream_fold(emb_dir, tr_ids, va_ids, epochs=12, lr=1e-3, wd=1e-5, chunk_len=1024, batch_size=1, patience=3)
        # OOF val
        infer_probs_for_ids(model, emb_dir, va_ids, split='train', out_suffix=suffix_val[stream])
        # TEST per-fold
        infer_probs_for_ids(model, emb_dir, test_ids_list, split='test', out_suffix=suffix_test_fold[stream].format(f=fidx))
        # Fit temp on val and apply
        Tbest = fit_scalar_temperature_on_val(va_ids, suffix=suffix_val[stream])
        apply_scalar_temperature(va_ids, suffix=suffix_val[stream], Tscalar=Tbest)
        Path(f"{temp_prefix[stream]}_fold{fidx}.json").write_text(json.dumps({'T': Tbest}))
    # After all folds, average TEST per-fold with temps
    print(f'Calibrating and averaging TEST per-fold for {stream} ...', flush=True)
    average_test_per_fold_with_temps(stream=stream, temp_prefix=temp_prefix[stream])

print('Depth/User stream training + calibration complete. Ready for multi-stream fusion.')

Training DEPTH and USER linear heads per fold...


== Stream DEPTH ==


Fold 0: train=199 val=98


[STREAM fold] ep 01 tr_nll=3.0346 val_nll=3.6526 elapsed=4.3s


[STREAM fold] ep 02 tr_nll=2.8945 val_nll=3.7958 elapsed=4.2s


[STREAM fold] ep 03 tr_nll=2.8094 val_nll=3.9268 elapsed=4.3s


[STREAM fold] ep 04 tr_nll=2.7961 val_nll=3.3064 elapsed=4.3s


[STREAM fold] ep 05 tr_nll=2.7602 val_nll=3.9697 elapsed=4.5s


[STREAM fold] ep 06 tr_nll=2.7501 val_nll=3.8719 elapsed=4.2s


[STREAM fold] ep 07 tr_nll=2.7375 val_nll=3.6817 elapsed=4.2s


  saved 20/98 split=train _depth.npy elapsed=0.1s


  model.load_state_dict(torch.load('stream_head_tmp.pth', map_location=device)); return model


  saved 40/98 split=train _depth.npy elapsed=0.3s


  saved 60/98 split=train _depth.npy elapsed=0.5s


  saved 80/98 split=train _depth.npy elapsed=0.6s


  saved 98/98 split=train _depth.npy elapsed=0.8s


  saved 20/95 split=test _depth_f0.npy elapsed=0.1s


  saved 40/95 split=test _depth_f0.npy elapsed=0.3s


  saved 60/95 split=test _depth_f0.npy elapsed=0.4s


  saved 80/95 split=test _depth_f0.npy elapsed=0.6s


  [infer _depth_f0.npy] missing emb for id=401, skip
  [infer _depth_f0.npy] missing emb for id=402, skip
  [infer _depth_f0.npy] missing emb for id=403, skip


[Temp] best T=1.5 NLL=3.0751 on 98 val ids for _depth.npy


Fold 1: train=198 val=99


[STREAM fold] ep 01 tr_nll=2.9756 val_nll=2.7973 elapsed=4.4s


[STREAM fold] ep 02 tr_nll=2.8168 val_nll=3.1141 elapsed=4.3s


[STREAM fold] ep 03 tr_nll=2.7564 val_nll=2.5984 elapsed=4.4s


[STREAM fold] ep 04 tr_nll=2.6883 val_nll=2.6988 elapsed=4.2s


[STREAM fold] ep 05 tr_nll=2.7007 val_nll=2.7162 elapsed=4.5s


[STREAM fold] ep 06 tr_nll=2.6762 val_nll=2.8661 elapsed=4.7s


  saved 20/99 split=train _depth.npy elapsed=0.1s


  saved 40/99 split=train _depth.npy elapsed=0.3s


  saved 60/99 split=train _depth.npy elapsed=0.4s


  saved 80/99 split=train _depth.npy elapsed=0.6s


  saved 99/99 split=train _depth.npy elapsed=0.7s


  saved 20/95 split=test _depth_f1.npy elapsed=0.1s


  saved 40/95 split=test _depth_f1.npy elapsed=0.3s


  saved 60/95 split=test _depth_f1.npy elapsed=0.4s


  saved 80/95 split=test _depth_f1.npy elapsed=0.6s


  [infer _depth_f1.npy] missing emb for id=401, skip
  [infer _depth_f1.npy] missing emb for id=402, skip
  [infer _depth_f1.npy] missing emb for id=403, skip


[Temp] best T=1.05 NLL=2.5980 on 99 val ids for _depth.npy


Fold 2: train=197 val=100


[STREAM fold] ep 01 tr_nll=2.6015 val_nll=3.5215 elapsed=4.4s


[STREAM fold] ep 02 tr_nll=2.3895 val_nll=3.6274 elapsed=4.3s


[STREAM fold] ep 03 tr_nll=2.3667 val_nll=3.8904 elapsed=4.3s


[STREAM fold] ep 04 tr_nll=2.3507 val_nll=3.5062 elapsed=4.3s


[STREAM fold] ep 05 tr_nll=2.3015 val_nll=3.9293 elapsed=4.2s


[STREAM fold] ep 06 tr_nll=2.2875 val_nll=3.8968 elapsed=4.4s


[STREAM fold] ep 07 tr_nll=2.2851 val_nll=3.8442 elapsed=4.3s


  saved 20/100 split=train _depth.npy elapsed=0.1s


  saved 40/100 split=train _depth.npy elapsed=0.3s


  saved 60/100 split=train _depth.npy elapsed=0.5s


  saved 80/100 split=train _depth.npy elapsed=0.6s


  saved 100/100 split=train _depth.npy elapsed=0.7s


  saved 20/95 split=test _depth_f2.npy elapsed=0.1s


  saved 40/95 split=test _depth_f2.npy elapsed=0.3s


  saved 60/95 split=test _depth_f2.npy elapsed=0.5s


  saved 80/95 split=test _depth_f2.npy elapsed=0.6s


  [infer _depth_f2.npy] missing emb for id=401, skip
  [infer _depth_f2.npy] missing emb for id=402, skip
  [infer _depth_f2.npy] missing emb for id=403, skip


[Temp] best T=1.5 NLL=3.1499 on 100 val ids for _depth.npy


Calibrating and averaging TEST per-fold for depth ...


Averaged TEST per-fold -> depth.npy for 92 ids
== Stream USER ==


Fold 0: train=199 val=98


[STREAM fold] ep 01 tr_nll=2.9689 val_nll=2.6789 elapsed=4.4s


[STREAM fold] ep 02 tr_nll=2.8107 val_nll=3.1903 elapsed=4.4s


[STREAM fold] ep 03 tr_nll=2.7805 val_nll=2.9851 elapsed=4.3s


[STREAM fold] ep 04 tr_nll=2.7405 val_nll=3.2149 elapsed=4.3s


  saved 20/98 split=train _user.npy elapsed=0.1s


  model.load_state_dict(torch.load('stream_head_tmp.pth', map_location=device)); return model


  saved 40/98 split=train _user.npy elapsed=0.3s


  saved 60/98 split=train _user.npy elapsed=0.4s


  saved 80/98 split=train _user.npy elapsed=0.6s


  saved 98/98 split=train _user.npy elapsed=0.7s


  saved 20/95 split=test _user_f0.npy elapsed=0.1s


  saved 40/95 split=test _user_f0.npy elapsed=0.3s


  saved 60/95 split=test _user_f0.npy elapsed=0.5s


  saved 80/95 split=test _user_f0.npy elapsed=0.6s


  [infer _user_f0.npy] missing emb for id=401, skip
  [infer _user_f0.npy] missing emb for id=402, skip
  [infer _user_f0.npy] missing emb for id=403, skip


[Temp] best T=0.9 NLL=2.6752 on 98 val ids for _user.npy


Fold 1: train=198 val=99


[STREAM fold] ep 01 tr_nll=2.9029 val_nll=2.8047 elapsed=4.5s


[STREAM fold] ep 02 tr_nll=2.7489 val_nll=2.6743 elapsed=4.3s


[STREAM fold] ep 03 tr_nll=2.7089 val_nll=2.7961 elapsed=4.6s


[STREAM fold] ep 04 tr_nll=2.6661 val_nll=2.7395 elapsed=4.5s


[STREAM fold] ep 05 tr_nll=2.6547 val_nll=2.6886 elapsed=4.5s


  saved 20/99 split=train _user.npy elapsed=0.2s


  saved 40/99 split=train _user.npy elapsed=0.3s


  saved 60/99 split=train _user.npy elapsed=0.4s


  saved 80/99 split=train _user.npy elapsed=0.6s


  saved 99/99 split=train _user.npy elapsed=0.7s


  saved 20/95 split=test _user_f1.npy elapsed=0.1s


  saved 40/95 split=test _user_f1.npy elapsed=0.3s


  saved 60/95 split=test _user_f1.npy elapsed=0.5s


  saved 80/95 split=test _user_f1.npy elapsed=0.6s


  [infer _user_f1.npy] missing emb for id=401, skip
  [infer _user_f1.npy] missing emb for id=402, skip
  [infer _user_f1.npy] missing emb for id=403, skip


[Temp] best T=1.2 NLL=2.6593 on 99 val ids for _user.npy


Fold 2: train=197 val=100


[STREAM fold] ep 01 tr_nll=2.5475 val_nll=3.5018 elapsed=4.5s


[STREAM fold] ep 02 tr_nll=2.3867 val_nll=3.6107 elapsed=4.3s


[STREAM fold] ep 03 tr_nll=2.3458 val_nll=3.4831 elapsed=4.3s


[STREAM fold] ep 04 tr_nll=2.3140 val_nll=3.6888 elapsed=4.5s


[STREAM fold] ep 05 tr_nll=2.2945 val_nll=3.5335 elapsed=4.3s


[STREAM fold] ep 06 tr_nll=2.2631 val_nll=3.6726 elapsed=4.4s


  saved 20/100 split=train _user.npy elapsed=0.1s


  saved 40/100 split=train _user.npy elapsed=0.3s


  saved 60/100 split=train _user.npy elapsed=0.4s


  saved 80/100 split=train _user.npy elapsed=0.6s


  saved 100/100 split=train _user.npy elapsed=0.7s


  saved 20/95 split=test _user_f2.npy elapsed=0.2s


  saved 40/95 split=test _user_f2.npy elapsed=0.3s


  saved 60/95 split=test _user_f2.npy elapsed=0.5s


  saved 80/95 split=test _user_f2.npy elapsed=0.6s


  [infer _user_f2.npy] missing emb for id=401, skip
  [infer _user_f2.npy] missing emb for id=402, skip
  [infer _user_f2.npy] missing emb for id=403, skip


[Temp] best T=1.5 NLL=3.1568 on 100 val ids for _user.npy


Calibrating and averaging TEST per-fold for user ...


Averaged TEST per-fold -> user.npy for 92 ids
Depth/User stream training + calibration complete. Ready for multi-stream fusion.


In [43]:
# Multi-stream fusion: visual-avg (color+depth+user) vs skeleton with PoE; OOF alpha grid and test submission
import numpy as np, json, time, os
from pathlib import Path
import pandas as pd

probs_cache = Path('probs_cache'); probs_cache.mkdir(exist_ok=True)

# Reuse helpers from prior cells: load_probs (aligned v2+v3), align_rgb_to_skel, fuse_geometric,
# compute_runlen_stats, build_min_dur, decode_minseg_smooth_aba, load_frame_labels, make_perm20

def build_visual_avg_aligned(sid: int, p_skel: np.ndarray, max_shift: int = 15) -> np.ndarray | None:
    streams = []
    # color
    p_rgb_p = probs_cache/f"{sid}_rgb.npy"
    if p_rgb_p.exists():
        p_rgb = np.load(p_rgb_p).astype(np.float32)
        pr, ps = align_rgb_to_skel(p_rgb, p_skel, max_shift=max_shift)
        streams.append(pr)
        p_skel = ps  # ps cropped to match pr length; we'll re-crop later to common
    # depth
    p_dep_p = probs_cache/f"{sid}_depth.npy"
    if p_dep_p.exists():
        p_dep = np.load(p_dep_p).astype(np.float32)
        pr, ps = align_rgb_to_skel(p_dep, p_skel, max_shift=max_shift)
        streams.append(pr)
        p_skel = ps
    # user
    p_usr_p = probs_cache/f"{sid}_user.npy"
    if p_usr_p.exists():
        p_usr = np.load(p_usr_p).astype(np.float32)
        pr, ps = align_rgb_to_skel(p_usr, p_skel, max_shift=max_shift)
        streams.append(pr)
        p_skel = ps
    if not streams:
        return None
    # crop all streams to common minimal T and average
    Tm = min(s.shape[1] for s in streams)
    streams = [s[:, :Tm] for s in streams]
    vavg = np.mean(streams, axis=0)
    vavg /= (vavg.sum(axis=0, keepdims=True) + 1e-8)
    # also crop skeleton to Tm to return along with visual
    ps_out = p_skel[:, :Tm]
    ps_out /= (ps_out.sum(axis=0, keepdims=True) + 1e-8)
    return vavg, ps_out

def oof_alpha_tune_visual(alpha_list=None, mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04):
    if alpha_list is None:
        alpha_list = [round(a,2) for a in np.linspace(0.10, 0.50, 21)]
    with open('folds_archive_cv.json') as f:
        folds_list_local = json.load(f)
    worst_by={}; mean_by={}
    for alpha in alpha_list:
        per_fold=[]
        print(f'[OOF-VIS] alpha={alpha}', flush=True)
        for fd in folds_list_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult=mult)
            dists=[]; n=0
            for sid in va_ids:
                p_skel_full = load_probs(int(sid)).astype(np.float32)
                vis = build_visual_avg_aligned(int(sid), p_skel_full, max_shift=15)
                if vis is None:
                    # fallback to color-only if available
                    p_rgb_p = probs_cache/f"{sid}_rgb.npy"
                    if not p_rgb_p.exists():
                        continue
                    p_rgb = np.load(p_rgb_p).astype(np.float32)
                    pr, ps = align_rgb_to_skel(p_rgb, p_skel_full, max_shift=15)
                    vavg, ps_out = pr, ps
                else:
                    vavg, ps_out = vis
                # fuse PoE
                pf = fuse_geometric(ps_out, vavg, alpha=float(alpha))
                # decode
                y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                # score
                seq = [] ; last=-1
                for c in y_hat:
                    if c==0: continue
                    if c!=last: seq.append(int(c)); last=int(c)
                seq_true = [] ; last=-1
                for c in load_frame_labels(int(sid)):
                    if c==0: continue
                    if c!=last: seq_true.append(int(c)); last=int(c)
                # Levenshtein
                n1=len(seq); n2=len(seq_true)
                if n1==0: d = n2
                elif n2==0: d = n1
                else:
                    dp=list(range(n2+1))
                    for i in range(1,n1+1):
                        prev=dp[0]; dp[0]=i; ai=seq[i-1]
                        for j in range(1,n2+1):
                            tmp=dp[j]; dp[j]=min(dp[j]+1, dp[j-1]+1, prev + (0 if ai==seq_true[j-1] else 1)); prev=tmp
                    d=dp[n2]
                dists.append(float(d)); n+=1
            mval = float(np.mean(dists)) if dists else 0.0
            per_fold.append(mval)
            print(f'  fold {fd["fold"]}: mean={mval:.3f} (n={len(dists)})', flush=True)
        worst_by[alpha] = max(per_fold); mean_by[alpha] = float(np.mean(per_fold))
        print(f'  -> worst={worst_by[alpha]:.3f} mean={mean_by[alpha]:.3f}', flush=True)
    print('OOF-VIS alpha tuning summary (lower better):')
    for a in alpha_list:
        print(f'  alpha={a}: worst={worst_by[a]:.3f} mean={mean_by[a]:.3f}')
    best_alpha = min(alpha_list, key=lambda a: (worst_by[a], mean_by[a]))
    print('Chosen alpha (VIS by worst then mean):', best_alpha)
    return best_alpha, worst_by, mean_by

def fuse_decode_test_visual(alpha: float, mult: float = 0.7, smooth_k: int = 5, aba_len: int = 2, aba_ratio: float = 1.04, out_csv: str = 'submission_fused_visualavg.csv'):
    # duration stats from all training ids
    all_train_ids=[]
    for fd in json.load(open('folds_archive_cv.json','r')):
        all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids))); md = build_min_dur(med, q75, mult=mult)
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    rows=[]; ids=[]; n=0; t0=time.time()
    for sid in test_ids:
        p2 = probs_cache/f"{sid}_ce.npy"; p3 = probs_cache/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()):
            continue
        p_skel_full = load_probs(int(sid)).astype(np.float32)
        vis = build_visual_avg_aligned(int(sid), p_skel_full, max_shift=15)
        if vis is None:
            # fall back to color-only if available; else skeleton-only
            prgb = probs_cache/f"{sid}_rgb.npy"
            if prgb.exists():
                p_rgb = np.load(prgb).astype(np.float32)
                pr, ps = align_rgb_to_skel(p_rgb, p_skel_full, max_shift=15)
                vavg, ps_out = pr, ps
            else:
                ps_out = p_skel_full; pf = ps_out
                y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                seq_raw=[]; last=-1
                for c in y_hat:
                    if c==0: continue
                    if c!=last: seq_raw.append(int(c)); last=int(c)
                seq = make_perm20(seq_raw, pf)
                ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
                if (n%20)==0 or n==95:
                    print(f'  [VIS] test decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
                continue
        else:
            vavg, ps_out = vis
        pf = fuse_geometric(ps_out, vavg, alpha=float(alpha))
        y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
        seq_raw=[]; last=-1
        for c in y_hat:
            if c==0: continue
            if c!=last: seq_raw.append(int(c)); last=int(c)
        seq = make_perm20(seq_raw, pf)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%20)==0 or n==95:
            print(f'  [VIS] test decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False)
    print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95, f'Expected 95 rows, got {len(sub)}'
    sub.to_csv('submission.csv', index=False)
    print('submission.csv written ->', out_csv)

print('VIS Step 1: OOF alpha tuning (skeleton vs visual-avg) ...', flush=True)
best_alpha_vis, worst_by_vis, mean_by_vis = oof_alpha_tune_visual(alpha_list=[round(a,2) for a in np.linspace(0.10, 0.50, 21)], mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04)
print('VIS Step 2: Fuse + decode test with best alpha ...', flush=True)
out_csv_vis = f'submission_fused_visualavg_alpha{str(best_alpha_vis).replace(".", "")}.csv'
fuse_decode_test_visual(alpha=best_alpha_vis, mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04, out_csv=out_csv_vis)
print('Visual-avg fusion pipeline complete.')

VIS Step 1: OOF alpha tuning (skeleton vs visual-avg) ...


[OOF-VIS] alpha=0.1


  fold 0: mean=4.000 (n=98)


  fold 1: mean=3.192 (n=99)


  fold 2: mean=4.480 (n=100)


  -> worst=4.480 mean=3.891


[OOF-VIS] alpha=0.12


  fold 0: mean=3.969 (n=98)


  fold 1: mean=3.141 (n=99)


  fold 2: mean=4.520 (n=100)


  -> worst=4.520 mean=3.877


[OOF-VIS] alpha=0.14


  fold 0: mean=3.959 (n=98)


  fold 1: mean=3.182 (n=99)


  fold 2: mean=4.510 (n=100)


  -> worst=4.510 mean=3.884


[OOF-VIS] alpha=0.16


  fold 0: mean=3.908 (n=98)


  fold 1: mean=3.202 (n=99)


  fold 2: mean=4.470 (n=100)


  -> worst=4.470 mean=3.860


[OOF-VIS] alpha=0.18


  fold 0: mean=3.888 (n=98)


  fold 1: mean=3.172 (n=99)


  fold 2: mean=4.430 (n=100)


  -> worst=4.430 mean=3.830


[OOF-VIS] alpha=0.2


  fold 0: mean=3.867 (n=98)


  fold 1: mean=3.121 (n=99)


  fold 2: mean=4.490 (n=100)


  -> worst=4.490 mean=3.826


[OOF-VIS] alpha=0.22


  fold 0: mean=3.908 (n=98)


  fold 1: mean=3.162 (n=99)


  fold 2: mean=4.550 (n=100)


  -> worst=4.550 mean=3.873


[OOF-VIS] alpha=0.24


  fold 0: mean=3.929 (n=98)


  fold 1: mean=3.172 (n=99)


  fold 2: mean=4.580 (n=100)


  -> worst=4.580 mean=3.893


[OOF-VIS] alpha=0.26


  fold 0: mean=3.990 (n=98)


  fold 1: mean=3.172 (n=99)


  fold 2: mean=4.510 (n=100)


  -> worst=4.510 mean=3.891


[OOF-VIS] alpha=0.28


  fold 0: mean=3.990 (n=98)


  fold 1: mean=3.131 (n=99)


  fold 2: mean=4.500 (n=100)


  -> worst=4.500 mean=3.874


[OOF-VIS] alpha=0.3


  fold 0: mean=4.051 (n=98)


  fold 1: mean=3.202 (n=99)


  fold 2: mean=4.510 (n=100)


  -> worst=4.510 mean=3.921


[OOF-VIS] alpha=0.32


  fold 0: mean=4.122 (n=98)


  fold 1: mean=3.212 (n=99)


  fold 2: mean=4.540 (n=100)


  -> worst=4.540 mean=3.958


[OOF-VIS] alpha=0.34


  fold 0: mean=4.102 (n=98)


  fold 1: mean=3.253 (n=99)


  fold 2: mean=4.540 (n=100)


  -> worst=4.540 mean=3.965


[OOF-VIS] alpha=0.36


  fold 0: mean=4.133 (n=98)


  fold 1: mean=3.343 (n=99)


  fold 2: mean=4.580 (n=100)


  -> worst=4.580 mean=4.019


[OOF-VIS] alpha=0.38


  fold 0: mean=4.173 (n=98)


  fold 1: mean=3.455 (n=99)


  fold 2: mean=4.580 (n=100)


  -> worst=4.580 mean=4.069


[OOF-VIS] alpha=0.4


  fold 0: mean=4.255 (n=98)


  fold 1: mean=3.545 (n=99)


  fold 2: mean=4.600 (n=100)


  -> worst=4.600 mean=4.134


[OOF-VIS] alpha=0.42


  fold 0: mean=4.286 (n=98)


  fold 1: mean=3.626 (n=99)


  fold 2: mean=4.620 (n=100)


  -> worst=4.620 mean=4.177


[OOF-VIS] alpha=0.44


  fold 0: mean=4.357 (n=98)


  fold 1: mean=3.566 (n=99)


  fold 2: mean=4.590 (n=100)


  -> worst=4.590 mean=4.171


[OOF-VIS] alpha=0.46


  fold 0: mean=4.388 (n=98)


  fold 1: mean=3.667 (n=99)


  fold 2: mean=4.620 (n=100)


  -> worst=4.620 mean=4.225


[OOF-VIS] alpha=0.48


  fold 0: mean=4.429 (n=98)


  fold 1: mean=3.707 (n=99)


  fold 2: mean=4.600 (n=100)


  -> worst=4.600 mean=4.245


[OOF-VIS] alpha=0.5


  fold 0: mean=4.541 (n=98)


  fold 1: mean=3.788 (n=99)


  fold 2: mean=4.660 (n=100)


  -> worst=4.660 mean=4.330


OOF-VIS alpha tuning summary (lower better):
  alpha=0.1: worst=4.480 mean=3.891
  alpha=0.12: worst=4.520 mean=3.877
  alpha=0.14: worst=4.510 mean=3.884
  alpha=0.16: worst=4.470 mean=3.860
  alpha=0.18: worst=4.430 mean=3.830
  alpha=0.2: worst=4.490 mean=3.826
  alpha=0.22: worst=4.550 mean=3.873
  alpha=0.24: worst=4.580 mean=3.893
  alpha=0.26: worst=4.510 mean=3.891
  alpha=0.28: worst=4.500 mean=3.874
  alpha=0.3: worst=4.510 mean=3.921
  alpha=0.32: worst=4.540 mean=3.958
  alpha=0.34: worst=4.540 mean=3.965
  alpha=0.36: worst=4.580 mean=4.019
  alpha=0.38: worst=4.580 mean=4.069
  alpha=0.4: worst=4.600 mean=4.134
  alpha=0.42: worst=4.620 mean=4.177
  alpha=0.44: worst=4.590 mean=4.171
  alpha=0.46: worst=4.620 mean=4.225
  alpha=0.48: worst=4.600 mean=4.245
  alpha=0.5: worst=4.660 mean=4.330
Chosen alpha (VIS by worst then mean): 0.18
VIS Step 2: Fuse + decode test with best alpha ...


  [VIS] test decoded 20/95 elapsed=0.5s


  [VIS] test decoded 40/95 elapsed=1.0s


  [VIS] test decoded 60/95 elapsed=1.5s


  [VIS] test decoded 80/95 elapsed=1.9s


  [VIS] test decoded 95/95 elapsed=2.3s


Wrote submission_fused_visualavg_alpha018.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 7 13 20 18 11 3 4 6 8 14 10 ...
3  303  18 13 4 3 10 14 6 5 19 20 17 7 11 16 9 2 12 8 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 15 1...
submission.csv written -> submission_fused_visualavg_alpha018.csv
Visual-avg fusion pipeline complete.


In [44]:
# Hedge: separate alphas for color/depth/user with constraint sum<=0.5; PoE vs skeleton; OOF select by worst then mean; test decode
import numpy as np, json, time, os
from pathlib import Path
import pandas as pd

probs_cache = Path('probs_cache')

def load_stream_prob(sid: int, stream: str, split: str) -> np.ndarray | None:
    # stream keys: 'rgb' (color), 'depth', 'user'
    p = probs_cache / f"{sid}_{stream}.npy"
    if not p.exists():
        return None
    a = np.load(p).astype(np.float32)
    # ensure normalized per frame
    a = a / (a.sum(axis=0, keepdims=True) + 1e-8)
    return a

def align_to_skel_stream(p_stream: np.ndarray, p_skel: np.ndarray, max_shift: int = 15):
    # re-use align_rgb_to_skel (works for any per-frame prob stream)
    return align_rgb_to_skel(p_stream, p_skel, max_shift=max_shift)

def fuse_weighted_poe(ps: np.ndarray, streams: list[tuple[np.ndarray, float]]) -> np.ndarray:
    # ps: CxT skeleton; streams: list of (p_stream_aligned, alpha_stream)
    # Compute normalized weights: skeleton weight = 1 - sum(alphas_present), clamp >= 0
    alpha_sum = float(sum(w for _, w in streams))
    w_s = max(0.0, 1.0 - alpha_sum)
    logp = w_s * np.log(np.clip(ps, 1e-8, 1.0))
    for (p, w) in streams:
        logp += float(w) * np.log(np.clip(p, 1e-8, 1.0))
    q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
    return q.astype(np.float32)

def build_visual_streams_aligned(sid: int, p_skel_full: np.ndarray, max_shift: int = 15):
    out = []  # list of (name, p_aligned, ps_cropped)
    # color
    prgb = load_stream_prob(sid, 'rgb', split='oof')
    if prgb is not None:
        pr, ps = align_to_skel_stream(prgb, p_skel_full, max_shift=max_shift); out.append(('rgb', pr, ps)); p_skel_full = ps
    # depth
    pdep = load_stream_prob(sid, 'depth', split='oof')
    if pdep is not None:
        pr, ps = align_to_skel_stream(pdep, p_skel_full, max_shift=max_shift); out.append(('depth', pr, ps)); p_skel_full = ps
    # user
    pusr = load_stream_prob(sid, 'user', split='oof')
    if pusr is not None:
        pr, ps = align_to_skel_stream(pusr, p_skel_full, max_shift=max_shift); out.append(('user', pr, ps)); p_skel_full = ps
    if not out:
        return None
    # crop all to common T and return streams list and cropped skeleton
    Tm = min(s[1].shape[1] for s in out)
    streams = [(name, p[:, :Tm]) for (name, p, _) in out]
    ps_out = out[-1][2][:, :Tm]  # last ps after alignment/crop chain matches Tm
    ps_out /= (ps_out.sum(axis=0, keepdims=True) + 1e-8)
    return streams, ps_out

def oof_tune_separate_alphas(alpha_color_list=(0.15,0.20,0.25), alpha_du_list=(0.05,0.10,0.15), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04, alpha_cap=0.5):
    with open('folds_archive_cv.json') as f:
        folds_list_local = json.load(f)
    candidates = []
    for ac in alpha_color_list:
        for adu in alpha_du_list:
            # enforce sum constraint with both depth and user equal to adu
            if (ac + adu + adu) <= alpha_cap:
                candidates.append((round(ac, 3), round(adu, 3)))
    worst_by={}; mean_by={}
    for (ac, adu) in candidates:
        per_fold=[]
        print(f'[OOF-sep] alpha_color={ac} alpha_depth=alpha_user={adu} (sum={ac+2*adu:.2f})', flush=True)
        for fd in folds_list_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult=mult)
            dists=[]; n=0
            for sid in va_ids:
                p_skel_full = load_probs(int(sid)).astype(np.float32)
                built = build_visual_streams_aligned(int(sid), p_skel_full, max_shift=15)
                if built is None:
                    # Fallback to color-only if exists; else skip
                    prgb = load_stream_prob(int(sid), 'rgb', split='oof')
                    if prgb is None:
                        continue
                    pr, ps = align_to_skel_stream(prgb, p_skel_full, max_shift=15)
                    pf = fuse_weighted_poe(ps, [(pr, ac)])
                else:
                    streams_aligned, ps = built  # list of (name, p)
                    streams_w = []
                    for name, p in streams_aligned:
                        w = ac if name=='rgb' else adu
                        if w > 0:
                            streams_w.append((p, w))
                    pf = fuse_weighted_poe(ps, streams_w)
                y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                seq=[]; last=-1
                for c in y_hat:
                    if c==0: continue
                    if c!=last: seq.append(int(c)); last=int(c)
                y_true = load_frame_labels(int(sid)); seq_true=[]; last=-1
                for c in y_true:
                    if c==0: continue
                    if c!=last: seq_true.append(int(c)); last=int(c)
                # Levenshtein
                n1=len(seq); n2=len(seq_true)
                if n1==0: d = n2
                elif n2==0: d = n1
                else:
                    dp=list(range(n2+1))
                    for i in range(1,n1+1):
                        prev=dp[0]; dp[0]=i; ai=seq[i-1]
                        for j in range(1,n2+1):
                            tmp=dp[j]; dp[j]=min(dp[j]+1, dp[j-1]+1, prev + (0 if ai==seq_true[j-1] else 1)); prev=tmp
                    d=dp[n2]
                dists.append(float(d)); n+=1
            mval = float(np.mean(dists)) if dists else 0.0
            per_fold.append(mval)
            print(f'  fold {fd["fold"]}: mean={mval:.3f} (n={len(dists)})', flush=True)
        worst_by[(ac, adu)] = max(per_fold); mean_by[(ac, adu)] = float(np.mean(per_fold))
        print(f'  -> worst={worst_by[(ac,adu)]:.3f} mean={mean_by[(ac,adu)]:.3f}', flush=True)
    print('OOF-separate alpha summary (lower better):')
    for (ac, adu) in candidates:
        print(f'  ac={ac} adu={adu}: worst={worst_by[(ac,adu)]:.3f} mean={mean_by[(ac,adu)]:.3f}')
    best = min(candidates, key=lambda k: (worst_by[k], mean_by[k])) if candidates else None
    print('Chosen (alpha_color, alpha_depth=user):', best)
    return best, worst_by, mean_by

def fuse_decode_test_separate_alphas(ac: float, adu: float, mult: float = 0.7, smooth_k: int = 5, aba_len: int = 2, aba_ratio: float = 1.04, out_csv: str = 'submission_fused_sepalphas.csv'):
    # duration stats from all training ids
    all_train_ids=[]
    for fd in json.load(open('folds_archive_cv.json','r')):
        all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids))); md = build_min_dur(med, q75, mult=mult)
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    rows=[]; ids=[]; n=0; t0=time.time()
    for sid in test_ids:
        p2 = probs_cache/f"{sid}_ce.npy"; p3 = probs_cache/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()):
            continue
        p_skel_full = load_probs(int(sid)).astype(np.float32)
        built = build_visual_streams_aligned(int(sid), p_skel_full, max_shift=15)
        if built is None:
            # fallback: if color exists, use it; else skeleton-only
            prgbp = probs_cache/f"{sid}_rgb.npy"
            if prgbp.exists():
                prgb = np.load(prgbp).astype(np.float32); pr, ps = align_to_skel_stream(prgb, p_skel_full, max_shift=15)
                pf = fuse_weighted_poe(ps, [(pr, ac)])
            else:
                pf = p_skel_full
        else:
            streams_aligned, ps = built
            streams_w=[]
            for name, p in streams_aligned:
                w = ac if name=='rgb' else adu
                if w > 0: streams_w.append((p, w))
            pf = fuse_weighted_poe(ps, streams_w)
        y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
        # perm20
        seq_raw=[]; last=-1
        for c in y_hat:
            if c==0: continue
            if c!=last: seq_raw.append(int(c)); last=int(c)
        seq = make_perm20(seq_raw, pf)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%20)==0 or n==95:
            print(f'  [SEP] test decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False)
    print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95, f'Expected 95 rows, got {len(sub)}'
    sub.to_csv('submission.csv', index=False)
    print('submission.csv written ->', out_csv)

print('SEP Step 1: OOF tune separate alphas with sum<=0.5 ...', flush=True)
best_sep, worst_by_sep, mean_by_sep = oof_tune_separate_alphas(alpha_color_list=(0.15,0.20,0.25), alpha_du_list=(0.05,0.10,0.15), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04, alpha_cap=0.5)
if best_sep is not None:
    ac, adu = best_sep
    print('SEP Step 2: Fuse + decode test with best separate alphas ...', flush=True)
    out_csv_sep = f'submission_fused_sepalphas_ac{str(ac).replace(".", "")}_adu{str(adu).replace(".", "")}.csv'
    fuse_decode_test_separate_alphas(ac=ac, adu=adu, mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04, out_csv=out_csv_sep)
else:
    print('No valid alpha combos under constraint; skipping separate-alphas submission.')

SEP Step 1: OOF tune separate alphas with sum<=0.5 ...


[OOF-sep] alpha_color=0.15 alpha_depth=alpha_user=0.05 (sum=0.25)


  fold 0: mean=3.969 (n=98)


  fold 1: mean=3.162 (n=99)


  fold 2: mean=4.490 (n=100)


  -> worst=4.490 mean=3.874


[OOF-sep] alpha_color=0.15 alpha_depth=alpha_user=0.1 (sum=0.35)


  fold 0: mean=4.092 (n=98)


  fold 1: mean=3.343 (n=99)


  fold 2: mean=4.470 (n=100)


  -> worst=4.470 mean=3.968


[OOF-sep] alpha_color=0.15 alpha_depth=alpha_user=0.15 (sum=0.45)


  fold 0: mean=4.306 (n=98)


  fold 1: mean=3.576 (n=99)


  fold 2: mean=4.600 (n=100)


  -> worst=4.600 mean=4.161


[OOF-sep] alpha_color=0.2 alpha_depth=alpha_user=0.05 (sum=0.30)


  fold 0: mean=4.020 (n=98)


  fold 1: mean=3.212 (n=99)


  fold 2: mean=4.440 (n=100)


  -> worst=4.440 mean=3.891


[OOF-sep] alpha_color=0.2 alpha_depth=alpha_user=0.1 (sum=0.40)


  fold 0: mean=4.133 (n=98)


  fold 1: mean=3.455 (n=99)


  fold 2: mean=4.530 (n=100)


  -> worst=4.530 mean=4.039


[OOF-sep] alpha_color=0.2 alpha_depth=alpha_user=0.15 (sum=0.50)


  fold 0: mean=4.418 (n=98)


  fold 1: mean=3.727 (n=99)


  fold 2: mean=4.580 (n=100)


  -> worst=4.580 mean=4.242


[OOF-sep] alpha_color=0.25 alpha_depth=alpha_user=0.05 (sum=0.35)


  fold 0: mean=4.092 (n=98)


  fold 1: mean=3.242 (n=99)


  fold 2: mean=4.440 (n=100)


  -> worst=4.440 mean=3.925


[OOF-sep] alpha_color=0.25 alpha_depth=alpha_user=0.1 (sum=0.45)


  fold 0: mean=4.367 (n=98)


  fold 1: mean=3.586 (n=99)


  fold 2: mean=4.520 (n=100)


  -> worst=4.520 mean=4.158


OOF-separate alpha summary (lower better):
  ac=0.15 adu=0.05: worst=4.490 mean=3.874
  ac=0.15 adu=0.1: worst=4.470 mean=3.968
  ac=0.15 adu=0.15: worst=4.600 mean=4.161
  ac=0.2 adu=0.05: worst=4.440 mean=3.891
  ac=0.2 adu=0.1: worst=4.530 mean=4.039
  ac=0.2 adu=0.15: worst=4.580 mean=4.242
  ac=0.25 adu=0.05: worst=4.440 mean=3.925
  ac=0.25 adu=0.1: worst=4.520 mean=4.158
Chosen (alpha_color, alpha_depth=user): (0.2, 0.05)
SEP Step 2: Fuse + decode test with best separate alphas ...


  [SEP] test decoded 20/95 elapsed=0.5s


  [SEP] test decoded 40/95 elapsed=1.0s


  [SEP] test decoded 60/95 elapsed=1.5s


  [SEP] test decoded 80/95 elapsed=2.0s


  [SEP] test decoded 95/95 elapsed=2.3s


Wrote submission_fused_sepalphas_ac02_adu005.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 13 20 18 11 3 4 6 8 14 10 2 ...
3  303  18 13 4 3 10 14 6 5 19 20 17 7 11 16 9 2 12 1 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 12 1...
submission.csv written -> submission_fused_sepalphas_ac02_adu005.csv


In [45]:
# Order-only decoder on fused probs (skeleton + color PoE), OOF sweep and test submission
import numpy as np, json, time, pandas as pd
from pathlib import Path

probs_cache = Path('probs_cache')

# Reuse: load_probs (aligned v2+v3), align_rgb_to_skel, fuse_geometric, load_frame_labels, compute_runlen_stats etc.

def fused_poe_skel_rgb(sid: int, alpha: float = 0.26) -> np.ndarray:
    # returns CxT fused probs (aligned RGB->skeleton, PoE with weight alpha); fallback to skeleton-only if RGB missing
    p_skel = load_probs(int(sid)).astype(np.float32)
    prgb = probs_cache/f"{sid}_rgb.npy"
    if prgb.exists():
        p_rgb = np.load(prgb).astype(np.float32)
        pr, ps = align_rgb_to_skel(p_rgb, p_skel, max_shift=15)
        pf = fuse_geometric(ps, pr, alpha=float(alpha))
        return pf
    else:
        return p_skel

def smooth_probs_time(p: np.ndarray, k: int = 5) -> np.ndarray:
    if k <= 1: return p
    C,T = p.shape; pad = k//2
    x = np.pad(p, ((0,0),(pad,pad)), mode='edge')
    y = np.empty_like(p, dtype=np.float32)
    for t in range(T):
        y[:,t] = x[:, t:t+k].mean(axis=1)
    y = np.clip(y, 1e-8, None)
    y /= (y.sum(axis=0, keepdims=True) + 1e-8)
    return y

def order_only_perm20(p: np.ndarray, gamma: float = 1.0, smooth_k: int = 5) -> list:
    # p: CxT, class 0 is background. Returns a permutation of [1..20] by center-of-mass.
    q = smooth_probs_time(p, k=smooth_k) if smooth_k and smooth_k>1 else p
    C,T = q.shape
    idx = np.arange(T, dtype=np.float32)
    stats = []
    for c in range(1, min(21, C)):
        w = np.power(np.clip(q[c], 1e-8, 1.0), float(gamma))
        s = float(w.sum()) + 1e-8
        tbar = float((idx * w).sum() / s)
        stats.append((tbar, c))
    stats.sort(key=lambda x: x[0])
    perm = [c for _, c in stats][:20]
    # ensure it is exactly a permutation of 1..20 (fill any missed due to C mismatch)
    seen = set(perm)
    for c in range(1,21):
        if c not in seen:
            perm.append(c)
    return perm[:20]

def compress_to_sequence(y_frames):
    seq=[]; last=-1
    for c in y_frames:
        if c==0: continue
        if c!=last: seq.append(int(c)); last=int(c)
    return seq

def levenshtein(a,b):
    n,m=len(a),len(b)
    if n==0: return m
    if m==0: return n
    dp=list(range(m+1))
    for i in range(1,n+1):
        prev=dp[0]; dp[0]=i; ai=a[i-1]
        for j in range(1,m+1):
            tmp=dp[j]
            dp[j]=min(dp[j]+1, dp[j-1]+1, prev + (0 if ai==b[j-1] else 1)); prev=tmp
    return dp[m]

def oof_sweep_order_only(alpha: float = 0.26, gamma_list=(1.0,1.2,1.5,2.0), smooth_list=(3,5)):
    with open('folds_archive_cv.json') as f:
        folds_local = json.load(f)
    results = []  # list of (worst, mean, gamma, smooth_k)
    for g in gamma_list:
        for k in smooth_list:
            per_fold=[]
            print(f'[OOF-ORDER] gamma={g} smooth_k={k}', flush=True)
            for fd in folds_local:
                va_ids = list(map(int, fd['val_ids']))
                dists=[]
                for sid in va_ids:
                    p = fused_poe_skel_rgb(int(sid), alpha=alpha)
                    perm = order_only_perm20(p, gamma=g, smooth_k=k)
                    y_true = load_frame_labels(int(sid))
                    seq_true = compress_to_sequence(y_true)
                    dists.append(levenshtein(perm, seq_true))
                mval = float(np.mean(dists)) if dists else 0.0
                per_fold.append(mval)
                print(f'  fold {fd["fold"]}: mean={mval:.3f} (n={len(dists)})', flush=True)
            worst = max(per_fold); mean_v = float(np.mean(per_fold))
            results.append((worst, mean_v, g, k))
            print(f'  -> worst={worst:.3f} mean={mean_v:.3f}', flush=True)
    results.sort(key=lambda x: (x[0], x[1]))
    print('OOF-ORDER summary (top 5 by worst then mean):')
    for r in results[:5]:
        print(r)
    best = results[0] if results else None
    return best, results

def decode_test_order_only(alpha: float, gamma: float, smooth_k: int, out_csv: str = 'submission_order_only.csv'):
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    rows=[]; ids=[]; t0=time.time(); n=0
    for sid in test_ids:
        # need skeleton probs at least
        p2 = probs_cache/f"{sid}_ce.npy"; p3 = probs_cache/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()):
            continue
        p = fused_poe_skel_rgb(int(sid), alpha=alpha)
        perm = order_only_perm20(p, gamma=gamma, smooth_k=smooth_k)
        ids.append(sid); rows.append(' '.join(map(str, perm))); n+=1
        if (n%20)==0 or n==95:
            print(f'  [ORDER] test decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False)
    print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95, f'Expected 95 rows, got {len(sub)}'
    # mirror to submission.csv
    sub.to_csv('submission.csv', index=False)
    print('submission.csv written ->', out_csv)

print('ORDER Step 1: OOF sweep on gamma and smoothing for PoE(skel+RGB, alpha=0.26)...', flush=True)
best_ord, all_ord = oof_sweep_order_only(alpha=0.26, gamma_list=(1.0,1.2,1.5,2.0), smooth_list=(3,5))
if best_ord is not None:
    worst, mean_v, g_best, k_best = best_ord
    print('Chosen (gamma, smooth_k):', g_best, k_best, '-> worst=', worst, 'mean=', mean_v)
    print('ORDER Step 2: Decode test with best params ...', flush=True)
    out_csv = f'submission_order_only_poe_rgb_a026_g{str(g_best).replace(".", "")}_k{k_best}.csv'
    decode_test_order_only(alpha=0.26, gamma=g_best, smooth_k=k_best, out_csv=out_csv)
else:
    print('Order-only OOF produced no results; skipping decode.')

ORDER Step 1: OOF sweep on gamma and smoothing for PoE(skel+RGB, alpha=0.26)...


[OOF-ORDER] gamma=1.0 smooth_k=3


  fold 0: mean=13.276 (n=98)


  fold 1: mean=12.697 (n=99)


  fold 2: mean=13.240 (n=100)


  fold 0: mean=13.276 (n=98)


  fold 1: mean=12.697 (n=99)


  fold 2: mean=13.240 (n=100)


  -> worst=13.276 mean=13.071


[OOF-ORDER] gamma=1.2 smooth_k=3


  fold 0: mean=12.653 (n=98)


  fold 1: mean=11.980 (n=99)


  fold 2: mean=12.550 (n=100)


  -> worst=12.653 mean=12.394


[OOF-ORDER] gamma=1.2 smooth_k=5


  fold 0: mean=12.653 (n=98)


  fold 1: mean=11.980 (n=99)


  fold 2: mean=12.550 (n=100)


  -> worst=12.653 mean=12.394


[OOF-ORDER] gamma=1.5 smooth_k=3


  fold 0: mean=11.398 (n=98)


  fold 1: mean=10.556 (n=99)


  fold 2: mean=11.590 (n=100)


  -> worst=11.590 mean=11.181


[OOF-ORDER] gamma=1.5 smooth_k=5


  fold 0: mean=11.398 (n=98)


  fold 1: mean=10.545 (n=99)


  fold 2: mean=11.570 (n=100)


  -> worst=11.570 mean=11.171


[OOF-ORDER] gamma=2.0 smooth_k=3


  fold 0: mean=9.714 (n=98)


  fold 1: mean=8.374 (n=99)


  fold 2: mean=9.920 (n=100)


  -> worst=9.920 mean=9.336


[OOF-ORDER] gamma=2.0 smooth_k=5


  fold 0: mean=9.694 (n=98)


  fold 1: mean=8.384 (n=99)


  fold 2: mean=9.910 (n=100)


  -> worst=9.910 mean=9.329


OOF-ORDER summary (top 5 by worst then mean):
(9.91, 9.329238644952932, 2.0, 5)
(9.92, 9.336007696007696, 2.0, 3)
(11.57, 11.171137909709339, 1.5, 5)
(11.59, 11.181171579743008, 1.5, 3)
(12.653061224489797, 12.39428640142926, 1.2, 3)
Chosen (gamma, smooth_k): 2.0 5 -> worst= 9.91 mean= 9.329238644952932
ORDER Step 2: Decode test with best params ...


  [ORDER] test decoded 20/95 elapsed=0.3s


  [ORDER] test decoded 40/95 elapsed=0.6s


KeyboardInterrupt: 

In [46]:
# RGB temporal head (Dilated TCN) on MobileNet embeddings: train per fold, temp-scale, cache OOF/test, fuse vs skeleton
import os, json, time, random
from pathlib import Path
from typing import List
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.backends.cudnn.benchmark = True
random.seed(42); np.random.seed(42); torch.manual_seed(42)

rgb_emb_dir = Path('rgb_embed')
labels_dir = Path('labels3d_v2/train')
probs_cache = Path('probs_cache'); probs_cache.mkdir(exist_ok=True)

def get_T_train(sid: int) -> int:
    return int(np.load(labels_dir / f"{sid}.npy").shape[0])

def get_T_test(sid: int) -> int:
    p3 = probs_cache / f"{sid}_ce_v3.npy"
    if p3.exists(): return int(np.load(p3, mmap_mode='r').shape[1])
    d = np.load(Path('features3d_v3/test')/f"{sid}.npz"); X = d['X'] if 'X' in d.files else d[d.files[0]]; return int(X.shape[0])

def upsample_to_T_np(E: np.ndarray, T: int) -> np.ndarray:
    if E.shape[0] == T: return E.astype(np.float32)
    if E.shape[0] == 0: return np.zeros((T, E.shape[1] if E.ndim==2 else 1280), dtype=np.float32)
    import torch.nn.functional as Fnn
    x = torch.from_numpy(E.astype(np.float32)).unsqueeze(0).transpose(1,2)
    y = Fnn.interpolate(x, size=T, mode='linear', align_corners=False).transpose(1,2).squeeze(0).contiguous()
    return y.numpy().astype(np.float32)

class RGBEmbDataset(Dataset):
    def __init__(self, ids: List[int], chunk_len: int = 2048):
        self.ids = list(ids); self.chunk_len = chunk_len; self.index=[]
        for sid in self.ids:
            E = np.load(rgb_emb_dir/'train'/f"{sid}.npy", mmap_mode='r')
            T = get_T_train(sid); Eu = upsample_to_T_np(np.array(E), T); n = Eu.shape[0]
            if n <= chunk_len: self.index.append((sid, 0, n))
            else:
                s=0
                while s < n:
                    e = min(n, s+chunk_len); self.index.append((sid, s, e)); s = e
        random.shuffle(self.index)
    def __len__(self): return len(self.index)
    def __getitem__(self, i):
        sid, s, e = self.index[i]
        E = np.load(rgb_emb_dir/'train'/f"{sid}.npy", mmap_mode='r')
        T = get_T_train(sid); Eu = upsample_to_T_np(np.array(E), T)
        y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
        x = Eu[s:e].astype(np.float32); t = y[s:e]
        return torch.from_numpy(x), torch.from_numpy(t)

class DilatedResBlock(nn.Module):
    def __init__(self, ch, dilation, drop=0.35, groups=8, k=3):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, k, padding=dilation, dilation=dilation)
        self.gn1 = nn.GroupNorm(groups, ch)
        self.drop = nn.Dropout(drop)
        self.conv2 = nn.Conv1d(ch, ch, 1)
        self.gn2 = nn.GroupNorm(groups, ch)
    def forward(self, x):
        h = self.conv1(x); h = self.gn1(h); h = F.relu(h, inplace=True); h = self.drop(h)
        h = self.conv2(h); h = self.gn2(h); h = F.relu(h, inplace=True)
        return x + h

class RGBTCNHead(nn.Module):
    def __init__(self, d_in=1280, ch=256, layers=10, n_classes=21, dropout=0.4):
        super().__init__()
        self.inp = nn.Conv1d(d_in, ch, 1)
        blocks=[]; dil=1
        for _ in range(layers):
            blocks.append(DilatedResBlock(ch, dil, drop=dropout, groups=8, k=3));
            dil = min(dil*2, 512)
        self.blocks = nn.ModuleList(blocks)
        self.head = nn.Conv1d(ch, n_classes, 1)
    def forward(self, x_b_t_d):
        x = x_b_t_d.transpose(1,2)  # B,T,D -> B,D,T
        h = self.inp(x)
        for b in self.blocks: h = b(h)
        out = self.head(h)  # B,C,T
        return out.transpose(1,2)  # B,T,C

def train_rgb_tcn_fold(train_ids: List[int], val_ids: List[int], epochs: int = 16, lr: float = 2e-3, wd: float = 1e-4, chunk_len: int = 2048, batch_size: int = 1, patience: int = 4):
        model = RGBTCNHead().to(device)
        opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
        best = 1e18; bad=0
        for ep in range(1, epochs+1):
            t0=time.time(); model.train()
            ds = RGBEmbDataset(train_ids, chunk_len=chunk_len)
            dl = DataLoader(ds, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
            tr_loss=0.0; n_tok=0
            for xb, yb in dl:
                xb=xb.to(device, non_blocking=True); yb=yb.to(device, non_blocking=True)
                opt.zero_grad(set_to_none=True); logits = model(xb)
                loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), yb.reshape(-1))
                loss.backward(); opt.step()
                tr_loss += float(loss.item()) * yb.numel(); n_tok += int(yb.numel())
            tr_loss /= max(1, n_tok)
            model.eval(); val_loss=0.0; n_tok=0
            with torch.no_grad():
                for sid in val_ids:
                    E = np.load(rgb_emb_dir/'train'/f"{sid}.npy", mmap_mode='r')
                    T = get_T_train(sid); Eu = upsample_to_T_np(np.array(E), T)
                    y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
                    xb = torch.from_numpy(Eu).unsqueeze(0).to(device); logits = model(xb)
                    ll = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), torch.from_numpy(y).to(device))
                    val_loss += float(ll.item()) * int(T); n_tok += int(T)
            val_loss /= max(1, n_tok)
            print(f"[RGB-TCN] ep {ep:02d} tr_nll={tr_loss:.4f} val_nll={val_loss:.4f} elapsed={time.time()-t0:.1f}s", flush=True)
            if val_loss < best - 1e-4: best = val_loss; bad=0; torch.save(model.state_dict(), 'rgb_tcn_tmp.pth')
            else:
                bad += 1
                if bad >= patience: break
        model.load_state_dict(torch.load('rgb_tcn_tmp.pth', map_location=device));
        return model

def infer_probs_rgb_tcn(model: nn.Module, ids: List[int], split: str, out_suffix: str):
    model.eval(); saved=0; t0=time.time()
    with torch.no_grad():
        for i, sid in enumerate(ids, 1):
            if split=='train':
                T = get_T_train(sid); emb_path = rgb_emb_dir/'train'/f"{sid}.npy"
            else:
                T = get_T_test(sid); emb_path = rgb_emb_dir/'test'/f"{sid}.npy"
            if not emb_path.exists():
                print(f"  [RGB-TCN infer] missing emb for id={sid}, skip"); continue
            E = np.load(emb_path, mmap_mode='r'); Eu = upsample_to_T_np(np.array(E), T)
            xb = torch.from_numpy(Eu).unsqueeze(0).to(device); logits = model(xb)[0]
            p = logits.softmax(dim=-1).cpu().numpy().astype(np.float32).T  # CxT
            p /= (p.sum(axis=0, keepdims=True) + 1e-8)
            np.save(probs_cache/f"{sid}{out_suffix}", p); saved += 1
            if (i%20)==0 or i==len(ids): print(f"  saved {saved}/{len(ids)} split={split} {out_suffix} elapsed={time.time()-t0:.1f}s", flush=True)

def fit_scalar_temperature_on_val(val_ids: List[int], suffix: str) -> float:
    grid = [round(x,2) for x in np.linspace(0.8, 1.5, 15)]
    best_T=1.0; best_nll=1e18
    for Tval in grid:
        nll=0.0; n_tok=0
        for sid in val_ids:
            p = np.load(probs_cache/f"{sid}{suffix}")
            y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
            logp = np.log(np.clip(p, 1e-8, 1.0)) / float(Tval)
            q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
            idx = (y >= 0) & (y < q.shape[0]); yy = y[idx]
            nll += -float(np.log(q[yy, np.nonzero(idx)[0]] + 1e-8).sum()); n_tok += int(idx.sum())
        if n_tok>0:
            nll /= float(n_tok)
            if nll < best_nll: best_nll = nll; best_T = Tval
    print(f"[Temp RGB-TCN] best T={best_T} NLL={best_nll:.4f} on {len(val_ids)} val ids", flush=True)
    return float(best_T)

def average_test_rgbt_with_fold_temps():
    Ts=[]
    for f in range(3):
        jf = Path(f'rgbt_temp_fold{f}.json')
        if jf.exists():
            try: Ts.append(float(json.loads(jf.read_text())['T']))
            except Exception: Ts.append(1.0)
        else: Ts.append(1.0)
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    n_avg=0
    for sid in test_ids:
        arrs=[]
        for f in range(3):
            p = probs_cache/f"{sid}_rgbt_f{f}.npy"
            if p.exists():
                a = np.load(p, mmap_mode='r').astype(np.float32)
                logp = np.log(np.clip(a, 1e-8, 1.0)) / float(Ts[f])
                q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
                arrs.append(q)
        if not arrs: continue
        Tm = min(a.shape[1] for a in arrs); arrs = [a[:, :Tm] for a in arrs]
        m = np.mean(arrs, axis=0); m /= (m.sum(axis=0, keepdims=True) + 1e-8)
        np.save(probs_cache/f"{sid}_rgbt.npy", m.astype(np.float32)); n_avg+=1
    print('Averaged TEST per-fold -> rgbt.npy for', n_avg, 'ids')

print('Training RGB-TCN head per fold...', flush=True)
folds_list = json.load(open('folds_archive_cv.json','r'))
test_ids_list = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
for fd in folds_list:
    fidx = int(fd['fold'])
    tr_ids = list(map(int, fd['train_ids']))
    va_ids = list(map(int, fd['val_ids']))
    print(f'Fold {fidx}: train={len(tr_ids)} val={len(va_ids)}', flush=True)
    model = train_rgb_tcn_fold(tr_ids, va_ids, epochs=16, lr=2e-3, wd=1e-4, chunk_len=2048, batch_size=1, patience=4)
    # OOF (val) -> save {id}_rgbt.npy
    infer_probs_rgb_tcn(model, va_ids, split='train', out_suffix='_rgbt.npy')
    # TEST per-fold -> save {id}_rgbt_f{f}.npy
    infer_probs_rgb_tcn(model, test_ids_list, split='test', out_suffix=f'_rgbt_f{fidx}.npy')
    # Fit temp on val and apply to OOF
    Tbest = fit_scalar_temperature_on_val(va_ids, suffix='_rgbt.npy')
    for sid in va_ids:
        p = np.load(probs_cache/f"{sid}_rgbt.npy")
        logp = np.log(np.clip(p, 1e-8, 1.0)) / float(Tbest)
        q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
        np.save(probs_cache/f"{sid}_rgbt.npy", q.astype(np.float32))
    Path(f'rgbt_temp_fold{fidx}.json').write_text(json.dumps({'T': Tbest}))

print('Calibrating TEST per-fold RGB-TCN and averaging...', flush=True)
average_test_rgbt_with_fold_temps()

# Fusion: use rgbt if available; fallback to rgb
def align_stream_to_skel(p_stream: np.ndarray, p_skel: np.ndarray, max_shift: int = 15):
    return align_rgb_to_skel(p_stream, p_skel, max_shift=max_shift)

def load_rgb_like_for_id(sid: int) -> np.ndarray | None:
    p_rgbt = probs_cache/f"{sid}_rgbt.npy"
    if p_rgbt.exists(): return np.load(p_rgbt).astype(np.float32)
    p_rgb = probs_cache/f"{sid}_rgb.npy"
    if p_rgb.exists(): return np.load(p_rgb).astype(np.float32)
    return None

def oof_alpha_tune_rgbt(alpha_list=(0.20,0.22,0.24,0.26,0.28), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04):
    folds_list_local = json.load(open('folds_archive_cv.json','r'))
    worst_by={}; mean_by={}
    for alpha in alpha_list:
        per_fold=[]
        print(f'[OOF-RGBT] alpha={alpha}', flush=True)
        for fd in folds_list_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult=mult)
            dists=[]
            for sid in va_ids:
                p_rgbx = load_rgb_like_for_id(int(sid))
                if p_rgbx is None: continue
                p_skel = load_probs(int(sid)).astype(np.float32)
                pr, ps = align_stream_to_skel(p_rgbx, p_skel, max_shift=15)
                pf = fuse_geometric(ps, pr, alpha=float(alpha))
                y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                seq = compress_to_sequence(y_hat); seq_true = compress_to_sequence(load_frame_labels(int(sid)))
                dists.append(levenshtein(seq, seq_true))
            per_fold.append(float(np.mean(dists)) if dists else 0.0)
        worst_by[alpha] = max(per_fold); mean_by[alpha] = float(np.mean(per_fold))
        print(f"  -> worst={worst_by[alpha]:.3f} mean={mean_by[alpha]:.3f}", flush=True)
    print('OOF-RGBT alpha summary:')
    for a in alpha_list: print(f'  alpha={a}: worst={worst_by[a]:.3f} mean={mean_by[a]:.3f}')
    best_alpha = min(alpha_list, key=lambda a: (worst_by[a], mean_by[a]))
    print('Chosen alpha (RGBT):', best_alpha)
    return best_alpha, worst_by, mean_by

def fuse_decode_test_rgbt(alpha: float, mult: float = 0.7, smooth_k: int = 5, aba_len: int = 2, aba_ratio: float = 1.04, out_csv: str = 'submission_fused_rgbt.csv'):
    all_train_ids=[]
    for fd in json.load(open('folds_archive_cv.json','r')): all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids))); md = build_min_dur(med, q75, mult=mult)
    rows=[]; ids=[]; n=0; t0=time.time()
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    for sid in test_ids:
        p2 = probs_cache/f"{sid}_ce.npy"; p3 = probs_cache/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()): continue
        p_skel = load_probs(int(sid)).astype(np.float32)
        p_rgbx = load_rgb_like_for_id(int(sid))
        if p_rgbx is not None:
            pr, ps = align_stream_to_skel(p_rgbx, p_skel, max_shift=15)
            pf = fuse_geometric(ps, pr, alpha=float(alpha))
        else:
            pf = p_skel
        y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
        seq_raw=[]; last=-1
        for c in y_hat:
            if c==0: continue
            if c!=last: seq_raw.append(int(c)); last=int(c)
        seq = make_perm20(seq_raw, pf)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%20)==0 or n==95: print(f'  [RGBT] test fused decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False); print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95
    sub.to_csv('submission.csv', index=False); print('submission.csv written ->', out_csv)

print('Step RGBT-1: OOF tune alpha for RGB-TCN fusion...', flush=True)
best_alpha_rgbt, wb_rgbt, mb_rgbt = oof_alpha_tune_rgbt(alpha_list=(0.20,0.22,0.24,0.26,0.28), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04)
print('Step RGBT-2: Fuse + decode test with best alpha ...', flush=True)
out_csv = f'submission_fused_rgbt_alpha{str(best_alpha_rgbt).replace(".", "")}.csv'
fuse_decode_test_rgbt(alpha=best_alpha_rgbt, mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04, out_csv=out_csv)
print('RGB-TCN fusion pipeline complete.')

Training RGB-TCN head per fold...


Fold 0: train=199 val=98


[RGB-TCN] ep 01 tr_nll=4.1517 val_nll=2.7459 elapsed=22.6s


[RGB-TCN] ep 02 tr_nll=2.6981 val_nll=2.7980 elapsed=4.2s


[RGB-TCN] ep 03 tr_nll=2.4425 val_nll=3.2126 elapsed=4.2s


[RGB-TCN] ep 04 tr_nll=2.1661 val_nll=2.7739 elapsed=4.2s


[RGB-TCN] ep 05 tr_nll=1.9418 val_nll=2.6511 elapsed=4.1s


[RGB-TCN] ep 06 tr_nll=1.7229 val_nll=2.8918 elapsed=4.1s


[RGB-TCN] ep 07 tr_nll=1.5392 val_nll=3.0362 elapsed=4.2s


[RGB-TCN] ep 08 tr_nll=1.3920 val_nll=3.4897 elapsed=4.2s


[RGB-TCN] ep 09 tr_nll=1.2558 val_nll=3.5313 elapsed=4.2s


  saved 20/98 split=train _rgbt.npy elapsed=0.2s


  model.load_state_dict(torch.load('rgb_tcn_tmp.pth', map_location=device));


  saved 40/98 split=train _rgbt.npy elapsed=0.4s


  saved 60/98 split=train _rgbt.npy elapsed=0.6s


  saved 80/98 split=train _rgbt.npy elapsed=0.8s


  saved 98/98 split=train _rgbt.npy elapsed=0.9s


  saved 20/95 split=test _rgbt_f0.npy elapsed=0.7s


  saved 40/95 split=test _rgbt_f0.npy elapsed=1.4s


  saved 60/95 split=test _rgbt_f0.npy elapsed=1.9s


  saved 80/95 split=test _rgbt_f0.npy elapsed=2.5s


  [RGB-TCN infer] missing emb for id=401, skip
  [RGB-TCN infer] missing emb for id=402, skip
  [RGB-TCN infer] missing emb for id=403, skip


[Temp RGB-TCN] best T=1.5 NLL=2.4553 on 98 val ids


Fold 1: train=198 val=99


[RGB-TCN] ep 01 tr_nll=4.0053 val_nll=3.1658 elapsed=7.8s


[RGB-TCN] ep 02 tr_nll=2.6128 val_nll=2.6917 elapsed=4.1s


[RGB-TCN] ep 03 tr_nll=2.4170 val_nll=2.9736 elapsed=4.1s


[RGB-TCN] ep 04 tr_nll=2.1515 val_nll=2.8452 elapsed=4.2s


[RGB-TCN] ep 05 tr_nll=1.9226 val_nll=2.9184 elapsed=4.1s


[RGB-TCN] ep 06 tr_nll=1.7267 val_nll=2.9717 elapsed=4.2s


  saved 20/99 split=train _rgbt.npy elapsed=0.2s


  saved 40/99 split=train _rgbt.npy elapsed=0.4s


  saved 60/99 split=train _rgbt.npy elapsed=0.5s


  saved 80/99 split=train _rgbt.npy elapsed=0.7s


  saved 99/99 split=train _rgbt.npy elapsed=0.9s


  saved 20/95 split=test _rgbt_f1.npy elapsed=0.2s


  saved 40/95 split=test _rgbt_f1.npy elapsed=0.4s


  saved 60/95 split=test _rgbt_f1.npy elapsed=0.6s


  saved 80/95 split=test _rgbt_f1.npy elapsed=0.8s


  [RGB-TCN infer] missing emb for id=401, skip
  [RGB-TCN infer] missing emb for id=402, skip
  [RGB-TCN infer] missing emb for id=403, skip


[Temp RGB-TCN] best T=1.15 NLL=2.6831 on 99 val ids


Fold 2: train=197 val=100


[RGB-TCN] ep 01 tr_nll=3.7669 val_nll=4.4986 elapsed=4.1s


[RGB-TCN] ep 02 tr_nll=2.3747 val_nll=3.8667 elapsed=4.2s


[RGB-TCN] ep 03 tr_nll=2.0153 val_nll=3.7545 elapsed=4.2s


[RGB-TCN] ep 04 tr_nll=1.7417 val_nll=4.0578 elapsed=4.1s


[RGB-TCN] ep 05 tr_nll=1.5132 val_nll=3.6272 elapsed=4.1s


[RGB-TCN] ep 06 tr_nll=1.3995 val_nll=4.3459 elapsed=4.1s


[RGB-TCN] ep 07 tr_nll=1.2881 val_nll=4.0081 elapsed=4.2s


[RGB-TCN] ep 08 tr_nll=1.2012 val_nll=4.2809 elapsed=4.2s


[RGB-TCN] ep 09 tr_nll=1.0690 val_nll=3.9608 elapsed=4.2s


  saved 20/100 split=train _rgbt.npy elapsed=0.2s


  saved 40/100 split=train _rgbt.npy elapsed=0.4s


  saved 60/100 split=train _rgbt.npy elapsed=0.6s


  saved 80/100 split=train _rgbt.npy elapsed=0.7s


  saved 100/100 split=train _rgbt.npy elapsed=0.9s


  saved 20/95 split=test _rgbt_f2.npy elapsed=0.2s


  saved 40/95 split=test _rgbt_f2.npy elapsed=0.4s


  saved 60/95 split=test _rgbt_f2.npy elapsed=0.6s


  saved 80/95 split=test _rgbt_f2.npy elapsed=0.8s


  [RGB-TCN infer] missing emb for id=401, skip
  [RGB-TCN infer] missing emb for id=402, skip
  [RGB-TCN infer] missing emb for id=403, skip


[Temp RGB-TCN] best T=1.5 NLL=3.0840 on 100 val ids


Calibrating TEST per-fold RGB-TCN and averaging...


Averaged TEST per-fold -> rgbt.npy for 92 ids
Step RGBT-1: OOF tune alpha for RGB-TCN fusion...


[OOF-RGBT] alpha=0.2


  -> worst=4.520 mean=3.844


[OOF-RGBT] alpha=0.22


  -> worst=4.560 mean=3.884


[OOF-RGBT] alpha=0.24


  -> worst=4.590 mean=3.907


[OOF-RGBT] alpha=0.26


  -> worst=4.610 mean=3.921


[OOF-RGBT] alpha=0.28


  -> worst=4.600 mean=3.938


OOF-RGBT alpha summary:
  alpha=0.2: worst=4.520 mean=3.844
  alpha=0.22: worst=4.560 mean=3.884
  alpha=0.24: worst=4.590 mean=3.907
  alpha=0.26: worst=4.610 mean=3.921
  alpha=0.28: worst=4.600 mean=3.938
Chosen alpha (RGBT): 0.2
Step RGBT-2: Fuse + decode test with best alpha ...


  [RGBT] test fused decoded 20/95 elapsed=0.4s


  [RGBT] test fused decoded 40/95 elapsed=0.8s


  [RGBT] test fused decoded 60/95 elapsed=1.1s


  [RGBT] test fused decoded 80/95 elapsed=1.5s


  [RGBT] test fused decoded 95/95 elapsed=1.7s


Wrote submission_fused_rgbt_alpha02.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 19 7 1...
1  301  10 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 7 1...
2  302  1 17 16 3 5 9 19 13 20 18 11 4 6 8 14 10 2 7 1...
3  303  17 13 4 3 10 14 6 5 19 20 2 11 16 18 9 7 15 1 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 15 1...
submission.csv written -> submission_fused_rgbt_alpha02.csv
RGB-TCN fusion pipeline complete.


In [47]:
# RGBT PoE with smoother k=3 and tight alpha grid; fuse+decode test
import numpy as np, json, pandas as pd, time
from pathlib import Path

def oof_alpha_tune_rgbt_s3(alpha_list=(0.18,0.20,0.22,0.24,0.26,0.28,0.30), mult=0.7, smooth_k=3, aba_len=2, aba_ratio=1.04):
    folds_list_local = json.load(open('folds_archive_cv.json','r'))
    worst_by={}; mean_by={}
    for alpha in alpha_list:
        per_fold=[]
        print(f'[OOF-RGBT-s3] alpha={alpha}', flush=True)
        for fd in folds_list_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult=mult)
            dists=[]
            for sid in va_ids:
                p_rgbx = None
                prgbt = Path('probs_cache')/f"{sid}_rgbt.npy"
                if prgbt.exists(): p_rgbx = np.load(prgbt).astype(np.float32)
                elif (Path('probs_cache')/f"{sid}_rgb.npy").exists(): p_rgbx = np.load(Path('probs_cache')/f"{sid}_rgb.npy").astype(np.float32)
                if p_rgbx is None: continue
                p_skel = load_probs(int(sid)).astype(np.float32)
                pr, ps = align_rgb_to_skel(p_rgbx, p_skel, max_shift=15)
                pf = fuse_geometric(ps, pr, alpha=float(alpha))
                y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                seq = compress_to_sequence(y_hat); seq_true = compress_to_sequence(load_frame_labels(int(sid)))
                dists.append(levenshtein(seq, seq_true))
            per_fold.append(float(np.mean(dists)) if dists else 0.0)
        worst_by[alpha] = max(per_fold); mean_by[alpha] = float(np.mean(per_fold))
        print(f"  -> worst={worst_by[alpha]:.3f} mean={mean_by[alpha]:.3f}", flush=True)
    print('OOF-RGBT-s3 alpha summary:')
    for a in alpha_list: print(f'  alpha={a}: worst={worst_by[a]:.3f} mean={mean_by[a]:.3f}')
    best_alpha = min(alpha_list, key=lambda a: (worst_by[a], mean_by[a]))
    print('Chosen alpha (RGBT-s3):', best_alpha)
    return best_alpha, worst_by, mean_by

def fuse_decode_test_rgbt_s3(alpha: float, mult: float = 0.7, smooth_k: int = 3, aba_len: int = 2, aba_ratio: float = 1.04, out_csv: str = 'submission_fused_rgbt_s3.csv'):
    all_train_ids=[]
    for fd in json.load(open('folds_archive_cv.json','r')): all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids))); md = build_min_dur(med, q75, mult=mult)
    rows=[]; ids=[]; n=0; t0=time.time()
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    for sid in test_ids:
        p2 = Path('probs_cache')/f"{sid}_ce.npy"; p3 = Path('probs_cache')/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()): continue
        p_skel = load_probs(int(sid)).astype(np.float32)
        p_rgbx = None
        prgbt = Path('probs_cache')/f"{sid}_rgbt.npy"
        if prgbt.exists(): p_rgbx = np.load(prgbt).astype(np.float32)
        elif (Path('probs_cache')/f"{sid}_rgb.npy").exists(): p_rgbx = np.load(Path('probs_cache')/f"{sid}_rgb.npy").astype(np.float32)
        if p_rgbx is not None:
            pr, ps = align_rgb_to_skel(p_rgbx, p_skel, max_shift=15)
            pf = fuse_geometric(ps, pr, alpha=float(alpha))
        else:
            pf = p_skel
        y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
        seq_raw=[]; last=-1
        for c in y_hat:
            if c==0: continue
            if c!=last: seq_raw.append(int(c)); last=int(c)
        seq = make_perm20(seq_raw, pf)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%20)==0 or n==95: print(f'  [RGBT-s3] test fused decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False); print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95
    sub.to_csv('submission.csv', index=False); print('submission.csv written ->', out_csv)

print('RGBT-s3 Step 1: OOF alpha tuning (smooth_k=3)...', flush=True)
best_alpha_rgbt_s3, wb_rgbt_s3, mb_rgbt_s3 = oof_alpha_tune_rgbt_s3(alpha_list=(0.18,0.20,0.22,0.24,0.26,0.28,0.30), mult=0.7, smooth_k=3, aba_len=2, aba_ratio=1.04)
print('RGBT-s3 Step 2: Fuse + decode test with best alpha ...', flush=True)
out_csv = f'submission_fused_rgbt_s3_alpha{str(best_alpha_rgbt_s3).replace(".", "")}.csv'
fuse_decode_test_rgbt_s3(alpha=best_alpha_rgbt_s3, mult=0.7, smooth_k=3, aba_len=2, aba_ratio=1.04, out_csv=out_csv)

RGBT-s3 Step 1: OOF alpha tuning (smooth_k=3)...


[OOF-RGBT-s3] alpha=0.18


  -> worst=4.600 mean=3.921


[OOF-RGBT-s3] alpha=0.2


  -> worst=4.550 mean=3.884


[OOF-RGBT-s3] alpha=0.22


  -> worst=4.590 mean=3.941


[OOF-RGBT-s3] alpha=0.24


  -> worst=4.600 mean=3.972


[OOF-RGBT-s3] alpha=0.26


  -> worst=4.620 mean=3.995


[OOF-RGBT-s3] alpha=0.28


  -> worst=4.650 mean=4.015


[OOF-RGBT-s3] alpha=0.3


  -> worst=4.610 mean=4.080


OOF-RGBT-s3 alpha summary:
  alpha=0.18: worst=4.600 mean=3.921
  alpha=0.2: worst=4.550 mean=3.884
  alpha=0.22: worst=4.590 mean=3.941
  alpha=0.24: worst=4.600 mean=3.972
  alpha=0.26: worst=4.620 mean=3.995
  alpha=0.28: worst=4.650 mean=4.015
  alpha=0.3: worst=4.610 mean=4.080
Chosen alpha (RGBT-s3): 0.2
RGBT-s3 Step 2: Fuse + decode test with best alpha ...


  [RGBT-s3] test fused decoded 20/95 elapsed=0.4s


  [RGBT-s3] test fused decoded 40/95 elapsed=0.8s


  [RGBT-s3] test fused decoded 60/95 elapsed=1.2s


  [RGBT-s3] test fused decoded 80/95 elapsed=1.5s


  [RGBT-s3] test fused decoded 95/95 elapsed=1.8s


Wrote submission_fused_rgbt_s3_alpha02.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 3 5 9 19 13 20 18 11 4 6 8 14 10 2 7 1...
3  303  17 13 4 3 10 14 6 5 19 20 2 11 16 18 9 7 15 1 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 15 1...
submission.csv written -> submission_fused_rgbt_s3_alpha02.csv


In [48]:
# Local-Search (LS) decoder on fused PoE(skel+RGB, alpha=0.26): OOF sweep (mult, lambda, max_moves) and test submission
import numpy as np, json, time, pandas as pd
from pathlib import Path

probs_cache = Path('probs_cache')

# Reuse helpers: load_probs (aligned v2+v3), align_rgb_to_skel, fuse_geometric, decode_minseg_smooth_aba,
# compute_runlen_stats, build_min_dur, load_frame_labels, compress_to_sequence, levenshtein, make_perm20

def fused_poe_skel_rgb_fixedalpha(sid: int, alpha: float = 0.26) -> np.ndarray:
    p_skel = load_probs(int(sid)).astype(np.float32)
    prgb = probs_cache/f"{sid}_rgb.npy"
    if prgb.exists():
        p_rgb = np.load(prgb).astype(np.float32)
        pr, ps = align_rgb_to_skel(p_rgb, p_skel, max_shift=15)
        pf = fuse_geometric(ps, pr, alpha=float(alpha))
        return pf
    else:
        return p_skel

def neglog_prefix(p: np.ndarray) -> np.ndarray:
    # p: CxT
    nl = -np.log(np.clip(p, 1e-8, 1.0))
    return np.cumsum(nl, axis=1).astype(np.float32)

def seg_cost_from_prefix(cum: np.ndarray, c: int, t0: int, t1: int) -> float:
    if t0 < 0: t0 = 0
    if t1 < t0: return 0.0
    if t0 == 0: return float(cum[c, t1])
    return float(cum[c, t1] - cum[c, t0-1])

def build_lmin_lmax(med: np.ndarray, q95: np.ndarray, mult: float) -> tuple:
    lmin = np.zeros_like(med, dtype=np.int32); lmax = np.zeros_like(med, dtype=np.int32)
    for c in range(21):
        if c == 0: lmin[c]=0; lmax[c]=0; continue
        m = float(med[c]) if med[c] > 0 else 10.0
        q = float(q95[c]) if q95[c] > 0 else (m * 2.0)
        lmin[c] = max(3, int(round(mult * m)))
        lmax[c] = min(int(max(3, 1.5 * m)), int(q), 150)
        if lmax[c] < lmin[c]: lmax[c] = lmin[c]
    return lmin, lmax

def initial_segments_from_minseg(p: np.ndarray, med: np.ndarray, q95: np.ndarray, mult: float) -> tuple:
    # Use minseg+smooth+ABA to get a path; take first occurrence segments for unique classes (up to 20)
    lmin_tmp, lmax_tmp = build_lmin_lmax(med, q95, mult)
    md = lmin_tmp.copy(); md[0]=0
    y = decode_minseg_smooth_aba(p, md, smooth_k=5, aba_len=2, aba_ratio=1.04)
    T = len(y)
    segments = []  # list of (c, t0, t1)
    used = set()
    i = 0
    while i < T and len(segments) < 20:
        c = int(y[i]); j=i+1
        while j<T and y[j]==c: j+=1
        if c != 0 and c not in used:
            segments.append((c, i, j-1))
            used.add(c)
        i = j
    # If fewer than 20 unique, fill remaining with highest-mass missing classes using equal splits
    if len(segments) < 20:
        missing = [c for c in range(1,21) if c not in used]
        # simple equal partition of remaining time at the end
        rem = max(0, T - (segments[-1][2] + 1)) if segments else T
        chunk = max(3, rem // max(1, len(missing))) if missing else 0
        t0 = segments[-1][2]+1 if segments else 0
        for c in missing:
            t1 = min(T-1, t0 + chunk - 1)
            if t1 >= t0: segments.append((c, t0, t1))
            t0 = t1 + 1
            if len(segments) >= 20 or t0 >= T: break
    # Ensure exactly 20 segments by trimming or merging last ones
    if len(segments) > 20: segments = segments[:20]
    # If still fewer, pad with any remaining classes with 1-frame slots at end
    while len(segments) < 20:
        c = next((cc for cc in range(1,21) if cc not in {s[0] for s in segments}), 1)
        segments.append((c, max(0, T-1), max(0, T-1)))
    # Extract order and boundaries
    order = [c for (c,_,_) in segments]
    bounds = [(t0, t1) for (_,t0,t1) in segments]
    return order, bounds, y

def seq_cost_with_fixed_bounds(p: np.ndarray, cum: np.ndarray, med: np.ndarray, order: list, bounds: list, lam: float) -> float:
    # p: CxT, cum: CxT prefix of neglog, order: length-20, bounds: list of (t0,t1) length-20
    total=0.0
    for k, c in enumerate(order):
        c = int(c)
        t0, t1 = bounds[k]
        t0 = max(0, int(t0)); t1 = min(p.shape[1]-1, int(t1))
        if t1 < t0: continue
        total += seg_cost_from_prefix(cum, c, t0, t1)
        if lam > 0.0:
            L = max(1, t1 - t0 + 1); m = max(1.0, float(med[c]))
            total += float(lam) * abs(np.log(float(L)) - np.log(m))
    return float(total)

def ls_refine_order_fixed_bounds(p: np.ndarray, med: np.ndarray, init_order: list, bounds: list, lam: float, max_moves: int) -> list:
    # Precompute prefix sums for fast segment cost
    cum = neglog_prefix(p)
    order = list(init_order)
    best_cost = seq_cost_with_fixed_bounds(p, cum, med, order, bounds, lam)
    moves = 0
    improved = True
    while improved and moves < max_moves:
        improved = False
        best_delta = 0.0; best_move = None  # ('swap', k) or ('ins', k, d)
        # Adjacent swaps
        for k in range(19):
            a, b = order[k], order[k+1]
            # cost only affected at segments k and k+1
            c0 = seq_cost_with_fixed_bounds(p, cum, med, [a,b], [bounds[k], bounds[k+1]], lam)
            c1 = seq_cost_with_fixed_bounds(p, cum, med, [b,a], [bounds[k], bounds[k+1]], lam)
            delta = c0 - c1  # positive if swap improves
            if delta > best_delta + 1e-6:
                best_delta = delta; best_move = ('swap', k)
        # Reinsert (move position by +/-1 or +/-2)
        for k in range(20):
            for d in (-2, -1, 1, 2):
                j = k + d
                if j < 0 or j >= 20: continue
                if d == 0: continue
                new_order = order.copy()
                elem = new_order.pop(k)
                new_order.insert(j, elem)
                # affected range between min(k,j) and max(k,j); compute delta by recomputing only that span
                span_lo = min(k,j); span_hi = max(k,j)
                c_old = seq_cost_with_fixed_bounds(p, cum, med, order[span_lo:span_hi+1], bounds[span_lo:span_hi+1], lam)
                c_new = seq_cost_with_fixed_bounds(p, cum, med, new_order[span_lo:span_hi+1], bounds[span_lo:span_hi+1], lam)
                delta = c_old - c_new
                if delta > best_delta + 1e-6:
                    best_delta = delta; best_move = ('ins', k, j)
        if best_move is not None and best_delta > 1e-6:
            if best_move[0] == 'swap':
                k = best_move[1]; order[k], order[k+1] = order[k+1], order[k]
            else:
                k, j = best_move[1], best_move[2]
                elem = order.pop(k); order.insert(j, elem)
            best_cost -= best_delta
            moves += 1; improved = True
        else:
            break
    return order

def micro_boundary_shift(p: np.ndarray, order: list, bounds: list, med: np.ndarray, lmin: np.ndarray, lmax: np.ndarray, lam: float, max_shift: int = 5) -> list:
    # Adjust each internal boundary by +/- up to max_shift frames to reduce cost; keep order fixed
    cum = neglog_prefix(p)
    K = len(order); T = p.shape[1]
    b = [list(x) for x in bounds]
    def seg_len(k): return b[k][1] - b[k][0] + 1
    changed = True
    while changed:
        changed = False
        for k in range(K-1):
            cL = int(order[k]); cR = int(order[k+1])
            # current boundary between segments k and k+1
            t_split = b[k][1]
            best_local = seq_cost_with_fixed_bounds(p, cum, med, [cL, cR], [tuple(b[k]), tuple(b[k+1])], lam)
            best_tsplit = t_split
            for d in range(-max_shift, max_shift+1):
                t_new = t_split + d
                # new bounds
                t0L = b[k][0]; t1L = t_new
                t0R = t_new + 1; t1R = b[k+1][1]
                if t0L < 0 or t1R >= T or t1L < t0L or t1R < t0R: continue
                Llen = t1L - t0L + 1; Rlen = t1R - t0R + 1
                if Llen < lmin[cL] or Llen > lmax[cL]: continue
                if Rlen < lmin[cR] or Rlen > lmax[cR]: continue
                cst = seg_cost_from_prefix(cum, cL, t0L, t1L) + seg_cost_from_prefix(cum, cR, t0R, t1R)
                if lam > 0.0:
                    cst += float(lam) * (abs(np.log(max(1, Llen)) - np.log(max(1.0, med[cL]))) + abs(np.log(max(1, Rlen)) - np.log(max(1.0, med[cR]))))
                if cst + 1e-6 < best_local:
                    best_local = cst; best_tsplit = t_new
            if best_tsplit != t_split:
                b[k][1] = best_tsplit; b[k+1][0] = best_tsplit + 1; changed = True
    return [tuple(x) for x in b]

def ls_decode_for_id(p: np.ndarray, med: np.ndarray, q95: np.ndarray, mult: float, lam: float, max_moves: int) -> list:
    order0, bounds0, _ = initial_segments_from_minseg(p, med, q95, mult)
    lmin, lmax = build_lmin_lmax(med, q95, mult)
    order_ls = ls_refine_order_fixed_bounds(p, med, order0, bounds0, lam=lam, max_moves=max_moves)
    bounds_ls = micro_boundary_shift(p, order_ls, bounds0, med, lmin, lmax, lam=lam, max_shift=5)
    # final permutation is order_ls
    return order_ls

def oof_sweep_ls(mult_list=(0.65,0.7), lam_list=(0.0, 0.2), max_moves_list=(20,40), alpha: float = 0.26):
    folds = json.load(open('folds_archive_cv.json','r'))
    results = []  # (worst, mean, mult, lam, moves)
    for mult in mult_list:
        for lam in lam_list:
            for mv in max_moves_list:
                per_fold=[]
                print(f'[OOF-LS] mult={mult} lam={lam} max_moves={mv}', flush=True)
                for fd in folds:
                    tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
                    med, q75 = compute_runlen_stats(tr_ids)
                    # approximate q95 via 95th percentile from train (reuse q75 as proxy if q95 not available)
                    # build q95 from train using robust method from HSMM cell if available
                    try:
                        # reuse robust_q95_from_ids if defined
                        med_r, q95 = robust_q95_from_ids(tr_ids)
                        med = med_r
                    except Exception:
                        q95 = np.maximum(q75*1.5, med*1.5).astype(np.float32)
                    dists=[]
                    for sid in va_ids:
                        p = fused_poe_skel_rgb_fixedalpha(int(sid), alpha=alpha)
                        perm = ls_decode_for_id(p, med, q95, mult=mult, lam=lam, max_moves=mv)
                        y_true = load_frame_labels(int(sid))
                        seq_true = compress_to_sequence(y_true)
                        dists.append(levenshtein(perm, seq_true))
                    per_fold.append(float(np.mean(dists)) if dists else 0.0)
                worst = max(per_fold); mean_v = float(np.mean(per_fold))
                results.append((worst, mean_v, mult, lam, mv))
                print(f'  -> worst={worst:.3f} mean={mean_v:.3f}', flush=True)
    results.sort(key=lambda x: (x[0], x[1]))
    print('OOF-LS summary (top 5):')
    for r in results[:5]: print(r)
    best = results[0] if results else None
    return best, results

def decode_test_ls(mult: float, lam: float, max_moves: int, alpha: float = 0.26, out_csv: str = 'submission_ls_poe.csv'):
    folds = json.load(open('folds_archive_cv.json','r'))
    all_train_ids=[]
    for fd in folds: all_train_ids.extend(list(map(int, fd['train_ids'])))
    # durations from all-train (test-safe)
    try:
        med_all, q95_all = robust_q95_from_ids(sorted(set(all_train_ids)))
    except Exception:
        med_all, q75_all = compute_runlen_stats(sorted(set(all_train_ids))); q95_all = np.maximum(q75_all*1.5, med_all*1.5).astype(np.float32)
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    rows=[]; ids=[]; n=0; t0=time.time()
    for sid in test_ids:
        p2 = probs_cache/f"{sid}_ce.npy"; p3 = probs_cache/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()): continue
        p = fused_poe_skel_rgb_fixedalpha(int(sid), alpha=alpha)
        perm = ls_decode_for_id(p, med_all, q95_all, mult=mult, lam=lam, max_moves=max_moves)
        ids.append(sid); rows.append(' '.join(map(str, perm))); n+=1
        if (n%20)==0 or n==95: print(f'  [LS] test decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False); print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95
    sub.to_csv('submission.csv', index=False); print('submission.csv written ->', out_csv)

print('LS Step 1: OOF sweep (mult in {0.65,0.7}, lambda in {0,0.2}, max_moves in {20,40})...', flush=True)
best_ls, res_ls = oof_sweep_ls(mult_list=(0.65,0.7), lam_list=(0.0,0.2), max_moves_list=(20,40), alpha=0.26)
if best_ls is not None:
    worst, mean_v, mult_b, lam_b, mv_b = best_ls
    print('Chosen LS config:', (mult_b, lam_b, mv_b), '-> worst=', worst, 'mean=', mean_v, flush=True)
    print('LS Step 2: Decode test with best config ...', flush=True)
    out_csv = f'submission_ls_poe_m{str(mult_b).replace(".", "")}_l{str(lam_b).replace(".", "")}_mv{mv_b}.csv'
    decode_test_ls(mult=mult_b, lam=lam_b, max_moves=mv_b, alpha=0.26, out_csv=out_csv)
else:
    print('LS sweep produced no results; skipping decode.')

LS Step 1: OOF sweep (mult in {0.65,0.7}, lambda in {0,0.2}, max_moves in {20,40})...


[OOF-LS] mult=0.65 lam=0.0 max_moves=20


  -> worst=7.670 mean=6.559


[OOF-LS] mult=0.65 lam=0.0 max_moves=40


  -> worst=7.670 mean=6.559


[OOF-LS] mult=0.65 lam=0.2 max_moves=20


  -> worst=7.670 mean=6.559


[OOF-LS] mult=0.65 lam=0.2 max_moves=40


  -> worst=7.670 mean=6.559


[OOF-LS] mult=0.7 lam=0.0 max_moves=20


  -> worst=7.560 mean=6.691


[OOF-LS] mult=0.7 lam=0.0 max_moves=40


  -> worst=7.560 mean=6.691


[OOF-LS] mult=0.7 lam=0.2 max_moves=20


  -> worst=7.560 mean=6.691


[OOF-LS] mult=0.7 lam=0.2 max_moves=40


  -> worst=7.560 mean=6.691


OOF-LS summary (top 5):
(7.56, 6.691270528413384, 0.7, 0.0, 20)
(7.56, 6.691270528413384, 0.7, 0.0, 40)
(7.56, 6.691270528413384, 0.7, 0.2, 20)
(7.56, 6.691270528413384, 0.7, 0.2, 40)
(7.67, 6.559002954717241, 0.65, 0.0, 20)
Chosen LS config: (0.7, 0.0, 20) -> worst= 7.56 mean= 6.691270528413384


LS Step 2: Decode test with best config ...


  [LS] test decoded 20/95 elapsed=0.4s


  [LS] test decoded 40/95 elapsed=0.8s


  [LS] test decoded 60/95 elapsed=1.2s


  [LS] test decoded 80/95 elapsed=1.6s


  [LS] test decoded 95/95 elapsed=1.9s


Wrote submission_ls_poe_m07_l00_mv20.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 13 20 18 11 3 4 6 8 14 10 2 ...
3  303  18 13 4 3 10 14 6 5 19 20 17 2 11 16 9 1 7 15 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 10 1...
submission.csv written -> submission_ls_poe_m07_l00_mv20.csv


In [49]:
# Set best PoE fused (alpha=0.26, smooth+ABA) as current submission
import shutil, pandas as pd, os
src = 'submission_fused_rgb_alpha026.csv'
assert os.path.exists(src), f'Missing {src}'
shutil.copyfile(src, 'submission.csv')
print('submission.csv ->', src)
print(pd.read_csv('submission.csv').head())

submission.csv -> submission_fused_rgb_alpha026.csv
    Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 7 3 1 5 4 6 2 11 15 13 19 9 8 18 14 16 17 1...
2  302  1 17 16 12 5 9 19 13 20 18 11 3 4 6 8 14 10 2 ...
3  303  18 13 4 3 10 14 6 5 19 20 17 2 11 16 9 7 12 1 ...
4  304  8 1 7 3 14 18 13 9 2 11 20 19 5 6 17 16 4 12 1...


In [51]:
# Audio stream: extract wav, build MFCC+delta features, train small 1D-CNN per fold, cache OOF/test probs, fuse with PoE
import os, io, time, tarfile, zipfile, shutil, json, random, math
from pathlib import Path
from typing import List
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.backends.cudnn.benchmark = True
random.seed(42); np.random.seed(42); torch.manual_seed(42)

probs_cache = Path('probs_cache'); probs_cache.mkdir(exist_ok=True)
labels_dir = Path('labels3d_v2/train')

# Helper: id -> tar mapping (same as video streams)
def id_to_tar(sid: int) -> Path | None:
    if 1 <= sid <= 99: return Path('training1.tar.gz')
    if 101 <= sid <= 199: return Path('training2.tar.gz')
    if 200 <= sid <= 299: return Path('training3.tar.gz')
    if 300 <= sid <= 399: return Path('test.tar.gz')
    return None

def split_of_id(sid: int) -> str:
    return 'train' if sid < 300 else 'test'

# Ensure librosa stack available
try:
    import librosa, soundfile as sf
except Exception:
    import subprocess, sys
    subprocess.run([sys.executable, '-m', 'pip', 'install', '-q', '-c', 'constraints.txt', 'librosa==0.10.2', 'soundfile==0.12.1', 'numba==0.59.1', 'llvmlite==0.42.0'], check=True)
    import librosa, soundfile as sf

# Audio cache dirs
aud_wav_dir = Path('audio_wav'); (aud_wav_dir/'train').mkdir(parents=True, exist_ok=True); (aud_wav_dir/'test').mkdir(parents=True, exist_ok=True)
aud_feat_dir = Path('audio_feat'); (aud_feat_dir/'train').mkdir(parents=True, exist_ok=True); (aud_feat_dir/'test').mkdir(parents=True, exist_ok=True)

def extract_audio_wav_to_cache(sid: int) -> Path | None:
    split = split_of_id(sid)
    out_path = aud_wav_dir / split / f"{sid}.wav"
    if out_path.exists():
        return out_path
    tar_p = id_to_tar(sid)
    if tar_p is None or not tar_p.exists():
        print(f'[audio] missing tar for id={sid}:', tar_p); return None
    zip_name = f"Sample{sid:05d}.zip"
    wav_member = f"Sample{sid:05d}_audio.wav"
    try:
        with tarfile.open(tar_p, 'r:gz') as tf:
            m = next((m for m in tf if m.isreg() and Path(m.name).name == zip_name), None)
            if m is None:
                print(f'[audio] zip {zip_name} not found in {tar_p}'); return None
            data = tf.extractfile(m).read()
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
            mem = wav_member if wav_member in names else next((n for n in names if n.lower().endswith('_audio.wav')), None)
            if mem is None:
                print(f'[audio] wav not found for id={sid}')
                return None
            tmp = out_path.with_suffix('.wav.tmp')
            with zf.open(mem) as fsrc, open(tmp, 'wb') as fdst:
                shutil.copyfileobj(fsrc, fdst)
            tmp.replace(out_path)
        return out_path
    except Exception as e:
        print(f'[audio] error id={sid}:', e); return None

# Feature extraction: MFCC (13) + delta (13) -> (T', 26), 16kHz, 25ms window, 10ms hop
def extract_mfcc_feat(wav_path: Path, sr_target: int = 16000, n_mfcc: int = 13, win_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    y, sr = librosa.load(str(wav_path), sr=sr_target, mono=True)
    n_fft = int(round(sr_target * (win_ms/1000.0)))
    hop_length = int(round(sr_target * (hop_ms/1000.0)))
    mfcc = librosa.feature.mfcc(y=y, sr=sr_target, n_mfcc=n_mfcc, n_fft=n_fft, hop_length=hop_length)
    d = librosa.feature.delta(mfcc)
    feat = np.vstack([mfcc, d]).T.astype(np.float32)  # (T', 26)
    return feat

def upsample_to_T_np(E: np.ndarray, T: int) -> np.ndarray:
    if E.shape[0] == T:
        return E.astype(np.float32)
    if E.shape[0] == 0:
        return np.zeros((T, E.shape[1] if E.ndim==2 else 26), dtype=np.float32)
    import torch.nn.functional as Fnn
    x = torch.from_numpy(E.astype(np.float32)).unsqueeze(0).transpose(1,2)  # 1,D,T'
    y = Fnn.interpolate(x, size=T, mode='linear', align_corners=False).transpose(1,2).squeeze(0).contiguous()
    return y.numpy().astype(np.float32)

def get_T_train(sid: int) -> int:
    return int(np.load(labels_dir / f"{sid}.npy").shape[0])

def get_T_test(sid: int) -> int:
    p3 = probs_cache / f"{sid}_ce_v3.npy"
    if p3.exists():
        return int(np.load(p3, mmap_mode='r').shape[1])
    d = np.load(Path('features3d_v3/test')/f"{sid}.npz")
    X = d['X'] if 'X' in d.files else d[d.files[0]]
    return int(X.shape[0])

def cache_audio_feat_for_id(sid: int, force: bool = False) -> Path | None:
    split = split_of_id(sid); out = aud_feat_dir / split / f"{sid}.npy"
    if out.exists() and not force:
        try:
            arr = np.load(out, mmap_mode='r')
            if arr.shape[0] > 0: return out
        except Exception:
            pass
    wav_path = extract_audio_wav_to_cache(sid)
    if wav_path is None: return None
    Fm = extract_mfcc_feat(wav_path)  # (T',26)
    np.save(out, Fm.astype(np.float32))
    return out

def list_ids_from_features(split: str):
    base = Path('features3d_v3')/split
    return sorted(int(p.stem) for p in base.glob('*.npz'))

# Build datasets
class AudioSeqDataset(Dataset):
    def __init__(self, ids: List[int], split: str = 'train', chunk_len: int = 2048):
        self.ids = list(ids); self.split = split; self.chunk_len = chunk_len; self.index = []
        for sid in self.ids:
            E = np.load(aud_feat_dir/'train'/f"{sid}.npy", mmap_mode='r')
            T = get_T_train(sid)
            Eu = upsample_to_T_np(np.array(E), T); n = Eu.shape[0]
            if n <= chunk_len:
                self.index.append((sid, 0, n))
            else:
                s = 0
                while s < n:
                    e = min(n, s + chunk_len); self.index.append((sid, s, e)); s = e
        random.shuffle(self.index)
    def __len__(self): return len(self.index)
    def __getitem__(self, i):
        sid, s, e = self.index[i]
        E = np.load(aud_feat_dir/'train'/f"{sid}.npy", mmap_mode='r')
        T = get_T_train(sid); Eu = upsample_to_T_np(np.array(E), T)
        y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
        x = Eu[s:e].astype(np.float32); t = y[s:e]
        return torch.from_numpy(x), torch.from_numpy(t)

# Simple 1D CNN head
class AudioCNN(nn.Module):
    def __init__(self, in_dim=26, hidden=128, n_classes=21, drop=0.3):
        super().__init__()
        self.conv1 = nn.Conv1d(in_dim, hidden, 5, padding=2)
        self.conv2 = nn.Conv1d(hidden, hidden, 5, padding=2)
        self.conv3 = nn.Conv1d(hidden, n_classes, 1)
        self.drop = nn.Dropout(drop)
        self.gn1 = nn.GroupNorm(8, hidden)
        self.gn2 = nn.GroupNorm(8, hidden)
    def forward(self, x_b_t_d):  # B,T,D
        x = x_b_t_d.transpose(1,2)  # B,D,T
        h = F.relu(self.gn1(self.conv1(x)))
        h = self.drop(F.relu(self.gn2(self.conv2(h))))
        out = self.conv3(h)  # B,C,T
        return out.transpose(1,2)  # B,T,C

def train_audio_fold(train_ids: List[int], val_ids: List[int], epochs: int = 12, lr: float = 1e-3, wd: float = 1e-5, chunk_len: int = 2048, batch_size: int = 1, patience: int = 3):
    model = AudioCNN().to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    best = 1e18; bad = 0
    for ep in range(1, epochs+1):
        t0=time.time(); model.train()
        ds = AudioSeqDataset(train_ids, split='train', chunk_len=chunk_len)
        dl = DataLoader(ds, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
        tr_loss=0.0; n_tok=0
        for xb, yb in dl:
            xb = xb.to(device, non_blocking=True); yb = yb.to(device, non_blocking=True)
            opt.zero_grad(set_to_none=True)
            logits = model(xb)
            loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), yb.reshape(-1))
            loss.backward(); opt.step()
            tr_loss += float(loss.item()) * yb.numel(); n_tok += int(yb.numel())
        tr_loss /= max(1, n_tok)
        # val
        model.eval(); val_loss=0.0; n_tok=0
        with torch.no_grad():
            for sid in val_ids:
                E = np.load(aud_feat_dir/'train'/f"{sid}.npy", mmap_mode='r')
                T = get_T_train(sid); Eu = upsample_to_T_np(np.array(E), T)
                y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
                xb = torch.from_numpy(Eu).unsqueeze(0).to(device)
                logits = model(xb)
                ll = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), torch.from_numpy(y).to(device))
                val_loss += float(ll.item()) * int(T); n_tok += int(T)
        val_loss /= max(1, n_tok)
        print(f"[AUDIO fold] ep {ep:02d} tr_nll={tr_loss:.4f} val_nll={val_loss:.4f} elapsed={time.time()-t0:.1f}s", flush=True)
        if val_loss < best - 1e-4:
            best = val_loss; bad=0; torch.save(model.state_dict(), 'audio_head_tmp.pth')
        else:
            bad += 1
            if bad >= patience: break
    model.load_state_dict(torch.load('audio_head_tmp.pth', map_location=device))
    return model

def infer_probs_audio(model: nn.Module, ids: List[int], split: str, out_suffix: str):
    model.eval(); saved=0; t0=time.time()
    with torch.no_grad():
        for i, sid in enumerate(ids, 1):
            if split=='train':
                T = get_T_train(sid); emb_path = aud_feat_dir/'train'/f"{sid}.npy"
            else:
                T = get_T_test(sid); emb_path = aud_feat_dir/'test'/f"{sid}.npy"
            if not emb_path.exists():
                # on-the-fly feat extraction if missing
                cache_audio_feat_for_id(int(sid), force=False)
            if not emb_path.exists():
                print(f"  [AUDIO infer] missing feat for id={sid}, skip"); continue
            E = np.load(emb_path, mmap_mode='r')
            Eu = upsample_to_T_np(np.array(E), T)
            xb = torch.from_numpy(Eu).unsqueeze(0).to(device)
            logits = model(xb)[0]  # T,C
            p = logits.softmax(dim=-1).cpu().numpy().astype(np.float32).T  # CxT
            p /= (p.sum(axis=0, keepdims=True) + 1e-8)
            np.save(probs_cache/f"{sid}{out_suffix}", p); saved += 1
            if (i%20)==0 or i==len(ids):
                print(f"  [AUDIO infer] saved {saved}/{len(ids)} split={split} {out_suffix} elapsed={time.time()-t0:.1f}s", flush=True)

def fit_scalar_temperature_on_val(val_ids: List[int], suffix: str) -> float:
    grid = [round(x,2) for x in np.linspace(0.8, 1.5, 15)]
    best_T=1.0; best_nll=1e18
    for Tval in grid:
        nll=0.0; n_tok=0
        for sid in val_ids:
            p = np.load(probs_cache/f"{sid}{suffix}")  # CxT
            y = np.load(labels_dir/f"{sid}.npy").astype(np.int64)
            logp = np.log(np.clip(p, 1e-8, 1.0)) / float(Tval)
            q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
            idx = (y >= 0) & (y < q.shape[0]); yy = y[idx]
            nll += -float(np.log(q[yy, np.nonzero(idx)[0]] + 1e-8).sum()); n_tok += int(idx.sum())
        if n_tok>0:
            nll /= float(n_tok)
            if nll < best_nll: best_nll = nll; best_T = Tval
    print(f"[AUDIO Temp] best T={best_T} NLL={best_nll:.4f} on {len(val_ids)} val ids", flush=True)
    return float(best_T)

def apply_scalar_temperature(ids: List[int], suffix: str, Tscalar: float):
    for sid in ids:
        p = np.load(probs_cache/f"{sid}{suffix}")
        logp = np.log(np.clip(p, 1e-8, 1.0)) / float(Tscalar)
        q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
        np.save(probs_cache/f"{sid}{suffix}", q.astype(np.float32))

def average_test_audio_with_fold_temps():
    # temps saved as audio_temp_fold{f}.json
    Ts=[]
    for f in range(3):
        jf = Path(f'audio_temp_fold{f}.json')
        if jf.exists():
            try: Ts.append(float(json.loads(jf.read_text())['T']))
            except Exception: Ts.append(1.0)
        else: Ts.append(1.0)
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    n_avg=0
    for sid in test_ids:
        arrs=[]
        for f in range(3):
            p = probs_cache/f"{sid}_aud_f{f}.npy"
            if p.exists():
                a = np.load(p, mmap_mode='r').astype(np.float32)
                logp = np.log(np.clip(a, 1e-8, 1.0)) / float(Ts[f])
                q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
                arrs.append(q)
        if not arrs: continue
        Tm = min(a.shape[1] for a in arrs); arrs = [a[:, :Tm] for a in arrs]
        m = np.mean(arrs, axis=0); m /= (m.sum(axis=0, keepdims=True) + 1e-8)
        np.save(probs_cache/f"{sid}_aud.npy", m.astype(np.float32)); n_avg+=1
    print('Averaged TEST per-fold -> aud.npy for', n_avg, 'ids')

# Phase A: Extract audio features for all ids (train/test)
def bulk_cache_audio_feats(ids: List[int]):
    t0=time.time(); ok=0; skip=0; fail=0
    for i, sid in enumerate(ids, 1):
        out = aud_feat_dir/split_of_id(sid)/f"{sid}.npy"
        if out.exists():
            try:
                arr = np.load(out, mmap_mode='r')
                if arr.shape[0] > 0:
                    skip += 1
                    if (i%20)==0 or i==len(ids):
                        print(f'  [audio] skip {i}/{len(ids)} elapsed={time.time()-t0:.1f}s', flush=True)
                    continue
            except Exception:
                pass
        p = cache_audio_feat_for_id(int(sid), force=False)
        if p is None: fail += 1
        else: ok += 1
        if (i%20)==0 or i==len(ids):
            print(f'  [audio] processed {i}/{len(ids)} ok={ok} skip={skip} fail={fail} elapsed={time.time()-t0:.1f}s', flush=True)
    print(f'[audio] Done: ok={ok} skip={skip} fail={fail} total={len(ids)} elapsed={time.time()-t0:.1f}s')

train_ids = list_ids_from_features('train')
test_ids = list_ids_from_features('test')
print('Audio: caching MFCC features for TRAIN...', flush=True)
bulk_cache_audio_feats(train_ids)
print('Audio: caching MFCC features for TEST...', flush=True)
bulk_cache_audio_feats(test_ids)

# Phase B: Train per-fold audio head, cache OOF/test probs, fit temps, average test
folds_list = json.load(open('folds_archive_cv.json','r'))
test_ids_list = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
for fd in folds_list:
    fidx = int(fd['fold'])
    tr_ids = list(map(int, fd['train_ids']))
    va_ids = list(map(int, fd['val_ids']))
    print(f'Audio Fold {fidx}: train={len(tr_ids)} val={len(va_ids)}', flush=True)
    model = train_audio_fold(tr_ids, va_ids, epochs=10, lr=1e-3, wd=1e-5, chunk_len=2048, batch_size=1, patience=3)
    infer_probs_audio(model, va_ids, split='train', out_suffix='_aud.npy')
    infer_probs_audio(model, test_ids_list, split='test', out_suffix=f'_aud_f{fidx}.npy')
    Tbest = fit_scalar_temperature_on_val(va_ids, suffix='_aud.npy')
    apply_scalar_temperature(va_ids, suffix='_aud.npy', Tscalar=Tbest)
    Path(f'audio_temp_fold{fidx}.json').write_text(json.dumps({'T': float(Tbest)}))

print('Averaging TEST audio per-fold with temps ...', flush=True)
average_test_audio_with_fold_temps()

# Phase C: Fuse with skeleton + RGB using PoE with audio weight gamma; OOF tune gamma and create submission

# Reuse helpers from earlier cells in notebook: load_probs (aligned v2+v3), align_rgb_to_skel as generic align, fuse_geometric,
# compute_runlen_stats, build_min_dur, decode_minseg_smooth_aba, load_frame_labels, compress_to_sequence, levenshtein, make_perm20

def fuse_poe_triple(ps: np.ndarray, pr: np.ndarray | None, pa: np.ndarray | None, alpha: float, gamma: float) -> np.ndarray:
    # ps: skeleton CxT; pr: rgb aligned CxT or None; pa: audio aligned CxT or None
    w_s = 1.0 - float(alpha) - float(gamma)
    w_s = max(0.0, w_s)
    logp = w_s * np.log(np.clip(ps, 1e-8, 1.0))
    if pr is not None and alpha > 0.0:
        logp += float(alpha) * np.log(np.clip(pr, 1e-8, 1.0))
    if pa is not None and gamma > 0.0:
        logp += float(gamma) * np.log(np.clip(pa, 1e-8, 1.0))
    q = np.exp(logp); q /= (q.sum(axis=0, keepdims=True) + 1e-8)
    return q.astype(np.float32)

def align_stream_to_skel(p_stream: np.ndarray, p_skel: np.ndarray, max_shift: int = 15):
    return align_rgb_to_skel(p_stream, p_skel, max_shift=max_shift)  # reuse

def _crop_common_lengths(ps_base: np.ndarray, pr_aligned: np.ndarray | None, pa_aligned: np.ndarray | None):
    Tm = ps_base.shape[1]
    if pr_aligned is not None: Tm = min(Tm, pr_aligned.shape[1])
    if pa_aligned is not None: Tm = min(Tm, pa_aligned.shape[1])
    ps_c = ps_base[:, :Tm]
    pr_c = pr_aligned[:, :Tm] if pr_aligned is not None else None
    pa_c = pa_aligned[:, :Tm] if pa_aligned is not None else None
    return ps_c, pr_c, pa_c

def oof_tune_audio_gamma(alpha_fixed: float = 0.26, gamma_list=(0.10,0.15,0.20,0.25), mult: float = 0.7, smooth_k: int = 5, aba_len: int = 2, aba_ratio: float = 1.04):
    folds_local = json.load(open('folds_archive_cv.json','r'))
    worst_by={}; mean_by={}
    for gamma in gamma_list:
        per_fold=[]
        print(f'[OOF-AUDIO] gamma={gamma}', flush=True)
        for fd in folds_local:
            tr_ids = list(map(int, fd['train_ids'])); va_ids = list(map(int, fd['val_ids']))
            med, q75 = compute_runlen_stats(tr_ids); md = build_min_dur(med, q75, mult=mult)
            dists=[]
            for sid in va_ids:
                # skeleton fused v2+v3 aligned
                p_skel = load_probs(int(sid)).astype(np.float32)
                # RGB optional
                prgbp = probs_cache/f"{sid}_rgb.npy"
                pr_aligned = None
                ps_base = p_skel
                if prgbp.exists():
                    pr = np.load(prgbp).astype(np.float32)
                    pr_aligned, ps_base = align_stream_to_skel(pr, p_skel, max_shift=15)
                # Audio optional
                paudp = probs_cache/f"{sid}_aud.npy"
                pa_aligned = None
                if paudp.exists():
                    pa = np.load(paudp).astype(np.float32)
                    pa_aligned, ps_base = align_stream_to_skel(pa, ps_base, max_shift=15)
                # Ensure common length across ps/pr/pa before fusion
                ps_c, pr_c, pa_c = _crop_common_lengths(ps_base, pr_aligned, pa_aligned)
                pf = fuse_poe_triple(ps_c, pr_c, pa_c, alpha=alpha_fixed, gamma=gamma)
                y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
                seq = compress_to_sequence(y_hat); seq_true = compress_to_sequence(load_frame_labels(int(sid)))
                dists.append(levenshtein(seq, seq_true))
            per_fold.append(float(np.mean(dists)) if dists else 0.0)
        worst_by[gamma] = max(per_fold); mean_by[gamma] = float(np.mean(per_fold))
        print(f"  -> worst={worst_by[gamma]:.3f} mean={mean_by[gamma]:.3f}", flush=True)
    print('OOF-AUDIO gamma summary:')
    for g in gamma_list: print(f'  gamma={g}: worst={worst_by[g]:.3f} mean={mean_by[g]:.3f}')
    best_gamma = min(gamma_list, key=lambda g: (worst_by[g], mean_by[g]))
    print('Chosen gamma (by worst then mean):', best_gamma)
    return best_gamma, worst_by, mean_by

def fuse_decode_test_with_audio(alpha: float = 0.26, gamma: float = 0.15, mult: float = 0.7, smooth_k: int = 5, aba_len: int = 2, aba_ratio: float = 1.04, out_csv: str = 'submission_fused_rgbt_audio.csv'):
    all_train_ids=[]
    for fd in json.load(open('folds_archive_cv.json','r')): all_train_ids.extend(list(map(int, fd['train_ids'])))
    med, q75 = compute_runlen_stats(sorted(set(all_train_ids))); md = build_min_dur(med, q75, mult=mult)
    rows=[]; ids=[]; n=0; t0=time.time()
    test_ids = sorted(pd.read_csv('test.csv')['Id'].astype(int).tolist())
    for sid in test_ids:
        p2 = probs_cache/f"{sid}_ce.npy"; p3 = probs_cache/f"{sid}_ce_v3.npy"
        if not (p2.exists() and p3.exists()):
            continue
        p_skel = load_probs(int(sid)).astype(np.float32)
        # RGB optional
        prgbp = probs_cache/f"{sid}_rgb.npy"
        pr_aligned = None
        ps_base = p_skel
        if prgbp.exists():
            pr = np.load(prgbp).astype(np.float32)
            pr_aligned, ps_base = align_stream_to_skel(pr, ps_base, max_shift=15)
        # Audio optional
        paudp = probs_cache/f"{sid}_aud.npy"
        pa_aligned = None
        if paudp.exists():
            pa = np.load(paudp).astype(np.float32)
            pa_aligned, ps_base = align_stream_to_skel(pa, ps_base, max_shift=15)
        # Ensure common length before fusion
        ps_c, pr_c, pa_c = _crop_common_lengths(ps_base, pr_aligned, pa_aligned)
        pf = fuse_poe_triple(ps_c, pr_c, pa_c, alpha=alpha, gamma=gamma)
        y_hat = decode_minseg_smooth_aba(pf, md, smooth_k=smooth_k, aba_len=aba_len, aba_ratio=aba_ratio)
        seq_raw=[]; last=-1
        for c in y_hat:
            if c==0: continue
            if c!=last: seq_raw.append(int(c)); last=int(c)
        seq = make_perm20(seq_raw, pf)
        ids.append(sid); rows.append(' '.join(map(str, seq))); n+=1
        if (n%20)==0 or n==95: print(f'  [AUDIO FUSE] test decoded {n}/95 elapsed={time.time()-t0:.1f}s', flush=True)
    sub = pd.DataFrame({'Id': ids, 'Sequence': rows}).sort_values('Id')
    sub.to_csv(out_csv, index=False); print('Wrote', out_csv, 'rows=', len(sub), 'head:\n', sub.head())
    assert len(sub)==95, f'Expected 95 rows, got {len(sub)}'
    sub.to_csv('submission.csv', index=False); print('submission.csv written ->', out_csv)

print('Audio OOF: tuning gamma with alpha=0.26 ...', flush=True)
best_gamma, wb_g, mb_g = oof_tune_audio_gamma(alpha_fixed=0.26, gamma_list=(0.10,0.15,0.20,0.25), mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04)
print('Audio TEST: fuse decode with best gamma ...', flush=True)
out_csv = f"submission_fused_rgb_audio_g{str(best_gamma).replace('.', '')}.csv"
fuse_decode_test_with_audio(alpha=0.26, gamma=best_gamma, mult=0.7, smooth_k=5, aba_len=2, aba_ratio=1.04, out_csv=out_csv)
print('Audio fusion pipeline complete.')

Audio: caching MFCC features for TRAIN...


  [audio] skip 20/297 elapsed=0.0s


  [audio] skip 40/297 elapsed=0.0s


  [audio] skip 60/297 elapsed=0.0s


  [audio] skip 80/297 elapsed=0.0s


  [audio] skip 100/297 elapsed=0.0s


  [audio] skip 120/297 elapsed=0.0s


  [audio] skip 140/297 elapsed=0.0s


  [audio] skip 160/297 elapsed=0.0s


  [audio] skip 180/297 elapsed=0.0s


  [audio] skip 200/297 elapsed=0.0s


  [audio] skip 220/297 elapsed=0.0s


  [audio] skip 240/297 elapsed=0.0s


  [audio] skip 260/297 elapsed=0.0s


  [audio] skip 280/297 elapsed=0.0s


  [audio] skip 297/297 elapsed=0.0s


[audio] Done: ok=0 skip=297 fail=0 total=297 elapsed=0.0s
Audio: caching MFCC features for TEST...


  [audio] skip 20/95 elapsed=0.0s


  [audio] skip 40/95 elapsed=0.0s


  [audio] skip 60/95 elapsed=0.0s


  [audio] skip 80/95 elapsed=0.0s


[audio] missing tar for id=401: None
[audio] missing tar for id=402: None
[audio] missing tar for id=403: None
  [audio] processed 95/95 ok=0 skip=92 fail=3 elapsed=0.0s


[audio] Done: ok=0 skip=92 fail=3 total=95 elapsed=0.0s
Audio Fold 0: train=199 val=98


[AUDIO fold] ep 01 tr_nll=2.8922 val_nll=2.4531 elapsed=0.8s


[AUDIO fold] ep 02 tr_nll=2.6610 val_nll=2.3254 elapsed=0.8s


[AUDIO fold] ep 03 tr_nll=2.5227 val_nll=2.1471 elapsed=0.8s


[AUDIO fold] ep 04 tr_nll=2.4389 val_nll=2.1655 elapsed=0.8s


[AUDIO fold] ep 05 tr_nll=2.3682 val_nll=2.2237 elapsed=0.9s


[AUDIO fold] ep 06 tr_nll=2.3158 val_nll=2.1486 elapsed=0.9s


[AUDIO Temp] best T=0.85 NLL=2.1252 on 98 val ids


Audio Fold 1: train=198 val=99


[AUDIO fold] ep 01 tr_nll=2.8523 val_nll=2.5330 elapsed=0.8s


[AUDIO fold] ep 02 tr_nll=2.6304 val_nll=2.4371 elapsed=0.8s


[AUDIO fold] ep 03 tr_nll=2.4659 val_nll=2.4376 elapsed=0.8s


[AUDIO fold] ep 04 tr_nll=2.3843 val_nll=2.2258 elapsed=0.8s


[AUDIO fold] ep 05 tr_nll=2.3144 val_nll=2.2373 elapsed=0.8s


[AUDIO fold] ep 06 tr_nll=2.2890 val_nll=2.2719 elapsed=0.8s


[AUDIO fold] ep 07 tr_nll=2.2205 val_nll=2.3472 elapsed=0.8s


  [AUDIO infer] saved 20/99 split=train _aud.npy elapsed=0.0s


  [AUDIO infer] saved 40/99 split=train _aud.npy elapsed=0.1s


  [AUDIO infer] saved 60/99 split=train _aud.npy elapsed=0.1s


  [AUDIO infer] saved 80/99 split=train _aud.npy elapsed=0.1s


  [AUDIO infer] saved 99/99 split=train _aud.npy elapsed=0.1s


  [AUDIO infer] saved 20/95 split=test _aud_f1.npy elapsed=0.0s


  [AUDIO infer] saved 40/95 split=test _aud_f1.npy elapsed=0.1s


  [AUDIO infer] saved 60/95 split=test _aud_f1.npy elapsed=0.1s


  [AUDIO infer] saved 80/95 split=test _aud_f1.npy elapsed=0.1s


[audio] missing tar for id=401: None
  [AUDIO infer] missing feat for id=401, skip
[audio] missing tar for id=402: None
  [AUDIO infer] missing feat for id=402, skip
[audio] missing tar for id=403: None
  [AUDIO infer] missing feat for id=403, skip


[AUDIO Temp] best T=0.8 NLL=2.1927 on 99 val ids


Audio Fold 2: train=197 val=100


[AUDIO fold] ep 01 tr_nll=2.4829 val_nll=3.1863 elapsed=0.8s


[AUDIO fold] ep 02 tr_nll=2.2852 val_nll=3.3736 elapsed=0.8s


[AUDIO fold] ep 03 tr_nll=2.1809 val_nll=3.0701 elapsed=0.8s


[AUDIO fold] ep 04 tr_nll=2.0983 val_nll=3.0451 elapsed=0.8s


[AUDIO fold] ep 05 tr_nll=2.0480 val_nll=2.8269 elapsed=0.8s


[AUDIO fold] ep 06 tr_nll=2.0040 val_nll=3.0366 elapsed=0.8s


[AUDIO fold] ep 07 tr_nll=1.9670 val_nll=2.9255 elapsed=0.9s


[AUDIO fold] ep 08 tr_nll=1.9335 val_nll=2.9271 elapsed=0.8s


  [AUDIO infer] saved 20/100 split=train _aud.npy elapsed=0.0s


  [AUDIO infer] saved 40/100 split=train _aud.npy elapsed=0.1s


  [AUDIO infer] saved 60/100 split=train _aud.npy elapsed=0.1s


  [AUDIO infer] saved 80/100 split=train _aud.npy elapsed=0.1s


  [AUDIO infer] saved 100/100 split=train _aud.npy elapsed=0.1s


  [AUDIO infer] saved 20/95 split=test _aud_f2.npy elapsed=0.0s


  [AUDIO infer] saved 40/95 split=test _aud_f2.npy elapsed=0.1s


  [AUDIO infer] saved 60/95 split=test _aud_f2.npy elapsed=0.1s


  [AUDIO infer] saved 80/95 split=test _aud_f2.npy elapsed=0.1s


[audio] missing tar for id=401: None
  [AUDIO infer] missing feat for id=401, skip
[audio] missing tar for id=402: None
  [AUDIO infer] missing feat for id=402, skip
[audio] missing tar for id=403: None
  [AUDIO infer] missing feat for id=403, skip


[AUDIO Temp] best T=1.5 NLL=2.7328 on 100 val ids


Averaging TEST audio per-fold with temps ...


Averaged TEST per-fold -> aud.npy for 92 ids
Audio OOF: tuning gamma with alpha=0.26 ...


[OOF-AUDIO] gamma=0.1


  -> worst=4.530 mean=3.758


[OOF-AUDIO] gamma=0.15


  -> worst=4.460 mean=3.627


[OOF-AUDIO] gamma=0.2


  -> worst=4.420 mean=3.627


[OOF-AUDIO] gamma=0.25


  -> worst=4.280 mean=3.526


OOF-AUDIO gamma summary:
  gamma=0.1: worst=4.530 mean=3.758
  gamma=0.15: worst=4.460 mean=3.627
  gamma=0.2: worst=4.420 mean=3.627
  gamma=0.25: worst=4.280 mean=3.526
Chosen gamma (by worst then mean): 0.25
Audio TEST: fuse decode with best gamma ...


  [AUDIO FUSE] test decoded 20/95 elapsed=0.4s


  [AUDIO FUSE] test decoded 40/95 elapsed=0.9s


  [AUDIO FUSE] test decoded 60/95 elapsed=1.3s


  [AUDIO FUSE] test decoded 80/95 elapsed=1.7s


  [AUDIO FUSE] test decoded 95/95 elapsed=2.0s


Wrote submission_fused_rgb_audio_g025.csv rows= 95 head:
     Id                                           Sequence
0  300  5 9 1 2 18 3 8 4 20 13 12 15 14 11 6 16 7 10 1...
1  301  10 12 1 5 4 6 2 11 15 13 19 9 8 18 14 3 16 17 ...
2  302  1 17 16 12 5 9 19 7 13 20 18 11 3 4 6 15 8 14 ...
3  303  18 13 4 3 10 14 6 5 19 20 17 2 11 16 8 9 7 12 ...
4  304  8 1 7 12 14 18 13 9 2 11 3 20 19 5 6 17 16 4 1...
submission.csv written -> submission_fused_rgb_audio_g025.csv
Audio fusion pipeline complete.
