# Plan: TensorFlow Speech Recognition Challenge

Objectives:
- Build a robust, GPU-accelerated audio pipeline (log-mel spectrograms) with strong CV.
- Ship a fast baseline ASAP; iterate toward a medal via feature engineering, augmentation, and ensembling if time allows.

Data understanding:
- Repo contains train/ and test/ directories; sample_submission.csv provides ID/label format.
- Labels will be exactly the 12 classes: 10 target words + unknown + silence. Map all non-target folders to unknown; generate silence from _background_noise_.
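
The mapping rule above can be sketched as a small helper. The name `folder_to_label` is illustrative and does not appear in the notebook; the word list and the background folder name are the ones used in the cells below.

```python
TARGET_WORDS = ['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go']
BG_FOLDER = '_background_noise_'
CLASSES = TARGET_WORDS + ['unknown', 'silence']  # 12 labels in total

def folder_to_label(folder: str) -> str:
    """Map a train folder name to one of the 12 competition labels (hypothetical helper)."""
    if folder in TARGET_WORDS:
        return folder
    if folder == BG_FOLDER:
        return 'silence'  # silence clips are cut from the background-noise recordings
    return 'unknown'      # every other spoken word collapses into 'unknown'

print([folder_to_label(f) for f in ['yes', 'bed', '_background_noise_']])
```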

Validation:
- Use StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42) with groups=speaker_id (filename prefix before "_nohash_") and y=label.
- Deterministic seed; cache fold splits and features.
- Fit normalization on train-fold only (no global stats).
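
A minimal sketch of train-fold-only normalization, assuming features are stored as an array of shape [N, n_mels, T]. The helper names `fit_cmvn` and `apply_cmvn` and the synthetic data are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def fit_cmvn(train_feats: np.ndarray):
    """Per-frequency mean/std over all clips and frames of the train fold."""
    mean = train_feats.mean(axis=(0, 2), keepdims=True)       # [1, n_mels, 1]
    std = train_feats.std(axis=(0, 2), keepdims=True) + 1e-8  # avoid division by zero
    return mean, std

def apply_cmvn(feats: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    return (feats - mean) / std

# Synthetic stand-in for log-mel features: 100 clips, 64 mel bands, 101 frames
rng = np.random.default_rng(0)
X = rng.normal(loc=-5.0, scale=3.0, size=(100, 64, 101))
train_idx, val_idx = np.arange(80), np.arange(80, 100)

mean, std = fit_cmvn(X[train_idx])          # statistics come from the train fold only
X_tr = apply_cmvn(X[train_idx], mean, std)
X_va = apply_cmvn(X[val_idx], mean, std)    # validation reuses the train-fold statistics
```

Because the validation fold never contributes to `mean` or `std`, its normalized values are close to, but not exactly, zero-mean and unit-variance.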

Baseline v1 (fast & strong):
- Precompute log-mel spectrograms (SR=16k mono): n_fft=512, win=400 (25ms), hop=160 (10ms), n_mels=64, fmin≈20, fmax=8000; log(mel+1e-6).
- Normalization: per-utterance z-norm or train-fold CMVN (per-frequency).
- Backbone: ResNet18 2D (1 input channel), global avg pool, dropout 0.2, label smoothing 0.05.
- Optimizer: AdamW (lr=3e-3, wd=1e-4), cosine LR 30 epochs, warmup 1–2, early stopping on val acc.
- Augmentations: time shift ±120 ms, background noise mixing (SNR 5–20 dB), SpecAugment (2 time masks of up to 24 frames, 2 frequency masks of up to 10 bins), mixup (α=0.3, p=0.5); optional speed perturbation 0.9–1.1×.
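
The background-noise mixing item can be illustrated as follows. This is a sketch under the usual definition SNR = 10·log10(P_speech / P_noise); the function name `mix_at_snr` and the synthetic signals are assumptions, not code from the notebook.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_speech == 0 or p_noise == 0:
        return speech.copy()  # nothing sensible to scale against
    target_p_noise = p_speech / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_p_noise / p_noise)
    return speech + gain * noise

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = 0.1 * np.sin(2 * np.pi * 440 * t)  # synthetic 1 s tone standing in for a clip
noise = rng.normal(size=16000)              # synthetic stand-in for a background crop
snr_db = rng.uniform(5, 20)                 # SNR range from the plan
mixed = mix_at_snr(speech, noise, snr_db)
```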

Improvements:
- Tune mel params (64→128 mels); add a second feature config for diversity; TTA with time shifts.
- Add EfficientNet-B0 or second seed; average logits across folds/models.

Pipeline steps:
1) Environment check: GPU availability and versions.
2) Data inventory: list classes, derive label mapping (targets, unknown, silence), extract speaker_id.
3) Feature builder: precompute/cached log-mels to .npy/.pt (float16) for train/test.
4) Grouped stratified 5-fold training with progress logging; save OOF + per-fold test logits.
5) Inference: fold-average logits; optional TTA; create submission.csv with exact label order.
6) Error analysis on OOF; iterate.
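
Step 5 can be sketched as below, assuming per-fold test logits are stacked into an array of shape [n_folds, N, n_classes]. The function name `average_and_decode` and the toy logits are illustrative only.

```python
import numpy as np

CLASSES = ['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go',
           'unknown', 'silence']

def average_and_decode(fold_logits: np.ndarray) -> list:
    """Average logits over folds, then map the argmax index back to a label string."""
    mean_logits = fold_logits.mean(axis=0)  # [N, n_classes]
    return [CLASSES[i] for i in mean_logits.argmax(axis=1)]

# Toy example: 2 folds, 2 clips, 12 classes
fold_logits = np.array([
    [[2.0, 0.0] + [0.0] * 10, [0.0] * 11 + [1.0]],
    [[1.0, 1.5] + [0.0] * 10, [0.0] * 11 + [3.0]],
])
labels = average_and_decode(fold_logits)
print(labels)
```

Keeping a single `CLASSES` list for both training and decoding is what guarantees the "exact label order" mentioned in step 5.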

Risks & mitigations:
- CV/LB mismatch: enforce speaker-grouped stratification.
- Label traps (unknown/silence): strict mapping and consistent proportions.
- Leakage: fit normalization per train-fold only.
- Slow I/O: cache spectrograms; DataLoader tuning.

Next actions:
- Run environment check and data inventory (including sample_submission labels).
- Implement fast spectrogram extractor and ResNet18 baseline.
- Establish and save folds; smoke test on 1–2 folds, then full 5-fold.
- After baseline, request expert review for targeted improvements.

In [1]:
# Environment & Data Inventory
import os, sys, subprocess, time, json, math, random, glob, re
from pathlib import Path
import pandas as pd

def log(msg):
    print(f"[{time.strftime('%H:%M:%S')}] {msg}")

# Torch install/check
log("Checking PyTorch and GPU...")
try:
    import torch
except Exception as e:
    log(f"PyTorch not found, installing... ({e})")
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'torch==2.4.0', 'torchvision==0.19.0', 'torchaudio==2.4.0', '--index-url', 'https://download.pytorch.org/whl/cu121'], check=True)
    import torch

gpu_available = torch.cuda.is_available()
log(f"GPU Available: {gpu_available}")
if gpu_available:
    log(f"GPU Count: {torch.cuda.device_count()}")
    log(f"GPU Name: {torch.cuda.get_device_name(0)}")
    props = torch.cuda.get_device_properties(0)
    log(f"GPU Memory: {props.total_memory / 1024**3:.1f} GB")

ROOT = Path('.')
TRAIN_DIR = ROOT / 'train'
TEST_DIR = ROOT / 'test'
SAMPLE_SUB = ROOT / 'sample_submission.csv'

log("Listing directories...")
log(f"CWD: {ROOT.resolve()}")
log(f"Train exists: {TRAIN_DIR.exists()} | Test exists: {TEST_DIR.exists()} | sample_submission exists: {SAMPLE_SUB.exists()}")

# Read sample_submission (to confirm format/labels)
if SAMPLE_SUB.exists():
    df_ss = pd.read_csv(SAMPLE_SUB)
    log(f"sample_submission shape: {df_ss.shape}")
    log("sample_submission head:")
    print(df_ss.head(3))
else:
    df_ss = None

# Inspect train structure
log("Scanning train folders...")
train_classes = []
if TRAIN_DIR.exists():
    for p in sorted(TRAIN_DIR.iterdir()):
        if p.is_dir():
            train_classes.append(p.name)

TARGET_WORDS = ['yes','no','up','down','left','right','on','off','stop','go']
log(f"Found {len(train_classes)} train folders: {train_classes}")
bg_folder = '_background_noise_'
has_bg = bg_folder in train_classes
log(f"Background noise folder present: {has_bg}")

# Count wav files per folder
folder_counts = {}
total_files = 0
for cls in train_classes:
    wavs = glob.glob(str(TRAIN_DIR / cls / '*.wav'))
    folder_counts[cls] = len(wavs)
    total_files += len(wavs)
log(f"Total train wav files: {total_files}")
top_counts = sorted(folder_counts.items(), key=lambda x: -x[1])[:10]
log(f"Top folders by count: {top_counts}")

# Speaker id helper
speaker_re = re.compile(r'^(?P<spk>[^_]+)_nohash_')
def get_speaker_id(fname: str) -> str:
    m = speaker_re.match(Path(fname).stem)
    return m.group('spk') if m else Path(fname).stem.split('_')[0]

# Sample a few files and extract speaker ids
sample_files = []
for cls in train_classes:
    sample_files.extend(glob.glob(str(TRAIN_DIR / cls / '*.wav'))[:3])
log(f"Sampling {len(sample_files)} files for speaker_id parsing...")
spk_samples = [(Path(f).name, get_speaker_id(f)) for f in sample_files[:10]]
print(spk_samples)

# Test dir inventory
log("Scanning test set...")
# '**' only matches nested directories when recursive=True
test_files = glob.glob(str(TEST_DIR / '**' / '*.wav'), recursive=True) if TEST_DIR.exists() else []
log(f"Total test wav files: {len(test_files)}")
print(test_files[:5])

# Summarize mapping plan
non_target = sorted([c for c in train_classes if c not in TARGET_WORDS and c != bg_folder])
log(f"Non-target folders (to be mapped to 'unknown'): count={len(non_target)}")
print(non_target[:20])

log("Environment & data inventory complete.")

[05:06:48] Checking PyTorch and GPU...
[05:06:48] PyTorch not found, installing... (No module named 'torch')


[05:07:43] GPU Available: False
[05:07:43] Listing directories...
[05:07:43] CWD: /var/lib/simon/agent_run_states/tensorflow-speech-recognition-challenge-20250922-045634
[05:07:43] Train exists: True | Test exists: True | sample_submission exists: True
[05:07:43] sample_submission shape: (6473, 2)
[05:07:43] sample_submission head:
               fname    label
0  clip_00000000.wav  silence
1  clip_00000001.wav  silence
2  clip_00000002.wav  silence
[05:07:43] Scanning train folders...
[05:07:43] Found 1 train folders: ['audio']
[05:07:43] Background noise folder present: False
[05:07:43] Total train wav files: 0
[05:07:43] Top folders by count: [('audio', 0)]
[05:07:43] Sampling 0 files for speaker_id parsing...
[]
[05:07:43] Scanning test set...
[05:07:43] Total test wav files: 6473
['test/audio/clip_00002578.wav', 'test/audio/clip_00001415.wav', 'test/audio/clip_00004142.wav', 'test/audio/clip_00002943.wav', 'test/audio/clip_00000977.wav']
[05:07:43] Non-target folders (to be mapped

In [2]:
# Deep inventory: list files under root/train/test and look for metadata
import os
from pathlib import Path
import itertools

def list_dir(path, max_entries=50):
    p = Path(path)
    entries = []
    if not p.exists():
        return entries
    for i, fp in enumerate(p.rglob('*')):
        if i >= max_entries:
            break
        entries.append(str(fp))
    return entries

print('Root files:')
for f in sorted(Path('.').glob('*')):
    print(' -', f)

print('\nAny CSV/NPY/NPZ/JSON in root:')
for ext in ['*.csv','*.npy','*.npz','*.json','*.parquet']:
    for f in sorted(Path('.').glob(ext)):
        print(' -', f)

print('\nTop files under train/:')
for fp in list_dir('train', max_entries=200):
    print(' -', fp)

print('\nTop files under test/:')
for fp in list_dir('test', max_entries=50):
    print(' -', fp)

print('\nLook for metadata files anywhere:')
meta_patterns = ['*train*.csv','*labels*.csv','*meta*.csv','*.parquet','*train*.json']
for pat in meta_patterns:
    for f in sorted(Path('.').rglob(pat)):
        print(' -', f)

Root files:
 - .00_eda_and_planning_kernel_state.json
 - 00_eda_and_planning.ipynb
 - agent_metadata
 - description.md
 - docker_run.log
 - requirements.txt
 - sample_submission.csv
 - submission.csv
 - task.txt
 - test
 - train

Any CSV/NPY/NPZ/JSON in root:
 - sample_submission.csv
 - submission.csv
 - .00_eda_and_planning_kernel_state.json

Top files under train/:
 - train/audio
 - train/audio/wow
 - train/audio/zero
 - train/audio/up
 - train/audio/seven
 - train/audio/_background_noise_
 - train/audio/no
 - train/audio/five
 - train/audio/dog
 - train/audio/eight
 - train/audio/sheila
 - train/audio/bed
 - train/audio/two
 - train/audio/six
 - train/audio/marvin
 - train/audio/left
 - train/audio/down
 - train/audio/stop
 - train/audio/happy
 - train/audio/on
 - train/audio/off
 - train/audio/house
 - train/audio/right
 - train/audio/four
 - train/audio/yes
 - train/audio/go
 - train/audio/nine
 - train/audio/cat
 - train/audio/bird
 - train/audio/tree
 - train/audio/three
 - trai

In [3]:
# Build train dataframe with 12-class mapping (targets, unknown, silence)
import glob, math, random, re
from pathlib import Path
import pandas as pd
import numpy as np

ROOT = Path('.')
AUDIO_DIR = ROOT / 'train' / 'audio'
TEST_DIR = ROOT / 'test' / 'audio'
TARGET_WORDS = ['yes','no','up','down','left','right','on','off','stop','go']
BG_DIR = AUDIO_DIR / '_background_noise_'

def get_speaker_id_from_path(fp: Path) -> str:
    m = re.match(r'^(?P<spk>[^_]+)_nohash_', fp.stem)
    return m.group('spk') if m else fp.stem.split('_')[0]

# Enumerate class folders
class_dirs = [p for p in AUDIO_DIR.iterdir() if p.is_dir()]
class_names = sorted([p.name for p in class_dirs])
print('Class folders (train/audio):', class_names)

# Split into targets, background, and non-target (unknown source)
bg_present = BG_DIR.exists()
non_target_classes = [c for c in class_names if c not in TARGET_WORDS and c != '_background_noise_']
print(f'Background present: {bg_present}; non-target classes (unknown sources): {len(non_target_classes)}')

# Gather file paths
def gather_wavs(dirpath: Path):
    return sorted([Path(p) for p in glob.glob(str(dirpath / '*.wav'))])

rows = []
for cls in TARGET_WORDS:
    cls_dir = AUDIO_DIR / cls
    for fp in gather_wavs(cls_dir):
        rows.append({'path': str(fp), 'label': cls, 'speaker': get_speaker_id_from_path(fp), 'kind': 'target'})

for cls in non_target_classes:
    cls_dir = AUDIO_DIR / cls
    for fp in gather_wavs(cls_dir):
        rows.append({'path': str(fp), 'label': 'unknown', 'speaker': get_speaker_id_from_path(fp), 'kind': 'unknown'})

df_train = pd.DataFrame(rows)
print('Train rows (target+unknown):', df_train.shape)
print(df_train.groupby('label').size().sort_values(ascending=False).head(15))

# Build silence examples from background noise: create ~10% of train size
silence_rows = []
if bg_present:
    bg_files = gather_wavs(BG_DIR)
    # Some files in _background_noise_ are long; random 1 s offsets are sampled during feature extraction.
    n_silence = max(1, int(0.10 * len(df_train)))
    for i in range(n_silence):
        # Cycle through the noise files; bg_index identifies the planned crop for this row.
        fp = bg_files[i % len(bg_files)]
        silence_rows.append({'path': str(fp), 'label': 'silence', 'speaker': f'silence_{i}', 'kind': 'silence', 'bg_index': i})
    print(f'Generated planned silence entries: {len(silence_rows)}')
else:
    print('Warning: _background_noise_ not found; no explicit silence examples will be created.')

if len(silence_rows) > 0:
    df_sil = pd.DataFrame(silence_rows)
    df_train = pd.concat([df_train, df_sil], ignore_index=True)

print('Final train rows (incl. silence if any):', df_train.shape)
print(df_train['label'].value_counts().head(15))

# Test dataframe
test_files = sorted([Path(p) for p in glob.glob(str(TEST_DIR / '*.wav'))])
df_test = pd.DataFrame({'fname': [p.name for p in test_files], 'path': [str(p) for p in test_files]})
print('Test rows:', df_test.shape)
print(df_test.head())

# Save metadata for reuse
df_train.to_csv('train_meta.csv', index=False)
df_test.to_csv('test_meta.csv', index=False)
print('Saved train_meta.csv and test_meta.csv.')

Class folders (train/audio): ['_background_noise_', 'bed', 'bird', 'cat', 'dog', 'down', 'eight', 'five', 'four', 'go', 'happy', 'house', 'left', 'marvin', 'nine', 'no', 'off', 'on', 'one', 'right', 'seven', 'sheila', 'six', 'stop', 'three', 'tree', 'two', 'up', 'wow', 'yes', 'zero']
Background present: True; non-target classes (unknown sources): 20


Train rows (target+unknown): (58249, 4)
label
unknown    36932
stop        2157
no          2146
up          2146
on          2142
down        2141
yes         2138
go          2133
off         2124
right       2114
left        2076
dtype: int64
Generated planned silence entries: 5824
Final train rows (incl. silence if any): (64073, 5)
label
unknown    36932
silence     5824
stop        2157
no          2146
up          2146
on          2142
down        2141
yes         2138
go          2133
off         2124
right       2114
left        2076
Name: count, dtype: int64
Test rows: (6473, 2)
               fname                          path
0  clip_00000000.wav  test/audio/clip_00000000.wav
1  clip_00000001.wav  test/audio/clip_00000001.wav
2  clip_00000002.wav  test/audio/clip_00000002.wav
3  clip_00000003.wav  test/audio/clip_00000003.wav
4  clip_00000004.wav  test/audio/clip_00000004.wav
Saved train_meta.csv and test_meta.csv.
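
The counts above are heavily skewed toward `unknown`. One common remedy is inverse-frequency ("balanced") class weighting, w_c = N / (K · n_c). The sketch below applies that formula to the counts printed above; whether and how the later training cells use such weights is not shown here, so treat this as an illustration only.

```python
# Class counts copied from the output above
counts = {'unknown': 36932, 'silence': 5824, 'stop': 2157, 'no': 2146, 'up': 2146,
          'on': 2142, 'down': 2141, 'yes': 2138, 'go': 2133, 'off': 2124,
          'right': 2114, 'left': 2076}

n_total = sum(counts.values())  # total training rows
n_classes = len(counts)         # 12

# Balanced heuristic: rare classes get weights above 1, frequent ones below 1
class_weight = {c: n_total / (n_classes * n) for c, n in counts.items()}

for c, w in sorted(class_weight.items(), key=lambda kv: kv[1]):
    print(f'{c:8s} {w:.3f}')
```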


In [4]:
# Feature extraction: log-mel spectrograms (per-utterance z-norm), cache to .npy
import sys, subprocess, time, math, json, os
import numpy as np
import pandas as pd
from pathlib import Path

# Install deps for audio feature extraction
try:
    import librosa, soundfile as sf
except Exception as e:
    print('Installing librosa & soundfile...', e)
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'librosa==0.10.1', 'soundfile==0.12.1', 'numba==0.59.1'], check=True)
    import librosa, soundfile as sf

SR = 16000
N_MELS = 64
N_FFT = 512
WIN_LENGTH = 400
HOP_LENGTH = 160
FMIN, FMAX = 20, 8000
EPS = 1e-6
FIX_DURATION = 1.0

def load_audio_1s(path: str, rng: np.random.Generator | None = None) -> np.ndarray:
    y, sr = librosa.load(path, sr=SR, mono=True)
    target_len = int(FIX_DURATION * SR)
    if len(y) < target_len:
        y = np.pad(y, (0, target_len - len(y)))
    elif len(y) > target_len:
        # random crop if rng provided, else center crop
        if rng is not None:
            start = rng.integers(0, len(y) - target_len + 1)
        else:
            start = (len(y) - target_len) // 2
        y = y[start:start+target_len]
    return y

def compute_logmel(y: np.ndarray) -> np.ndarray:
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_fft=N_FFT, hop_length=HOP_LENGTH, win_length=WIN_LENGTH, window='hann',
                                         n_mels=N_MELS, fmin=FMIN, fmax=FMAX, power=2.0, center=True)
    logmel = np.log(mel + EPS)
    # per-utterance, per-mel-band z-norm (statistics taken over the time axis)
    m = logmel.mean(axis=1, keepdims=True)
    s = logmel.std(axis=1, keepdims=True) + 1e-8
    logmel = (logmel - m) / s
    return logmel.astype(np.float32)  # [n_mels, T]

def extract_features(df_train_csv='train_meta.csv', df_test_csv='test_meta.csv', seed=42, max_train=4000, max_test=1000):
    t0 = time.time()
    df_tr = pd.read_csv(df_train_csv)
    df_te = pd.read_csv(df_test_csv)
    if isinstance(max_train, int) and max_train > 0:
        df_tr = df_tr.sample(n=min(max_train, len(df_tr)), random_state=seed).reset_index(drop=True)
    if isinstance(max_test, int) and max_test > 0:
        df_te = df_te.head(max_test).reset_index(drop=True)
    # Label mapping (12 classes)
    classes = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
    cls2idx = {c:i for i,c in enumerate(classes)}

    # Pre-allocate lists
    X_list = []
    y_list = []
    groups = []

    rng = np.random.default_rng(seed)

    print(f"[FE] Train rows: {len(df_tr)} | Test rows: {len(df_te)}")
    # Train
    for i, row in df_tr.iterrows():
        if i % 1000 == 0:
            print(f"[FE] Train {i}/{len(df_tr)} elapsed {time.time()-t0:.1f}s", flush=True)
        path = row['path']
        label = row['label']
        spk = row['speaker']
        # For background noise (silence rows), random crop inside file
        rgen = rng if label == 'silence' else None
        y = load_audio_1s(path, rng=rgen)
        feat = compute_logmel(y)  # [M, T]
        X_list.append(feat.flatten())
        y_list.append(cls2idx[label])
        groups.append(str(spk))

    X_train = np.stack(X_list, axis=0)
    y_train = np.array(y_list, dtype=np.int64)
    groups = np.array(groups)
    np.save('X_train_logmel.npy', X_train)
    np.save('y_train.npy', y_train)
    np.save('groups.npy', groups)
    print(f"[FE] Saved X_train_logmel.npy {X_train.shape}, y_train {y_train.shape}")

    # Test
    X_list = []
    fnames = []
    for i, row in df_te.iterrows():
        if i % 500 == 0:
            print(f"[FE] Test {i}/{len(df_te)} elapsed {time.time()-t0:.1f}s", flush=True)
        path = row['path']
        y = load_audio_1s(path, rng=None)
        feat = compute_logmel(y)
        X_list.append(feat.flatten())
        fnames.append(row['fname'])
    X_test = np.stack(X_list, axis=0)
    np.save('X_test_logmel.npy', X_test)
    pd.Series(fnames).to_csv('test_fnames.csv', index=False, header=False)
    print(f"[FE] Saved X_test_logmel.npy {X_test.shape}")
    print(f"[FE] Total time: {time.time()-t0:.1f}s")

# Smoke extract on subset for fast iteration
extract_features(max_train=4000, max_test=1000)

Installing librosa & soundfile... No module named 'librosa'




[FE] Train rows: 4000 | Test rows: 1000
[FE] Train 0/4000 elapsed 0.0s


[FE] Train 1000/4000 elapsed 8.6s


[FE] Train 2000/4000 elapsed 10.1s


[FE] Train 3000/4000 elapsed 11.5s


[FE] Saved X_train_logmel.npy (4000, 6464), y_train (4000,)
[FE] Test 0/1000 elapsed 13.1s


[FE] Test 500/1000 elapsed 13.7s


[FE] Saved X_test_logmel.npy (1000, 6464)
[FE] Total time: 14.3s
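
The flattened width of 6464 follows from the STFT settings. With `center=True`, librosa produces `1 + n_samples // hop_length` frames, so a 1 s clip at 16 kHz with hop 160 gives 101 frames, and 64 mel bands × 101 frames = 6464. A quick arithmetic check:

```python
SR, HOP_LENGTH, N_MELS = 16000, 160, 64

n_samples = int(1.0 * SR)               # 1 s clip
n_frames = 1 + n_samples // HOP_LENGTH  # frame count when center=True
flat_dim = N_MELS * n_frames            # width after feat.flatten()

print(n_frames, flat_dim)
```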


In [5]:
# Smoke-test model: XGBoost on subset log-mels with StratifiedGroupKFold
import sys, subprocess, time, numpy as np, pandas as pd
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.metrics import accuracy_score

try:
    import xgboost as xgb
except Exception as e:
    print('Installing xgboost...', e)
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'xgboost==2.1.1'], check=True)
    import xgboost as xgb

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

X = np.load('X_train_logmel.npy')  # subset (4000, 6464)
y = np.load('y_train.npy')
groups = np.load('groups.npy')

X_test = np.load('X_test_logmel.npy')  # subset test (1000, 6464)
test_fnames = pd.read_csv('test_fnames.csv', header=None)[0].values

print('Shapes:', X.shape, y.shape, groups.shape, X_test.shape)

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
oof_pred = np.zeros((len(y), num_class), dtype=np.float32)
test_pred = np.zeros((len(X_test), num_class), dtype=np.float32)

params = dict(
    objective='multi:softprob',
    num_class=num_class,
    tree_method='hist',
    max_bin=256,
    max_depth=8,
    eta=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    min_child_weight=1,
    reg_alpha=0.0,
    reg_lambda=1.5,
    eval_metric='mlogloss',
    n_jobs=-1
)

start = time.time()
for fold, (tr_idx, va_idx) in enumerate(cv.split(X, y, groups)):
    t0 = time.time()
    print(f'Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
    dtr = xgb.DMatrix(X[tr_idx], label=y[tr_idx])
    dva = xgb.DMatrix(X[va_idx], label=y[va_idx])
    watch = [(dtr, 'train'), (dva, 'valid')]
    model = xgb.train(params, dtr, num_boost_round=2000, evals=watch, early_stopping_rounds=100, verbose_eval=100)
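    # best_iteration is 0-based, so the exclusive upper bound is best_iteration + 1
    # (Booster.best_ntree_limit is not available in XGBoost 2.x).
    best_rounds = model.best_iteration + 1
    oof_pred[va_idx] = model.predict(dva, iteration_range=(0, best_rounds))
    dte = xgb.DMatrix(X_test)
    test_pred += model.predict(dte, iteration_range=(0, best_rounds)) / cv.n_splits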
    va_acc = accuracy_score(y[va_idx], oof_pred[va_idx].argmax(1))
    print(f'Fold {fold} acc: {va_acc:.4f} | elapsed {time.time()-t0:.1f}s')

oof_acc = accuracy_score(y, oof_pred.argmax(1))
print(f'OOF accuracy: {oof_acc:.4f} | total {time.time()-start:.1f}s')

# Save smoke artifacts
np.save('oof_pred_subset.npy', oof_pred)
np.save('test_pred_subset.npy', test_pred)

# Note: test_pred covers only first 1000 test files (subset FE). Full submission will be built after full FE.
print('Smoke test complete.')

Shapes: (4000, 6464) (4000,) (4000,) (1000, 6464)


Fold 0 | train 3224 val 776


[0]	train-mlogloss:2.33515	valid-mlogloss:2.36803


[100]	train-mlogloss:0.12298	valid-mlogloss:1.15637


In [17]:
# CPU-friendly pooled feature extraction (MFCC+Δ+ΔΔ stats, log-mel pooled stats, spectral descriptors) with parallelism
import sys, subprocess, os, time, math, numpy as np, pandas as pd
from pathlib import Path
from joblib import Parallel, delayed
import multiprocessing as mp

try:
    import librosa, soundfile as sf
except Exception as e:
    print('Installing librosa & soundfile...', e)
    import sys
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'librosa==0.10.1', 'soundfile==0.12.1', 'numba==0.59.1'], check=True)
    import librosa, soundfile as sf

from scipy.stats import skew, kurtosis

SR = 16000
N_MELS = 64
N_FFT = 512
WIN_LENGTH = 400
HOP_LENGTH = 160
FMIN, FMAX = 20, 8000
EPS = 1e-6
FIX_DURATION = 1.0
N_MFCC = 20

def load_audio_fixed(path: str, shift_samples: int = 0, rng: np.random.Generator | None = None) -> np.ndarray:
    y, sr = librosa.load(path, sr=SR, mono=True)
    target_len = int(FIX_DURATION * SR)
    # Pad/crop to 1s first (random crop if rng is given, else center crop)
    if len(y) < target_len:
        y = np.pad(y, (0, target_len - len(y)))
    elif len(y) > target_len:
        if rng is not None:
            start = rng.integers(0, len(y) - target_len + 1)
        else:
            start = (len(y) - target_len) // 2
        y = y[start:start+target_len]
    # Apply shift (positive -> right, negative -> left), zero-filling the vacated samples.
    # Shifting after the crop keeps the full shift; center-cropping a padded clip would halve it.
    if shift_samples > 0:
        y = np.pad(y, (shift_samples, 0))[:target_len]
    elif shift_samples < 0:
        y = np.pad(y, (0, -shift_samples))[-target_len:]
    return y.astype(np.float32)

def pooled_stats(x: np.ndarray, axis: int = -1, percentiles=(25, 75)) -> np.ndarray:
    # x shape [features, time] or [time]
    if x.ndim == 1:
        x = x[None, :]
    mean = np.mean(x, axis=axis)
    std = np.std(x, axis=axis) + 1e-8
    p25 = np.percentile(x, percentiles[0], axis=axis)
    p75 = np.percentile(x, percentiles[1], axis=axis)
    return np.concatenate([mean, std, p25, p75], axis=0)

def mfcc_extra_stats(feat_2d: np.ndarray) -> np.ndarray:
    # feat_2d shape [n_coeff, time]
    mn = np.min(feat_2d, axis=1)
    mx = np.max(feat_2d, axis=1)
    sk = skew(feat_2d, axis=1, bias=False, nan_policy='omit')
    ku = kurtosis(feat_2d, axis=1, fisher=True, bias=False, nan_policy='omit')
    return np.concatenate([mn, mx, sk, ku], axis=0)

def extract_feature_vector(path: str, label: str | None, speaker: str | None, seed: int = 42, is_silence: bool = False, shift_samples: int = 0) -> tuple:
    rng = np.random.default_rng(seed) if is_silence else None
    y = load_audio_fixed(path, shift_samples=shift_samples, rng=rng)
    # Log-mel for pooled stats
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_fft=N_FFT, hop_length=HOP_LENGTH, win_length=WIN_LENGTH,
                                         window='hann', n_mels=N_MELS, fmin=FMIN, fmax=FMAX, power=2.0, center=True)
    logmel = np.log(mel + EPS)
    logmel_stats = pooled_stats(logmel, axis=1)  # shape 64*4 = 256
    # MFCC + deltas + delta-delta
    mfcc = librosa.feature.mfcc(S=librosa.power_to_db(mel, ref=np.max), n_mfcc=N_MFCC)
    mfcc_d = librosa.feature.delta(mfcc, order=1)
    mfcc_dd = librosa.feature.delta(mfcc, order=2)
    mfcc_stats = np.concatenate([
        np.mean(mfcc, axis=1), np.std(mfcc, axis=1),
        np.mean(mfcc_d, axis=1), np.std(mfcc_d, axis=1),
        np.mean(mfcc_dd, axis=1), np.std(mfcc_dd, axis=1)
    ])  # 20*6 = 120
    # MFCC extras per coach: per-coef min/max/skew/kurt for mfcc, mfcc_d, mfcc_dd
    mfcc_extras = np.concatenate([
        mfcc_extra_stats(mfcc),
        mfcc_extra_stats(mfcc_d),
        mfcc_extra_stats(mfcc_dd)
    ])  # 20*4*3 = 240
    # Spectral descriptors
    sc = librosa.feature.spectral_centroid(y=y, sr=SR)
    sbw = librosa.feature.spectral_bandwidth(y=y, sr=SR)
    srf = librosa.feature.spectral_rolloff(y=y, sr=SR, roll_percent=0.95)
    zcr = librosa.feature.zero_crossing_rate(y)
    rms = librosa.feature.rms(y=y)
    spec_contrast = librosa.feature.spectral_contrast(y=y, sr=SR, n_fft=N_FFT, hop_length=HOP_LENGTH)
    # New: spectral flatness, RMS percentiles, energy slope, chroma
    sflat = librosa.feature.spectral_flatness(y=y)  # (1, T)
    sflat_mean = float(np.mean(sflat))
    sflat_std = float(np.std(sflat) + 1e-8)
    # RMS percentiles on time axis
    rms_vec = rms[0]
    rms_p10 = float(np.percentile(rms_vec, 10))
    rms_p90 = float(np.percentile(rms_vec, 90))
    # Energy slope: mean(last 20%) - mean(first 20%)
    T = rms_vec.shape[0]
    k = max(1, int(0.2 * T))
    e_slope = float(np.mean(rms_vec[-k:]) - np.mean(rms_vec[:k]))
    # Optional: chroma_stft mean/std (12x2)
    chroma = librosa.feature.chroma_stft(y=y, sr=SR, n_fft=N_FFT, hop_length=HOP_LENGTH)
    chroma_mean = np.mean(chroma, axis=1)
    chroma_std = np.std(chroma, axis=1) + 1e-8

    spec_desc = np.array([
        sc.mean(), sc.std(),
        sbw.mean(), sbw.std(),
        srf.mean(), srf.std(),
        zcr.mean(), zcr.std(),
        rms.mean(), rms.std(),
        sflat_mean, sflat_std,
        rms_p10, rms_p90,
        e_slope
    ], dtype=np.float32)
    spec_contrast_mean = spec_contrast.mean(axis=1)  # 7 dims
    feats = np.concatenate([
        logmel_stats,            # 256
        mfcc_stats,              # 120
        mfcc_extras,             # 240
        spec_desc,               # 15
        spec_contrast_mean,      # 7
        chroma_mean, chroma_std  # 24
    ]).astype(np.float32)
    return feats, label, speaker

def run_pooled_feature_extraction(train_meta='train_meta.csv', test_meta='test_meta.csv',
                                  out_prefix='pooled', max_train=None, max_test=None,
                                  n_jobs=None, seed=42, tta_shifts_ms=None):
    t0 = time.time()
    df_tr = pd.read_csv(train_meta)
    df_te = pd.read_csv(test_meta)
    if max_train is not None:
        df_tr = df_tr.sample(n=min(max_train, len(df_tr)), random_state=seed).reset_index(drop=True)
    if max_test is not None:
        df_te = df_te.head(max_test).reset_index(drop=True)
    print(f"[POOL-FE] Train rows: {len(df_tr)} | Test rows: {len(df_te)}")
    # Prepare TTA shifts in samples
    if tta_shifts_ms is None:
        tta_shifts_ms = [ -200, -100, 0, 100, 200 ]
    shifts = [int(ms/1000.0 * SR) for ms in tta_shifts_ms]
    # Parallel settings
    if n_jobs is None:
        n_jobs = max(1, mp.cpu_count() - 2)
    print(f"[POOL-FE] Using n_jobs={n_jobs} | shifts(ms)={tta_shifts_ms}")

    # Train features (single shift: we do not augment here; shifts used only for test TTA)
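    def _proc_train(row):
        path = row['path']
        label = row['label']
        speaker = row['speaker']
        is_sil = (label == 'silence')
        # Use a per-row seed for silence; with one shared seed, every silence row drawn
        # from the same noise file would get the identical 1 s crop.
        row_seed = seed + int(row['bg_index']) if is_sil else seed
        feats, label_out, spk = extract_feature_vector(path, label, speaker, seed=row_seed, is_silence=is_sil, shift_samples=0)
        return feats, label_out, spk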

    tr_results = Parallel(n_jobs=n_jobs, backend='loky')(delayed(_proc_train)(row) for _, row in df_tr.iterrows())
    X_train = np.stack([r[0] for r in tr_results])
    y_labels = [r[1] for r in tr_results]
    groups = np.array([r[2] for r in tr_results])
    classes = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
    cls2idx = {c:i for i,c in enumerate(classes)}
    y_train = np.array([cls2idx[l] for l in y_labels], dtype=np.int64)
    np.save(f'X_train_{out_prefix}.npy', X_train)
    np.save(f'y_train_{out_prefix}.npy', y_train)
    np.save(f'groups_{out_prefix}.npy', groups)
    print(f"[POOL-FE] Saved train: {X_train.shape} | time {time.time()-t0:.1f}s")

    # Test features with TTA shifts; average stored separately or at inference
    def _proc_test(row, shift_samples):
        path = row['path']
        feats, _, _ = extract_feature_vector(path, None, None, seed=seed, is_silence=False, shift_samples=shift_samples)
        return feats

    fnames = df_te['fname'].tolist()
    X_tta = []
    for s in shifts:
        tt0 = time.time()
        feats_list = Parallel(n_jobs=n_jobs, backend='loky')(delayed(_proc_test)(row, s) for _, row in df_te.iterrows())
        X_t = np.stack(feats_list)
        X_tta.append(X_t)
        print(f"[POOL-FE] Test shift {s} samples -> {X_t.shape} | elapsed {time.time()-tt0:.1f}s")
    X_test = np.stack(X_tta, axis=0)  # [n_shifts, N, D]
    np.save(f'X_test_{out_prefix}_tta.npy', X_test)
    pd.Series(fnames).to_csv(f'test_fnames_{out_prefix}.csv', index=False, header=False)
    print(f"[POOL-FE] Saved test TTA: {X_test.shape} | total {time.time()-t0:.1f}s")

# Run pooled FE on full data with 5 TTA shifts for test (updated to [-200,-100,0,100,200] ms)
run_pooled_feature_extraction(out_prefix='pooled', max_train=None, max_test=None, n_jobs=None, seed=42, tta_shifts_ms=[-200, -100, 0, 100, 200])

[POOL-FE] Train rows: 64073 | Test rows: 6473
[POOL-FE] Using n_jobs=34 | shifts(ms)=[-200, -100, 0, 100, 200]


  return pitch_tuning(




[POOL-FE] Saved train: (64073, 662) | time 45.9s




[POOL-FE] Test shift -3200 samples -> (6473, 662) | elapsed 4.7s




[POOL-FE] Test shift -1600 samples -> (6473, 662) | elapsed 5.0s


[POOL-FE] Test shift 0 samples -> (6473, 662) | elapsed 4.9s


[POOL-FE] Test shift 1600 samples -> (6473, 662) | elapsed 5.1s


[POOL-FE] Test shift 3200 samples -> (6473, 662) | elapsed 4.7s
[POOL-FE] Saved test TTA: (5, 6473, 662) | total 70.3s
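
The log above reports five test-time shifts (-3200, -1600, 0, 1600, 3200 samples), which at the 16 kHz rate from the plan is ±200 ms and ±100 ms. The feature cell itself is not shown here, so the following is only a minimal sketch of how such shifted views could be produced. The helper names `shift_waveform` and `tta_views` are hypothetical, and zero-padding the vacated end (rather than rolling the signal) is an assumption.

```python
import numpy as np

# Shift values taken from the [POOL-FE] log lines above.
SHIFTS = (-3200, -1600, 0, 1600, 3200)

def shift_waveform(wave: np.ndarray, shift: int) -> np.ndarray:
    """Shift a 1-D waveform by `shift` samples and keep its length.

    Positive shift delays the audio (zeros at the start); negative shift
    advances it (zeros at the end). Zero-padding is an assumption here.
    """
    out = np.zeros_like(wave)
    if shift > 0:
        out[shift:] = wave[:-shift]
    elif shift < 0:
        out[:shift] = wave[-shift:]
    else:
        out[:] = wave
    return out

def tta_views(wave: np.ndarray, shifts=SHIFTS) -> np.ndarray:
    """Stack one shifted copy per shift -> array of shape [n_shifts, n_samples]."""
    return np.stack([shift_waveform(wave, s) for s in shifts])
```

Each shifted copy would then go through the same pooled feature extractor, which is consistent with the saved `(5, 6473, 662)` array: one `[N, D]` feature matrix per shift.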


In [30]:
# Train XGBoost on pooled CPU-friendly features with SGKF and TTA; produce submission
import os, time, numpy as np, pandas as pd, sys, subprocess
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.utils.class_weight import compute_sample_weight

try:
    import xgboost as xgb
except Exception as e:
    print('Installing xgboost...', e)
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'xgboost==2.1.1'], check=True)
    import xgboost as xgb

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

train_feat = 'X_train_pooled.npy'
train_y = 'y_train_pooled.npy'
train_groups = 'groups_pooled.npy'
# Use small-shift TTA features for test
test_feat_tta = 'X_test_pooled_tta_small.npy'
test_fnames_csv = 'test_fnames_pooled_small.csv'

if not (os.path.exists(train_feat) and os.path.exists(train_y) and os.path.exists(train_groups) and os.path.exists(test_feat_tta) and os.path.exists(test_fnames_csv)):
    print('Pooled features not found. Run cell 6 to generate pooled features first.')
else:
    X = np.load(train_feat)
    y = np.load(train_y)
    groups = np.load(train_groups)
    X_test_tta = np.load(test_feat_tta)  # [n_shifts, N, D]
    test_fnames = pd.read_csv(test_fnames_csv, header=None)[0].values
    n_shifts, n_test, D = X_test_tta.shape
    print('Shapes:', X.shape, y.shape, groups.shape, X_test_tta.shape)

    params = dict(
        objective='multi:softprob',
        num_class=num_class,
        tree_method='hist',
        max_bin=256,
        max_depth=7,
        eta=0.05,
        subsample=0.8,
        colsample_bytree=0.8,
        min_child_weight=3,
        reg_alpha=0.1,
        reg_lambda=1.0,
        eval_metric='mlogloss',
        n_jobs=-1
    )

    cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
    oof = np.zeros((len(y), num_class), dtype=np.float32)
    test_pred = np.zeros((n_test, num_class), dtype=np.float32)
    start = time.time()

    for fold, (tr_idx, va_idx) in enumerate(cv.split(X, y, groups)):
        t0 = time.time()
        print(f'Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
        scaler = StandardScaler(with_mean=True, with_std=True)
        X_tr = scaler.fit_transform(X[tr_idx])
        X_va = scaler.transform(X[va_idx])
        # Clip outliers
        X_tr = np.clip(X_tr, -5, 5)
        X_va = np.clip(X_va, -5, 5)
        # Balanced sample weights for heavy class imbalance
        tr_weights = compute_sample_weight('balanced', y=y[tr_idx])
        dtr = xgb.DMatrix(X_tr, label=y[tr_idx], weight=tr_weights)
        dva = xgb.DMatrix(X_va, label=y[va_idx])
        model = xgb.train(params, dtr, num_boost_round=2000, evals=[(dtr,'train'),(dva,'valid')], early_stopping_rounds=100, verbose_eval=100)
        # Determine best iteration for prediction in XGBoost >=2.x
        best_iter = getattr(model, 'best_iteration', None)
        if best_iter is None:
            try:
                best_iter = model.num_boosted_rounds() - 1
            except Exception:
                best_iter = None
        if best_iter is not None:
            oof[va_idx] = model.predict(dva, iteration_range=(0, best_iter + 1))
        else:
            oof[va_idx] = model.predict(dva)
        va_acc = accuracy_score(y[va_idx], oof[va_idx].argmax(1))
        print(f'Fold {fold} acc: {va_acc:.4f} | elapsed {time.time()-t0:.1f}s')
        # Test TTA
        fold_test = np.zeros((n_test, num_class), dtype=np.float32)
        for s in range(n_shifts):
            X_te_s = scaler.transform(X_test_tta[s])
            X_te_s = np.clip(X_te_s, -5, 5)
            dte = xgb.DMatrix(X_te_s)
            if best_iter is not None:
                fold_test += model.predict(dte, iteration_range=(0, best_iter + 1)) / n_shifts
            else:
                fold_test += model.predict(dte) / n_shifts
        test_pred += fold_test / cv.n_splits

    oof_acc = accuracy_score(y, oof.argmax(1))
    print(f'OOF accuracy: {oof_acc:.4f} | total {time.time()-start:.1f}s')
    np.save('oof_pooled.npy', oof)
    # Save test preds with small-TTA suffix
    np.save('test_pred_pooled_tta50.npy', test_pred)
    print('Saved XGB preds (small TTA) to test_pred_pooled_tta50.npy.')

    # Build submission (optional, for quick check)
    pred_idx = test_pred.argmax(1)
    labels = [CLASSES[i] for i in pred_idx]
    sub = pd.DataFrame({'fname': test_fnames, 'label': labels})
    sub.to_csv('submission.csv', index=False)
    print('Saved submission.csv with shape:', sub.shape)

Shapes: (64073, 662) (64073,) (64073,) (5, 6473, 662)


Fold 0 | train 52005 val 12068


[0]	train-mlogloss:2.33250	valid-mlogloss:2.38331


[100]	train-mlogloss:0.24060	valid-mlogloss:0.74707


[200]	train-mlogloss:0.08020	valid-mlogloss:0.47907


[300]	train-mlogloss:0.03884	valid-mlogloss:0.38749


[400]	train-mlogloss:0.02276	valid-mlogloss:0.34849


[500]	train-mlogloss:0.01511	valid-mlogloss:0.33076


[600]	train-mlogloss:0.01096	valid-mlogloss:0.32142


[700]	train-mlogloss:0.00846	valid-mlogloss:0.31587


[800]	train-mlogloss:0.00685	valid-mlogloss:0.31224


[900]	train-mlogloss:0.00573	valid-mlogloss:0.31005


[1000]	train-mlogloss:0.00493	valid-mlogloss:0.30868


[1100]	train-mlogloss:0.00432	valid-mlogloss:0.30782


[1200]	train-mlogloss:0.00386	valid-mlogloss:0.30732


[1300]	train-mlogloss:0.00350	valid-mlogloss:0.30717


[1400]	train-mlogloss:0.00320	valid-mlogloss:0.30701


[1500]	train-mlogloss:0.00295	valid-mlogloss:0.30707


[1556]	train-mlogloss:0.00283	valid-mlogloss:0.30699


Fold 0 acc: 0.9018 | elapsed 500.8s


Fold 1 | train 51074 val 12999


[0]	train-mlogloss:2.33007	valid-mlogloss:2.38589


[100]	train-mlogloss:0.23088	valid-mlogloss:0.77783


[200]	train-mlogloss:0.07527	valid-mlogloss:0.52077


[300]	train-mlogloss:0.03620	valid-mlogloss:0.43762


[400]	train-mlogloss:0.02132	valid-mlogloss:0.40697


[500]	train-mlogloss:0.01421	valid-mlogloss:0.39249


[600]	train-mlogloss:0.01033	valid-mlogloss:0.38603


[700]	train-mlogloss:0.00799	valid-mlogloss:0.38280


[800]	train-mlogloss:0.00649	valid-mlogloss:0.38147


[900]	train-mlogloss:0.00546	valid-mlogloss:0.38113


[1000]	train-mlogloss:0.00470	valid-mlogloss:0.38101


[1089]	train-mlogloss:0.00419	valid-mlogloss:0.38137


Fold 1 acc: 0.8832 | elapsed 426.9s


Fold 2 | train 51792 val 12281


[0]	train-mlogloss:2.33221	valid-mlogloss:2.38324


[100]	train-mlogloss:0.23833	valid-mlogloss:0.75401


[200]	train-mlogloss:0.07860	valid-mlogloss:0.48901


[300]	train-mlogloss:0.03779	valid-mlogloss:0.39934


[400]	train-mlogloss:0.02219	valid-mlogloss:0.36339


[500]	train-mlogloss:0.01478	valid-mlogloss:0.34656


[600]	train-mlogloss:0.01075	valid-mlogloss:0.33777


[700]	train-mlogloss:0.00830	valid-mlogloss:0.33246


[800]	train-mlogloss:0.00673	valid-mlogloss:0.32974


[900]	train-mlogloss:0.00563	valid-mlogloss:0.32777


[1000]	train-mlogloss:0.00486	valid-mlogloss:0.32701


[1100]	train-mlogloss:0.00427	valid-mlogloss:0.32660


[1200]	train-mlogloss:0.00382	valid-mlogloss:0.32629


[1268]	train-mlogloss:0.00356	valid-mlogloss:0.32664


Fold 2 acc: 0.8941 | elapsed 426.6s


Fold 3 | train 50819 val 13254


[0]	train-mlogloss:2.33087	valid-mlogloss:2.38537


[100]	train-mlogloss:0.23165	valid-mlogloss:0.75722


[200]	train-mlogloss:0.07549	valid-mlogloss:0.50170


[300]	train-mlogloss:0.03629	valid-mlogloss:0.41785


[400]	train-mlogloss:0.02134	valid-mlogloss:0.38470


[500]	train-mlogloss:0.01418	valid-mlogloss:0.37027


[600]	train-mlogloss:0.01034	valid-mlogloss:0.36354


[700]	train-mlogloss:0.00805	valid-mlogloss:0.35972


[800]	train-mlogloss:0.00653	valid-mlogloss:0.35769


[900]	train-mlogloss:0.00549	valid-mlogloss:0.35686


[1000]	train-mlogloss:0.00473	valid-mlogloss:0.35677


[1046]	train-mlogloss:0.00446	valid-mlogloss:0.35688


Fold 3 acc: 0.8874 | elapsed 359.0s


Fold 4 | train 50602 val 13471


[0]	train-mlogloss:2.33100	valid-mlogloss:2.38609


[100]	train-mlogloss:0.23282	valid-mlogloss:0.77955


[200]	train-mlogloss:0.07577	valid-mlogloss:0.51648


[300]	train-mlogloss:0.03636	valid-mlogloss:0.42886


[400]	train-mlogloss:0.02141	valid-mlogloss:0.39478


[500]	train-mlogloss:0.01425	valid-mlogloss:0.37887


[600]	train-mlogloss:0.01038	valid-mlogloss:0.37135


[700]	train-mlogloss:0.00808	valid-mlogloss:0.36755


[800]	train-mlogloss:0.00657	valid-mlogloss:0.36544


[900]	train-mlogloss:0.00552	valid-mlogloss:0.36459


[1000]	train-mlogloss:0.00476	valid-mlogloss:0.36421


[1100]	train-mlogloss:0.00419	valid-mlogloss:0.36425


[1158]	train-mlogloss:0.00392	valid-mlogloss:0.36415


Fold 4 acc: 0.8863 | elapsed 385.5s


OOF accuracy: 0.8903 | total 2104.5s
Saved XGB preds (small TTA) to test_pred_pooled_tta50.npy.
Saved submission.csv with shape: (6473, 2)
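
The OOF accuracy above is a single aggregate and does not show which of the 12 classes drive the errors. As a hedged sketch of the "error analysis on OOF" step from the plan, the helper below derives per-class recall and the most frequent wrong prediction from an OOF probability matrix. The function name `per_class_report` and its column names are hypothetical; only the class order and the `oof_pooled.npy` / `y_train_pooled.npy` file names come from the cell above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

CLASSES = ['yes', 'no', 'up', 'down', 'left', 'right',
           'on', 'off', 'stop', 'go', 'unknown', 'silence']

def per_class_report(y_true, proba, classes=CLASSES):
    """Per-class recall and top confusion from OOF class probabilities."""
    n = len(classes)
    y_pred = np.asarray(proba).argmax(1)
    # Rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(n))
    support = cm.sum(axis=1)
    # Recall is left at 0 for classes with no samples to avoid division by zero.
    recall = np.divide(np.diag(cm), support,
                       out=np.zeros(n, dtype=np.float64), where=support > 0)
    off = cm.copy()
    np.fill_diagonal(off, 0)          # keep only the errors
    top = off.argmax(axis=1)          # most frequent wrong label per true class
    n_conf = off[np.arange(n), top]
    return pd.DataFrame({
        'class': classes,
        'support': support,
        'recall': recall,
        'top_confusion': [classes[j] if c > 0 else '' for j, c in zip(top, n_conf)],
        'n_confused': n_conf,
    })

# Possible usage with the arrays saved above (files assumed to exist):
# y = np.load('y_train_pooled.npy'); oof = np.load('oof_pooled.npy')
# print(per_class_report(y, oof).sort_values('recall'))
```

Sorting by `recall` puts the weakest classes first, which may help decide whether the balanced sample weights are doing enough for `unknown` and `silence`.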


In [33]:
# CatBoost on pooled features + blend with XGBoost; SGKF, weights, clipping, 5-shift TTA
import os, time, numpy as np, pandas as pd, sys, subprocess
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.utils.class_weight import compute_sample_weight

try:
    from catboost import CatBoostClassifier, Pool
except Exception as e:
    print('Installing catboost...', e)
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'catboost==1.2.5'], check=True)
    from catboost import CatBoostClassifier, Pool

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

train_feat = 'X_train_pooled.npy'
train_y = 'y_train_pooled.npy'
train_groups = 'groups_pooled.npy'
# Use small-shift TTA features for test
test_feat_tta = 'X_test_pooled_tta_small.npy'
test_fnames_csv = 'test_fnames_pooled_small.csv'

assert os.path.exists(train_feat) and os.path.exists(train_y) and os.path.exists(train_groups), 'Missing train pooled features'
assert os.path.exists(test_feat_tta) and os.path.exists(test_fnames_csv), 'Missing test pooled TTA features'

X = np.load(train_feat)
y = np.load(train_y)
groups = np.load(train_groups)
X_test_tta = np.load(test_feat_tta)  # [n_shifts, N, D]
test_fnames = pd.read_csv(test_fnames_csv, header=None)[0].values
n_shifts, n_test, D = X_test_tta.shape
print('Shapes:', X.shape, y.shape, groups.shape, X_test_tta.shape)

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
oof_cb = np.zeros((len(y), num_class), dtype=np.float32)
test_cb = np.zeros((n_test, num_class), dtype=np.float32)
start = time.time()

for fold, (tr_idx, va_idx) in enumerate(cv.split(X, y, groups)):
    t0 = time.time()
    print(f'CB Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
    scaler = StandardScaler(with_mean=True, with_std=True)
    X_tr = scaler.fit_transform(X[tr_idx])
    X_va = scaler.transform(X[va_idx])
    # Clip outliers
    X_tr = np.clip(X_tr, -5, 5)
    X_va = np.clip(X_va, -5, 5)
    # Weights
    tr_weights = compute_sample_weight('balanced', y=y[tr_idx])
    train_pool = Pool(X_tr, label=y[tr_idx], weight=tr_weights)
    valid_pool = Pool(X_va, label=y[va_idx])
    model = CatBoostClassifier(
        loss_function='MultiClass',
        eval_metric='MultiClass',
        depth=8,
        learning_rate=0.05,
        l2_leaf_reg=1.0,
        iterations=4000,
        od_type='Iter',
        od_wait=275,
        border_count=128,
        bootstrap_type='Bernoulli',
        subsample=0.8,
        rsm=0.8,
        random_strength=0.1,
        random_seed=42,
        thread_count=-1,
        verbose=100
    )
    model.fit(train_pool, eval_set=valid_pool, use_best_model=True, verbose=100)
    oof_cb[va_idx] = model.predict_proba(valid_pool)
    va_acc = accuracy_score(y[va_idx], oof_cb[va_idx].argmax(1))
    print(f'CB Fold {fold} acc: {va_acc:.4f} | elapsed {time.time()-t0:.1f}s')
    # Test TTA
    fold_test = np.zeros((n_test, num_class), dtype=np.float32)
    for s in range(n_shifts):
        X_te_s = scaler.transform(X_test_tta[s])
        X_te_s = np.clip(X_te_s, -5, 5)
        test_pool = Pool(X_te_s)
        fold_test += model.predict_proba(test_pool) / n_shifts
    test_cb += fold_test / cv.n_splits

oof_acc_cb = accuracy_score(y, oof_cb.argmax(1))
print(f'CatBoost OOF accuracy: {oof_acc_cb:.4f} | total {time.time()-start:.1f}s')
np.save('oof_pooled_cat.npy', oof_cb)
# Save test preds with small-TTA suffix
np.save('test_pred_pooled_cat_tta50.npy', test_cb)
print('Saved CatBoost preds (small TTA) to test_pred_pooled_cat_tta50.npy.')

# Optional quick blended submission with existing XGB (if small-TTA available) for sanity
if os.path.exists('test_pred_pooled_tta50.npy'):
    oof_xgb = np.load('oof_pooled.npy') if os.path.exists('oof_pooled.npy') else None
    test_xgb = np.load('test_pred_pooled_tta50.npy')
    alpha = 0.5  # CatBoost weight in the blend; XGBoost gets 1 - alpha
    if oof_xgb is not None:
        oof_blend = (1 - alpha) * oof_xgb + alpha * oof_cb
        acc = accuracy_score(y, oof_blend.argmax(1))
        print(f'Quick XGB+CB alpha={alpha} OOF acc: {acc:.5f}')
    # Use the same alpha for the test blend so OOF and test weights stay in sync
    pred_idx = ((1 - alpha) * test_xgb + alpha * test_cb).argmax(1)
    labels = [CLASSES[i] for i in pred_idx]
    sub = pd.DataFrame({'fname': test_fnames, 'label': labels})
    sub.to_csv('submission.csv', index=False)
    print('Saved quick blended submission.csv with shape:', sub.shape)

Shapes: (64073, 662) (64073,) (64073,) (5, 6473, 662)


CB Fold 0 | train 52005 val 12068


0:	learn: 2.3264630	test: 2.3739912	best: 2.3739912 (0)	total: 250ms	remaining: 16m 38s


100:	learn: 0.7069801	test: 1.1439862	best: 1.1439862 (100)	total: 25.5s	remaining: 16m 23s


200:	learn: 0.4353933	test: 0.8956426	best: 0.8956426 (200)	total: 50.5s	remaining: 15m 55s


300:	learn: 0.3019782	test: 0.7541823	best: 0.7541823 (300)	total: 1m 15s	remaining: 15m 25s


400:	learn: 0.2252558	test: 0.6590188	best: 0.6590188 (400)	total: 1m 39s	remaining: 14m 56s


500:	learn: 0.1756789	test: 0.5925465	best: 0.5925465 (500)	total: 2m 4s	remaining: 14m 28s


600:	learn: 0.1409437	test: 0.5410470	best: 0.5410470 (600)	total: 2m 28s	remaining: 14m 1s


700:	learn: 0.1161046	test: 0.5001383	best: 0.5001383 (700)	total: 2m 53s	remaining: 13m 35s


800:	learn: 0.0983941	test: 0.4698836	best: 0.4698836 (800)	total: 3m 17s	remaining: 13m 8s


900:	learn: 0.0850112	test: 0.4460643	best: 0.4460643 (900)	total: 3m 41s	remaining: 12m 42s


1000:	learn: 0.0758108	test: 0.4289278	best: 0.4289278 (1000)	total: 4m 5s	remaining: 12m 15s


1100:	learn: 0.0677302	test: 0.4138283	best: 0.4138283 (1100)	total: 4m 29s	remaining: 11m 50s


1200:	learn: 0.0613607	test: 0.4021944	best: 0.4021944 (1200)	total: 4m 53s	remaining: 11m 24s


1300:	learn: 0.0565300	test: 0.3929822	best: 0.3929822 (1300)	total: 5m 17s	remaining: 10m 59s


1400:	learn: 0.0518441	test: 0.3839963	best: 0.3839963 (1400)	total: 5m 41s	remaining: 10m 33s


1500:	learn: 0.0477145	test: 0.3761123	best: 0.3761123 (1500)	total: 6m 5s	remaining: 10m 8s


1600:	learn: 0.0443664	test: 0.3693319	best: 0.3693319 (1600)	total: 6m 29s	remaining: 9m 43s


1700:	learn: 0.0414395	test: 0.3636248	best: 0.3636248 (1700)	total: 6m 53s	remaining: 9m 18s


1800:	learn: 0.0386339	test: 0.3581414	best: 0.3581414 (1800)	total: 7m 17s	remaining: 8m 54s


1900:	learn: 0.0364115	test: 0.3534260	best: 0.3534260 (1900)	total: 7m 41s	remaining: 8m 29s


2000:	learn: 0.0341994	test: 0.3489446	best: 0.3489446 (2000)	total: 8m 5s	remaining: 8m 4s


2100:	learn: 0.0322877	test: 0.3451496	best: 0.3451496 (2100)	total: 8m 29s	remaining: 7m 40s


2200:	learn: 0.0306075	test: 0.3420797	best: 0.3420797 (2200)	total: 8m 53s	remaining: 7m 15s


2300:	learn: 0.0290836	test: 0.3394389	best: 0.3394389 (2300)	total: 9m 17s	remaining: 6m 51s


2400:	learn: 0.0276647	test: 0.3366504	best: 0.3366504 (2400)	total: 9m 40s	remaining: 6m 26s


2500:	learn: 0.0262575	test: 0.3338300	best: 0.3338300 (2500)	total: 10m 4s	remaining: 6m 2s


2600:	learn: 0.0250719	test: 0.3314577	best: 0.3314577 (2600)	total: 10m 28s	remaining: 5m 38s


2700:	learn: 0.0239791	test: 0.3294760	best: 0.3294760 (2700)	total: 10m 52s	remaining: 5m 13s


2800:	learn: 0.0229276	test: 0.3274562	best: 0.3274562 (2800)	total: 11m 16s	remaining: 4m 49s


2900:	learn: 0.0219935	test: 0.3256126	best: 0.3256110 (2899)	total: 11m 40s	remaining: 4m 25s


3000:	learn: 0.0210884	test: 0.3238271	best: 0.3238271 (3000)	total: 12m 4s	remaining: 4m 1s


3100:	learn: 0.0202327	test: 0.3221863	best: 0.3221863 (3100)	total: 12m 28s	remaining: 3m 36s


3200:	learn: 0.0194385	test: 0.3207482	best: 0.3207482 (3200)	total: 12m 51s	remaining: 3m 12s


3300:	learn: 0.0186864	test: 0.3191236	best: 0.3191236 (3300)	total: 13m 15s	remaining: 2m 48s


3400:	learn: 0.0180751	test: 0.3180145	best: 0.3180145 (3400)	total: 13m 39s	remaining: 2m 24s


3500:	learn: 0.0174056	test: 0.3167781	best: 0.3167781 (3500)	total: 14m 3s	remaining: 2m


3600:	learn: 0.0168033	test: 0.3159217	best: 0.3159217 (3600)	total: 14m 27s	remaining: 1m 36s


3700:	learn: 0.0161734	test: 0.3149079	best: 0.3149079 (3700)	total: 14m 51s	remaining: 1m 11s


3800:	learn: 0.0156339	test: 0.3138496	best: 0.3138496 (3800)	total: 15m 14s	remaining: 47.9s


3900:	learn: 0.0151371	test: 0.3130417	best: 0.3130417 (3900)	total: 15m 38s	remaining: 23.8s


3999:	learn: 0.0146501	test: 0.3120864	best: 0.3120864 (3999)	total: 16m 2s	remaining: 0us

bestTest = 0.3120863855
bestIteration = 3999



CB Fold 0 acc: 0.8978 | elapsed 963.7s
CB Fold 1 | train 51074 val 12999


0:	learn: 2.3232673	test: 2.3758249	best: 2.3758249 (0)	total: 250ms	remaining: 16m 39s


100:	learn: 0.6941784	test: 1.1649530	best: 1.1649530 (100)	total: 25s	remaining: 16m 3s


200:	learn: 0.4202706	test: 0.9210064	best: 0.9210064 (200)	total: 49.6s	remaining: 15m 38s


300:	learn: 0.2888673	test: 0.7825765	best: 0.7825765 (300)	total: 1m 14s	remaining: 15m 10s


400:	learn: 0.2136667	test: 0.6913296	best: 0.6913296 (400)	total: 1m 38s	remaining: 14m 43s


500:	learn: 0.1643169	test: 0.6231237	best: 0.6231237 (500)	total: 2m 2s	remaining: 14m 17s


600:	learn: 0.1318963	test: 0.5747616	best: 0.5747616 (600)	total: 2m 26s	remaining: 13m 50s


700:	learn: 0.1086322	test: 0.5367069	best: 0.5367069 (700)	total: 2m 50s	remaining: 13m 24s


800:	learn: 0.0921461	test: 0.5088552	best: 0.5088552 (800)	total: 3m 14s	remaining: 12m 58s


900:	learn: 0.0806005	test: 0.4883024	best: 0.4883024 (900)	total: 3m 38s	remaining: 12m 32s


1000:	learn: 0.0711788	test: 0.4710500	best: 0.4710500 (1000)	total: 4m 2s	remaining: 12m 6s


1100:	learn: 0.0638696	test: 0.4578410	best: 0.4578410 (1100)	total: 4m 26s	remaining: 11m 41s


1200:	learn: 0.0576781	test: 0.4459515	best: 0.4459515 (1200)	total: 4m 50s	remaining: 11m 15s


1300:	learn: 0.0527036	test: 0.4365355	best: 0.4365355 (1300)	total: 5m 13s	remaining: 10m 50s


1400:	learn: 0.0484658	test: 0.4289461	best: 0.4289461 (1400)	total: 5m 37s	remaining: 10m 25s


1500:	learn: 0.0451095	test: 0.4226403	best: 0.4226403 (1500)	total: 6m	remaining: 10m


1600:	learn: 0.0419418	test: 0.4171951	best: 0.4171951 (1600)	total: 6m 24s	remaining: 9m 36s


1700:	learn: 0.0390595	test: 0.4120162	best: 0.4120162 (1700)	total: 6m 48s	remaining: 9m 11s


1800:	learn: 0.0364951	test: 0.4073155	best: 0.4073155 (1800)	total: 7m 11s	remaining: 8m 47s


1900:	learn: 0.0344556	test: 0.4035750	best: 0.4035750 (1900)	total: 7m 35s	remaining: 8m 22s


2000:	learn: 0.0324584	test: 0.4003025	best: 0.4003025 (2000)	total: 7m 59s	remaining: 7m 58s


2100:	learn: 0.0307605	test: 0.3972531	best: 0.3972531 (2100)	total: 8m 22s	remaining: 7m 34s


2200:	learn: 0.0291044	test: 0.3941476	best: 0.3941476 (2200)	total: 8m 46s	remaining: 7m 10s


2300:	learn: 0.0275866	test: 0.3916791	best: 0.3916791 (2300)	total: 9m 9s	remaining: 6m 46s


2400:	learn: 0.0262014	test: 0.3891985	best: 0.3891985 (2400)	total: 9m 33s	remaining: 6m 22s


2500:	learn: 0.0249721	test: 0.3872451	best: 0.3872451 (2500)	total: 9m 57s	remaining: 5m 57s


2600:	learn: 0.0237969	test: 0.3853876	best: 0.3853876 (2600)	total: 10m 20s	remaining: 5m 33s


2700:	learn: 0.0226884	test: 0.3835729	best: 0.3835729 (2700)	total: 10m 44s	remaining: 5m 9s


2800:	learn: 0.0216759	test: 0.3817013	best: 0.3817013 (2800)	total: 11m 8s	remaining: 4m 46s


2900:	learn: 0.0207299	test: 0.3802288	best: 0.3802288 (2900)	total: 11m 31s	remaining: 4m 22s


3000:	learn: 0.0198695	test: 0.3789283	best: 0.3789283 (3000)	total: 11m 55s	remaining: 3m 58s


3100:	learn: 0.0191066	test: 0.3779728	best: 0.3779728 (3100)	total: 12m 18s	remaining: 3m 34s


3200:	learn: 0.0183280	test: 0.3768577	best: 0.3768577 (3200)	total: 12m 42s	remaining: 3m 10s


3300:	learn: 0.0176244	test: 0.3757814	best: 0.3757814 (3300)	total: 13m 6s	remaining: 2m 46s


3400:	learn: 0.0169634	test: 0.3749378	best: 0.3749289 (3394)	total: 13m 29s	remaining: 2m 22s


3500:	learn: 0.0163881	test: 0.3740105	best: 0.3740105 (3500)	total: 13m 53s	remaining: 1m 58s


3600:	learn: 0.0158175	test: 0.3731988	best: 0.3731988 (3600)	total: 14m 16s	remaining: 1m 34s


3700:	learn: 0.0152425	test: 0.3726561	best: 0.3726561 (3700)	total: 14m 40s	remaining: 1m 11s


3800:	learn: 0.0146790	test: 0.3718222	best: 0.3718222 (3800)	total: 15m 4s	remaining: 47.3s


3900:	learn: 0.0141670	test: 0.3711963	best: 0.3711963 (3900)	total: 15m 27s	remaining: 23.5s


3999:	learn: 0.0136856	test: 0.3705486	best: 0.3705187 (3998)	total: 15m 51s	remaining: 0us

bestTest = 0.370518738
bestIteration = 3998

Shrink model to first 3999 iterations.


CB Fold 1 acc: 0.8868 | elapsed 952.3s
CB Fold 2 | train 51792 val 12281


0:	learn: 2.3263974	test: 2.3746902	best: 2.3746902 (0)	total: 243ms	remaining: 16m 11s


100:	learn: 0.7078816	test: 1.1359938	best: 1.1359938 (100)	total: 25.1s	remaining: 16m 8s


200:	learn: 0.4348035	test: 0.8940727	best: 0.8940727 (200)	total: 49.9s	remaining: 15m 43s


300:	learn: 0.3013509	test: 0.7596887	best: 0.7596887 (300)	total: 1m 14s	remaining: 15m 15s


400:	learn: 0.2222506	test: 0.6652118	best: 0.6652118 (400)	total: 1m 39s	remaining: 14m 48s


500:	learn: 0.1713940	test: 0.5958891	best: 0.5958891 (500)	total: 2m 3s	remaining: 14m 22s


600:	learn: 0.1380385	test: 0.5460217	best: 0.5460217 (600)	total: 2m 27s	remaining: 13m 55s


700:	learn: 0.1135662	test: 0.5060100	best: 0.5060100 (700)	total: 2m 51s	remaining: 13m 29s


800:	learn: 0.0962968	test: 0.4772250	best: 0.4772250 (800)	total: 3m 16s	remaining: 13m 3s


900:	learn: 0.0835554	test: 0.4545873	best: 0.4545873 (900)	total: 3m 40s	remaining: 12m 37s


1000:	learn: 0.0732972	test: 0.4355442	best: 0.4355442 (1000)	total: 4m 4s	remaining: 12m 12s


1100:	learn: 0.0660021	test: 0.4219736	best: 0.4219736 (1100)	total: 4m 28s	remaining: 11m 46s


1200:	learn: 0.0600859	test: 0.4105227	best: 0.4105227 (1200)	total: 4m 52s	remaining: 11m 21s


1300:	learn: 0.0545950	test: 0.4000863	best: 0.4000863 (1300)	total: 5m 16s	remaining: 10m 56s


1400:	learn: 0.0501761	test: 0.3913041	best: 0.3913041 (1400)	total: 5m 40s	remaining: 10m 31s


1500:	learn: 0.0464105	test: 0.3836140	best: 0.3836140 (1500)	total: 6m 4s	remaining: 10m 6s


1600:	learn: 0.0431582	test: 0.3772631	best: 0.3772631 (1600)	total: 6m 28s	remaining: 9m 41s


1700:	learn: 0.0403533	test: 0.3716979	best: 0.3716979 (1700)	total: 6m 51s	remaining: 9m 16s


1800:	learn: 0.0378418	test: 0.3667826	best: 0.3667826 (1800)	total: 7m 15s	remaining: 8m 52s


1900:	learn: 0.0353836	test: 0.3620776	best: 0.3620776 (1900)	total: 7m 39s	remaining: 8m 27s


2000:	learn: 0.0334809	test: 0.3579602	best: 0.3579602 (2000)	total: 8m 3s	remaining: 8m 3s


2100:	learn: 0.0315852	test: 0.3541687	best: 0.3541687 (2100)	total: 8m 27s	remaining: 7m 38s


2200:	learn: 0.0299228	test: 0.3508575	best: 0.3508575 (2200)	total: 8m 51s	remaining: 7m 14s


2300:	learn: 0.0284468	test: 0.3480874	best: 0.3480874 (2300)	total: 9m 15s	remaining: 6m 49s


2400:	learn: 0.0270961	test: 0.3452448	best: 0.3452448 (2400)	total: 9m 38s	remaining: 6m 25s


2500:	learn: 0.0257562	test: 0.3425088	best: 0.3425088 (2500)	total: 10m 2s	remaining: 6m 1s


2600:	learn: 0.0246089	test: 0.3403039	best: 0.3403039 (2600)	total: 10m 26s	remaining: 5m 37s


2700:	learn: 0.0234915	test: 0.3383154	best: 0.3383154 (2700)	total: 10m 50s	remaining: 5m 12s


2800:	learn: 0.0224288	test: 0.3362533	best: 0.3362533 (2800)	total: 11m 14s	remaining: 4m 48s


2900:	learn: 0.0214471	test: 0.3342981	best: 0.3342981 (2900)	total: 11m 38s	remaining: 4m 24s


3000:	learn: 0.0205672	test: 0.3327747	best: 0.3327747 (3000)	total: 12m 1s	remaining: 4m


3100:	learn: 0.0197147	test: 0.3311085	best: 0.3311085 (3100)	total: 12m 25s	remaining: 3m 36s


3200:	learn: 0.0189339	test: 0.3296088	best: 0.3296088 (3200)	total: 12m 49s	remaining: 3m 12s


3300:	learn: 0.0182114	test: 0.3282727	best: 0.3282727 (3300)	total: 13m 13s	remaining: 2m 47s


3400:	learn: 0.0175013	test: 0.3268050	best: 0.3268050 (3400)	total: 13m 37s	remaining: 2m 23s


3500:	learn: 0.0168963	test: 0.3255282	best: 0.3255282 (3500)	total: 14m	remaining: 1m 59s


3600:	learn: 0.0163134	test: 0.3242678	best: 0.3242678 (3600)	total: 14m 24s	remaining: 1m 35s


3700:	learn: 0.0157096	test: 0.3230073	best: 0.3230041 (3699)	total: 14m 48s	remaining: 1m 11s


3800:	learn: 0.0151318	test: 0.3221693	best: 0.3221693 (3800)	total: 15m 12s	remaining: 47.8s


3900:	learn: 0.0146197	test: 0.3212543	best: 0.3212543 (3900)	total: 15m 36s	remaining: 23.8s


3999:	learn: 0.0141087	test: 0.3202681	best: 0.3202681 (3999)	total: 16m	remaining: 0us

bestTest = 0.3202681163
bestIteration = 3999



CB Fold 2 acc: 0.8954 | elapsed 961.4s
CB Fold 3 | train 50819 val 13254


0:	learn: 2.3271926	test: 2.3776545	best: 2.3776545 (0)	total: 242ms	remaining: 16m 6s


100:	learn: 0.6960294	test: 1.1453798	best: 1.1453798 (100)	total: 24.9s	remaining: 16m 2s


200:	learn: 0.4213800	test: 0.9030178	best: 0.9030178 (200)	total: 49.6s	remaining: 15m 36s


300:	learn: 0.2888599	test: 0.7640228	best: 0.7640228 (300)	total: 1m 14s	remaining: 15m 9s


400:	learn: 0.2137409	test: 0.6735064	best: 0.6735064 (400)	total: 1m 38s	remaining: 14m 42s


500:	learn: 0.1633029	test: 0.6055639	best: 0.6055639 (500)	total: 2m 2s	remaining: 14m 16s


600:	learn: 0.1309825	test: 0.5567421	best: 0.5567421 (600)	total: 2m 26s	remaining: 13m 50s


700:	learn: 0.1085605	test: 0.5205854	best: 0.5205854 (700)	total: 2m 50s	remaining: 13m 23s


800:	learn: 0.0920992	test: 0.4941426	best: 0.4941426 (800)	total: 3m 14s	remaining: 12m 58s


900:	learn: 0.0799258	test: 0.4727006	best: 0.4727006 (900)	total: 3m 38s	remaining: 12m 32s


1000:	learn: 0.0708586	test: 0.4564972	best: 0.4564972 (1000)	total: 4m 2s	remaining: 12m 6s


1100:	learn: 0.0639236	test: 0.4435881	best: 0.4435881 (1100)	total: 4m 26s	remaining: 11m 41s


1200:	learn: 0.0576446	test: 0.4318655	best: 0.4318655 (1200)	total: 4m 50s	remaining: 11m 16s


1300:	learn: 0.0524606	test: 0.4226319	best: 0.4226319 (1300)	total: 5m 13s	remaining: 10m 51s


1400:	learn: 0.0485733	test: 0.4155394	best: 0.4155394 (1400)	total: 5m 37s	remaining: 10m 26s


1500:	learn: 0.0451823	test: 0.4094472	best: 0.4094472 (1500)	total: 6m 1s	remaining: 10m 1s


1600:	learn: 0.0421804	test: 0.4039078	best: 0.4039078 (1600)	total: 6m 24s	remaining: 9m 36s


1700:	learn: 0.0392520	test: 0.3985592	best: 0.3985592 (1700)	total: 6m 48s	remaining: 9m 12s


1800:	learn: 0.0368595	test: 0.3939059	best: 0.3939059 (1800)	total: 7m 12s	remaining: 8m 47s


1900:	learn: 0.0346366	test: 0.3897957	best: 0.3897957 (1900)	total: 7m 35s	remaining: 8m 23s


2000:	learn: 0.0326333	test: 0.3859608	best: 0.3859608 (2000)	total: 7m 59s	remaining: 7m 59s


2100:	learn: 0.0309505	test: 0.3828778	best: 0.3828778 (2100)	total: 8m 23s	remaining: 7m 34s


2200:	learn: 0.0294659	test: 0.3800981	best: 0.3800981 (2200)	total: 8m 46s	remaining: 7m 10s


2300:	learn: 0.0279634	test: 0.3773269	best: 0.3773269 (2300)	total: 9m 10s	remaining: 6m 46s


2400:	learn: 0.0265859	test: 0.3748180	best: 0.3748180 (2400)	total: 9m 34s	remaining: 6m 22s


2500:	learn: 0.0252216	test: 0.3727529	best: 0.3727529 (2500)	total: 9m 57s	remaining: 5m 58s


2600:	learn: 0.0240096	test: 0.3706941	best: 0.3706941 (2600)	total: 10m 21s	remaining: 5m 34s


2700:	learn: 0.0229653	test: 0.3688104	best: 0.3688104 (2700)	total: 10m 45s	remaining: 5m 10s


2800:	learn: 0.0219878	test: 0.3673083	best: 0.3673083 (2800)	total: 11m 8s	remaining: 4m 46s


2900:	learn: 0.0210545	test: 0.3658638	best: 0.3658609 (2899)	total: 11m 32s	remaining: 4m 22s


3000:	learn: 0.0202456	test: 0.3644087	best: 0.3644087 (3000)	total: 11m 55s	remaining: 3m 58s


3100:	learn: 0.0194535	test: 0.3632283	best: 0.3632283 (3100)	total: 12m 19s	remaining: 3m 34s


3200:	learn: 0.0187420	test: 0.3620000	best: 0.3620000 (3200)	total: 12m 43s	remaining: 3m 10s


3300:	learn: 0.0179839	test: 0.3608879	best: 0.3608879 (3300)	total: 13m 6s	remaining: 2m 46s


3400:	learn: 0.0173328	test: 0.3597441	best: 0.3597441 (3400)	total: 13m 30s	remaining: 2m 22s


3500:	learn: 0.0167255	test: 0.3588567	best: 0.3588567 (3500)	total: 13m 53s	remaining: 1m 58s


3600:	learn: 0.0160669	test: 0.3579155	best: 0.3579155 (3600)	total: 14m 17s	remaining: 1m 34s


3700:	learn: 0.0154602	test: 0.3569888	best: 0.3569888 (3700)	total: 14m 40s	remaining: 1m 11s


3800:	learn: 0.0149648	test: 0.3564739	best: 0.3564677 (3799)	total: 15m 4s	remaining: 47.3s


3900:	learn: 0.0144493	test: 0.3555559	best: 0.3555337 (3899)	total: 15m 27s	remaining: 23.5s


3999:	learn: 0.0139592	test: 0.3545951	best: 0.3545951 (3999)	total: 15m 51s	remaining: 0us

bestTest = 0.354595149
bestIteration = 3999



CB Fold 3 acc: 0.8858 | elapsed 952.5s
CB Fold 4 | train 50602 val 13471
0:	learn: 2.3267109	test: 2.3755326	best: 2.3755326 (0)	total: 245ms	remaining: 16m 19s
100:	learn: 0.7006199	test: 1.1549268	best: 1.1549268 (100)	total: 24.8s	remaining: 15m 56s
200:	learn: 0.4215549	test: 0.9112460	best: 0.9112460 (200)	total: 49.4s	remaining: 15m 33s
300:	learn: 0.2903616	test: 0.7760815	best: 0.7760815 (300)	total: 1m 13s	remaining: 15m 5s
400:	learn: 0.2132787	test: 0.6836565	best: 0.6836565 (400)	total: 1m 37s	remaining: 14m 38s
500:	learn: 0.1640590	test: 0.6168159	best: 0.6168159 (500)	total: 2m 2s	remaining: 14m 12s
600:	learn: 0.1318998	test: 0.5675788	best: 0.5675788 (600)	total: 2m 26s	remaining: 13m 46s
700:	learn: 0.1086745	test: 0.5294547	best: 0.5294547 (700)	total: 2m 50s	remaining: 13m 20s
800:	learn: 0.0916792	test: 0.5002789	best: 0.5002789 (800)	total: 3m 13s	remaining: 12m 54s
900:	learn: 0.0794245	test: 0.4784651	best: 0.4784651 (900)	total: 3m 37s	remaining: 12m 29s
1000:	learn: 0.0705381	test: 0.4619954	best: 0.4619954 (1000)	total: 4m 1s	remaining: 12m 3s
1100:	learn: 0.0637106	test: 0.4486748	best: 0.4486748 (1100)	total: 4m 25s	remaining: 11m 38s
1200:	learn: 0.0577351	test: 0.4370898	best: 0.4370898 (1200)	total: 4m 48s	remaining: 11m 13s
1300:	learn: 0.0523429	test: 0.4264635	best: 0.4264635 (1300)	total: 5m 12s	remaining: 10m 48s
1400:	learn: 0.0481001	test: 0.4178567	best: 0.4178567 (1400)	total: 5m 36s	remaining: 10m 23s
1500:	learn: 0.0446008	test: 0.4110411	best: 0.4110411 (1500)	total: 5m 59s	remaining: 9m 59s
1600:	learn: 0.0415540	test: 0.4050983	best: 0.4050983 (1600)	total: 6m 23s	remaining: 9m 34s
1700:	learn: 0.0387012	test: 0.3994924	best: 0.3994924 (1700)	total: 6m 47s	remaining: 9m 10s
1800:	learn: 0.0360483	test: 0.3942555	best: 0.3942555 (1800)	total: 7m 10s	remaining: 8m 45s
1900:	learn: 0.0338500	test: 0.3900269	best: 0.3900269 (1900)	total: 7m 34s	remaining: 8m 21s
2000:	learn: 0.0319852	test: 0.3862966	best: 0.3862962 (1999)	total: 7m 57s	remaining: 7m 57s
2100:	learn: 0.0303472	test: 0.3834340	best: 0.3834340 (2100)	total: 8m 21s	remaining: 7m 33s
2200:	learn: 0.0287513	test: 0.3800912	best: 0.3800912 (2200)	total: 8m 44s	remaining: 7m 8s
2300:	learn: 0.0272852	test: 0.3770562	best: 0.3770562 (2300)	total: 9m 8s	remaining: 6m 44s
2400:	learn: 0.0259772	test: 0.3746139	best: 0.3746139 (2400)	total: 9m 31s	remaining: 6m 20s
2500:	learn: 0.0247799	test: 0.3723426	best: 0.3723426 (2500)	total: 9m 55s	remaining: 5m 56s
2600:	learn: 0.0236496	test: 0.3700972	best: 0.3700972 (2600)	total: 10m 18s	remaining: 5m 32s
2700:	learn: 0.0225428	test: 0.3679053	best: 0.3679053 (2700)	total: 10m 42s	remaining: 5m 9s
2800:	learn: 0.0214974	test: 0.3660534	best: 0.3660525 (2799)	total: 11m 6s	remaining: 4m 45s
2900:	learn: 0.0206231	test: 0.3645403	best: 0.3645403 (2900)	total: 11m 29s	remaining: 4m 21s
3000:	learn: 0.0197952	test: 0.3630944	best: 0.3630944 (3000)	total: 11m 53s	remaining: 3m 57s
3100:	learn: 0.0189550	test: 0.3616646	best: 0.3616631 (3099)	total: 12m 16s	remaining: 3m 33s
3200:	learn: 0.0182131	test: 0.3603911	best: 0.3603845 (3199)	total: 12m 40s	remaining: 3m 9s
3300:	learn: 0.0174958	test: 0.3590287	best: 0.3590287 (3300)	total: 13m 3s	remaining: 2m 46s
3400:	learn: 0.0168547	test: 0.3579352	best: 0.3579352 (3400)	total: 13m 27s	remaining: 2m 22s
3500:	learn: 0.0162005	test: 0.3567097	best: 0.3567097 (3500)	total: 13m 51s	remaining: 1m 58s
3600:	learn: 0.0156709	test: 0.3560293	best: 0.3560293 (3600)	total: 14m 14s	remaining: 1m 34s
3700:	learn: 0.0151157	test: 0.3551919	best: 0.3551817 (3693)	total: 14m 38s	remaining: 1m 10s
3800:	learn: 0.0145841	test: 0.3544344	best: 0.3544344 (3800)	total: 15m 1s	remaining: 47.2s
3900:	learn: 0.0140934	test: 0.3536008	best: 0.3536008 (3900)	total: 15m 25s	remaining: 23.5s
3999:	learn: 0.0135819	test: 0.3527502	best: 0.3527421 (3998)	total: 15m 48s	remaining: 0us

bestTest = 0.3527421281
bestIteration = 3998

Shrink model to first 3999 iterations.

CB Fold 4 acc: 0.8853 | elapsed 949.7s
CatBoost OOF accuracy: 0.8900 | total 4782.1s
Saved CatBoost preds (small TTA) to test_pred_pooled_cat_tta50.npy.
Quick XGB+CB alpha=0.5 OOF acc: 0.89503
Saved quick blended submission.csv with shape: (6473, 2)


In [13]:
# Blending: Alpha sweep (XGB+CB) and LogisticRegression stacker on OOF probs
import os, time, numpy as np, pandas as pd
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

paths_ok = all([
    os.path.exists('oof_pooled.npy'),
    os.path.exists('test_pred_pooled.npy'),
    os.path.exists('oof_pooled_cat.npy'),
    os.path.exists('test_pred_pooled_cat.npy'),
    os.path.exists('y_train_pooled.npy'),
    os.path.exists('groups_pooled.npy'),
    os.path.exists('test_fnames_pooled.csv')
])

if not paths_ok:
    print('Required files not found yet. Ensure XGB (Cell 7) and CatBoost (Cell 8) have finished.')
else:
    y = np.load('y_train_pooled.npy')
    groups = np.load('groups_pooled.npy')
    oof_xgb = np.load('oof_pooled.npy')
    test_xgb = np.load('test_pred_pooled.npy')
    oof_cb = np.load('oof_pooled_cat.npy')
    test_cb = np.load('test_pred_pooled_cat.npy')
    test_fnames = pd.read_csv('test_fnames_pooled.csv', header=None)[0].values

    # 1) Alpha sweep
    best_alpha, best_acc = None, -1.0
    for alpha in np.linspace(0.0, 1.0, 21):
        oof_blend = (1 - alpha) * oof_xgb + alpha * oof_cb
        acc = accuracy_score(y, oof_blend.argmax(1))
        print(f'Alpha {alpha:.2f} -> OOF acc {acc:.5f}')
        if acc > best_acc:
            best_acc, best_alpha = acc, alpha
    print(f'Best alpha: {best_alpha:.2f} | OOF acc: {best_acc:.5f}')
    test_blend_alpha = (1 - best_alpha) * test_xgb + best_alpha * test_cb
    np.save('test_pred_blend_alpha.npy', test_blend_alpha)

    # 2) LogisticRegression stacker on probs with SGKF
    X_meta = np.concatenate([oof_xgb, oof_cb], axis=1)  # (N, 24)
    X_test_meta = np.concatenate([test_xgb, test_cb], axis=1)  # (T, 24)
    cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
    oof_stack = np.zeros_like(oof_xgb, dtype=np.float32)
    test_stack = np.zeros_like(test_xgb, dtype=np.float32)
    for fold, (tr_idx, va_idx) in enumerate(cv.split(X_meta, y, groups)):
        t0 = time.time()
        X_tr, X_va = X_meta[tr_idx], X_meta[va_idx]
        y_tr = y[tr_idx]
        # lbfgs fits a multinomial model by default; the explicit multi_class argument is deprecated in recent scikit-learn.
        clf = LogisticRegression(solver='lbfgs', max_iter=2000, n_jobs=-1, C=1.0, random_state=42)
        clf.fit(X_tr, y_tr)
        oof_stack[va_idx] = clf.predict_proba(X_va)
        test_stack += clf.predict_proba(X_test_meta) / cv.n_splits
        print(f'Stacker fold {fold} done in {time.time()-t0:.1f}s')
    oof_acc_stack = accuracy_score(y, oof_stack.argmax(1))
    print(f'LR stacker OOF acc: {oof_acc_stack:.5f}')
    np.save('oof_blend_stack.npy', oof_stack)
    np.save('test_pred_blend_stack.npy', test_stack)

    # Choose best of alpha vs stacker
    use_stacker = oof_acc_stack > best_acc
    final_test = test_stack if use_stacker else test_blend_alpha
    choice = 'stacker' if use_stacker else f'alpha={best_alpha:.2f}'
    final_oof_acc = oof_acc_stack if use_stacker else best_acc
    print(f'Final choice: {choice} | OOF acc: {final_oof_acc:.5f}')

    # Build submission
    pred_idx = final_test.argmax(1)
    labels = [CLASSES[i] for i in pred_idx]
    sub = pd.DataFrame({'fname': test_fnames, 'label': labels})
    sub.to_csv('submission.csv', index=False)
    print('Saved submission.csv', sub.shape)

Alpha 0.00 -> OOF acc 0.83800
Alpha 0.05 -> OOF acc 0.83809
Alpha 0.10 -> OOF acc 0.83820
Alpha 0.15 -> OOF acc 0.83812
Alpha 0.20 -> OOF acc 0.83758
Alpha 0.25 -> OOF acc 0.83798
Alpha 0.30 -> OOF acc 0.83751
Alpha 0.35 -> OOF acc 0.83798
Alpha 0.40 -> OOF acc 0.83728
Alpha 0.45 -> OOF acc 0.83641
Alpha 0.50 -> OOF acc 0.83517
Alpha 0.55 -> OOF acc 0.83378
Alpha 0.60 -> OOF acc 0.83205
Alpha 0.65 -> OOF acc 0.82974
Alpha 0.70 -> OOF acc 0.82723
Alpha 0.75 -> OOF acc 0.82425
Alpha 0.80 -> OOF acc 0.82019
Alpha 0.85 -> OOF acc 0.81555
Alpha 0.90 -> OOF acc 0.81098
Alpha 0.95 -> OOF acc 0.80550
Alpha 1.00 -> OOF acc 0.79939
Best alpha: 0.10 | OOF acc: 0.83820
Stacker fold 0 done in 2.6s
Stacker fold 1 done in 2.5s
Stacker fold 2 done in 2.5s
Stacker fold 3 done in 2.5s
Stacker fold 4 done in 2.5s
LR stacker OOF acc: 0.83659
Final choice: alpha=0.10 | OOF acc: 0.83820
Saved submission.csv (6473, 2)
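
A note on the sweep above: alpha is selected on the same OOF rows it is scored on, and the best blend (0.83820 at alpha=0.10) beats XGB alone (0.83800) by only about 0.0002, so the chosen weight may not carry over to the test set. The following is a minimal sketch of the same sweep on toy data. `sweep_alpha` and the toy arrays are hypothetical names introduced for illustration and are not part of the notebook.

```python
import numpy as np

def sweep_alpha(y, p_a, p_b, n_steps=21):
    """Return (best_alpha, best_acc) for the blend (1 - alpha) * p_a + alpha * p_b."""
    best_alpha, best_acc = 0.0, -1.0
    for alpha in np.linspace(0.0, 1.0, n_steps):
        blend = (1.0 - alpha) * p_a + alpha * p_b
        acc = float((blend.argmax(axis=1) == y).mean())
        if acc > best_acc:  # strict '>' keeps the smallest alpha on ties
            best_alpha, best_acc = float(alpha), acc
    return best_alpha, best_acc

# Toy data (hypothetical): 4 samples, 2 classes.
y = np.array([0, 1, 0, 1])
p_a = np.array([[0.9, 0.1], [0.6, 0.4], [0.8, 0.2], [0.2, 0.8]])  # wrong on sample 1
p_b = np.array([[0.4, 0.6], [0.1, 0.9], [0.7, 0.3], [0.3, 0.7]])  # wrong on sample 0
alpha, acc = sweep_alpha(y, p_a, p_b)
print(f'best alpha={alpha:.2f} acc={acc:.2f}')
```

Each toy model misclassifies a different sample, so an intermediate alpha fixes both errors. Real OOF matrices rarely separate this cleanly, which is why small gains from a sweep deserve caution.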


In [28]:
# LightGBM on pooled features with SGKF, weights, clipping, 5-shift TTA
import os, sys, subprocess, time, numpy as np, pandas as pd
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.utils.class_weight import compute_sample_weight

try:
    import lightgbm as lgb
except Exception as e:
    print('Installing lightgbm...', e)
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'lightgbm==4.6.0'], check=True)
    import lightgbm as lgb

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

train_feat = 'X_train_pooled.npy'
train_y = 'y_train_pooled.npy'
train_groups = 'groups_pooled.npy'
# Use small-shift TTA features for test
test_feat_tta = 'X_test_pooled_tta_small.npy'
test_fnames_csv = 'test_fnames_pooled_small.csv'

assert os.path.exists(train_feat) and os.path.exists(train_y) and os.path.exists(train_groups), 'Missing train pooled features'
assert os.path.exists(test_feat_tta) and os.path.exists(test_fnames_csv), 'Missing test pooled TTA features'

X = np.load(train_feat)
y = np.load(train_y)
groups = np.load(train_groups)
X_test_tta = np.load(test_feat_tta)  # [n_shifts, N, D]
test_fnames = pd.read_csv(test_fnames_csv, header=None)[0].values
n_shifts, n_test, D = X_test_tta.shape
print('Shapes:', X.shape, y.shape, groups.shape, X_test_tta.shape)

params = dict(
    objective='multiclass',
    num_class=num_class,
    metric='multi_logloss',
    learning_rate=0.05,
    num_leaves=63,
    max_depth=7,
    feature_fraction=0.8,
    bagging_fraction=0.8,
    bagging_freq=1,
    min_data_in_leaf=30,
    lambda_l1=0.0,
    lambda_l2=1.0,
    n_jobs=-1,
    verbosity=-1,
    seed=42
)

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
oof_lgb = np.zeros((len(y), num_class), dtype=np.float32)
test_lgb = np.zeros((n_test, num_class), dtype=np.float32)
start = time.time()

for fold, (tr_idx, va_idx) in enumerate(cv.split(X, y, groups)):
    t0 = time.time()
    print(f'LGB Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
    scaler = StandardScaler(with_mean=True, with_std=True)
    X_tr = scaler.fit_transform(X[tr_idx])
    X_va = scaler.transform(X[va_idx])
    X_tr = np.clip(X_tr, -5, 5)
    X_va = np.clip(X_va, -5, 5)
    tr_weights = compute_sample_weight('balanced', y=y[tr_idx]).astype(np.float32)
    lgb_tr = lgb.Dataset(X_tr, label=y[tr_idx], weight=tr_weights, free_raw_data=False)
    lgb_va = lgb.Dataset(X_va, label=y[va_idx], reference=lgb_tr, free_raw_data=False)
    model = lgb.train(
        params,
        lgb_tr,
        num_boost_round=2000,
        valid_sets=[lgb_tr, lgb_va],
        valid_names=['train','valid'],
        callbacks=[
            lgb.early_stopping(stopping_rounds=100, verbose=False),
            lgb.log_evaluation(period=100)
        ]
    )
    best_it = getattr(model, 'best_iteration', None)
    oof_lgb[va_idx] = model.predict(X_va, num_iteration=best_it)
    va_acc = accuracy_score(y[va_idx], oof_lgb[va_idx].argmax(1))
    print(f'LGB Fold {fold} acc: {va_acc:.4f} | elapsed {time.time()-t0:.1f}s')
    # Test TTA
    fold_test = np.zeros((n_test, num_class), dtype=np.float32)
    for s in range(n_shifts):
        X_te_s = scaler.transform(X_test_tta[s])
        X_te_s = np.clip(X_te_s, -5, 5)
        fold_test += model.predict(X_te_s, num_iteration=best_it) / n_shifts
    test_lgb += fold_test / cv.n_splits

oof_acc_lgb = accuracy_score(y, oof_lgb.argmax(1))
print(f'LightGBM OOF accuracy: {oof_acc_lgb:.4f} | total {time.time()-start:.1f}s')
np.save('oof_pooled_lgb.npy', oof_lgb)
# Save test preds with small-TTA suffix
np.save('test_pred_pooled_lgb_tta50.npy', test_lgb)
print('Saved LightGBM preds (small TTA) to test_pred_pooled_lgb_tta50.npy.')

Shapes: (64073, 662) (64073,) (64073,) (5, 6473, 662)

LGB Fold 0 | train 52005 val 12068
[100]	train's multi_logloss: 0.122379	valid's multi_logloss: 0.551333
[200]	train's multi_logloss: 0.031121	valid's multi_logloss: 0.368114
[300]	train's multi_logloss: 0.0112937	valid's multi_logloss: 0.327729
[400]	train's multi_logloss: 0.00518802	valid's multi_logloss: 0.318784
[500]	train's multi_logloss: 0.00287623	valid's multi_logloss: 0.317958
LGB Fold 0 acc: 0.8975 | elapsed 75.3s

LGB Fold 1 | train 51074 val 12999
[100]	train's multi_logloss: 0.115255	valid's multi_logloss: 0.593447
[200]	train's multi_logloss: 0.0286556	valid's multi_logloss: 0.423715
[300]	train's multi_logloss: 0.0103047	valid's multi_logloss: 0.393929
[400]	train's multi_logloss: 0.00474633	valid's multi_logloss: 0.392743
LGB Fold 1 acc: 0.8774 | elapsed 55.1s

LGB Fold 2 | train 51792 val 12281
[100]	train's multi_logloss: 0.121133	valid's multi_logloss: 0.567342
[200]	train's multi_logloss: 0.030187	valid's multi_logloss: 0.385576
[300]	train's multi_logloss: 0.0108877	valid's multi_logloss: 0.34619
[400]	train's multi_logloss: 0.00500143	valid's multi_logloss: 0.33793
[500]	train's multi_logloss: 0.00278309	valid's multi_logloss: 0.339121
LGB Fold 2 acc: 0.8913 | elapsed 68.9s

LGB Fold 3 | train 50819 val 13254
[100]	train's multi_logloss: 0.115546	valid's multi_logloss: 0.570206
[200]	train's multi_logloss: 0.0286827	valid's multi_logloss: 0.400599
[300]	train's multi_logloss: 0.010259	valid's multi_logloss: 0.367921
[400]	train's multi_logloss: 0.00472504	valid's multi_logloss: 0.364423
LGB Fold 3 acc: 0.8842 | elapsed 63.6s

LGB Fold 4 | train 50602 val 13471
[100]	train's multi_logloss: 0.115481	valid's multi_logloss: 0.588831
[200]	train's multi_logloss: 0.0284971	valid's multi_logloss: 0.411636
[300]	train's multi_logloss: 0.0102752	valid's multi_logloss: 0.377183
[400]	train's multi_logloss: 0.00472736	valid's multi_logloss: 0.37274
LGB Fold 4 acc: 0.8836 | elapsed 61.7s

LightGBM OOF accuracy: 0.8866 | total 330.3s
Saved LightGBM preds (small TTA) to test_pred_pooled_lgb_tta50.npy.
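
The `compute_sample_weight('balanced', ...)` call in the cell above gives each sample the weight n_samples / (n_classes * count of its class), so every class contributes the same total weight to the loss. A small NumPy sketch of that formula follows. `balanced_weights` is a hypothetical helper written for illustration; it is not scikit-learn's implementation.

```python
import numpy as np

def balanced_weights(y):
    """Per-sample weights: n_samples / (n_classes * count of the sample's class)."""
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    per_class = len(y) / (len(classes) * counts)
    return per_class[inverse].astype(np.float32)

# Toy labels (hypothetical): class 0 appears three times, class 1 once.
y_toy = np.array([0, 0, 0, 1])
w = balanced_weights(y_toy)
print(w)
# Total weight per class is equal, which is the point of 'balanced'.
print(w[y_toy == 0].sum(), w[y_toy == 1].sum())
```

With the 12-class setup here, this up-weights rare classes such as `silence` relative to `unknown`, assuming their counts differ as the label mapping in the plan suggests.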


In [16]:
# Blend 3 models (XGB + CB + LGB): alpha sweep (coarse) and LR stacker
import os, time, numpy as np, pandas as pd
from itertools import product
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']

have_all = all([
    os.path.exists('oof_pooled.npy'),
    os.path.exists('oof_pooled_cat.npy'),
    os.path.exists('oof_pooled_lgb.npy'),
    os.path.exists('test_pred_pooled.npy'),
    os.path.exists('test_pred_pooled_cat.npy'),
    os.path.exists('test_pred_pooled_lgb.npy'),
    os.path.exists('y_train_pooled.npy'),
    os.path.exists('groups_pooled.npy'),
    os.path.exists('test_fnames_pooled.csv')
])

if not have_all:
    print('Predictions missing. Ensure XGB, CB, and LGB cells finished.')
else:
    y = np.load('y_train_pooled.npy')
    groups = np.load('groups_pooled.npy')
    o_x = np.load('oof_pooled.npy')
    o_c = np.load('oof_pooled_cat.npy')
    o_l = np.load('oof_pooled_lgb.npy')
    t_x = np.load('test_pred_pooled.npy')
    t_c = np.load('test_pred_pooled_cat.npy')
    t_l = np.load('test_pred_pooled_lgb.npy')
    test_fnames = pd.read_csv('test_fnames_pooled.csv', header=None)[0].values

    # 3-way coarse alpha sweep over simplex with step=0.1
    best_acc, best_w = -1.0, (1.0, 0.0, 0.0)
    grid = [i/10.0 for i in range(0, 11)]
    for ax, ac in product(grid, grid):
        if ax + ac <= 1.0:
            al = 1.0 - ax - ac
            o = ax*o_x + ac*o_c + al*o_l
            acc = accuracy_score(y, o.argmax(1))
            if acc > best_acc:
                best_acc, best_w = acc, (ax, ac, al)
    print(f'Coarse 3-way blend best OOF acc: {best_acc:.5f} | weights (XGB,CB,LGB)={best_w}')
    t_blend = best_w[0]*t_x + best_w[1]*t_c + best_w[2]*t_l
    np.save('test_pred_blend3_alpha.npy', t_blend)

    # LR stacker on concatenated probs (36 dims) with SGKF
    X_meta = np.concatenate([o_x, o_c, o_l], axis=1)
    X_test_meta = np.concatenate([t_x, t_c, t_l], axis=1)
    cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
    oof_stack = np.zeros_like(o_x, dtype=np.float32)
    test_stack = np.zeros_like(t_x, dtype=np.float32)
    for fold, (tr_idx, va_idx) in enumerate(cv.split(X_meta, y, groups)):
        t0 = time.time()
        clf = LogisticRegression(solver='lbfgs', max_iter=2000, n_jobs=-1, random_state=42, C=1.0)
        clf.fit(X_meta[tr_idx], y[tr_idx])
        oof_stack[va_idx] = clf.predict_proba(X_meta[va_idx])
        test_stack += clf.predict_proba(X_test_meta) / cv.n_splits
        print(f'Stacker fold {fold} done in {time.time()-t0:.1f}s')
    acc_stack = accuracy_score(y, oof_stack.argmax(1))
    print(f'LR stacker (3-model) OOF acc: {acc_stack:.5f}')
    np.save('oof_blend3_stack.npy', oof_stack)
    np.save('test_pred_blend3_stack.npy', test_stack)

    use_stack = acc_stack > best_acc
    final_pred = test_stack if use_stack else t_blend
    final_desc = 'LR stacker' if use_stack else f'alpha blend weights={best_w}'
    final_oof = acc_stack if use_stack else best_acc
    print(f'Final 3-model choice: {final_desc} | OOF acc: {final_oof:.5f}')

    # Save submission
    pred_idx = final_pred.argmax(1)
    labels = [CLASSES[i] for i in pred_idx]
    sub = pd.DataFrame({'fname': test_fnames, 'label': labels})
    sub.to_csv('submission.csv', index=False)
    print('Saved submission.csv', sub.shape)

Coarse 3-way blend best OOF acc: 0.83921 | weights (XGB,CB,LGB)=(0.4, 0.2, 0.39999999999999997)
Stacker fold 0 done in 2.8s
Stacker fold 1 done in 2.7s
Stacker fold 2 done in 2.7s
Stacker fold 3 done in 2.5s
Stacker fold 4 done in 2.5s
LR stacker (3-model) OOF acc: 0.83717
Final 3-model choice: alpha blend weights=(0.4, 0.2, 0.39999999999999997) | OOF acc: 0.83921
Saved submission.csv (6473, 2)
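
The weight `0.39999999999999997` printed above is a floating-point artifact of computing `1.0 - ax - ac`. One way to avoid it, sketched here under the assumption that a grid spacing of 1/m is wanted, is to enumerate integer parts that sum to m and divide once at the end. `simplex_grid` is a hypothetical helper name.

```python
from itertools import product

def simplex_grid(k, m):
    """Yield every k-tuple of weights j/m (j a non-negative integer) whose parts sum to m."""
    for parts in product(range(m + 1), repeat=k - 1):
        rest = m - sum(parts)  # integer arithmetic, so no rounding drift
        if rest >= 0:
            yield tuple(p / m for p in parts + (rest,))

weights = list(simplex_grid(3, 10))
print(len(weights))                   # number of grid points for 3 models, step 0.1
print((0.4, 0.2, 0.4) in weights)     # the blend found above, without the float artifact
```

Because the membership test is done on integers, no valid grid point is dropped by a comparison such as `ax + ac <= 1.0` landing slightly above 1.0.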


In [29]:
# LightGBM DART on pooled features with SGKF, weights, clipping, 5-shift TTA (diversity model)
import os, sys, subprocess, time, numpy as np, pandas as pd
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.utils.class_weight import compute_sample_weight

try:
    import lightgbm as lgb
except Exception as e:
    print('Installing lightgbm...', e)
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'lightgbm==4.6.0'], check=True)
    import lightgbm as lgb

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

train_feat = 'X_train_pooled.npy'
train_y = 'y_train_pooled.npy'
train_groups = 'groups_pooled.npy'
# Use small-shift TTA features for test
test_feat_tta = 'X_test_pooled_tta_small.npy'
test_fnames_csv = 'test_fnames_pooled_small.csv'

assert os.path.exists(train_feat) and os.path.exists(train_y) and os.path.exists(train_groups), 'Missing train pooled features'
assert os.path.exists(test_feat_tta) and os.path.exists(test_fnames_csv), 'Missing test pooled TTA features'

X = np.load(train_feat)
y = np.load(train_y)
groups = np.load(train_groups)
X_test_tta = np.load(test_feat_tta)  # [n_shifts, N, D]
test_fnames = pd.read_csv(test_fnames_csv, header=None)[0].values
n_shifts, n_test, D = X_test_tta.shape
print('Shapes:', X.shape, y.shape, groups.shape, X_test_tta.shape)

params = dict(
    objective='multiclass',
    num_class=num_class,
    metric='multi_logloss',
    boosting='dart',
    learning_rate=0.05,
    num_leaves=63,
    max_depth=7,
    feature_fraction=0.8,
    bagging_fraction=0.8,
    bagging_freq=1,
    drop_rate=0.1,
    max_drop=50,
    skip_drop=0.5,
    min_data_in_leaf=30,
    lambda_l2=1.0,
    n_jobs=-1,
    verbosity=-1,
    seed=42
)

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
oof_lgb_dart = np.zeros((len(y), num_class), dtype=np.float32)
test_lgb_dart = np.zeros((n_test, num_class), dtype=np.float32)
start = time.time()

for fold, (tr_idx, va_idx) in enumerate(cv.split(X, y, groups)):
    t0 = time.time()
    print(f'LGB-DART Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
    scaler = StandardScaler(with_mean=True, with_std=True)
    X_tr = scaler.fit_transform(X[tr_idx])
    X_va = scaler.transform(X[va_idx])
    X_tr = np.clip(X_tr, -5, 5)
    X_va = np.clip(X_va, -5, 5)
    tr_weights = compute_sample_weight('balanced', y=y[tr_idx]).astype(np.float32)
    lgb_tr = lgb.Dataset(X_tr, label=y[tr_idx], weight=tr_weights, free_raw_data=False)
    lgb_va = lgb.Dataset(X_va, label=y[va_idx], reference=lgb_tr, free_raw_data=False)
    model = lgb.train(
        params,
        lgb_tr,
        num_boost_round=2000,
        valid_sets=[lgb_tr, lgb_va],
        valid_names=['train','valid'],
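        callbacks=[
            # NOTE: LightGBM does not apply early stopping when boosting='dart', so this callback is
            # effectively a no-op and all 2000 rounds run (consistent with the logs for every fold).
            lgb.early_stopping(stopping_rounds=100, verbose=False),
            lgb.log_evaluation(period=100)
        ]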
    )
    best_it = getattr(model, 'best_iteration', None)
    oof_lgb_dart[va_idx] = model.predict(X_va, num_iteration=best_it)
    va_acc = accuracy_score(y[va_idx], oof_lgb_dart[va_idx].argmax(1))
    print(f'LGB-DART Fold {fold} acc: {va_acc:.4f} | elapsed {time.time()-t0:.1f}s')
    # Test TTA
    fold_test = np.zeros((n_test, num_class), dtype=np.float32)
    for s in range(n_shifts):
        X_te_s = scaler.transform(X_test_tta[s])
        X_te_s = np.clip(X_te_s, -5, 5)
        fold_test += model.predict(X_te_s, num_iteration=best_it) / n_shifts
    test_lgb_dart += fold_test / cv.n_splits

oof_acc_lgb_dart = accuracy_score(y, oof_lgb_dart.argmax(1))
print(f'LightGBM DART OOF accuracy: {oof_acc_lgb_dart:.4f} | total {time.time()-start:.1f}s')
np.save('oof_pooled_lgb_dart.npy', oof_lgb_dart)
np.save('test_pred_pooled_lgb_dart_tta50.npy', test_lgb_dart)
print('Saved LightGBM DART preds (small TTA) to test_pred_pooled_lgb_dart_tta50.npy.')

Shapes: (64073, 662) (64073,) (64073,) (5, 6473, 662)

LGB-DART Fold 0 | train 52005 val 12068
[100]	train's multi_logloss: 0.552972	valid's multi_logloss: 1.05579
[200]	train's multi_logloss: 0.406045	valid's multi_logloss: 0.89892
[300]	train's multi_logloss: 0.230637	valid's multi_logloss: 0.695785
[400]	train's multi_logloss: 0.172173	valid's multi_logloss: 0.609467
[500]	train's multi_logloss: 0.115533	valid's multi_logloss: 0.519555
[600]	train's multi_logloss: 0.0891536	valid's multi_logloss: 0.470852
[700]	train's multi_logloss: 0.0543709	valid's multi_logloss: 0.403377
[800]	train's multi_logloss: 0.0355838	valid's multi_logloss: 0.363919
[900]	train's multi_logloss: 0.0242916	valid's multi_logloss: 0.34036
[1000]	train's multi_logloss: 0.0200863	valid's multi_logloss: 0.33199
[1100]	train's multi_logloss: 0.0140754	valid's multi_logloss: 0.319771
[1200]	train's multi_logloss: 0.0125469	valid's multi_logloss: 0.31553
[1300]	train's multi_logloss: 0.0106627	valid's multi_logloss: 0.311228
[1400]	train's multi_logloss: 0.00859035	valid's multi_logloss: 0.306734
[1500]	train's multi_logloss: 0.00785125	valid's multi_logloss: 0.304626
[1600]	train's multi_logloss: 0.005784	valid's multi_logloss: 0.302075
[1700]	train's multi_logloss: 0.00511294	valid's multi_logloss: 0.301232
[1800]	train's multi_logloss: 0.00488431	valid's multi_logloss: 0.300413
[1900]	train's multi_logloss: 0.00430674	valid's multi_logloss: 0.299089
[2000]	train's multi_logloss: 0.00368486	valid's multi_logloss: 0.298733
LGB-DART Fold 0 acc: 0.9032 | elapsed 410.6s


LGB-DART Fold 1 | train 51074 val 12999
[100]	train's multi_logloss: 0.538793	valid's multi_logloss: 1.07971
[200]	train's multi_logloss: 0.392718	valid's multi_logloss: 0.93031
[300]	train's multi_logloss: 0.220396	valid's multi_logloss: 0.731963
[400]	train's multi_logloss: 0.162706	valid's multi_logloss: 0.644658
[500]	train's multi_logloss: 0.108478	valid's multi_logloss: 0.559178
[600]	train's multi_logloss: 0.0831296	valid's multi_logloss: 0.512584
[700]	train's multi_logloss: 0.0500785	valid's multi_logloss: 0.450655
[800]	train's multi_logloss: 0.0327931	valid's multi_logloss: 0.415508
[900]	train's multi_logloss: 0.0222264	valid's multi_logloss: 0.39664
[1000]	train's multi_logloss: 0.0184305	valid's multi_logloss: 0.389162
[1100]	train's multi_logloss: 0.0129233	valid's multi_logloss: 0.381318
[1200]	train's multi_logloss: 0.0115378	valid's multi_logloss: 0.378058
[1300]	train's multi_logloss: 0.00988028	valid's multi_logloss: 0.375464
[1400]	train's multi_logloss: 0.00793044	valid's multi_logloss: 0.373113
[1500]	train's multi_logloss: 0.00727976	valid's multi_logloss: 0.371638
[1600]	train's multi_logloss: 0.00537289	valid's multi_logloss: 0.37182
[1700]	train's multi_logloss: 0.00475554	valid's multi_logloss: 0.371292
[1800]	train's multi_logloss: 0.00456312	valid's multi_logloss: 0.370146
[1900]	train's multi_logloss: 0.00402555	valid's multi_logloss: 0.369703
[2000]	train's multi_logloss: 0.00345813	valid's multi_logloss: 0.37004
LGB-DART Fold 1 acc: 0.8868 | elapsed 399.7s


LGB-DART Fold 2 | train 51792 val 12281
[100]	train's multi_logloss: 0.550818	valid's multi_logloss: 1.05999
[200]	train's multi_logloss: 0.403782	valid's multi_logloss: 0.910084
[300]	train's multi_logloss: 0.228138	valid's multi_logloss: 0.706133
[400]	train's multi_logloss: 0.169978	valid's multi_logloss: 0.62186
[500]	train's multi_logloss: 0.114061	valid's multi_logloss: 0.532381
[600]	train's multi_logloss: 0.087686	valid's multi_logloss: 0.484715
[700]	train's multi_logloss: 0.0533129	valid's multi_logloss: 0.417903
[800]	train's multi_logloss: 0.0348068	valid's multi_logloss: 0.38009
[900]	train's multi_logloss: 0.0234339	valid's multi_logloss: 0.355377
[1000]	train's multi_logloss: 0.019385	valid's multi_logloss: 0.346128
[1100]	train's multi_logloss: 0.0135883	valid's multi_logloss: 0.334664
[1200]	train's multi_logloss: 0.0121347	valid's multi_logloss: 0.330085
[1300]	train's multi_logloss: 0.0103593	valid's multi_logloss: 0.326584
[1400]	train's multi_logloss: 0.00836155	valid's multi_logloss: 0.323002
[1500]	train's multi_logloss: 0.00763734	valid's multi_logloss: 0.32072
[1600]	train's multi_logloss: 0.00562723	valid's multi_logloss: 0.318087
[1700]	train's multi_logloss: 0.00497068	valid's multi_logloss: 0.317648
[1800]	train's multi_logloss: 0.00476862	valid's multi_logloss: 0.316904
[1900]	train's multi_logloss: 0.00421681	valid's multi_logloss: 0.316727
[2000]	train's multi_logloss: 0.00361343	valid's multi_logloss: 0.316707
LGB-DART Fold 2 acc: 0.8972 | elapsed 411.2s


LGB-DART Fold 3 | train 50819 val 13254
[100]	train's multi_logloss: 0.540987	valid's multi_logloss: 1.05844
[200]	train's multi_logloss: 0.394975	valid's multi_logloss: 0.912247
[300]	train's multi_logloss: 0.220728	valid's multi_logloss: 0.71242
[400]	train's multi_logloss: 0.163058	valid's multi_logloss: 0.627412
[500]	train's multi_logloss: 0.108582	valid's multi_logloss: 0.542639
[600]	train's multi_logloss: 0.0831684	valid's multi_logloss: 0.495684
[700]	train's multi_logloss: 0.0501806	valid's multi_logloss: 0.433629
[800]	train's multi_logloss: 0.0327617	valid's multi_logloss: 0.399172
[900]	train's multi_logloss: 0.022153	valid's multi_logloss: 0.379187
[1000]	train's multi_logloss: 0.0183027	valid's multi_logloss: 0.372078
[1100]	train's multi_logloss: 0.0127991	valid's multi_logloss: 0.362405
[1200]	train's multi_logloss: 0.011445	valid's multi_logloss: 0.359693
[1300]	train's multi_logloss: 0.00980841	valid's multi_logloss: 0.356181
[1400]	train's multi_logloss: 0.00791497	valid's multi_logloss: 0.353112
[1500]	train's multi_logloss: 0.00727089	valid's multi_logloss: 0.351977
[1600]	train's multi_logloss: 0.00536013	valid's multi_logloss: 0.350716
[1700]	train's multi_logloss: 0.00475219	valid's multi_logloss: 0.3498
[1800]	train's multi_logloss: 0.00455819	valid's multi_logloss: 0.348954
[1900]	train's multi_logloss: 0.00403938	valid's multi_logloss: 0.348688
[2000]	train's multi_logloss: 0.00345908	valid's multi_logloss: 0.349689
LGB-DART Fold 3 acc: 0.8890 | elapsed 406.9s


LGB-DART Fold 4 | train 50602 val 13471
[100]	train's multi_logloss: 0.541225	valid's multi_logloss: 1.07746
[200]	train's multi_logloss: 0.395541	valid's multi_logloss: 0.928802
[300]	train's multi_logloss: 0.222331	valid's multi_logloss: 0.728408
[400]	train's multi_logloss: 0.163913	valid's multi_logloss: 0.640884
[500]	train's multi_logloss: 0.109525	valid's multi_logloss: 0.554543
[600]	train's multi_logloss: 0.0836597	valid's multi_logloss: 0.505432
[700]	train's multi_logloss: 0.0505376	valid's multi_logloss: 0.442114
[800]	train's multi_logloss: 0.0330616	valid's multi_logloss: 0.405372
[900]	train's multi_logloss: 0.0223577	valid's multi_logloss: 0.38333
[1000]	train's multi_logloss: 0.0184745	valid's multi_logloss: 0.375388
[1100]	train's multi_logloss: 0.0129	valid's multi_logloss: 0.365807
[1200]	train's multi_logloss: 0.0115373	valid's multi_logloss: 0.363029
[1300]	train's multi_logloss: 0.00984394	valid's multi_logloss: 0.359683
[1400]	train's multi_logloss: 0.00792423	valid's multi_logloss: 0.356459
[1500]	train's multi_logloss: 0.00727304	valid's multi_logloss: 0.355035
[1600]	train's multi_logloss: 0.00536739	valid's multi_logloss: 0.353191
[1700]	train's multi_logloss: 0.00475491	valid's multi_logloss: 0.352928
[1800]	train's multi_logloss: 0.00456607	valid's multi_logloss: 0.352186
[1900]	train's multi_logloss: 0.00403693	valid's multi_logloss: 0.352421
[2000]	train's multi_logloss: 0.00346883	valid's multi_logloss: 0.353282
LGB-DART Fold 4 acc: 0.8891 | elapsed 405.6s

LightGBM DART OOF accuracy: 0.8928 | total 2095.2s
Saved LightGBM DART preds (small TTA) to test_pred_pooled_lgb_dart_tta50.npy.
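
The nested accumulation in the cell above (divide by `n_shifts` inside the shift loop, then by `cv.n_splits` once per fold) amounts to a plain mean over all fold and shift predictions. The sketch below checks this on random stand-in data. The array `preds` is a hypothetical substitute for the `model.predict(...)` outputs, and the sizes are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_folds, n_shifts, n_test, n_class = 5, 5, 8, 12

# Hypothetical stand-in for model.predict(...): one probability matrix per (fold, shift).
raw = rng.random((n_folds, n_shifts, n_test, n_class))
preds = raw / raw.sum(axis=-1, keepdims=True)

# Same accumulation pattern as the training cells.
test_avg = np.zeros((n_test, n_class))
for f in range(n_folds):
    fold_test = np.zeros((n_test, n_class))
    for s in range(n_shifts):
        fold_test += preds[f, s] / n_shifts
    test_avg += fold_test / n_folds

# Equivalent to a single mean over the fold and shift axes; rows still sum to 1.
assert np.allclose(test_avg, preds.mean(axis=(0, 1)))
assert np.allclose(test_avg.sum(axis=1), 1.0)
```

Since the result is a convex combination of probability rows, the saved test matrix can be blended with other models' probabilities without renormalising.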


In [34]:
# K-agnostic blend with per-class bias tuning and silence override; prefer small-TTA preds; exclude CB if not refreshed
import os, time, numpy as np, pandas as pd
from itertools import product
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)
EPS = 1e-8

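def coarse_simplex_blend(oofs, tests, step=0.1):
    # Grid-search blend weights over the probability simplex with spacing `step`.
    # NOTE: accuracy is scored against the module-level `y`, which must be loaded before calling.
    names = list(oofs.keys())
    K = len(names)
    assert K >= 2, 'Need at least two models to blend'
    M = int(round(1.0 / step))
    def gen_weights(K, M):
        # Yield integer weight vectors summing to M; the caller divides by M.
        if K == 1:
            yield np.array([M], dtype=float)
            return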
        for parts in product(range(M+1), repeat=K-1):
            s = sum(parts)
            if s <= M:
                last = M - s
                yield np.array(list(parts) + [last], dtype=float)
    best = {'acc': -1.0, 'w': None, 'order': names}
    for w_int in gen_weights(K, M):
        w = w_int / M
        o = sum(w[i] * oofs[n] for i, n in enumerate(names))
        acc = accuracy_score(y, o.argmax(1))
        if acc > best['acc']:
            best = {'acc': acc, 'w': w.copy(), 'order': names}
    tb = sum(best['w'][i]*tests[n] for i,n in enumerate(best['order']))
    ob = sum(best['w'][i]*oofs[n] for i,n in enumerate(best['order']))
    return best, ob, tb

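def tune_biases_greedy(log_oof, y, steps=(0.2, 0.1, 0.05), clamp=1.0, max_passes=3):
    # Greedy coordinate search: nudge one class bias at a time by +/-step (clamped to [-clamp, clamp])
    # and keep a move only if OOF accuracy strictly improves; repeat with progressively smaller steps.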
    b = np.zeros(log_oof.shape[1], dtype=np.float32)
    best_acc = accuracy_score(y, (log_oof + b).argmax(1))
    for step in steps:
        improved = True
        passes = 0
        while improved and passes < max_passes:
            improved = False
            passes += 1
            for c in range(log_oof.shape[1]):
                for delta in (-step, step):
                    old = b[c]
                    b[c] = np.clip(b[c] + delta, -clamp, clamp)
                    acc = accuracy_score(y, (log_oof + b).argmax(1))
                    if acc > best_acc + 1e-9:
                        best_acc = acc
                        improved = True
                    else:
                        b[c] = old
    return b, best_acc

# Load OOF and available test predictions; prefer refreshed small-TTA files and exclude CB unless refreshed
y = np.load('y_train_pooled.npy')
test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values

oofs = {}
tests = {}

# XGB seed1
if os.path.exists('oof_pooled.npy'):
    oofs['XGB'] = np.load('oof_pooled.npy')
    # prefer small-TTA test preds
    if os.path.exists('test_pred_pooled_tta50.npy'):
        tests['XGB'] = np.load('test_pred_pooled_tta50.npy')
    elif os.path.exists('test_pred_pooled.npy'):
        tests['XGB'] = np.load('test_pred_pooled.npy')

# LGB
if os.path.exists('oof_pooled_lgb.npy'):
    oofs['LGB'] = np.load('oof_pooled_lgb.npy')
    if os.path.exists('test_pred_pooled_lgb_tta50.npy'):
        tests['LGB'] = np.load('test_pred_pooled_lgb_tta50.npy')
    elif os.path.exists('test_pred_pooled_lgb.npy'):
        tests['LGB'] = np.load('test_pred_pooled_lgb.npy')

# LGB-DART
if os.path.exists('oof_pooled_lgb_dart.npy'):
    oofs['LGB_DART'] = np.load('oof_pooled_lgb_dart.npy')
    if os.path.exists('test_pred_pooled_lgb_dart_tta50.npy'):
        tests['LGB_DART'] = np.load('test_pred_pooled_lgb_dart_tta50.npy')
    elif os.path.exists('test_pred_pooled_lgb_dart.npy'):
        tests['LGB_DART'] = np.load('test_pred_pooled_lgb_dart.npy')

# XGB seed2
if os.path.exists('oof_pooled_xgb_seed2.npy'):
    oofs['XGB2'] = np.load('oof_pooled_xgb_seed2.npy')
    if os.path.exists('test_pred_pooled_xgb_seed2_tta50.npy'):
        tests['XGB2'] = np.load('test_pred_pooled_xgb_seed2_tta50.npy')
    elif os.path.exists('test_pred_pooled_xgb_seed2.npy'):
        tests['XGB2'] = np.load('test_pred_pooled_xgb_seed2.npy')

# CatBoost: include ONLY if refreshed small-TTA test preds exist (to avoid CV/LB mismatch)
if os.path.exists('oof_pooled_cat.npy') and os.path.exists('test_pred_pooled_cat_tta50.npy'):
    oofs['CB'] = np.load('oof_pooled_cat.npy')
    tests['CB'] = np.load('test_pred_pooled_cat_tta50.npy')

# Filter to models that have both OOF and TEST available
keys = [k for k in oofs.keys() if k in tests]
oofs = {k: oofs[k] for k in keys}
tests = {k: tests[k] for k in keys}
print('Models included in blend (with matching test preds):', keys)
if len(keys) < 2:
    raise RuntimeError('Need at least two models with matching test predictions to blend.')

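# Coarse simplex grid (step 0.1), then a local refinement (step 0.05, +/-0.1) around the coarse optimum.
# With 4+ models only the first three weights are perturbed; the remaining ones are rescaled proportionally.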
best_coarse, oof_blend, test_blend = coarse_simplex_blend(oofs, tests, step=0.1)
print(f'Coarse blend best OOF acc: {best_coarse["acc"]:.5f} | weights {dict(zip(best_coarse["order"], best_coarse["w"]))}')
names = best_coarse['order']
w0 = best_coarse['w']
step = 0.05
grid = np.arange(-0.1, 0.1001, step)
best_acc = best_coarse['acc']
best_w = w0.copy()
if len(names) >= 4:
    for da0, da1, da2 in product(grid, grid, grid):
        a0 = np.clip(w0[0] + da0, 0, 1)
        a1 = np.clip(w0[1] + da1, 0, 1)
        a2 = np.clip(w0[2] + da2, 0, 1)
        s = a0 + a1 + a2
        if s <= 1.0:
            a_rest = 1.0 - s
            rest = np.array(w0[3:], dtype=float)
            if rest.sum() > 0:
                rest = rest / rest.sum() * a_rest
            else:
                rest = np.zeros_like(rest)
            w = np.concatenate([np.array([a0, a1, a2]), rest])
        else:
            continue
        o = sum(w[i]*oofs[n] for i,n in enumerate(names))
        acc = accuracy_score(y, o.argmax(1))
        if acc > best_acc:
            best_acc, best_w = acc, w.copy()
else:
    for da0, da1 in product(grid, grid):
        a0 = np.clip(w0[0] + da0, 0, 1)
        a1 = np.clip(w0[1] + da1, 0, 1)
        if len(names)==3:
            if a0 + a1 <= 1.0:
                a2 = 1.0 - a0 - a1
                w = np.array([a0,a1,a2])
            else:
                continue
        else:
            a1 = 1.0 - a0
            w = np.array([a0,a1])
        o = sum(w[i]*oofs[n] for i,n in enumerate(names))
        acc = accuracy_score(y, o.argmax(1))
        if acc > best_acc:
            best_acc, best_w = acc, w.copy()
oof_blend = sum(best_w[i]*oofs[n] for i,n in enumerate(names))
test_blend = sum(best_w[i]*tests[n] for i,n in enumerate(names))
print(f'Fine blend best OOF acc: {best_acc:.5f} | weights {dict(zip(names, best_w))}')

# Per-class bias tuning
log_oof = np.log(np.clip(oof_blend, EPS, 1.0))
b, acc_bias = tune_biases_greedy(log_oof, y, steps=(0.2,0.1,0.05), clamp=1.0, max_passes=3)
print(f'Bias-tuned OOF acc: {acc_bias:.5f} | biases:', np.round(b, 3))

# Apply biases
log_oof_b = log_oof + b
log_test = np.log(np.clip(test_blend, EPS, 1.0)) + b

# Silence override threshold sweep on OOF
silence_idx = CLASSES.index('silence')
best_override = {'acc': acc_bias, 'thr': None, 'count': 0}
try:
    X_tr_pool = np.load('X_train_pooled.npy')
    spec_desc_start = 256 + 120 + 240  # 616
    zcr_mean_idx = spec_desc_start + 6  # 622
    rms_mean_idx = spec_desc_start + 8  # 624
    zcr_mean_tr = X_tr_pool[:, zcr_mean_idx]
    rms_mean_tr = X_tr_pool[:, rms_mean_idx]
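    # Normalized softmax over the bias-adjusted log-scores; exp(x - max) alone always peaks at 1.0
    exp_tr = np.exp(log_oof_b - log_oof_b.max(1, keepdims=True))
    top_prob_tr = (exp_tr / exp_tr.sum(1, keepdims=True)).max(1)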
    top_grid = [0.35, 0.40, 0.45]
    rms_grid = [0.008, 0.010, 0.012]
    zcr_grid = [0.04, 0.05, 0.06]
    base_pred_tr = log_oof_b.argmax(1).copy()
    for tp in top_grid:
        for rt in rms_grid:
            for zt in zcr_grid:
                mask = (top_prob_tr < tp) & (rms_mean_tr < rt) & (zcr_mean_tr < zt)
                pred_ovr = base_pred_tr.copy()
                pred_ovr[mask] = silence_idx
                acc = accuracy_score(y, pred_ovr)
                if acc > best_override['acc'] + 1e-9:
                    best_override = {'acc': acc, 'thr': (tp, rt, zt), 'count': int(mask.sum())}
    if best_override['thr'] is not None:
        print(f'Best silence override OOF acc: {best_override["acc"]:.5f} | thr (top,rms,zcr)={best_override["thr"]} | count={best_override["count"]}')
    else:
        print('Silence override sweep did not improve OOF.')
except Exception as e:
    print('Silence override sweep skipped due to error:', e)

# Final test predictions with silence override using SMALL-TTA pooled average if available
pred_idx = log_test.argmax(1)
applied = 0
tta_small = 'X_test_pooled_tta_small.npy'
tta_orig = 'X_test_pooled_tta.npy'
tta_path = tta_small if os.path.exists(tta_small) else (tta_orig if os.path.exists(tta_orig) else None)
if tta_path is not None:
    X_test_tta = np.load(tta_path)
    X_test_avg = X_test_tta.mean(axis=0)
    spec_desc_start = 256 + 120 + 240  # 616
    zcr_mean_idx = spec_desc_start + 6  # 622
    rms_mean_idx = spec_desc_start + 8  # 624
    zcr_mean = X_test_avg[:, zcr_mean_idx]
    rms_mean = X_test_avg[:, rms_mean_idx]
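    # Normalized softmax over the bias-adjusted log-scores; exp(x - max) alone always peaks at 1.0
    exp_te = np.exp(log_test - log_test.max(1, keepdims=True))
    top_prob = (exp_te / exp_te.sum(1, keepdims=True)).max(1)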
    if 'best_override' in locals() and best_override.get('thr') is not None:
        tp, rt, zt = best_override['thr']
    else:
        tp, rt, zt = 0.45, 0.012, 0.06
    mask = (top_prob < tp) & (rms_mean < rt) & (zcr_mean < zt)
    pred_idx[mask] = silence_idx
    applied = int(mask.sum())
    print(f'Silence override applied to {applied} samples (thr={tp},{rt},{zt}) using {tta_path}')
else:
    print('No test TTA feature file found; skipping silence override.')

labels = [CLASSES[i] for i in pred_idx]
sub = pd.DataFrame({'fname': test_fnames, 'label': labels})
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv', sub.shape)

Models included in blend (with matching test preds): ['XGB', 'LGB', 'LGB_DART', 'XGB2', 'CB']


Coarse blend best OOF acc: 0.89598 | weights {'XGB': 0.0, 'LGB': 0.0, 'LGB_DART': 0.4, 'XGB2': 0.1, 'CB': 0.5}


Fine blend best OOF acc: 0.89606 | weights {'XGB': 0.0, 'LGB': 0.0, 'LGB_DART': 0.45000000000000007, 'XGB2': 0.09166666666666666, 'CB': 0.4583333333333333}


Bias-tuned OOF acc: 0.89643 | biases: [-0.05  0.15 -0.3   0.    0.    0.   -0.2   0.    0.    0.2   0.    0.  ]
Silence override sweep did not improve OOF.
Silence override applied to 0 samples (thr=0.45,0.012,0.06) using X_test_pooled_tta_small.npy
Saved submission.csv (6473, 2)
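
The silence override thresholds a top-class probability derived from bias-adjusted log-scores. A minimal sketch with toy values (the array `log_scores` and the helper `softmax_rows` are illustrative, not from the notebook) of turning such scores into normalized probabilities. Subtracting the row maximum is only a stability shift, so without dividing by the row sum the largest entry is always exactly 1.0 and a check such as `top_prob < 0.45` can never fire.

```python
import numpy as np

def softmax_rows(log_scores):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = log_scores - log_scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

# Toy log-scores for 2 samples x 3 classes (illustrative values only),
# with a made-up per-class bias added in log space
log_scores = np.log(np.array([[0.40, 0.35, 0.25],
                              [0.90, 0.05, 0.05]])) + np.array([0.1, 0.0, -0.1])

# Shift only: the winning class always maps to exp(0) = 1.0
unnormalized_top = np.exp(log_scores - log_scores.max(axis=1, keepdims=True)).max(axis=1)

# Shift and normalize: a real probability that can be thresholded
normalized_top = softmax_rows(log_scores).max(axis=1)

print(unnormalized_top)  # every entry is exactly 1.0
print(normalized_top)    # proper probabilities, all below 1.0
```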


In [31]:
# XGBoost seed2 on pooled features (diversity model) with SGKF, per-fold scaler, clipping, TTA; save oof/test
import os, time, numpy as np, pandas as pd, sys, subprocess
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.utils.class_weight import compute_sample_weight

try:
    import xgboost as xgb
except Exception as e:
    print('Installing xgboost...', e)
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--quiet', 'xgboost==2.1.1'], check=True)
    import xgboost as xgb

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

train_feat = 'X_train_pooled.npy'
train_y = 'y_train_pooled.npy'
train_groups = 'groups_pooled.npy'
# Use small-shift TTA features for test
test_feat_tta = 'X_test_pooled_tta_small.npy'
test_fnames_csv = 'test_fnames_pooled_small.csv'

assert os.path.exists(train_feat) and os.path.exists(train_y) and os.path.exists(train_groups), 'Missing train pooled features'
assert os.path.exists(test_feat_tta) and os.path.exists(test_fnames_csv), 'Missing test pooled TTA features'

X = np.load(train_feat)
y = np.load(train_y)
groups = np.load(train_groups)
X_test_tta = np.load(test_feat_tta)  # [n_shifts, N, D]
test_fnames = pd.read_csv(test_fnames_csv, header=None)[0].values
n_shifts, n_test, D = X_test_tta.shape
print('Shapes:', X.shape, y.shape, groups.shape, X_test_tta.shape)

# Slightly different seed/regularization for diversity
params = dict(
    objective='multi:softprob',
    num_class=num_class,
    tree_method='hist',
    max_bin=256,
    max_depth=7,
    eta=0.05,
    subsample=0.85,
    colsample_bytree=0.85,
    min_child_weight=2.5,
    reg_alpha=0.05,
    reg_lambda=1.2,
    eval_metric='mlogloss',
    n_jobs=-1,
    seed=202
)

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=202)
oof2 = np.zeros((len(y), num_class), dtype=np.float32)
test2 = np.zeros((n_test, num_class), dtype=np.float32)
start = time.time()

for fold, (tr_idx, va_idx) in enumerate(cv.split(X, y, groups)):
    t0 = time.time()
    print(f'XGB seed2 Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
    scaler = StandardScaler(with_mean=True, with_std=True)
    X_tr = scaler.fit_transform(X[tr_idx])
    X_va = scaler.transform(X[va_idx])
    X_tr = np.clip(X_tr, -5, 5)
    X_va = np.clip(X_va, -5, 5)
    tr_weights = compute_sample_weight('balanced', y=y[tr_idx])
    dtr = xgb.DMatrix(X_tr, label=y[tr_idx], weight=tr_weights)
    dva = xgb.DMatrix(X_va, label=y[va_idx])
    model = xgb.train(params, dtr, num_boost_round=2000, evals=[(dtr,'train'),(dva,'valid')], early_stopping_rounds=100, verbose_eval=100)
    best_iter = getattr(model, 'best_iteration', None)
    if best_iter is None:
        try:
            best_iter = model.num_boosted_rounds() - 1
        except Exception:
            best_iter = None
    if best_iter is not None:
        oof2[va_idx] = model.predict(dva, iteration_range=(0, best_iter + 1))
    else:
        oof2[va_idx] = model.predict(dva)
    va_acc = accuracy_score(y[va_idx], oof2[va_idx].argmax(1))
    print(f'XGB seed2 Fold {fold} acc: {va_acc:.4f} | elapsed {time.time()-t0:.1f}s')
    # Test TTA
    fold_test = np.zeros((n_test, num_class), dtype=np.float32)
    for s in range(n_shifts):
        X_te_s = scaler.transform(X_test_tta[s])
        X_te_s = np.clip(X_te_s, -5, 5)
        dte = xgb.DMatrix(X_te_s)
        if best_iter is not None:
            fold_test += model.predict(dte, iteration_range=(0, best_iter + 1)) / n_shifts
        else:
            fold_test += model.predict(dte) / n_shifts
    test2 += fold_test / cv.n_splits

oof_acc2 = accuracy_score(y, oof2.argmax(1))
print(f'XGB seed2 OOF accuracy: {oof_acc2:.4f} | total {time.time()-start:.1f}s')
np.save('oof_pooled_xgb_seed2.npy', oof2)
np.save('test_pred_pooled_xgb_seed2_tta50.npy', test2)
print('Saved XGB seed2 preds (small TTA) to test_pred_pooled_xgb_seed2_tta50.npy.')

Shapes: (64073, 662) (64073,) (64073,) (5, 6473, 662)


XGB seed2 Fold 0 | train 51539 val 12534


[0]	train-mlogloss:2.32888	valid-mlogloss:2.38587


[100]	train-mlogloss:0.22855	valid-mlogloss:0.76153


[200]	train-mlogloss:0.07317	valid-mlogloss:0.49157


[300]	train-mlogloss:0.03421	valid-mlogloss:0.40128


[400]	train-mlogloss:0.01957	valid-mlogloss:0.36657


[500]	train-mlogloss:0.01279	valid-mlogloss:0.35096


[600]	train-mlogloss:0.00915	valid-mlogloss:0.34302


[700]	train-mlogloss:0.00702	valid-mlogloss:0.33897


[800]	train-mlogloss:0.00564	valid-mlogloss:0.33639


[900]	train-mlogloss:0.00470	valid-mlogloss:0.33519


[1000]	train-mlogloss:0.00402	valid-mlogloss:0.33443


[1100]	train-mlogloss:0.00352	valid-mlogloss:0.33388


[1200]	train-mlogloss:0.00313	valid-mlogloss:0.33399


[1257]	train-mlogloss:0.00295	valid-mlogloss:0.33384


XGB seed2 Fold 0 acc: 0.8954 | elapsed 432.9s


XGB seed2 Fold 1 | train 50895 val 13178


[0]	train-mlogloss:2.33118	valid-mlogloss:2.38386


[100]	train-mlogloss:0.22798	valid-mlogloss:0.74507


[200]	train-mlogloss:0.07151	valid-mlogloss:0.48788


[300]	train-mlogloss:0.03327	valid-mlogloss:0.40432


[400]	train-mlogloss:0.01906	valid-mlogloss:0.37187


[500]	train-mlogloss:0.01243	valid-mlogloss:0.35735


[600]	train-mlogloss:0.00893	valid-mlogloss:0.35006


[700]	train-mlogloss:0.00684	valid-mlogloss:0.34654


[800]	train-mlogloss:0.00551	valid-mlogloss:0.34462


[900]	train-mlogloss:0.00461	valid-mlogloss:0.34389


[1000]	train-mlogloss:0.00395	valid-mlogloss:0.34377


[1100]	train-mlogloss:0.00346	valid-mlogloss:0.34377


[1138]	train-mlogloss:0.00330	valid-mlogloss:0.34389


XGB seed2 Fold 1 acc: 0.8922 | elapsed 390.2s


XGB seed2 Fold 2 | train 51397 val 12676


[0]	train-mlogloss:2.32820	valid-mlogloss:2.38521


[100]	train-mlogloss:0.22810	valid-mlogloss:0.76751


[200]	train-mlogloss:0.07298	valid-mlogloss:0.49990


[300]	train-mlogloss:0.03407	valid-mlogloss:0.41101


[400]	train-mlogloss:0.01951	valid-mlogloss:0.37611


[500]	train-mlogloss:0.01277	valid-mlogloss:0.36055


[600]	train-mlogloss:0.00913	valid-mlogloss:0.35212


[700]	train-mlogloss:0.00701	valid-mlogloss:0.34811


[800]	train-mlogloss:0.00565	valid-mlogloss:0.34540


[900]	train-mlogloss:0.00470	valid-mlogloss:0.34366


[1000]	train-mlogloss:0.00404	valid-mlogloss:0.34259


[1100]	train-mlogloss:0.00353	valid-mlogloss:0.34220


[1200]	train-mlogloss:0.00313	valid-mlogloss:0.34195


[1300]	train-mlogloss:0.00282	valid-mlogloss:0.34210


[1302]	train-mlogloss:0.00282	valid-mlogloss:0.34211


XGB seed2 Fold 2 acc: 0.8924 | elapsed 434.9s


XGB seed2 Fold 3 | train 51925 val 12148


[0]	train-mlogloss:2.32952	valid-mlogloss:2.38556


[100]	train-mlogloss:0.23060	valid-mlogloss:0.75254


[200]	train-mlogloss:0.07420	valid-mlogloss:0.48058


[300]	train-mlogloss:0.03475	valid-mlogloss:0.39337


[400]	train-mlogloss:0.01995	valid-mlogloss:0.36122


[500]	train-mlogloss:0.01305	valid-mlogloss:0.34673


[600]	train-mlogloss:0.00935	valid-mlogloss:0.33968


[700]	train-mlogloss:0.00714	valid-mlogloss:0.33665


[800]	train-mlogloss:0.00573	valid-mlogloss:0.33471


[900]	train-mlogloss:0.00477	valid-mlogloss:0.33366


[1000]	train-mlogloss:0.00407	valid-mlogloss:0.33354


[1023]	train-mlogloss:0.00394	valid-mlogloss:0.33358


XGB seed2 Fold 3 acc: 0.8954 | elapsed 412.4s


XGB seed2 Fold 4 | train 50536 val 13537


[0]	train-mlogloss:2.32751	valid-mlogloss:2.38715


[100]	train-mlogloss:0.22346	valid-mlogloss:0.77904


[200]	train-mlogloss:0.06996	valid-mlogloss:0.52318


[300]	train-mlogloss:0.03265	valid-mlogloss:0.44509


[400]	train-mlogloss:0.01869	valid-mlogloss:0.41589


[500]	train-mlogloss:0.01223	valid-mlogloss:0.40459


[600]	train-mlogloss:0.00878	valid-mlogloss:0.39956


[700]	train-mlogloss:0.00675	valid-mlogloss:0.39716


[800]	train-mlogloss:0.00544	valid-mlogloss:0.39636


[900]	train-mlogloss:0.00454	valid-mlogloss:0.39609


[1000]	train-mlogloss:0.00389	valid-mlogloss:0.39654


[1013]	train-mlogloss:0.00382	valid-mlogloss:0.39679


XGB seed2 Fold 4 acc: 0.8789 | elapsed 351.9s


XGB seed2 OOF accuracy: 0.8906 | total 2027.8s
Saved XGB seed2 preds (small TTA) to test_pred_pooled_xgb_seed2_tta50.npy.
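
The fold loop above fits the scaler on the training rows only and clips the standardized features to [-5, 5]. A NumPy-only sketch of that step on synthetic data (the function name `fit_scale_clip` and the random data are assumptions for illustration; the notebook itself uses scikit-learn's `StandardScaler`):

```python
import numpy as np

def fit_scale_clip(X_train, X_other, clip=5.0):
    """Standardize with train-only statistics, then clip extreme z-scores."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)             # population std (ddof=0), as StandardScaler uses
    std = np.where(std == 0.0, 1.0, std)  # guard against constant columns
    z_train = np.clip((X_train - mean) / std, -clip, clip)
    z_other = np.clip((X_other - mean) / std, -clip, clip)
    return z_train, z_other

rng = np.random.default_rng(0)
X_tr = rng.normal(loc=3.0, scale=2.0, size=(1000, 4))
X_va = rng.normal(loc=3.0, scale=2.0, size=(200, 4))
X_va[0, 0] = 1e6  # an outlier in the validation fold

z_tr, z_va = fit_scale_clip(X_tr, X_va)
print(z_tr.mean(axis=0).round(3), z_va.max())
```

Fitting on the training fold only keeps validation statistics out of the transform, and the clip bounds the influence of outliers such as the one injected here.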


In [27]:
# Build small-shift test TTA pooled features only ([-50,-25,0,25,50] ms)
import time, numpy as np, pandas as pd, multiprocessing as mp
from joblib import Parallel, delayed
from pathlib import Path

SR = 16000

def build_test_tta_small(test_meta='test_meta.csv', out_path='X_test_pooled_tta_small.npy', fnames_out='test_fnames_pooled_small.csv', shifts_ms=None, n_jobs=None, seed=42):
    if shifts_ms is None:
        shifts_ms = [-50, -25, 0, 25, 50]
    shifts = [int(ms/1000.0 * SR) for ms in shifts_ms]
    if n_jobs is None:
        n_jobs = max(1, mp.cpu_count()-2)
    df_te = pd.read_csv(test_meta)
    fnames = df_te['fname'].tolist()
    print(f"[POOL-FE TEST SMALL] Test rows: {len(df_te)} | n_jobs={n_jobs} | shifts(ms)={shifts_ms}")
    def _proc(row, shift_samples):
        feats, _, _ = extract_feature_vector(row['path'], None, None, seed=seed, is_silence=False, shift_samples=shift_samples)
        return feats
    X_tta = []
    t0 = time.time()
    for s in shifts:
        t1 = time.time()
        feats_list = Parallel(n_jobs=n_jobs, backend='loky')(delayed(_proc)(row, s) for _, row in df_te.iterrows())
        X_s = np.stack(feats_list)
        X_tta.append(X_s)
        print(f"[POOL-FE TEST SMALL] shift {s} samples -> {X_s.shape} | elapsed {time.time()-t1:.1f}s")
    X_tta = np.stack(X_tta, axis=0)
    np.save(out_path, X_tta)
    pd.Series(fnames).to_csv(fnames_out, index=False, header=False)
    print(f"[POOL-FE TEST SMALL] Saved {out_path} {X_tta.shape} and {fnames_out} | total {time.time()-t0:.1f}s")

# Run
build_test_tta_small()

[POOL-FE TEST SMALL] Test rows: 6473 | n_jobs=34 | shifts(ms)=[-50, -25, 0, 25, 50]


  return pitch_tuning(


[POOL-FE TEST SMALL] shift -800 samples -> (6473, 662) | elapsed 7.1s


[POOL-FE TEST SMALL] shift -400 samples -> (6473, 662) | elapsed 5.1s


[POOL-FE TEST SMALL] shift 0 samples -> (6473, 662) | elapsed 4.6s


[POOL-FE TEST SMALL] shift 400 samples -> (6473, 662) | elapsed 5.0s


[POOL-FE TEST SMALL] shift 800 samples -> (6473, 662) | elapsed 5.0s
[POOL-FE TEST SMALL] Saved X_test_pooled_tta_small.npy (5, 6473, 662) and test_fnames_pooled_small.csv | total 26.7s
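
The shifts are specified in milliseconds and converted to samples at 16 kHz, which is why the log reports -800, -400, 0, 400 and 800. How `extract_feature_vector` applies the shift is not shown in this section; the sketch below assumes a zero-padded shift of a fixed-length clip, and `ms_to_samples` and `shift_waveform` are hypothetical helpers.

```python
import numpy as np

SR = 16000

def ms_to_samples(ms, sr=SR):
    # round() avoids off-by-one truncation when ms * sr / 1000 is not exact in floating point
    return int(round(ms * sr / 1000.0))

def shift_waveform(x, shift_samples):
    """Shift a 1-D clip right (positive) or left (negative), zero-padding the gap."""
    out = np.zeros_like(x)
    if shift_samples > 0:
        out[shift_samples:] = x[:-shift_samples]
    elif shift_samples < 0:
        out[:shift_samples] = x[-shift_samples:]
    else:
        out[:] = x
    return out

shifts = [ms_to_samples(ms) for ms in (-50, -25, 0, 25, 50)]
print(shifts)  # [-800, -400, 0, 400, 800]

clip = np.arange(1, SR + 1, dtype=np.float32)  # 1 s dummy clip
shifted = shift_waveform(clip, shifts[0])      # 50 ms earlier, tail zero-padded
```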


In [35]:
# Robust power-weighted blend (no per-class biases, no silence override); small-TTA only
import os, numpy as np, pandas as pd
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']

# Load OOF and small-TTA test preds for all 5 models
oofs = {}
tests = {}
names_map = {
    'XGB': ('oof_pooled.npy', 'test_pred_pooled_tta50.npy'),
    'LGB': ('oof_pooled_lgb.npy', 'test_pred_pooled_lgb_tta50.npy'),
    'LGB_DART': ('oof_pooled_lgb_dart.npy', 'test_pred_pooled_lgb_dart_tta50.npy'),
    'XGB2': ('oof_pooled_xgb_seed2.npy', 'test_pred_pooled_xgb_seed2_tta50.npy'),
    'CB': ('oof_pooled_cat.npy', 'test_pred_pooled_cat_tta50.npy'),
}

for k, (oof_p, te_p) in names_map.items():
    if os.path.exists(oof_p) and os.path.exists(te_p):
        oofs[k] = np.load(oof_p)
        tests[k] = np.load(te_p)

models = list(oofs.keys())
if len(models) < 2:
    raise RuntimeError(f'Need at least 2 models; found {models}')
print('Models in robust blend:', models)

# Compute OOF acc per model
y = np.load('y_train_pooled.npy')
accs = {m: float(accuracy_score(y, oofs[m].argmax(1))) for m in models}
print('Per-model OOF acc:', accs)

# Power weights: w_i ∝ (acc_i)^4 (normalized)
pows = {m: accs[m]**4 for m in models}
w_sum = sum(pows.values())
weights = {m: (pows[m] / w_sum) for m in models}
print('Power-weights:', weights)

# OOF blend (for sanity) and test blend (linear prob space)
oof_blend = np.zeros_like(next(iter(oofs.values())))
for m in models:
    oof_blend += weights[m] * oofs[m]
oof_acc = accuracy_score(y, oof_blend.argmax(1))
print(f'Power-weighted OOF acc (no biases/override): {oof_acc:.5f}')

test_shape = next(iter(tests.values())).shape
test_blend = np.zeros(test_shape, dtype=np.float32)
for m in models:
    test_blend += weights[m] * tests[m]

# Build submission matching sample_submission order
test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
pred_idx = test_blend.argmax(1)
labels = [CLASSES[i] for i in pred_idx]
pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
sample_sub = pd.read_csv('sample_submission.csv')
sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
assert sub['label'].notna().all(), 'Missing predictions after merge'
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv', sub.shape)

Models in robust blend: ['XGB', 'LGB', 'LGB_DART', 'XGB2', 'CB']
Per-model OOF acc: {'XGB': 0.8903282193747757, 'LGB': 0.8865512774491595, 'LGB_DART': 0.8928409782591731, 'XGB2': 0.8906403633355704, 'CB': 0.890016075413981}
Power-weights: {'XGB': 0.20022114819211928, 'LGB': 0.19684520138881842, 'LGB_DART': 0.20249105870942533, 'XGB2': 0.2005020814282891, 'CB': 0.19994051028134788}
Power-weighted OOF acc (no biases/override): 0.89421
Saved submission.csv (6473, 2)
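
With OOF accuracies this close together, raising them to the fourth power barely separates the models, so the power-weighted blend is close to a plain average. A short check using the accuracies printed above (the helper name `power_weights` is illustrative):

```python
def power_weights(accs, power=4):
    """Normalize acc**power so the weights sum to 1."""
    pows = {m: a ** power for m, a in accs.items()}
    total = sum(pows.values())
    return {m: p / total for m, p in pows.items()}

# Per-model OOF accuracies copied from the cell output above
accs = {
    'XGB': 0.8903282193747757,
    'LGB': 0.8865512774491595,
    'LGB_DART': 0.8928409782591731,
    'XGB2': 0.8906403633355704,
    'CB': 0.890016075413981,
}

weights = power_weights(accs)
spread = max(weights.values()) - min(weights.values())
print(weights)
print(f'largest minus smallest weight: {spread:.4f}')
```

The gap between the largest and smallest weight is well under one percentage point, so a higher exponent or a searched weighting is needed if the stronger models should dominate.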


In [36]:
# Submission 2: Geometric (log-space) blend with coord descent (no biases/override)
import os, numpy as np, pandas as pd
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
EPS = 1e-9

# Load OOF and small-TTA test preds for all 5 models
names_map = {
    'XGB': ('oof_pooled.npy', 'test_pred_pooled_tta50.npy'),
    'LGB': ('oof_pooled_lgb.npy', 'test_pred_pooled_lgb_tta50.npy'),
    'LGB_DART': ('oof_pooled_lgb_dart.npy', 'test_pred_pooled_lgb_dart_tta50.npy'),
    'XGB2': ('oof_pooled_xgb_seed2.npy', 'test_pred_pooled_xgb_seed2_tta50.npy'),
    'CB': ('oof_pooled_cat.npy', 'test_pred_pooled_cat_tta50.npy'),
}
oofs, tests = {}, {}
for k, (oof_p, te_p) in names_map.items():
    if os.path.exists(oof_p) and os.path.exists(te_p):
        oofs[k] = np.load(oof_p)
        tests[k] = np.load(te_p)
models = list(oofs.keys())
assert len(models) >= 2, f'Need >=2 models, have {models}'
print('Models in log-space blend:', models)

y = np.load('y_train_pooled.npy')
accs = {m: float(accuracy_score(y, oofs[m].argmax(1))) for m in models}
print('Per-model OOF acc:', accs)
pows = {m: accs[m]**4 for m in models}
w = np.array([pows[m] for m in models], dtype=np.float64)
w = w / w.sum()
print('Init weights (power):', dict(zip(models, w)))

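# Log-probabilities are fixed during the weight search, so compute them once
log_oofs = {m: np.log(np.clip(oofs[m], EPS, 1.0)) for m in models}

def oof_acc_from_w(w_vec):
    # Weighted sum of log-probs (a weighted geometric mean in probability space)
    logits = sum(wi * log_oofs[m] for wi, m in zip(w_vec, models))
    return float(accuracy_score(y, logits.argmax(1)))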

best_w = w.copy()
best_acc = oof_acc_from_w(best_w)
print(f'Init log-space OOF acc: {best_acc:.5f}')

# Greedy coordinate search (step=0.02, renormalized to the simplex); up to 3 passes, stopping early when a pass brings no gain
step = 0.02
for pass_idx in range(3):
    improved = False
    for i in range(len(best_w)):
        for delta in (+step, -step):
            w_try = best_w.copy()
            w_try[i] = max(0.0, w_try[i] + delta)
            rem = max(1e-12, w_try.sum())
            w_try = w_try / rem  # renormalize to simplex
            acc = oof_acc_from_w(w_try)
            if acc > best_acc + 1e-9:
                best_acc = acc
                best_w = w_try
                improved = True
    print(f'Pass {pass_idx}: best OOF acc {best_acc:.5f} | weights {dict(zip(models, best_w))}')
    if not improved:
        break

# Build test logits with best_w
test_logits = None
for wi, m in zip(best_w, models):
    lp = np.log(np.clip(tests[m], EPS, 1.0))
    test_logits = lp * wi if test_logits is None else test_logits + wi * lp
pred_idx = test_logits.argmax(1)
labels = [CLASSES[i] for i in pred_idx]

# Submission aligned to sample_submission
test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
sample_sub = pd.read_csv('sample_submission.csv')
sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
assert sub['label'].notna().all(), 'Missing predictions after merge'
sub.to_csv('submission.csv', index=False)
print(f'Saved submission.csv {sub.shape} | Final log-space OOF acc: {best_acc:.5f}')

Models in log-space blend: ['XGB', 'LGB', 'LGB_DART', 'XGB2', 'CB']
Per-model OOF acc: {'XGB': 0.8903282193747757, 'LGB': 0.8865512774491595, 'LGB_DART': 0.8928409782591731, 'XGB2': 0.8906403633355704, 'CB': 0.890016075413981}
Init weights (power): {'XGB': 0.20022114819211928, 'LGB': 0.19684520138881842, 'LGB_DART': 0.20249105870942533, 'XGB2': 0.2005020814282891, 'CB': 0.19994051028134788}
Init log-space OOF acc: 0.89417
Pass 0: best OOF acc 0.89432 | weights {'XGB': 0.19244631698588935, 'LGB': 0.1892014623114364, 'LGB_DART': 0.21385145973608738, 'XGB2': 0.19271634124210793, 'CB': 0.2117844197244789}


Pass 1: best OOF acc 0.89451 | weights {'XGB': 0.19637379284274425, 'LGB': 0.17265455337901675, 'LGB_DART': 0.21821577524090552, 'XGB2': 0.19664932779806935, 'CB': 0.2161065507392642}
Pass 2: best OOF acc 0.89456 | weights {'XGB': 0.20447083802868, 'LGB': 0.15894893104853888, 'LGB_DART': 0.22721342694804822, 'XGB2': 0.18434957080182146, 'CB': 0.22501723317291147}
Saved submission.csv (6473, 2) | Final log-space OOF acc: 0.89456
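
Summing weighted log-probabilities is equivalent, up to a per-row normalizing constant, to a weighted geometric mean of the model probabilities, and it can pick a different class than a linear blend in probability space. A toy sketch (the two probability rows and the equal weights are made up for illustration):

```python
import numpy as np

EPS = 1e-9

# One sample, three classes, two hypothetical models that disagree strongly
p_a = np.array([[0.72, 0.27, 0.01]])
p_b = np.array([[0.01, 0.30, 0.69]])
w = np.array([0.5, 0.5])

# Linear (arithmetic) blend in probability space
arith = w[0] * p_a + w[1] * p_b

# Log-space blend: weighted sum of clipped log-probabilities
log_blend = w[0] * np.log(np.clip(p_a, EPS, 1.0)) + w[1] * np.log(np.clip(p_b, EPS, 1.0))

print('arithmetic argmax:', arith.argmax(1))      # class 0
print('log-space argmax :', log_blend.argmax(1))  # class 1
```

The geometric form penalizes any class that a single model considers very unlikely, which is why the two blends can disagree on samples where the models conflict.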


In [37]:
# ExtraTrees (fast diversity) on pooled features with SGKF; predict on mean small-TTA features
import os, time, numpy as np, pandas as pd
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.ensemble import ExtraTreesClassifier

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

train_feat = 'X_train_pooled.npy'
train_y = 'y_train_pooled.npy'
train_groups = 'groups_pooled.npy'
test_feat_tta = 'X_test_pooled_tta_small.npy'
test_fnames_csv = 'test_fnames_pooled_small.csv'

assert os.path.exists(train_feat) and os.path.exists(train_y) and os.path.exists(train_groups), 'Missing train pooled features'
assert os.path.exists(test_feat_tta) and os.path.exists(test_fnames_csv), 'Missing test pooled small-TTA features'

X = np.load(train_feat)
y = np.load(train_y)
groups = np.load(train_groups)
X_test_tta = np.load(test_feat_tta)  # [n_shifts, N, D]
X_test_avg = X_test_tta.mean(axis=0)  # (N, D)
test_fnames = pd.read_csv(test_fnames_csv, header=None)[0].values
n_test, D = X_test_avg.shape
print('Shapes:', X.shape, y.shape, groups.shape, X_test_tta.shape, '| test_avg', X_test_avg.shape)

params = dict(
    n_estimators=600,
    max_depth=None,
    min_samples_leaf=2,
    max_features='sqrt',
    class_weight='balanced',
    n_jobs=-1,
    random_state=42,
)

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
oof_et = np.zeros((len(y), num_class), dtype=np.float32)
test_et = np.zeros((n_test, num_class), dtype=np.float32)
start = time.time()

for fold, (tr_idx, va_idx) in enumerate(cv.split(X, y, groups)):
    t0 = time.time()
    print(f'ET Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
    scaler = StandardScaler(with_mean=True, with_std=True)
    X_tr = scaler.fit_transform(X[tr_idx])
    X_va = scaler.transform(X[va_idx])
    X_tr = np.clip(X_tr, -5, 5)
    X_va = np.clip(X_va, -5, 5)
    clf = ExtraTreesClassifier(**params)
    clf.fit(X_tr, y[tr_idx])
    oof_et[va_idx] = clf.predict_proba(X_va)
    va_acc = accuracy_score(y[va_idx], oof_et[va_idx].argmax(1))
    print(f'ET Fold {fold} acc: {va_acc:.4f} | elapsed {time.time()-t0:.1f}s')
    # Test uses mean of small-TTA features transformed by this fold's scaler
    X_te = scaler.transform(X_test_avg)
    X_te = np.clip(X_te, -5, 5)
    test_et += clf.predict_proba(X_te) / cv.n_splits

oof_acc_et = accuracy_score(y, oof_et.argmax(1))
print(f'ExtraTrees OOF accuracy: {oof_acc_et:.4f} | total {time.time()-start:.1f}s')
np.save('oof_pooled_et.npy', oof_et)
np.save('test_pred_pooled_et_tta50.npy', test_et)
print('Saved ExtraTrees preds to oof_pooled_et.npy and test_pred_pooled_et_tta50.npy.')

Shapes: (64073, 662) (64073,) (64073,) (5, 6473, 662) | test_avg (6473, 662)


ET Fold 0 | train 52005 val 12068


ET Fold 0 acc: 0.7406 | elapsed 8.0s
ET Fold 1 | train 51074 val 12999


ET Fold 1 acc: 0.7271 | elapsed 7.6s
ET Fold 2 | train 51792 val 12281


ET Fold 2 acc: 0.7381 | elapsed 8.0s
ET Fold 3 | train 50819 val 13254


ET Fold 3 acc: 0.7249 | elapsed 7.7s
ET Fold 4 | train 50602 val 13471


ET Fold 4 acc: 0.7276 | elapsed 7.7s
ExtraTrees OOF accuracy: 0.7314 | total 41.5s
Saved ExtraTrees preds to oof_pooled_et.npy and test_pred_pooled_et_tta50.npy.
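
Unlike the XGBoost seed2 cell, which averages predictions over the shifts, this cell predicts once on the shift-averaged feature matrix. For a nonlinear model the two are generally not the same. A toy sketch with a hypothetical one-feature step classifier `toy_model` and made-up feature values:

```python
def toy_model(x):
    """Hypothetical nonlinear 'classifier': probability of class 1 for a scalar feature."""
    return 1.0 if x > 0.0 else 0.0

# The same test clip's feature under three time shifts (made-up values)
feature_per_shift = [-2.0, 0.5, 0.5]

# Option A: average the features, then predict once
mean_feature = sum(feature_per_shift) / len(feature_per_shift)
pred_on_mean_feature = toy_model(mean_feature)

# Option B: predict per shift, then average the predictions
mean_of_preds = sum(toy_model(x) for x in feature_per_shift) / len(feature_per_shift)

print(pred_on_mean_feature)  # 0.0
print(mean_of_preds)
```

Averaging features is cheaper (one prediction per fold instead of one per shift), but the resulting test predictions are not strictly comparable with OOF predictions made on unshifted features.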


In [38]:
# Log-space blend including ExtraTrees (if available); no biases/override
import os, numpy as np, pandas as pd
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
EPS = 1e-9

names_map = {
    'XGB': ('oof_pooled.npy', 'test_pred_pooled_tta50.npy'),
    'LGB': ('oof_pooled_lgb.npy', 'test_pred_pooled_lgb_tta50.npy'),
    'LGB_DART': ('oof_pooled_lgb_dart.npy', 'test_pred_pooled_lgb_dart_tta50.npy'),
    'XGB2': ('oof_pooled_xgb_seed2.npy', 'test_pred_pooled_xgb_seed2_tta50.npy'),
    'CB': ('oof_pooled_cat.npy', 'test_pred_pooled_cat_tta50.npy'),
    'ET': ('oof_pooled_et.npy', 'test_pred_pooled_et_tta50.npy'),
}
oofs, tests = {}, {}
for k, (oof_p, te_p) in names_map.items():
    if os.path.exists(oof_p) and os.path.exists(te_p):
        oofs[k] = np.load(oof_p)
        tests[k] = np.load(te_p)
models = list(oofs.keys())
assert len(models) >= 2, f'Need >=2 models, have {models}'
print('Models in log-space blend (+ET if present):', models)

y = np.load('y_train_pooled.npy')
accs = {m: float(accuracy_score(y, oofs[m].argmax(1))) for m in models}
print('Per-model OOF acc:', accs)
pows = {m: accs[m]**4 for m in models}
w = np.array([pows[m] for m in models], dtype=np.float64)
w = w / w.sum()
print('Init weights (power):', dict(zip(models, w)))

def oof_acc_from_w(w_vec):
    logits = None
    for wi, m in zip(w_vec, models):
        lp = np.log(np.clip(oofs[m], EPS, 1.0))
        logits = lp * wi if logits is None else logits + wi * lp
    return float(accuracy_score(y, logits.argmax(1)))

best_w = w.copy()
best_acc = oof_acc_from_w(best_w)
print(f'Init log-space OOF acc: {best_acc:.5f}')

# 3 passes coord descent, step=0.02
step = 0.02
for p in range(3):
    improved = False
    for i in range(len(best_w)):
        for d in (+step, -step):
            w_try = best_w.copy()
            w_try[i] = max(0.0, w_try[i] + d)
            w_try = w_try / max(1e-12, w_try.sum())
            acc = oof_acc_from_w(w_try)
            if acc > best_acc + 1e-9:
                best_acc = acc
                best_w = w_try
                improved = True
    print(f'Pass {p}: best OOF acc {best_acc:.5f} | weights {dict(zip(models, best_w))}')
    if not improved:
        break

test_logits = None
for wi, m in zip(best_w, models):
    lp = np.log(np.clip(tests[m], EPS, 1.0))
    test_logits = lp * wi if test_logits is None else test_logits + wi * lp
pred_idx = test_logits.argmax(1)
labels = [CLASSES[i] for i in pred_idx]

test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
sample_sub = pd.read_csv('sample_submission.csv')
sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
assert sub['label'].notna().all(), 'Missing predictions after merge'
sub.to_csv('submission.csv', index=False)
print(f'Saved submission.csv {sub.shape} | Final log-space OOF acc: {best_acc:.5f}')

Models in log-space blend (+ET if present): ['XGB', 'LGB', 'LGB_DART', 'XGB2', 'CB', 'ET']
Per-model OOF acc: {'XGB': 0.8903282193747757, 'LGB': 0.8865512774491595, 'LGB_DART': 0.8928409782591731, 'XGB2': 0.8906403633355704, 'CB': 0.890016075413981, 'ET': 0.7314001217361447}
Init weights (power): {'XGB': 0.18348942943692537, 'LGB': 0.18039559764972227, 'LGB_DART': 0.18556965217790033, 'XGB2': 0.18374688615256268, 'CB': 0.1832322433674678, 'ET': 0.0835661912154216}
Init log-space OOF acc: 0.89381
Pass 0: best OOF acc 0.89411 | weights {'XGB': 0.20780848847706157, 'LGB': 0.16339150098643704, 'LGB_DART': 0.1895083643067401, 'XGB2': 0.16723058031201257, 'CB': 0.20712934724267795, 'ET': 0.06493171867507076}
Pass 1: best OOF acc 0.89459 | weights {'XGB': 0.19179471986124566, 'LGB': 0.1468434726979344, 'LGB_DART': 0.21314636493418215, 'XGB2': 0.15077204280869214, 'CB': 0.2311335066021635, 'ET': 0.06630989309578228}
Pass 2: best OOF acc 0.89470 | weights {'XGB': 0.1957088978175976, 'LGB': 0.12943211499789226, 'LGB_DART': 0.21749629074916543, 'XGB2': 0.15384902327417563, 'CB': 0.23585051694098313, 'ET': 0.067663156220186}
Saved submission.csv (6473, 2) | Final log-space OOF acc: 0.89470
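
The log-space blend in the cell above sums weighted log-probabilities, which is the log of a weighted geometric mean of the per-model probabilities. A minimal sketch on made-up numbers, assuming two hypothetical models (`p_a`, `p_b`) and an illustrative helper `log_blend` that is not part of the notebook:

```python
import numpy as np

EPS = 1e-9  # same clipping floor as in the notebook cells

def log_blend(prob_list, weights):
    """Weighted sum of log-probabilities (log of an unnormalized weighted geometric mean)."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    return sum(w * np.log(np.clip(p, EPS, 1.0)) for w, p in zip(weights, prob_list))

# Toy data (hypothetical): two models, two samples, three classes
p_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_b = np.array([[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]])

logits = log_blend([p_a, p_b], [0.5, 0.5])
geo = np.sqrt(p_a * p_b)  # with equal weights this is the plain geometric mean

print('matches geometric mean:', np.allclose(np.exp(logits), geo))
print('blended argmax per row:', logits.argmax(1))
```

Because only the argmax is used for the submission, the blended scores do not need to be renormalized into probabilities first.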


In [39]:
# Optional tiny global silence boost (fixed factor=1.04, not tuned on OOF) applied after the log-space blend; blend weights are still refined on OOF
import os, numpy as np, pandas as pd
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
EPS = 1e-9
sil_idx = CLASSES.index('silence')

# Load OOF/test preds for 5 core models
names_map = {
    'XGB': ('oof_pooled.npy', 'test_pred_pooled_tta50.npy'),
    'LGB': ('oof_pooled_lgb.npy', 'test_pred_pooled_lgb_tta50.npy'),
    'LGB_DART': ('oof_pooled_lgb_dart.npy', 'test_pred_pooled_lgb_dart_tta50.npy'),
    'XGB2': ('oof_pooled_xgb_seed2.npy', 'test_pred_pooled_xgb_seed2_tta50.npy'),
    'CB': ('oof_pooled_cat.npy', 'test_pred_pooled_cat_tta50.npy'),
}
oofs, tests = {}, {}
for k, (oof_p, te_p) in names_map.items():
    if os.path.exists(oof_p) and os.path.exists(te_p):
        oofs[k] = np.load(oof_p)
        tests[k] = np.load(te_p)
models = list(oofs.keys())
assert len(models) >= 2, f'Need >=2 models, have {models}'

y = np.load('y_train_pooled.npy')

# Init weights from power of OOF accs
accs = {m: float(accuracy_score(y, oofs[m].argmax(1))) for m in models}
pows = {m: accs[m]**4 for m in models}
w = np.array([pows[m] for m in models], dtype=np.float64)
w = w / w.sum()

def oof_acc_from_w(w_vec):
    logits = None
    for wi, m in zip(w_vec, models):
        lp = np.log(np.clip(oofs[m], EPS, 1.0))
        logits = lp * wi if logits is None else logits + wi * lp
    return float(accuracy_score(y, logits.argmax(1)))

best_w = w.copy()
best_acc = oof_acc_from_w(best_w)

# Light coord descent refinement (step=0.02, up to 3 passes)
step = 0.02
for _ in range(3):
    improved = False
    for i in range(len(best_w)):
        for d in (+step, -step):
            w_try = best_w.copy()
            w_try[i] = max(0.0, w_try[i] + d)
            w_try = w_try / max(1e-12, w_try.sum())
            acc = oof_acc_from_w(w_try)
            if acc > best_acc + 1e-9:
                best_acc, best_w = acc, w_try
                improved = True
    if not improved:
        break

# Build test logits and convert to probs
test_logits = None
for wi, m in zip(best_w, models):
    lp = np.log(np.clip(tests[m], EPS, 1.0))
    test_logits = lp * wi if test_logits is None else test_logits + wi * lp
probs = np.exp(test_logits - test_logits.max(axis=1, keepdims=True))
probs = probs / probs.sum(axis=1, keepdims=True)

# Tiny global silence boost
boost = 1.04
probs[:, sil_idx] *= boost
probs = probs / probs.sum(axis=1, keepdims=True)

pred_idx = probs.argmax(1)
labels = [CLASSES[i] for i in pred_idx]

test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
sample_sub = pd.read_csv('sample_submission.csv')
sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
assert sub['label'].notna().all(), 'Missing predictions after merge'
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv', sub.shape, '| Used silence boost factor:', boost, '| OOF (unboosted logits) acc:', f'{best_acc:.5f}')

Saved submission.csv (6473, 2) | Used silence boost factor: 1.04 | OOF (unboosted logits) acc: 0.89456


In [44]:
# Submission 3: Model subset selection + log-space blend (no biases/override)
import os, itertools, numpy as np, pandas as pd
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
EPS = 1e-9

# Core models
names_map = {
    'XGB': ('oof_pooled.npy', 'test_pred_pooled_tta50.npy'),
    'LGB': ('oof_pooled_lgb.npy', 'test_pred_pooled_lgb_tta50.npy'),
    'LGB_DART': ('oof_pooled_lgb_dart.npy', 'test_pred_pooled_lgb_dart_tta50.npy'),
    'XGB2': ('oof_pooled_xgb_seed2.npy', 'test_pred_pooled_xgb_seed2_tta50.npy'),
    'CB': ('oof_pooled_cat.npy', 'test_pred_pooled_cat_tta50.npy'),
    'ET': ('oof_pooled_et.npy', 'test_pred_pooled_et_tta50.npy'),
}
oofs_all, tests_all = {}, {}
for k, (oof_p, te_p) in names_map.items():
    if os.path.exists(oof_p) and os.path.exists(te_p):
        oofs_all[k] = np.load(oof_p)
        tests_all[k] = np.load(te_p)
models_all = list(oofs_all.keys())
y = np.load('y_train_pooled.npy')
print('Available models:', models_all)

def logspace_oof_acc(models_subset, init_w=None):
    # init_w: optional initial weights in order of models_subset
    accs = [float(accuracy_score(y, oofs_all[m].argmax(1))) for m in models_subset]
    if init_w is None:
        pows = np.array([a**4 for a in accs], dtype=np.float64)
        w = pows / pows.sum()
    else:
        w = np.array(init_w, dtype=np.float64)
        w = w / w.sum()
    def oof_acc_from_w(w_vec):
        logits = None
        for wi, m in zip(w_vec, models_subset):
            lp = np.log(np.clip(oofs_all[m], EPS, 1.0))
            logits = lp * wi if logits is None else logits + wi * lp
        return float(accuracy_score(y, logits.argmax(1)))
    best_w = w.copy()
    best_acc = oof_acc_from_w(best_w)
    step = 0.02
    for _ in range(3):
        improved = False
        for i in range(len(best_w)):
            for d in (+step, -step):
                w_try = best_w.copy()
                w_try[i] = max(0.0, w_try[i] + d)
                w_try = w_try / max(1e-12, w_try.sum())
                acc = oof_acc_from_w(w_try)
                if acc > best_acc + 1e-9:
                    best_acc, best_w = acc, w_try
                    improved = True
        if not improved:
            break
    return best_acc, best_w

candidates = []
# Evaluate all subsets of size 3..5 from the 5 core boosters + optionally ET
base_list = [m for m in ['XGB','LGB','LGB_DART','XGB2','CB'] if m in models_all]
opt_list = base_list.copy()
if 'ET' in models_all:
    opt_list.append('ET')
for k in range(3, min(6, len(opt_list)+1)):
    for subset in itertools.combinations(opt_list, k):
        acc, w = logspace_oof_acc(list(subset))
        candidates.append((acc, list(subset), w))
        print(f'Subset {subset} -> OOF {acc:.5f}')
best = max(candidates, key=lambda x: x[0])
best_acc, best_models, best_w = best
print('Best subset:', best_models, '| OOF:', f'{best_acc:.5f}', '| weights:', dict(zip(best_models, best_w)))

# Build test logits using best subset
test_logits = None
for wi, m in zip(best_w, best_models):
    lp = np.log(np.clip(tests_all[m], EPS, 1.0))
    test_logits = lp * wi if test_logits is None else test_logits + wi * lp
pred_idx = test_logits.argmax(1)
labels = [CLASSES[i] for i in pred_idx]

test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
sample_sub = pd.read_csv('sample_submission.csv')
sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
assert sub['label'].notna().all(), 'Missing predictions after merge'
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv', sub.shape, '| Best subset OOF:', f'{best_acc:.5f}')

Available models: ['XGB', 'LGB', 'LGB_DART', 'XGB2', 'CB', 'ET']
Subset ('XGB', 'LGB', 'LGB_DART') -> OOF 0.89183
Subset ('XGB', 'LGB', 'XGB2') -> OOF 0.89178
Subset ('XGB', 'LGB', 'CB') -> OOF 0.89443
Subset ('XGB', 'LGB', 'ET') -> OOF 0.88908
Subset ('XGB', 'LGB_DART', 'XGB2') -> OOF 0.89370
Subset ('XGB', 'LGB_DART', 'CB') -> OOF 0.89529
Subset ('XGB', 'LGB_DART', 'ET') -> OOF 0.89189
Subset ('XGB', 'XGB2', 'CB') -> OOF 0.89501
Subset ('XGB', 'XGB2', 'ET') -> OOF 0.89153
Subset ('XGB', 'CB', 'ET') -> OOF 0.89320
Subset ('LGB', 'LGB_DART', 'XGB2') -> OOF 0.89284
Subset ('LGB', 'LGB_DART', 'CB') -> OOF 0.89467
Subset ('LGB', 'LGB_DART', 'ET') -> OOF 0.89067
Subset ('LGB', 'XGB2', 'CB') -> OOF 0.89384
Subset ('LGB', 'XGB2', 'ET') -> OOF 0.88975
Subset ('LGB', 'CB', 'ET') -> OOF 0.89239
Subset ('LGB_DART', 'XGB2', 'CB') -> OOF 0.89531
Subset ('LGB_DART', 'XGB2', 'ET') -> OOF 0.89247
Subset ('LGB_DART', 'CB', 'ET') -> OOF 0.89515
Subset ('XGB2', 'CB', 'ET') -> OOF 0.89359
Subset ('XGB', 'LGB', 'LGB_DART', 'XGB2') -> OOF 0.89308
Subset ('XGB', 'LGB', 'LGB_DART', 'CB') -> OOF 0.89407
Subset ('XGB', 'LGB', 'LGB_DART', 'ET') -> OOF 0.89119
Subset ('XGB', 'LGB', 'XGB2', 'CB') -> OOF 0.89357
Subset ('XGB', 'LGB', 'XGB2', 'ET') -> OOF 0.89061
Subset ('XGB', 'LGB', 'CB', 'ET') -> OOF 0.89382
Subset ('XGB', 'LGB_DART', 'XGB2', 'CB') -> OOF 0.89545
Subset ('XGB', 'LGB_DART', 'XGB2', 'ET') -> OOF 0.89314
Subset ('XGB', 'LGB_DART', 'CB', 'ET') -> OOF 0.89545
Subset ('XGB', 'XGB2', 'CB', 'ET') -> OOF 0.89465
Subset ('LGB', 'LGB_DART', 'XGB2', 'CB') -> OOF 0.89450
Subset ('LGB', 'LGB_DART', 'XGB2', 'ET') -> OOF 0.89261
Subset ('LGB', 'LGB_DART', 'CB', 'ET') -> OOF 0.89426
Subset ('LGB', 'XGB2', 'CB', 'ET') -> OOF 0.89356
Subset ('LGB_DART', 'XGB2', 'CB', 'ET') -> OOF 0.89518
Subset ('XGB', 'LGB', 'LGB_DART', 'XGB2', 'CB') -> OOF 0.89456
Subset ('XGB', 'LGB', 'LGB_DART', 'XGB2', 'ET') -> OOF 0.89212
Subset ('XGB', 'LGB', 'LGB_DART', 'CB', 'ET') -> OOF 0.89375
Subset ('XGB', 'LGB', 'XGB2', 'CB', 'ET') -> OOF 0.89382
Subset ('XGB', 'LGB_DART', 'XGB2', 'CB', 'ET') -> OOF 0.89521
Subset ('LGB', 'LGB_DART', 'XGB2', 'CB', 'ET') -> OOF 0.89435
Best subset: ['XGB', 'LGB_DART', 'XGB2', 'CB'] | OOF: 0.89545 | weights: {'XGB': 0.2444052425074738, 'LGB_DART': 0.26678391198081647, 'XGB2': 0.24474817109585956, 'CB': 0.24406267441585017}
Saved submission.csv (6473, 2) | Best subset OOF: 0.89545
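
The top subsets above differ by a few hundredths of a percent, so it may help to put the gap on a scale. A rough sketch, assuming an unpaired binomial standard error is an acceptable yardstick (the subsets are scored on the same 64073 OOF rows, so a paired test such as McNemar's would be sharper); the helper `binomial_se` is illustrative and not from the notebook:

```python
import math

n_oof = 64073        # OOF rows, from the printed shapes
acc_best = 0.89545   # best subset reported above
acc_next = 0.89531   # next distinct score: ('LGB_DART', 'XGB2', 'CB')

def binomial_se(acc, n):
    """Standard error of an accuracy estimated from n independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)

se = binomial_se(acc_best, n_oof)
gap = acc_best - acc_next
gap_samples = round(gap * n_oof)

print(f'SE of accuracy ~ {se:.5f}')
print(f'gap = {gap:.5f} (~{gap_samples} OOF samples)')
```

Under this rough yardstick the gap between the leading subsets is well below one standard error, so the choice among them is likely within noise, and the selected subset's OOF score is probably a little optimistic because it was picked on the same data.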


In [41]:
# Sanity check submission: LGB-DART only (small-TTA) to validate pipeline on LB
import os, numpy as np, pandas as pd
CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
assert os.path.exists('test_pred_pooled_lgb_dart_tta50.npy'), 'Missing LGB-DART small-TTA test preds'
test_pred = np.load('test_pred_pooled_lgb_dart_tta50.npy')
pred_idx = test_pred.argmax(1)
labels = [CLASSES[i] for i in pred_idx]
test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
sample_sub = pd.read_csv('sample_submission.csv')
sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
assert sub['label'].notna().all(), 'Missing predictions after merge'
sub.to_csv('submission.csv', index=False)
print('Saved LGB-DART only submission.csv', sub.shape)

Saved LGB-DART only submission.csv (6473, 2)


In [42]:
# HistGradientBoostingClassifier (diversity) on pooled features; predict on test features averaged over the small-TTA shifts
import os, time, numpy as np, pandas as pd
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.ensemble import HistGradientBoostingClassifier

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
num_class = len(CLASSES)

train_feat = 'X_train_pooled.npy'
train_y = 'y_train_pooled.npy'
train_groups = 'groups_pooled.npy'
test_feat_tta = 'X_test_pooled_tta_small.npy'
test_fnames_csv = 'test_fnames_pooled_small.csv'

assert os.path.exists(train_feat) and os.path.exists(train_y) and os.path.exists(train_groups), 'Missing train pooled features'
assert os.path.exists(test_feat_tta) and os.path.exists(test_fnames_csv), 'Missing test pooled small-TTA features'

X = np.load(train_feat)
y = np.load(train_y)
groups = np.load(train_groups)
X_test_tta = np.load(test_feat_tta)  # [n_shifts, N, D]
X_test_avg = X_test_tta.mean(axis=0)  # (N, D)
test_fnames = pd.read_csv(test_fnames_csv, header=None)[0].values
n_test, D = X_test_avg.shape
print('Shapes:', X.shape, y.shape, groups.shape, X_test_tta.shape, '| test_avg', X_test_avg.shape)

params = dict(
    loss='log_loss',
    learning_rate=0.06,
    max_depth=7,
    max_leaf_nodes=63,
    l2_regularization=0.1,
    max_bins=255,
    random_state=42,
)

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
oof_hgb = np.zeros((len(y), num_class), dtype=np.float32)
test_hgb = np.zeros((n_test, num_class), dtype=np.float32)
start = time.time()

for fold, (tr_idx, va_idx) in enumerate(cv.split(X, y, groups)):
    t0 = time.time()
    print(f'HGB Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
    scaler = StandardScaler(with_mean=True, with_std=True)
    X_tr = scaler.fit_transform(X[tr_idx])
    X_va = scaler.transform(X[va_idx])
    X_tr = np.clip(X_tr, -5, 5)
    X_va = np.clip(X_va, -5, 5)
    clf = HistGradientBoostingClassifier(**params)
    clf.fit(X_tr, y[tr_idx])
    oof_hgb[va_idx] = clf.predict_proba(X_va)
    va_acc = accuracy_score(y[va_idx], oof_hgb[va_idx].argmax(1))
    print(f'HGB Fold {fold} acc: {va_acc:.4f} | elapsed {time.time()-t0:.1f}s')
    X_te = scaler.transform(X_test_avg)
    X_te = np.clip(X_te, -5, 5)
    test_hgb += clf.predict_proba(X_te) / cv.n_splits

oof_acc_hgb = accuracy_score(y, oof_hgb.argmax(1))
print(f'HistGBDT OOF accuracy: {oof_acc_hgb:.4f} | total {time.time()-start:.1f}s')
np.save('oof_pooled_hgb.npy', oof_hgb)
np.save('test_pred_pooled_hgb_tta50.npy', test_hgb)
print('Saved HGB preds to oof_pooled_hgb.npy and test_pred_pooled_hgb_tta50.npy.')

Shapes: (64073, 662) (64073,) (64073,) (5, 6473, 662) | test_avg (6473, 662)
HGB Fold 0 | train 52005 val 12068
HGB Fold 0 acc: 0.8549 | elapsed 20.1s
HGB Fold 1 | train 51074 val 12999
HGB Fold 1 acc: 0.8361 | elapsed 17.8s
HGB Fold 2 | train 51792 val 12281
HGB Fold 2 acc: 0.8543 | elapsed 19.9s
HGB Fold 3 | train 50819 val 13254
HGB Fold 3 acc: 0.8375 | elapsed 19.8s
HGB Fold 4 | train 50602 val 13471
HGB Fold 4 acc: 0.8428 | elapsed 20.0s
HistGBDT OOF accuracy: 0.8448 | total 99.6s
Saved HGB preds to oof_pooled_hgb.npy and test_pred_pooled_hgb_tta50.npy.


In [43]:
# Log-space blend including HGB (and ET if present); no biases/override
import os, numpy as np, pandas as pd
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
EPS = 1e-9

names_map = {
    'XGB': ('oof_pooled.npy', 'test_pred_pooled_tta50.npy'),
    'LGB': ('oof_pooled_lgb.npy', 'test_pred_pooled_lgb_tta50.npy'),
    'LGB_DART': ('oof_pooled_lgb_dart.npy', 'test_pred_pooled_lgb_dart_tta50.npy'),
    'XGB2': ('oof_pooled_xgb_seed2.npy', 'test_pred_pooled_xgb_seed2_tta50.npy'),
    'CB': ('oof_pooled_cat.npy', 'test_pred_pooled_cat_tta50.npy'),
    'ET': ('oof_pooled_et.npy', 'test_pred_pooled_et_tta50.npy'),
    'HGB': ('oof_pooled_hgb.npy', 'test_pred_pooled_hgb_tta50.npy'),
}
oofs, tests = {}, {}
for k, (oof_p, te_p) in names_map.items():
    if os.path.exists(oof_p) and os.path.exists(te_p):
        oofs[k] = np.load(oof_p)
        tests[k] = np.load(te_p)
models = list(oofs.keys())
assert len(models) >= 2, f'Need >=2 models, have {models}'
print('Models in log-space blend (+HGB/ET if present):', models)

y = np.load('y_train_pooled.npy')
accs = {m: float(accuracy_score(y, oofs[m].argmax(1))) for m in models}
print('Per-model OOF acc:', accs)
pows = {m: accs[m]**4 for m in models}
w = np.array([pows[m] for m in models], dtype=np.float64)
w = w / w.sum()
print('Init weights (power):', dict(zip(models, w)))

def oof_acc_from_w(w_vec):
    logits = None
    for wi, m in zip(w_vec, models):
        lp = np.log(np.clip(oofs[m], EPS, 1.0))
        logits = lp * wi if logits is None else logits + wi * lp
    return float(accuracy_score(y, logits.argmax(1)))

best_w = w.copy()
best_acc = oof_acc_from_w(best_w)
print(f'Init log-space OOF acc: {best_acc:.5f}')

# 3 passes coord descent, step=0.02
step = 0.02
for p in range(3):
    improved = False
    for i in range(len(best_w)):
        for d in (+step, -step):
            w_try = best_w.copy()
            w_try[i] = max(0.0, w_try[i] + d)
            w_try = w_try / max(1e-12, w_try.sum())
            acc = oof_acc_from_w(w_try)
            if acc > best_acc + 1e-9:
                best_acc = acc
                best_w = w_try
                improved = True
    print(f'Pass {p}: best OOF acc {best_acc:.5f} | weights {dict(zip(models, best_w))}')
    if not improved:
        break

test_logits = None
for wi, m in zip(best_w, models):
    lp = np.log(np.clip(tests[m], EPS, 1.0))
    test_logits = lp * wi if test_logits is None else test_logits + wi * lp
pred_idx = test_logits.argmax(1)
labels = [CLASSES[i] for i in pred_idx]

test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
sample_sub = pd.read_csv('sample_submission.csv')
sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
assert sub['label'].notna().all(), 'Missing predictions after merge'
sub.to_csv('submission.csv', index=False)
print(f'Saved submission.csv {sub.shape} | Final log-space OOF acc: {best_acc:.5f}')

Models in log-space blend (+HGB/ET if present): ['XGB', 'LGB', 'LGB_DART', 'XGB2', 'CB', 'ET', 'HGB']
Per-model OOF acc: {'XGB': 0.8903282193747757, 'LGB': 0.8865512774491595, 'LGB_DART': 0.8928409782591731, 'XGB2': 0.8906403633355704, 'CB': 0.890016075413981, 'ET': 0.7314001217361447, 'HGB': 0.8448332370889454}
Init weights (power): {'XGB': 0.15972785944726164, 'LGB': 0.15703467363063822, 'LGB_DART': 0.16153869687048042, 'XGB2': 0.15995197595476435, 'CB': 0.15950397853771728, 'ET': 0.07274451114682975, 'HGB': 0.12949830441230833}
Init log-space OOF acc: 0.89128
Pass 0: best OOF acc 0.89225 | weights {'XGB': 0.17641539758082186, 'LGB': 0.13411643774096116, 'LGB_DART': 0.17818500793376804, 'XGB2': 0.1770200017445713, 'CB': 0.1769805813049954, 'ET': 0.05057914207890999, 'HGB': 0.10670343161597232}
Pass 1: best OOF acc 0.89300 | weights {'XGB': 0.16299521555428964, 'LGB': 0.11933372124986974, 'LGB_DART': 0.18568059299724807, 'XGB2': 0.2044825889063591, 'CB': 0.20484183038661286, 'ET': 0.03188216335226578, 'HGB': 0.09078388755335486}
Pass 2: best OOF acc 0.89347 | weights {'XGB': 0.16312569000618418, 'LGB': 0.11942924553762055, 'LGB_DART': 0.20584523625101106, 'XGB2': 0.2250626029787256, 'CB': 0.20500580222746645, 'ET': 0.011083028001382403, 'HGB': 0.07044839499760974}
Saved submission.csv (6473, 2) | Final log-space OOF acc: 0.89347
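
The weight refinement used in these blend cells is a greedy coordinate search: nudge one weight by ±step, renormalize, and keep the change only if OOF accuracy improves. A self-contained sketch on synthetic predictions, assuming toy data from a seeded generator (`noisy_model`, `blend_acc` and `coord_descent` are illustrative names, not notebook functions):

```python
import numpy as np

EPS = 1e-9

def blend_acc(probs, y, w):
    """Accuracy of the argmax of the weighted log-probability sum."""
    logits = sum(wi * np.log(np.clip(p, EPS, 1.0)) for wi, p in zip(w, probs))
    return float((logits.argmax(1) == y).mean())

def coord_descent(probs, y, w0, step=0.02, passes=3):
    """Greedy +/- step search over one weight at a time, renormalizing to sum to 1."""
    best_w = np.asarray(w0, dtype=np.float64) / np.sum(w0)
    best_acc = blend_acc(probs, y, best_w)
    for _ in range(passes):
        improved = False
        for i in range(len(best_w)):
            for d in (+step, -step):
                w_try = best_w.copy()
                w_try[i] = max(0.0, w_try[i] + d)
                w_try /= max(1e-12, w_try.sum())
                acc = blend_acc(probs, y, w_try)
                if acc > best_acc + 1e-9:
                    best_acc, best_w, improved = acc, w_try, True
        if not improved:
            break
    return best_acc, best_w

# Synthetic stand-ins for OOF predictions (assumption: three models of decreasing quality)
rng = np.random.default_rng(0)
n, k = 2000, 4
y_toy = rng.integers(0, k, size=n)

def noisy_model(noise):
    z = np.eye(k)[y_toy] * 2.0 + rng.normal(0.0, noise, size=(n, k))
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

toy_probs = [noisy_model(1.0), noisy_model(1.5), noisy_model(3.0)]
w0 = np.ones(3) / 3
acc0 = blend_acc(toy_probs, y_toy, w0)
acc1, w1 = coord_descent(toy_probs, y_toy, w0)
print(f'init acc {acc0:.4f} -> refined acc {acc1:.4f} | weights {w1.round(3)}')
```

Accuracy is piecewise constant in the weights, so a step of 0.02 often changes only a handful of predictions; the refined score is also measured on the same rows used for the search, so small gains should be read with caution.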


In [45]:
# Submission: 4-model best subset (XGB, LGB_DART, XGB2, CB) log-space blend + tiny silence boost x1.04
import os, numpy as np, pandas as pd
from sklearn.metrics import accuracy_score

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
EPS = 1e-9
sil_idx = CLASSES.index('silence')

# Best subset and weights from Cell 21 (they match the "Best subset" line printed by the subset-selection run, In [44])
models = ['XGB', 'LGB_DART', 'XGB2', 'CB']
weights = np.array([0.2444052425074738, 0.26678391198081647, 0.24474817109585956, 0.24406267441585017], dtype=np.float64)
weights = weights / weights.sum()

paths = {
    'XGB': ('oof_pooled.npy', 'test_pred_pooled_tta50.npy'),
    'LGB_DART': ('oof_pooled_lgb_dart.npy', 'test_pred_pooled_lgb_dart_tta50.npy'),
    'XGB2': ('oof_pooled_xgb_seed2.npy', 'test_pred_pooled_xgb_seed2_tta50.npy'),
    'CB': ('oof_pooled_cat.npy', 'test_pred_pooled_cat_tta50.npy'),
}

oofs, tests = {}, {}
for m in models:
    oof_p, te_p = paths[m]
    assert os.path.exists(oof_p) and os.path.exists(te_p), f'Missing files for {m}'
    oofs[m] = np.load(oof_p)
    tests[m] = np.load(te_p)

# Log-space blend (geometric mean with fixed weights) -> logits
test_logits = None
for wi, m in zip(weights, models):
    lp = np.log(np.clip(tests[m], EPS, 1.0))
    test_logits = lp * wi if test_logits is None else test_logits + wi * lp

# Convert to probs, apply tiny global silence boost x1.04
probs = np.exp(test_logits - test_logits.max(axis=1, keepdims=True))
probs = probs / probs.sum(axis=1, keepdims=True)
probs[:, sil_idx] *= 1.04
probs = probs / probs.sum(axis=1, keepdims=True)

pred_idx = probs.argmax(1)
labels = [CLASSES[i] for i in pred_idx]

# Build submission aligned to sample_submission
test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
sample_sub = pd.read_csv('sample_submission.csv')
sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
assert sub['label'].notna().all(), 'Missing predictions after merge'
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv', sub.shape, '| Models:', models, '| Weights:', weights.round(5), '| silence x1.04')

Saved submission.csv (6473, 2) | Models: ['XGB', 'LGB_DART', 'XGB2', 'CB'] | Weights: [0.24441 0.26678 0.24475 0.24406] | silence x1.04


In [46]:
# Diagnostics: 4-model best subset distributions + write alt submissions with silence x1.03/x1.05
import os, numpy as np, pandas as pd
from collections import Counter

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
EPS = 1e-9
sil_idx = CLASSES.index('silence')

# Best subset fixed
models = ['XGB', 'LGB_DART', 'XGB2', 'CB']
weights = np.array([0.2444052425074738, 0.26678391198081647, 0.24474817109585956, 0.24406267441585017], dtype=np.float64)
weights = weights / weights.sum()
paths = {
    'XGB': ('oof_pooled.npy', 'test_pred_pooled_tta50.npy'),
    'LGB_DART': ('oof_pooled_lgb_dart.npy', 'test_pred_pooled_lgb_dart_tta50.npy'),
    'XGB2': ('oof_pooled_xgb_seed2.npy', 'test_pred_pooled_xgb_seed2_tta50.npy'),
    'CB': ('oof_pooled_cat.npy', 'test_pred_pooled_cat_tta50.npy'),
}

# Load
y = np.load('y_train_pooled.npy')
oofs = {m: np.load(paths[m][0]) for m in models}
tests = {m: np.load(paths[m][1]) for m in models}

# OOF logits and preds
logits_oof = None
for wi, m in zip(weights, models):
    lp = np.log(np.clip(oofs[m], EPS, 1.0))
    logits_oof = lp * wi if logits_oof is None else logits_oof + wi * lp
pred_oof = logits_oof.argmax(1)
dist_oof = Counter([CLASSES[i] for i in pred_oof])
print('OOF class distribution:', dict(dist_oof))

# Test logits and base probs
logits_te = None
for wi, m in zip(weights, models):
    lp = np.log(np.clip(tests[m], EPS, 1.0))
    logits_te = lp * wi if logits_te is None else logits_te + wi * lp
probs = np.exp(logits_te - logits_te.max(axis=1, keepdims=True))
probs = probs / probs.sum(axis=1, keepdims=True)

def save_variant(probs_in, suffix):
    pred_idx = probs_in.argmax(1)
    labels = [CLASSES[i] for i in pred_idx]
    dist = Counter(labels)
    print(f'Test class distribution {suffix}:', dict(dist))
    test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
    test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
    pred_df = pd.DataFrame({'fname': test_fnames, 'label': labels})
    sample_sub = pd.read_csv('sample_submission.csv')
    sub = sample_sub[['fname']].merge(pred_df, on='fname', how='left')
    assert sub['label'].notna().all(), 'Missing predictions after merge'
    out = f'submission{suffix}.csv'
    sub.to_csv(out, index=False)
    print('Saved', out, sub.shape)

# Save base (no boost) as submission_base.csv
save_variant(probs, '_base')
# Silence x1.03
p103 = probs.copy()
p103[:, sil_idx] *= 1.03
p103 /= p103.sum(axis=1, keepdims=True)
save_variant(p103, '_sil103')
# Silence x1.05
p105 = probs.copy()
p105[:, sil_idx] *= 1.05
p105 /= p105.sum(axis=1, keepdims=True)
save_variant(p105, '_sil105')

OOF class distribution: {'yes': 1976, 'down': 1731, 'left': 1726, 'unknown': 39626, 'no': 1781, 'go': 1754, 'right': 1766, 'up': 2145, 'on': 1811, 'off': 1986, 'stop': 1947, 'silence': 5824}
Test class distribution _base: {'unknown': 4523, 'up': 218, 'down': 174, 'yes': 217, 'left': 194, 'right': 212, 'off': 198, 'go': 180, 'on': 151, 'stop': 209, 'no': 196, 'silence': 1}
Saved submission_base.csv (6473, 2)
Test class distribution _sil103: {'unknown': 4523, 'up': 218, 'down': 174, 'yes': 217, 'left': 194, 'right': 212, 'off': 198, 'go': 180, 'on': 151, 'stop': 209, 'no': 196, 'silence': 1}
Saved submission_sil103.csv (6473, 2)
Test class distribution _sil105: {'unknown': 4523, 'up': 218, 'down': 174, 'yes': 217, 'left': 194, 'right': 212, 'off': 198, 'go': 180, 'on': 151, 'stop': 209, 'no': 196, 'silence': 1}
Saved submission_sil105.csv (6473, 2)
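
The three test distributions above are identical, and a plausible reading is that no test row has silence close enough to the top class for a 3-5% multiplier to matter. Scaling one class by a factor `b` changes a row's argmax only when that class is not already on top and `b * p_class` exceeds the largest other probability. A small sketch on made-up rows (the helper `rows_flipped_by_boost` and the `toy` array are illustrative, not from the notebook):

```python
import numpy as np

def rows_flipped_by_boost(probs, cls_idx, boost):
    """Indices of rows whose argmax switches to cls_idx after multiplying that column by boost."""
    others = np.delete(probs, cls_idx, axis=1).max(axis=1)
    target = probs[:, cls_idx]
    return np.flatnonzero((target <= others) & (target * boost > others))

# Toy rows (hypothetical); column 1 plays the role of the boosted class
toy = np.array([
    [0.50, 0.48, 0.02],  # near-tie: needs a factor above 0.50 / 0.48 (about 1.042)
    [0.90, 0.05, 0.05],  # far from a tie: small boosts cannot flip it
    [0.30, 0.60, 0.10],  # boosted class already wins
])

for b in (1.03, 1.05):
    print(b, rows_flipped_by_boost(toy, cls_idx=1, boost=b).tolist())
```

Running the same count on the blended test probabilities for a range of factors would show how large a boost is needed before any prediction changes; renormalizing afterwards does not affect the argmax.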


In [None]:
# Tiny CNN on 64xT log-mels (CPU-only) with StratifiedGroupKFold (SGKF); save OOF/test logits for blending
import os, time, math, numpy as np, pandas as pd, random, torch, librosa, soundfile as sf
from pathlib import Path
from dataclasses import dataclass
from typing import Optional, Tuple
from sklearn.model_selection import StratifiedGroupKFold

# Threads and CPU perf settings
# os.cpu_count() can return None, so fall back to 1 before subtracting
torch.set_num_threads(max(1, min((os.cpu_count() or 1) - 2, 16)))
try:
    torch.set_num_interop_threads(1)
except Exception:
    pass
torch.backends.mkldnn.enabled = True

SR = 16000
N_MELS = 64
N_FFT = 512
HOP = 160
WIN = 400
FMIN, FMAX = 20, 8000
FIX_DUR = 1.0
NUM_CLASSES = 12
CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
CLS2IDX = {c:i for i,c in enumerate(CLASSES)}

def load_1s(path: str, shift_samples: int = 0) -> np.ndarray:
    y, sr = librosa.load(path, sr=SR, mono=True)
    target = int(FIX_DUR * SR)
    # Pad/crop to exactly 1 s first; shifting before a center crop would halve the effective shift
    if len(y) < target:
        y = np.pad(y, (0, target - len(y)))
    elif len(y) > target:
        start = (len(y) - target) // 2
        y = y[start:start+target]
    if shift_samples > 0:
        # delay: prepend zeros and drop the tail
        y = np.pad(y, (shift_samples, 0))[:target]
    elif shift_samples < 0:
        # advance: append zeros and drop the head
        y = np.pad(y, (0, -shift_samples))[-target:]
    return y.astype(np.float32)

def compute_logmel(y: np.ndarray) -> np.ndarray:
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_fft=N_FFT, hop_length=HOP, win_length=WIN, window='hann',
                                         n_mels=N_MELS, fmin=FMIN, fmax=FMAX, power=2.0, center=True)
    x = np.log(mel + 1e-6).astype(np.float32)  # [64, T]; T = 101 frames for 1 s at hop 160 with center=True
    # per-utterance CMVN: normalize each mel band over time (axis=1), not one global z-norm
    m = x.mean(axis=1, keepdims=True)
    s = x.std(axis=1, keepdims=True) + 1e-8
    x = (x - m) / s
    return x

def spec_augment(x: torch.Tensor, p: float = 0.5, F: int = 6, T: int = 12) -> torch.Tensor:
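    # x: (1, 64, T); masking is applied in place on x
    if random.random() > p:
        return x
    _, f, t = x.shape
    # freq mask: one band of fixed width F (0 is the per-band mean after normalization)
    f0 = random.randint(0, max(0, f - F))
    f1 = min(f, f0 + F)
    x[:, f0:f1, :] = 0
    # time mask: one span of fixed width T
    t0 = random.randint(0, max(0, t - T))
    t1 = min(t, t0 + T)
    x[:, :, t0:t1] = 0
    return x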

class AudioDataset(torch.utils.data.Dataset):
    def __init__(self, df: pd.DataFrame, train: bool = True, time_shift_ms: int = 80):
        self.df = df.reset_index(drop=True)
        self.train = train
        self.time_shift = int(time_shift_ms/1000.0 * SR)
    def __len__(self):
        return len(self.df)
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        path = row['path']
        label = row['label'] if 'label' in row else None
        shift = 0
        if self.train:
            # random shift in [-time_shift, time_shift]
            shift = random.randint(-self.time_shift, self.time_shift)
        y = load_1s(path, shift_samples=shift)
        x = compute_logmel(y)  # [64, T]
        x = torch.from_numpy(x).unsqueeze(0)  # (1,64,T) - DataLoader will batch to (B,1,64,T)
        if self.train:
            x = spec_augment(x, p=0.5, F=6, T=12)
        if label is None:
            y_t = torch.tensor(-1, dtype=torch.long)
        else:
            y_t = torch.tensor(CLS2IDX[label], dtype=torch.long)
        return x, y_t

class TinyCNN(torch.nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.block1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, kernel_size=3, padding=1),
            torch.nn.BatchNorm2d(32),
            torch.nn.ReLU(inplace=True),
            torch.nn.MaxPool2d(2),
            torch.nn.Dropout(0.10)
        )
        self.block2 = torch.nn.Sequential(
            torch.nn.Conv2d(32, 64, kernel_size=3, padding=1),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(inplace=True),
            torch.nn.MaxPool2d(2),
            torch.nn.Dropout(0.10)
        )
        self.block3 = torch.nn.Sequential(
            torch.nn.Conv2d(64, 96, kernel_size=3, padding=1),
            torch.nn.BatchNorm2d(96),
            torch.nn.ReLU(inplace=True),
            torch.nn.MaxPool2d(2),
            torch.nn.Dropout(0.15)
        )
        self.head = torch.nn.Sequential(
            torch.nn.Conv2d(96, 128, kernel_size=1),
            torch.nn.BatchNorm2d(128),
            torch.nn.ReLU(inplace=True)
        )
        self.gap = torch.nn.AdaptiveAvgPool2d(1)
        self.drop = torch.nn.Dropout(0.20)
        self.fc = torch.nn.Linear(128, num_classes)
    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.head(x)
        x = self.gap(x).squeeze(-1).squeeze(-1)
        x = self.drop(x)
        x = self.fc(x)
        return x

def train_cnn(seed=42, n_folds=3, epochs=20, batch_size=128):
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed)
    df_tr = pd.read_csv('train_meta.csv')
    df_te = pd.read_csv('test_meta.csv')
    groups = df_tr['speaker'].values
    y_idx = df_tr['label'].map(CLS2IDX).values
    cv = StratifiedGroupKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    oof = np.zeros((len(df_tr), NUM_CLASSES), dtype=np.float32)
    test_logits = np.zeros((len(df_te), NUM_CLASSES), dtype=np.float32)
    device = torch.device('cpu')
    # Enable autocast only if this torch build exposes torch.autocast
    scaler_autocast = hasattr(torch, 'autocast')

    for fold, (tr_idx, va_idx) in enumerate(cv.split(np.zeros(len(y_idx)), y_idx, groups)):
        t0 = time.time()
        print(f'[CNN] Fold {fold} | train {len(tr_idx)} val {len(va_idx)}')
        ds_tr = AudioDataset(df_tr.iloc[tr_idx], train=True, time_shift_ms=80)
        ds_va = AudioDataset(df_tr.iloc[va_idx], train=False)
        ds_te = AudioDataset(df_te, train=False)
        dl_tr = torch.utils.data.DataLoader(ds_tr, batch_size=batch_size, shuffle=True, num_workers=0)
        dl_va = torch.utils.data.DataLoader(ds_va, batch_size=batch_size, shuffle=False, num_workers=0)
        dl_te = torch.utils.data.DataLoader(ds_te, batch_size=batch_size, shuffle=False, num_workers=0)

        model = TinyCNN(NUM_CLASSES).to(device)
        model = model.to(memory_format=torch.channels_last)
        optim = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9,0.98), weight_decay=1e-4)
        loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.05)
        # LR schedule: linear warmup over the first epoch, then cosine decay to 1% of the base LR
        total_steps = epochs * max(1, len(dl_tr))
        def lr_lambda(step):
            if step < len(dl_tr):
                return float(step + 1) / float(len(dl_tr))
            progress = (step - len(dl_tr)) / max(1, total_steps - len(dl_tr))
            return 0.5 * (1 + math.cos(math.pi * progress)) * (1 - 1e-2) + 1e-2
        sched = torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda)

        best_acc, best_state, no_improve = 0.0, None, 0
        for ep in range(1, epochs+1):
            model.train()
            tr_loss, n_tr = 0.0, 0
            for xb, yb in dl_tr:
                # channels_last needs a 4D tensor, so apply it to the batched (B,1,64,T) input
                xb = xb.to(device, memory_format=torch.channels_last, non_blocking=False)
                yb = yb.to(device)
                optim.zero_grad(set_to_none=True)
                if scaler_autocast:
                    with torch.autocast('cpu', dtype=torch.bfloat16, enabled=True):
                        logits = model(xb)
                        loss = loss_fn(logits, yb)
                else:
                    logits = model(xb)
                    loss = loss_fn(logits, yb)
                loss.backward()
                optim.step()
                sched.step()
                tr_loss += loss.item() * xb.size(0); n_tr += xb.size(0)
            # val
            model.eval()
            correct, total = 0, 0
            with torch.no_grad():
                for xb, yb in dl_va:
                    xb = xb.to(device, memory_format=torch.channels_last, non_blocking=False)
                    logits = model(xb)
                    pred = logits.argmax(1).cpu()
                    correct += (pred == yb).sum().item()
                    total += yb.size(0)
            acc = correct / max(1, total)
            print(f'[CNN] Fold {fold} Epoch {ep}/{epochs} | tr_loss {tr_loss/max(1,n_tr):.4f} | val_acc {acc:.4f}')
            if acc > best_acc + 1e-4:
                best_acc = acc
                best_state = {k:v.cpu().clone() for k,v in model.state_dict().items()}
                no_improve = 0
            else:
                no_improve += 1
            if no_improve >= 3:
                print(f'[CNN] Early stopping at epoch {ep} (best_acc={best_acc:.4f})')
                break

        if best_state is not None:
            model.load_state_dict(best_state, strict=True)
        # OOF logits
        model.eval()
        outs = []
        with torch.no_grad():
            for xb, yb in dl_va:
                xb = xb.to(device, memory_format=torch.channels_last, non_blocking=False)
                logits = model(xb).cpu().float()
                outs.append(logits.numpy())
        oof_logits_fold = np.concatenate(outs, axis=0)
        oof[va_idx] = oof_logits_fold  # store raw logits; the blend applies log_softmax later

        # Test logits (fold)
        outs_te = []
        with torch.no_grad():
            for xb, _ in dl_te:
                xb = xb.to(device, memory_format=torch.channels_last, non_blocking=False)
                logits = model(xb).cpu().float()
                outs_te.append(logits.numpy())
        test_logits += np.concatenate(outs_te, axis=0) / n_folds
        print(f'[CNN] Fold {fold} done in {time.time()-t0:.1f}s | best_val_acc={best_acc:.4f}')

    # Quick OOF accuracy from softmaxed logits; the raw logits are what get saved for blending
    oof_probs = torch.from_numpy(oof).softmax(dim=1).numpy()
    y_all = df_tr['label'].map(CLS2IDX).values
    oof_acc = (oof_probs.argmax(1) == y_all).mean()
    print(f'[CNN] OOF accuracy (softmaxed): {oof_acc:.4f}')
    np.save('oof_cnn_logits.npy', oof)
    np.save('test_cnn_logits.npy', test_logits)
    print('Saved oof_cnn_logits.npy and test_cnn_logits.npy')

train_cnn(seed=42, n_folds=3, epochs=20, batch_size=128)

[CNN] Fold 0 | train 42406 val 21667
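
The LR multiplier built by `lr_lambda` in the training cell above can be inspected on its own. The following is a minimal sketch, not the notebook's code: `warmup_cosine`, `steps_per_epoch` and `floor` are hypothetical names, while the one-epoch linear warmup and the 1e-2 floor are copied from the loop above. The step counts in the demo are illustrative, not taken from the run.

```python
import math

def warmup_cosine(step, steps_per_epoch, epochs, floor=1e-2):
    """LR multiplier: linear warmup over the first epoch, then cosine decay to `floor`.

    Hypothetical standalone version of the lr_lambda closure in train_cnn.
    """
    total_steps = epochs * max(1, steps_per_epoch)
    if step < steps_per_epoch:
        # warmup: ramps from 1/steps_per_epoch up to 1.0
        return float(step + 1) / float(steps_per_epoch)
    # cosine phase: progress runs from 0 (end of warmup) to 1 (last step)
    progress = (step - steps_per_epoch) / max(1, total_steps - steps_per_epoch)
    return 0.5 * (1 + math.cos(math.pi * progress)) * (1 - floor) + floor

# Illustrative values only (100 steps per epoch, 20 epochs)
for s in (0, 49, 99, 100, 1050, 2000):
    print(s, round(warmup_cosine(s, 100, 20), 4))
```

Multiplying these values by the base `lr` passed to AdamW gives the effective learning rate at each optimizer step, which is how `LambdaLR` applies the function.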


In [None]:
# CNN+Boosters Log-space Blend with Coord Descent, cap weights, optional silence boost
import os, json, time, numpy as np, pandas as pd
from sklearn.metrics import accuracy_score
import torch

CLASSES = ['yes','no','up','down','left','right','on','off','stop','go','unknown','silence']
EPS = 1e-9

# Paths for 4-boosters (small-TTA) and CNN logits
paths = {
    'XGB': {'oof': 'oof_pooled.npy', 'test': 'test_pred_pooled_tta50.npy', 'type': 'prob'},
    'LGB_DART': {'oof': 'oof_pooled_lgb_dart.npy', 'test': 'test_pred_pooled_lgb_dart_tta50.npy', 'type': 'prob'},
    'XGB2': {'oof': 'oof_pooled_xgb_seed2.npy', 'test': 'test_pred_pooled_xgb_seed2_tta50.npy', 'type': 'prob'},
    'CB': {'oof': 'oof_pooled_cat.npy', 'test': 'test_pred_pooled_cat_tta50.npy', 'type': 'prob'},
    'CNN': {'oof': 'oof_cnn_logits.npy', 'test': 'test_cnn_logits.npy', 'type': 'logit'},
}

missing = [k for k,v in paths.items() if not (os.path.exists(v['oof']) and os.path.exists(v['test']))]
if missing:
    print('Waiting for artifacts to exist (skip execution until ready):', missing)
else:
    # Load OOF labels
    y = np.load('y_train_pooled.npy')
    # Load OOF/test for each model
    oofs = {}; tests = {}; kinds = {}
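    for k, v in paths.items():
        oo = np.load(v['oof'])
        tt = np.load(v['test'])
        # Sanity check: OOF rows must align with y, and columns with CLASSES
        assert oo.shape == (len(y), len(CLASSES)), f'{k}: unexpected OOF shape {oo.shape}'
        assert tt.shape[1] == len(CLASSES), f'{k}: unexpected test shape {tt.shape}'
        oofs[k] = oo
        tests[k] = tt
        kinds[k] = v['type']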
    models = list(oofs.keys())
    print('Models in blend:', models)

    # Compute per-model OOF accuracy for initialization
    accs = {}
    for m in models:
        if kinds[m] == 'prob':
            pred = oofs[m].argmax(1)
        else:
            pred = torch.from_numpy(oofs[m]).softmax(dim=1).numpy().argmax(1)
        accs[m] = float(accuracy_score(y, pred))
    print('Per-model OOF acc:', {m: round(a,5) for m,a in accs.items()})

    # Helper to get log-probs for a model's OOF or TEST arrays
    def to_log_probs(arr, kind):
        if kind == 'prob':
            return np.log(np.clip(arr, EPS, 1.0))
        else:  # logits
            t = torch.from_numpy(arr)
            lp = torch.log_softmax(t, dim=1).numpy()
            return lp

    # Initialize weights proportional to (OOF acc)^4, then normalize; the cap is enforced during coordinate descent
    w = np.array([accs[m]**4 for m in models], dtype=np.float64)
    w = w / w.sum()
    print('Init weights (power):', dict(zip(models, np.round(w,5))))

    # OOF acc from weights using log-space sum
    log_oofs = {m: to_log_probs(oofs[m], kinds[m]) for m in models}
    def oof_acc_from_w(w_vec):
        logits = None
        for wi, m in zip(w_vec, models):
            lp = log_oofs[m]
            logits = wi*lp if logits is None else logits + wi*lp
        return float(accuracy_score(y, logits.argmax(1)))

    best_w = w.copy()
    best_acc = oof_acc_from_w(best_w)
    print(f'Init log-space OOF acc: {best_acc:.5f}')

    # Coordinate descent: step=0.02, 3 passes, cap any single weight <= 0.35
    step = 0.02
    cap = 0.35
    for p in range(3):
        improved = False
        for i in range(len(best_w)):
            for d in (+step, -step):
                w_try = best_w.copy()
                w_try[i] = max(0.0, min(cap, w_try[i] + d))
                s = w_try.sum()
                if s <= 0:
                    continue
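                # renormalize to sum 1, then re-apply the cap
                w_try = w_try / s
                # if any exceed cap after norm, clip and redistribute the excess once
                if (w_try > cap).any():
                    over = w_try > cap
                    extra = (w_try[over] - cap).sum()
                    w_try[over] = cap
                    remain_mask = ~over
                    remain_sum = w_try[remain_mask].sum()
                    if remain_sum > 0:
                        # spread the excess proportionally over the uncapped weights
                        w_try[remain_mask] += extra * (w_try[remain_mask] / remain_sum)
                    elif remain_mask.any():
                        # uncapped weights are all zero: spread the excess evenly (avoids 0/0)
                        w_try[remain_mask] += extra / remain_mask.sum()
                    else:
                        # if all capped, just renorm to sum 1
                        w_try = w_try / w_try.sum()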
                acc = oof_acc_from_w(w_try)
                if acc > best_acc + 1e-9:
                    best_acc = acc; best_w = w_try; improved = True
        print(f'Pass {p}: best OOF acc {best_acc:.5f} | weights {dict(zip(models, np.round(best_w,5)))}')
        if not improved:
            break

    # Build blended test scores (weighted sum of per-model log-probs)
    log_tests = {m: to_log_probs(tests[m], kinds[m]) for m in models}
    test_logits = None
    for wi, m in zip(best_w, models):
        lp = log_tests[m]
        test_logits = wi*lp if test_logits is None else test_logits + wi*lp

    # Convert to probs
    probs = np.exp(test_logits - test_logits.max(axis=1, keepdims=True))
    probs = probs / probs.sum(axis=1, keepdims=True)

    # Create submissions: base and silence x1.04
    test_fnames_path = 'test_fnames_pooled_small.csv' if os.path.exists('test_fnames_pooled_small.csv') else 'test_fnames_pooled.csv'
    test_fnames = pd.read_csv(test_fnames_path, header=None)[0].values
    pred_idx_base = probs.argmax(1)
    labels_base = [CLASSES[i] for i in pred_idx_base]
    pred_df_base = pd.DataFrame({'fname': test_fnames, 'label': labels_base})
    sample_sub = pd.read_csv('sample_submission.csv')
    sub_base = sample_sub[['fname']].merge(pred_df_base, on='fname', how='left')
    assert sub_base['label'].notna().all(), 'Missing predictions (base)'
    sub_base.to_csv('submission.csv', index=False)
    print('Saved submission.csv (base blend):', sub_base.shape, '| OOF acc:', f'{best_acc:.5f}')

    # Silence boost variant
    sil_idx = CLASSES.index('silence')
    probs_boost = probs.copy()
    probs_boost[:, sil_idx] *= 1.04
    probs_boost /= probs_boost.sum(axis=1, keepdims=True)
    pred_idx_b = probs_boost.argmax(1)
    labels_b = [CLASSES[i] for i in pred_idx_b]
    pred_df_b = pd.DataFrame({'fname': test_fnames, 'label': labels_b})
    sub_b = sample_sub[['fname']].merge(pred_df_b, on='fname', how='left')
    assert sub_b['label'].notna().all(), 'Missing predictions (silence boost)'
    sub_b.to_csv('submission_silence104.csv', index=False)
    print('Saved submission_silence104.csv:', sub_b.shape)

    # Persist weights
    with open('blend_weights_cnn_boosters.json', 'w') as f:
        json.dump({'models': models, 'weights': best_w.tolist(), 'oof_acc': best_acc}, f, indent=2)
    print('Saved blend_weights_cnn_boosters.json')
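
The cap handling in the coordinate descent above clips and redistributes only once, so a redistributed weight can end up above `cap` again. A minimal sketch of an iterative alternative follows. It is an assumption-laden illustration rather than part of the notebook: `project_to_capped_simplex` and `max_iter` are hypothetical names, and proportional redistribution is just one reasonable choice.

```python
import numpy as np

def project_to_capped_simplex(w, cap, max_iter=100):
    """Return non-negative weights that sum to 1 with every entry <= cap.

    Hypothetical helper: clips to the cap and redistributes the excess
    proportionally over the entries still below the cap, repeating until
    no entry exceeds it.
    """
    w = np.clip(np.asarray(w, dtype=np.float64), 0.0, None)
    if cap * len(w) < 1.0:
        raise ValueError("cap too small: cap * n_models must be >= 1")
    if w.sum() <= 0:
        return np.full(len(w), 1.0 / len(w))
    w = w / w.sum()
    for _ in range(max_iter):
        over = w > cap + 1e-12
        if not over.any():
            break
        excess = (w[over] - cap).sum()
        w[over] = cap
        free = w < cap  # entries that can still absorb weight
        if not free.any():
            break
        free_sum = w[free].sum()
        if free_sum > 0:
            w[free] += excess * w[free] / free_sum
        else:
            w[free] += excess / free.sum()
    return w

w_demo = project_to_capped_simplex([0.6, 0.3, 0.05, 0.03, 0.02], cap=0.35)
print(np.round(w_demo, 4), w_demo.sum())
```

Inside the search loop, a call such as `w_try = project_to_capped_simplex(w_try, cap)` could stand in for the renormalize-and-clip lines, at the cost of a few extra array operations per trial.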