# Plan: LAL Automated Essay Scoring 2.0

Objectives:
- Establish strong, reliable CV with QWK and lock splits.
- Build fast baseline (TF-IDF + linear model) to get quick OOF and LB.
- Iterate with feature engineering and modern text models; aim for a medal.

Validation:
- Use StratifiedKFold stratified on the score (optionally jointly with essay-length bins).
- 5 folds, multiple seeds (cache folds).
- Optimize rounding (or isotonic/ordinal mapping) to maximize QWK on OOF.
- Fit transforms inside folds only; cache vectorizers to disk.
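QWK, the validation metric above, can be computed directly from the observed confusion matrix and the matrix expected if the true and predicted labels were independent. A minimal NumPy sketch, assuming integer labels in the fixed range 1..6 seen in `score`; the function name is illustrative, and when all six labels occur it should agree with `sklearn.metrics.cohen_kappa_score(..., weights='quadratic')` as used in the notebook.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, min_rating=1, max_rating=6):
    """Cohen's kappa with quadratic weights: 1 - sum(W*O) / sum(W*E)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    n = max_rating - min_rating + 1

    # Observed matrix O[i, j]: number of essays with true rating i and predicted rating j
    O = np.zeros((n, n), dtype=float)
    np.add.at(O, (y_true - min_rating, y_pred - min_rating), 1.0)

    # Expected matrix E under independence: outer product of the two marginals,
    # scaled to the same total count as O
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()

    # Quadratic disagreement weights W[i, j] = (i - j)^2 / (n - 1)^2
    idx = np.arange(n)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

    return 1.0 - (W * O).sum() / (W * E).sum()
```

Perfect agreement gives 1.0, and a fully reversed ranking on three classes gives -1.0; the denominator is zero if every essay shares a single label, so that degenerate case is not handled.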

Baseline v1 (fast):
- Text only: char/word TF-IDF + Ridge/LinearSVR.
- Add NB-SVM style log-count ratio features.
- Predict float scores; apply optimized rounding to integer labels.
- Evaluate OOF QWK; produce submission.
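A sketch of the NB-SVM style log-count ratio from the list above. NB-SVM is defined for binary labels, so using it with 1..6 scores needs a binarization; the `score >= 4` cutoff in the usage comment is a hypothetical choice, and the helper names are illustrative. Dense arrays keep the example self-contained; for a SciPy sparse TF-IDF matrix the same column scaling is usually written `X.multiply(r)`. As with the vectorizers, `r` should be fit on the training fold only.

```python
import numpy as np


def log_count_ratio(X, y_binary, alpha=1.0):
    """NB log-count ratio r = log((p / |p|_1) / (q / |q|_1)).

    p and q are alpha-smoothed feature sums over the positive and negative rows.
    """
    X = np.asarray(X, dtype=float)
    y_binary = np.asarray(y_binary, dtype=bool)
    p = alpha + X[y_binary].sum(axis=0)
    q = alpha + X[~y_binary].sum(axis=0)
    return np.log((p / p.sum()) / (q / q.sum()))


def nb_scale(X, r):
    """Scale each feature column by its log-count ratio (broadcast over rows)."""
    return np.asarray(X, dtype=float) * r


# Illustrative usage inside one CV fold (the score >= 4 cutoff is hypothetical):
# r = log_count_ratio(X_train_fold, y_train_fold >= 4)
# X_train_nb, X_valid_nb = nb_scale(X_train_fold, r), nb_scale(X_valid_fold, r)
```

Features that occur mostly in high-scoring essays get positive `r`, those typical of low-scoring essays get negative `r`, and the scaled matrix can then feed the same Ridge/LinearSVR models.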

Feature Engineering v2:
- NLP stats: length, unique ratio, punctuation, sentence count, syllables, readability (FKGL), spelling error counts.
- Lexical richness: TTR, MTLD (approx), POS tag counts.
- Misspelling correction? Keep the raw text; only derive misspelling-count features, to avoid leakage.
- Combine TF-IDF with numeric features via stacking or concatenation.
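For the "MTLD (approx)" feature listed above: MTLD counts how many times the running type-token ratio falls to a threshold (0.72 is the conventional value), adds a partial factor for the leftover segment, divides the token count by that factor count, and averages a forward and a backward pass. A minimal sketch, assuming word tokens such as those produced by the `\b\w+\b` regex in `text_stats`; returning the token count when no factor is ever completed (very short, all-unique texts) is an arbitrary choice made here.

```python
def _mtld_pass(tokens, threshold=0.72):
    """One directional MTLD pass: number of tokens divided by the number of TTR 'factors'."""
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        types.add(tok)
        count += 1
        # A factor is complete once the running type-token ratio falls to the threshold
        if len(types) / count <= threshold:
            factors += 1.0
            types, count = set(), 0
    # The leftover segment contributes a partial factor proportional to how far its TTR has fallen
    if count > 0:
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    # Arbitrary fallback for texts that never lose diversity: use the token count
    return len(tokens) / factors if factors > 0 else float(len(tokens))


def mtld(words, threshold=0.72):
    """Average of forward and backward MTLD passes over lowercased words."""
    tokens = [w.lower() for w in words]
    if not tokens:
        return 0.0
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    return (forward + backward) / 2.0
```

Higher values mean the writer sustains lexical variety over longer stretches; a text that repeats one word scores 2.0.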

Modeling v2:
- CatBoost (GPU) on dense features + TF-IDF SVD projections.
- XGBoost (GPU); monotone constraints not needed; tune depth/eta with early stopping.

Transformer track (parallel, GPU):
- Start with DeBERTa-v3-base/large or RoBERTa-large (cu121 stack).
- Truncate to max tokens (e.g., 1024 via Longformer/DeBERTa-v3-long if feasible).
- Regression head; train with MSE + QWK-aware post-processing.
- Use gradient accumulation, mixed precision, early stopping.
- Cache OOF/test preds; blend with classical models.
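Essays longer than the model context can be split into overlapping token windows, scored per window, and averaged. The sketch below is illustrative: `max_len` and `overlap` are assumptions (the `sw64`/`sw128` artifact names later in the notebook may refer to such window settings, but that is not stated), the helper names are hypothetical, and no room is reserved for special tokens such as `[CLS]`/`[SEP]`, so `max_len` should be the model limit minus those. Hugging Face fast tokenizers can produce similar windows via `return_overflowing_tokens=True` together with `stride`.

```python
import numpy as np


def sliding_windows(token_ids, max_len=512, overlap=64):
    """Split a token-id list into windows of at most max_len ids.

    Consecutive windows share `overlap` ids; the last window may be shorter.
    """
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    step = max_len - overlap
    windows, start = [], 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows


def aggregate_window_preds(preds, lengths=None):
    """Combine per-window regression outputs; optionally weight by window length."""
    preds = np.asarray(preds, dtype=float)
    if lengths is None:
        return float(preds.mean())
    w = np.asarray(lengths, dtype=float)
    return float((preds * w).sum() / w.sum())
```

Weighting by window length keeps a short tail window from counting as much as a full one; a plain mean is the simpler alternative.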

Blending:
- Weighted average using OOF QWK for weights; optionally logistic regression meta on OOF.
- Calibrate via optimized rounding per prompt if prompt available (check cols).
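A minimal sketch of the weighted average in the first bullet, assuming weights proportional to each model's OOF QWK (clipped at zero, then normalized). Model names and values are hypothetical. When OOF QWKs are close (e.g. 0.81 vs 0.83) this is nearly an equal-weight average, which is one reason to grid-search the weight on OOF instead, as the later blend cells do.

```python
import numpy as np


def qwk_weighted_blend(preds_by_model, qwk_by_model):
    """Weighted average of per-model float predictions, weights proportional to OOF QWK."""
    names = sorted(preds_by_model)
    w = np.clip(np.array([qwk_by_model[n] for n in names], dtype=float), 0.0, None)
    w = w / w.sum()
    stacked = np.stack([np.asarray(preds_by_model[n], dtype=float) for n in names])
    return w @ stacked, dict(zip(names, w))
```

The blended float predictions would then go through the same optimized rounding as a single model.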

Risk checks:
- No leakage from test during vectorizer fitting.
- Deterministic seeds; save folds to folds.csv.
- Log per-fold timings and scores.
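Later cells index `train` by the row positions of `folds.csv`, which silently assumes both files keep the same row order. A defensive sketch (function name hypothetical) that maps folds onto training rows by `essay_id` instead and fails loudly on duplicate or missing ids:

```python
import numpy as np


def folds_by_id(train_ids, fold_ids, fold_values):
    """Return the fold of each training row, matched on essay id rather than row position."""
    lookup = dict(zip(fold_ids, fold_values))
    if len(lookup) != len(fold_ids):
        raise ValueError("duplicate ids in the folds file")
    missing = [i for i in train_ids if i not in lookup]
    if missing:
        raise ValueError(f"{len(missing)} training ids have no fold, e.g. {missing[:3]}")
    return np.array([lookup[i] for i in train_ids], dtype=int)
```

Possible usage: `fold_of_row = folds_by_id(train['essay_id'].tolist(), folds['essay_id'].tolist(), folds['fold'].tolist())`, then `tr_idx = np.flatnonzero(fold_of_row != f)`.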

Milestones (request expert review at each):
1) Plan + environment check
2) Data load + EDA + CV design
3) Baseline TF-IDF model + OOF
4) FE v2 + GBDT model
5) Transformer baseline + OOF
6) Blend + finalize submission

Questions for experts:
- Best CV protocol for AES2 (any prompt-based stratification needed)?
- Top text features beyond TF-IDF shown to help in AES2?
- Recommended long-context model choice and tokenization strategy under 24h?
- Common pitfalls that tank LB vs CV in this comp?

In [1]:
# Environment + quick EDA
import os, sys, subprocess, time
import pandas as pd
import numpy as np

def run(cmd):
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True).stdout

print('=== NVIDIA-SMI ===', flush=True)
print(run(['bash','-lc','nvidia-smi || true']))

t0=time.time()
train_path='train.csv'; test_path='test.csv'
print('Loading data...', flush=True)
train=pd.read_csv(train_path)
test=pd.read_csv(test_path)
print(f'train shape: {train.shape}, test shape: {test.shape}', flush=True)
print('train columns:', list(train.columns))
print('test columns:', list(test.columns))

# Identify id, text, target, prompt columns heuristically
id_col = 'essay_id' if 'essay_id' in train.columns else train.columns[0]
# Exclude the id column: 'essay_id' contains 'essay' and would otherwise be picked as the text column
text_col_candidates = [c for c in train.columns
                       if c != id_col and any(k in c.lower() for k in ('text', 'essay', 'content'))]
# Prefer the known competition text column when present
text_col = 'full_text' if 'full_text' in train.columns else (text_col_candidates[0] if text_col_candidates else None)
target_col = 'score' if 'score' in train.columns else None
prompt_col = next((c for c in train.columns if 'prompt' in c.lower() or 'topic' in c.lower()), None)

print(f'Heuristic cols -> id: {id_col}, text: {text_col}, target: {target_col}, prompt: {prompt_col}')

if target_col is not None:
    y = train[target_col].dropna()
    print('Target stats:', y.describe())
    print('Unique scores:', np.sort(y.unique())[:20], '... total', y.nunique())

if text_col is not None:
    lens = train[text_col].astype(str).str.len()
    print('Text length (chars) percentiles:', np.percentile(lens.to_numpy(), [1,5,25,50,75,90,95,99]))

if prompt_col is not None:
    print('Unique prompts in train:', train[prompt_col].nunique())
    if prompt_col in test.columns:
        print('Unique prompts in test:', test[prompt_col].nunique())
        seen = set(train[prompt_col].unique()); unseen = [p for p in test[prompt_col].unique() if p not in seen]
        print('Unseen prompts in test:', unseen[:10], 'count:', len(unseen))

print('Head (selected):')
cols_show = [c for c in [id_col, prompt_col, target_col, text_col] if c is not None]
print(train[cols_show].head(3).to_dict(orient='records'))
print(f'Done in {time.time()-t0:.2f}s', flush=True)

=== NVIDIA-SMI ===


Sun Sep 28 22:33:13 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     182MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

train shape: (15576, 3), test shape: (1731, 2)


train columns: ['essay_id', 'full_text', 'score']
test columns: ['essay_id', 'full_text']
Heuristic cols -> id: essay_id, text: essay_id, target: score, prompt: None
Target stats: count    15576.000000
mean         2.950116
std          1.044384
min          1.000000
25%          2.000000
50%          3.000000
75%          4.000000
max          6.000000
Name: score, dtype: float64
Unique scores: [1 2 3 4 5 6] ... total 6
Text length (chars) percentiles: [7. 7. 7. 7. 7. 7. 7. 7.]
Head (selected):
[{'essay_id': '663d2cf', 'score': 3}, {'essay_id': '3a20bfb', 'score': 2}, {'essay_id': '6adae64', 'score': 3}]
Done in 0.25s
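The logged output above shows the name-based heuristic choosing `essay_id` as the text column, hence the constant 7-character length percentiles. A content-based alternative is to skip id-like columns and pick the all-string column with the largest mean length. The helper below is a hypothetical sketch; it takes a plain mapping of column name to values, e.g. `{c: train[c].tolist() for c in train.columns}`.

```python
def pick_text_column(columns):
    """Pick the all-string, non-id column with the longest mean value length.

    `columns` maps column name -> sequence of values.
    """
    best_name, best_len = None, -1.0
    for name, values in columns.items():
        lowered = name.lower()
        if lowered == "id" or lowered.endswith("_id"):
            continue  # id columns such as essay_id hold short hashes, never the essay
        values = list(values)
        if not values or not all(isinstance(v, str) for v in values):
            continue
        mean_len = sum(len(v) for v in values) / len(values)
        if mean_len > best_len:
            best_name, best_len = name, mean_len
    return best_name
```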


In [3]:
# Build and save folds (5-fold StratifiedKFold on score x length bins)
import pandas as pd, numpy as np, time
from sklearn.model_selection import StratifiedKFold

t0=time.time()
train = pd.read_csv('train.csv')
id_col, text_col, target_col = 'essay_id', 'full_text', 'score'

# Basic sanity
assert {id_col, text_col, target_col}.issubset(train.columns), f"Missing required columns in train: {train.columns}"
train = train.copy()

# Create stratification label: combine score and length bin
y = train[target_col].astype(int).values
lens = train[text_col].astype(str).str.len().values
len_series = pd.Series(lens)
nq = int(np.clip(len_series.nunique(), 4, 10))
len_bins = pd.qcut(len_series, q=nq, duplicates='drop', labels=False)
len_bins = len_bins.astype('float64').fillna(len_bins.median()).astype(int).values
strat = y * 100 + len_bins  # joint bins

n_splits = 5
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
folds = np.full(len(train), -1, dtype=int)
for fold, (_, val_idx) in enumerate(skf.split(train, strat)):
    folds[val_idx] = fold

assert (folds>=0).all(), 'Unassigned folds found'
df_folds = train[[id_col, target_col]].copy()
df_folds['fold'] = folds
df_folds.to_csv('folds.csv', index=False)

# Print per-fold stats
print('Folds saved to folds.csv')
for f in range(n_splits):
    idx = folds==f
    print(f'Fold {f}: n={idx.sum()}, score dist=', dict(pd.Series(y[idx]).value_counts().sort_index()))

print(f'Done in {time.time()-t0:.2f}s', flush=True)

Folds saved to folds.csv
Fold 0: n=3116, score dist= {1: 225, 2: 852, 3: 1125, 4: 713, 5: 174, 6: 27}
Fold 1: n=3115, score dist= {1: 223, 2: 851, 3: 1126, 4: 713, 5: 175, 6: 27}
Fold 2: n=3115, score dist= {1: 226, 2: 851, 3: 1124, 4: 712, 5: 175, 6: 27}
Fold 3: n=3115, score dist= {1: 225, 2: 847, 3: 1127, 4: 713, 5: 176, 6: 27}
Fold 4: n=3115, score dist= {1: 225, 2: 848, 3: 1127, 4: 712, 5: 176, 6: 27}
Done in 0.22s
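Joint score × length strata can get small at the extremes (score 6 has only 135 essays, 27 per fold, spread over up to 10 length bins here), and `StratifiedKFold` warns when the least populated class has fewer members than `n_splits`. One possible fallback, sketched below with a hypothetical helper name, relabels any joint stratum smaller than `n_splits` to a per-score catch-all label `score * 100 + 99` (99 is never a real length bin here); a catch-all that is itself still small would need further merging.

```python
import numpy as np


def strat_labels_with_fallback(score, len_bin, n_splits=5):
    """Joint score x length-bin labels; joint strata with fewer than n_splits rows
    are relabeled to a per-score catch-all (score * 100 + 99)."""
    score = np.asarray(score, dtype=int)
    joint = score * 100 + np.asarray(len_bin, dtype=int)
    # Size of the joint stratum each row belongs to
    _, inverse, counts = np.unique(joint, return_inverse=True, return_counts=True)
    rare = counts[inverse] < n_splits
    # 99 is unused as a length bin because at most 10 bins (0..9) are created
    return np.where(rare, score * 100 + 99, joint)
```

In the cell above this would replace `strat = y * 100 + len_bins` with `strat = strat_labels_with_fallback(y, len_bins, n_splits=n_splits)`.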


In [7]:
# Baseline v1: TF-IDF (word+char_wb) + Ridge with 5-fold CV, OOF QWK, global thresholds, and submission
import time, numpy as np, pandas as pd, sys
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score

SEED = 42
np.random.seed(SEED)

def qwk(y_true_int, y_pred_int):
    return cohen_kappa_score(y_true_int, y_pred_int, weights='quadratic')

def apply_thresholds(pred, th):
    # thresholds between classes 1..6; th length 5
    bins = [-np.inf] + list(th) + [np.inf]
    return np.digitize(pred, bins)  # returns 1..6

def optimize_thresholds(y_true, preds, iters=3, step=0.05):
    th = np.array([1.5, 2.5, 3.5, 4.5, 5.5], dtype=float)
    best = qwk(y_true, apply_thresholds(preds, th))
    for _ in range(iters):
        for i in range(len(th)):
            lo = th[i] - 0.5
            hi = th[i] + 0.5
            # ensure monotonicity with neighbors
            if i>0: lo = max(lo, th[i-1] + 0.01)
            if i<len(th)-1: hi = min(hi, th[i+1] - 0.01)
            grid = np.arange(lo, hi + 1e-9, step)
            local_best = best; local_val = th[i]
            for g in grid:
                th_try = th.copy(); th_try[i] = g
                score = qwk(y_true, apply_thresholds(preds, th_try))
                if score > local_best:
                    local_best = score; local_val = g
            th[i] = local_val; best = local_best
    return th, best

t0 = time.time()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
folds = pd.read_csv('folds.csv')

id_col, text_col, target_col = 'essay_id', 'full_text', 'score'  
assert {id_col, text_col, target_col}.issubset(train.columns)

y = train[target_col].astype(int).values
X_text = train[text_col].astype(str).values
X_test_text = test[text_col].astype(str).values

n_splits = int(folds['fold'].max()) + 1
oof = np.zeros(len(train), dtype=np.float32)
test_pred_folds = np.zeros((len(test), n_splits), dtype=np.float32)

# Vectorizer configs
word_vec_kwargs = dict(lowercase=True, analyzer='word', ngram_range=(1,2), min_df=2, max_features=150_000, sublinear_tf=True, dtype=np.float32)
char_vec_kwargs = dict(lowercase=True, analyzer='char_wb', ngram_range=(3,5), min_df=2, max_features=200_000, sublinear_tf=True, dtype=np.float32)

for f in range(n_splits):
    f_t0 = time.time()
    tr_idx = folds.index[folds['fold']!=f].to_numpy()
    va_idx = folds.index[folds['fold']==f].to_numpy()
    print(f'Fold {f} start: tr={len(tr_idx)} va={len(va_idx)}', flush=True)

    Xtr = X_text[tr_idx]; Xva = X_text[va_idx]
    ytr = y[tr_idx]

    # Fit vectorizers on training fold only
    wv = TfidfVectorizer(**word_vec_kwargs)
    cv = TfidfVectorizer(**char_vec_kwargs)
    Xtr_w = wv.fit_transform(Xtr)
    Xtr_c = cv.fit_transform(Xtr)
    Xtr_all = hstack([Xtr_w, Xtr_c], format='csr')
    del Xtr_w, Xtr_c

    Xva_w = wv.transform(Xva)
    Xva_c = cv.transform(Xva)
    Xva_all = hstack([Xva_w, Xva_c], format='csr')
    del Xva_w, Xva_c

    Xte_w = wv.transform(X_test_text)
    Xte_c = cv.transform(X_test_text)
    Xte_all = hstack([Xte_w, Xte_c], format='csr')
    del Xte_w, Xte_c

    # Model
    model = Ridge(alpha=4.0, random_state=SEED)
    model.fit(Xtr_all, ytr)
    oof_pred = model.predict(Xva_all).astype(np.float32)
    test_pred = model.predict(Xte_all).astype(np.float32)
    oof[va_idx] = oof_pred
    test_pred_folds[:, f] = test_pred

    # Cleanup to free memory
    del Xtr_all, Xva_all, Xte_all, model
    print(f'Fold {f} done in {time.time()-f_t0:.1f}s', flush=True)

# Evaluate OOF and optimize thresholds
base_th = np.array([1.5,2.5,3.5,4.5,5.5])
oof_int_base = apply_thresholds(oof, base_th)
oof_qwk_base = qwk(y, oof_int_base)
opt_th, oof_qwk_opt = optimize_thresholds(y, oof, iters=3, step=0.05)
print(f'OOF QWK base={oof_qwk_base:.5f} opt={oof_qwk_opt:.5f} thresholds={opt_th}')

# Finalize test predictions
test_pred = test_pred_folds.mean(axis=1)
test_pred_int = apply_thresholds(test_pred, opt_th)
test_pred_int = np.clip(test_pred_int, 1, 6).astype(int)

# Save artifacts
pd.DataFrame({'essay_id': train[id_col], 'oof_pred': oof, 'oof_int': apply_thresholds(oof, opt_th), 'y': y}).to_csv('oof_baseline.csv', index=False)
pd.DataFrame({'essay_id': test[id_col], 'score': test_pred_int}).to_csv('submission_ridge.csv', index=False)
np.save('test_ridge.npy', test_pred.astype(np.float32))
print('Saved oof_baseline.csv, submission_ridge.csv, and test_ridge.npy')
print(f'Total time: {time.time()-t0:.1f}s', flush=True)

Fold 0 start: tr=12460 va=3116
Fold 0 done in 26.9s
Fold 1 start: tr=12461 va=3115
Fold 1 done in 27.2s
Fold 2 start: tr=12461 va=3115
Fold 2 done in 27.1s
Fold 3 start: tr=12461 va=3115
Fold 3 done in 27.1s
Fold 4 start: tr=12461 va=3115
Fold 4 done in 27.0s
OOF QWK base=0.74059 opt=0.78642 thresholds=[1.9  2.66 3.35 4.1  4.7 ]
Saved oof_baseline.csv, submission_ridge.csv, and test_ridge.npy
Total time: 137.3s
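The `opt` QWK above is measured on the same OOF predictions the thresholds were tuned on, so it is somewhat optimistic. A cross-fitted estimate tunes thresholds on the other folds' OOF predictions and applies them to each held-out fold. The sketch below re-implements QWK and the coordinate search from the cell above in plain NumPy so it is self-contained; `cross_fitted_qwk` is a hypothetical helper, and `fold` would be `folds['fold'].values`.

```python
import numpy as np


def qwk(y_true, y_pred, n_classes=6):
    """Quadratic weighted kappa for integer labels 1..n_classes."""
    O = np.zeros((n_classes, n_classes))
    np.add.at(O, (np.asarray(y_true) - 1, np.asarray(y_pred) - 1), 1.0)
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    i = np.arange(n_classes)
    W = (i[:, None] - i[None, :]) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()


def apply_thresholds(pred, th):
    # Same mapping as the notebook: 5 cut points -> integer labels 1..6
    return np.digitize(pred, np.r_[-np.inf, th, np.inf])


def fit_thresholds(y, pred, iters=3, step=0.05):
    """Coordinate search over cut points, mirroring optimize_thresholds above."""
    th = np.array([1.5, 2.5, 3.5, 4.5, 5.5])
    best = qwk(y, apply_thresholds(pred, th))
    for _ in range(iters):
        for i in range(len(th)):
            lo = th[i] - 0.5 if i == 0 else max(th[i] - 0.5, th[i - 1] + 0.01)
            hi = th[i] + 0.5 if i == len(th) - 1 else min(th[i] + 0.5, th[i + 1] - 0.01)
            for g in np.arange(lo, hi + 1e-9, step):
                trial = th.copy()
                trial[i] = g
                score = qwk(y, apply_thresholds(pred, trial))
                if score > best:
                    best, th = score, trial
    return th, best


def cross_fitted_qwk(y, oof, fold):
    """Tune thresholds on the other folds, apply them to each held-out fold, score the result."""
    y, oof, fold = np.asarray(y), np.asarray(oof, dtype=float), np.asarray(fold)
    labels = np.zeros(len(y), dtype=int)
    for f in np.unique(fold):
        held_out = fold == f
        th, _ = fit_thresholds(y[~held_out], oof[~held_out])
        labels[held_out] = apply_thresholds(oof[held_out], th)
    return qwk(y, labels), labels
```

The gap between this estimate and the in-sample `opt` value gives a rough sense of how much of the threshold gain may not transfer to the test set.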


In [5]:
# Classical v2: Numeric FE + TF-IDF SVD(384) + CatBoost (GPU) per-fold; cache OOF/test preds
import time, os, sys, numpy as np, pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import cohen_kappa_score
from scipy.sparse import hstack
import re
import subprocess

def ensure_pkg(pkg):
    try:
        __import__(pkg)
    except ImportError:
        print(f'Installing {pkg}...', flush=True)
        subprocess.run([sys.executable, '-m', 'pip', 'install', pkg], check=True)

ensure_pkg('catboost')
from catboost import CatBoostRegressor, Pool

SEED=42
np.random.seed(SEED)

def qwk_int(y_true, y_pred_int):
    return cohen_kappa_score(y_true, y_pred_int, weights='quadratic')

def text_stats(s: str):
    s = '' if pd.isna(s) else str(s)
    n_chars = len(s)
    n_newlines = s.count('\n')
    # simple sentence split on .!?
    sents = re.split(r'[.!?]+', s)
    sents = [t for t in sents if t.strip()]
    n_sents = max(1, len(sents))
    words = re.findall(r"\b\w+\b", s)
    n_words = len(words)
    avg_word_len = (sum(len(w) for w in words) / n_words) if n_words>0 else 0.0
    sent_lens = [len(re.findall(r"\b\w+\b", t)) for t in sents]
    avg_sent_len_w = (sum(sent_lens) / n_sents) if n_sents>0 else 0.0
    std_sent_len_w = (np.std(sent_lens) if n_sents>1 else 0.0)
    uniq = len(set(w.lower() for w in words)) if n_words>0 else 0
    ttr = (uniq / n_words) if n_words>0 else 0.0
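    # Hapax legomena (case-sensitive): one frequency pass instead of an O(n^2) words.count loop
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    hapax = sum(1 for c in freq.values() if c == 1)
    hapax_ratio = (hapax / n_words) if n_words>0 else 0.0
    long_words = sum(1 for w in words if len(w)>=7)
    pct_long = (100.0 * long_words / n_words) if n_words>0 else 0.0
    # Stdlib `re` does not support \p{Punct}, so use an explicit ASCII punctuation class
    punct = re.findall(r"[\.,;:!\?\-\(\)\'\"\[\]]", s)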
    punct_cnt = len(punct)
    punct_pct = (100.0 * punct_cnt / max(1, n_chars))
    commas = s.count(','); periods = s.count('.')
    commas_per_100w = (100.0 * commas / max(1, n_words))
    periods_per_100w = (100.0 * periods / max(1, n_words))
    uppercase_pct = (100.0 * sum(1 for ch in s if ch.isupper()) / max(1, n_chars))
    digits_per_100w = (100.0 * sum(1 for ch in s if ch.isdigit()) / max(1, n_words))
    # FKGL approximation
    syllables = 0
    for w in words:
        syl = max(1, len(re.findall(r'[aeiouyAEIOUY]+', w)))
        syllables += syl
    fkgl = 0.39 * (n_words / max(1, n_sents)) + 11.8 * (syllables / max(1, n_words)) - 15.59 if n_words>0 else 0.0
    return [n_chars, n_words, n_sents, n_newlines, avg_word_len, avg_sent_len_w, std_sent_len_w,
            ttr, hapax_ratio, pct_long, punct_pct, commas_per_100w, periods_per_100w,
            uppercase_pct, digits_per_100w, fkgl]

num_cols = [
    'n_chars','n_words','n_sents','n_newlines','avg_word_len','avg_sent_len_w','std_sent_len_w',
    'ttr','hapax_ratio','pct_long','punct_pct','commas_per_100w','periods_per_100w',
    'uppercase_pct','digits_per_100w','fkgl'
]

def build_numeric(df, text_col):
    feats = np.vstack([text_stats(t) for t in df[text_col].astype(str).values])
    return pd.DataFrame(feats, columns=num_cols, index=df.index)

t0=time.time()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
folds = pd.read_csv('folds.csv')
id_col, text_col, target_col = 'essay_id','full_text','score'
y = train[target_col].astype(int).values

print('Computing numeric features...', flush=True)
num_tr = build_numeric(train, text_col)
num_te = build_numeric(test, text_col)

# TF-IDF for SVD
word_vec_kwargs = dict(lowercase=True, analyzer='word', ngram_range=(1,2), min_df=2, max_features=150_000, sublinear_tf=True, dtype=np.float32)
char_vec_kwargs = dict(lowercase=True, analyzer='char_wb', ngram_range=(3,5), min_df=2, max_features=200_000, sublinear_tf=True, dtype=np.float32)

n_splits = int(folds['fold'].max()) + 1
oof = np.zeros(len(train), dtype=np.float32)
test_pred_f = np.zeros((len(test), n_splits), dtype=np.float32)

for f in range(n_splits):
    f_t=time.time()
    tr_idx = folds.index[folds['fold']!=f].to_numpy()
    va_idx = folds.index[folds['fold']==f].to_numpy()
    print(f'[CatBoost] Fold {f} start: tr={len(tr_idx)} va={len(va_idx)}', flush=True)

    # Text vectorizers fit on train fold
    wv = TfidfVectorizer(**word_vec_kwargs)
    cv = TfidfVectorizer(**char_vec_kwargs)
    Xtr_w = wv.fit_transform(train.loc[tr_idx, text_col].astype(str).values)
    Xtr_c = cv.fit_transform(train.loc[tr_idx, text_col].astype(str).values)
    Xtr_tfidf = hstack([Xtr_w, Xtr_c], format='csr')
    Xva_tfidf = hstack([wv.transform(train.loc[va_idx, text_col].astype(str).values),
                        cv.transform(train.loc[va_idx, text_col].astype(str).values)], format='csr')
    Xte_tfidf = hstack([wv.transform(test[text_col].astype(str).values),
                        cv.transform(test[text_col].astype(str).values)], format='csr')
    del Xtr_w, Xtr_c

    # SVD fit on train fold only
    svd = TruncatedSVD(n_components=384, random_state=SEED)
    Xtr_svd = svd.fit_transform(Xtr_tfidf)
    Xva_svd = svd.transform(Xva_tfidf)
    Xte_svd = svd.transform(Xte_tfidf)

    # Standardize SVD + numeric features with a single scaler fit on the training fold only
    # (tree models do not need scaling, but it is harmless and keeps the features comparable)
    scaler2 = StandardScaler(with_mean=True, with_std=True)
    Xtr_concat = np.hstack([Xtr_svd, num_tr.loc[tr_idx, :].values])
    Xtr_dense = scaler2.fit_transform(Xtr_concat)
    Xva_dense = scaler2.transform(np.hstack([Xva_svd, num_tr.loc[va_idx, :].values]))
    Xte_dense = scaler2.transform(np.hstack([Xte_svd, num_te.values]))

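    # CatBoost (GPU) with early stopping. Note: eval_set is the OOF fold and use_best_model defaults to True
    # when an eval set is given, so the best iteration is chosen on these rows and OOF scores are mildly optimistic.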
    params = dict(loss_function='RMSE', depth=6, learning_rate=0.05, l2_leaf_reg=4.0,
                  random_seed=SEED, task_type='GPU', devices='0',
                  iterations=2000, od_type='Iter', od_wait=100, verbose=False)
    model = CatBoostRegressor(**params)
    model.fit(Xtr_dense, y[tr_idx], eval_set=(Xva_dense, y[va_idx]))
    oof[va_idx] = model.predict(Xva_dense).astype(np.float32)
    test_pred_f[:, f] = model.predict(Xte_dense).astype(np.float32)

    # cleanup
    del Xtr_tfidf, Xva_tfidf, Xte_tfidf, Xtr_svd, Xva_svd, Xte_svd, Xtr_dense, Xva_dense, Xte_dense, model, svd, scaler2
    print(f'[CatBoost] Fold {f} done in {time.time()-f_t:.1f}s', flush=True)

# Save OOF and test preds
pd.DataFrame({'essay_id': train[id_col], 'oof_cat': oof, 'y': y}).to_csv('oof_cat.csv', index=False)
np.save('test_cat.npy', test_pred_f.mean(axis=1))
print('Saved oof_cat.csv and test_cat.npy; re-opt thresholds and blend later.', flush=True)
print(f'Total time: {time.time()-t0:.1f}s', flush=True)

Computing numeric features...
[CatBoost] Fold 0 start: tr=12460 va=3116
[CatBoost] Fold 0 done in 114.7s
[CatBoost] Fold 1 start: tr=12461 va=3115
[CatBoost] Fold 1 done in 104.2s
[CatBoost] Fold 2 start: tr=12461 va=3115
[CatBoost] Fold 2 done in 114.4s
[CatBoost] Fold 3 start: tr=12461 va=3115
[CatBoost] Fold 3 done in 113.0s
[CatBoost] Fold 4 start: tr=12461 va=3115
[CatBoost] Fold 4 done in 116.6s
Saved oof_cat.csv and test_cat.npy; re-opt thresholds and blend later.
Total time: 581.5s
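In the CatBoost cell, `eval_set` is the validation fold itself, and CatBoost's `use_best_model` defaults to true when an eval set is given, so the iteration count is selected on the rows that become OOF predictions. One way to keep the OOF fold untouched, sketched below with a hypothetical helper name, is to carve a small score-stratified early-stopping subset out of the training fold; the 10% fraction is an arbitrary assumption.

```python
import numpy as np


def inner_es_mask(y_train_fold, frac=0.1, seed=42):
    """Boolean mask over training-fold rows marking a score-stratified early-stopping subset."""
    rng = np.random.default_rng(seed)
    y_train_fold = np.asarray(y_train_fold)
    mask = np.zeros(len(y_train_fold), dtype=bool)
    for cls in np.unique(y_train_fold):
        rows = np.flatnonzero(y_train_fold == cls)
        k = int(round(frac * len(rows)))
        mask[rng.choice(rows, size=k, replace=False)] = True
    return mask


# Hypothetical use inside the CatBoost fold loop (names from that cell):
# es = inner_es_mask(y[tr_idx], frac=0.1, seed=SEED)
# model.fit(Xtr_dense[~es], y[tr_idx][~es], eval_set=(Xtr_dense[es], y[tr_idx][es]))
# oof[va_idx] = model.predict(Xva_dense)  # the OOF fold no longer influences stopping
```

This trades about 10% of training rows per fold for an OOF score that reflects genuinely unseen data.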


In [6]:
# Postprocessing: Optimize thresholds on CatBoost OOF and create classical submission
import os, time, numpy as np, pandas as pd
from sklearn.metrics import cohen_kappa_score

def qwk(y_true_int, y_pred_int):
    return cohen_kappa_score(y_true_int, y_pred_int, weights='quadratic')

def apply_thresholds(pred, th):
    bins = [-np.inf] + list(th) + [np.inf]
    return np.digitize(pred, bins)

def optimize_thresholds(y_true, preds, iters=3, step=0.05):
    th = np.array([1.5, 2.5, 3.5, 4.5, 5.5], dtype=float)
    best = qwk(y_true, apply_thresholds(preds, th))
    for _ in range(iters):
        for i in range(len(th)):
            lo = th[i] - 0.5; hi = th[i] + 0.5
            if i>0: lo = max(lo, th[i-1] + 0.01)
            if i<len(th)-1: hi = min(hi, th[i+1] - 0.01)
            grid = np.arange(lo, hi + 1e-9, step)
            local_best, local_val = best, th[i]
            for g in grid:
                th_try = th.copy(); th_try[i] = g
                score = qwk(y_true, apply_thresholds(preds, th_try))
                if score > local_best:
                    local_best, local_val = score, g
            th[i] = local_val; best = local_best
    return th, best

t0=time.time()
assert os.path.exists('oof_cat.csv') and os.path.exists('test_cat.npy'), 'CatBoost OOF/test not found yet.'
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
oof_cat = pd.read_csv('oof_cat.csv')
test_cat = np.load('test_cat.npy')

y = oof_cat['y'].astype(int).values
pred_oof = oof_cat['oof_cat'].values.astype(np.float32)
base_th = np.array([1.5,2.5,3.5,4.5,5.5])
oof_qwk_base = qwk(y, apply_thresholds(pred_oof, base_th))
opt_th, oof_qwk_opt = optimize_thresholds(y, pred_oof, iters=3, step=0.05)
print(f'[CatBoost] OOF QWK base={oof_qwk_base:.5f} opt={oof_qwk_opt:.5f} thresholds={opt_th}', flush=True)

test_int = apply_thresholds(test_cat, opt_th)
test_int = np.clip(test_int, 1, 6).astype(int)
sub_cat = pd.DataFrame({'essay_id': test['essay_id'], 'score': test_int})
sub_cat.to_csv('submission_classical.csv', index=False)
print('Saved submission_classical.csv', flush=True)
print(f'Done in {time.time()-t0:.1f}s', flush=True)

[CatBoost] OOF QWK base=0.79489 opt=0.81219 thresholds=[1.75 2.6  3.45 4.3  4.95]


Saved submission_classical.csv


Done in 1.9s


In [10]:
# Blend classical (CatBoost) and DeBERTa (prefer SW64 if available); optimize thresholds; optional isotonic calibration; write final submission
import os, numpy as np, pandas as pd, time
from sklearn.metrics import cohen_kappa_score
from sklearn.isotonic import IsotonicRegression

def qwk(y_true_int, y_pred_int):
    return cohen_kappa_score(y_true_int, y_pred_int, weights='quadratic')

def apply_thresholds(pred, th):
    bins = [-np.inf] + list(th) + [np.inf]
    return np.digitize(pred, bins)

def optimize_thresholds(y_true, preds, iters=3, step=0.05):
    th = np.array([1.5, 2.5, 3.5, 4.5, 5.5], dtype=float)
    best = qwk(y_true, apply_thresholds(preds, th))
    for _ in range(iters):
        for i in range(len(th)):
            lo = th[i] - 0.5; hi = th[i] + 0.5
            if i>0: lo = max(lo, th[i-1] + 0.01)
            if i<len(th)-1: hi = min(hi, th[i+1] - 0.01)
            grid = np.arange(lo, hi + 1e-9, step)
            local_best, local_val = best, th[i]
            for g in grid:
                th_try = th.copy(); th_try[i] = g
                score = qwk(y_true, apply_thresholds(preds, th_try))
                if score > local_best:
                    local_best, local_val = score, g
            th[i] = local_val; best = local_best
    return th, best

t0=time.time()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Load CatBoost
oof_cat = pd.read_csv('oof_cat.csv')
test_cat = np.load('test_cat.npy').astype(np.float32)
y = oof_cat['y'].astype(int).values
pred_cat = oof_cat['oof_cat'].values.astype(np.float32)

# Load DeBERTa: prefer SW64 artifacts, else base
deb_oof_path, deb_test_path = None, None
if os.path.exists('oof_deberta_base_sw64.csv') and os.path.exists('test_deberta_base_sw64.npy'):
    deb_oof_path, deb_test_path = 'oof_deberta_base_sw64.csv', 'test_deberta_base_sw64.npy'
elif os.path.exists('oof_deberta_base.csv') and os.path.exists('test_deberta_base.npy'):
    deb_oof_path, deb_test_path = 'oof_deberta_base.csv', 'test_deberta_base.npy'

has_deb = deb_oof_path is not None
if not has_deb:
    print('DeBERTa artifacts not found; using CatBoost only.', flush=True)
    opt_th, oof_qwk_opt = optimize_thresholds(y, pred_cat, iters=3, step=0.05)
    test_int = apply_thresholds(np.clip(test_cat,1,6), opt_th)
    test_int = np.clip(test_int, 1, 6).astype(int)
    pd.DataFrame({'essay_id': test['essay_id'], 'score': test_int}).to_csv('submission_blend.csv', index=False)
    print(f'Classical-only OOF QWK={oof_qwk_opt:.5f}; wrote submission_blend.csv in {time.time()-t0:.1f}s')
else:
    oof_deb = pd.read_csv(deb_oof_path)
    pred_deb = oof_deb['oof_deberta'].values.astype(np.float32)
    test_deb = np.load(deb_test_path).astype(np.float32)
    # Grid-search blend weight on OOF
    best = (-1.0, None, None, None, None)  # (qwk, w, th, use_iso(bool), info_str)
    for w in np.linspace(0.4, 0.7, 13):
        blend_oof = np.clip(w*pred_deb + (1.0-w)*pred_cat, 1, 6)
        # Uncalibrated thresholds
        th_u, q_u = optimize_thresholds(y, blend_oof, iters=3, step=0.05)
        info_u = f'w={w:.2f} unc'  # tag
        if q_u > best[0]:
            best = (q_u, w, th_u, False, info_u)
        # Isotonic calibration then thresholds (fit and scored on the same OOF rows, so this QWK is optimistic)
        iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
        iso.fit(blend_oof, y)
        oof_cal = np.clip(iso.predict(blend_oof), 1, 6).astype(np.float32)
        th_i, q_i = optimize_thresholds(y, oof_cal, iters=3, step=0.05)
        info_i = f'w={w:.2f} iso'
        if q_i > best[0]:
            best = (q_i, w, th_i, True, info_i)
    best_qwk, best_w, best_th, use_iso, tag = best
    print(f'Blend search: best OOF QWK={best_qwk:.5f} at {tag}, th={np.round(best_th,3)}', flush=True)
    # Apply to test
    blend_test = np.clip(best_w*test_deb + (1.0-best_w)*test_cat, 1, 6)
    if use_iso:
        iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
        iso.fit(np.clip(best_w*pred_deb + (1.0-best_w)*pred_cat,1,6), y)
        blend_test = np.clip(iso.predict(blend_test), 1, 6).astype(np.float32)
    test_int = apply_thresholds(blend_test, best_th)
    test_int = np.clip(test_int, 1, 6).astype(int)
    pd.DataFrame({'essay_id': test['essay_id'], 'score': test_int}).to_csv('submission_blend.csv', index=False)
    print(f'Wrote submission_blend.csv in {time.time()-t0:.1f}s', flush=True)

Blend search: best OOF QWK=0.82764 at w=0.43 iso, th=[1.75 2.55 3.4  4.25 5.05]


Wrote submission_blend.csv in 43.4s
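The isotonic map in the blend search above is fit and scored on the same OOF rows, which flatters the reported QWK. A cross-fitted variant fits the map on the other folds and applies it to each held-out fold. The sketch below uses a small pool-adjacent-violators (PAV) implementation in NumPy instead of `sklearn.isotonic.IsotonicRegression` so it is self-contained; tied inputs are averaged first, predictions use linear interpolation, and values outside the fitted range are clamped to the end values, which should behave similarly to `out_of_bounds='clip'`. Function names are hypothetical.

```python
import numpy as np


def isotonic_fit(x, y):
    """Non-decreasing least-squares fit of y on x via pool-adjacent-violators (PAV).

    Returns (unique x values, fitted value at each)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Collapse tied x values to their mean y, weighted by multiplicity
    ux, inv = np.unique(x, return_inverse=True)
    w = np.bincount(inv).astype(float)
    m = np.bincount(inv, weights=y) / w
    vals, wts, sizes = [], [], []
    for v, wt in zip(m, w):
        vals.append(v)
        wts.append(wt)
        sizes.append(1)
        # Pool adjacent blocks while they violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            total = wts[-2] + wts[-1]
            merged = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / total
            size = sizes[-2] + sizes[-1]
            del vals[-1], wts[-1], sizes[-1]
            vals[-1], wts[-1], sizes[-1] = merged, total, size
    return ux, np.repeat(vals, sizes)


def isotonic_predict(model, x_new):
    """Linear interpolation between fitted points; clamps outside the fitted range."""
    ux, fitted = model
    return np.interp(np.asarray(x_new, dtype=float), ux, fitted)


def cross_fitted_isotonic(pred, y, fold):
    """Calibrate each fold with an isotonic map fit on the remaining folds only."""
    pred, y, fold = np.asarray(pred, float), np.asarray(y, float), np.asarray(fold)
    out = np.empty(len(pred))
    for f in np.unique(fold):
        held_out = fold == f
        model = isotonic_fit(pred[~held_out], y[~held_out])
        out[held_out] = isotonic_predict(model, pred[held_out])
    return out
```

For example, `oof_cal = cross_fitted_isotonic(blend_oof, y, folds['fold'].values)` before the threshold search would give a less optimistic comparison between the calibrated and uncalibrated variants.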


In [11]:
# Seed-bagging DeBERTa + CatBoost blend with isotonic and global thresholds; robust to partial availability
import os, numpy as np, pandas as pd, time
from sklearn.metrics import cohen_kappa_score
from sklearn.isotonic import IsotonicRegression

def qwk(y_true_int, y_pred_int):
    return cohen_kappa_score(y_true_int, y_pred_int, weights='quadratic')

def apply_thresholds(pred, th):
    bins = [-np.inf] + list(th) + [np.inf]
    return np.digitize(pred, bins)

def optimize_thresholds(y_true, preds, iters=3, step=0.05):
    th = np.array([1.5, 2.5, 3.5, 4.5, 5.5], dtype=float)
    best = qwk(y_true, apply_thresholds(preds, th))
    for _ in range(iters):
        for i in range(len(th)):
            lo = th[i] - 0.5; hi = th[i] + 0.5
            if i>0: lo = max(lo, th[i-1] + 0.01)
            if i<len(th)-1: hi = min(hi, th[i+1] - 0.01)
            grid = np.arange(lo, hi + 1e-9, step)
            local_best, local_val = best, th[i]
            for g in grid:
                th_try = th.copy(); th_try[i] = g
                score = qwk(y_true, apply_thresholds(preds, th_try))
                if score > local_best:
                    local_best, local_val = score, g
            th[i] = local_val; best = local_best
    return th, best

def load_seed_views(seed_tag, base_prefix):
    # Returns (oof, test, tag) for the best-available DeBERTa predictions of one seed, or (None, None, None).
    # Preference: combined TTA -> weighted combine of available views (sw64/sw128/ht) -> legacy base sw64 (seed 42).
    oof, test = None, None
    # 1) Combined TTA artifacts (with base_prefix='sw64' this already matches the legacy seed-42 files)
    oof_tta = f'oof_deberta_base_{base_prefix}.csv'
    test_tta = f'test_deberta_base_{base_prefix}.npy'
    if os.path.exists(oof_tta) and os.path.exists(test_tta):
        df = pd.read_csv(oof_tta); oof = df['oof_deberta'].values.astype(np.float32); test = np.load(test_tta).astype(np.float32)
        return oof, test, f'{seed_tag}:combined'
    # 2) Weighted combine of whichever views exist (0.4/0.4/0.2, renormalized); this also covers the sw64-only case
    parts = {}
    for view in ['sw64','sw128','ht']:
        oof_p = f'oof_deberta_base_{base_prefix}_{view}.csv'
        test_p = f'test_deberta_base_{base_prefix}_{view}.npy'
        if os.path.exists(oof_p) and os.path.exists(test_p):
            dfv = pd.read_csv(oof_p); parts[view] = (dfv['oof_deberta'].values.astype(np.float32), np.load(test_p).astype(np.float32))
    if parts:
        oof = np.zeros_like(next(iter(parts.values()))[0], dtype=np.float32)
        test = np.zeros_like(next(iter(parts.values()))[1], dtype=np.float32)
        wsum = 0.0
        if 'sw64' in parts:
            oof += 0.4*parts['sw64'][0]; test += 0.4*parts['sw64'][1]; wsum += 0.4
        if 'sw128' in parts:
            oof += 0.4*parts['sw128'][0]; test += 0.4*parts['sw128'][1]; wsum += 0.4
        if 'ht' in parts:
            oof += 0.2*parts['ht'][0]; test += 0.2*parts['ht'][1]; wsum += 0.2
        # wsum > 0 here because every view has a positive weight
        oof /= wsum
        test /= wsum
        return oof, test, f'{seed_tag}:views'
    # 3) Legacy base sw64 (seed 42)
    if base_prefix == 'sw64':
        if os.path.exists('oof_deberta_base_sw64.csv') and os.path.exists('test_deberta_base_sw64.npy'):
            dfv = pd.read_csv('oof_deberta_base_sw64.csv'); return dfv['oof_deberta'].values.astype(np.float32), np.load('test_deberta_base_sw64.npy').astype(np.float32), f'{seed_tag}:legacy_sw64'
    return None, None, None

t0 = time.time()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Load CatBoost
oof_cat = pd.read_csv('oof_cat.csv')
pred_cat_oof = oof_cat['oof_cat'].values.astype(np.float32)
y = oof_cat['y'].astype(int).values
test_cat = np.load('test_cat.npy').astype(np.float32)

# Load DeBERTa seeds
deb_oofs = []
deb_tests = []
tags = []

# Seed 42 legacy (base sw64 artifacts)
o42_oof, o42_test, tag42 = load_seed_views('s042', 'sw64')
if o42_oof is not None:
    deb_oofs.append(o42_oof); deb_tests.append(o42_test); tags.append(tag42)

# Seed 777 (when available, prefer combined)
o777_oof, o777_test, tag777 = load_seed_views('s777', 's777')
if o777_oof is not None:
    deb_oofs.append(o777_oof); deb_tests.append(o777_test); tags.append(tag777)

# Seed 2025 (optional, in future)
o2025_oof, o2025_test, tag2025 = load_seed_views('s2025', 's2025')
if o2025_oof is not None:
    deb_oofs.append(o2025_oof); deb_tests.append(o2025_test); tags.append(tag2025)

assert len(deb_oofs) > 0, 'No DeBERTa seed artifacts found yet.'
print('Loaded DeB seeds:', tags, flush=True)

# Row-wise average across available seeds
deb_oof_bag = np.mean(np.stack(deb_oofs, axis=1), axis=1).astype(np.float32)
deb_test_bag = np.mean(np.stack(deb_tests, axis=1), axis=1).astype(np.float32)

# Blend DeB bag with CatBoost; isotonic after blend; optimize thresholds on isotonic outputs
best = (-1.0, None, None, None)  # (qwk, w_deb, best_th, use_iso_flag)
for w in np.arange(0.55, 0.801, 0.02):
    blend_oof = np.clip(w*deb_oof_bag + (1.0-w)*pred_cat_oof, 1, 6).astype(np.float32)
    # isotonic
    iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
    iso.fit(blend_oof, y)
    oof_cal = np.clip(iso.predict(blend_oof), 1, 6).astype(np.float32)
    th_i, q_i = optimize_thresholds(y, oof_cal, iters=3, step=0.05)
    if q_i > best[0]:
        best = (q_i, w, th_i, True)
    # also check uncalibrated as fallback
    th_u, q_u = optimize_thresholds(y, blend_oof, iters=3, step=0.05)
    if q_u > best[0]:
        best = (q_u, w, th_u, False)

best_q, best_w, best_th, use_iso = best
print(f'[SeedBag] Best OOF QWK={best_q:.5f} with w_deb={best_w:.2f}, iso={use_iso}, th={np.round(best_th,3)}', flush=True)

# Apply to test
blend_test = np.clip(best_w*deb_test_bag + (1.0-best_w)*test_cat, 1, 6).astype(np.float32)
if use_iso:
    # Fit iso on OOF blend for consistency
    blend_oof_final = np.clip(best_w*deb_oof_bag + (1.0-best_w)*pred_cat_oof, 1, 6).astype(np.float32)
    iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
    iso.fit(blend_oof_final, y)
    blend_test = np.clip(iso.predict(blend_test), 1, 6).astype(np.float32)
test_int = apply_thresholds(blend_test, best_th)
test_int = np.clip(test_int, 1, 6).astype(int)
pd.DataFrame({'essay_id': test['essay_id'], 'score': test_int}).to_csv('submission_bag.csv', index=False)
print(f'[SeedBag] Wrote submission_bag.csv in {time.time()-t0:.1f}s', flush=True)

Loaded DeB seeds: ['s042:combined', 's777:combined']


[SeedBag] Best OOF QWK=0.82975 with w_deb=0.57, iso=True, th=[1.75 2.55 3.35 4.25 5.05]


[SeedBag] Wrote submission_bag.csv in 43.7s
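The rounding step above turns blended floats into integer scores with `apply_thresholds`, which is a thin wrapper around `np.digitize`. The following worked check shows the boundary convention: a value exactly on a cut point goes to the upper label, and moving one cut point only relabels predictions near that boundary. The inputs are illustrative values, not competition data.

```python
import numpy as np

def apply_thresholds(pred, th):
    # Same construction as in the cells above: 5 cut points -> 7 bin edges -> labels 1..6
    bins = [-np.inf] + list(th) + [np.inf]
    # np.digitize (right=False) returns i such that bins[i-1] <= x < bins[i]
    return np.digitize(pred, bins)

th = [1.5, 2.5, 3.5, 4.5, 5.5]
preds = np.array([1.0, 1.5, 1.75, 2.6, 3.4, 4.3, 5.9])
labels = apply_thresholds(preds, th)
print(labels)  # [1 2 2 3 3 4 6]  (1.5 sits on the first cut point and maps to 2)

# Raising only the first cut point relabels just the predictions below it
th_shift = [1.8, 2.5, 3.5, 4.5, 5.5]
print(apply_thresholds(preds, th_shift))  # [1 1 1 3 3 4 6]
```

With finite inputs the labels are already in 1..6, so the later `np.clip(..., 1, 6)` acts only as a safety net.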


In [27]:
# CPU-only: Fixed partial-seed handling, TTA reweight with valid-fold mask, coverage-weighted bagging, per-seed iso, post-blend iso; write submission_bag_rew.csv
import os, time, numpy as np, pandas as pd
from sklearn.metrics import cohen_kappa_score
from sklearn.isotonic import IsotonicRegression

def qwk(y_true_int, y_pred_int):
    return cohen_kappa_score(y_true_int, y_pred_int, weights='quadratic')

def apply_thresholds(pred, th):
    bins = [-np.inf] + list(th) + [np.inf]
    return np.digitize(pred, bins)

def optimize_thresholds_constrained(y_true, preds, iters=3, coarse_step=0.025, fine_step=0.005, base=None, max_delta=0.25):
    base_th = np.array([1.5, 2.5, 3.5, 4.5, 5.5], dtype=float) if base is None else np.array(base, dtype=float)
    th = base_th.copy()
    best = qwk(y_true, apply_thresholds(preds, th))
    # coarse passes
    for _ in range(iters):
        for i in range(5):
            lo = base_th[i] - max_delta; hi = base_th[i] + max_delta
            if i>0: lo = max(lo, th[i-1] + 0.01)
            if i<4: hi = min(hi, th[i+1] - 0.01)
            for g in np.arange(lo, hi + 1e-9, coarse_step):
                th_try = th.copy(); th_try[i] = g
                score = qwk(y_true, apply_thresholds(preds, th_try))
                if score > best:
                    best, th[i] = score, g
    # fine pass around current th
    for i in range(5):
        lo = max(base_th[i] - max_delta, th[i] - coarse_step)
        hi = min(base_th[i] + max_delta, th[i] + coarse_step)
        if i>0: lo = max(lo, th[i-1] + 0.01)
        if i<4: hi = min(hi, th[i+1] - 0.01)
        for g in np.arange(lo, hi + 1e-9, fine_step):
            th_try = th.copy(); th_try[i] = g
            score = qwk(y_true, apply_thresholds(preds, th_try))
            if score > best:
                best, th[i] = score, g
    return th, best

def load_view(prefix):
    oof = pd.read_csv(f'oof_deberta_base_{prefix}.csv')['oof_deberta'].values.astype(np.float32)
    test = np.load(f'test_deberta_base_{prefix}.npy').astype(np.float32)
    return oof, test

def _find_oof_path(seed_prefix, view='sw64'):
    cands = [
        f'oof_deberta_base_{seed_prefix}_{view}.csv',
        f'oof_deberta_base_{seed_prefix}.csv',
    ]
    if seed_prefix in ('sw64','legacy','base'):
        cands.append('oof_deberta_base_sw64.csv')
    for p in cands:
        if os.path.exists(p):
            return p
    return None

def infer_trained_folds(seed_prefix, folds_df, n_splits, view='sw64'):
    p = _find_oof_path(seed_prefix, view=view)
    if p is None:
        return [False]*n_splits
    oof = pd.read_csv(p)['oof_deberta'].values.astype(np.float32)
    trained = []
    for f in range(n_splits):
        va = folds_df.index[folds_df['fold']==f].to_numpy()
        vals = oof[va]
        avail = np.isfinite(vals) & (vals != 0.0)
        trained.append(avail.mean() > 0.90)
    return trained

def add_seed(oof_arr, test_arr, seed_label, n_splits, folds_df, detect_label=None):
    trained = infer_trained_folds(detect_label or seed_label, folds_df, n_splits, view='sw64')
    folds_run = int(sum(trained))
    oof = oof_arr.astype(np.float32).copy()
    for f, ok in enumerate(trained):
        if not ok:
            va = folds_df.index[folds_df['fold']==f].to_numpy()
            oof[va] = np.nan
    w_cov = folds_run / float(n_splits)
    print(f'[add_seed] {seed_label} | folds_run={folds_run}/{n_splits} | w_cov={w_cov:.3f}', flush=True)
    return oof, test_arr.astype(np.float32), seed_label, folds_run, w_cov

def best_tta(o64, o128, oht, t64, t128, tht, y, valid_mask=None, allow_no_ht=False, prefer_large_grid=False):
    # Expert grid: bias to SW64, allow HT=0
    grid = [
        (0.70,0.30,0.00), (0.66,0.34,0.00), (0.60,0.40,0.00), (0.55,0.45,0.00),
        (0.60,0.30,0.10), (0.55,0.35,0.10), (0.55,0.30,0.15), (0.50,0.40,0.10),
    ]
    if not prefer_large_grid:
        # one extra legacy safe point; (0.55,0.30,0.15) is already in the grid above
        grid += [(0.50,0.35,0.15)]
    # allow_no_ht is a no-op: the HT=0 points above are searched for every seed
    best = (-1.0, None, None, None)
    base_bins = [-np.inf,1.5,2.5,3.5,4.5,5.5,np.inf]
    for w64,w128,wht in grid:
        oof_c = (w64*o64 + w128*o128 + wht*oht).astype(np.float32)
        vm = valid_mask if valid_mask is not None else np.isfinite(oof_c)
        q = qwk(y[vm], np.digitize(oof_c[vm], base_bins))
        if q > best[0]:
            test_c = (w64*t64 + w128*t128 + wht*tht).astype(np.float32)
            best = (q, (w64,w128,wht), oof_c, test_c)
    return best

def per_seed_iso(oof_seed, test_seed, y, valid_mask=None):
    vm = valid_mask if valid_mask is not None else np.isfinite(oof_seed)
    if vm.sum() < 10:
        return oof_seed, test_seed
    iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
    iso.fit(np.clip(oof_seed[vm],1,6), y[vm])
    oof_cal = np.clip(iso.predict(np.clip(oof_seed,1,6)), 1, 6).astype(np.float32)
    test_cal = np.clip(iso.predict(np.clip(test_seed,1,6)), 1, 6).astype(np.float32)
    return oof_cal, test_cal

t0 = time.time()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
folds = pd.read_csv('folds.csv')
n_splits = int(folds['fold'].max()) + 1

# Load CatBoost
oof_cat = pd.read_csv('oof_cat.csv')
y = oof_cat['y'].astype(int).values
pred_cat_oof = oof_cat['oof_cat'].values.astype(np.float32)
test_cat = np.load('test_cat.npy').astype(np.float32)

deb_oofs_seeds = []
deb_tests_seeds = []
seed_cov_weights = []
seed_names = []
idx_sL = None  # track sL position for optional alpha search

# Seed 42 (SW64 full) + per-seed iso
if os.path.exists('oof_deberta_base_sw64.csv') and os.path.exists('test_deberta_base_sw64.npy'):
    o42 = pd.read_csv('oof_deberta_base_sw64.csv')['oof_deberta'].values.astype(np.float32)
    t42 = np.load('test_deberta_base_sw64.npy').astype(np.float32)
    o42, t42 = per_seed_iso(o42, t42, y)
    o42, t42, name42, fr42, wcov42 = add_seed(o42, t42, 's042_sw64', n_splits, folds, detect_label='sw64')
    deb_oofs_seeds.append(o42); deb_tests_seeds.append(t42); seed_cov_weights.append(wcov42); seed_names.append(name42)

# Seed 777 (views with TTA search) + per-seed iso
if all(os.path.exists(p) for p in [
    'oof_deberta_base_s777_sw64.csv','oof_deberta_base_s777_sw128.csv','oof_deberta_base_s777_ht.csv',
    'test_deberta_base_s777_sw64.npy','test_deberta_base_s777_sw128.npy','test_deberta_base_s777_ht.npy']):
    o64,t64 = load_view('s777_sw64')
    o128,t128 = load_view('s777_sw128')
    oht,tht = load_view('s777_ht')
    q,w777,oof777,tst777 = best_tta(o64,o128,oht,t64,t128,tht,y, valid_mask=None, allow_no_ht=False)
    oof777, tst777 = per_seed_iso(oof777, tst777, y)
    o777, t777, name777, fr777, wcov777 = add_seed(oof777, tst777, f's777_{w777}', n_splits, folds, detect_label='s777')
    deb_oofs_seeds.append(o777); deb_tests_seeds.append(t777); seed_cov_weights.append(wcov777); seed_names.append(name777)
elif os.path.exists('oof_deberta_base_s777.csv') and os.path.exists('test_deberta_base_s777.npy'):
    o777 = pd.read_csv('oof_deberta_base_s777.csv')['oof_deberta'].values.astype(np.float32)
    t777 = np.load('test_deberta_base_s777.npy').astype(np.float32)
    o777, t777 = per_seed_iso(o777, t777, y)
    o777, t777, name777, fr777, wcov777 = add_seed(o777, t777, 's777_combined', n_splits, folds, detect_label='s777')
    deb_oofs_seeds.append(o777); deb_tests_seeds.append(t777); seed_cov_weights.append(wcov777); seed_names.append(name777)

# Seed 2025 (views with valid-mask TTA; allow no-HT) + per-seed iso under trained mask
if all(os.path.exists(p) for p in [
    'oof_deberta_base_s2025_sw64.csv','oof_deberta_base_s2025_sw128.csv','oof_deberta_base_s2025_ht.csv',
    'test_deberta_base_s2025_sw64.npy','test_deberta_base_s2025_sw128.npy','test_deberta_base_s2025_ht.npy']):
    o64,t64 = load_view('s2025_sw64')
    o128,t128 = load_view('s2025_sw128')
    oht,tht = load_view('s2025_ht')
    trained_mask = (o64 != 0.0) & np.isfinite(o64)
    q,w2025,oof2025,tst2025 = best_tta(o64,o128,oht,t64,t128,tht,y, valid_mask=trained_mask, allow_no_ht=True)
    oof2025, tst2025 = per_seed_iso(oof2025, tst2025, y, valid_mask=trained_mask)
    o2025, t2025, name2025, fr2025, wcov2025 = add_seed(oof2025, tst2025, f's2025_{w2025}', n_splits, folds, detect_label='s2025')
    deb_oofs_seeds.append(o2025); deb_tests_seeds.append(t2025); seed_cov_weights.append(wcov2025); seed_names.append(name2025)
elif os.path.exists('oof_deberta_base_s2025.csv') and os.path.exists('test_deberta_base_s2025.npy'):
    o2025 = pd.read_csv('oof_deberta_base_s2025.csv')['oof_deberta'].values.astype(np.float32)
    t2025 = np.load('test_deberta_base_s2025.npy').astype(np.float32)
    mask = None
    if os.path.exists('oof_deberta_base_s2025_sw64.csv'):
        m64 = pd.read_csv('oof_deberta_base_s2025_sw64.csv')['oof_deberta'].values.astype(np.float32)
        mask = (m64 != 0.0) & np.isfinite(m64)
    o2025, t2025 = per_seed_iso(o2025, t2025, y, valid_mask=mask)
    o2025, t2025, name2025, fr2025, wcov2025 = add_seed(o2025, t2025, 's2025_combined', n_splits, folds, detect_label='s2025')
    deb_oofs_seeds.append(o2025); deb_tests_seeds.append(t2025); seed_cov_weights.append(wcov2025); seed_names.append(name2025)

# DeBERTa-v3-Large partial seed 'sL' (views with valid-mask TTA; allow no-HT) + per-seed iso
if all(os.path.exists(p) for p in [
    'oof_deberta_base_sL_sw64.csv','oof_deberta_base_sL_sw128.csv','oof_deberta_base_sL_ht.csv',
    'test_deberta_base_sL_sw64.npy','test_deberta_base_sL_sw128.npy','test_deberta_base_sL_ht.npy']):
    o64L,t64L = load_view('sL_sw64')
    o128L,t128L = load_view('sL_sw128')
    ohtL,thtL = load_view('sL_ht')
    trained_mask_L = (o64L != 0.0) & np.isfinite(o64L)
    qL,wL,oofL,tstL = best_tta(o64L,o128L,ohtL,t64L,t128L,thtL,y, valid_mask=trained_mask_L, allow_no_ht=True, prefer_large_grid=True)
    oofL, tstL = per_seed_iso(oofL, tstL, y, valid_mask=trained_mask_L)
    oL, tL, nameL, frL, wcovL = add_seed(oofL, tstL, f'sL_{wL}', n_splits, folds, detect_label='sL')
    idx_sL = len(deb_oofs_seeds)
    deb_oofs_seeds.append(oL); deb_tests_seeds.append(tL); seed_cov_weights.append(wcovL); seed_names.append(nameL)

assert len(deb_oofs_seeds) > 0, 'No DeBERTa seeds available for reweighting/blend.'
print('Seeds in bag:', seed_names, flush=True)

# Coverage-weighted averaging: same seed weights W for OOF and test; OOF rows renormalize over the seeds trained on that row's fold, while test rows always use all seeds
O = np.stack(deb_oofs_seeds, axis=1)  # (N, S) with NaNs in untrained folds
T = np.stack(deb_tests_seeds, axis=1) # (Nt, S) no NaNs
W = np.array(seed_cov_weights, dtype=np.float32)
W = W / W.sum() if W.sum() > 0 else np.ones_like(W, dtype=np.float32)/len(W)
A = np.isfinite(O).astype(np.float32)
num = np.nansum(O * W[None, :], axis=1)
den = (A * W[None, :]).sum(axis=1)
deb_oof_bag_cov = (num / np.clip(den, 1e-6, None)).astype(np.float32)
deb_test_bag_cov = (T * W[None, :]).sum(axis=1).astype(np.float32)

# Base bag excluding sL, used by the optional sL-vs-base alpha search
deb_oof_base = deb_oof_bag_cov.copy()
deb_test_base = deb_test_bag_cov.copy()
if idx_sL is not None:
    # Build base excluding sL
    mask_cols = [i for i in range(O.shape[1]) if i != idx_sL]
    if len(mask_cols) > 0:
        Wb = W[mask_cols]; Wb = Wb / Wb.sum() if Wb.sum()>0 else Wb
        Ob = O[:, mask_cols]; Tb = T[:, mask_cols]
        Ab = np.isfinite(Ob).astype(np.float32)
        deb_oof_base = (np.nansum(Ob * Wb[None,:], axis=1) / np.clip((Ab * Wb[None,:]).sum(axis=1), 1e-6, None)).astype(np.float32)
        deb_test_base = (Tb * Wb[None,:]).sum(axis=1).astype(np.float32)

# Blend DeB bag with CatBoost; try no-iso vs post-blend global iso with constrained thresholds
def eval_blend(deb_oof, deb_test):
    best = (-1.0, None, None, None)  # (q, w_deb, th, iso_or_None)
    # coarse search
    for w in np.arange(0.50, 0.81, 0.02):
        blend_oof = w*deb_oof + (1.0-w)*pred_cat_oof
        th_u, q_u = optimize_thresholds_constrained(y, blend_oof, iters=2, coarse_step=0.025, fine_step=0.005, base=[1.5,2.5,3.5,4.5,5.5], max_delta=0.25)
        if q_u > best[0]: best = (q_u, w, th_u, None)
        iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
        iso.fit(blend_oof, y)
        oof_cal = iso.predict(blend_oof).astype(np.float32)
        th_i, q_i = optimize_thresholds_constrained(y, oof_cal, iters=2, coarse_step=0.025, fine_step=0.005, base=[1.5,2.5,3.5,4.5,5.5], max_delta=0.25)
        if q_i > best[0]: best = (q_i, w, th_i, iso)
    # fine around best w
    q0, w0, th0, iso0 = best
    w_min = max(0.50, w0-0.03); w_max = min(0.80, w0+0.03)
    for w in np.arange(w_min, w_max + 1e-9, 0.01):
        blend_oof = w*deb_oof + (1.0-w)*pred_cat_oof
        th_u, q_u = optimize_thresholds_constrained(y, blend_oof, iters=1, coarse_step=0.02, fine_step=0.005, base=[1.5,2.5,3.5,4.5,5.5], max_delta=0.25)
        if q_u > best[0]: best = (q_u, w, th_u, None)
        iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
        iso.fit(blend_oof, y)
        oof_cal = iso.predict(blend_oof).astype(np.float32)
        th_i, q_i = optimize_thresholds_constrained(y, oof_cal, iters=1, coarse_step=0.02, fine_step=0.005, base=[1.5,2.5,3.5,4.5,5.5], max_delta=0.25)
        if q_i > best[0]: best = (q_i, w, th_i, iso)
    return best

# Evaluate coverage-weighted default bag; remember which DeB arrays produced the best score
best = eval_blend(deb_oof_bag_cov, deb_test_bag_cov)
best_deb_oof, best_deb_test = deb_oof_bag_cov, deb_test_bag_cov

# Optional alpha search if sL present: mix sL vs base (bag without sL)
if idx_sL is not None:
    sL_oof = O[:, idx_sL].astype(np.float32)
    sL_test = T[:, idx_sL].astype(np.float32)
    # alpha is searched around sL's raw fold coverage (folds_run / n_splits), not its normalized bag weight
    covL = seed_cov_weights[idx_sL]
    alpha_lo = max(0.0, covL - 0.05); alpha_hi = min(1.0, covL + 0.10)
    for a in np.arange(alpha_lo, alpha_hi + 1e-9, 0.02):
        deb_oof_mix = (a * sL_oof + (1.0 - a) * deb_oof_base).astype(np.float32)
        # rows in folds where sL was not trained are NaN; fall back to the base bag there
        m = ~np.isfinite(sL_oof)
        deb_oof_mix[m] = deb_oof_base[m]
        deb_test_mix = (a * sL_test + (1.0 - a) * deb_test_base).astype(np.float32)
        cand = eval_blend(deb_oof_mix, deb_test_mix)
        if cand[0] > best[0]:
            best = cand
            best_deb_oof, best_deb_test = deb_oof_mix, deb_test_mix

best_q, best_w, best_th, best_iso = best
print(f'[Reweight+Blend-FIXED] OOF QWK={best_q:.5f} w_deb={best_w:.3f} th={np.round(best_th,3)}', flush=True)

# Apply to test with the DeB arrays that actually won (coverage bag or sL mix); clip only at the end (post-iso) before thresholds
blend_test = (best_w*best_deb_test + (1.0-best_w)*test_cat).astype(np.float32)
if best_iso is not None:
    # best_iso was fit inside eval_blend on the winning OOF blend (best_w, best_deb_oof), so reuse it
    blend_test = best_iso.predict(blend_test).astype(np.float32)
blend_test = np.clip(blend_test, 1, 6).astype(np.float32)
test_int = np.clip(apply_thresholds(blend_test, best_th), 1, 6).astype(int)
pd.DataFrame({'essay_id': test['essay_id'], 'score': test_int}).to_csv('submission_bag_rew.csv', index=False)
print(f'[Reweight+Blend-FIXED] Wrote submission_bag_rew.csv in {time.time()-t0:.1f}s', flush=True)

[add_seed] s042_sw64 | folds_run=5/5 | w_cov=1.000


[add_seed] s777_(0.55, 0.3, 0.15) | folds_run=5/5 | w_cov=1.000


[add_seed] s2025_(0.55, 0.3, 0.15) | folds_run=5/5 | w_cov=1.000


[add_seed] sL_(0.55, 0.3, 0.15) | folds_run=3/5 | w_cov=0.600


Seeds in bag: ['s042_sw64', 's777_(0.55, 0.3, 0.15)', 's2025_(0.55, 0.3, 0.15)', 'sL_(0.55, 0.3, 0.15)']


[Reweight+Blend-FIXED] OOF QWK=0.83235 w_deb=0.510 th=[1.75 2.55 3.39 4.25 5.25]


[Reweight+Blend-FIXED] Wrote submission_bag_rew.csv in 494.9s
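The coverage-weighted bag above divides the NaN-aware weighted sum by the weight actually present in each row. OOF rows in folds where a partial seed did not train (here `sL`, 3/5 folds) are therefore averaged over the remaining seeds only, while every test row uses all seeds. Below is a minimal sketch of the same arithmetic on a two-seed toy example; `coverage_weighted_bag` is a hypothetical helper name, and the numbers are made up.

```python
import numpy as np

def coverage_weighted_bag(O, T, cov):
    """O: (N, S) OOF preds with NaN where a seed was not trained; T: (Nt, S) test preds; cov: per-seed fold coverage."""
    W = np.asarray(cov, dtype=np.float64)
    W = W / W.sum()                          # normalize coverage into bag weights
    A = np.isfinite(O)                       # which (row, seed) entries exist
    num = np.where(A, O, 0.0) @ W            # weighted sum over available seeds only
    den = A.astype(np.float64) @ W           # total weight actually present in each row
    oof = num / np.clip(den, 1e-6, None)     # row-wise renormalization
    test = T @ W                             # test rows always have every seed
    return oof, test

O = np.array([[3.0, 4.0],
              [3.0, np.nan]])                # seed 2 missing on row 2 (untrained fold)
T = np.array([[2.0, 6.0]])
oof, test = coverage_weighted_bag(O, T, cov=[1.0, 0.6])
print(oof)   # approximately [3.375, 3.0]
print(test)  # approximately [3.5]
```

With weights 0.625/0.375, row 1 blends both seeds (3.375), row 2 falls back to seed 1 alone (3.0), and the test row always blends both.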


In [13]:
# CPU-only: Level-2 Ridge stacker on OOF predictions (DeB bag + CatBoost) with global isotonic + constrained thresholds
import os, time, numpy as np, pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score
from sklearn.isotonic import IsotonicRegression

t0=time.time()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
folds = pd.read_csv('folds.csv')
id_col, target_col = 'essay_id','score'
y = train[target_col].astype(int).values
n_splits = int(folds['fold'].max()) + 1

def qwk(y_true_int, y_pred_int):
    return cohen_kappa_score(y_true_int, y_pred_int, weights='quadratic')

def apply_thresholds(pred, th):
    bins = [-np.inf] + list(th) + [np.inf]
    return np.digitize(pred, bins)

def optimize_thresholds_constrained(y_true, preds, iters=3, step=0.05, base=None, max_delta=0.3):
    th = np.array([1.5, 2.5, 3.5, 4.5, 5.5], dtype=float) if base is None else np.array(base, dtype=float)
    base_th = th.copy()
    best = qwk(y_true, apply_thresholds(preds, th))
    for _ in range(iters):
        for i in range(len(th)):
            lo = max(base_th[i] - max_delta, th[i] - 0.5)
            hi = min(base_th[i] + max_delta, th[i] + 0.5)
            if i>0: lo = max(lo, th[i-1] + 0.01)
            if i<len(th)-1: hi = min(hi, th[i+1] - 0.01)
            grid = np.arange(lo, hi + 1e-9, step)
            local_best, local_val = best, th[i]
            for g in grid:
                th_try = th.copy(); th_try[i] = g
                score = qwk(y_true, apply_thresholds(preds, th_try))
                if score > local_best:
                    local_best, local_val = score, g
            th[i] = local_val; best = local_best
    return th, best

# Load CatBoost OOF/test
oof_cat = pd.read_csv('oof_cat.csv')
pred_cat_oof = oof_cat['oof_cat'].values.astype(np.float32)
test_cat = np.load('test_cat.npy').astype(np.float32)

# Load DeB seeds and bag them (reuse chosen TTA for s777 if available)
deb_oofs = []; deb_tests = []
# Seed 42 (sw64)
if os.path.exists('oof_deberta_base_sw64.csv') and os.path.exists('test_deberta_base_sw64.npy'):
    df42 = pd.read_csv('oof_deberta_base_sw64.csv')
    deb_oofs.append(df42['oof_deberta'].values.astype(np.float32))
    deb_tests.append(np.load('test_deberta_base_sw64.npy').astype(np.float32))
# Seed 777 per-views with chosen weights file or fallback to combined
def load_view(prefix):
    oof = pd.read_csv(f'oof_deberta_base_{prefix}.csv')['oof_deberta'].values.astype(np.float32)
    testv = np.load(f'test_deberta_base_{prefix}.npy').astype(np.float32)
    return oof, testv
if all(os.path.exists(p) for p in [
    'oof_deberta_base_s777_sw64.csv','oof_deberta_base_s777_sw128.csv','oof_deberta_base_s777_ht.csv',
    'test_deberta_base_s777_sw64.npy','test_deberta_base_s777_sw128.npy','test_deberta_base_s777_ht.npy']):
    o64,t64 = load_view('s777_sw64')
    o128,t128 = load_view('s777_sw128')
    oht,tht = load_view('s777_ht')
    w = (0.4,0.4,0.2)
    if os.path.exists('tta_weights_s777.txt'):
        try:
            txt = open('tta_weights_s777.txt').read().strip()
            w = eval(txt)
        except Exception:
            pass
    oof_777 = np.clip(w[0]*o64 + w[1]*o128 + w[2]*oht, 1, 6).astype(np.float32)
    tst_777 = np.clip(w[0]*t64 + w[1]*t128 + w[2]*tht, 1, 6).astype(np.float32)
    deb_oofs.append(oof_777); deb_tests.append(tst_777)
elif os.path.exists('oof_deberta_base_s777.csv') and os.path.exists('test_deberta_base_s777.npy'):
    df = pd.read_csv('oof_deberta_base_s777.csv')
    deb_oofs.append(df['oof_deberta'].values.astype(np.float32))
    deb_tests.append(np.load('test_deberta_base_s777.npy').astype(np.float32))

assert len(deb_oofs)>0, 'No DeB OOF available for stacking.'
deb_oof_bag = np.mean(np.stack(deb_oofs, axis=1), axis=1).astype(np.float32)
deb_test_bag = np.mean(np.stack(deb_tests, axis=1), axis=1).astype(np.float32)

# Build meta features
X_oof = np.stack([deb_oof_bag, pred_cat_oof, deb_oof_bag - pred_cat_oof, np.abs(deb_oof_bag - pred_cat_oof)], axis=1).astype(np.float32)
X_test = np.stack([deb_test_bag, test_cat, deb_test_bag - test_cat, np.abs(deb_test_bag - test_cat)], axis=1).astype(np.float32)

# CV Ridge stacker
oof_stack = np.zeros(len(train), dtype=np.float32)
test_stack_f = np.zeros((len(test), n_splits), dtype=np.float32)
for f in range(n_splits):
    tr_idx = folds.index[folds['fold']!=f].to_numpy()
    va_idx = folds.index[folds['fold']==f].to_numpy()
    model = Ridge(alpha=1.0, random_state=42)
    model.fit(X_oof[tr_idx], y[tr_idx].astype(float))
    oof_stack[va_idx] = model.predict(X_oof[va_idx]).astype(np.float32)
    test_stack_f[:, f] = model.predict(X_test).astype(np.float32)

test_stack = test_stack_f.mean(axis=1).astype(np.float32)

# Global isotonic on stacker outputs + constrained thresholds
iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
iso.fit(np.clip(oof_stack,1,6), y)
oof_cal = np.clip(iso.predict(np.clip(oof_stack,1,6)), 1, 6).astype(np.float32)
th_opt, oof_q = optimize_thresholds_constrained(y, oof_cal, iters=3, step=0.05, base=[1.5,2.5,3.5,4.5,5.5], max_delta=0.3)
print(f'[Stack] OOF QWK={oof_q:.5f} th={np.round(th_opt,3)}', flush=True)

# Apply to test
test_cal = np.clip(iso.predict(np.clip(test_stack,1,6)), 1, 6).astype(np.float32)
test_int = np.clip(apply_thresholds(test_cal, th_opt), 1, 6).astype(int)
pd.DataFrame({'essay_id': test[id_col], 'score': test_int}).to_csv('submission_stack.csv', index=False)
print(f'[Stack] Wrote submission_stack.csv in {time.time()-t0:.1f}s', flush=True)

[Stack] OOF QWK=0.82930 th=[1.75 2.6  3.4  4.25 5.2 ]


[Stack] Wrote submission_stack.csv in 1.3s
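Every threshold search in these cells calls scikit-learn's `cohen_kappa_score` once per candidate cut point. That may account for much of the ~495 s runtime of the reweight cell. One option for the integer-label case is to compute QWK directly from the confusion matrix with NumPy. The sketch below assumes integer labels in 1..6, as in this competition. `qwk_np` is a hypothetical name, and the function has not been checked against every edge case of the scikit-learn implementation; for example, it returns NaN when both vectors contain a single identical class.

```python
import numpy as np

def qwk_np(y_true, y_pred, min_rating=1, max_rating=6):
    """Quadratic weighted kappa from the confusion matrix (integer ratings in [min_rating, max_rating])."""
    y_true = np.asarray(y_true, dtype=int) - min_rating
    y_pred = np.asarray(y_pred, dtype=int) - min_rating
    k = max_rating - min_rating + 1
    O = np.zeros((k, k), dtype=np.float64)
    np.add.at(O, (y_true, y_pred), 1.0)                       # observed confusion counts
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()      # expected counts under independence
    idx = np.arange(k)
    Wq = (idx[:, None] - idx[None, :]) ** 2                    # quadratic disagreement weights
    return 1.0 - (Wq * O).sum() / (Wq * E).sum()

print(qwk_np([1, 2, 3], [1, 2, 2]))        # approximately 0.6667 (observed 1 vs expected 3 weighted disagreement)
print(qwk_np([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
```

Unused rating levels have zero marginals and contribute nothing, so fixing the label range at 1..6 does not change the value relative to using only the observed labels.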


In [14]:
# CPU-only: Fold-wise isotonic calibration on DeB bag + CatBoost with widened blend grid; write submission_bag_foldiso.csv
import os, time, numpy as np, pandas as pd
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import cohen_kappa_score

def qwk(y_true_int, y_pred_int):
    return cohen_kappa_score(y_true_int, y_pred_int, weights='quadratic')

def apply_thresholds(pred, th):
    bins = [-np.inf] + list(th) + [np.inf]
    return np.digitize(pred, bins)

def optimize_thresholds_constrained(y_true, preds, iters=3, step=0.05, base=None, max_delta=0.3):
    th = np.array([1.5, 2.5, 3.5, 4.5, 5.5], dtype=float) if base is None else np.array(base, dtype=float)
    base_th = th.copy()
    best = qwk(y_true, apply_thresholds(preds, th))
    for _ in range(iters):
        for i in range(len(th)):
            lo = max(base_th[i] - max_delta, th[i] - 0.5)
            hi = min(base_th[i] + max_delta, th[i] + 0.5)
            if i>0: lo = max(lo, th[i-1] + 0.01)
            if i<len(th)-1: hi = min(hi, th[i+1] - 0.01)
            grid = np.arange(lo, hi + 1e-9, step)
            local_best, local_val = best, th[i]
            for g in grid:
                th_try = th.copy(); th_try[i] = g
                score = qwk(y_true, apply_thresholds(preds, th_try))
                if score > local_best:
                    local_best, local_val = score, g
            th[i] = local_val; best = local_best
    return th, best

t0 = time.time()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
folds = pd.read_csv('folds.csv')
id_col, target_col = 'essay_id','score'
y = train[target_col].astype(int).values
n_splits = int(folds['fold'].max()) + 1

# Load CatBoost
oof_cat = pd.read_csv('oof_cat.csv')
pred_cat_oof = oof_cat['oof_cat'].values.astype(np.float32)
test_cat = np.load('test_cat.npy').astype(np.float32)

# Load DeB bag (seed 42 sw64 + seed 777 combined or per-view w/ chosen weights) as in cell 9
deb_oofs = []; deb_tests = []
if os.path.exists('oof_deberta_base_sw64.csv') and os.path.exists('test_deberta_base_sw64.npy'):
    df42 = pd.read_csv('oof_deberta_base_sw64.csv')
    deb_oofs.append(df42['oof_deberta'].values.astype(np.float32))
    deb_tests.append(np.load('test_deberta_base_sw64.npy').astype(np.float32))
def load_view(prefix):
    oof = pd.read_csv(f'oof_deberta_base_{prefix}.csv')['oof_deberta'].values.astype(np.float32)
    testv = np.load(f'test_deberta_base_{prefix}.npy').astype(np.float32)
    return oof, testv
if all(os.path.exists(p) for p in [
    'oof_deberta_base_s777_sw64.csv','oof_deberta_base_s777_sw128.csv','oof_deberta_base_s777_ht.csv',
    'test_deberta_base_s777_sw64.npy','test_deberta_base_s777_sw128.npy','test_deberta_base_s777_ht.npy']):
    o64,t64 = load_view('s777_sw64')
    o128,t128 = load_view('s777_sw128')
    oht,tht = load_view('s777_ht')
    w = (0.4,0.4,0.2)
    if os.path.exists('tta_weights_s777.txt'):
        try: w = eval(open('tta_weights_s777.txt').read().strip())
        except Exception: pass
    deb_oofs.append(np.clip(w[0]*o64 + w[1]*o128 + w[2]*oht, 1, 6).astype(np.float32))
    deb_tests.append(np.clip(w[0]*t64 + w[1]*t128 + w[2]*tht, 1, 6).astype(np.float32))
elif os.path.exists('oof_deberta_base_s777.csv') and os.path.exists('test_deberta_base_s777.npy'):
    df = pd.read_csv('oof_deberta_base_s777.csv')
    deb_oofs.append(df['oof_deberta'].values.astype(np.float32))
    deb_tests.append(np.load('test_deberta_base_s777.npy').astype(np.float32))

assert len(deb_oofs)>0, 'No DeB seeds available.'
deb_oof_bag = np.mean(np.stack(deb_oofs, axis=1), axis=1).astype(np.float32)
deb_test_bag = np.mean(np.stack(deb_tests, axis=1), axis=1).astype(np.float32)

# Fold-wise isotonic calibration: for each fold, fit iso on other folds and apply to held-out
def foldwise_iso_oof(pred_float):
    oof_cal = np.zeros_like(pred_float, dtype=np.float32)
    for f in range(n_splits):
        va_idx = folds.index[folds['fold']==f].to_numpy()
        tr_idx = folds.index[folds['fold']!=f].to_numpy()
        iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
        iso.fit(np.clip(pred_float[tr_idx],1,6), y[tr_idx])
        oof_cal[va_idx] = np.clip(iso.predict(np.clip(pred_float[va_idx],1,6)), 1, 6).astype(np.float32)
    return oof_cal

best = (-1.0, None, None, None)  # q, w_deb, th, iso_models (None since fold-wise applied only on OOF)
for w in np.arange(0.45, 0.851, 0.02):
    blend_oof = np.clip(w*deb_oof_bag + (1.0-w)*pred_cat_oof, 1, 6).astype(np.float32)
    oof_cal = foldwise_iso_oof(blend_oof)
    th_i, q_i = optimize_thresholds_constrained(y, oof_cal, iters=3, step=0.05, base=[1.5,2.5,3.5,4.5,5.5], max_delta=0.3)
    if q_i > best[0]: best = (q_i, w, th_i, None)

best_q, best_w, best_th, _ = best
print(f'[FoldISO] OOF QWK={best_q:.5f} w_deb={best_w:.3f} th={np.round(best_th,3)}', flush=True)

# Fit one global isotonic on the full OOF blend at the best weight and use it for test (the fold-wise fits above only produce OOF scores)
blend_oof_best = np.clip(best_w*deb_oof_bag + (1.0-best_w)*pred_cat_oof, 1, 6).astype(np.float32)
iso_global = IsotonicRegression(increasing=True, out_of_bounds='clip')
iso_global.fit(blend_oof_best, y)

# Apply to test consistently
blend_test = np.clip(best_w*deb_test_bag + (1.0-best_w)*test_cat, 1, 6).astype(np.float32)
test_cal = np.clip(iso_global.predict(blend_test), 1, 6).astype(np.float32)
test_int = np.clip(apply_thresholds(test_cal, best_th), 1, 6).astype(int)
pd.DataFrame({'essay_id': test[id_col], 'score': test_int}).to_csv('submission_bag_foldiso.csv', index=False)
print(f'[FoldISO] Wrote submission_bag_foldiso.csv in {time.time()-t0:.1f}s', flush=True)

[FoldISO] OOF QWK=0.82849 w_deb=0.570 th=[1.7  2.55 3.35 4.2  5.2 ]


[FoldISO] Wrote submission_bag_foldiso.csv in 21.4s
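These cells tune thresholds on the same OOF rows they are scored on, and all but the fold-wise variant do the same for the isotonic map. The printed OOF QWK values (0.829 to 0.832) are therefore probably somewhat optimistic, and gaps of a few 1e-3 between variants may not carry over to the leaderboard. A cross-fitted estimate offers one fairer way to compare variants: tune thresholds on four folds, apply them to the fifth, then score all rows once. Below is a minimal NumPy-only sketch on synthetic data. `fit_thresholds` and `crossfit_qwk` are hypothetical helpers that mirror, but are not identical to, `optimize_thresholds_constrained`.

```python
import numpy as np

def qwk_np(y_true, y_pred, k=6):
    O = np.zeros((k, k))
    np.add.at(O, (np.asarray(y_true) - 1, np.asarray(y_pred) - 1), 1.0)
    E = np.outer(O.sum(1), O.sum(0)) / O.sum()
    W = (np.arange(k)[:, None] - np.arange(k)[None, :]) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()

def to_labels(pred, th):
    return np.digitize(pred, [-np.inf] + list(th) + [np.inf])

def fit_thresholds(y, pred, step=0.05, max_delta=0.3, iters=2):
    """Coordinate search around the default cut points, keeping them ordered."""
    base = np.array([1.5, 2.5, 3.5, 4.5, 5.5])
    th = base.copy()
    best = qwk_np(y, to_labels(pred, th))
    for _ in range(iters):
        for i in range(5):
            lo = base[i] - max_delta if i == 0 else max(base[i] - max_delta, th[i - 1] + 0.01)
            hi = base[i] + max_delta if i == 4 else min(base[i] + max_delta, th[i + 1] - 0.01)
            for g in np.arange(lo, hi + 1e-9, step):
                cand = th.copy(); cand[i] = g
                q = qwk_np(y, to_labels(pred, cand))
                if q > best:
                    best, th = q, cand
    return th, best

def crossfit_qwk(y, pred, fold_ids):
    """Thresholds for each fold are tuned only on the other folds."""
    labels = np.zeros_like(y)
    for f in np.unique(fold_ids):
        va = fold_ids == f
        th, _ = fit_thresholds(y[~va], pred[~va])
        labels[va] = to_labels(pred[va], th)
    return qwk_np(y, labels)

rng = np.random.default_rng(0)
y = rng.integers(1, 7, size=600)
pred = np.clip(y + 0.35 + rng.normal(0, 0.6, size=y.size), 1, 6)   # biased, noisy synthetic scores
folds = np.arange(y.size) % 5
_, q_in = fit_thresholds(y, pred)     # tuned and scored on the same rows
q_cf = crossfit_qwk(y, pred, folds)   # thresholds never see the rows they score
print(round(q_in, 4), round(q_cf, 4))
```

The same wrapper could take the isotonic fit as well (fit on train folds, apply to the held-out fold), as `foldwise_iso_oof` already does for calibration alone.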


In [28]:
# Finalize submission: copy best current blend to submission.csv with sanity checks
import pandas as pd, os
src = 'submission_bag_rew.csv'
dst = 'submission.csv'
assert os.path.exists(src), f"Missing {src}"
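sub = pd.read_csv(src)
assert set(sub.columns)=={'essay_id','score'}, f"Bad columns: {sub.columns}"
# rows must line up with test.csv (same ids, same order)
assert sub['essay_id'].equals(pd.read_csv('test.csv')['essay_id']), 'essay_id does not match test.csv'
assert sub['score'].between(1,6).all(), 'Scores out of bounds 1..6'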
sub.to_csv(dst, index=False)
print('Wrote submission.csv from', src, 'n=', len(sub), 'unique scores:', sorted(sub['score'].unique()))

Wrote submission.csv from submission_bag_rew.csv n= 1731 unique scores: [1, 2, 3, 4, 5, 6]


In [29]:
# Fast rebag variant: broader alpha for sL (0..1), wider w_deb grid (0.45..0.90), optional skip per-seed isotonic; write submission.csv
import os, time, numpy as np, pandas as pd
from sklearn.metrics import cohen_kappa_score
from sklearn.isotonic import IsotonicRegression

def qwk(y_true_int, y_pred_int):
    return cohen_kappa_score(y_true_int, y_pred_int, weights='quadratic')

def apply_thresholds(pred, th):
    bins = [-np.inf] + list(th) + [np.inf]
    return np.digitize(pred, bins)

def optimize_thresholds_constrained(y_true, preds, iters=2, coarse_step=0.025, fine_step=0.005, base=None, max_delta=0.30):
    base_th = np.array([1.5, 2.5, 3.5, 4.5, 5.5], dtype=float) if base is None else np.array(base, dtype=float)
    th = base_th.copy()
    best = qwk(y_true, apply_thresholds(preds, th))
    for _ in range(iters):
        for i in range(5):
            lo = base_th[i] - max_delta; hi = base_th[i] + max_delta
            if i>0: lo = max(lo, th[i-1] + 0.01)
            if i<4: hi = min(hi, th[i+1] - 0.01)
            for g in np.arange(lo, hi + 1e-9, coarse_step):
                th_try = th.copy(); th_try[i] = g
                score = qwk(y_true, apply_thresholds(preds, th_try))
                if score > best:
                    best, th[i] = score, g
    for i in range(5):
        lo = max(base_th[i] - max_delta, th[i] - coarse_step)
        hi = min(base_th[i] + max_delta, th[i] + coarse_step)
        if i>0: lo = max(lo, th[i-1] + 0.01)
        if i<4: hi = min(hi, th[i+1] - 0.01)
        for g in np.arange(lo, hi + 1e-9, fine_step):
            th_try = th.copy(); th_try[i] = g
            score = qwk(y_true, apply_thresholds(preds, th_try))
            if score > best:
                best, th[i] = score, g
    return th, best

def load_view(prefix):
    oof = pd.read_csv(f'oof_deberta_base_{prefix}.csv')['oof_deberta'].values.astype(np.float32)
    test = np.load(f'test_deberta_base_{prefix}.npy').astype(np.float32)
    return oof, test

def best_tta(o64, o128, oht, t64, t128, tht, y, valid_mask=None, prefer_large_grid=False):
    grid = [
        (0.70,0.30,0.00),(0.66,0.34,0.00),(0.60,0.40,0.00),(0.55,0.45,0.00),
        (0.60,0.30,0.10),(0.55,0.35,0.10),(0.55,0.30,0.15),(0.50,0.40,0.10)
    ]
    if prefer_large_grid:
        grid += [(0.75,0.25,0.00),(0.72,0.28,0.00),(0.65,0.25,0.10),(0.62,0.28,0.10)]
    best = (-1.0, None, None, None)
    base_bins = [-np.inf,1.5,2.5,3.5,4.5,5.5,np.inf]
    for w64,w128,wht in grid:
        oof_c = (w64*o64 + w128*o128 + wht*oht).astype(np.float32)
        vm = valid_mask if valid_mask is not None else np.isfinite(oof_c)
        q = qwk(y[vm], np.digitize(oof_c[vm], base_bins))
        if q > best[0]:
            test_c = (w64*t64 + w128*t128 + wht*tht).astype(np.float32)
            best = (q, (w64,w128,wht), oof_c, test_c)
    return best

def per_seed_iso(oof_seed, test_seed, y, valid_mask=None):
    vm = valid_mask if valid_mask is not None else np.isfinite(oof_seed)
    if vm.sum() < 10:
        return oof_seed, test_seed
    iso = IsotonicRegression(increasing=True, out_of_bounds='clip')
    iso.fit(np.clip(oof_seed[vm],1,6), y[vm])
    return np.clip(iso.predict(np.clip(oof_seed,1,6)),1,6).astype(np.float32), np.clip(iso.predict(np.clip(test_seed,1,6)),1,6).astype(np.float32)

t0=time.time()
train = pd.read_csv('train.csv'); test = pd.read_csv('test.csv'); folds = pd.read_csv('folds.csv')
n_splits = int(folds['fold'].max()) + 1
oof_cat = pd.read_csv('oof_cat.csv'); y = oof_cat['y'].astype(int).values
pred_cat_oof = oof_cat['oof_cat'].values.astype(np.float32); test_cat = np.load('test_cat.npy').astype(np.float32)

# Build seeds
deb_oofs = []; deb_tests = []; cov_w = []; names = []; idx_sL = None
def add_seed(o, t, name, folds_run):
    deb_oofs.append(o.astype(np.float32)); deb_tests.append(t.astype(np.float32)); cov_w.append(folds_run/float(n_splits)); names.append(name)

# s042 sw64 (5/5)
if os.path.exists('oof_deberta_base_sw64.csv'):
    o42 = pd.read_csv('oof_deberta_base_sw64.csv')['oof_deberta'].values.astype(np.float32)
    t42 = np.load('test_deberta_base_sw64.npy').astype(np.float32)
    add_seed(o42, t42, 's042_sw64', n_splits)

# s777: prefer per-view and TTA search
if all(os.path.exists(p) for p in ['oof_deberta_base_s777_sw64.csv','oof_deberta_base_s777_sw128.csv','oof_deberta_base_s777_ht.csv',
                                   'test_deberta_base_s777_sw64.npy','test_deberta_base_s777_sw128.npy','test_deberta_base_s777_ht.npy']):
    o64,t64 = load_view('s777_sw64'); o128,t128 = load_view('s777_sw128'); oht,tht = load_view('s777_ht')
    q,w,oofc,tstc = best_tta(o64,o128,oht,t64,t128,tht,y, valid_mask=None, prefer_large_grid=False)
    add_seed(oofc, tstc, f's777_{w}', n_splits)
elif os.path.exists('oof_deberta_base_s777.csv'):
    o777 = pd.read_csv('oof_deberta_base_s777.csv')['oof_deberta'].values.astype(np.float32)
    t777 = np.load('test_deberta_base_s777.npy').astype(np.float32)
    add_seed(o777, t777, 's777_comb', n_splits)

# s2025: use combined (5/5)
if os.path.exists('oof_deberta_base_s2025.csv'):
    o2025 = pd.read_csv('oof_deberta_base_s2025.csv')['oof_deberta'].values.astype(np.float32)
    t2025 = np.load('test_deberta_base_s2025.npy').astype(np.float32)
    add_seed(o2025, t2025, 's2025_comb', n_splits)

# sL: per-view with masked TTA; rows from folds that were not run are stored as 0.0, so folds_run is inferred from non-zero OOF
if all(os.path.exists(p) for p in ['oof_deberta_base_sL_sw64.csv','oof_deberta_base_sL_sw128.csv','oof_deberta_base_sL_ht.csv',
                                   'test_deberta_base_sL_sw64.npy','test_deberta_base_sL_sw128.npy','test_deberta_base_sL_ht.npy']):
    o64L,t64L = load_view('sL_sw64'); o128L,t128L = load_view('sL_sw128'); ohtL,thtL = load_view('sL_ht')
    maskL = (o64L != 0.0) & np.isfinite(o64L)
    qL,wL,oofL,tstL = best_tta(o64L,o128L,ohtL,t64L,t128L,thtL,y, valid_mask=maskL, prefer_large_grid=True)
    folds_run_L = 0
    for f in range(n_splits):
        va = folds.index[folds['fold']==f].to_numpy()
        if np.isfinite(o64L[va]).mean() > 0.8 and (o64L[va]!=0).mean() > 0.8:
            folds_run_L += 1
    # Turn the 0.0 placeholders into NaN so the NaN-aware bagging and the sL fallback below skip them
    oofL = np.where(maskL, oofL, np.nan).astype(np.float32)
    add_seed(oofL, tstL, f'sL_{wL}', folds_run_L); idx_sL = len(deb_oofs)-1

assert len(deb_oofs)>0, 'No DeB seeds found'
W = np.array(cov_w, dtype=np.float32); W = W/W.sum() if W.sum()>0 else np.ones(len(cov_w),dtype=np.float32)/len(cov_w)
O = np.stack(deb_oofs, axis=1); T = np.stack(deb_tests, axis=1)

def wmean(M, w):
    # Row-wise weighted mean over seed columns, ignoring NaN entries (weights renormalised per row)
    valid = np.isfinite(M).astype(np.float32)
    return (np.nansum(M * w[None,:], axis=1) / np.clip((valid * w[None,:]).sum(axis=1), 1e-6, None)).astype(np.float32)

def seed_matrices(apply_seed_iso):
    # (OOF, test) seed matrices, optionally isotonic-calibrated per seed on that seed's valid (non-NaN) OOF rows
    if not apply_seed_iso:
        return O, T
    Oa = np.full_like(O, np.nan); Ta = np.empty_like(T)
    for j in range(O.shape[1]):
        vm = np.isfinite(O[:,j])
        Oa[vm,j], Ta[:,j] = per_seed_iso(O[vm,j], T[:,j], y[vm])
    return Oa, Ta

def deb_mix(Oa, Ta, a):
    # Bag the DeB seeds; if sL is present, mix it in with weight a and fall back to the other seeds where sL OOF is missing
    if idx_sL is None or a is None:
        return wmean(Oa, W), wmean(Ta, W)
    base_cols = [i for i in range(Oa.shape[1]) if i != idx_sL]
    Wb = W[base_cols]; Wb = Wb/Wb.sum() if Wb.sum()>0 else Wb
    o_base = wmean(Oa[:, base_cols], Wb); t_base = wmean(Ta[:, base_cols], Wb)
    sLo = Oa[:, idx_sL]; sLt = Ta[:, idx_sL]
    mix_o = a*sLo + (1.0-a)*o_base
    m = ~np.isfinite(sLo); mix_o[m] = o_base[m]
    mix_t = a*sLt + (1.0-a)*t_base
    return mix_o.astype(np.float32), mix_t.astype(np.float32)

def search(apply_seed_iso):
    # Grid over sL alpha (0..1) and DeB/CatBoost weight w (0.45..0.90); each blend is scored raw and after isotonic calibration
    Oa, Ta = seed_matrices(apply_seed_iso)
    alphas = np.arange(0.0, 1.0+1e-9, 0.02) if idx_sL is not None else [None]
    best = (-1.0, None, None, None, None)  # q, a, w, th, iso_model
    for a in alphas:
        deb_o, _ = deb_mix(Oa, Ta, a)
        for w in np.arange(0.45, 0.90+1e-9, 0.01):
            blend_o = w*deb_o + (1.0-w)*pred_cat_oof
            th_u, q_u = optimize_thresholds_constrained(y, blend_o, iters=2)
            if q_u > best[0]: best = (q_u, a, w, th_u, None)
            iso = IsotonicRegression(increasing=True, out_of_bounds='clip'); iso.fit(blend_o, y)
            o_cal = iso.predict(blend_o).astype(np.float32)
            th_i, q_i = optimize_thresholds_constrained(y, o_cal, iters=2)
            if q_i > best[0]: best = (q_i, a, w, th_i, iso)
    return best

# Try with and without per-seed isotonic (ties go to per-seed iso); remember which mode won
cand1 = search(apply_seed_iso=True)
cand2 = search(apply_seed_iso=False)
best_seed_iso = bool(cand1[0] >= cand2[0])
best_q, best_a, best_w, best_th, best_iso = cand1 if best_seed_iso else cand2
print(f'[FAST-BAG] Best OOF={best_q:.5f} a_sL={best_a} w_deb={best_w:.3f} th={np.round(best_th,3)} iso={best_iso is not None} seed_iso={best_seed_iso}', flush=True)

# Rebuild the DeB mix with the winning seed-iso mode and alpha so the test blend matches what was scored on OOF
O_best, T_best = seed_matrices(best_seed_iso)
deb_o_final, deb_t_final = deb_mix(O_best, T_best, best_a)

blend_test = (best_w*deb_t_final + (1.0-best_w)*test_cat).astype(np.float32)
if best_iso is not None:
    blend_oof_for_iso = (best_w*deb_o_final + (1.0-best_w)*pred_cat_oof).astype(np.float32)
    iso_final = IsotonicRegression(increasing=True, out_of_bounds='clip'); iso_final.fit(blend_oof_for_iso, y)
    blend_test = iso_final.predict(blend_test).astype(np.float32)
blend_test = np.clip(blend_test, 1, 6).astype(np.float32)
test_int = np.clip(apply_thresholds(blend_test, best_th), 1, 6).astype(int)
pd.DataFrame({'essay_id': test['essay_id'], 'score': test_int}).to_csv('submission.csv', index=False)
print('[FAST-BAG] Wrote submission.csv in %.1fs' % (time.time()-t0), flush=True)

KeyboardInterrupt: 
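
The cell above was interrupted before it wrote a submission. A rough count of the QWK evaluations implied by its grid helps explain why. The sketch reuses the cell's grid settings and rests on three assumptions:

- the sL seed is present;
- each threshold's ±0.30 coarse window is not clipped by its neighbours;
- the smaller TTA-weight searches can be ignored.

The 1 ms per call figure is an illustrative assumption, not a measured timing.

```python
import numpy as np

# Grid sizes taken from the cell above
n_alpha = len(np.arange(0.0, 1.0 + 1e-9, 0.02))                # sL mixing weights a
n_w = len(np.arange(0.45, 0.90 + 1e-9, 0.01))                  # DeB vs CatBoost weights w
coarse = len(np.arange(1.5 - 0.30, 1.5 + 0.30 + 1e-9, 0.025))  # candidates per threshold, coarse pass
fine = len(np.arange(1.5 - 0.025, 1.5 + 0.025 + 1e-9, 0.005))  # candidates per threshold, fine pass

# optimize_thresholds_constrained: 1 initial score + iters=2 coarse sweeps over 5 thresholds + 1 fine sweep
per_opt = 1 + 2 * 5 * coarse + 5 * fine
# Each (a, w) pair is optimised twice (raw and isotonic-calibrated blend),
# and the whole grid runs twice (per-seed isotonic on and off)
total = 2 * n_alpha * n_w * 2 * per_opt

print(n_alpha, n_w, coarse, fine, per_opt, total)
print(f'{total / 1000 / 60:.0f} minutes at an assumed 1 ms per QWK call')
```

The count is multiplicative. Halving the resolution of both the alpha and the w grids therefore cuts it roughly fourfold, and a cheaper per-call QWK lowers the cost of every one of those evaluations independently.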