# Plan: Google QUEST Q&A Labeling - Target: answer_helpful

Objectives:
- Build a strong, fast baseline for answer_helpful, validated with robust cross-validation (CV).
- Iterate with text features and GPU models; aim for a CV score at or above the bronze-medal threshold.

Initial Strategy:
1) Environment & GPU check (exit if no GPU).
2) Data load + schema/size checks; preview text fields.
3) Validation design:
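   - Group folds by question (title + body) so duplicate questions never straddle train and validation; within that, stratify by binned answer_helpful (a regression target) if possible, otherwise use unstratified (Group)KFold.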
   - 5 folds, fixed seed; save folds to disk.
4) Baseline models:
   A) TF-IDF (char + word n-grams on question_title, question_body, answer) → Ridge/Lasso/ElasticNet/SVR/XGBoost (GPU).
   B) Fast transformer baseline (e.g., distilbert) pooled embeddings + linear head (optional if time).
5) Features v1:
   - Text: separate and concatenated fields; length features; overlap metrics (title/body, body/answer).
   - Simple punctuation/word count ratios.
6) Train, log per-fold Spearman (OOF), cache OOF/test preds.
7) Error analysis: inspect top/bottom deciles of OOF; iterate.
8) Ensembling: linear blend of diverse models if deltas are validated.
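
Step 8 can be sketched as a one-dimensional weight scan over two models' OOF predictions. This is a minimal illustration under stated assumptions, not the notebook's method: `rank01`, `spearman`, and `best_blend_weight` are hypothetical helpers, the ordinal ranking assumes few ties, and the arrays are synthetic stand-ins rather than real OOF files.

```python
import numpy as np

def rank01(v):
    """Map values to [0, 1] by ordinal rank (ties broken by position)."""
    v = np.asarray(v, dtype=float)
    ranks = np.argsort(np.argsort(v, kind="mergesort"), kind="mergesort")
    return ranks / max(1, len(v) - 1)

def spearman(a, b):
    """Spearman as Pearson correlation of ordinal ranks (assumes few ties)."""
    return float(np.corrcoef(rank01(a), rank01(b))[0, 1])

def best_blend_weight(oof_a, oof_b, y, steps=21):
    """Scan w in [0, 1]; return (w, rho) maximizing Spearman of w*a + (1-w)*b."""
    ra, rb = rank01(oof_a), rank01(oof_b)
    best_w, best_rho = 0.0, -np.inf
    for w in np.linspace(0.0, 1.0, steps):
        rho = spearman(w * ra + (1.0 - w) * rb, y)
        if rho > best_rho:
            best_w, best_rho = float(w), rho
    return best_w, best_rho

# Synthetic stand-ins for two models' OOF predictions (not real results).
rng = np.random.default_rng(0)
y = rng.random(500)
oof_a = y + rng.normal(0.0, 0.5, 500)
oof_b = y + rng.normal(0.0, 0.5, 500)

w, rho = best_blend_weight(oof_a, oof_b, y)
print(f"best w={w:.2f}, blended OOF Spearman={rho:.4f}")
```

Because the weight is chosen on the same OOF rows it is scored on, the measured gain is optimistic; checking that it also holds fold by fold is one way to validate the delta before trusting the blend.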

Metric:
- Column-wise Spearman rank correlation (a single target here), computed on the out-of-fold (OOF) predictions.
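
For reference, the metric can be written out with NumPy alone. This is a sketch, assuming average ranks for ties and a plain mean across columns; `average_ranks`, `spearman`, and `columnwise_spearman` are illustrative names, and the notebook itself relies on `scipy.stats.spearmanr`.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks; tied values share the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(pred, target):
    """Pearson correlation of the average ranks; NaN if either side is constant."""
    rp, rt = average_ranks(pred), average_ranks(target)
    if rp.std() == 0.0 or rt.std() == 0.0:
        return float("nan")
    return float(np.corrcoef(rp, rt)[0, 1])

def columnwise_spearman(preds, targets):
    """Mean of per-column Spearman over 2-D arrays of shape (rows, targets)."""
    preds, targets = np.asarray(preds), np.asarray(targets)
    return float(np.mean([spearman(preds[:, k], targets[:, k])
                          for k in range(preds.shape[1])]))

print(average_ranks([0.3, 0.3, 1.0, 0.0]))      # tied values share rank 2.5
print(spearman([0.1, 0.4, 0.9], [1, 2, 3]))     # monotone increasing
```

A constant prediction vector has no rank variance, which is why the sketch returns NaN in that case rather than a number.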

Milestones (request expert review after each):
- Plan (this cell), Data/EDA, Baseline CV, FE v1, Model tuning/ensembling, Pre-submission checks.

Next:
- Add env check cell and run.
- Load data, inspect targets and text fields.
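
The overlap metrics listed under Features v1 can be illustrated with a token-level Jaccard similarity. This is a small sketch, assuming whitespace tokenization on already-lowercased text; `jaccard` is an illustrative name and the two strings are made up for the example.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the whitespace-token sets of two strings."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a or not set_b:
        return 0.0  # define overlap with an empty text as zero
    return len(set_a & set_b) / len(set_a | set_b)

title = "how do i sort a list"
answer = "use sorted to sort a list"

# Shared tokens: sort, a, list (3); distinct tokens overall: 9.
print(round(jaccard(title, answer), 4))
```

Title/answer and body/answer overlaps can be kept as separate feature columns or summed into one; separate columns let a linear model weight them independently.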

In [1]:
# Environment & GPU check
import os, sys, subprocess, platform, time

print("Python:", sys.version)
print("Platform:", platform.platform())
print("Time:", time.strftime('%Y-%m-%d %H:%M:%S'))

print("\n== NVIDIA SMI ==")
try:
    res = subprocess.run(['bash','-lc','nvidia-smi || true'], capture_output=True, text=True, check=False)
    print(res.stdout)
except Exception as e:
    print("nvidia-smi error:", e)

print("\n== CPU/RAM quick check ==")
try:
    import psutil
    vm = psutil.virtual_memory()
    print(f"CPU cores: {psutil.cpu_count(logical=True)} | RAM: {vm.total/1e9:.1f} GB, avail {vm.available/1e9:.1f} GB")
except Exception:
    print("psutil not available; skipping RAM check")

print("\nIf GPU is unavailable, exit the competition per best-practices.")

Python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
Platform: Linux-6.8.0-1031-azure-x86_64-with-glibc2.35
Time: 2025-09-28 23:58:14

== NVIDIA SMI ==
Sun Sep 28 23:58:14 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     128MiB /  24512MiB |      0%      Default |
|                                         |                        |    
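
The check cell above only prints a reminder, and `nvidia-smi || true` hides a failing exit status, so nothing actually stops the run when no GPU is present. A hard gate could look like the sketch below; `gpu_visible` is a hypothetical helper, and treating "`nvidia-smi -L` lists at least one GPU" as the criterion is an assumption.

```python
import shutil
import subprocess

def gpu_visible() -> bool:
    """True only if nvidia-smi is on PATH, exits with status 0, and lists a GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one "GPU <n>: <name> (UUID: ...)" line per device.
        res = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return res.returncode == 0 and "GPU" in res.stdout

HAS_GPU = gpu_visible()
print("GPU visible:", HAS_GPU)
# To enforce the plan's "exit if no GPU" rule, one could add:
#     if not HAS_GPU: raise SystemExit("No GPU found")
```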

In [6]:
# Data load, schema check, and CV folds creation
import pandas as pd, numpy as np, os, hashlib, re
from sklearn.model_selection import GroupKFold

pd.set_option('display.max_columns', 200)

train_path, test_path = 'train.csv', 'test.csv'
print('Loading data...')
train = pd.read_csv(train_path)
test = pd.read_csv(test_path)
print('Train shape:', train.shape, 'Test shape:', test.shape)
print('Train columns:', list(train.columns))
print('Test columns:', list(test.columns))

# Identify ID and target columns from sample_submission to avoid schema mistakes
sample_sub = pd.read_csv('sample_submission.csv')
print('Sample submission columns:', list(sample_sub.columns))

# Determine ID col and target cols
id_col = sample_sub.columns[0]
target_cols = [c for c in sample_sub.columns if c != id_col]
print('ID column:', id_col, '| #Targets:', len(target_cols))

# Primary target for this task
primary_target = 'answer_helpful'
assert primary_target in train.columns, f'Primary target {primary_target} not found in train columns'
assert primary_target in target_cols, f'Primary target {primary_target} not in sample_submission columns'
assert id_col in train.columns and id_col in test.columns, 'ID column missing in train/test'

# Preview a few rows to verify text fields exist
text_fields = [c for c in ['question_title','question_body','answer'] if c in train.columns]
print('Detected text fields:', text_fields)
print(train[text_fields + [primary_target]].head(2) if text_fields else train.head(2))

# Target stats (primary)
y = train[primary_target].astype(float)
print('Primary target describe:')
print(y.describe())
print('Primary target quantiles (0, .1, .2, ..., 1):')
qs = np.linspace(0,1,11)
print(pd.Series(np.quantile(y, qs), index=qs))

# Simple duplicate/near-duplicate check on question identity to inform CV groups
def stable_hash(s: str) -> str:
    return hashlib.md5(s.encode('utf-8')).hexdigest()

if set(['question_title','question_body']).issubset(train.columns):
    qsig = (train['question_title'].fillna('') + '\n' + train['question_body'].fillna('')).astype(str)
    train['_qhash'] = qsig.apply(stable_hash)
    dup_rate = 1.0 - train['_qhash'].nunique() / len(train)
    print(f'Question signature duplicate rate: {dup_rate:.3f}')
else:
    train['_qhash'] = train[id_col].astype(str)
    print('No question fields found for duplicate check; using ID as group placeholder.')

# Create 5-fold GroupKFold splits grouped by question hash to prevent leakage
n_splits = 5
gkf = GroupKFold(n_splits=n_splits)
groups = train['_qhash'].values
train['fold'] = -1
for fold, (tr_idx, va_idx) in enumerate(gkf.split(train, y.values, groups=groups)):
    train.loc[va_idx, 'fold'] = fold
fold_counts = train['fold'].value_counts().sort_index()
print('Fold counts:', fold_counts.to_dict())
print('Fold primary target means:', train.groupby('fold')[primary_target].mean().round(4).to_dict())

# Persist folds for reuse across notebooks/models
folds_out = 'folds.npy'
np.save(folds_out, train['fold'].values.astype(int))
train[[id_col, 'fold', primary_target]].to_csv('train_folds.csv', index=False)
print(f'Saved folds to {folds_out} and train_folds.csv')

# Save a minimal schema snapshot to guard against later drift
with open('schema_info.txt','w') as f:
    f.write('ID:'+id_col+'\n')
    f.write('PRIMARY_TARGET:'+primary_target+'\n')
    f.write('ALL_TARGETS:'+','.join(target_cols)+'\n')
    f.write('TEXT:'+','.join(text_fields)+'\n')
print('Wrote schema_info.txt')

Loading data...
Train shape: (5471, 41) Test shape: (608, 11)
Train columns: ['qa_id', 'question_title', 'question_body', 'question_user_name', 'question_user_page', 'answer', 'answer_user_name', 'answer_user_page', 'url', 'category', 'host', 'question_asker_intent_understanding', 'question_body_critical', 'question_conversational', 'question_expect_short_answer', 'question_fact_seeking', 'question_has_commonly_accepted_answer', 'question_interestingness_others', 'question_interestingness_self', 'question_multi_intent', 'question_not_really_a_question', 'question_opinion_seeking', 'question_type_choice', 'question_type_compare', 'question_type_consequence', 'question_type_definition', 'question_type_entity', 'question_type_instructions', 'question_type_procedure', 'question_type_reason_explanation', 'question_type_spelling', 'question_well_written', 'answer_helpful', 'answer_level_of_information', 'answer_plausible', 'answer_relevance', 'answer_satisfaction', 'answer_type_instructions'
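
Since the folds are saved and reused by row order, it can be worth confirming that no question group ended up in more than one fold. The sketch below shows one way to do that on toy data; `question_key` mirrors the notebook's MD5 signature of title and body, while `groups_split_across_folds` and the toy rows are illustrative assumptions.

```python
import hashlib
from collections import defaultdict

def question_key(title: str, body: str) -> str:
    """Stable signature of a question: MD5 of title + newline + body."""
    return hashlib.md5((title + "\n" + body).encode("utf-8")).hexdigest()

def groups_split_across_folds(groups, folds):
    """Return {group: set_of_folds} for every group that appears in 2+ folds."""
    seen = defaultdict(set)
    for g, f in zip(groups, folds):
        seen[g].add(f)
    return {g: fs for g, fs in seen.items() if len(fs) > 1}

# Toy rows: the first two share the same question, the third is different.
rows = [("How to sort?", "body A"), ("How to sort?", "body A"), ("Other", "body B")]
groups = [question_key(t, b) for t, b in rows]

clean = groups_split_across_folds(groups, [0, 0, 1])   # duplicates stay together
leaky = groups_split_across_folds(groups, [0, 1, 1])   # duplicates are separated
print("clean:", clean)
print("leaky folds:", list(leaky.values()))
```

In the notebook, the same check would take `train['_qhash']` and `train['fold']` as inputs and should return an empty dict for GroupKFold splits.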

In [8]:
# Patched Baseline per expert: GroupKFold + TF-IDF (word+char, tuned) + stats + RidgeCV on rank target; no per-fold rank OOF
import time, gc, math, random
import numpy as np, pandas as pd, scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeCV
from scipy.stats import spearmanr, rankdata

np.random.seed(42); random.seed(42)

# Reload essentials and folds
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
sample_sub = pd.read_csv('sample_submission.csv')
id_col = sample_sub.columns[0]
target_cols = [c for c in sample_sub.columns if c != id_col]
primary_target = 'answer_helpful'
folds = np.load('folds.npy')
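base_fields = ['question_title','question_body','answer']
# Fail fast if any required text field is absent (make_corpus needs all three)
missing_fields = [f for f in base_fields if f not in train.columns]
assert not missing_fields, f'Missing required text fields: {missing_fields}'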

# Cleaners: prefer BeautifulSoup, fallback to regex HTML strip
import re
try:
    from bs4 import BeautifulSoup
    _USE_BS = True
except Exception:
    _USE_BS = False
URL_RE = re.compile(r'https?://\S+|www\.\S+')
EMAIL_RE = re.compile(r'\b[\w\.-]+@[\w\.-]+\.[A-Za-z]{2,}\b')
CODE_BLOCK_RE = re.compile(r'`{1,3}.*?`{1,3}', re.S)
HTML_TAG_RE = re.compile(r'<[^>]+>')
DIGIT_RE = re.compile(r'\d')
# Plain hyphen and quotes; the original '\-' was an invalid escape that also added a backslash to the set
PUNCT_CHARS = set('.,!?:;-"\'’”')
STOPWORDS = set('the a an and or to of in for is it on with this that as at by be are was were from has have had you your we they he she them his her its our their i'.split())

def clean_text(s: str) -> str:
    if not isinstance(s, str):
        s = '' if pd.isna(s) else str(s)
    if _USE_BS:
        try:
            s = BeautifulSoup(s, 'html.parser').get_text(separator=' ')
        except Exception:
            s = HTML_TAG_RE.sub(' ', s)
    else:
        s = HTML_TAG_RE.sub(' ', s)
    s = CODE_BLOCK_RE.sub(' [CODE] ', s)
    s = URL_RE.sub(' [URL] ', s)
    s = EMAIL_RE.sub(' [EMAIL] ', s)
    return s.lower()

def make_corpus(df: pd.DataFrame):
    c = {f: df[f].fillna('').map(clean_text).astype(str).values for f in base_fields}
    qa_all = (c['question_title'] + ' [T] ' + c['question_body'] + ' [A] ' + c['answer'])
    c['qa_all'] = qa_all.values if hasattr(qa_all, 'values') else np.array(list(qa_all))
    return c

def compute_stats(corpus: dict) -> np.ndarray:
    qt = corpus['question_title']; qb = corpus['question_body']; an = corpus['answer']
    n = len(qt)
    feats = np.zeros((n, 18), dtype=np.float32)
    for i in range(n):
        s_t, s_b, s_a = qt[i], qb[i], an[i]
        lt, lb, la = len(s_t), len(s_b), len(s_a)
        wt = (s_t.count(' ') + 1) if lt>0 else 0
        wb = (s_b.count(' ') + 1) if lb>0 else 0
        wa = (s_a.count(' ') + 1) if la>0 else 0
        len_ratio = la / (lb + 1.0)
        wc_ratio  = wa / (wb + 1.0)
        qmark = s_a.count('?')
        excl  = s_a.count('!')
        digits = len(DIGIT_RE.findall(s_a))
        url_cnt = s_a.count('[url]')
        code_cnt = s_a.count('[code]')
        has_url = 1.0 if url_cnt > 0 else 0.0
        has_code = 1.0 if code_cnt > 0 else 0.0
        # overlaps
        set_t = set(s_t.split()) if lt else set()
        set_b = set(s_b.split()) if lb else set()
        set_a = set(s_a.split()) if la else set()
        def jacc(a,b):
            if not a or not b: return 0.0
            inter = len(a & b); uni = len(a | b)
            return (inter / uni) if uni>0 else 0.0
        jac_ta = jacc(set_t, set_a)
        jac_ba = jacc(set_b, set_a)
        # new stats
        starts_punct = 1.0 if la>0 and s_a[0] in PUNCT_CHARS else 0.0
        ends_punct   = 1.0 if la>0 and s_a[-1] in PUNCT_CHARS else 0.0
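        letters = sum(ch.isalpha() for ch in s_a)
        # NOTE: clean_text() lowercases its output, so `uppers` (and hence upper_frac) is effectively
        # always 0 here; compute it from the raw, un-lowercased answer if the casing signal is wanted.
        uppers  = sum(ch.isupper() for ch in s_a)
        upper_frac = (uppers / (letters + 1.0)) if letters>0 else 0.0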
        toks_a = s_a.split()
        sw_hits = sum(1 for w in toks_a if w in STOPWORDS)
        sw_ratio = sw_hits / (len(toks_a) + 1.0)
        feats[i] = [lt, lb, la, wt, wb, wa, len_ratio, wc_ratio, qmark, excl, digits, has_url, has_code, jac_ta + jac_ba, starts_punct, ends_punct, upper_frac, sw_ratio]
    feats[:, [0,1,2,3,4,5,8,9,10]] = np.log1p(feats[:, [0,1,2,3,4,5,8,9,10]])
    return feats

print('Cleaning text...')
t0 = time.time()
train_corpus = make_corpus(train)
test_corpus  = make_corpus(test)
print(f'Cleaned in {time.time() - t0:.2f}s')

print('Computing stats features...')
t1 = time.time()
train_stats = compute_stats(train_corpus)
test_stats  = compute_stats(test_corpus)
print(f'Stats ready in {time.time() - t1:.2f}s; dims train {train_stats.shape}, test {test_stats.shape}')

# Vectorizer configs per field (drop qa_all char branch), tuned caps
feature_fields = ['question_title','question_body','answer','qa_all']
vec_cfgs = {
    'question_title': dict(word_max=50000, char_max=60000),
    'question_body':  dict(word_max=120000, char_max=200000),
    'answer':         dict(word_max=150000, char_max=220000),
    'qa_all':         dict(word_max=180000, char_max=0),  # char skipped
}

def build_features(X_text: dict, fit: bool, vecs_store: dict | None):
    mats = []
    for field in feature_fields:
        # word
        wkey = (field, 'word')
        if fit:
            v_w = TfidfVectorizer(analyzer='word', ngram_range=(1,2), min_df=3, max_df=0.97,
                                   sublinear_tf=True, strip_accents='unicode',
                                   max_features=vec_cfgs[field]['word_max'])
            mat_w = v_w.fit_transform(X_text[field])
            vecs_store[wkey] = v_w
        else:
            v_w = vecs_store[wkey]
            mat_w = v_w.transform(X_text[field])
        mats.append(mat_w)
        # char (skip for qa_all)
        if field != 'qa_all':
            ckey = (field, 'char')
            if fit:
                v_c = TfidfVectorizer(analyzer='char_wb', ngram_range=(3,6), min_df=3,
                                       sublinear_tf=True,
                                       max_features=vec_cfgs[field]['char_max'])
                mat_c = v_c.fit_transform(X_text[field])
                vecs_store[ckey] = v_c
            else:
                v_c = vecs_store[ckey]
                mat_c = v_c.transform(X_text[field])
            mats.append(mat_c)
    X = sp.hstack(mats).tocsr()
    return X

def to_rank01(v: np.ndarray) -> np.ndarray:
    r = rankdata(v, method='average').astype(np.float64)
    denom = max(1.0, len(r) - 1)
    return ((r - 1.0) / denom).astype(np.float32)

# CV training
y = train[primary_target].astype(float).values
unique_folds = np.unique(folds)
oof = np.zeros(len(train), dtype=np.float32)
test_preds_accum = np.zeros((len(unique_folds), len(test)), dtype=np.float32)

print('Starting CV...')
overall_t0 = time.time()
for fi, fold in enumerate(unique_folds):
    f_t0 = time.time()
    tr_idx = np.where(folds != fold)[0]
    va_idx = np.where(folds == fold)[0]
    print(f'Fold {fold}: tr={len(tr_idx)} va={len(va_idx)}')
    vecs = {}
    X_tr = build_features({k: train_corpus[k][tr_idx] for k in feature_fields}, fit=True, vecs_store=vecs)
    X_va = build_features({k: train_corpus[k][va_idx] for k in feature_fields}, fit=False, vecs_store=vecs)
    X_te = build_features(test_corpus, fit=False, vecs_store=vecs)
    # Add stats
    X_tr = sp.hstack([X_tr, sp.csr_matrix(train_stats[tr_idx])]).tocsr()
    X_va = sp.hstack([X_va, sp.csr_matrix(train_stats[va_idx])]).tocsr()
    X_te = sp.hstack([X_te, sp.csr_matrix(test_stats)]).tocsr()
    # Model with rank-transformed target + tiny jitter
    jitter = (np.random.rand(len(tr_idx)).astype(np.float32) - 0.5) * 2e-6
    y_tr_rank = rankdata(y[tr_idx] + jitter, method='average') / len(tr_idx)
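    # NOTE: the logged runs select alpha=20.0 (the upper edge of this grid) in every fold,
    # so larger alphas may be worth adding to the search.
    model = RidgeCV(alphas=[4.0, 6.0, 8.0, 12.0, 20.0], fit_intercept=True)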
    m_t0 = time.time()
    model.fit(X_tr, y_tr_rank.astype(np.float32))
    print(f'  Fit time: {time.time()-m_t0:.2f}s, nnz/tr {X_tr.nnz/ max(1,X_tr.shape[0]):.1f}, alpha*={getattr(model, "alpha_", None)}')
    # Predict (raw per fold, do NOT rank-normalize for OOF)
    va_pred = model.predict(X_va).astype(np.float32)
    oof[va_idx] = va_pred
    te_pred = model.predict(X_te).astype(np.float32)
    test_preds_accum[fi] = te_pred
    # Metrics
    rho = spearmanr(va_pred, y[va_idx]).correlation
    print(f'  Fold {fold} Spearman: {rho:.5f}, elapsed {time.time()-f_t0:.1f}s', flush=True)
    del X_tr, X_va, X_te, vecs, model, va_pred, te_pred
    gc.collect()

oo_rho = spearmanr(oof, y).correlation
print(f'OOF Spearman (primary {primary_target}): {oo_rho:.5f}')

# Aggregate test preds by averaging raw fold predictions; optional final rank to [0,1]
test_raw = test_preds_accum.mean(axis=0).astype(np.float32)
test_final = np.clip(to_rank01(test_raw), 0.0, 1.0).astype(np.float32)
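# Clip OOF to [0, 1] before saving; clipping can create ties, so Spearman on the saved file may differ slightly from the value printed above
oof_clip = np.clip(oof, 0.0, 1.0).astype(np.float32)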

# Save OOF/test preds
np.save(f'oof_{primary_target}.npy', oof_clip)
np.save(f'test_{primary_target}.npy', test_final)

# Build submission with full schema; TODO: replace placeholders by real per-target models later
sub = sample_sub.copy()
sub[id_col] = test[id_col].values
fill_means = train[target_cols].mean().clip(0,1)
for col in target_cols:
    if col == primary_target:
        sub[col] = test_final
    else:
        sub[col] = float(fill_means[col])
sub.to_csv('submission.csv', index=False)
print('Wrote submission.csv with primary model and other targets filled by train means (temporary).')

print(f'Total CV time: {time.time()-overall_t0:.1f}s')

Cleaning text...
Cleaned in 0.41s
Computing stats features...
Stats ready in 0.61s; dims train (5471, 18), test (608, 18)
Starting CV...
Fold 0: tr=4376 va=1095
  Fit time: 16.12s, nnz/tr 2611.8, alpha*=20.0
  Fold 0 Spearman: 0.21306, elapsed 30.0s
Fold 1: tr=4377 va=1094
  Fit time: 16.31s, nnz/tr 2619.9, alpha*=20.0
  Fold 1 Spearman: 0.22673, elapsed 30.2s
Fold 2: tr=4377 va=1094
  Fit time: 16.15s, nnz/tr 2614.6, alpha*=20.0
  Fold 2 Spearman: 0.18055, elapsed 30.1s
Fold 3: tr=4377 va=1094
  Fit time: 16.59s, nnz/tr 2647.1, alpha*=20.0
  Fold 3 Spearman: 0.18591, elapsed 30.4s
Fold 4: tr=4377 va=1094
  Fit time: 16.36s, nnz/tr 2636.5, alpha*=20.0
  Fold 4 Spearman: 0.15876, elapsed 30.3s
OOF Spearman (primary answer_helpful): 0.19227
Wrote submission.csv with primary model and other targets filled by train means (temporary).
Total CV time: 151.6s
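
The per-fold scores in the log vary noticeably, so it helps to summarize their spread before judging later improvements. The snippet below is a small sketch using only the five fold values printed above; treating a change smaller than roughly one standard deviation as noise is a rule of thumb assumed here, not something the log establishes.

```python
import statistics

# Per-fold Spearman values copied from the CV log above.
fold_rho = [0.21306, 0.22673, 0.18055, 0.18591, 0.15876]

mean_rho = statistics.fmean(fold_rho)
std_rho = statistics.stdev(fold_rho)          # sample standard deviation
spread = max(fold_rho) - min(fold_rho)

print(f"mean={mean_rho:.4f} std={std_rho:.4f} range={spread:.4f}")
```

The mean of the fold scores (about 0.193) is close to, but not identical to, the pooled OOF Spearman of 0.19227, because ranking all rows together is not the same as averaging per-fold correlations.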


In [9]:
# Install GPU-enabled PyTorch (cu121) and transformers stack; then sanity-check CUDA
import os, sys, subprocess, shutil, time
from pathlib import Path

def pip(*args):
    print('>', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

# Uninstall any existing torch stack to avoid conflicts
for pkg in ('torch','torchvision','torchaudio'):
    subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', pkg], check=False)

# Clean stray site dirs that can shadow correct wheels (idempotent)
for d in (
    '/app/.pip-target/torch',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torch-2.8.0.dist-info',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torchvision-0.23.0.dist-info',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchaudio-2.8.0.dist-info',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
    '/app/.pip-target/torchgen',
    '/app/.pip-target/functorch',
):
    if os.path.exists(d):
        print('Removing', d)
        shutil.rmtree(d, ignore_errors=True)

# 1) Install exact cu121 torch stack
pip('install',
    '--index-url', 'https://download.pytorch.org/whl/cu121',
    '--extra-index-url', 'https://pypi.org/simple',
    'torch==2.4.1', 'torchvision==0.19.1', 'torchaudio==2.4.1')

# 2) Freeze versions for later installs
Path('constraints.txt').write_text(
    'torch==2.4.1\n'
    'torchvision==0.19.1\n'
    'torchaudio==2.4.1\n'
)

# 3) Install transformers stack honoring constraints
pip('install', '-c', 'constraints.txt',
    'transformers==4.44.2', 'accelerate==0.34.2',
    'datasets==2.21.0', 'evaluate==0.4.2',
    'sentencepiece', 'scikit-learn', '--upgrade-strategy', 'only-if-needed')

# 4) Sanity gate
import torch
print('torch:', torch.__version__, 'built CUDA:', getattr(torch.version, 'cuda', None))
print('CUDA available:', torch.cuda.is_available())
assert str(getattr(torch.version,'cuda','')).startswith('12.1'), f'Wrong CUDA build: {torch.version.cuda}'
assert torch.cuda.is_available(), 'CUDA not available'
print('GPU:', torch.cuda.get_device_name(0))





> install --index-url https://download.pytorch.org/whl/cu121 --extra-index-url https://pypi.org/simple torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
Looking in indexes: https://download.pytorch.org/whl/cu121, https://pypi.org/simple
Collecting torch==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torch-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (799.0 MB)
Collecting torchvision==0.19.1
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.19.1%2Bcu121-cp311-cp311-linux_x86_64.whl (7.1 MB)
Collecting torchaudio==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (3.4 MB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
Collecting fsspec
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)
Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
Collecting pillow!=8.3.*,>=5.3.0
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
Collecting numpy
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)
Collecting MarkupSafe>=2.0
  Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB)
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)


Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio


Successfully installed MarkupSafe-3.0.3 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.3.0 sympy-1.14.0 torch-2.4.1+cu121 torchaudio-2.4.1+cu121 torchvision-0.19.1+cu121 triton-3.0.0 typing-extensions-4.15.0


> install -c constraints.txt transformers==4.44.2 accelerate==0.34.2 datasets==2.21.0 evaluate==0.4.2 sentencepiece scikit-learn --upgrade-strategy only-if-needed


Collecting transformers==4.44.2
  Downloading transformers-4.44.2-py3-none-any.whl (9.5 MB)
Collecting accelerate==0.34.2
  Downloading accelerate-0.34.2-py3-none-any.whl (324 kB)
Collecting datasets==2.21.0
  Downloading datasets-2.21.0-py3-none-any.whl (527 kB)
Collecting evaluate==0.4.2
  Downloading evaluate-0.4.2-py3-none-any.whl (84 kB)
Collecting sentencepiece
  Downloading sentencepiece-0.2.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.4 MB)
Collecting scikit-learn
  Downloading scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (9.7 MB)
Collecting packaging>=20.0
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
Collecting tqdm>=4.27
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Collecting huggingface-hub<1.0,>=0.23.2
  Downloading huggingface_hub-0.35.1-py3-none-any.whl (563 kB)
Collecting safetensors>=0.4.1
  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
Collecting numpy>=1.17
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Collecting regex!=2019.12.17
  Downloading regex-2025.9.18-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (798 kB)
Collecting tokenizers<0.20,>=0.19
  Downloading tokenizers-0.19.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
Collecting pyyaml>=5.1
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
Collecting torch>=1.10.0
  Downloading torch-2.4.1-cp311-cp311-manylinux1_x86_64.whl (797.1 MB)
Collecting psutil
  Downloading psutil-7.1.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291 kB)
Collecting fsspec[http]<=2024.6.1,>=2023.1.0
  Downloading fsspec-2024.6.1-py3-none-any.whl (177 kB)
Collecting aiohttp
  Downloading aiohttp-3.12.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Collecting pandas
  Downloading pandas-2.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)
Collecting multiprocess
  Downloading multiprocess-0.70.18-py311-none-any.whl (144 kB)
Collecting pyarrow>=15.0.0
  Downloading pyarrow-21.0.0-cp311-cp311-manylinux_2_28_x86_64.whl (42.8 MB)
Collecting dill<0.3.9,>=0.3.0
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Collecting xxhash
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
Collecting scipy>=1.8.0
  Downloading scipy-1.16.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (35.9 MB)
Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Collecting frozenlist>=1.1.1
  Downloading frozenlist-1.7.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (235 kB)
Collecting yarl<2.0,>=1.17.0
  Downloading yarl-1.20.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (348 kB)
Collecting aiohappyeyeballs>=2.5.0
  Downloading aiohappyeyeballs-2.6.1-py3-none-any.whl (15 kB)
Collecting attrs>=17.3.0
  Downloading attrs-25.3.0-py3-none-any.whl (63 kB)
Collecting multidict<7.0,>=4.5
  Downloading multidict-6.6.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (246 kB)
Collecting aiosignal>=1.4.0
  Downloading aiosignal-1.4.0-py3-none-any.whl (7.5 kB)
Collecting propcache>=0.2.0
  Downloading propcache-0.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (213 kB)
Collecting hf-xet<2.0.0,>=1.1.3
  Downloading hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting typing-extensions>=3.7.4.3
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (150 kB)
Collecting idna<4,>=2.5
  Downloading idna-3.10-py3-none-any.whl (70 kB)
Collecting certifi>=2017.4.17
  Downloading certifi-2025.8.3-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 506.8 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 97.9 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 472.2 MB/s eta 0:00:00
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 455.6 MB/s eta 0:00:00


Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 95.4 MB/s eta 0:00:00


Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 286.1 MB/s eta 0:00:00
Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 435.4 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 168.1 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 167.1 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 91.1 MB/s eta 0:00:00
Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 94.0 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 243.7 MB/s eta 0:00:00
Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 313.0 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 142.5 MB/s eta 0:00:00


Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 554.7 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 119.1 MB/s eta 0:00:00


Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 218.0 MB/s eta 0:00:00
Collecting multiprocess
  Downloading multiprocess-0.70.17-py311-none-any.whl (144 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 144.3/144.3 KB 520.4 MB/s eta 0:00:00
  Downloading multiprocess-0.70.16-py311-none-any.whl (143 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 143.5/143.5 KB 505.8 MB/s eta 0:00:00
Collecting tzdata>=2022.7
  Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 347.8/347.8 KB 541.3 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.2
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 KB 469.3 MB/s eta 0:00:00


Collecting pytz>=2020.1
  Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 509.2/509.2 KB 380.3 MB/s eta 0:00:00
Collecting six>=1.5
  Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)


Collecting MarkupSafe>=2.0
  Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB)
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 532.9 MB/s eta 0:00:00


Installing collected packages: pytz, mpmath, xxhash, urllib3, tzdata, typing-extensions, tqdm, threadpoolctl, sympy, six, sentencepiece, safetensors, regex, pyyaml, pyarrow, psutil, propcache, packaging, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, multidict, MarkupSafe, joblib, idna, hf-xet, fsspec, frozenlist, filelock, dill, charset_normalizer, certifi, attrs, aiohappyeyeballs, yarl, triton, scipy, requests, python-dateutil, nvidia-cusparse-cu12, nvidia-cudnn-cu12, multiprocess, jinja2, aiosignal, scikit-learn, pandas, nvidia-cusolver-cu12, huggingface-hub, aiohttp, torch, tokenizers, transformers, datasets, accelerate, evaluate


Successfully installed MarkupSafe-3.0.3 accelerate-0.34.2 aiohappyeyeballs-2.6.1 aiohttp-3.12.15 aiosignal-1.4.0 attrs-25.3.0 certifi-2025.8.3 charset_normalizer-3.4.3 datasets-2.21.0 dill-0.3.8 evaluate-0.4.2 filelock-3.19.1 frozenlist-1.7.0 fsspec-2024.6.1 hf-xet-1.1.10 huggingface-hub-0.35.1 idna-3.10 jinja2-3.1.6 joblib-1.5.2 mpmath-1.3.0 multidict-6.6.4 multiprocess-0.70.16 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 packaging-25.0 pandas-2.3.2 propcache-0.3.2 psutil-7.1.0 pyarrow-21.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 pyyaml-6.0.3 regex-2025.9.18 requests-2.32.5 safetensors-0.6.2 scikit-learn-1.7.2 scipy-1.16.2 sentencepiece-0.2.1 six-1.17.0

torch: 2.4.1+cu121 built CUDA: 12.1
CUDA available: True
GPU: NVIDIA A10-24Q


In [14]:
# DeBERTa-v3-base multitask (30 targets) with improved pooling and diagnostics (patched per expert)
import gc, math, time
import numpy as np, pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModel, get_cosine_schedule_with_warmup
from scipy.stats import spearmanr

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

# Load data and folds
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
sample_sub = pd.read_csv('sample_submission.csv')
id_col = sample_sub.columns[0]
target_cols = [c for c in sample_sub.columns if c != id_col]
assert target_cols == list(sample_sub.columns[1:]), 'target_cols order drift'
assert all(c in train.columns for c in target_cols), 'Missing targets in train'
print('Target dtypes:', train[target_cols].dtypes.unique())
folds = np.load('folds.npy')

# Text fields
title_col, body_col, ans_col = 'question_title', 'question_body', 'answer'
assert all(c in train.columns for c in [title_col, body_col, ans_col])

ANSWER_HELPFUL_IDX = target_cols.index('answer_helpful') if 'answer_helpful' in target_cols else None

# Model/Tokenizer
model_name = 'microsoft/deberta-v3-base'
tokenizer = AutoTokenizer.from_pretrained(model_name)
MAX_LEN = 512

def pack_inputs(title, body, answer):
    # Let tokenizer handle pair separation; disable token_type_ids
    text = f'Title: {title} Body: {body}'
    text_pair = f'Answer: {answer}'
    return tokenizer(text=text, text_pair=text_pair, truncation=True, padding='max_length',
                    max_length=MAX_LEN, return_tensors='pt', return_token_type_ids=False)

class QADataset(Dataset):
    def __init__(self, df, targets=None):
        self.titles = df[title_col].fillna('').astype(str).values
        self.bodies = df[body_col].fillna('').astype(str).values
        self.answers = df[ans_col].fillna('').astype(str).values
        self.targets = None if targets is None else np.asarray(targets, dtype=np.float32)
    def __len__(self):
        return len(self.titles)
    def __getitem__(self, idx):
        enc = pack_inputs(self.titles[idx], self.bodies[idx], self.answers[idx])
        item = {k: v.squeeze(0) for k, v in enc.items()}
        if self.targets is not None:
            item['labels'] = torch.tensor(self.targets[idx], dtype=torch.float32)  # shape [30]
        return item

def spearman_cols(y_pred: np.ndarray, y_true: np.ndarray):
    rhos = []
    for i in range(y_pred.shape[1]):
        try:
            r = spearmanr(y_pred[:, i], y_true[:, i]).correlation
        except Exception:
            r = np.nan
        rhos.append(0.0 if np.isnan(r) else float(r))
    return float(np.mean(rhos)), rhos

class EMA:
    def __init__(self, model, decay=0.99):
        self.decay = decay
        self.shadow = {}
        for name, p in model.named_parameters():
            if p.requires_grad:
                self.shadow[name] = p.detach().clone()
    @torch.no_grad()
    def update(self, model):
        for name, p in model.named_parameters():
            if p.requires_grad:
                self.shadow[name].mul_(self.decay).add_(p.detach(), alpha=1.0 - self.decay)
    def apply_to(self, model):
        self.backup = {}
        for name, p in model.named_parameters():
            if p.requires_grad:
                self.backup[name] = p.detach().clone()
                p.data.copy_(self.shadow[name].data)
    def restore(self, model):
        for name, p in model.named_parameters():
            if p.requires_grad and name in self.backup:
                p.data.copy_(self.backup[name])
        self.backup = {}

def masked_mean_pooling(last_hidden_state, attention_mask):
    mask = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    masked = last_hidden_state * mask
    summed = masked.sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-6)
    return summed / counts

class WeightedLayerPooling(nn.Module):
    def __init__(self, num_layers: int, layer_start: int = -4):
        super().__init__()
        self.layer_start = layer_start
        n = -layer_start
        self.weights = nn.Parameter(torch.ones(n) / n)
    def forward(self, all_hidden_states):
        selected = all_hidden_states[self.layer_start:]
        stacked = torch.stack(selected, dim=0)  # [n, bs, seq, hidden]
        w = torch.softmax(self.weights, dim=0).view(-1, 1, 1, 1)
        return (w * stacked).sum(dim=0)  # [bs, seq, hidden]

class DebertaMT(nn.Module):
    def __init__(self, name, out_dim=30, dropout_p=0.2, msd_k=1):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(name)
        if hasattr(self.backbone, 'gradient_checkpointing_enable'):
            self.backbone.gradient_checkpointing_enable()
        hidden = self.backbone.config.hidden_size
        self.layer_pool = WeightedLayerPooling(num_layers=getattr(self.backbone.config, 'num_hidden_layers', 12), layer_start=-4)
        self.msd_k = msd_k
        self.dropouts = nn.ModuleList([nn.Dropout(dropout_p) for _ in range(msd_k)])
        self.head = nn.Linear(hidden, out_dim)
        self.loss_fn = nn.SmoothL1Loss(reduction='mean')
    def forward(self, input_ids=None, attention_mask=None, labels=None):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask, output_hidden_states=True)
        pooled_seq = self.layer_pool(out.hidden_states)
        feat = masked_mean_pooling(pooled_seq, attention_mask)
        logits_accum = 0
        for dp in self.dropouts:
            logits_accum = logits_accum + self.head(dp(feat))
        logits = logits_accum / self.msd_k
        loss = None
        if labels is not None:
            loss = self.loss_fn(logits, labels)
        return logits, loss

def run_fold(fold, train_idx, val_idx):
    print(f'Fold {fold} start: tr={len(train_idx)} va={len(val_idx)}')
    df_tr = train.iloc[train_idx].reset_index(drop=True)
    df_va = train.iloc[val_idx].reset_index(drop=True)
    y_tr = df_tr[target_cols].astype(np.float32).values
    y_va = df_va[target_cols].astype(np.float32).values

    ds_tr = QADataset(df_tr, y_tr)
    ds_va = QADataset(df_va, y_va)
    ds_te = QADataset(test, None)

    train_loader = DataLoader(ds_tr, batch_size=16, shuffle=True, num_workers=2, pin_memory=True)
    val_loader   = DataLoader(ds_va, batch_size=32, shuffle=False, num_workers=2, pin_memory=True)
    test_loader  = DataLoader(ds_te, batch_size=32, shuffle=False, num_workers=2, pin_memory=True)

    model = DebertaMT(model_name, out_dim=len(target_cols), dropout_p=0.2, msd_k=1).to(device)
    # Initialize head bias to 0.5, near the global target mean (~0.475), so initial predictions start on the targets' scale
    with torch.no_grad():
        if hasattr(model.head, 'bias') and model.head.bias is not None:
            model.head.bias.fill_(0.5)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01, betas=(0.9,0.999), eps=1e-6)
    num_epochs = 4
    grad_accum = 2  # effective batch 32
    num_training_steps = math.ceil(len(train_loader) / grad_accum) * num_epochs
    warmup_steps = max(10, int(0.1 * num_training_steps))
    scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=warmup_steps, num_training_steps=num_training_steps)

    scaler = torch.amp.GradScaler('cuda', enabled=True)
    ema = EMA(model, decay=0.99)

    best_score = -1.0
    best_val_preds = None

    t0 = time.time()
    global_step = 0
    for epoch in range(num_epochs):
        model.train()
        tr_loss = 0.0
        optimizer.zero_grad(set_to_none=True)
        for step, batch in enumerate(train_loader):
            inputs = {k: v.to(device, non_blocking=True) for k, v in batch.items() if k not in ('labels','token_type_ids')}
            labels = batch['labels'].to(device, non_blocking=True)
            with torch.amp.autocast('cuda', enabled=True):
                logits, loss = model(**inputs, labels=labels)
                loss = loss / grad_accum
            scaler.scale(loss).backward()
            if (step + 1) % grad_accum == 0:
                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad(set_to_none=True)
                scheduler.step()
                # EMA updates begin after warmup; until then the shadow holds the initial weights
                if global_step >= warmup_steps:
                    ema.update(model)
                global_step += 1
            tr_loss += loss.item() * grad_accum
            if (step+1) % 100 == 0:
                print(f'  Epoch {epoch+1} step {step+1}/{len(train_loader)} loss={tr_loss/(step+1):.4f}', flush=True)

        # Validation: compare plain vs EMA
        def evaluate(use_ema: bool):
            if use_ema:
                ema.apply_to(model)
            model.eval()
            preds, tgts = [], []
            with torch.no_grad():
                for batch in val_loader:
                    inputs = {k: v.to(device, non_blocking=True) for k, v in batch.items() if k not in ('labels','token_type_ids')}
                    labels = batch['labels']
                    logits, _ = model(**inputs, labels=None)
                    preds.append(logits.float().cpu().numpy())
                    tgts.append(labels.float().cpu().numpy())
            if use_ema:
                ema.restore(model)
            preds = np.concatenate(preds, axis=0)
            tgts = np.concatenate(tgts, axis=0)
            score, per_col = spearman_cols(preds, tgts)
            return score, per_col, preds

        s_plain, per_plain, vp_plain = evaluate(use_ema=False)
        s_ema,   per_ema,   vp_ema   = evaluate(use_ema=True)
        p_stats = vp_plain.ravel()
        t_stats = df_va[target_cols].to_numpy(dtype=np.float32).ravel()
        print(f'    val preds (plain) min/max/mean/std: {p_stats.min():.3f}/{p_stats.max():.3f}/{p_stats.mean():.3f}/{p_stats.std():.3f}')
        print(f'    val tgts           min/max/mean/std: {t_stats.min():.3f}/{t_stats.max():.3f}/{t_stats.mean():.3f}/{t_stats.std():.3f}')
        if ANSWER_HELPFUL_IDX is not None:
            print(f"    answer_helpful Spearman plain/EMA: {per_plain[ANSWER_HELPFUL_IDX]:.5f}/{per_ema[ANSWER_HELPFUL_IDX]:.5f}")
        print(f'  Epoch {epoch+1} mean-30 Spearman plain/EMA: {s_plain:.5f}/{s_ema:.5f} | time {(time.time()-t0):.1f}s')
        score = s_plain if s_plain >= s_ema else s_ema
        val_preds = vp_plain if s_plain >= s_ema else vp_ema
        if score > best_score:
            best_score = score
            best_val_preds = val_preds.copy()

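    # Test prediction always uses the final-epoch EMA weights.
    # NOTE: best_val_preds may come from the plain weights or from an earlier epoch,
    # so OOF and test predictions are not guaranteed to come from the same weights.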
    ema.apply_to(model)
    model.eval()
    test_preds = []
    with torch.no_grad():
        for batch in test_loader:
            inputs = {k: v.to(device, non_blocking=True) for k, v in batch.items() if k not in ('labels','token_type_ids')}
            logits, _ = model(**inputs, labels=None)
            test_preds.append(logits.float().cpu().numpy())
    ema.restore(model)
    test_preds = np.concatenate(test_preds, axis=0)

    # Drop every GPU-holding object (including the EMA shadow) before releasing cached memory
    del model, ema, optimizer, scheduler, scaler, train_loader, val_loader, test_loader, ds_tr, ds_va, ds_te
    gc.collect()
    torch.cuda.empty_cache()
    return best_val_preds, test_preds, best_score

# Run CV
unique_folds = np.unique(folds)
oof = np.zeros((len(train), len(target_cols)), dtype=np.float32)
test_accum = np.zeros((len(unique_folds), len(test), len(target_cols)), dtype=np.float32)
fold_scores = []

overall_t0 = time.time()
for i, fold in enumerate(unique_folds):
    tr_idx = np.where(folds != fold)[0]
    va_idx = np.where(folds == fold)[0]
    va_pred, te_pred, score = run_fold(fold, tr_idx, va_idx)
    oof[va_idx] = va_pred
    test_accum[i] = te_pred
    fold_scores.append(float(score))
    print(f'Fold {fold} best mean-30 Spearman: {score:.5f}', flush=True)

oof_mean_score, oof_percol = spearman_cols(oof, train[target_cols].astype(np.float32).values)
print('Fold mean Spearmans:', [round(s,5) for s in fold_scores])
print(f'OOF mean-30 Spearman: {oof_mean_score:.5f}')

# Save OOF/test
np.save('oof_all_targets_deberta_base.npy', np.clip(oof, 0, 1).astype(np.float32))
test_pred = test_accum.mean(axis=0).astype(np.float32)
test_pred = np.clip(test_pred, 0.0, 1.0).astype(np.float32)
np.save('test_all_targets_deberta_base.npy', test_pred)

# Build submission
sub = sample_sub.copy()
sub[id_col] = test[id_col].values
for i, col in enumerate(target_cols):
    sub[col] = test_pred[:, i]
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv with multitask transformer predictions. Total time:', round(time.time()-overall_t0,1),'s')

Using device: cuda
Target dtypes: [dtype('float64')]




Fold 0 start: tr=4376 va=1095


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


  Epoch 1 step 100/274 loss=0.0696


  Epoch 1 step 200/274 loss=0.0505


    val preds (plain) min/max/mean/std: -0.160/1.080/0.478/0.341
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.415
    answer_helpful Spearman plain/EMA: 0.02759/0.01583
  Epoch 1 mean-30 Spearman plain/EMA: 0.25957/0.16082 | time 221.3s


  Epoch 2 step 100/274 loss=0.0271


  Epoch 2 step 200/274 loss=0.0264


    val preds (plain) min/max/mean/std: -0.145/1.125/0.484/0.357
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.415
    answer_helpful Spearman plain/EMA: 0.08403/0.03672
  Epoch 2 mean-30 Spearman plain/EMA: 0.32423/0.28156 | time 444.9s


  Epoch 3 step 100/274 loss=0.0236


  Epoch 3 step 200/274 loss=0.0232


    val preds (plain) min/max/mean/std: -0.220/1.184/0.483/0.360
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.415
    answer_helpful Spearman plain/EMA: 0.10203/0.08477
  Epoch 3 mean-30 Spearman plain/EMA: 0.34020/0.32822 | time 669.2s


  Epoch 4 step 100/274 loss=0.0217


  Epoch 4 step 200/274 loss=0.0218


    val preds (plain) min/max/mean/std: -0.219/1.209/0.484/0.363
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.415
    answer_helpful Spearman plain/EMA: 0.10431/0.10182
  Epoch 4 mean-30 Spearman plain/EMA: 0.34500/0.34205 | time 893.5s


Fold 0 best mean-30 Spearman: 0.34500


Fold 1 start: tr=4377 va=1094


  Epoch 1 step 100/274 loss=0.0717


  Epoch 1 step 200/274 loss=0.0514


    val preds (plain) min/max/mean/std: -0.176/1.113/0.474/0.348
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.413
    answer_helpful Spearman plain/EMA: 0.16863/0.11469
  Epoch 1 mean-30 Spearman plain/EMA: 0.24620/0.13213 | time 223.7s


  Epoch 2 step 100/274 loss=0.0270


  Epoch 2 step 200/274 loss=0.0264


    val preds (plain) min/max/mean/std: -0.179/1.139/0.473/0.355
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.413
    answer_helpful Spearman plain/EMA: 0.17789/0.17127
  Epoch 2 mean-30 Spearman plain/EMA: 0.29495/0.25601 | time 448.0s


  Epoch 3 step 100/274 loss=0.0236


  Epoch 3 step 200/274 loss=0.0232


    val preds (plain) min/max/mean/std: -0.248/1.135/0.471/0.364
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.413
    answer_helpful Spearman plain/EMA: 0.16486/0.18110
  Epoch 3 mean-30 Spearman plain/EMA: 0.31772/0.30432 | time 672.3s


  Epoch 4 step 100/274 loss=0.0215


  Epoch 4 step 200/274 loss=0.0217


    val preds (plain) min/max/mean/std: -0.246/1.146/0.476/0.364
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.413
    answer_helpful Spearman plain/EMA: 0.16748/0.17165
  Epoch 4 mean-30 Spearman plain/EMA: 0.32069/0.31730 | time 896.4s


Fold 1 best mean-30 Spearman: 0.32069


Fold 2 start: tr=4377 va=1094


  Epoch 1 step 100/274 loss=0.0668


  Epoch 1 step 200/274 loss=0.0487


    val preds (plain) min/max/mean/std: -0.187/1.126/0.463/0.349
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.414
    answer_helpful Spearman plain/EMA: 0.09119/0.05086
  Epoch 1 mean-30 Spearman plain/EMA: 0.24995/0.14592 | time 223.7s


  Epoch 2 step 100/274 loss=0.0262


  Epoch 2 step 200/274 loss=0.0256


    val preds (plain) min/max/mean/std: -0.216/1.182/0.470/0.359
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.414
    answer_helpful Spearman plain/EMA: 0.10489/0.09390
  Epoch 2 mean-30 Spearman plain/EMA: 0.30284/0.26319 | time 448.1s


  Epoch 3 step 100/274 loss=0.0230


  Epoch 3 step 200/274 loss=0.0227


    val preds (plain) min/max/mean/std: -0.195/1.189/0.473/0.362
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.414
    answer_helpful Spearman plain/EMA: 0.11766/0.10383
  Epoch 3 mean-30 Spearman plain/EMA: 0.32208/0.30705 | time 672.5s


  Epoch 4 step 100/274 loss=0.0211


  Epoch 4 step 200/274 loss=0.0213


    val preds (plain) min/max/mean/std: -0.203/1.206/0.474/0.365
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.414
    answer_helpful Spearman plain/EMA: 0.10982/0.10682
  Epoch 4 mean-30 Spearman plain/EMA: 0.32306/0.31976 | time 896.6s


Fold 2 best mean-30 Spearman: 0.32306


Fold 3 start: tr=4377 va=1094


  Epoch 1 step 100/274 loss=0.0596


  Epoch 1 step 200/274 loss=0.0451


    val preds (plain) min/max/mean/std: -0.183/1.120/0.472/0.353
    val tgts           min/max/mean/std: 0.000/1.000/0.474/0.415
    answer_helpful Spearman plain/EMA: 0.09215/0.05395
  Epoch 1 mean-30 Spearman plain/EMA: 0.25105/0.15890 | time 223.7s


  Epoch 2 step 100/274 loss=0.0263


  Epoch 2 step 200/274 loss=0.0257


    val preds (plain) min/max/mean/std: -0.178/1.157/0.476/0.362
    val tgts           min/max/mean/std: 0.000/1.000/0.474/0.415
    answer_helpful Spearman plain/EMA: 0.13999/0.11882
  Epoch 2 mean-30 Spearman plain/EMA: 0.31313/0.27545 | time 448.0s


  Epoch 3 step 100/274 loss=0.0230


  Epoch 3 step 200/274 loss=0.0225


    val preds (plain) min/max/mean/std: -0.176/1.208/0.479/0.367
    val tgts           min/max/mean/std: 0.000/1.000/0.474/0.415
    answer_helpful Spearman plain/EMA: 0.14840/0.14676
  Epoch 3 mean-30 Spearman plain/EMA: 0.33498/0.32023 | time 672.3s


  Epoch 4 step 100/274 loss=0.0215


  Epoch 4 step 200/274 loss=0.0213


    val preds (plain) min/max/mean/std: -0.169/1.182/0.477/0.368
    val tgts           min/max/mean/std: 0.000/1.000/0.474/0.415
    answer_helpful Spearman plain/EMA: 0.14625/0.14744
  Epoch 4 mean-30 Spearman plain/EMA: 0.33771/0.33383 | time 896.5s


Fold 3 best mean-30 Spearman: 0.33771


Fold 4 start: tr=4377 va=1094


  Epoch 1 step 100/274 loss=0.0567


  Epoch 1 step 200/274 loss=0.0434


    val preds (plain) min/max/mean/std: -0.165/1.132/0.477/0.353
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.413
    answer_helpful Spearman plain/EMA: -0.00698/-0.06542
  Epoch 1 mean-30 Spearman plain/EMA: 0.25801/0.15612 | time 223.8s


  Epoch 2 step 100/274 loss=0.0261


  Epoch 2 step 200/274 loss=0.0254


    val preds (plain) min/max/mean/std: -0.209/1.120/0.480/0.360
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.413
    answer_helpful Spearman plain/EMA: 0.09658/0.02579
  Epoch 2 mean-30 Spearman plain/EMA: 0.31544/0.27332 | time 448.1s


  Epoch 3 step 100/274 loss=0.0221


  Epoch 3 step 200/274 loss=0.0222


    val preds (plain) min/max/mean/std: -0.254/1.155/0.479/0.362
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.413
    answer_helpful Spearman plain/EMA: 0.12599/0.09767
  Epoch 3 mean-30 Spearman plain/EMA: 0.33381/0.31829 | time 672.2s


  Epoch 4 step 100/274 loss=0.0214


  Epoch 4 step 200/274 loss=0.0212


    val preds (plain) min/max/mean/std: -0.271/1.170/0.479/0.364
    val tgts           min/max/mean/std: 0.000/1.000/0.475/0.413
    answer_helpful Spearman plain/EMA: 0.12760/0.11942
  Epoch 4 mean-30 Spearman plain/EMA: 0.33621/0.33194 | time 896.6s


Fold 4 best mean-30 Spearman: 0.33621


Fold mean Spearmans: [0.345, 0.32069, 0.32306, 0.33771, 0.33621]
OOF mean-30 Spearman: 0.32910
Saved submission.csv with multitask transformer predictions. Total time: 4542.9 s
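
The final log lines report only the mean over all 30 targets, while the plan's target is answer_helpful. Below is a minimal sketch for reading a single column's OOF Spearman out of a prediction array. `column_spearman` is a hypothetical helper, not part of the notebook; its constant-column guard returns 0.0, which mirrors how `spearman_cols` maps NaN correlations to 0.0.

```python
import numpy as np
from scipy.stats import spearmanr

def column_spearman(y_pred, y_true, col_idx):
    """Spearman correlation for one target column (hypothetical helper)."""
    pred = np.asarray(y_pred)[:, col_idx]
    true = np.asarray(y_true)[:, col_idx]
    # Spearman is undefined for a constant column; report 0.0 like spearman_cols does for NaN
    if np.std(pred) == 0 or np.std(true) == 0:
        return 0.0
    r = spearmanr(pred, true).correlation
    return 0.0 if np.isnan(r) else float(r)
```

In this notebook it could be called as `column_spearman(np.load('oof_all_targets_deberta_base.npy'), train[target_cols].values, target_cols.index('answer_helpful'))`. The saved OOF array is clipped to [0, 1], which can introduce ties, so the value may differ slightly from one computed on the in-memory `oof`.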

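
One possible reading of the plain-vs-EMA gap in the logs (large in epoch 1, nearly gone by epoch 4) is that the EMA shadow is created from the initial weights and only starts updating after warmup. The sketch below works through that arithmetic under the cell's settings (274 batches per epoch, grad_accum=2, decay=0.99, 10% warmup). `ema_initial_weight_fraction` is a hypothetical helper, and this is an interpretation, not something the logs confirm directly.

```python
import math

def ema_initial_weight_fraction(num_batches=274, grad_accum=2, num_epochs=4,
                                decay=0.99, warmup_frac=0.1):
    """Fraction of the EMA shadow still made of the initial weights after each epoch."""
    # Scheduler length as computed in the cell: ceil(batches / grad_accum) * epochs
    total_steps = math.ceil(num_batches / grad_accum) * num_epochs
    warmup_steps = max(10, int(warmup_frac * total_steps))
    # Optimizer steps actually taken per epoch: one per full group of grad_accum batches
    steps_per_epoch = num_batches // grad_accum
    fractions = []
    for epoch in range(1, num_epochs + 1):
        # EMA is updated only on optimizer steps at or after warmup_steps
        ema_updates = max(0, epoch * steps_per_epoch - warmup_steps)
        # Each update multiplies the weight on the initial values by `decay`
        fractions.append(decay ** ema_updates)
    return fractions

if __name__ == "__main__":
    for epoch, frac in enumerate(ema_initial_weight_fraction(), start=1):
        print(f"epoch {epoch}: initial-weight share of EMA shadow = {frac:.3f}")
```

With these settings, roughly 43% of the shadow is still the initial weights after epoch 1 and under 1% after epoch 4, which matches the direction of the logged gap. Initializing the shadow at the first EMA update instead of at construction would be one way to test this.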