# Plan: Tweet Sentiment Extraction (Medal-Oriented)

Objectives:
- Establish reliable CV and a fast baseline ASAP
- Train a strong span-extraction transformer (QA-style) with start/end heads
- Apply robust post-processing (especially for neutral → full tweet, punctuation trimming, whitespace fixes)
- Ensemble diverse seeds/folds; tune blend weights on OOF predictions
- Iterate via error analysis

Validation:
- Use StratifiedKFold by sentiment (5 folds) with deterministic seeds
- Fit transforms inside folds; cache features/logits
- Score OOF via Jaccard on reconstructed text
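
A minimal sketch of the OOF scoring step, assuming each row already has a fold id and an out-of-fold prediction. The helper name `oof_report` and the toy rows are hypothetical; the word-level Jaccard follows the same whitespace-split definition used in the EDA cell.

```python
from collections import defaultdict

def jaccard(a, b):
    # Word-level Jaccard on whitespace-split tokens
    a_set, b_set = set(str(a).split()), set(str(b).split())
    if not a_set and not b_set:
        return 1.0
    return len(a_set & b_set) / len(a_set | b_set)

def oof_report(folds, gold, pred):
    """Return (per-fold mean Jaccard, overall mean Jaccard) for OOF predictions."""
    per_fold = defaultdict(list)
    for f, g, p in zip(folds, gold, pred):
        per_fold[f].append(jaccard(g, p))
    fold_means = {f: sum(v) / len(v) for f, v in sorted(per_fold.items())}
    n_rows = sum(len(v) for v in per_fold.values())
    overall = sum(sum(v) for v in per_fold.values()) / n_rows
    return fold_means, overall

# Toy data (hypothetical): two folds, two rows each
folds = [0, 0, 1, 1]
gold = ["so tired", "happy", "love it", "bad day"]
pred = ["so tired of this", "happy", "love", "bad day"]
fold_means, overall = oof_report(folds, gold, pred)
print(fold_means, overall)
```

Reporting the per-fold means next to the overall mean makes it easier to spot a single unstable fold before trusting the aggregate.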

Baselines and Iteration:
1) Smoke-check environment (GPU) and data integrity
2) Heuristic baseline:
   - neutral → tweet_text
   - positive/negative → simple char-span heuristic centered on sentiment words (regex-based), fallback to tweet_text
   - Expect ~0.63–0.67 Jaccard (sanity gate)
3) Transformer QA model:
   - Input: "question" = sentiment; "context" = tweet
   - Tokenizer: RoBERTa-base (byte-level BPE) or DeBERTa-v3-base
   - Max length ~96–128; pad to max length, truncate only the context (never the sentiment prompt)
   - Loss: cross-entropy on start/end; label smoothing 0.05
   - Optimizer: AdamW, lr warmup, cosine decay; epochs ~3–5 with early stop
   - Folds: 5; save start/end logits per fold for OOF/test; average logits
   - Post-process: enforce start <= end, return the full tweet for neutral, map tokens back to text via offsets, and trim spaces/punctuation only if that improves OOF Jaccard
   - Target Jaccard: ≥0.715 OOF before ensembling; ≥0.72 with careful PP/ensembles
4) Ensembling:
   - Blend seeds (2–3) per backbone
   - Consider two backbones (RoBERTa-base + DeBERTa-v3-base) if time
   - Weighted average of logits; weights chosen by OOF
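
The logit blending and the `start <= end` constraint from the steps above could look roughly like the sketch below. It is pure Python on plain lists for readability; `blend_logits`, `decode_span`, and `max_span_tokens` are hypothetical names, and the convention that non-context tokens carry a `(0, 0)` offset is an assumption the caller would have to enforce (for example by masking the sentiment prompt tokens).

```python
def blend_logits(logit_sets, weights):
    """Weighted average of per-model logit vectors (lists of equal length)."""
    total = sum(weights)
    n = len(logit_sets[0])
    return [sum(w * ls[i] for w, ls in zip(weights, logit_sets)) / total
            for i in range(n)]

def decode_span(start_logits, end_logits, offsets, text, max_span_tokens=64):
    """Pick token indices i <= j maximizing start_logits[i] + end_logits[j].

    offsets[k] is the (char_start, char_end) of token k in `text`;
    tokens with an empty range such as (0, 0) are skipped.
    Falls back to the full text when no valid token exists.
    """
    best = None
    for i, (s0, s1) in enumerate(offsets):
        if s1 <= s0:
            continue
        for j in range(i, min(i + max_span_tokens, len(offsets))):
            e0, e1 = offsets[j]
            if e1 <= e0:
                continue
            score = start_logits[i] + end_logits[j]
            if best is None or score > best[0]:
                best = (score, s0, e1)
    return text if best is None else text[best[1]:best[2]]

# Toy example (hypothetical logits from two models)
text = "i am so happy today"
offsets = [(0, 0), (0, 1), (2, 4), (5, 7), (8, 13), (14, 19), (0, 0)]
start_a = [0.0, 0.1, 0.2, 2.0, 1.0, 0.1, 0.0]
end_a = [0.0, 0.1, 0.1, 0.3, 2.5, 0.4, 0.0]
start_b = [0.0, 0.0, 0.1, 1.0, 2.0, 0.0, 0.0]
end_b = [0.0, 0.0, 0.0, 0.2, 3.0, 0.1, 0.0]

start = blend_logits([start_a, start_b], weights=[0.6, 0.4])
end = blend_logits([end_a, end_b], weights=[0.6, 0.4])
span = decode_span(start, end, offsets, text)
print(span)
```

Searching only pairs with `j >= i` guarantees a non-empty, correctly ordered span, so no separate repair step is needed after decoding. Neutral tweets would bypass this and return the full text, as stated in the plan.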

Risk Controls / Checks:
- Always print fold progress and elapsed time
- Cache tokenized dataset and offsets
- Verify submission.csv format and encoding
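
One way to make the submission check concrete is sketched below with the standard library only. `check_submission` is a hypothetical helper; the expected header (`textID`, `selected_text`) is taken from the columns the baseline cell writes, and comparing ids as an unordered collection is an assumption about what the grader needs.

```python
import csv
import os
import tempfile

def check_submission(path, expected_ids):
    """Return a list of problems found; an empty list means the file looks valid."""
    problems = []
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        header = next(reader, None)
        rows = list(reader)
    if header != ['textID', 'selected_text']:
        problems.append(f'unexpected header: {header}')
    if any(len(r) != 2 for r in rows):
        problems.append('some rows do not have exactly 2 fields')
    ids = [r[0] for r in rows if r]
    if sorted(ids) != sorted(expected_ids):
        problems.append('textID values do not match the expected ids')
    return problems

# Tiny demo on a temporary file
demo_ids = ['80a1e6bc32', '863097735d']
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'submission.csv')
    with open(demo_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['textID', 'selected_text'])
        writer.writerow(['80a1e6bc32', 'I just saw a shooting star... I made my wish'])
        writer.writerow(['863097735d', 'upset'])
    ok_problems = check_submission(demo_path, demo_ids)
    missing_problems = check_submission(demo_path, demo_ids + ['264cd5277f'])
print(ok_problems, missing_problems)
```

Opening the file with an explicit `encoding='utf-8'` doubles as the encoding check: a file written in another encoding would typically raise a decode error here rather than pass silently.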

Milestones (with expert reviews):
A) Plan review (this cell)
B) Data loading + EDA snapshot
C) Baseline heuristic + CV
D) Transformer v1 (single backbone, 5-fold) + PP v1
E) Error analysis + PP v2
F) Ensembling seeds/backbones
G) Final checks and submission

In [1]:
import os, sys, time, json, textwrap, math, random, statistics as stats, subprocess
import pandas as pd
import numpy as np

print('=== GPU CHECK ===', flush=True)
try:
    out = subprocess.run(['bash','-lc','nvidia-smi || true'], capture_output=True, text=True, check=False)
    print(out.stdout)
except Exception as e:
    print('nvidia-smi failed:', e)

print('Python', sys.version)
print('CWD:', os.getcwd(), flush=True)

train_path = 'train.csv'
test_path = 'test.csv'
ss_path = 'sample_submission.csv'
for p in [train_path, test_path, ss_path]:
    print(p, os.path.exists(p), os.path.getsize(p) if os.path.exists(p) else None)

train = pd.read_csv(train_path)
test = pd.read_csv(test_path)
ss = pd.read_csv(ss_path)

print('\n=== Head(train) ===')
print(train.head(3).to_string(index=False))
print('\n=== Head(test) ===')
print(test.head(3).to_string(index=False))
print('\n=== Head(sample_submission) ===')
print(ss.head(3).to_string(index=False))

print('\nShapes:', train.shape, test.shape)
print('Columns(train):', list(train.columns))
print('Columns(test):', list(test.columns))

# Basic checks
print('\nNulls in train:')
print(train.isnull().sum())
print('\nSentiment distribution (train):')
print(train['sentiment'].value_counts())

train['tweet_len'] = train['text'].astype(str).str.len()
train['sel_len'] = train['selected_text'].astype(str).str.len()
print('\nTweet length stats:', train['tweet_len'].describe(percentiles=[0.5,0.9,0.95,0.99]).to_dict())
print('Selected length stats:', train['sel_len'].describe(percentiles=[0.5,0.9,0.95,0.99]).to_dict())

def jaccard(a, b):
    a = str(a)
    b = str(b)
    a_set = set(a.split())
    b_set = set(b.split())
    if not a_set and not b_set:
        return 1.0
    if not a_set or not b_set:
        return 0.0
    inter = len(a_set & b_set)
    union = len(a_set | b_set)
    return inter / union if union else 0.0

# Quick sanity: compute jaccard of gold vs itself
jac_self = train.apply(lambda r: jaccard(r['selected_text'], r['selected_text']), axis=1).mean()
print('\nSanity Jaccard(selected vs selected):', jac_self)

print('\nSample rows:')
for i in range(3):
    r = train.sample(1, random_state=42+i).iloc[0]
    print({'id': r['textID'], 'sentiment': r['sentiment'], 'text': r['text'][:120], 'selected': r['selected_text']})

print('\nDone EDA snapshot.', flush=True)

=== GPU CHECK ===


Tue Sep 30 15:43:56 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.06             Driver Version: 550.144.06     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |     182MiB /  24512MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

train.csv True 3151814
test.csv True 244504
sample_submission.csv True 33009

=== Head(train) ===
    textID                                                                         text                                      selected_text sentiment
8d4ad58b45                           eating breakfast  getting ready to go to school ;( eating breakfast  getting ready to go to school ;(  negative
fdfe12a800 Going to fold laundry and then hit the sack. I have boring saturday evenings                    I have boring saturday evenings  negative
5efd224f4e            happy mothers day to all   im off to spend the day with my family                                              happy  positive

=== Head(test) ===
    textID                                                                                                                          text sentiment
80a1e6bc32                                                                                  I just saw a shooting star... I made my wish  p


Sanity Jaccard(selected vs selected): 1.0

Sample rows:
{'id': '62da1d2932', 'sentiment': 'positive', 'text': ' Thanks for the advice! Went to the doctor`s and slept a lot yesterday  Must be the meds.', 'selected': 'Thanks'}
{'id': '8027499c43', 'sentiment': 'positive', 'text': ' I absolutely LOVE you.  Thanks', 'selected': 'I absolutely LOVE you.  Thanks'}
{'id': 'eeb70d82b3', 'sentiment': 'negative', 'text': 'i`m so tired  of being sick ALL the time!!!!', 'selected': 'i`m so tired'}

Done EDA snapshot.


In [2]:
# Heuristic baseline: build quick OOF estimate and a valid submission
import re

pos_keywords = [
    'love','lovely','awesome','amazing','great','good','best','glad','happy','yay','thanks','thank you','excited',
    ':)',':-)',':D','<3','lol','lmao','rofl'
]
neg_keywords = [
    "n't",'not','no ',' never','sad','bad','worse','worst','hate','angry','upset','sucks','tired','sick','terrible','awful',
    ':(',' :-(',' :/',' :|'
]

def find_keyword_span(text: str, kws):
    if not isinstance(text, str) or not text:
        return None
    low = text.lower()
    for kw in kws:
        kw_low = kw.lower()
        idx = low.find(kw_low)
        if idx != -1:
            return text[idx: idx + len(kw)]
    return None

def heuristic_selected(text: str, sentiment: str):
    if not isinstance(text, str) or not text:
        return ''
    s = (sentiment or '').strip().lower()
    if s == 'neutral':
        return text
    if s == 'positive':
        span = find_keyword_span(text, pos_keywords)
        return span if span else text
    if s == 'negative':
        span = find_keyword_span(text, neg_keywords)
        return span if span else text
    return text

# Heuristic score on train. Nothing is fitted, so this is a plain in-sample check rather than a true OOF estimate.
train_pred = train.apply(lambda r: heuristic_selected(r['text'], r['sentiment']), axis=1)
# Reuse train_pred instead of running the heuristic a second time
heuristic_oof = np.mean([jaccard(g, p) for g, p in zip(train['selected_text'], train_pred)])
print(f'Heuristic OOF Jaccard (sanity): {heuristic_oof:.4f}')

# Build submission
sub = test.copy()
sub['selected_text'] = sub.apply(lambda r: heuristic_selected(r['text'], r['sentiment']), axis=1)
submission_path = 'submission.csv'
sub[['textID','selected_text']].to_csv(submission_path, index=False)
print('Wrote', submission_path, 'Head:')
print(sub[['textID','selected_text']].head().to_string(index=False))

Heuristic OOF Jaccard (sanity): 0.5902
Wrote submission.csv Head:
    textID                                                                                       selected_text
80a1e6bc32                                                        I just saw a shooting star... I made my wish
863097735d                                                                                               upset
264cd5277f             tired and didn`t really have an exciting Saturday.  oh well, hope it`s better tomorrow.
baee1e6ffc                                                              i`ve been eating cheetos all morning..
67d06a8dee  haiiii sankQ i`m fineee ima js get a checkup cos my rib hurts LOL idk but i shall be fine ~ thanks


In [3]:
# Install cu121 torch stack and core NLP deps; sanity-check GPU
import os, sys, subprocess, shutil, time

def pip(*args):
    print('> pip', *args, flush=True)
    subprocess.run([sys.executable, '-m', 'pip', *args], check=True)

# Uninstall any pre-existing torch stacks to avoid duplicates
for pkg in ('torch','torchvision','torchaudio'):
    try:
        subprocess.run([sys.executable, '-m', 'pip', 'uninstall', '-y', pkg], check=False)
    except Exception as e:
        print('Uninstall warn:', pkg, e)

# Clean stray site dirs that can shadow correct wheels (idempotent)
for d in (
    '/app/.pip-target/torch',
    '/app/.pip-target/torch-2.8.0.dist-info',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchvision-0.23.0.dist-info',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torchaudio-2.8.0.dist-info',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
    '/app/.pip-target/torchgen',
    '/app/.pip-target/functorch',
):
    if os.path.exists(d):
        print('Removing', d, flush=True)
        shutil.rmtree(d, ignore_errors=True)

# Install exact cu121 torch stack
pip('install',
    '--index-url', 'https://download.pytorch.org/whl/cu121',
    '--extra-index-url', 'https://pypi.org/simple',
    'torch==2.4.1', 'torchvision==0.19.1', 'torchaudio==2.4.1')

# Constraints to pin torch versions for later installs
from pathlib import Path
Path('constraints.txt').write_text('torch==2.4.1\ntorchvision==0.19.1\ntorchaudio==2.4.1\n')

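# Install NLP deps honoring constraints
# (note: 'torch==2.4.1' also matches the plain PyPI wheel, which the log below shows being collected again)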
pip('install', '-c', 'constraints.txt',
    'transformers==4.44.2', 'accelerate==0.34.2',
    'datasets==2.21.0', 'evaluate==0.4.2',
    'sentencepiece', 'scikit-learn', 'tqdm',
    '--upgrade-strategy', 'only-if-needed')

import torch
print('torch:', torch.__version__, 'built CUDA:', getattr(torch.version, 'cuda', None), flush=True)
print('CUDA available:', torch.cuda.is_available(), flush=True)
assert str(getattr(torch.version,'cuda','')).startswith('12.1'), f'Wrong CUDA build: {torch.version.cuda}'
assert torch.cuda.is_available(), 'CUDA not available'
print('GPU:', torch.cuda.get_device_name(0), flush=True)

> pip install --index-url https://download.pytorch.org/whl/cu121 --extra-index-url https://pypi.org/simple torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1




Looking in indexes: https://download.pytorch.org/whl/cu121, https://pypi.org/simple


Collecting torch==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torch-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (799.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 799.0/799.0 MB 415.0 MB/s eta 0:00:00


Collecting torchvision==0.19.1
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.19.1%2Bcu121-cp311-cp311-linux_x86_64.whl (7.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 517.1 MB/s eta 0:00:00


Collecting torchaudio==2.4.1
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.4.1%2Bcu121-cp311-cp311-linux_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 351.7 MB/s eta 0:00:00


Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 252.4 MB/s eta 0:00:00


Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 259.4 MB/s eta 0:00:00


Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 242.4 MB/s eta 0:00:00


Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 304.2 MB/s eta 0:00:00


Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 269.6 MB/s eta 0:00:00


Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 336.9 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 504.7 MB/s eta 0:00:00


Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 252.5 MB/s eta 0:00:00


Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 487.1 MB/s eta 0:00:00
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)


Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 319.2 MB/s eta 0:00:00


Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 253.9 MB/s eta 0:00:00


Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 302.4 MB/s eta 0:00:00


Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 272.3 MB/s eta 0:00:00


Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 259.4 MB/s eta 0:00:00


Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 511.2 MB/s eta 0:00:00


Collecting fsspec
  Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 496.8 MB/s eta 0:00:00
Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 391.3 MB/s eta 0:00:00


Collecting pillow!=8.3.*,>=5.3.0
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 156.4 MB/s eta 0:00:00


Collecting numpy
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 313.4 MB/s eta 0:00:00


Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 297.8 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB)
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 567.3 MB/s eta 0:00:00


Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio


Successfully installed MarkupSafe-3.0.3 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.3.0 sympy-1.14.0 torch-2.4.1+cu121 torchaudio-2.4.1+cu121 torchvision-0.19.1+cu121 triton-3.0.0 typing-extensions-4.15.0


> pip install -c constraints.txt transformers==4.44.2 accelerate==0.34.2 datasets==2.21.0 evaluate==0.4.2 sentencepiece scikit-learn tqdm --upgrade-strategy only-if-needed


Collecting transformers==4.44.2
  Downloading transformers-4.44.2-py3-none-any.whl (9.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.5/9.5 MB 132.5 MB/s eta 0:00:00
Collecting accelerate==0.34.2
  Downloading accelerate-0.34.2-py3-none-any.whl (324 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 324.4/324.4 KB 513.7 MB/s eta 0:00:00


Collecting datasets==2.21.0
  Downloading datasets-2.21.0-py3-none-any.whl (527 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 527.3/527.3 KB 533.4 MB/s eta 0:00:00
Collecting evaluate==0.4.2
  Downloading evaluate-0.4.2-py3-none-any.whl (84 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 KB 453.4 MB/s eta 0:00:00
Collecting sentencepiece
  Downloading sentencepiece-0.2.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 389.1 MB/s eta 0:00:00


Collecting scikit-learn
  Downloading scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (9.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.7/9.7 MB 540.5 MB/s eta 0:00:00
Collecting tqdm
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 KB 406.8 MB/s eta 0:00:00


Collecting numpy>=1.17
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 511.2 MB/s eta 0:00:00
Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.7/64.7 KB 378.6 MB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.23.2
  Downloading huggingface_hub-0.35.3-py3-none-any.whl (564 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 564.3/564.3 KB 520.2 MB/s eta 0:00:00


Collecting pyyaml>=5.1
  Downloading pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.6/806.6 KB 517.3 MB/s eta 0:00:00


Collecting regex!=2019.12.17
  Downloading regex-2025.9.18-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (798 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 799.0/799.0 KB 512.1 MB/s eta 0:00:00
Collecting packaging>=20.0
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 408.9 MB/s eta 0:00:00


Collecting safetensors>=0.4.1
  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 485.8/485.8 KB 506.7 MB/s eta 0:00:00


Collecting tokenizers<0.20,>=0.19
  Downloading tokenizers-0.19.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 66.6 MB/s eta 0:00:00
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)


Collecting psutil
  Downloading psutil-7.1.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 291.2/291.2 KB 495.8 MB/s eta 0:00:00
Collecting torch>=1.10.0
  Downloading torch-2.4.1-cp311-cp311-manylinux1_x86_64.whl (797.1 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 797.1/797.1 MB 342.9 MB/s eta 0:00:00


Collecting pyarrow>=15.0.0
  Downloading pyarrow-21.0.0-cp311-cp311-manylinux_2_28_x86_64.whl (42.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.8/42.8 MB 267.5 MB/s eta 0:00:00


Collecting pandas
  Downloading pandas-2.3.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 571.3 MB/s eta 0:00:00
Collecting fsspec[http]<=2024.6.1,>=2023.1.0
  Downloading fsspec-2024.6.1-py3-none-any.whl (177 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.6/177.6 KB 517.7 MB/s eta 0:00:00


Collecting aiohttp
  Downloading aiohttp-3.12.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 565.7 MB/s eta 0:00:00
Collecting multiprocess
  Downloading multiprocess-0.70.18-py311-none-any.whl (144 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 144.5/144.5 KB 375.6 MB/s eta 0:00:00
Collecting xxhash
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.8/194.8 KB 504.8 MB/s eta 0:00:00
Collecting dill<0.3.9,>=0.3.0
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 KB 516.2 MB/s eta 0:00:00


Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 308.4/308.4 KB 518.7 MB/s eta 0:00:00


Collecting scipy>=1.8.0
  Downloading scipy-1.16.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (35.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.9/35.9 MB 321.9 MB/s eta 0:00:00
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)


Collecting aiosignal>=1.4.0
  Downloading aiosignal-1.4.0-py3-none-any.whl (7.5 kB)
Collecting frozenlist>=1.1.1
  Downloading frozenlist-1.7.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (235 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 235.3/235.3 KB 497.3 MB/s eta 0:00:00
Collecting propcache>=0.2.0
  Downloading propcache-0.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (213 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 213.5/213.5 KB 515.1 MB/s eta 0:00:00


Collecting attrs>=17.3.0
  Downloading attrs-25.3.0-py3-none-any.whl (63 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.8/63.8 KB 367.9 MB/s eta 0:00:00
Collecting aiohappyeyeballs>=2.5.0
  Downloading aiohappyeyeballs-2.6.1-py3-none-any.whl (15 kB)


Collecting yarl<2.0,>=1.17.0
  Downloading yarl-1.20.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (348 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 349.0/349.0 KB 503.8 MB/s eta 0:00:00


Collecting multidict<7.0,>=4.5
  Downloading multidict-6.6.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (246 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 246.7/246.7 KB 520.1 MB/s eta 0:00:00
Collecting typing-extensions>=3.7.4.3
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 396.0 MB/s eta 0:00:00
Collecting hf-xet<2.0.0,>=1.1.3
  Downloading hf_xet-1.1.10-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 513.7 MB/s eta 0:00:00


Collecting certifi>=2017.4.17
  Downloading certifi-2025.8.3-py3-none-any.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 513.5 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Downloading idna-3.10-py3-none-any.whl (70 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 KB 436.4 MB/s eta 0:00:00
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.8/129.8 KB 500.2 MB/s eta 0:00:00
Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (150 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.3/150.3 KB 445.1 MB/s eta 0:00:00


Collecting triton==3.0.0
  Downloading triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 264.5 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 503.0 MB/s eta 0:00:00


Collecting nvidia-nccl-cu12==2.20.5
  Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 502.3 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 506.7 MB/s eta 0:00:00


Collecting nvidia-curand-cu12==10.3.2.106
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 512.1 MB/s eta 0:00:00
Collecting sympy
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 478.8 MB/s eta 0:00:00
Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 501.2 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 493.0 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu12==12.1.105
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 468.4 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 496.8 MB/s eta 0:00:00
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 489.7 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 512.0 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 KB 505.2 MB/s eta 0:00:00
Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 571.6 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 287.0 MB/s eta 0:00:00


Collecting nvidia-nvjitlink-cu12
  Downloading nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 566.5 MB/s eta 0:00:00
Collecting multiprocess
  Downloading multiprocess-0.70.17-py311-none-any.whl (144 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 144.3/144.3 KB 280.0 MB/s eta 0:00:00
  Downloading multiprocess-0.70.16-py311-none-any.whl (143 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 143.5/143.5 KB 523.6 MB/s eta 0:00:00


Collecting tzdata>=2022.7
  Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 347.8/347.8 KB 552.5 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.2
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 KB 509.3 MB/s eta 0:00:00
Collecting pytz>=2020.1
  Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 509.2/509.2 KB 568.0 MB/s eta 0:00:00


Collecting six>=1.5
  Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)
Collecting MarkupSafe>=2.0
  Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB)
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 559.2 MB/s eta 0:00:00


Installing collected packages: pytz, mpmath, xxhash, urllib3, tzdata, typing-extensions, tqdm, threadpoolctl, sympy, six, sentencepiece, safetensors, regex, pyyaml, pyarrow, psutil, propcache, packaging, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, multidict, MarkupSafe, joblib, idna, hf-xet, fsspec, frozenlist, filelock, dill, charset_normalizer, certifi, attrs, aiohappyeyeballs, yarl, triton, scipy, requests, python-dateutil, nvidia-cusparse-cu12, nvidia-cudnn-cu12, multiprocess, jinja2, aiosignal, scikit-learn, pandas, nvidia-cusolver-cu12, huggingface-hub, aiohttp, torch, tokenizers, transformers, datasets, accelerate, evaluate


Successfully installed MarkupSafe-3.0.3 accelerate-0.34.2 aiohappyeyeballs-2.6.1 aiohttp-3.12.15 aiosignal-1.4.0 attrs-25.3.0 certifi-2025.8.3 charset_normalizer-3.4.3 datasets-2.21.0 dill-0.3.8 evaluate-0.4.2 filelock-3.19.1 frozenlist-1.7.0 fsspec-2024.6.1 hf-xet-1.1.10 huggingface-hub-0.35.3 idna-3.10 jinja2-3.1.6 joblib-1.5.2 mpmath-1.3.0 multidict-6.6.4 multiprocess-0.70.16 networkx-3.5 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 packaging-25.0 pandas-2.3.3 propcache-0.3.2 psutil-7.1.0 pyarrow-21.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 pyyaml-6.0.3 regex-2025.9.18 requests-2.32.5 safetensors-0.6.2 scikit-learn-1.7.2 scipy-1.16.2 sentencepiece-0.2.1 six-1.17.0









torch: 2.4.1+cu121 built CUDA: 12.1
CUDA available: True
GPU: NVIDIA A10-24Q


In [4]:
# QA setup: tokenizer, CV folds, and alignment sanity checks
import re, numpy as np, random, time
from sklearn.model_selection import StratifiedKFold
from transformers import AutoTokenizer

SEED = 42
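def seed_everything(seed=SEED):
    # Seed Python, NumPy, and PyTorch (CPU and all visible GPUs)
    import torch
    random.seed(seed); np.random.seed(seed)
    torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
seed_everything()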

# Drop nulls for label creation
train_clean = train.dropna(subset=['text','selected_text']).reset_index(drop=True)

# Create stratified folds by sentiment
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
folds = np.full(len(train_clean), -1, dtype=int)
for f, (_, val_idx) in enumerate(skf.split(train_clean, train_clean['sentiment'])):
    folds[val_idx] = f
train_clean['fold'] = folds
print('Folds assigned:', np.bincount(folds))

model_name = 'roberta-base'
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

IGNORE_INDEX = -100
def tokenize_and_align(tokenizer, text, sentiment, selected_text=None, max_len=128):
    enc = tokenizer(
        str(sentiment), str(text),
        max_length=max_len, padding='max_length',
        truncation='only_second', add_special_tokens=True,
        return_offsets_mapping=True, return_attention_mask=True
    )
    offsets = enc['offset_mapping']
    seq_ids = enc.sequence_ids()
    ctx_idx = [i for i, sid in enumerate(seq_ids) if sid == 1]
    start_pos = end_pos = IGNORE_INDEX
    if isinstance(selected_text, str) and isinstance(text, str) and ctx_idx:
        # find all occurrences; choose the one with max token-char overlap
        matches = [m.start() for m in re.finditer(re.escape(selected_text), text)]
        if matches:
            best = None
            for st_char in matches:
                ed_char = st_char + len(selected_text)
                overlap = 0
                for i in ctx_idx:
                    a, b = offsets[i]
                    overlap += max(0, min(ed_char, b) - max(st_char, a))
                if (best is None) or (overlap > best[0]):
                    best = (overlap, st_char, ed_char)
            _, st_char, ed_char = best
            chosen = [i for i in ctx_idx
                      if offsets[i][1] > offsets[i][0] and
                         max(offsets[i][0], st_char) < min(offsets[i][1], ed_char)]
            if chosen:
                start_pos, end_pos = chosen[0], chosen[-1]
            else:
                dists = [(abs(offsets[i][0]-st_char)+abs(offsets[i][1]-ed_char), i)
                         for i in ctx_idx if offsets[i][1] > offsets[i][0]]
                if dists:
                    start_pos = end_pos = min(dists)[1]
        # else: keep IGNORE_INDEX (rare)
    enc['start_positions'] = start_pos
    enc['end_positions'] = end_pos
    return enc, ctx_idx

# Sanity check a few samples per sentiment and fold
def reconstruct_from_positions(text, offsets, i, j):
    if i < 0 or j < 0: return ''
    s_char, e_char = offsets[i][0], offsets[j][1]
    return text[s_char:e_char]

samples_checked = 0
for s in ['neutral','positive','negative']:
    df_s = train_clean[train_clean.sentiment==s].head(2)
    for _, r in df_s.iterrows():
        enc, ctx = tokenize_and_align(tokenizer, r['text'], r['sentiment'], r['selected_text'], max_len=128)
        pred_span = reconstruct_from_positions(r['text'], enc['offset_mapping'], enc['start_positions'], enc['end_positions'])
        print({'sent': r['sentiment'], 'text_snip': r['text'][:60], 'gold': r['selected_text'], 'recon': pred_span})
        samples_checked += 1
print('Alignment samples printed:', samples_checked)

print('Setup OK. Next: implement model/train loop with 5-fold QA and OOF logging.', flush=True)

  from .autonotebook import tqdm as notebook_tqdm


Folds assigned: [4947 4946 4946 4946 4946]


{'sent': 'neutral', 'text_snip': '  not Pimm`s in a can?', 'gold': 'not Pimm`s in a can?', 'recon': 'not Pimm`s in a can?'}
{'sent': 'neutral', 'text_snip': ' i would, but i don`t know how to do it from the phone...', 'gold': 'i would, but i don`t know how to do it from the phone...', 'recon': 'i would, but i don`t know how to do it from the phone...'}
{'sent': 'positive', 'text_snip': 'happy mothers day to all   im off to spend the day with my f', 'gold': 'happy', 'recon': 'happy'}
{'sent': 'positive', 'text_snip': ' one of my favorite quotes ever', 'gold': 'favorite', 'recon': 'favorite'}
{'sent': 'negative', 'text_snip': 'eating breakfast  getting ready to go to school ;(', 'gold': 'eating breakfast  getting ready to go to school ;(', 'recon': 'eating breakfast  getting ready to go to school ;('}
{'sent': 'negative', 'text_snip': 'Going to fold laundry and then hit the sack. I have boring s', 'gold': 'I have boring saturday evenings', 'recon': 'I have boring saturday evenings'}
Alig
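
The overlap rule in `tokenize_and_align` above can be illustrated without a tokenizer. The following is a minimal sketch on hand-written offsets; the helper name `char_span_to_token_span` and the three-token split of the example text are assumptions made for illustration, not output from `roberta-base`.

```python
def char_span_to_token_span(offsets, st_char, ed_char):
    """Return (first, last) indices of tokens overlapping chars [st_char, ed_char)."""
    hits = [i for i, (a, b) in enumerate(offsets)
            if b > a and max(a, st_char) < min(b, ed_char)]
    if not hits:
        return None
    return hits[0], hits[-1]

# Hypothetical tokenization of a short text; (0, 0) marks special tokens.
text = "so happy today"
offsets = [(0, 0), (0, 2), (3, 8), (9, 14), (0, 0)]

# Exact word match: the span maps to a single token.
st = text.find("happy")
exact = char_span_to_token_span(offsets, st, st + len("happy"))
exact_text = text[offsets[exact[0]][0]:offsets[exact[1]][1]]

# Partial-word label: every touched token is kept, so the
# reconstructed text expands to token boundaries.
st2 = text.find("appy tod")
partial = char_span_to_token_span(offsets, st2, st2 + len("appy tod"))
partial_text = text[offsets[partial[0]][0]:offsets[partial[1]][1]]

print(exact, repr(exact_text))
print(partial, repr(partial_text))
```

The second case shows why a reconstructed span can be longer than a noisy gold label: token-level positions cannot represent a cut inside a token.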



In [5]:
# 5-fold QA training with RoBERTa-base, AMP, OOF logging/caching
import os, gc, math, json, time, numpy as np, pandas as pd, torch
from dataclasses import dataclass
from typing import Optional, Dict, Any, List, Tuple
from transformers import (
    AutoModelForQuestionAnswering,
    Trainer, TrainingArguments,
    default_data_collator,
    get_cosine_schedule_with_warmup,
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
MAX_LEN = 128
BATCH_SIZE = 32
EPOCHS = 3
LR = 2e-5
WARMUP_RATIO = 0.1
WEIGHT_DECAY = 0.01
GRAD_CLIP = 1.0

def build_encodings(df: pd.DataFrame, include_labels: bool = True):
    enc_list = []
    for i, r in df.iterrows():
        sel = r['selected_text'] if include_labels else None
        enc, ctx_idx = tokenize_and_align(tokenizer, r['text'], r['sentiment'], sel, max_len=MAX_LEN)
        # Persist needed fields for eval/decoding
        enc['text'] = r['text']
        enc['sentiment'] = r['sentiment']
        enc_list.append(enc)
    # Stack into arrays
    keys = enc_list[0].keys()
    out: Dict[str, Any] = {}
    for k in keys:
        vals = [e[k] for e in enc_list]
        if k in ('text','sentiment'):
            out[k] = vals
        else:
            out[k] = np.array(vals, dtype=object if k=='offset_mapping' else None)
    return out

class QADataset(torch.utils.data.Dataset):
    def __init__(self, enc: Dict[str, Any], with_labels: bool):
        self.enc = enc
        self.with_labels = with_labels
    def __len__(self):
        return len(self.enc['input_ids'])
    def __getitem__(self, idx):
        item = {
            'input_ids': torch.tensor(self.enc['input_ids'][idx], dtype=torch.long),
            'attention_mask': torch.tensor(self.enc['attention_mask'][idx], dtype=torch.long),
        }
        if self.with_labels:
            item['start_positions'] = torch.tensor(self.enc['start_positions'][idx], dtype=torch.long)
            item['end_positions'] = torch.tensor(self.enc['end_positions'][idx], dtype=torch.long)
        return item

def get_sequence_ids_for_pair(sentiment: str, text: str):
    tmp = tokenizer(str(sentiment), str(text),
                    max_length=MAX_LEN, padding='max_length', truncation='only_second',
                    add_special_tokens=True, return_offsets_mapping=True)
    return tmp.sequence_ids(), tmp['offset_mapping']

def decode_one(start_logits, end_logits, offsets, sequence_ids, text, sentiment, length_penalty_per_char=0.0, low_conf_threshold=None):
    if str(sentiment).strip().lower() == 'neutral':
        return text
    ctx = [i for i, sid in enumerate(sequence_ids) if sid == 1]
    if not ctx:
        return text
    # np.array always copies, so the masking below cannot overwrite the caller's logits
    sl = np.array(start_logits, dtype=np.float32)
    el = np.array(end_logits, dtype=np.float32)
    mask = np.zeros_like(sl, dtype=bool)
    mask[np.array(ctx)] = True
    sl[~mask] = -np.inf; el[~mask] = -np.inf
    best_score = -1e9; bi = bj = ctx[0]
    for i in ctx:
        for j in ctx:
            if j < i: continue
            span_len_chars = offsets[j][1] - offsets[i][0]
            score = sl[i] + el[j] - length_penalty_per_char * span_len_chars
            if score > best_score:
                best_score, bi, bj = score, i, j
    if (low_conf_threshold is not None) and (best_score < low_conf_threshold):
        return text
    s_char, e_char = offsets[bi][0], offsets[bj][1]
    pred = text[s_char:e_char].strip()
    return pred if pred else text

def jaccard_batch(trues: List[str], preds: List[str]):
    return float(np.mean([jaccard(t, p) for t, p in zip(trues, preds)]))

oof_rows = []
start_logits_folds = []
end_logits_folds = []

for fold in range(5):
    t0 = time.time()
    print(f'\n===== Fold {fold} =====', flush=True)
    trn_df = train_clean[train_clean.fold != fold].reset_index(drop=True)
    val_df = train_clean[train_clean.fold == fold].reset_index(drop=True)
    print('Train/Val sizes:', len(trn_df), len(val_df))

    trn_enc = build_encodings(trn_df, include_labels=True)
    val_enc = build_encodings(val_df, include_labels=True)

    train_ds = QADataset(trn_enc, with_labels=True)
    val_ds = QADataset(val_enc, with_labels=True)

    model = AutoModelForQuestionAnswering.from_pretrained('roberta-base')

    total_steps = math.ceil(len(train_ds) / (BATCH_SIZE*2)) * EPOCHS

    # compute_metrics closes over val_df and decodes the validation set at each evaluation
    def compute_metrics(eval_pred):
        start_logits, end_logits = eval_pred.predictions
        preds = []
        trues = list(val_df['selected_text'].astype(str).values)
        for i in range(len(val_df)):
            text = val_df.iloc[i]['text']
            sentiment = val_df.iloc[i]['sentiment']
            seq_ids, offs = get_sequence_ids_for_pair(sentiment, text)
            pred_text = decode_one(start_logits[i], end_logits[i], offs, seq_ids, text, sentiment,
                                   length_penalty_per_char=0.003, low_conf_threshold=None)
            preds.append(pred_text)
        score = jaccard_batch(trues, preds)
        return {'jaccard': score}

    args = TrainingArguments(
        output_dir=f'./outputs_fold{fold}',
        evaluation_strategy='epoch',
        save_strategy='epoch',
        load_best_model_at_end=True,
        metric_for_best_model='jaccard',
        greater_is_better=True,
        per_device_train_batch_size=BATCH_SIZE,
        per_device_eval_batch_size=BATCH_SIZE*2,
        gradient_accumulation_steps=2,
        num_train_epochs=EPOCHS,
        fp16=True,
        learning_rate=LR,
        weight_decay=WEIGHT_DECAY,
        warmup_ratio=WARMUP_RATIO,
        lr_scheduler_type='cosine',
        max_grad_norm=GRAD_CLIP,
        dataloader_num_workers=2,
        logging_steps=50,
        save_total_limit=1,
        seed=SEED,
        report_to=[]
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        tokenizer=tokenizer,
        data_collator=default_data_collator,
        compute_metrics=compute_metrics
    )

    train_out = trainer.train()
    print('Best model metrics:', train_out.metrics, flush=True)
    # Save best model for this fold
    trainer.save_model(f'fold{fold}_best')
    # Record best checkpoint path
    if getattr(trainer.state, 'best_model_checkpoint', None):
        with open(f'fold{fold}_best/path.txt', 'w') as f:
            f.write(trainer.state.best_model_checkpoint)

    # Inference on val to get logits for caching and OOF decode
    val_preds = trainer.predict(val_ds)
    val_start_logits, val_end_logits = val_preds.predictions
    start_logits_folds.append(val_start_logits)
    end_logits_folds.append(val_end_logits)

    # Decode OOF
    val_trues = list(val_df['selected_text'].astype(str).values)
    val_preds_text = []
    for i in range(len(val_df)):
        text = val_df.iloc[i]['text']
        sentiment = val_df.iloc[i]['sentiment']
        seq_ids, offs = get_sequence_ids_for_pair(sentiment, text)
        pred_text = decode_one(val_start_logits[i], val_end_logits[i], offs, seq_ids, text, sentiment,
                               length_penalty_per_char=0.003, low_conf_threshold=None)
        val_preds_text.append(pred_text)
        oof_rows.append({
            'textID': val_df.iloc[i]['textID'],
            'fold': fold,
            'sentiment': sentiment,
            'text': text,
            'selected_text': val_trues[i],
            'pred': pred_text
        })
    fold_j = jaccard_batch(val_trues, val_preds_text)
    print(f'Fold {fold} OOF Jaccard: {fold_j:.5f}; elapsed {time.time()-t0:.1f}s', flush=True)

    # Cleanup
    del trainer, model, train_ds, val_ds, trn_enc, val_enc
    gc.collect(); torch.cuda.empty_cache()

# Aggregate OOF
oof_df = pd.DataFrame(oof_rows)
oof_score = jaccard_batch(oof_df['selected_text'].tolist(), oof_df['pred'].tolist())
print(f'OOF Jaccard (all folds): {oof_score:.5f}', flush=True)
oof_df.to_csv('oof_roberta_base.csv', index=False)
np.save('oof_start_logits_roberta_base.npy', np.concatenate(start_logits_folds, axis=0))
np.save('oof_end_logits_roberta_base.npy', np.concatenate(end_logits_folds, axis=0))
print('Saved OOF artifacts.')

# Predict on test with the best single model per fold and average logits across folds
test_df = test.copy().reset_index(drop=True)
test_enc_list = []
for i, r in test_df.iterrows():
    enc = tokenizer(
        str(r['sentiment']), str(r['text']),
        max_length=MAX_LEN, padding='max_length', truncation='only_second',
        add_special_tokens=True, return_offsets_mapping=True, return_attention_mask=True
    )
    test_enc_list.append(enc)
test_input_ids = torch.tensor([e['input_ids'] for e in test_enc_list], dtype=torch.long)
test_attention_mask = torch.tensor([e['attention_mask'] for e in test_enc_list], dtype=torch.long)

all_fold_test_start = []
all_fold_test_end = []
for fold in range(5):
    print(f'Test inference with fold {fold} checkpoint...', flush=True)
    model = AutoModelForQuestionAnswering.from_pretrained(f'fold{fold}_best').to(device)
    model.eval()
    with torch.no_grad():
        bs = BATCH_SIZE
        starts, ends = [], []
        for i in range(0, len(test_df), bs):
            input_ids = test_input_ids[i:i+bs].to(device)
            attn = test_attention_mask[i:i+bs].to(device)
            out = model(input_ids=input_ids, attention_mask=attn)
            starts.append(out.start_logits.detach().cpu().numpy())
            ends.append(out.end_logits.detach().cpu().numpy())
        starts = np.vstack(starts); ends = np.vstack(ends)
    # cache per-fold test logits
    np.save(f'test_start_fold{fold}.npy', starts)
    np.save(f'test_end_fold{fold}.npy', ends)
    all_fold_test_start.append(starts)
    all_fold_test_end.append(ends)
    del model; gc.collect(); torch.cuda.empty_cache()

avg_test_start = np.mean(all_fold_test_start, axis=0)
avg_test_end = np.mean(all_fold_test_end, axis=0)

# Decode test
test_preds = []
for i in range(len(test_df)):
    text = test_df.iloc[i]['text']
    sentiment = test_df.iloc[i]['sentiment']
    seq_ids, offs = get_sequence_ids_for_pair(sentiment, text)
    pred_text = decode_one(avg_test_start[i], avg_test_end[i], offs, seq_ids, text, sentiment,
                           length_penalty_per_char=0.003, low_conf_threshold=None)
    test_preds.append(pred_text)
sub = pd.DataFrame({'textID': test_df['textID'], 'selected_text': test_preds})
sub.to_csv('submission.csv', index=False)
print('Wrote submission.csv (model). Head:\n', sub.head().to_string(index=False), flush=True)
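
The nested `for i in ctx` / `for j in ctx` search in `decode_one` above could also be written as a single NumPy broadcast. This is a sketch only: `best_span` is a hypothetical helper, not part of the notebook, and the logits and offsets below are made-up values chosen to show how the per-character length penalty changes the chosen span.

```python
import numpy as np

def best_span(start_logits, end_logits, ctx_idx, offsets, length_penalty_per_char=0.0):
    """Return (start_token, end_token, score) maximizing start + end - penalty * chars."""
    # Fancy indexing yields fresh arrays, so the caller's logits are never modified.
    sl = np.array(start_logits, dtype=np.float32)[ctx_idx]
    el = np.array(end_logits, dtype=np.float32)[ctx_idx]
    starts = np.array([offsets[i][0] for i in ctx_idx], dtype=np.float32)
    ends = np.array([offsets[i][1] for i in ctx_idx], dtype=np.float32)
    # scores[a, b]: start at context token a, end at context token b.
    span_chars = ends[None, :] - starts[:, None]
    scores = sl[:, None] + el[None, :] - length_penalty_per_char * span_chars
    # Rule out spans whose end token precedes the start token.
    scores[np.tril_indices(len(ctx_idx), k=-1)] = -np.inf
    a, b = np.unravel_index(np.argmax(scores), scores.shape)
    return int(ctx_idx[a]), int(ctx_idx[b]), float(scores[a, b])

# Made-up example: three context tokens between two special tokens.
start_logits = np.array([0.0, 0.1, 2.0, 0.3, 0.0], dtype=np.float32)
end_logits = np.array([0.0, 0.2, 0.5, 1.5, 0.0], dtype=np.float32)
offsets = [(0, 0), (0, 2), (3, 8), (9, 14), (0, 0)]
ctx_idx = [1, 2, 3]

no_penalty = best_span(start_logits, end_logits, ctx_idx, offsets, 0.0)
with_penalty = best_span(start_logits, end_logits, ctx_idx, offsets, 0.2)
print(no_penalty[:2], with_penalty[:2])
```

Without a penalty the two-token span wins; with a penalty of 0.2 per character the shorter single-token span scores higher. Ties resolve to the first maximum in row-major order, which matches the strict `>` comparison in the loop version.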


===== Fold 0 =====


Train/Val sizes: 19784 4947


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 205.9042, 'train_samples_per_second': 288.251, 'train_steps_per_second': 4.502, 'total_flos': 3872417934827520.0, 'train_loss': 1.0689040063654336, 'epoch': 2.9951534733441036}


Fold 0 OOF Jaccard: 0.71027; elapsed 224.4s



===== Fold 1 =====


Train/Val sizes: 19785 4946


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 207.682, 'train_samples_per_second': 285.798, 'train_steps_per_second': 4.464, 'total_flos': 3872548583205888.0, 'train_loss': 1.066633819119187, 'epoch': 2.9951534733441036}


Fold 1 OOF Jaccard: 0.70796; elapsed 223.1s



===== Fold 2 =====


Train/Val sizes: 19785 4946


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 208.9406, 'train_samples_per_second': 284.076, 'train_steps_per_second': 4.437, 'total_flos': 3872548583205888.0, 'train_loss': 1.0800937232847738, 'epoch': 2.9951534733441036}


Fold 2 OOF Jaccard: 0.71144; elapsed 224.4s



===== Fold 3 =====


Train/Val sizes: 19785 4946


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 208.0027, 'train_samples_per_second': 285.357, 'train_steps_per_second': 4.457, 'total_flos': 3872548583205888.0, 'train_loss': 1.072065964485835, 'epoch': 2.9951534733441036}


Fold 3 OOF Jaccard: 0.70509; elapsed 223.6s



===== Fold 4 =====


Train/Val sizes: 19785 4946


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 208.9131, 'train_samples_per_second': 284.113, 'train_steps_per_second': 4.437, 'total_flos': 3872548583205888.0, 'train_loss': 1.084695676126655, 'epoch': 2.9951534733441036}


Fold 4 OOF Jaccard: 0.70851; elapsed 224.6s


OOF Jaccard (all folds): 0.70866


Saved OOF artifacts.


Test inference with fold 0 checkpoint...


Test inference with fold 1 checkpoint...


Test inference with fold 2 checkpoint...


Test inference with fold 3 checkpoint...


Test inference with fold 4 checkpoint...


Wrote submission.csv (model). Head:
     textID                                                                                       selected_text
80a1e6bc32                                                                                      I made my wish
863097735d                                                                                              sucks!
264cd5277f             tired and didn`t really have an exciting Saturday.  oh well, hope it`s better tomorrow.
baee1e6ffc                                                              i`ve been eating cheetos all morning..
67d06a8dee  haiiii sankQ i`m fineee ima js get a checkup cos my rib hurts LOL idk but i shall be fine ~ thanks
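
The fold and OOF scores above are means of a per-tweet Jaccard computed by a `jaccard` helper that is defined earlier in the notebook and not shown in this section. A minimal sketch of what such a helper might look like, assuming it follows the competition's word-level definition (lowercase, whitespace split, set overlap). The empty-input guard and the `mean_jaccard` name are assumptions added for illustration, not taken from the notebook.

```python
def jaccard(true_text: str, pred_text: str) -> float:
    """Word-level Jaccard: |A & B| / |A | B| on lowercased whitespace tokens."""
    a = set(str(true_text).lower().split())
    b = set(str(pred_text).lower().split())
    # Assumption: two empty strings count as a perfect match (avoids division by zero)
    if not a and not b:
        return 1.0
    overlap = len(a & b)
    return overlap / (len(a) + len(b) - overlap)


def mean_jaccard(trues, preds) -> float:
    """Average the per-row score, mirroring how a fold or OOF score is reported."""
    scores = [jaccard(t, p) for t, p in zip(trues, preds)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Predicting 2 of the 4 target words gives 2 / (4 + 2 - 2) = 0.5
    print(jaccard("I made my wish", "my wish"))
```

Because the score is computed on word sets, stray punctuation attached to a word ("sucks!" vs "sucks") turns a match into a miss, which is why the plan treats trimming as something to validate on OOF.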


In [6]:
# DeBERTa-v3-base 5-fold QA training (expect +0.005–0.01 OOF vs RoBERTa), same pipeline
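import os, gc, math, json, time, numpy as np, pandas as pd, torch
# Set explicitly so the tokenizers library stops printing its fork warning when DataLoader workers start
os.environ.setdefault('TOKENIZERS_PARALLELISM', 'false')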
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, Trainer, TrainingArguments, default_data_collator

MODEL_NAME = 'microsoft/deberta-v3-base'
print('Loading tokenizer/model:', MODEL_NAME, flush=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
MAX_LEN = 128
BATCH_SIZE = 32
EPOCHS = 3
LR = 2e-5
WARMUP_RATIO = 0.1
WEIGHT_DECAY = 0.01
GRAD_CLIP = 1.0

def build_encodings_df(df: pd.DataFrame, include_labels: bool = True):
    enc_list = []
    for _, r in df.iterrows():
        sel = r['selected_text'] if include_labels else None
        enc, _ = tokenize_and_align(tokenizer, r['text'], r['sentiment'], sel, max_len=MAX_LEN)
        enc['text'] = r['text']
        enc['sentiment'] = r['sentiment']
        enc_list.append(enc)
    keys = enc_list[0].keys()
    out = {}
    for k in keys:
        vals = [e[k] for e in enc_list]
        if k in ('text','sentiment'): out[k] = vals
        else: out[k] = np.array(vals, dtype=object if k=='offset_mapping' else None)
    return out

class QADataset2(torch.utils.data.Dataset):
    def __init__(self, enc, with_labels): self.enc, self.with_labels = enc, with_labels
    def __len__(self): return len(self.enc['input_ids'])
    def __getitem__(self, idx):
        item = {
            'input_ids': torch.tensor(self.enc['input_ids'][idx], dtype=torch.long),
            'attention_mask': torch.tensor(self.enc['attention_mask'][idx], dtype=torch.long),
        }
        if self.with_labels:
            item['start_positions'] = torch.tensor(self.enc['start_positions'][idx], dtype=torch.long)
            item['end_positions'] = torch.tensor(self.enc['end_positions'][idx], dtype=torch.long)
        return item

def get_seq_ids_offsets(sentiment: str, text: str):
    tmp = tokenizer(str(sentiment), str(text), max_length=MAX_LEN, padding='max_length', truncation='only_second',
                    add_special_tokens=True, return_offsets_mapping=True)
    return tmp.sequence_ids(), tmp['offset_mapping']

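def decode_span(start_logits, end_logits, offsets, sequence_ids, text, sentiment, length_penalty_per_char=0.003):
    # Neutral tweets: the full tweet is the prediction
    if str(sentiment).strip().lower() == 'neutral': return text
    # Token positions of the tweet (second sequence); the sentiment prompt is sequence 0
    ctx = [i for i, sid in enumerate(sequence_ids) if sid == 1]
    if not ctx: return text
    # np.array copies; np.asarray can return the caller's array, and the masking below would then
    # write -inf into the logits that are later saved as OOF artifacts
    sl = np.array(start_logits, dtype=np.float32); el = np.array(end_logits, dtype=np.float32)
    mask = np.zeros_like(sl, dtype=bool); mask[np.array(ctx)] = True
    sl[~mask] = -np.inf; el[~mask] = -np.inf
    # Exhaustive search over spans i <= j inside the tweet; longer spans pay a per-character penalty
    best, bi, bj = -1e9, ctx[0], ctx[0]
    for i in ctx:
        for j in ctx:
            if j < i: continue
            span_len = offsets[j][1] - offsets[i][0]
            sc = sl[i] + el[j] - length_penalty_per_char * span_len
            if sc > best: best, bi, bj = sc, i, j
    # Map the best token span back to characters via the offset mapping
    s_char, e_char = offsets[bi][0], offsets[bj][1]
    pred = text[s_char:e_char].strip()
    return pred if pred else text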

def jaccard_batch_fast(trues, preds):
    return float(np.mean([jaccard(t, p) for t, p in zip(trues, preds)]))

oof_rows2, start_logits_folds2, end_logits_folds2 = [], [], []
for fold in range(5):
    t0 = time.time(); print(f'\n===== DeBERTa Fold {fold} =====', flush=True)
    trn_df = train_clean[train_clean.fold != fold].reset_index(drop=True)
    val_df = train_clean[train_clean.fold == fold].reset_index(drop=True)
    print('Train/Val sizes:', len(trn_df), len(val_df))

    trn_enc = build_encodings_df(trn_df, include_labels=True)
    val_enc = build_encodings_df(val_df, include_labels=True)
    train_ds = QADataset2(trn_enc, with_labels=True)
    val_ds = QADataset2(val_enc, with_labels=True)

    model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

    def compute_metrics(eval_pred):
        start_logits, end_logits = eval_pred.predictions
        preds, trues = [], list(val_df['selected_text'].astype(str).values)
        for i in range(len(val_df)):
            text, sentiment = val_df.iloc[i]['text'], val_df.iloc[i]['sentiment']
            seq_ids, offs = get_seq_ids_offsets(sentiment, text)
            preds.append(decode_span(start_logits[i], end_logits[i], offs, seq_ids, text, sentiment))
        return {'jaccard': jaccard_batch_fast(trues, preds)}

    args = TrainingArguments(
        output_dir=f'./outputs_{MODEL_NAME.replace("/","_")}_fold{fold}',
        evaluation_strategy='epoch', save_strategy='epoch',
        load_best_model_at_end=True, metric_for_best_model='jaccard', greater_is_better=True,
        per_device_train_batch_size=BATCH_SIZE, per_device_eval_batch_size=BATCH_SIZE*2,
        gradient_accumulation_steps=2, num_train_epochs=EPOCHS, fp16=True,
        learning_rate=LR, weight_decay=WEIGHT_DECAY, warmup_ratio=WARMUP_RATIO, lr_scheduler_type='cosine',
        max_grad_norm=GRAD_CLIP, dataloader_num_workers=2, logging_steps=50, save_total_limit=1, seed=SEED, report_to=[]
    )

    trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds,
                      tokenizer=tokenizer, data_collator=default_data_collator, compute_metrics=compute_metrics)
    train_out = trainer.train()
    print('Best model metrics:', train_out.metrics, flush=True)
    save_dir = f'deberta_fold{fold}_best'
    trainer.save_model(save_dir)
    if getattr(trainer.state, 'best_model_checkpoint', None):
        with open(os.path.join(save_dir, 'path.txt'), 'w') as f: f.write(trainer.state.best_model_checkpoint)

    val_preds = trainer.predict(val_ds)
    vsl, vel = val_preds.predictions
    start_logits_folds2.append(vsl); end_logits_folds2.append(vel)

    trues = list(val_df['selected_text'].astype(str).values)
    preds = []
    for i in range(len(val_df)):
        text, sentiment = val_df.iloc[i]['text'], val_df.iloc[i]['sentiment']
        seq_ids, offs = get_seq_ids_offsets(sentiment, text)
        preds.append(decode_span(vsl[i], vel[i], offs, seq_ids, text, sentiment))
        oof_rows2.append({
            'textID': val_df.iloc[i]['textID'], 'fold': fold, 'sentiment': sentiment, 'text': text,
            'selected_text': trues[i], 'pred': preds[-1]
        })
    fj = jaccard_batch_fast(trues, preds)
    print(f'DeBERTa Fold {fold} OOF Jaccard: {fj:.5f}; elapsed {time.time()-t0:.1f}s', flush=True)

    del trainer, model, train_ds, val_ds, trn_enc, val_enc
    gc.collect(); torch.cuda.empty_cache()

oof_df2 = pd.DataFrame(oof_rows2)
oof_score2 = jaccard_batch_fast(oof_df2['selected_text'].tolist(), oof_df2['pred'].tolist())
print(f'DeBERTa OOF Jaccard (all folds): {oof_score2:.5f}', flush=True)
oof_df2.to_csv('oof_deberta_v3_base.csv', index=False)
np.save('oof_start_logits_deberta_v3_base.npy', np.concatenate(start_logits_folds2, axis=0))
np.save('oof_end_logits_deberta_v3_base.npy', np.concatenate(end_logits_folds2, axis=0))
print('Saved DeBERTa OOF artifacts.')

# Test inference with best fold checkpoints and average logits
test_df = test.copy().reset_index(drop=True)
test_enc = [tokenizer(str(r['sentiment']), str(r['text']), max_length=MAX_LEN, padding='max_length',
                     truncation='only_second', add_special_tokens=True, return_offsets_mapping=True, return_attention_mask=True)
            for _, r in test_df.iterrows()]
test_input_ids = torch.tensor([e['input_ids'] for e in test_enc], dtype=torch.long)
test_attention_mask = torch.tensor([e['attention_mask'] for e in test_enc], dtype=torch.long)
fold_starts, fold_ends = [], []
for fold in range(5):
    print(f'DeBERTa test inference fold {fold}...', flush=True)
    model = AutoModelForQuestionAnswering.from_pretrained(f'deberta_fold{fold}_best').to(device); model.eval()
    with torch.no_grad():
        bs = BATCH_SIZE; starts, ends = [], []
        for i in range(0, len(test_df), bs):
            out = model(input_ids=test_input_ids[i:i+bs].to(device), attention_mask=test_attention_mask[i:i+bs].to(device))
            starts.append(out.start_logits.detach().cpu().numpy()); ends.append(out.end_logits.detach().cpu().numpy())
        starts, ends = np.vstack(starts), np.vstack(ends)
    np.save(f'deberta_test_start_fold{fold}.npy', starts); np.save(f'deberta_test_end_fold{fold}.npy', ends)
    fold_starts.append(starts); fold_ends.append(ends)
    del model; gc.collect(); torch.cuda.empty_cache()

avg_st = np.mean(fold_starts, axis=0); avg_en = np.mean(fold_ends, axis=0)
test_preds = []
for i in range(len(test_df)):
    text, sentiment = test_df.iloc[i]['text'], test_df.iloc[i]['sentiment']
    seq_ids, offs = get_seq_ids_offsets(sentiment, text)
    test_preds.append(decode_span(avg_st[i], avg_en[i], offs, seq_ids, text, sentiment))
sub2 = pd.DataFrame({'textID': test_df['textID'], 'selected_text': test_preds})
sub2.to_csv('submission_deberta.csv', index=False)
print('Wrote submission_deberta.csv Head:\n', sub2.head().to_string(index=False))
print('DeBERTa run complete.')
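
The span search in `decode_span` scores every (start, end) token pair inside the tweet with a Python double loop, and it runs once per row at every evaluation. A minimal vectorized sketch of the same search is below, assuming the context indices are sorted in ascending order. `best_span` and its parameter names are hypothetical and not part of the notebook; it returns token indices only and leaves the character slicing to the caller.

```python
import numpy as np


def best_span(start_logits, end_logits, char_starts, char_ends, ctx_idx,
              length_penalty_per_char=0.003):
    """Return token indices (i, j), i <= j, maximizing
    start_logits[i] + end_logits[j] - penalty * (char_ends[j] - char_starts[i]),
    with both indices restricted to ctx_idx (assumed sorted ascending)."""
    ctx = np.asarray(ctx_idx)
    sl = np.asarray(start_logits, dtype=np.float32)[ctx]  # fancy indexing copies
    el = np.asarray(end_logits, dtype=np.float32)[ctx]
    cs = np.asarray(char_starts, dtype=np.float32)[ctx]
    ce = np.asarray(char_ends, dtype=np.float32)[ctx]
    # scores[a, b]: span starting at context token a and ending at context token b
    span_chars = ce[None, :] - cs[:, None]
    scores = sl[:, None] + el[None, :] - length_penalty_per_char * span_chars
    # Rows index the start, columns the end: mask everything strictly below the
    # diagonal, i.e. spans whose end token comes before their start token
    scores[np.tril_indices(len(ctx), k=-1)] = -np.inf
    a, b = np.unravel_index(np.argmax(scores), scores.shape)
    return int(ctx[a]), int(ctx[b])


if __name__ == "__main__":
    # Toy example: token 0 is a special token, tokens 1..3 are the tweet
    i, j = best_span([0, 5, 1, 0], [0, 0, 4, 1],
                     [0, 0, 4, 8], [0, 3, 7, 12], ctx_idx=[1, 2, 3])
    print(i, j)
```

With `offsets` from the tokenizer, `char_starts` and `char_ends` would be the first and second element of each offset pair. The score matrix is at most `MAX_LEN` by `MAX_LEN`, so memory is not a concern at these sequence lengths.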

Loading tokenizer/model: microsoft/deberta-v3-base





===== DeBERTa Fold 0 =====


Train/Val sizes: 19784 4947


Some weights of DebertaV2ForQuestionAnswering were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 323.9586, 'train_samples_per_second': 183.209, 'train_steps_per_second': 2.861, 'total_flos': 3872487864360960.0, 'train_loss': 1.077058746848163, 'epoch': 2.9951534733441036}


DeBERTa Fold 0 OOF Jaccard: 0.71574; elapsed 344.3s



===== DeBERTa Fold 1 =====


Train/Val sizes: 19785 4946


Some weights of DebertaV2ForQuestionAnswering were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 326.1992, 'train_samples_per_second': 181.959, 'train_steps_per_second': 2.842, 'total_flos': 3872618515098624.0, 'train_loss': 1.0758275070108128, 'epoch': 2.9951534733441036}


DeBERTa Fold 1 OOF Jaccard: 0.70747; elapsed 344.9s



===== DeBERTa Fold 2 =====


Train/Val sizes: 19785 4946


Some weights of DebertaV2ForQuestionAnswering were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 325.889, 'train_samples_per_second': 182.133, 'train_steps_per_second': 2.845, 'total_flos': 3872618515098624.0, 'train_loss': 1.0792794705980417, 'epoch': 2.9951534733441036}


DeBERTa Fold 2 OOF Jaccard: 0.71837; elapsed 347.2s



===== DeBERTa Fold 3 =====


Train/Val sizes: 19785 4946


Some weights of DebertaV2ForQuestionAnswering were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


Epoch,Training Loss,Validation Loss


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Best model metrics: {'train_runtime': 326.9882, 'train_samples_per_second': 181.52, 'train_steps_per_second': 2.835, 'total_flos': 3872618515098624.0, 'train_loss': 1.0793483944000934, 'epoch': 2.9951534733441036}




DeBERTa Fold 3 OOF Jaccard: 0.70504; elapsed 345.7s



===== DeBERTa Fold 4 =====


Train/Val sizes: 19785 4946


Some weights of DebertaV2ForQuestionAnswering were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)




Epoch,Training Loss,Validation Loss




Best model metrics: {'train_runtime': 324.1724, 'train_samples_per_second': 183.097, 'train_steps_per_second': 2.86, 'total_flos': 3872618515098624.0, 'train_loss': 1.0810155693638543, 'epoch': 2.9951534733441036}




DeBERTa Fold 4 OOF Jaccard: 0.71348; elapsed 343.8s


DeBERTa OOF Jaccard (all folds): 0.71202


Saved DeBERTa OOF artifacts.


DeBERTa test inference fold 0...


DeBERTa test inference fold 1...


DeBERTa test inference fold 2...


DeBERTa test inference fold 3...


DeBERTa test inference fold 4...


Wrote submission_deberta.csv Head:
     textID                                                                                       selected_text
80a1e6bc32                                                                                                wish
863097735d                                                                                   gosh today sucks!
264cd5277f             tired and didn`t really have an exciting Saturday.  oh well, hope it`s better tomorrow.
baee1e6ffc                                                              i`ve been eating cheetos all morning..
67d06a8dee  haiiii sankQ i`m fineee ima js get a checkup cos my rib hurts LOL idk but i shall be fine ~ thanks
DeBERTa run complete.
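
The tokenizers fork warning in the training log above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it is placed at the top of the notebook before any tokenizer is used and before DataLoader workers are forked (the placement is an assumption, not something this run did):

```python
import os

# Assumption: this runs before the first tokenizer call and before any
# DataLoader worker processes are forked.
# "false" turns off the tokenizers library's internal thread parallelism,
# which is the setting the warning asks to be made explicit.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print("TOKENIZERS_PARALLELISM =", os.environ["TOKENIZERS_PARALLELISM"])
```

Setting the value to `"true"` instead would also make the choice explicit, but the warning indicates that parallelism after a fork risks deadlocks, so `"false"` is the conservative option here.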


In [7]:
# Post-processing sweep on RoBERTa OOF logits; apply tuned PP to test
import numpy as np, pandas as pd, time
from transformers import AutoTokenizer

print('PP tuning on RoBERTa OOF...', flush=True)
oof_df = pd.read_csv('oof_roberta_base.csv')
oof_sl = np.load('oof_start_logits_roberta_base.npy')
oof_el = np.load('oof_end_logits_roberta_base.npy')
assert len(oof_df) == oof_sl.shape[0] == oof_el.shape[0], 'Mismatch OOF shapes'

rb_tokenizer = AutoTokenizer.from_pretrained('roberta-base', use_fast=True)

def seq_ids_offs_rb(sentiment, text, max_len=128):
    tmp = rb_tokenizer(str(sentiment), str(text), max_length=max_len, padding='max_length',
                       truncation='only_second', add_special_tokens=True, return_offsets_mapping=True)
    return tmp.sequence_ids(), tmp['offset_mapping']

def decode_with_score(sl, el, offs, seq_ids, text, sentiment, lp=0.0):
    if str(sentiment).strip().lower() == 'neutral':
        return text, 0.0
    ctx = [i for i, sid in enumerate(seq_ids) if sid == 1]
    if not ctx: return text, -1e9
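    # Copy the logits: np.asarray can return a view, and the in-place masking below would then overwrite the caller's arrays
    sl = np.array(sl, dtype=np.float32); el = np.array(el, dtype=np.float32)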
    mask = np.zeros_like(sl, dtype=bool); mask[np.array(ctx)] = True
    sl[~mask] = -np.inf; el[~mask] = -np.inf
    best, bi, bj = -1e9, ctx[0], ctx[0]
    for i in ctx:
        for j in ctx:
            if j < i: continue
            span_len = offs[j][1] - offs[i][0]
            sc = sl[i] + el[j] - lp * float(span_len)
            if sc > best: best, bi, bj = sc, i, j
    s_char, e_char = offs[bi][0], offs[bj][1]
    pred = text[s_char:e_char].strip()
    if not pred: pred = text
    return pred, float(best)

def jaccard_mean(y_true, y_pred):
    def jac(a,b):
        sa, sb = set(str(a).split()), set(str(b).split())
        if not sa and not sb: return 1.0
        if not sa or not sb: return 0.0
        inter = len(sa & sb); union = len(sa | sb)
        return inter/union if union else 0.0
    return float(np.mean([jac(t,p) for t,p in zip(y_true, y_pred)]))

def eval_params(lp, fallback_pct=None):
    preds = []; scores = []
    t0 = time.time()
    for i in range(len(oof_df)):
        r = oof_df.iloc[i]
        seq_ids, offs = seq_ids_offs_rb(r['sentiment'], r['text'])
        pred, sc = decode_with_score(oof_sl[i], oof_el[i], offs, seq_ids, r['text'], r['sentiment'], lp=lp)
        preds.append(pred); scores.append(sc)
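    if fallback_pct is not None:
        # Caveat: scores also holds the 0.0 assigned to every neutral row, so a low percentile
        # can land exactly on 0.0 and flag no row. This is a likely reason the fb setting does not
        # change the OOF score; taking the percentile over non-neutral rows only would avoid it.
        thr = np.percentile(scores, fallback_pct*100.0)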
        for i in range(len(oof_df)):
            if scores[i] < thr:
                # fallback: return full text for any sentiment (neutral already full)
                preds[i] = oof_df.iloc[i]['text']
    score = jaccard_mean(oof_df['selected_text'].tolist(), preds)
    return score, preds, scores

grid_lp = [0.0, 0.002, 0.003, 0.004, 0.006, 0.008]
grid_fb = [None, 0.02, 0.05]
best = (-1, None, None)
for lp in grid_lp:
    for fb in grid_fb:
        s, _, _ = eval_params(lp, fb)
        print(f'lp={lp:.4f} fb={fb} -> OOF {s:.5f}')
        if s > best[0]: best = (s, lp, fb)
print('Best PP:', best)

# Apply best PP to test using saved per-fold test logits
test_df = pd.read_csv('test.csv')
fold_starts = []; fold_ends = []
for f in range(5):
    fold_starts.append(np.load(f'test_start_fold{f}.npy'))
    fold_ends.append(np.load(f'test_end_fold{f}.npy'))
avg_st = np.mean(fold_starts, axis=0); avg_en = np.mean(fold_ends, axis=0)

best_oof, best_lp, best_fb = best
print(f'Using tuned params: lp={best_lp} fb={best_fb}', flush=True)

test_preds = []
if best_fb is not None:
    # derive threshold from OOF distribution and reuse on test: use same percentile of OOF scores as threshold value
    _, _, oof_scores = eval_params(best_lp, None)
    thr = np.percentile(oof_scores, best_fb*100.0)
else:
    thr = None
for i in range(len(test_df)):
    text = test_df.iloc[i]['text']; sent = test_df.iloc[i]['sentiment']
    seq_ids, offs = seq_ids_offs_rb(sent, text)
    pred, sc = decode_with_score(avg_st[i], avg_en[i], offs, seq_ids, text, sent, lp=best_lp)
    if (thr is not None) and (sc < thr):
        pred = text
    test_preds.append(pred)

sub_pp = pd.DataFrame({'textID': test_df['textID'], 'selected_text': test_preds})
sub_pp.to_csv('submission_pp_roberta.csv', index=False)
print('Wrote submission_pp_roberta.csv. Head:\n', sub_pp.head().to_string(index=False))
print('PP tuning complete.')

PP tuning on RoBERTa OOF...




lp=0.0000 fb=None -> OOF 0.70694


lp=0.0000 fb=0.02 -> OOF 0.70694


lp=0.0000 fb=0.05 -> OOF 0.70694


lp=0.0020 fb=None -> OOF 0.70817


lp=0.0020 fb=0.02 -> OOF 0.70817


lp=0.0020 fb=0.05 -> OOF 0.70817


lp=0.0030 fb=None -> OOF 0.70866


lp=0.0030 fb=0.02 -> OOF 0.70866


lp=0.0030 fb=0.05 -> OOF 0.70866


lp=0.0040 fb=None -> OOF 0.70868


lp=0.0040 fb=0.02 -> OOF 0.70868


lp=0.0040 fb=0.05 -> OOF 0.70868


lp=0.0060 fb=None -> OOF 0.70854


lp=0.0060 fb=0.02 -> OOF 0.70854


lp=0.0060 fb=0.05 -> OOF 0.70854


lp=0.0080 fb=None -> OOF 0.70887


lp=0.0080 fb=0.02 -> OOF 0.70887


lp=0.0080 fb=0.05 -> OOF 0.70887
Best PP: (0.7088670288691741, 0.008, None)
Using tuned params: lp=0.008 fb=None


Wrote submission_pp_roberta.csv. Head:
     textID                                                                                       selected_text
80a1e6bc32                                                                                      I made my wish
863097735d                                                                                              sucks!
264cd5277f             tired and didn`t really have an exciting Saturday.  oh well, hope it`s better tomorrow.
baee1e6ffc                                                              i`ve been eating cheetos all morning..
67d06a8dee  haiiii sankQ i`m fineee ima js get a checkup cos my rib hurts LOL idk but i shall be fine ~ thanks
PP tuning complete.
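
In the sweep above, every `fb` value yields the same OOF score for a given `lp`. One plausible explanation (an assumption, not verified against the saved logits) is that the pooled percentile is dominated by neutral rows, which `decode_with_score` scores as exactly 0.0. The sketch below reproduces that effect on made-up scores; the arrays, the helper `fallback_mask`, and the 0.25 quantile are hypothetical and chosen only to keep the toy example small.

```python
import numpy as np

# Toy data (assumed values, not taken from the run above):
# neutral rows get score 0.0, non-neutral rows get start+end logit sums.
sentiments = np.array(["neutral"] * 4 + ["positive"] * 3 + ["negative"] * 3)
scores = np.array([0.0, 0.0, 0.0, 0.0, 3.1, 9.4, 12.0, 2.5, 8.8, 11.3])

def fallback_mask(scores, pct, include=None):
    """Flag included rows whose score is below the pct-quantile of the included rows."""
    if include is None:
        include = np.ones(len(scores), dtype=bool)
    thr = float(np.percentile(scores[include], pct * 100.0))
    return (scores < thr) & include, thr

# Pooled over all rows: the neutral zeros fill the low tail, so the
# threshold is 0.0 and no row falls strictly below it.
pooled_mask, pooled_thr = fallback_mask(scores, 0.25)

# Restricted to non-neutral rows: the weakest predictions are flagged.
non_neutral = sentiments != "neutral"
scoped_mask, scoped_thr = fallback_mask(scores, 0.25, include=non_neutral)

print("pooled:", pooled_thr, int(pooled_mask.sum()))
print("scoped:", scoped_thr, int(scoped_mask.sum()))
```

The per-sentiment thresholds used in the later cells follow the second pattern, which is consistent with `fb` changing the OOF score there.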


In [8]:
# PP tuning and blending utilities (staged). Run after DeBERTa training finishes.
import os, json, time, math, numpy as np, pandas as pd
from typing import Dict, Tuple, Optional
from transformers import AutoTokenizer

print('Staging PP tuning and blending code. Do not run until all OOF/test logits are saved.', flush=True)

MAX_LEN = 128

def jaccard_str(a: str, b: str) -> float:
    sa, sb = set(str(a).split()), set(str(b).split())
    if not sa and not sb: return 1.0
    if not sa or not sb: return 0.0
    inter = len(sa & sb); union = len(sa | sb)
    return inter/union if union else 0.0

def jaccard_mean(y_true, y_pred):
    return float(np.mean([jaccard_str(t, p) for t, p in zip(y_true, y_pred)]))

def load_oof(model_key: str) -> Tuple[pd.DataFrame, np.ndarray, np.ndarray, AutoTokenizer]:
    # model_key in {'roberta_base','deberta_v3_base'}
    if model_key == 'roberta_base':
        oof_df = pd.read_csv('oof_roberta_base.csv')
        sl = np.load('oof_start_logits_roberta_base.npy'); el = np.load('oof_end_logits_roberta_base.npy')
        tok = AutoTokenizer.from_pretrained('roberta-base', use_fast=True)
    elif model_key == 'deberta_v3_base':
        oof_df = pd.read_csv('oof_deberta_v3_base.csv')
        sl = np.load('oof_start_logits_deberta_v3_base.npy'); el = np.load('oof_end_logits_deberta_v3_base.npy')
        tok = AutoTokenizer.from_pretrained('microsoft/deberta-v3-base', use_fast=True)
    else:
        raise ValueError('Unknown model_key')
    assert len(oof_df) == sl.shape[0] == el.shape[0], f'Shape mismatch for {model_key}'
    return oof_df, sl, el, tok

def get_seq_ids_offs(tok: AutoTokenizer, sentiment: str, text: str):
    tmp = tok(str(sentiment), str(text), max_length=MAX_LEN, padding='max_length', truncation='only_second',
              add_special_tokens=True, return_offsets_mapping=True)
    return tmp.sequence_ids(), tmp['offset_mapping']

def decode_with_score(sl_row: np.ndarray, el_row: np.ndarray, offs, seq_ids, text: str, sentiment: str, lp_char: float) -> Tuple[str, float]:
    # Guards and neutral rule
    if str(sentiment).strip().lower() == 'neutral':
        return text, 0.0
    ctx = [i for i, sid in enumerate(seq_ids) if sid == 1]
    if not ctx: return text, -1e9
    sl = np.asarray(sl_row, dtype=np.float32).copy()
    el = np.asarray(el_row, dtype=np.float32).copy()
    mask = np.zeros_like(sl, dtype=bool); mask[np.array(ctx)] = True
    sl[~mask] = -np.inf; el[~mask] = -np.inf
    best, bi, bj = -1e9, ctx[0], ctx[0]
    # Ignore zero-length offsets
    valid = [i for i in ctx if offs[i][1] > offs[i][0]]
    if not valid: return text, -1e9
    for i in valid:
        for j in valid:
            if j < i: continue
            span_len = offs[j][1] - offs[i][0]
            sc = float(sl[i]) + float(el[j]) - lp_char * float(span_len)
            if sc > best:
                best, bi, bj = sc, i, j
    s_char, e_char = offs[bi][0], offs[bj][1]
    pred = text[s_char:e_char].strip()
    if not pred: pred = text
    return pred, float(best)

def eval_oof_with_params(oof_df: pd.DataFrame, sl: np.ndarray, el: np.ndarray, tok: AutoTokenizer,
                          lp_by_sent: Dict[str, float], fb_pct_by_sent: Dict[str, Optional[float]]):
    preds, scores = [], []
    for i in range(len(oof_df)):
        r = oof_df.iloc[i]
        sent = str(r['sentiment']).strip().lower()
        seq_ids, offs = get_seq_ids_offs(tok, r['sentiment'], r['text'])
        lp = lp_by_sent.get(sent, 0.0)
        pred, sc = decode_with_score(sl[i], el[i], offs, seq_ids, r['text'], r['sentiment'], lp_char=lp)
        preds.append(pred); scores.append(sc)
    # Apply low-confidence fallback per sentiment for pos/neg only
    for s_key in ('positive','negative'):
        pct = fb_pct_by_sent.get(s_key, None)
        if pct is None: continue
        idxs = [i for i in range(len(oof_df)) if str(oof_df.iloc[i]['sentiment']).strip().lower()==s_key]
        if not idxs: continue
        thr = np.percentile([scores[i] for i in idxs], pct*100.0)
        for i in idxs:
            if scores[i] < thr:
                preds[i] = oof_df.iloc[i]['text']
    score = jaccard_mean(oof_df['selected_text'].tolist(), preds)
    return score, preds, scores

def tune_pp_for_model(model_key: str, lp_grid=(0.0, 0.002, 0.003, 0.004, 0.006, 0.008), fb_grid=(None, 0.02, 0.05)):
    oof_df, sl, el, tok = load_oof(model_key)
    best = (-1.0, None)
    for lp_pos in lp_grid:
        for lp_neg in lp_grid:
            lp_by_sent = {'positive': lp_pos, 'negative': lp_neg}
            for fb_pos in fb_grid:
                for fb_neg in fb_grid:
                    fb_by_sent = {'positive': fb_pos, 'negative': fb_neg}
                    s, _, _ = eval_oof_with_params(oof_df, sl, el, tok, lp_by_sent, fb_by_sent)
                    print(f'[{model_key}] lp(pos)={lp_pos:.4f} lp(neg)={lp_neg:.4f} fb(pos)={fb_pos} fb(neg)={fb_neg} -> OOF {s:.5f}')
                    if s > best[0]:
                        best = (s, (lp_by_sent, fb_by_sent))
    print(f'Best {model_key} PP:', best)
    return best, (oof_df, sl, el, tok)

def load_test_logits(model_key: str) -> Tuple[pd.DataFrame, np.ndarray, np.ndarray, AutoTokenizer]:
    test_df = pd.read_csv('test.csv')
    if model_key == 'roberta_base':
        st = np.mean([np.load(f'test_start_fold{f}.npy') for f in range(5)], axis=0)
        en = np.mean([np.load(f'test_end_fold{f}.npy') for f in range(5)], axis=0)
        tok = AutoTokenizer.from_pretrained('roberta-base', use_fast=True)
    elif model_key == 'deberta_v3_base':
        st = np.mean([np.load(f'deberta_test_start_fold{f}.npy') for f in range(5)], axis=0)
        en = np.mean([np.load(f'deberta_test_end_fold{f}.npy') for f in range(5)], axis=0)
        tok = AutoTokenizer.from_pretrained('microsoft/deberta-v3-base', use_fast=True)
    else:
        raise ValueError('Unknown model_key')
    assert st.shape[0] == len(test_df) == en.shape[0], 'Test shape mismatch'
    return test_df, st, en, tok

def apply_pp_to_test(model_key: str, lp_by_sent: Dict[str, float], fb_pct_by_sent: Dict[str, Optional[float]],
                     ref_oof_scores_by_sent: Dict[str, list], out_path: str):
    test_df, st, en, tok = load_test_logits(model_key)
    # Derive numeric thresholds from OOF score distributions per sentiment (reuse same percentiles)
    thr_by_sent: Dict[str, Optional[float]] = {}
    for s_key, pct in fb_pct_by_sent.items():
        if pct is None: thr_by_sent[s_key] = None
        else:
            vals = ref_oof_scores_by_sent.get(s_key, [])
            thr_by_sent[s_key] = float(np.percentile(vals, pct*100.0)) if len(vals) else None
    preds = []
    for i in range(len(test_df)):
        text, sent = test_df.iloc[i]['text'], test_df.iloc[i]['sentiment']
        s_key = str(sent).strip().lower()
        if s_key == 'neutral':
            preds.append(text); continue
        seq_ids, offs = get_seq_ids_offs(tok, sent, text)
        pred, sc = decode_with_score(st[i], en[i], offs, seq_ids, text, sent, lp_char=lp_by_sent.get(s_key, 0.0))
        thr = thr_by_sent.get(s_key, None)
        if (thr is not None) and (sc < thr): pred = text
        preds.append(pred)
    sub = pd.DataFrame({'textID': test_df['textID'], 'selected_text': preds})
    sub.to_csv(out_path, index=False)
    print('Wrote', out_path, 'Head:\n', sub.head().to_string(index=False))

def try_blend_oof(w: float,
                  oof_rb: Tuple[pd.DataFrame, np.ndarray, np.ndarray, AutoTokenizer],
                  oof_deb: Tuple[pd.DataFrame, np.ndarray, np.ndarray, AutoTokenizer],
                  lp_grid, fb_grid):
    df_r, sl_r, el_r, tok_r = oof_rb
    df_d, sl_d, el_d, tok_d = oof_deb
    # Sanity: folds were created deterministically, so both OOF frames should follow the original train_clean row order.
    assert len(df_r)==len(df_d), 'OOF length mismatch between models'
    # Why logit-level blending is not implemented here:
    # - RoBERTa and DeBERTa-v3 tokenize differently, so position k in one model's logits does not cover the same
    #   characters as position k in the other's. Averaging raw logits across backbones is therefore invalid.
    # - A span-level blend, score(i,j) = w*s_d + (1-w)*s_r with s_* = start[i] + end[j], needs both models' logits on
    #   one shared tokenization. The saved arrays cannot be remapped without re-running the models.
    # - Blending decoded spans at text level is possible but suboptimal; experts advised blending logits before
    #   decoding within a single tokenization.
    # Decision: tune PP per backbone, use DeBERTa as the primary submission, and keep a simple text-level blend
    # (choose shorter/longer span) as an optional fallback only if needed.
    raise NotImplementedError('Cross-backbone logit blending requires identical tokenization; stage left as future work with re-forward if needed.')

if __name__ == '__main__':
    print(' Guidance: ',
          '\n1) Run tune_pp_for_model("roberta_base") to get best params and OOF scores per sentiment. Save JSON.',
          '\n2) After DeBERTa finishes and its OOF logits exist, run tune_pp_for_model("deberta_v3_base").',
          '\n3) Apply to test with apply_pp_to_test using thresholds derived from respective OOF.',
          '\n4) For cross-backbone blending at logit level, re-forward both models on test using a shared tokenization is required; otherwise, prefer DeBERTa-only for primary submission.', flush=True)
    # Example manual run after training completes:
    # best_rb, pack_rb = tune_pp_for_model('roberta_base')
    # (score_rb, (lp_rb, fb_rb)) = best_rb
    # # Derive OOF score distributions per sentiment for thresholds
    # oof_df_rb, sl_rb, el_rb, tok_rb = pack_rb
    # _, preds_rb, scores_rb = eval_oof_with_params(oof_df_rb, sl_rb, el_rb, tok_rb, lp_rb, fb_rb)
    # scores_by_sent_rb = {'positive': [scores_rb[i] for i in range(len(oof_df_rb)) if str(oof_df_rb.iloc[i]['sentiment']).strip().lower()=='positive'],
    #                      'negative': [scores_rb[i] for i in range(len(oof_df_rb)) if str(oof_df_rb.iloc[i]['sentiment']).strip().lower()=='negative']}
    # with open('pp_params_roberta.json','w') as f: json.dump({'lp': lp_rb, 'fb': fb_rb, 'oof_score': score_rb}, f)
    # apply_pp_to_test('roberta_base', lp_rb, fb_rb, scores_by_sent_rb, 'submission_pp_roberta.csv')

    # # After DeBERTa OOF exists:
    # best_deb, pack_deb = tune_pp_for_model('deberta_v3_base')
    # (score_deb, (lp_deb, fb_deb)) = best_deb
    # oof_df_deb, sl_deb, el_deb, tok_deb = pack_deb
    # _, preds_deb, scores_deb = eval_oof_with_params(oof_df_deb, sl_deb, el_deb, tok_deb, lp_deb, fb_deb)
    # scores_by_sent_deb = {'positive': [scores_deb[i] for i in range(len(oof_df_deb)) if str(oof_df_deb.iloc[i]['sentiment']).strip().lower()=='positive'],
    #                       'negative': [scores_deb[i] for i in range(len(oof_df_deb)) if str(oof_df_deb.iloc[i]['sentiment']).strip().lower()=='negative']}
    # with open('pp_params_deberta.json','w') as f: json.dump({'lp': lp_deb, 'fb': fb_deb, 'oof_score': score_deb}, f)
    # apply_pp_to_test('deberta_v3_base', lp_deb, fb_deb, scores_by_sent_deb, 'submission_pp_deberta.csv')

    print('PP utilities ready.')

Staging PP tuning and blending code. Do not run until all OOF/test logits are saved.


 Guidance:  
1) Run tune_pp_for_model("roberta_base") to get best params and OOF scores per sentiment. Save JSON. 
2) After DeBERTa finishes and its OOF logits exist, run tune_pp_for_model("deberta_v3_base"). 
3) Apply to test with apply_pp_to_test using thresholds derived from respective OOF. 
4) For cross-backbone blending at logit level, re-forward both models on test using a shared tokenization is required; otherwise, prefer DeBERTa-only for primary submission.


PP utilities ready.
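
The span search in `decode_with_score` maximises `start[i] + end[j] - lp_char * span_len` over all pairs with `i <= j`. The sketch below expresses the same objective as a NumPy score matrix and shows, on made-up logits, how a small character-length penalty can flip the choice from a long span to a short one. The function `best_span`, the logits, and the offsets are illustrative assumptions. It also omits the context mask and the zero-length-offset filter that the notebook applies.

```python
import numpy as np

def best_span(sl, el, offsets, lp_char=0.0):
    """Return (i, j, score) maximising sl[i] + el[j] - lp_char * char_len with i <= j."""
    sl = np.asarray(sl, dtype=np.float64)
    el = np.asarray(el, dtype=np.float64)
    starts = np.array([s for s, _ in offsets], dtype=np.float64)
    ends = np.array([e for _, e in offsets], dtype=np.float64)
    span_len = ends[None, :] - starts[:, None]        # characters covered by span (i, j)
    scores = sl[:, None] + el[None, :] - lp_char * span_len
    scores[np.tril_indices(len(sl), k=-1)] = -np.inf  # forbid spans that end before they start
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i), int(j), float(scores[i, j])

# Toy example (assumed logits and whitespace-level offsets, not model output).
text = "gosh today sucks!"
offsets = [(0, 4), (5, 10), (11, 17)]
sl = [2.00, 0.50, 1.95]
el = [0.10, 0.20, 3.00]

for lp in (0.0, 0.008):
    i, j, sc = best_span(sl, el, offsets, lp_char=lp)
    print(f"lp={lp}: '{text[offsets[i][0]:offsets[j][1]]}' score={sc:.3f}")
```

With these numbers the full-tweet span wins by 0.05 when `lp_char` is 0, and the 17-character penalty at `lp_char=0.008` is enough to hand the win to the 6-character span. The per-sentiment `lp` grid trades off these near-ties in the same way.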


In [9]:
# Run PP tuning for DeBERTa, derive per-sentiment thresholds, and write submission
import json, numpy as np, pandas as pd
print('Tuning PP for DeBERTa-v3-base...', flush=True)
best_deb, pack_deb = tune_pp_for_model('deberta_v3_base')
(score_deb, (lp_deb, fb_deb)) = best_deb
print('Best DeBERTa OOF:', score_deb, 'params:', lp_deb, fb_deb)
oof_df_deb, sl_deb, el_deb, tok_deb = pack_deb
# Get OOF preds and scores to calibrate thresholds
oof_score_eval, preds_deb, scores_deb = eval_oof_with_params(oof_df_deb, sl_deb, el_deb, tok_deb, lp_deb, fb_deb)
scores_by_sent_deb = {
    'positive': [scores_deb[i] for i in range(len(oof_df_deb)) if str(oof_df_deb.iloc[i]['sentiment']).strip().lower()=='positive'],
    'negative': [scores_deb[i] for i in range(len(oof_df_deb)) if str(oof_df_deb.iloc[i]['sentiment']).strip().lower()=='negative']
}
with open('pp_params_deberta.json','w') as f:
    json.dump({'lp': lp_deb, 'fb': fb_deb, 'oof_score': float(score_deb)}, f)
print('Saved pp_params_deberta.json')
apply_pp_to_test('deberta_v3_base', lp_deb, fb_deb, scores_by_sent_deb, 'submission_pp_deberta.csv')
print('DeBERTa PP tuning + submission complete.')

# Optionally, save DeBERTa OOF decoded predictions for error analysis
pd.DataFrame({
    'textID': oof_df_deb['textID'],
    'sentiment': oof_df_deb['sentiment'],
    'text': oof_df_deb['text'],
    'selected_text': oof_df_deb['selected_text'],
    'pred': preds_deb
}).to_csv('oof_preds_deberta_pp.csv', index=False)
print('Saved oof_preds_deberta_pp.csv')

Tuning PP for DeBERTa-v3-base...




[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=None fb(neg)=None -> OOF 0.71116


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.02 -> OOF 0.71089


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.05 -> OOF 0.71039


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=None -> OOF 0.71104


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71077


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71026


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=None -> OOF 0.71047


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71021


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.70970


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=None fb(neg)=None -> OOF 0.71153


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.02 -> OOF 0.71129


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.05 -> OOF 0.71077


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=None -> OOF 0.71141


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71117


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71065


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=None -> OOF 0.71085


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71061


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71009


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=None fb(neg)=None -> OOF 0.71147


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.02 -> OOF 0.71125


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.05 -> OOF 0.71068


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=None -> OOF 0.71134


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71113


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71056


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=None -> OOF 0.71078


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71056


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.70999


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=None fb(neg)=None -> OOF 0.71144


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.02 -> OOF 0.71123


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.05 -> OOF 0.71073


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=None -> OOF 0.71132


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71111


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71060


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=None -> OOF 0.71076


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71054


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71004


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=None fb(neg)=None -> OOF 0.71111


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.02 -> OOF 0.71093


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.05 -> OOF 0.71052


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=None -> OOF 0.71099


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71081


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71040


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=None -> OOF 0.71043


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71024


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.70984


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=None fb(neg)=None -> OOF 0.71117


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.02 -> OOF 0.71087


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.05 -> OOF 0.71049


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=None -> OOF 0.71105


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71075


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71037


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=None -> OOF 0.71048


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71019


[deberta_v3_base] lp(pos)=0.0000 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.70981


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=None fb(neg)=None -> OOF 0.71145


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.02 -> OOF 0.71119


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.05 -> OOF 0.71068


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=None -> OOF 0.71131


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71104


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71054


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=None -> OOF 0.71059


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71032


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.70982


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=None fb(neg)=None -> OOF 0.71183


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.02 -> OOF 0.71159


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.05 -> OOF 0.71107


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=None -> OOF 0.71168


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71144


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71092


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=None -> OOF 0.71096


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71072


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71020


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=None fb(neg)=None -> OOF 0.71176


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.02 -> OOF 0.71154


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.05 -> OOF 0.71097


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=None -> OOF 0.71162


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71140


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71083


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=None -> OOF 0.71090


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71068


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71011


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=None fb(neg)=None -> OOF 0.71174


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.02 -> OOF 0.71152


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.05 -> OOF 0.71102


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=None -> OOF 0.71160


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71138


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71088


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=None -> OOF 0.71088


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71066


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71016


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=None fb(neg)=None -> OOF 0.71141


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.02 -> OOF 0.71122


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.05 -> OOF 0.71082


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=None -> OOF 0.71127


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71108


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71068


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=None -> OOF 0.71055


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71036


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.70996


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=None fb(neg)=None -> OOF 0.71146


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.02 -> OOF 0.71117


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.05 -> OOF 0.71078


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=None -> OOF 0.71132


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71103


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71064


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=None -> OOF 0.71060


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71031


[deberta_v3_base] lp(pos)=0.0020 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.70992


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=None fb(neg)=None -> OOF 0.71171


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.02 -> OOF 0.71145


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.05 -> OOF 0.71094


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=None -> OOF 0.71159


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71132


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71082


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=None -> OOF 0.71088


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71061


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71010


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=None fb(neg)=None -> OOF 0.71209


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.02 -> OOF 0.71185


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.05 -> OOF 0.71133


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=None -> OOF 0.71196


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71172


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71120


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=None -> OOF 0.71125


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71101


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71049


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=None fb(neg)=None -> OOF 0.71202


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.02 -> OOF 0.71180


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.05 -> OOF 0.71123


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=None -> OOF 0.71190


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71168


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71111


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=None -> OOF 0.71118


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71097


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71040


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=None fb(neg)=None -> OOF 0.71200


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.02 -> OOF 0.71178


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.05 -> OOF 0.71128


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=None -> OOF 0.71188


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71166


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71116


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=None -> OOF 0.71116


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71095


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71044


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=None fb(neg)=None -> OOF 0.71167


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.02 -> OOF 0.71148


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.05 -> OOF 0.71108


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=None -> OOF 0.71154


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71136


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71096


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=None -> OOF 0.71083


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71064


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71024


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=None fb(neg)=None -> OOF 0.71172


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.02 -> OOF 0.71143


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.05 -> OOF 0.71104


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=None -> OOF 0.71160


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71130


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71092


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=None -> OOF 0.71089


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71059


[deberta_v3_base] lp(pos)=0.0030 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71021


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=None fb(neg)=None -> OOF 0.71187


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.02 -> OOF 0.71160


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.05 -> OOF 0.71110


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=None -> OOF 0.71179


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71152


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71101


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=None -> OOF 0.71101


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71074


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71024


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=None fb(neg)=None -> OOF 0.71224


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.02 -> OOF 0.71200


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.05 -> OOF 0.71148


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=None -> OOF 0.71216


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71192


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71140


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=None -> OOF 0.71138


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71114


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71062


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=None fb(neg)=None -> OOF 0.71218


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.02 -> OOF 0.71196


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.05 -> OOF 0.71139


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=None -> OOF 0.71209


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71188


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71131


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=None -> OOF 0.71132


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71110


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71053


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=None fb(neg)=None -> OOF 0.71216


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.02 -> OOF 0.71194


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.05 -> OOF 0.71144


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=None -> OOF 0.71207


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71186


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71135


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=None -> OOF 0.71129


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71108


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71058


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=None fb(neg)=None -> OOF 0.71183


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.02 -> OOF 0.71164


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.05 -> OOF 0.71124


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=None -> OOF 0.71174


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71156


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71115


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=None -> OOF 0.71096


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71078


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71038


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=None fb(neg)=None -> OOF 0.71188


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.02 -> OOF 0.71159


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.05 -> OOF 0.71120


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=None -> OOF 0.71180


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71150


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71112


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=None -> OOF 0.71102


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71072


[deberta_v3_base] lp(pos)=0.0040 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71034


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=None fb(neg)=None -> OOF 0.71185


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.02 -> OOF 0.71159


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.05 -> OOF 0.71108


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=None -> OOF 0.71159


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71132


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71081


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=None -> OOF 0.71094


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71067


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71016


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=None fb(neg)=None -> OOF 0.71223


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.02 -> OOF 0.71199


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.05 -> OOF 0.71147


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=None -> OOF 0.71196


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71172


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71120


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=None -> OOF 0.71131


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71107


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71055


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=None fb(neg)=None -> OOF 0.71216


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.02 -> OOF 0.71194


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.05 -> OOF 0.71137


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=None -> OOF 0.71190


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71168


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71111


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=None -> OOF 0.71125


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71103


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71046


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=None fb(neg)=None -> OOF 0.71214


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.02 -> OOF 0.71192


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.05 -> OOF 0.71142


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=None -> OOF 0.71187


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71166


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71115


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=None -> OOF 0.71122


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71101


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71050


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=None fb(neg)=None -> OOF 0.71181


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.02 -> OOF 0.71162


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.05 -> OOF 0.71122


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=None -> OOF 0.71154


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71136


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71095


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=None -> OOF 0.71089


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71071


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71030


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=None fb(neg)=None -> OOF 0.71186


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.02 -> OOF 0.71157


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.05 -> OOF 0.71118


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=None -> OOF 0.71160


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71130


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71092


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=None -> OOF 0.71095


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71065


[deberta_v3_base] lp(pos)=0.0060 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71027


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=None fb(neg)=None -> OOF 0.71201


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.02 -> OOF 0.71175


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=None fb(neg)=0.05 -> OOF 0.71124


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=None -> OOF 0.71173


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71146


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71095


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=None -> OOF 0.71110


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71083


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0000 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71032


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=None fb(neg)=None -> OOF 0.71239


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.02 -> OOF 0.71215


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.05 -> OOF 0.71163


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=None -> OOF 0.71210


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71186


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71134


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=None -> OOF 0.71147


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71123


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71071


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=None fb(neg)=None -> OOF 0.71232


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.02 -> OOF 0.71210


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=None fb(neg)=0.05 -> OOF 0.71153


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=None -> OOF 0.71203


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71182


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71125


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=None -> OOF 0.71140


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71119


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0030 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71062


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=None fb(neg)=None -> OOF 0.71230


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.02 -> OOF 0.71208


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=None fb(neg)=0.05 -> OOF 0.71158


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=None -> OOF 0.71201


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71180


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71129


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=None -> OOF 0.71138


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71117


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0040 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71066


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=None fb(neg)=None -> OOF 0.71197


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.02 -> OOF 0.71178


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=None fb(neg)=0.05 -> OOF 0.71138


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=None -> OOF 0.71168


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71149


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71109


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=None -> OOF 0.71105


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71087


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0060 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71046


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=None fb(neg)=None -> OOF 0.71202


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.02 -> OOF 0.71173


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=None fb(neg)=0.05 -> OOF 0.71134


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=None -> OOF 0.71174


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.02 -> OOF 0.71144


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=0.02 fb(neg)=0.05 -> OOF 0.71106


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=None -> OOF 0.71111


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.02 -> OOF 0.71081


[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0080 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71043
Best deberta_v3_base PP: (0.712386302687562, ({'positive': 0.008, 'negative': 0.002}, {'positive': None, 'negative': None}))
Best DeBERTa OOF: 0.712386302687562 params: {'positive': 0.008, 'negative': 0.002} {'positive': None, 'negative': None}
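
The grid log above is easier to compare once it is loaded into a table. The following is a minimal sketch, not part of the original run: it assumes the log lines keep exactly the format printed above, and the names `LOG_RE` and `parse_grid_log` are hypothetical helpers introduced for illustration. The four sample lines are copied from the log; in practice the captured cell output would be passed in instead.

```python
import re
import pandas as pd

# Pattern for one grid-search log line as printed above (format assumed stable).
LOG_RE = re.compile(
    r"\[(?P<model>\w+)\] lp\(pos\)=(?P<lp_pos>[\d.]+) lp\(neg\)=(?P<lp_neg>[\d.]+) "
    r"fb\(pos\)=(?P<fb_pos>None|[\d.]+) fb\(neg\)=(?P<fb_neg>None|[\d.]+) "
    r"-> OOF (?P<oof>[\d.]+)"
)

def parse_grid_log(lines):
    """Hypothetical helper: turn grid-search log lines into a DataFrame."""
    rows = []
    for line in lines:
        m = LOG_RE.search(line)
        if m is None:
            continue  # skip blank lines and unrelated output
        d = m.groupdict()
        rows.append({
            "model": d["model"],
            "lp_pos": float(d["lp_pos"]),
            "lp_neg": float(d["lp_neg"]),
            "fb_pos": d["fb_pos"],  # kept as text so "None" stays a distinct category
            "fb_neg": d["fb_neg"],
            "oof": float(d["oof"]),
        })
    return pd.DataFrame(rows)

# Four lines copied verbatim from the log above.
sample = [
    "[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=None fb(neg)=None -> OOF 0.71239",
    "[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=None fb(neg)=0.02 -> OOF 0.71215",
    "[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=0.02 fb(neg)=None -> OOF 0.71210",
    "[deberta_v3_base] lp(pos)=0.0080 lp(neg)=0.0020 fb(pos)=0.05 fb(neg)=0.05 -> OOF 0.71071",
]

grid = parse_grid_log(sample)
best = grid.loc[grid["oof"].idxmax()]
print(best.to_dict())
# Best OOF per fallback setting, to see how fb affects the score at fixed lp.
print(grid.groupby(["fb_pos", "fb_neg"])["oof"].max())
```

On the full log, the same `groupby` makes it quick to check whether any `fb` setting ever beats `fb=None` at a fixed pair of `lp` values.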


Saved pp_params_deberta.json




Wrote submission_pp_deberta.csv Head:
     textID                                                                                       selected_text
80a1e6bc32                                                                                                wish
863097735d                                                                                   gosh today sucks!
264cd5277f             tired and didn`t really have an exciting Saturday.  oh well, hope it`s better tomorrow.
baee1e6ffc                                                              i`ve been eating cheetos all morning..
67d06a8dee  haiiii sankQ i`m fineee ima js get a checkup cos my rib hurts LOL idk but i shall be fine ~ thanks
DeBERTa PP tuning + submission complete.
Saved oof_preds_deberta_pp.csv
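
The tuned `lp` values are easier to interpret with the scoring rule in view. In the `decode_score` function of the chooser cell below, a span from token `i` to token `j` is scored as start logit + end logit - `lp` × span length in characters. The sketch below is a vectorized illustration of that rule only; whether the tuning cell used exactly the same decoding is an assumption, and `best_span` and the toy logits are hypothetical.

```python
import numpy as np

def best_span(start_logits, end_logits, offsets, lp=0.0):
    """Hypothetical helper: return (i, j, score) maximising
    start[i] + end[j] - lp * (char length of span i..j) over i <= j."""
    sl = np.asarray(start_logits, dtype=np.float64)
    el = np.asarray(end_logits, dtype=np.float64)
    starts = np.array([o[0] for o in offsets], dtype=np.float64)
    ends = np.array([o[1] for o in offsets], dtype=np.float64)
    valid = ends > starts  # ignore zero-width offsets (special/padding tokens)

    # score[i, j] for every (start token i, end token j) pair
    span_len = ends[None, :] - starts[:, None]
    score = sl[:, None] + el[None, :] - lp * span_len

    # allow only j >= i with both tokens valid
    allowed = np.triu(np.ones_like(score, dtype=bool)) & valid[:, None] & valid[None, :]
    score = np.where(allowed, score, -np.inf)

    i, j = np.unravel_index(np.argmax(score), score.shape)
    return int(i), int(j), float(score[i, j])

# Toy example: three tokens of "so happy today" with made-up logits.
text = "so happy today"
offsets = [(0, 2), (3, 8), (9, 14)]
start_logits = [0.1, 2.0, 0.5]
end_logits = [0.0, 1.5, 1.8]

for lp in (0.0, 0.1):
    i, j, sc = best_span(start_logits, end_logits, offsets, lp=lp)
    print(lp, repr(text[offsets[i][0]:offsets[j][1]]), round(sc, 3))
```

With these toy numbers, `lp=0.0` selects "happy today" and `lp=0.1` shrinks the selection to "happy", which is the intended effect of a length penalty: longer spans must earn their extra characters with higher logits.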


In [10]:
# Set final submission from DeBERTa PP
import pandas as pd, os
src = 'submission_pp_deberta.csv'
dst = 'submission.csv'
assert os.path.exists(src), f'Missing {src}. Run DeBERTa PP tuning first.'
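# Read everything as text and disable NA parsing so spans such as "NA" or "null" survive the round trip
pd.read_csv(src, dtype=str, keep_default_na=False).to_csv(dst, index=False)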
print(f'Wrote {dst} from {src}. Head:')
print(pd.read_csv(dst).head().to_string(index=False))

Wrote submission.csv from submission_pp_deberta.csv. Head:
    textID                                                                                       selected_text
80a1e6bc32                                                                                                wish
863097735d                                                                                   gosh today sucks!
264cd5277f             tired and didn`t really have an exciting Saturday.  oh well, hope it`s better tomorrow.
baee1e6ffc                                                              i`ve been eating cheetos all morning..
67d06a8dee  haiiii sankQ i`m fineee ima js get a checkup cos my rib hurts LOL idk but i shall be fine ~ thanks
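
The plan lists "Verify submission.csv format and encoding" as a risk control. A minimal sketch of such a check is shown here. The function name `check_submission` is hypothetical, the column names are taken from the head printed above, and the rule that every `selected_text` must be non-empty is an assumption about what the scorer expects.

```python
import io
import pandas as pd

def check_submission(sub, test_ids):
    """Hypothetical helper: return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if list(sub.columns) != ["textID", "selected_text"]:
        problems.append(f"unexpected columns: {list(sub.columns)}")
        return problems  # remaining checks need both columns
    if len(sub) != len(test_ids):
        problems.append(f"row count {len(sub)} != expected {len(test_ids)}")
    if sub["textID"].duplicated().any():
        problems.append("duplicate textID values")
    if set(sub["textID"]) != set(test_ids):
        problems.append("textID set differs from test set")
    # Assumption: an empty or missing span is treated as an error.
    empty = sub["selected_text"].isna() | (sub["selected_text"].astype(str).str.strip() == "")
    if empty.any():
        problems.append(f"{int(empty.sum())} empty selected_text rows")
    return problems

# In-memory demo using two rows from the head shown above.
good_csv = "textID,selected_text\n80a1e6bc32,wish\n863097735d,gosh today sucks!\n"
bad_csv = "textID,selected_text\n80a1e6bc32,\n863097735d,gosh today sucks!\n"
ids = ["80a1e6bc32", "863097735d"]

good = pd.read_csv(io.StringIO(good_csv), dtype=str, keep_default_na=False)
bad = pd.read_csv(io.StringIO(bad_csv), dtype=str, keep_default_na=False)
print(check_submission(good, ids))
print(check_submission(bad, ids))
```

In the notebook, the same function could be called with `pd.read_csv('submission.csv', dtype=str, keep_default_na=False)` and the `textID` column of `test.csv`.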


In [11]:
# OOF-calibrated chooser ensemble: pick model with higher standardized span score per row
import numpy as np, pandas as pd, json
from transformers import AutoTokenizer

print('Building chooser ensemble (OOF-calibrated z-scores)...', flush=True)

# Load OOF for both models and decode with tuned PP to get per-row scores
rb_oof = pd.read_csv('oof_roberta_base.csv')
rb_sl = np.load('oof_start_logits_roberta_base.npy')
rb_el = np.load('oof_end_logits_roberta_base.npy')
rb_tok = AutoTokenizer.from_pretrained('roberta-base', use_fast=True)

deb_oof = pd.read_csv('oof_deberta_v3_base.csv')
deb_sl = np.load('oof_start_logits_deberta_v3_base.npy')
deb_el = np.load('oof_end_logits_deberta_v3_base.npy')
deb_tok = AutoTokenizer.from_pretrained('microsoft/deberta-v3-base', use_fast=True)

def seq_ids_offs(tok, sentiment, text, max_len=128):
    tmp = tok(str(sentiment), str(text), max_length=max_len, padding='max_length', truncation='only_second',
               add_special_tokens=True, return_offsets_mapping=True)
    return tmp.sequence_ids(), tmp['offset_mapping']

def decode_score(sl, el, offs, seq_ids, text, sentiment, lp):
    if str(sentiment).strip().lower()=='neutral':
        return text, 0.0
    ctx = [i for i,sid in enumerate(seq_ids) if sid==1]
    if not ctx: return text, -1e9
    # np.array copies, so masking below does not overwrite the caller's logits in place
    sl = np.array(sl, dtype=np.float32); el = np.array(el, dtype=np.float32)
    mask = np.zeros_like(sl, dtype=bool); mask[np.array(ctx)] = True
    sl[~mask] = -np.inf; el[~mask] = -np.inf
    best, bi, bj = -1e9, ctx[0], ctx[0]
    # ignore zero-length offsets
    valid = [i for i in ctx if offs[i][1] > offs[i][0]]
    if not valid: return text, -1e9
    for i in valid:
        for j in valid:
            if j < i: continue
            span_len = offs[j][1] - offs[i][0]
            sc = float(sl[i]) + float(el[j]) - float(lp) * float(span_len)
            if sc > best: best, bi, bj = sc, i, j
    s_char, e_char = offs[bi][0], offs[bj][1]
    pred = text[s_char:e_char].strip()
    if not pred: pred = text
    return pred, float(best)

# Tuned PP params discovered earlier
rb_lp = 0.008  # from Cell 7 best
deb_lp_by_sent = {'positive': 0.008, 'negative': 0.002}  # from Cell 9

def jaccard(a,b):
    sa, sb = set(str(a).split()), set(str(b).split())
    if not sa and not sb: return 1.0
    if not sa or not sb: return 0.0
    inter = len(sa & sb); union = len(sa | sb)
    return inter/union if union else 0.0

# Compute OOF per-row scores for both models
assert len(rb_oof)==len(deb_oof), 'OOF length mismatch; cannot align chooser reliably'
N = len(rb_oof)
rb_scores = np.zeros(N, dtype=np.float32)
deb_scores = np.zeros(N, dtype=np.float32)
rb_preds = ['']*N
deb_preds = ['']*N
trues = rb_oof['selected_text'].astype(str).tolist()
for i in range(N):
    # RoBERTa
    text = rb_oof.iloc[i]['text']; sent = rb_oof.iloc[i]['sentiment']
    sids, offs = seq_ids_offs(rb_tok, sent, text)
    p, sc = decode_score(rb_sl[i], rb_el[i], offs, sids, text, sent, rb_lp)
    rb_scores[i] = sc; rb_preds[i] = p
    # DeBERTa
    text2 = deb_oof.iloc[i]['text']; sent2 = deb_oof.iloc[i]['sentiment']
    sids2, offs2 = seq_ids_offs(deb_tok, sent2, text2)
    lp = deb_lp_by_sent.get(str(sent2).strip().lower(), 0.0)
    p2, sc2 = decode_score(deb_sl[i], deb_el[i], offs2, sids2, text2, sent2, lp)
    deb_scores[i] = sc2; deb_preds[i] = p2

# Standardize scores per sentiment and model using OOF
sents = rb_oof['sentiment'].astype(str).str.lower().tolist()
stats_rb = {}; stats_deb = {}
for s in ['positive','negative']:
    idx = [i for i,x in enumerate(sents) if x==s]
    if idx:
        mu_rb = float(np.mean(rb_scores[idx])); sd_rb = float(np.std(rb_scores[idx]) + 1e-6)
        mu_deb = float(np.mean(deb_scores[idx])); sd_deb = float(np.std(deb_scores[idx]) + 1e-6)
        stats_rb[s] = (mu_rb, sd_rb); stats_deb[s] = (mu_deb, sd_deb)

def zscore(sc, mu_sd):
    if mu_sd is None: return sc
    mu, sd = mu_sd; return (sc - mu) / sd

# Evaluate chooser on OOF: pick model with higher z-score
chooser_preds = []
for i in range(N):
    s = sents[i]
    if s=='neutral':
        chooser_preds.append(rb_oof.iloc[i]['text'])
        continue
    z_rb = zscore(rb_scores[i], stats_rb.get(s, None))
    z_deb = zscore(deb_scores[i], stats_deb.get(s, None))
    chooser_preds.append(deb_preds[i] if z_deb >= z_rb else rb_preds[i])
oof_chooser = float(np.mean([jaccard(t,p) for t,p in zip(trues, chooser_preds)]))
print(f'Chooser OOF Jaccard: {oof_chooser:.5f}')

# Apply chooser to test
test_df = pd.read_csv('test.csv')
rb_test_st = np.mean([np.load(f'test_start_fold{f}.npy') for f in range(5)], axis=0)
rb_test_en = np.mean([np.load(f'test_end_fold{f}.npy') for f in range(5)], axis=0)
deb_test_st = np.mean([np.load(f'deberta_test_start_fold{f}.npy') for f in range(5)], axis=0)
deb_test_en = np.mean([np.load(f'deberta_test_end_fold{f}.npy') for f in range(5)], axis=0)

test_preds = []
for i in range(len(test_df)):
    text = test_df.iloc[i]['text']; sent = test_df.iloc[i]['sentiment']
    s_key = str(sent).strip().lower()
    if s_key=='neutral':
        test_preds.append(text); continue
    # RoBERTa decode
    sids, offs = seq_ids_offs(rb_tok, sent, text)
    p_rb, sc_rb = decode_score(rb_test_st[i], rb_test_en[i], offs, sids, text, sent, rb_lp)
    # DeBERTa decode
    sids2, offs2 = seq_ids_offs(deb_tok, sent, text)
    lp = deb_lp_by_sent.get(s_key, 0.0)
    p_deb, sc_deb = decode_score(deb_test_st[i], deb_test_en[i], offs2, sids2, text, sent, lp)
    # standardize using OOF stats
    z_rb = zscore(sc_rb, stats_rb.get(s_key, None))
    z_deb = zscore(sc_deb, stats_deb.get(s_key, None))
    test_preds.append(p_deb if z_deb >= z_rb else p_rb)

sub_ens = pd.DataFrame({'textID': test_df['textID'], 'selected_text': test_preds})
sub_ens.to_csv('submission_ensemble.csv', index=False)
print('Wrote submission_ensemble.csv. Head:\n', sub_ens.head().to_string(index=False))

# Optionally set as final submission
# pd.read_csv('submission_ensemble.csv').to_csv('submission.csv', index=False)
# print('Updated submission.csv with chooser ensemble')

Building chooser ensemble (OOF-calibrated z-scores)...

Chooser OOF Jaccard: 0.71106


Wrote submission_ensemble.csv. Head:
     textID                                                                                       selected_text
80a1e6bc32                                                                                      I made my wish
863097735d                                                                                              sucks!
264cd5277f             tired and didn`t really have an exciting Saturday.  oh well, hope it`s better tomorrow.
baee1e6ffc                                                              i`ve been eating cheetos all morning..
67d06a8dee  haiiii sankQ i`m fineee ima js get a checkup cos my rib hurts LOL idk but i shall be fine ~ thanks

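Note on the chooser: each model's best-span score is standardized per sentiment with OOF mean/std before the two are compared, so a model whose raw scores simply run larger does not win by default. A minimal sketch on made-up numbers follows; `fit_stats`, `choose`, and the toy scores are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def fit_stats(scores, sentiments, groups=("positive", "negative")):
    """Per-sentiment (mean, std) of one model's OOF span scores."""
    scores = np.asarray(scores, dtype=np.float64)
    sentiments = np.asarray(sentiments)
    stats = {}
    for g in groups:
        sel = scores[sentiments == g]
        if sel.size:
            stats[g] = (float(sel.mean()), float(sel.std() + 1e-6))
    return stats

def choose(score_a, score_b, sentiment, stats_a, stats_b):
    """Return 'b' if model B's z-score is at least model A's, else 'a'."""
    def z(sc, st):
        if st is None:          # no stats for this sentiment: compare raw scores
            return sc
        mu, sd = st
        return (sc - mu) / sd
    za = z(score_a, stats_a.get(sentiment))
    zb = z(score_b, stats_b.get(sentiment))
    return "b" if zb >= za else "a"

# Toy OOF scores (made up): model A's raw scores run higher than model B's
sents = ["positive"] * 4
stats_a = fit_stats([4.0, 6.0, 8.0, 10.0], sents)
stats_b = fit_stats([1.0, 2.0, 3.0, 4.0], sents)

# Raw comparison would favor A (8.0 > 4.0), but B's score is further above
# its own OOF mean once both are standardized.
picked = choose(8.0, 4.0, "positive", stats_a, stats_b)
print(picked)
```

Ties go to model B here, mirroring the `z_deb >= z_rb` rule in the cell above.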

In [12]:
# DeBERTa-v3-large 5-fold training (priority) with OOF/test logits caching
import os, gc, math, json, time, numpy as np, pandas as pd, torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, Trainer, TrainingArguments, default_data_collator

MODEL_NAME = 'microsoft/deberta-v3-large'
print('Loading tokenizer/model:', MODEL_NAME, flush=True)
tok_large = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
MAX_LEN = 128
BATCH_SIZE = 12
GRAD_ACCUM = 6  # effective batch ~72
EPOCHS = 2
LR = 1e-5
WARMUP_RATIO = 0.1
WEIGHT_DECAY = 0.01
GRAD_CLIP = 1.0

def build_encodings_df_large(df: pd.DataFrame, include_labels: bool = True):
    enc_list = []
    for _, r in df.iterrows():
        sel = r['selected_text'] if include_labels else None
        enc, _ = tokenize_and_align(tok_large, r['text'], r['sentiment'], sel, max_len=MAX_LEN)
        enc['text'] = r['text']
        enc['sentiment'] = r['sentiment']
        enc_list.append(enc)
    keys = enc_list[0].keys()
    out = {}
    for k in keys:
        vals = [e[k] for e in enc_list]
        if k in ('text','sentiment'): out[k] = vals
        else: out[k] = np.array(vals, dtype=object if k=='offset_mapping' else None)
    return out

class QADatasetLarge(torch.utils.data.Dataset):
    def __init__(self, enc, with_labels): self.enc, self.with_labels = enc, with_labels
    def __len__(self): return len(self.enc['input_ids'])
    def __getitem__(self, idx):
        item = {
            'input_ids': torch.tensor(self.enc['input_ids'][idx], dtype=torch.long),
            'attention_mask': torch.tensor(self.enc['attention_mask'][idx], dtype=torch.long),
        }
        if self.with_labels:
            item['start_positions'] = torch.tensor(self.enc['start_positions'][idx], dtype=torch.long)
            item['end_positions'] = torch.tensor(self.enc['end_positions'][idx], dtype=torch.long)
        return item

def get_seq_ids_offsets_large(sentiment: str, text: str):
    tmp = tok_large(str(sentiment), str(text), max_length=MAX_LEN, padding='max_length', truncation='only_second',
                    add_special_tokens=True, return_offsets_mapping=True)
    return tmp.sequence_ids(), tmp['offset_mapping']

def decode_span_large(start_logits, end_logits, offsets, sequence_ids, text, sentiment, length_penalty_per_char=0.003):
    if str(sentiment).strip().lower() == 'neutral': return text
    ctx = [i for i, sid in enumerate(sequence_ids) if sid == 1]
    if not ctx: return text
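    # Copy the logits: np.asarray may return the caller's array, and the -inf masking below would then overwrite the cached OOF/test logits
    sl = np.array(start_logits, dtype=np.float32); el = np.array(end_logits, dtype=np.float32)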
    mask = np.zeros_like(sl, dtype=bool); mask[np.array(ctx)] = True
    sl[~mask] = -np.inf; el[~mask] = -np.inf
    best, bi, bj = -1e9, ctx[0], ctx[0]
    valid = [i for i in ctx if offsets[i][1] > offsets[i][0]]
    if not valid: return text
    for i in valid:
        for j in valid:
            if j < i: continue
            span_len = offsets[j][1] - offsets[i][0]
            sc = float(sl[i]) + float(el[j]) - length_penalty_per_char * float(span_len)
            if sc > best: best, bi, bj = sc, i, j
    s_char, e_char = offsets[bi][0], offsets[bj][1]
    pred = text[s_char:e_char].strip()
    return pred if pred else text

def jaccard_batch_fast(trues, preds):
    def jac(a,b):
        sa, sb = set(str(a).split()), set(str(b).split())
        if not sa and not sb: return 1.0
        if not sa or not sb: return 0.0
        inter = len(sa & sb); union = len(sa | sb)
        return inter/union if union else 0.0
    return float(np.mean([jac(t,p) for t,p in zip(trues, preds)]))

oof_rows_l, start_logits_folds_l, end_logits_folds_l = [], [], []
for fold in range(5):
    t0 = time.time(); print(f'\n===== DeBERTa-v3-large Fold {fold} =====', flush=True)
    trn_df = train_clean[train_clean.fold != fold].reset_index(drop=True)
    val_df = train_clean[train_clean.fold == fold].reset_index(drop=True)
    print('Train/Val sizes:', len(trn_df), len(val_df))

    trn_enc = build_encodings_df_large(trn_df, include_labels=True)
    val_enc = build_encodings_df_large(val_df, include_labels=True)
    train_ds = QADatasetLarge(trn_enc, with_labels=True)
    val_ds = QADatasetLarge(val_enc, with_labels=True)

    model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

    def compute_metrics(eval_pred):
        start_logits, end_logits = eval_pred.predictions
        preds, trues = [], list(val_df['selected_text'].astype(str).values)
        for i in range(len(val_df)):
            text, sentiment = val_df.iloc[i]['text'], val_df.iloc[i]['sentiment']
            seq_ids, offs = get_seq_ids_offsets_large(sentiment, text)
            preds.append(decode_span_large(start_logits[i], end_logits[i], offs, seq_ids, text, sentiment))
        return {'jaccard': jaccard_batch_fast(trues, preds)}

    args = TrainingArguments(
        output_dir=f'./outputs_{MODEL_NAME.replace("/","_")}_fold{fold}',
        evaluation_strategy='epoch', save_strategy='epoch',
        load_best_model_at_end=True, metric_for_best_model='jaccard', greater_is_better=True,
        per_device_train_batch_size=BATCH_SIZE, per_device_eval_batch_size=BATCH_SIZE*2,
        gradient_accumulation_steps=GRAD_ACCUM, num_train_epochs=EPOCHS, fp16=True,
        learning_rate=LR, weight_decay=WEIGHT_DECAY, warmup_ratio=WARMUP_RATIO, lr_scheduler_type='cosine',
        max_grad_norm=GRAD_CLIP, dataloader_num_workers=2, logging_steps=50, save_total_limit=1, seed=SEED, report_to=[]
    )

    trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds,
                      tokenizer=tok_large, data_collator=default_data_collator, compute_metrics=compute_metrics)
    train_out = trainer.train()
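    # Note: train_out.metrics holds training-run stats (runtime, throughput, train loss), not eval metrics of the best checkpoint
    print('Best model metrics:', train_out.metrics, flush=True)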
    save_dir = f'deberta_large_fold{fold}_best'
    trainer.save_model(save_dir)
    if getattr(trainer.state, 'best_model_checkpoint', None):
        with open(os.path.join(save_dir, 'path.txt'), 'w') as f: f.write(trainer.state.best_model_checkpoint)

    val_preds = trainer.predict(val_ds)
    vsl, vel = val_preds.predictions
    start_logits_folds_l.append(vsl); end_logits_folds_l.append(vel)

    trues = list(val_df['selected_text'].astype(str).values)
    preds = []
    for i in range(len(val_df)):
        text, sentiment = val_df.iloc[i]['text'], val_df.iloc[i]['sentiment']
        seq_ids, offs = get_seq_ids_offsets_large(sentiment, text)
        preds.append(decode_span_large(vsl[i], vel[i], offs, seq_ids, text, sentiment))
        oof_rows_l.append({
            'textID': val_df.iloc[i]['textID'], 'fold': fold, 'sentiment': sentiment, 'text': text,
            'selected_text': trues[i], 'pred': preds[-1]
        })
    fj = jaccard_batch_fast(trues, preds)
    print(f'DeBERTa-v3-large Fold {fold} OOF Jaccard: {fj:.5f}; elapsed {time.time()-t0:.1f}s', flush=True)

    del trainer, model, train_ds, val_ds, trn_enc, val_enc
    gc.collect(); torch.cuda.empty_cache()

oof_df_l = pd.DataFrame(oof_rows_l)
oof_score_l = jaccard_batch_fast(oof_df_l['selected_text'].tolist(), oof_df_l['pred'].tolist())
print(f'DeBERTa-v3-large OOF Jaccard (all folds): {oof_score_l:.5f}', flush=True)
oof_df_l.to_csv('oof_deberta_v3_large.csv', index=False)
np.save('oof_start_logits_deberta_v3_large.npy', np.concatenate(start_logits_folds_l, axis=0))
np.save('oof_end_logits_deberta_v3_large.npy', np.concatenate(end_logits_folds_l, axis=0))
print('Saved DeBERTa-v3-large OOF artifacts.')

# Test inference and per-fold logits cache
test_df = test.copy().reset_index(drop=True)
test_enc = [tok_large(str(r['sentiment']), str(r['text']), max_length=MAX_LEN, padding='max_length',
                     truncation='only_second', add_special_tokens=True, return_offsets_mapping=True, return_attention_mask=True)
            for _, r in test_df.iterrows()]
test_input_ids = torch.tensor([e['input_ids'] for e in test_enc], dtype=torch.long)
test_attention_mask = torch.tensor([e['attention_mask'] for e in test_enc], dtype=torch.long)
fold_starts, fold_ends = [], []
for fold in range(5):
    print(f'DeBERTa-v3-large test inference fold {fold}...', flush=True)
    model = AutoModelForQuestionAnswering.from_pretrained(f'deberta_large_fold{fold}_best').to(device); model.eval()
    with torch.no_grad():
        bs = BATCH_SIZE
        starts, ends = [], []
        for i in range(0, len(test_df), bs):
            out = model(input_ids=test_input_ids[i:i+bs].to(device), attention_mask=test_attention_mask[i:i+bs].to(device))
            starts.append(out.start_logits.detach().cpu().numpy()); ends.append(out.end_logits.detach().cpu().numpy())
        starts, ends = np.vstack(starts), np.vstack(ends)
    np.save(f'deberta_large_test_start_fold{fold}.npy', starts); np.save(f'deberta_large_test_end_fold{fold}.npy', ends)
    fold_starts.append(starts); fold_ends.append(ends)
    del model; gc.collect(); torch.cuda.empty_cache()

avg_st = np.mean(fold_starts, axis=0); avg_en = np.mean(fold_ends, axis=0)
test_preds = []
for i in range(len(test_df)):
    text, sentiment = test_df.iloc[i]['text'], test_df.iloc[i]['sentiment']
    seq_ids, offs = get_seq_ids_offsets_large(sentiment, text)
    test_preds.append(decode_span_large(avg_st[i], avg_en[i], offs, seq_ids, text, sentiment))
sub_l = pd.DataFrame({'textID': test_df['textID'], 'selected_text': test_preds})
sub_l.to_csv('submission_deberta_large.csv', index=False)
print('Wrote submission_deberta_large.csv Head:\n', sub_l.head().to_string(index=False))
print('DeBERTa-v3-large run complete.')

Loading tokenizer/model: microsoft/deberta-v3-large



===== DeBERTa-v3-large Fold 0 =====




Train/Val sizes: 19784 4947


Some weights of DebertaV2ForQuestionAnswering were not initialized from the model checkpoint at microsoft/deberta-v3-large and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Epoch,Training Loss,Validation Loss


Best model metrics: {'train_runtime': 652.0877, 'train_samples_per_second': 60.679, 'train_steps_per_second': 0.84, 'total_flos': 9159899115988992.0, 'train_loss': 1.0989406474315337, 'epoch': 1.9939357186173439}


DeBERTa-v3-large Fold 0 OOF Jaccard: 0.71588; elapsed 685.7s



===== DeBERTa-v3-large Fold 1 =====


Train/Val sizes: 19785 4946


Best model metrics: {'train_runtime': 651.4213, 'train_samples_per_second': 60.744, 'train_steps_per_second': 0.841, 'total_flos': 9160131294309888.0, 'train_loss': 1.1198462326161183, 'epoch': 1.9939357186173439}


DeBERTa-v3-large Fold 1 OOF Jaccard: 0.70821; elapsed 682.9s



===== DeBERTa-v3-large Fold 2 =====


Train/Val sizes: 19785 4946


Best model metrics: {'train_runtime': 658.0098, 'train_samples_per_second': 60.136, 'train_steps_per_second': 0.833, 'total_flos': 9160131294309888.0, 'train_loss': 1.1314278553872212, 'epoch': 1.9939357186173439}


DeBERTa-v3-large Fold 2 OOF Jaccard: 0.71529; elapsed 691.1s



===== DeBERTa-v3-large Fold 3 =====


Train/Val sizes: 19785 4946


Best model metrics: {'train_runtime': 651.8441, 'train_samples_per_second': 60.705, 'train_steps_per_second': 0.841, 'total_flos': 9160131294309888.0, 'train_loss': 1.1365886674310168, 'epoch': 1.9939357186173439}


DeBERTa-v3-large Fold 3 OOF Jaccard: 0.71051; elapsed 683.2s



===== DeBERTa-v3-large Fold 4 =====


Train/Val sizes: 19785 4946


Best model metrics: {'train_runtime': 652.634, 'train_samples_per_second': 60.631, 'train_steps_per_second': 0.84, 'total_flos': 9160131294309888.0, 'train_loss': 1.131474668962242, 'epoch': 1.9939357186173439}


DeBERTa-v3-large Fold 4 OOF Jaccard: 0.71575; elapsed 684.2s


DeBERTa-v3-large OOF Jaccard (all folds): 0.71313


Saved DeBERTa-v3-large OOF artifacts.


DeBERTa-v3-large test inference fold 0...


DeBERTa-v3-large test inference fold 1...


DeBERTa-v3-large test inference fold 2...


DeBERTa-v3-large test inference fold 3...


DeBERTa-v3-large test inference fold 4...


Wrote submission_deberta_large.csv Head:
     textID                                                                                       selected_text
80a1e6bc32                                                                                                wish
863097735d                                                                                   gosh today sucks!
264cd5277f             tired and didn`t really have an exciting Saturday.  oh well, hope it`s better tomorrow.
baee1e6ffc                                                              i`ve been eating cheetos all morning..
67d06a8dee  haiiii sankQ i`m fineee ima js get a checkup cos my rib hurts LOL idk but i shall be fine ~ thanks
DeBERTa-v3-large run complete.
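
The `huggingface/tokenizers` fork warnings in the training log come with their own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch of that follows. Choosing `"false"` is an assumption (it trades tokenization throughput for fork safety), and the line has to run before any tokenizer is used in the process.

```python
import os

# Set this before any Hugging Face tokenizer is created or called.
# "false" is an assumed choice: it turns off the tokenizer's internal thread
# pool so that worker processes forked later do not trigger the warning.
# setdefault keeps a value that was already exported in the shell.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")

print("TOKENIZERS_PARALLELISM =", os.environ["TOKENIZERS_PARALLELISM"])
```

Placing this at the top of the first notebook cell keeps the setting in effect for every later cell that tokenizes or spawns data-loading workers.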


In [13]:
# Boundary-cleaning post-processing for DeBERTa (staged) — evaluate on OOF and prep test decode
import numpy as np, pandas as pd, json, time
from transformers import AutoTokenizer

print('Staging boundary-cleaning PP for DeBERTa-v3-base (OOF eval + test writer).', flush=True)

MAX_LEN = 128
tok_deb = AutoTokenizer.from_pretrained('microsoft/deberta-v3-base', use_fast=True)

def get_seq_ids_offs_deb(sentiment: str, text: str):
    tmp = tok_deb(str(sentiment), str(text), max_length=MAX_LEN, padding='max_length', truncation='only_second',
                  add_special_tokens=True, return_offsets_mapping=True)
    return tmp.sequence_ids(), tmp['offset_mapping']

def decode_best_indices(sl_row, el_row, offs, seq_ids, lp_char: float):
    ctx = [i for i, sid in enumerate(seq_ids) if sid == 1]
    if not ctx: return None, None
    sl = np.asarray(sl_row, dtype=np.float32).copy()
    el = np.asarray(el_row, dtype=np.float32).copy()
    mask = np.zeros_like(sl, dtype=bool); mask[np.array(ctx)] = True
    sl[~mask] = -np.inf; el[~mask] = -np.inf
    valid = [i for i in ctx if offs[i][1] > offs[i][0]]
    if not valid: return None, None
    best, bi, bj = -1e9, valid[0], valid[0]
    for i in valid:
        oi0 = offs[i][0]
        for j in valid:
            if j < i: continue
            sc = float(sl[i]) + float(el[j]) - lp_char * float(offs[j][1] - oi0)
            if sc > best: best, bi, bj = sc, i, j
    return bi, bj

BOUNDARY_PUNCT_LEFT = set('"\'`([{' )
BOUNDARY_PUNCT_RIGHT = set('")]}.,!?;:…')

def expand_to_word_boundaries(text: str, s_char: int, e_char: int):
    # Expand if cut through alnum on edges
    if s_char is None or e_char is None: return 0, len(text)
    n = len(text); i, j = max(0, s_char), min(n, e_char)
    if i < j:
        # expand left if the span starts mid-word
        if 0 < i < n and text[i].isalnum() and text[i-1].isalnum():
            while i > 0 and text[i-1].isalnum():
                i -= 1
        # expand right if the span ends mid-word
        if 0 < j < n and text[j-1].isalnum() and text[j].isalnum():
            while j < n and text[j].isalnum():
                j += 1
    return i, j

def trim_outer_ws_punct(text: str, s_char: int, e_char: int):
    i, j = s_char, e_char
    # strip spaces first
    while i < j and text[i].isspace(): i += 1
    while j > i and text[j-1].isspace(): j -= 1
    # then gentle punctuation trim (avoid eating core word chars)
    while i < j and text[i] in BOUNDARY_PUNCT_LEFT: i += 1
    while j > i and text[j-1] in BOUNDARY_PUNCT_RIGHT: j -= 1
    return i, j

def decode_with_boundary_clean(sl_row, el_row, offs, seq_ids, text: str, sentiment: str, lp_char: float):
    s_key = str(sentiment).strip().lower()
    if s_key == 'neutral':
        return text
    bi, bj = decode_best_indices(sl_row, el_row, offs, seq_ids, lp_char)
    if bi is None or bj is None:
        return text
    s_char, e_char = offs[bi][0], offs[bj][1]
    s_char, e_char = expand_to_word_boundaries(text, s_char, e_char)
    s_char, e_char = trim_outer_ws_punct(text, s_char, e_char)
    pred = text[s_char:e_char]
    pred = pred if pred.strip() else text
    return pred

def jaccard_mean(y_true, y_pred):
    def jac(a,b):
        sa, sb = set(str(a).split()), set(str(b).split())
        if not sa and not sb: return 1.0
        if not sa or not sb: return 0.0
        inter = len(sa & sb); union = len(sa | sb)
        return inter/union if union else 0.0
    return float(np.mean([jac(t,p) for t,p in zip(y_true, y_pred)]))

def eval_oof_boundary_gain():
    # Load OOF and logits
    oof_df = pd.read_csv('oof_deberta_v3_base.csv')
    sl = np.load('oof_start_logits_deberta_v3_base.npy')
    el = np.load('oof_end_logits_deberta_v3_base.npy')
    # Load tuned lp params if available
    try:
        # Context manager closes the file handle even if parsing fails
        with open('pp_params_deberta.json') as f:
            ppj = json.load(f)
        lp_by_sent = ppj.get('lp', {'positive': 0.008, 'negative': 0.002})
    except Exception:
        lp_by_sent = {'positive': 0.008, 'negative': 0.002}
    preds_no_bc, preds_bc = [], []
    t0 = time.time()
    for i in range(len(oof_df)):
        r = oof_df.iloc[i]
        s_key = str(r['sentiment']).strip().lower()
        seq_ids, offs = get_seq_ids_offs_deb(r['sentiment'], r['text'])
        lp = lp_by_sent.get(s_key, 0.0)
        # baseline decode: raw token offsets, with only .strip() applied to the slice below
        bi, bj = decode_best_indices(sl[i], el[i], offs, seq_ids, lp)
        if bi is None or bj is None or s_key=='neutral':
            preds_no_bc.append(r['text'])
        else:
            s_char, e_char = offs[bi][0], offs[bj][1]
            pred0 = r['text'][s_char:e_char].strip()
            preds_no_bc.append(pred0 if pred0 else r['text'])
        # boundary-cleaned
        preds_bc.append(decode_with_boundary_clean(sl[i], el[i], offs, seq_ids, r['text'], r['sentiment'], lp))
    score0 = jaccard_mean(oof_df['selected_text'].tolist(), preds_no_bc)
    score1 = jaccard_mean(oof_df['selected_text'].tolist(), preds_bc)
    print(f'DeBERTa OOF no-boundary: {score0:.5f} | with-boundary: {score1:.5f} | delta: {score1-score0:+.5f} | n={len(oof_df)} | {time.time()-t0:.1f}s', flush=True)
    return lp_by_sent, score0, score1

def write_test_with_boundary_clean():
    test_df = pd.read_csv('test.csv')
    # Load per-fold test logits and average (already cached by cell 6)
    st = np.mean([np.load(f'deberta_test_start_fold{f}.npy') for f in range(5)], axis=0)
    en = np.mean([np.load(f'deberta_test_end_fold{f}.npy') for f in range(5)], axis=0)
    try:
        lp_by_sent, _, _ = eval_oof_boundary_gain()
    except Exception:
        lp_by_sent = {'positive': 0.008, 'negative': 0.002}
    preds = []
    for i in range(len(test_df)):
        text, sent = test_df.iloc[i]['text'], test_df.iloc[i]['sentiment']
        s_key = str(sent).strip().lower()
        if s_key == 'neutral':
            preds.append(text); continue
        seq_ids, offs = get_seq_ids_offs_deb(sent, text)
        preds.append(decode_with_boundary_clean(st[i], en[i], offs, seq_ids, text, sent, lp_by_sent.get(s_key, 0.0)))
    sub = pd.DataFrame({'textID': test_df['textID'], 'selected_text': preds})
    sub.to_csv('submission_deberta_boundary.csv', index=False)
    print('Wrote submission_deberta_boundary.csv. Head:\n', sub.head().to_string(index=False), flush=True)

print('Boundary-cleaning utilities ready. After current training finishes, run:',
      '\n- eval_oof_boundary_gain() to verify gain',
      '\n- write_test_with_boundary_clean() to produce submission_deberta_boundary.csv', flush=True)

Staging boundary-cleaning PP for DeBERTa-v3-base (OOF eval + test writer).




Boundary-cleaning utilities ready. After current training finishes, run: 
- eval_oof_boundary_gain() to verify gain 
- write_test_with_boundary_clean() to produce submission_deberta_boundary.csv
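
Because the metric is word-level Jaccard on whitespace-split tokens, trimming trailing punctuation can swing a row's score in either direction, which is why the gain is checked on OOF before use. The toy sketch below illustrates this with a simplified right-side trim. The tweet comes from the submission head printed earlier; both gold labels are hypothetical and chosen only for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard on whitespace-split tokens (same rule as jaccard_mean)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Same right-boundary characters as BOUNDARY_PUNCT_RIGHT in the cell above.
RIGHT_PUNCT = set('")]}.,!?;:…')

def trim_right_punct(text: str, s: int, e: int):
    """Simplified stand-in for trim_outer_ws_punct: right side only."""
    while e > s and text[e - 1] in RIGHT_PUNCT:
        e -= 1
    return s, e

text = "gosh today sucks!"
s, e = 11, 17                      # raw predicted character span
raw = text[s:e]                    # keeps the exclamation mark
ts, te = trim_right_punct(text, s, e)
trimmed = text[ts:te]              # exclamation mark removed

# Hypothetical gold labels: one keeps the punctuation, one drops it.
for gold in ("sucks!", "sucks"):
    print(f"gold={gold!r}: raw={jaccard(gold, raw):.1f} trimmed={jaccard(gold, trimmed):.1f}")
```

Since punctuation attached to a word makes it a different token, a single trimmed character can flip a one-word span between a perfect match and a complete miss. Only the aggregate OOF delta shows whether the trim helps on balance.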


In [14]:
# DeBERTa-v3 base+large logit blending (OOF tune weights; optional boundary cleaning); apply to test
import numpy as np, pandas as pd, json, time
from transformers import AutoTokenizer

print('Staging DeBERTa-v3 base+large blending...', flush=True)

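MAX_LEN = 128
# The base tokenizer is reused to decode the large model's logits; this assumes both
# models were fed identically tokenized inputs (same vocabulary and MAX_LEN), which
# the shape assert in blend_oof_and_eval only partially verifies.
tok_deb = AutoTokenizer.from_pretrained('microsoft/deberta-v3-base', use_fast=True)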

def get_seq_ids_offs(sentiment: str, text: str):
    tmp = tok_deb(str(sentiment), str(text), max_length=MAX_LEN, padding='max_length', truncation='only_second',
                  add_special_tokens=True, return_offsets_mapping=True)
    return tmp.sequence_ids(), tmp['offset_mapping']

# Reuse boundary-clean helpers from Cell 13 if defined; else define minimal no-op wrappers
try:
    expand_to_word_boundaries
except NameError:
    def expand_to_word_boundaries(text, s_char, e_char):
        n = len(text); i, j = max(0, s_char or 0), min(n, e_char or n)
        return i, j
try:
    trim_outer_ws_punct
except NameError:
    def trim_outer_ws_punct(text, s_char, e_char):
        i, j = s_char, e_char
        while i < j and text[i].isspace(): i += 1
        while j > i and text[j-1].isspace(): j -= 1
        return i, j

def decode_span_from_logits(sl_row, el_row, offs, seq_ids, text: str, sentiment: str, lp_char: float, boundary_clean: bool=False):
    s_key = str(sentiment).strip().lower()
    if s_key == 'neutral':
        return text
    ctx = [i for i, sid in enumerate(seq_ids) if sid == 1]
    if not ctx: return text
    sl = np.asarray(sl_row, dtype=np.float32).copy()
    el = np.asarray(el_row, dtype=np.float32).copy()
    mask = np.zeros_like(sl, dtype=bool); mask[np.array(ctx)] = True
    sl[~mask] = -np.inf; el[~mask] = -np.inf
    valid = [i for i in ctx if offs[i][1] > offs[i][0]]
    if not valid: return text
    best, bi, bj = -1e9, valid[0], valid[0]
    for i in valid:
        oi0 = offs[i][0]
        for j in valid:
            if j < i: continue
            sc = float(sl[i]) + float(el[j]) - lp_char * float(offs[j][1] - oi0)
            if sc > best: best, bi, bj = sc, i, j
    s_char, e_char = offs[bi][0], offs[bj][1]
    if boundary_clean:
        s_char, e_char = expand_to_word_boundaries(text, s_char, e_char)
        s_char, e_char = trim_outer_ws_punct(text, s_char, e_char)
    pred = text[s_char:e_char]
    pred = pred if pred.strip() else text
    return pred

def jaccard_mean(y_true, y_pred):
    def jac(a,b):
        sa, sb = set(str(a).split()), set(str(b).split())
        if not sa and not sb: return 1.0
        if not sa or not sb: return 0.0
        inter = len(sa & sb); union = len(sa | sb)
        return inter/union if union else 0.0
    return float(np.mean([jac(t,p) for t,p in zip(y_true, y_pred)]))

def blend_oof_and_eval(weights=(0.5,0.6,0.7,0.8,0.9), boundary_options=(False, True)):
    # Load OOF artifacts
    oof_df = pd.read_csv('oof_deberta_v3_base.csv')
    sl_b = np.load('oof_start_logits_deberta_v3_base.npy'); el_b = np.load('oof_end_logits_deberta_v3_base.npy')
    sl_l = np.load('oof_start_logits_deberta_v3_large.npy'); el_l = np.load('oof_end_logits_deberta_v3_large.npy')
    assert sl_b.shape == sl_l.shape == el_b.shape == el_l.shape, 'OOF logits shape mismatch between base and large'
    # Load tuned lp per sentiment
    try:
        # Context manager closes the file handle even if parsing fails
        with open('pp_params_deberta.json') as f:
            params = json.load(f)
        lp_by_sent = params.get('lp', {'positive': 0.008, 'negative': 0.002})
    except Exception:
        lp_by_sent = {'positive': 0.008, 'negative': 0.002}
    y_true = oof_df['selected_text'].astype(str).tolist()
    best = (-1.0, None, None)
    for w in weights:
        sl = w*sl_l + (1.0-w)*sl_b
        el = w*el_l + (1.0-w)*el_b
        for bc in boundary_options:
            preds = []
            t0 = time.time()
            for i in range(len(oof_df)):
                r = oof_df.iloc[i]
                s_key = str(r['sentiment']).strip().lower()
                seq_ids, offs = get_seq_ids_offs(r['sentiment'], r['text'])
                lp = lp_by_sent.get(s_key, 0.0)
                preds.append(decode_span_from_logits(sl[i], el[i], offs, seq_ids, r['text'], r['sentiment'], lp, boundary_clean=bc))
            sc = jaccard_mean(y_true, preds)
            print(f'Blend w={w:.2f} boundary={bc} -> OOF {sc:.5f} in {time.time()-t0:.1f}s', flush=True)
            if sc > best[0]: best = (sc, w, bc)
    print('Best blend:', best, flush=True)
    return best

def apply_blend_to_test(weight: float, boundary_clean: bool):
    test_df = pd.read_csv('test.csv')
    st_b = np.mean([np.load(f'deberta_test_start_fold{f}.npy') for f in range(5)], axis=0)
    en_b = np.mean([np.load(f'deberta_test_end_fold{f}.npy') for f in range(5)], axis=0)
    st_l = np.mean([np.load(f'deberta_large_test_start_fold{f}.npy') for f in range(5)], axis=0)
    en_l = np.mean([np.load(f'deberta_large_test_end_fold{f}.npy') for f in range(5)], axis=0)
    assert st_b.shape == st_l.shape == en_b.shape == en_l.shape == (len(test_df), st_b.shape[1])
    try:
        # Context manager closes the file handle even if parsing fails
        with open('pp_params_deberta.json') as f:
            params = json.load(f)
        lp_by_sent = params.get('lp', {'positive': 0.008, 'negative': 0.002})
    except Exception:
        lp_by_sent = {'positive': 0.008, 'negative': 0.002}
    st = weight*st_l + (1.0-weight)*st_b
    en = weight*en_l + (1.0-weight)*en_b
    preds = []
    for i in range(len(test_df)):
        text, sent = test_df.iloc[i]['text'], test_df.iloc[i]['sentiment']
        s_key = str(sent).strip().lower()
        if s_key == 'neutral':
            preds.append(text); continue
        seq_ids, offs = get_seq_ids_offs(sent, text)
        preds.append(decode_span_from_logits(st[i], en[i], offs, seq_ids, text, sent, lp_by_sent.get(s_key, 0.0), boundary_clean=boundary_clean))
    sub = pd.DataFrame({'textID': test_df['textID'], 'selected_text': preds})
    out_path = f'submission_deberta_blend_w{weight:.2f}_bc{int(boundary_clean)}.csv'
    sub.to_csv(out_path, index=False)
    print('Wrote', out_path, 'Head:\n', sub.head().to_string(index=False), flush=True)
    return out_path

print('Blend utilities ready. After large training finishes and saves logits, run:',
      '\n- best = blend_oof_and_eval()  # pick (score, w, boundary)',
      '\n- apply_blend_to_test(best[1], best[2])  # write submission', flush=True)

Staging DeBERTa-v3 base+large blending...


Blend utilities ready. After large training finishes and saves logits, run: 
- best = blend_oof_and_eval()  # pick (score, w, boundary) 
- apply_blend_to_test(best[1], best[2])  # write submission
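
The span search in `decode_span_from_logits` scores every pair `(i, j)` with `j >= i` as `start_logit[i] + end_logit[j] - lp_char * span_length_in_chars`. A possible vectorized equivalent is sketched below on synthetic data. It assumes the arrays have already been restricted to valid context tokens; the function names and the toy inputs are illustrative and not part of the notebook.

```python
import numpy as np

def best_span_loop(sl, el, starts, ends, lp):
    """Reference double loop: same scoring rule as the notebook's decoder."""
    best, bi, bj = -np.inf, 0, 0
    n = len(sl)
    for i in range(n):
        for j in range(i, n):
            sc = sl[i] + el[j] - lp * (ends[j] - starts[i])
            if sc > best:
                best, bi, bj = sc, i, j
    return bi, bj

def best_span_vectorized(sl, el, starts, ends, lp):
    """Build the full (start, end) score matrix and take one argmax."""
    score = sl[:, None] + el[None, :] - lp * (ends[None, :] - starts[:, None])
    score[np.tril_indices(len(sl), k=-1)] = -np.inf   # forbid end < start
    flat = int(np.argmax(score))                      # first maximum, row-major
    return divmod(flat, score.shape[1])

# Synthetic example: 12 context tokens with made-up logits and char offsets.
rng = np.random.default_rng(0)
n = 12
lengths = rng.integers(1, 8, size=n)                  # token lengths in chars
ends = (np.cumsum(lengths + 1) - 1).astype(np.float64)
starts = ends - lengths
sl = rng.normal(size=n)
el = rng.normal(size=n)
lp = 0.008                                            # default positive-sentiment penalty

print("loop:      ", best_span_loop(sl, el, starts, ends, lp))
print("vectorized:", best_span_vectorized(sl, el, starts, ends, lp))
```

With at most 128 tokens per row the matrix is small, so the main benefit would be fewer Python-level iterations when the decoder is called once per row for every weight and boundary setting in the grid.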


In [None]:
# Run DeBERTa base+large blend grid on OOF; apply best to test and (optionally) set submission.csv
import pandas as pd, shutil
best_score, best_w, best_bc = blend_oof_and_eval(weights=(0.5,0.6,0.7,0.8,0.9), boundary_options=(False, True))
print(f'Best blended OOF: {best_score:.5f} with w={best_w} boundary_clean={best_bc}', flush=True)
out_path = apply_blend_to_test(best_w, best_bc)
print('Blend submission path:', out_path, flush=True)
# If blend beats prior best single-model OOF (0.71239), set as final submission
if best_score > 0.71239:
    # Byte-for-byte copy; a pandas read/write round-trip could turn spans such as "NA" or "null" into empty cells
    shutil.copyfile(out_path, 'submission.csv')
    print('submission.csv updated with blended model (OOF improved).', flush=True)
else:
    print('Blended OOF did not beat 0.71239; keeping current submission.csv.', flush=True)

Blend w=0.50 boundary=False -> OOF 0.71502 in 4.9s
