<a href="https://colab.research.google.com/github/safoura-banihashemi/Augmentation_Legal_Texts/blob/main/Classification_augmented.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Demosthenes — Baseline vs Pre‑computed Augmentations *(No Cleaning)*  
**Fixed Train / Validation / Test Split — 31 May 2025**

This notebook evaluates three corpora for the Demosthenes project:

| Corpus | Source file (Google Drive) | Description |
|--------|---------------------------|-------------|
| **Baseline** | `demosthenes (3).pkl` | Original sentences only |
| **Aug‑GV**   | `final_augmented_data (3).pkl` | GloVe‑guided synonym substitutions (pre‑built) |
| **Aug‑POS**  | `final_augmented_data_POS.pkl` | POS‑matched random synonym substitutions (pre‑built) |

### Evaluation tasks

Each corpus is sliced into five prediction tasks, using the fixed **`Split`** column (`1 = val`, `2 = test`, `3/4/5 = train`):

1. **AD** — *Argument Detection* (Argument vs None)  
2. **AC** — *Argument Component* (Claim, Premise, …) – operates **only on AD‑positive rows**  
3. **TC** — *Type Classification* (multi‑label) – rows whose `Type` is filled  
4. **SC_All** — *Scheme Classification* (multi‑label) – rows with a non‑empty `Scheme` list  
5. **SC_-Princ** — same as SC_All but **excludes** rows containing the `Princ` scheme
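
The mapping from `Split` codes to partitions can be sketched as follows. This is a minimal illustration on made-up values, not the notebook's own code; the helper name `partition_of` is an assumption introduced here.

```python
from collections import Counter

# Split codes as described above: 1 = val, 2 = test, 3/4/5 = train
_PARTITIONS = {1: "val", 2: "test", 3: "train", 4: "train", 5: "train"}

def partition_of(split_code):
    """Return the partition name for a `Split` code (hypothetical helper)."""
    try:
        return _PARTITIONS[int(split_code)]
    except (KeyError, ValueError, TypeError):
        raise ValueError(f"unexpected Split value: {split_code!r}")

toy_splits = [1, 2, 3, 4, 5, 3]   # made-up values for illustration
counts = Counter(partition_of(s) for s in toy_splits)
print(dict(counts))
```

Raising on an unknown code, rather than silently dropping the row, makes a corrupted `Split` column visible before any model is trained.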

### What changed vs the original notebook

* The augmentation pipelines and text‑`cleaning()` helper are **removed**.  
* Augmented datasets are **loaded from Drive** instead of being generated on the fly.  
* Delta tables now show **top‑5 gains _and_ top‑5 drops** for each augmented corpus.  
* Every code cell is preceded by a short markdown caption explaining its role.


## ⚙️ Install pinned libraries
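Install the libraries the notebook needs in Google Colab. NumPy (`<2.0`), pandas, NLTK and transformers are pinned for reproducibility; scikit-learn, tqdm, sentence-transformers and python-Levenshtein are installed at their latest versions.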

In [None]:
# Quote the NumPy specifier: unquoted, the shell reads "<2.0" as an input redirect
!pip install -q --upgrade \
    "numpy<2.0" \
    pandas==2.2.2 \
    scikit-learn \
    tqdm \
    nltk==3.9.1 \
    sentence-transformers \
    transformers==4.39.3 \
    python-Levenshtein


## Mount Drive & load corpora
Mount your Google Drive and read the baseline and two pre‑computed augmented pickles into memory.

In [None]:
from google.colab import drive
import pandas as pd

drive.mount('/content/drive')

# 👉 Adjust these paths if your files live elsewhere
BASE_PATH = '/content/drive/MyDrive/demosthenes (3).pkl'          # baseline
GV_PATH   = '/content/drive/MyDrive/final_augmented_data (3).pkl' # Aug‑GV
POS_PATH  = '/content/drive/MyDrive/final_augmented_data_POS.pkl' # Aug‑POS

df         = pd.read_pickle(BASE_PATH)
corpus_gv  = pd.read_pickle(GV_PATH)
corpus_pos = pd.read_pickle(POS_PATH)

print('Baseline shape :', df.shape)
print('Aug‑GV   shape :', corpus_gv.shape)
print('Aug‑POS  shape :', corpus_pos.shape)

Mounted at /content/drive
Baseline shape : (2535, 8)
Aug‑GV   shape : (2818, 9)
Aug‑POS  shape : (2818, 9)


## Normalise `Scheme` and `Name`
Ensure every corpus has a list in `Scheme` (even if empty) and a scalar string in `Name`. This prevents the multi‑label binariser from choking later.

In [None]:
def _normalize_scheme(val):
    if isinstance(val, list):
        return val
    if pd.isna(val):
        return []
    return [val]

def _flatten_name(val):
    if isinstance(val, list):
        return val[0] if val else 'None'
    return str(val) if pd.notna(val) else 'None'

for _frame in (df, corpus_gv, corpus_pos):
    _frame['Scheme'] = _frame['Scheme'].apply(_normalize_scheme)
    _frame['Name']   = _frame['Name'].apply(_flatten_name)
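
The reason for wrapping scalars in a list can be shown without scikit-learn. A multi-label binariser builds its vocabulary by iterating over each row's value, so a bare string is walked character by character. The sketch below mimics that vocabulary step with the standard library only; `collect_labels` and the label names other than `Princ` are illustrative assumptions.

```python
from itertools import chain

def collect_labels(rows):
    """Collect the label vocabulary by iterating over each row's value."""
    return sorted(set(chain.from_iterable(rows)))

raw_rows     = ["Rule", ["Prec", "Princ"], []]     # bare string mixed with lists (toy data)
wrapped_rows = [["Rule"], ["Prec", "Princ"], []]   # every row is a list

print(collect_labels(raw_rows))      # the bare string is split into characters
print(collect_labels(wrapped_rows))  # one vocabulary entry per label
```

An empty list contributes nothing to the vocabulary, which is why missing values are mapped to `[]` rather than left as `NaN`.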

## Define evaluation utilities
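Implements the `evaluate()` function:
* Embeddings: TF-IDF, SBERT (`bert-base-nli-mean-tokens`), Legal-BERT-Small (`nlpaueb/legal-bert-small-uncased`).
* Classifiers: Linear SVC, Random Forest, Gaussian NB, k-NN, polynomial-kernel SVC, plus random and majority baselines. Each real classifier is tuned with a small grid search scored by macro-F1 on the validation split.
* Five tasks (AD, AC, TC, SC_All, SC_-Princ) with fixed train/val/test splits; the reported metric is macro-F1 on the test split.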
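
`evaluate()` picks hyperparameters by fitting each candidate on the training split and keeping the one with the best macro-F1 on the validation split. That selection step can be sketched in isolation. The function name and the `score_fn` callable are assumptions made for illustration, and the toy scoring function stands in for an actual fit-and-score round.

```python
import itertools

def select_by_validation(grid, score_fn):
    """Return (best_params, best_score) over the Cartesian product of `grid`.

    `score_fn(params)` is a hypothetical callable that fits a model with
    `params` on the training split and returns its validation macro-F1.
    """
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = score_fn(params)
        if score > best_score:   # strict '>' keeps the first of tied candidates
            best_params, best_score = params, score
    return best_params, best_score

# Toy scoring function standing in for "fit on train, score on val"
def toy_score(params):
    return 1.0 - abs(params["C"] - 1) / 10 - 0.01 * params["degree"]

best, score = select_by_validation({"C": [0.1, 1], "degree": [2, 3]}, toy_score)
print(best, round(score, 2))
```

Because the test split is never consulted during selection, the reported test macro-F1 remains an out-of-sample estimate.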

In [None]:
import itertools, numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC, SVC

_sbert  = SentenceTransformer('bert-base-nli-mean-tokens')
_legal  = SentenceTransformer('nlpaueb/legal-bert-small-uncased')

def _embed(model, corpus, cache, key):
    if key not in cache:
        cache[key] = model.encode(corpus, show_progress_bar=True)
    return cache[key]

def _clf_roster(multilabel=False):
    base = ['random', 'majority']
    real = [LinearSVC(), RandomForestClassifier(), GaussianNB(),
            KNeighborsClassifier(), SVC(kernel='poly')]
    if multilabel:
        real = [OneVsRestClassifier(c) for c in real]
    return base + real

_PARAM_GRID = {
    'LinearSVC':            {'C': [0.1, 1, 10]},
    'RandomForestClassifier': {'n_estimators': [100, 300]},
    'GaussianNB':           {},
    'KNeighborsClassifier': {'n_neighbors': [3, 5, 7]},
    'SVC':                  {'C': [0.1, 1], 'degree': [2, 3]}
}

def evaluate(df_in):
    embeddings  = ['tfidf', 'sbert', 'legal']
    corpus_all  = df_in['Text'].values
    cache       = {}
    tasks       = {}

    # ----- generator yielding (task code, rows, labels, multilabel flag) for the five tasks -----
    def _generators():
        ad_lbl = df_in['Name'].apply(
            lambda x: 'Argument' if str(x).lower() not in ['none', 'nan', 'na', '0', ''] else 'None')
        yield 'AD', df_in, ad_lbl, False

        ac_df = df_in[ad_lbl == 'Argument']
        yield 'AC', ac_df, ac_df['Name'], False

        tc_df = df_in.dropna(subset=['Type'])
        yield 'TC', tc_df, tc_df['Type'], True

        # `Scheme` holds lists after normalisation, so dropna() would keep every row;
        # keep only the rows whose list is non-empty
        sc_all = df_in[df_in['Scheme'].apply(len) > 0]
        yield 'SC_All', sc_all, sc_all['Scheme'], True

        sc_np  = sc_all[~sc_all['Scheme'].apply(lambda xs: 'Princ' in xs)]
        yield 'SC_-Princ', sc_np, sc_np['Scheme'], True

    # ----- loop over tasks -----
    for code, df_t, y_all, multi in _generators():
        idx = df_t.index

        tr = (df_in['Split'].isin([3, 4, 5])).reindex(idx).values
        va = (df_in['Split'] == 1).reindex(idx).values
        te = (df_in['Split'] == 2).reindex(idx).values

        corpus = df_in.loc[idx, 'Text'].tolist()

        if multi:
            mlb = MultiLabelBinarizer().fit(y_all[tr])

        rows = []
        for emb in embeddings:
            # ---- build embedding matrix ----
            if emb == 'tfidf':
                vec = TfidfVectorizer().fit(np.array(corpus)[tr])
                X   = vec.transform(corpus).toarray()
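            elif emb == 'sbert':
                # map index labels to row positions so this also works without a default RangeIndex
                X   = _embed(_sbert, corpus_all, cache, 'sbert')[df_in.index.get_indexer(idx)]
            else:  # legal
                X   = _embed(_legal, corpus_all, cache, 'legal')[df_in.index.get_indexer(idx)]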

            X_tr, X_va, X_te = X[tr], X[va], X[te]

            def lab(mask):
                return mlb.transform(y_all[mask]) if multi else y_all[mask]

            y_tr, y_va, y_te = lab(tr), lab(va), lab(te)

            # ---- classifiers ----
            for clf in _clf_roster(multi):
                if isinstance(clf, str):
                    # ----- dummy baselines -----
                    if clf == 'random':
                        if multi:
                            labs  = list(mlb.classes_)
                            preds = [np.random.choice(labs,
                                                       size=np.random.randint(1, len(labs)+1),
                                                       replace=False)
                                     for _ in range(len(X_te))]
                            y_pred = mlb.transform(preds)
                        else:
                            labs   = list(set(y_tr))
                            y_pred = [np.random.choice(labs) for _ in range(len(X_te))]
                    else:  # 'majority'
                        if multi:
                            # y_tr is the binarised label matrix: predict every label
                            # present in more than half of the training rows
                            maj    = (y_tr.mean(axis=0) > 0.5).astype(int)
                            y_pred = np.tile(maj, (len(X_te), 1))
                        else:
                            maj    = max(set(y_tr), key=list(y_tr).count)
                            y_pred = [maj] * len(X_te)
                    name = clf.capitalize()
                else:
                    # ----- real classifiers with tiny grid search -----
                    if not multi and len(set(y_tr)) < 2:
                        continue  # avoid training on single-class data

                    base_cls  = clf.estimator if isinstance(clf, OneVsRestClassifier) else clf
                    base_name = base_cls.__class__.__name__
                    grid      = _PARAM_GRID.get(base_name, {})

                    if not grid:
                        best = clf.fit(X_tr, y_tr)
                    else:
                        best_f1 = -1
                        best    = None
                        keys, vals = list(grid.keys()), list(grid.values())
                        for combo in itertools.product(*vals):
                            params = dict(zip(keys, combo))
                            fresh  = base_cls.__class__(**params)
                            trial  = OneVsRestClassifier(fresh) if multi else fresh
                            trial.fit(X_tr, y_tr)
                            f1 = f1_score(y_va, trial.predict(X_va),
                                          average='macro', zero_division=0)
                            if f1 > best_f1:
                                best_f1, best = f1, trial

                    y_pred = best.predict(X_te)
                    name   = (best.estimator.__class__.__name__
                              if isinstance(best, OneVsRestClassifier)
                              else best.__class__.__name__)

                macro = f1_score(y_te, y_pred, average='macro', zero_division=0)
                rows.append({'Embedding': emb, 'Classifier': name, 'MacroF1': macro})

        tasks[code] = pd.DataFrame(rows)

    return tasks
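
For the multi-label tasks, a majority baseline can be read as a per-label vote over the binarised training matrix. The following is a minimal NumPy sketch of that reading, assuming the labels have already been binarised; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def majority_multilabel(y_train_bin, n_test):
    """Predict, for every test row, each label present in more than half of the training rows."""
    y_train_bin = np.asarray(y_train_bin)
    maj = (y_train_bin.mean(axis=0) > 0.5).astype(int)   # one vote per label column
    return np.tile(maj, (n_test, 1))

# Toy binarised training labels: 4 rows x 3 labels (made-up values)
y_tr = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [1, 0, 1],
                 [0, 0, 1]])
print(majority_multilabel(y_tr, n_test=2))
```

A label that appears in exactly half of the rows is not predicted, since the comparison is strict.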



## Run evaluation & show results
Execute the evaluation for each corpus, then present full score tables **plus** delta columns with top‑5 gains **and** top‑5 drops.

In [None]:
import pandas as pd

baseline = evaluate(df)
aug_gv   = evaluate(corpus_gv)
aug_pos  = evaluate(corpus_pos)

from IPython.display import display
pd.set_option('display.max_rows', None)

for task in baseline:
    print(f"\n## {task}")
    base = baseline[task].set_index(['Embedding', 'Classifier'])
    gv   = aug_gv [task].set_index(['Embedding', 'Classifier'])
    pos  = aug_pos[task].set_index(['Embedding', 'Classifier'])

    full = pd.concat({'Baseline': base,
                      'Aug-GV' : gv,
                      'Aug-POS': pos}, axis=1)

    # ---- main table ----
    display(full.sort_values(('Aug-GV', 'MacroF1'), ascending=False))

    # ---- deltas ----
    d_gv  = gv ['MacroF1'] - base['MacroF1']
    d_pos = pos['MacroF1'] - base['MacroF1']

    print('\nΔ (Aug-GV – Base)  | top‑5 gains ↑')
    display(d_gv.sort_values(ascending=False).head())

    print('Δ (Aug-GV – Base)  | top‑5 drops ↓')
    display(d_gv.sort_values(ascending=True).head())

    print('Δ (Aug-POS – Base) | top‑5 gains ↑')
    display(d_pos.sort_values(ascending=False).head())

    print('Δ (Aug-POS – Base) | top‑5 drops ↓')
    display(d_pos.sort_values(ascending=True).head())

    print(f"Summary Δ — GV μ {d_gv.mean():.4f} / median {d_gv.median():.4f}  "
          f"| POS μ {d_pos.mean():.4f} / median {d_pos.median():.4f}")



## AD


| Embedding | Classifier | Baseline MacroF1 | Aug-GV MacroF1 | Aug-POS MacroF1 |
|-----------|------------|------------------|----------------|-----------------|
| tfidf | Random | 1.0 | 1.0 | 1.0 |
| tfidf | Majority | 1.0 | 1.0 | 1.0 |
| sbert | Random | 1.0 | 1.0 | 1.0 |
| sbert | Majority | 1.0 | 1.0 | 1.0 |
| legal | Random | 1.0 | 1.0 | 1.0 |
| legal | Majority | 1.0 | 1.0 | 1.0 |



Δ (Aug-GV – Base)  | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | Random | 0.0 |
| tfidf | Majority | 0.0 |
| sbert | Random | 0.0 |
| sbert | Majority | 0.0 |
| legal | Random | 0.0 |


Δ (Aug-GV – Base)  | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | Random | 0.0 |
| tfidf | Majority | 0.0 |
| sbert | Random | 0.0 |
| sbert | Majority | 0.0 |
| legal | Random | 0.0 |


Δ (Aug-POS – Base) | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | Random | 0.0 |
| tfidf | Majority | 0.0 |
| sbert | Random | 0.0 |
| sbert | Majority | 0.0 |
| legal | Random | 0.0 |


Δ (Aug-POS – Base) | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | Random | 0.0 |
| tfidf | Majority | 0.0 |
| sbert | Random | 0.0 |
| sbert | Majority | 0.0 |
| legal | Random | 0.0 |


Summary Δ — GV μ 0.0000 / median 0.0000  | POS μ 0.0000 / median 0.0000
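
The AD table lists only the Random and Majority rows. That is consistent with `evaluate()` skipping the real classifiers when the training labels contain a single class, in which case both dummy baselines can only predict that class. Assuming the test rows carry the same single label, the 1.0 scores are trivial and say nothing about augmentation. A small pure-Python version of macro-F1 (written here for illustration; the notebook itself uses scikit-learn's `f1_score`) shows the effect:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

one_class = ["Argument"] * 5
print(macro_f1(one_class, one_class))                            # single class: trivially perfect
print(macro_f1(["Argument", "None"], ["Argument", "Argument"]))  # a missed class pulls the mean down
```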

## AC


| Embedding | Classifier | Baseline MacroF1 | Aug-GV MacroF1 | Aug-POS MacroF1 |
|-----------|------------|------------------|----------------|-----------------|
| tfidf | KNeighborsClassifier | 0.884104 | 0.94623 | 0.940794 |
| sbert | LinearSVC | 0.869795 | 0.931781 | 0.930445 |
| legal | KNeighborsClassifier | 0.879117 | 0.929046 | 0.92758 |
| tfidf | SVC | 0.903294 | 0.92758 | 0.920168 |
| legal | LinearSVC | 0.903294 | 0.92758 | 0.921806 |
| legal | SVC | 0.903294 | 0.926043 | 0.92758 |
| tfidf | LinearSVC | 0.84346 | 0.920168 | 0.904858 |
| legal | GaussianNB | 0.910836 | 0.917798 | 0.910384 |
| tfidf | GaussianNB | 0.871922 | 0.914744 | 0.889627 |
| legal | RandomForestClassifier | 0.878141 | 0.902751 | 0.894632 |



Δ (Aug-GV – Base)  | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | LinearSVC | 0.076708 |
| tfidf | KNeighborsClassifier | 0.062126 |
| sbert | LinearSVC | 0.061986 |
| legal | KNeighborsClassifier | 0.049929 |
| sbert | SVC | 0.047503 |


Δ (Aug-GV – Base)  | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | Majority | -0.013728 |
| sbert | Majority | -0.013728 |
| legal | Majority | -0.013728 |
| legal | Random | -0.004635 |
| tfidf | Random | 0.00238 |


Δ (Aug-POS – Base) | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| sbert | Random | 0.069416 |
| sbert | SVC | 0.063503 |
| tfidf | LinearSVC | 0.061397 |
| sbert | LinearSVC | 0.06065 |
| tfidf | KNeighborsClassifier | 0.05669 |


Δ (Aug-POS – Base) | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | Majority | -0.013728 |
| sbert | Majority | -0.013728 |
| legal | Majority | -0.013728 |
| legal | GaussianNB | -0.000452 |
| tfidf | RandomForestClassifier | 0.002443 |


Summary Δ — GV μ 0.0234 / median 0.0227  | POS μ 0.0247 / median 0.0185

## TC


| Embedding | Classifier | Baseline MacroF1 | Aug-GV MacroF1 | Aug-POS MacroF1 |
|-----------|------------|------------------|----------------|-----------------|
| legal | RandomForestClassifier | 0.897983 | 0.914223 | 0.902599 |
| legal | SVC | 0.912906 | 0.89992 | 0.914757 |
| tfidf | RandomForestClassifier | 0.893896 | 0.896023 | 0.889033 |
| tfidf | SVC | 0.881557 | 0.886531 | 0.886325 |
| legal | GaussianNB | 0.894177 | 0.886204 | 0.884547 |
| sbert | SVC | 0.873804 | 0.881543 | 0.879825 |
| tfidf | LinearSVC | 0.8786 | 0.881255 | 0.88197 |
| legal | LinearSVC | 0.874251 | 0.879476 | 0.873033 |
| legal | KNeighborsClassifier | 0.875833 | 0.877418 | 0.878163 |
| sbert | RandomForestClassifier | 0.825709 | 0.86117 | 0.84109 |



Δ (Aug-GV – Base)  | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| sbert | RandomForestClassifier | 0.035461 |
| tfidf | GaussianNB | 0.017481 |
| legal | RandomForestClassifier | 0.01624 |
| sbert | SVC | 0.007739 |
| legal | LinearSVC | 0.005225 |


Δ (Aug-GV – Base)  | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| legal | Random | -0.03059 |
| sbert | Random | -0.025769 |
| sbert | GaussianNB | -0.013424 |
| legal | SVC | -0.012986 |
| legal | GaussianNB | -0.007972 |


Δ (Aug-POS – Base) | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| sbert | RandomForestClassifier | 0.015382 |
| sbert | SVC | 0.006021 |
| tfidf | SVC | 0.004768 |
| legal | RandomForestClassifier | 0.004616 |
| tfidf | LinearSVC | 0.003371 |


Δ (Aug-POS – Base) | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| legal | Random | -0.035244 |
| sbert | GaussianNB | -0.015527 |
| tfidf | Random | -0.014046 |
| sbert | LinearSVC | -0.009821 |
| legal | GaussianNB | -0.009629 |


Summary Δ — GV μ -0.0001 / median 0.0002  | POS μ -0.0025 / median 0.0000

## SC_All


| Embedding | Classifier | Baseline MacroF1 | Aug-GV MacroF1 | Aug-POS MacroF1 |
|-----------|------------|------------------|----------------|-----------------|
| tfidf | LinearSVC | 0.692033 | 0.752909 | 0.750024 |
| sbert | LinearSVC | 0.501205 | 0.718403 | 0.571289 |
| legal | LinearSVC | 0.766548 | 0.7168 | 0.626228 |
| tfidf | SVC | 0.400328 | 0.636338 | 0.447873 |
| legal | KNeighborsClassifier | 0.400391 | 0.632364 | 0.59802 |
| tfidf | KNeighborsClassifier | 0.343553 | 0.623241 | 0.609458 |
| tfidf | RandomForestClassifier | 0.364533 | 0.559252 | 0.556571 |
| legal | SVC | 0.387877 | 0.46723 | 0.481797 |
| legal | GaussianNB | 0.369637 | 0.436754 | 0.405719 |
| tfidf | GaussianNB | 0.28981 | 0.406364 | 0.270839 |



Δ (Aug-GV – Base)  | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | KNeighborsClassifier | 0.279689 |
| tfidf | SVC | 0.23601 |
| legal | KNeighborsClassifier | 0.231973 |
| sbert | LinearSVC | 0.217198 |
| tfidf | RandomForestClassifier | 0.194719 |


Δ (Aug-GV – Base)  | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| legal | LinearSVC | -0.049748 |
| sbert | Random | -0.011791 |
| legal | Majority | 0.0 |
| tfidf | Majority | 0.0 |
| sbert | Majority | 0.0 |


Δ (Aug-POS – Base) | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | KNeighborsClassifier | 0.265906 |
| legal | KNeighborsClassifier | 0.197629 |
| tfidf | RandomForestClassifier | 0.192038 |
| legal | SVC | 0.09392 |
| sbert | SVC | 0.0707 |


Δ (Aug-POS – Base) | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| legal | LinearSVC | -0.14032 |
| tfidf | GaussianNB | -0.018971 |
| tfidf | Random | -0.006865 |
| sbert | Random | -0.006755 |
| legal | Majority | 0.0 |


Summary Δ — GV μ 0.0781 / median 0.0380  | POS μ 0.0481 / median 0.0361

## SC_-Princ


| Embedding | Classifier | Baseline MacroF1 | Aug-GV MacroF1 | Aug-POS MacroF1 |
|-----------|------------|------------------|----------------|-----------------|
| tfidf | LinearSVC | 0.632861 | 0.703649 | 0.712677 |
| sbert | LinearSVC | 0.601436 | 0.688119 | 0.689856 |
| legal | LinearSVC | 0.715223 | 0.65159 | 0.732686 |
| tfidf | SVC | 0.489789 | 0.638136 | 0.550853 |
| tfidf | RandomForestClassifier | 0.434365 | 0.631097 | 0.639386 |
| legal | KNeighborsClassifier | 0.482295 | 0.610381 | 0.584811 |
| legal | SVC | 0.455386 | 0.610094 | 0.597022 |
| tfidf | KNeighborsClassifier | 0.416256 | 0.568502 | 0.531035 |
| sbert | SVC | 0.345644 | 0.487261 | 0.415104 |
| legal | GaussianNB | 0.434054 | 0.481151 | 0.480516 |



Δ (Aug-GV – Base)  | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | RandomForestClassifier | 0.196732 |
| legal | SVC | 0.154707 |
| tfidf | KNeighborsClassifier | 0.152246 |
| tfidf | SVC | 0.148348 |
| sbert | SVC | 0.141617 |


Δ (Aug-GV – Base)  | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| legal | LinearSVC | -0.063633 |
| sbert | Random | -0.027202 |
| legal | Random | -0.010536 |
| sbert | Majority | 0.0 |
| legal | Majority | 0.0 |


Δ (Aug-POS – Base) | top‑5 gains ↑


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| tfidf | RandomForestClassifier | 0.20502 |
| legal | SVC | 0.141635 |
| tfidf | KNeighborsClassifier | 0.114779 |
| legal | KNeighborsClassifier | 0.102516 |
| sbert | LinearSVC | 0.088421 |


Δ (Aug-POS – Base) | top‑5 drops ↓


| Embedding | Classifier | Δ MacroF1 |
|-----------|------------|-----------|
| legal | Random | -0.013202 |
| tfidf | GaussianNB | -0.004976 |
| tfidf | Random | -0.003945 |
| sbert | Random | -0.000192 |
| sbert | Majority | 0.0 |


Summary Δ — GV μ 0.0545 / median 0.0269  | POS μ 0.0492 / median 0.0402
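
The delta tables are differences of two macro-F1 series aligned on `(Embedding, Classifier)`. The sketch below reproduces that arithmetic on three AC rows copied from the tables above (Baseline and Aug-GV); restricting to three rows is an assumption made to keep the example short, so the mean and median it prints are not the notebook's summary values.

```python
import pandas as pd

# Three AC rows copied from the tables above (Baseline vs Aug-GV macro-F1)
idx = pd.MultiIndex.from_tuples(
    [("tfidf", "LinearSVC"), ("tfidf", "KNeighborsClassifier"), ("legal", "GaussianNB")],
    names=["Embedding", "Classifier"],
)
base = pd.Series([0.843460, 0.884104, 0.910836], index=idx, name="MacroF1")
gv   = pd.Series([0.920168, 0.946230, 0.917798], index=idx, name="MacroF1")

delta = gv - base            # aligned on (Embedding, Classifier)
print(delta.nlargest(2))     # largest gains
print(delta.nsmallest(2))    # smallest gains, or drops when negative
print(f"mean {delta.mean():.4f} / median {delta.median():.4f}")
```

`nlargest` and `nsmallest` give the same rows as sorting and taking `head()`, without sorting the whole series.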
