# 02 — Zero-shot Baseline & Scoring

**Purpose:**
 Run the unmodified NLLB model directly on your test data to create a **baseline** system for comparison.

**Steps:**

1. Extract test sources (`test.src`) and references (`test.ref`) from `test.tsv`.
2. Call your `translate_simple.py` script to generate predictions (`hyp.txt`).
3. Score with `score.py` to compute BLEU and chrF2.
4. Print metrics to verify baseline performance.

**Outputs:**

- `experiments/baseline/test.src`
- `experiments/baseline/test.ref`
- `experiments/baseline/hyp.txt`
- `experiments/baseline/metrics.json`

In [3]:
from pathlib import Path
import pandas as pd

pivot_model = "facebook/nllb-200-distilled-600M"
root = Path("..") # project root (one level up from notebooks/)
pp   = root / "data" / "processed"
exp  = root / "experiments" / "baseline"
exp.mkdir(parents=True, exist_ok=True)

# Load test TSV using pandas' interpretation
df = pd.read_csv(pp / "test.tsv", sep="\t", header=None, names=["src","tgt"])

# Clean target: remove embedded newlines, excess whitespace
df["tgt"] = (
    df["tgt"]
    .astype(str)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

# Write clean src and tgt files
df["src"].to_csv(exp / "test.src", index=False, header=False, encoding="utf-8")
df["tgt"].to_csv(exp / "test.ref", index=False, header=False, encoding="utf-8")

print("Prepared:", exp / "test.src", "and", exp / "test.ref", "lines:", len(df))

Prepared: ..\experiments\baseline\test.src and ..\experiments\baseline\test.ref lines: 2750


In [None]:
!python ../src/decode/translate_simple.py \
  --model $pivot_model \
  --src ../experiments/baseline/test.src \
  --out ../experiments/baseline/hyp.txt

Loaded 2,750 lines from ..\experiments\baseline\test.src
Loading model: facebook/nllb-200-distilled-600M
Using source=tgl_Latn tag='tgl_Latn' → target=ceb_Latn (id=256035)
Device: cuda | batch=12 | beams=2
✅ Wrote 2750 translations → ..\experiments\baseline\hyp.txt


In [5]:
!python ../src/eval/score.py \
  --ref ../experiments/baseline/test.ref \
  --hyp ../experiments/baseline/hyp.txt \
  --out ../experiments/baseline/metrics.json

{
  "BLEU": 22.95,
  "chrF2": 44.5,
  "ref_len": 114606,
  "sys_len": 88718,
  "signature": "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.5.1",
  "sacrebleu_version": "2.5.1"
}
