## Evaluation for Disambiguation
The disambiguation evaluation has three stages:
1) Parsing the evaluation set
2) Creating the gold standard from the eval set (done by hand)
3) Evaluating the model on the gold standard

In [1]:
# Set working directory to root
import sys
from pathlib import Path
ROOT = Path.cwd().resolve()
while ROOT != ROOT.parent and not (ROOT / "src").exists():
    ROOT = ROOT.parent
sys.path.insert(0, str(ROOT))

### Parsing the evaluation set
For our evaluation set, we start from a pool of 4646 example sentences from the OPD and sample only sentences that are fully analyzed by the FST (a sampled sentence may still contain no ambiguities at all). The code used to sample the TSV file and create the evaluation set lives in the `eval_modules/` folder.
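
As a rough illustration of the sampling idea described above (not the actual `build_eval_sample` implementation, which lives in `eval_modules/`), the sketch below filters sentences down to those a lookup function can fully analyze and then draws a seeded sample. The names `is_fully_analyzed`, `sample_fully_analyzed`, and the toy lexicon are hypothetical, and whitespace tokenization is an assumption made for brevity.

```python
import random

def is_fully_analyzed(tokens, analyze):
    """True when every token receives at least one analysis."""
    return all(analyze(tok) for tok in tokens)

def sample_fully_analyzed(sentences, analyze, sample_size, seed):
    """Keep sentences the analyzer fully covers, then draw a seeded sample."""
    covered = [s for s in sentences if is_fully_analyzed(s.split(), analyze)]
    rng = random.Random(seed)  # local RNG, so the draw is reproducible
    return rng.sample(covered, min(sample_size, len(covered)))

# Toy stand-in for the FST: a lookup table of placeholder word forms (hypothetical data).
TOY_LEXICON = {"a": ["a+X"], "b": ["b+X", "b+Y"], "c": ["c+X"]}

def toy_analyze(token):
    """Return the list of analyses for a token (empty when unknown)."""
    return TOY_LEXICON.get(token, [])

sentences = ["a b", "a z", "b c", "c a b"]
sample = sample_fully_analyzed(sentences, toy_analyze, sample_size=2, seed=421)
```

Note that the sentence `"a z"` can never be sampled, because `z` has no analysis, while `"a b"` stays eligible even though `b` is ambiguous: coverage, not ambiguity, is the filter.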

**Note:** Avoid re-running the following cell: it overwrites the manually corrected files described in the next section. If you do need to regenerate the samples, uncomment the three calls and run it.

In [2]:
from evaluation.eval_modules.data_io import read_opd_tsv, build_eval_sample, write_eval_artifacts

TSV_PATH = ROOT / "data" / "opd" / "example_sentences.tsv"
FST_BINARY_PATH = ROOT / "data" / "fst" / "ojibwe7.fomabin"
OUTDIR = ROOT / "evaluation" / "eval_data" / "sample_disambig"

# Read the OPD TSV.
# rows = read_opd_tsv(TSV_PATH)
# Build the evaluation set with a fixed random seed.
# keep = build_eval_sample(rows, FST_BINARY_PATH, sample_size=300, seed=421)
# Write the evaluation sample sets.
# write_eval_artifacts(keep, FST_BINARY_PATH, OUTDIR, write_100=True)

# Note: this only lists the expected artifact paths; nothing is written
# while the calls above stay commented out.
print("Wrote files:",
      OUTDIR / "sample_100.tsv",
      OUTDIR / "sample_100.txt",
      OUTDIR / "sample_300.tsv",
      OUTDIR / "sample_300.txt",
      sep="\n")


Wrote files:
/Users/matthias/ELF-Lab Repos/Ojibwe_Constraint_Grammar/evaluation/eval_data/sample_disambig/sample_100.tsv
/Users/matthias/ELF-Lab Repos/Ojibwe_Constraint_Grammar/evaluation/eval_data/sample_disambig/sample_100.txt
/Users/matthias/ELF-Lab Repos/Ojibwe_Constraint_Grammar/evaluation/eval_data/sample_disambig/sample_300.tsv
/Users/matthias/ELF-Lab Repos/Ojibwe_Constraint_Grammar/evaluation/eval_data/sample_disambig/sample_300.txt


### Creating the gold standard

The gold standard was created by manually disambiguating the sample sets parsed above. Incorrect lines were deleted from the `.cg3` file by hand, as if the annotator were the disambiguation module. In total, 241 example sentences were disambiguated for the gold standard, which is in `eval_data/gold/disambig_gold_241.txt`.
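
Because the gold file is produced by deleting lines by hand, a small sanity check can catch accidental edits. The sketch below assumes the files follow the VISL CG-3 stream layout (a `"<wordform>"` line followed by indented reading lines); that layout, the helper names `parse_cohorts` and `gold_problems`, and the toy data are all assumptions for illustration, not part of the project code.

```python
def parse_cohorts(stream):
    """Return [(wordform_line, [reading_line, ...]), ...] from CG-3 style text."""
    cohorts = []
    for line in stream.splitlines():
        if line.startswith('"<'):
            # A new cohort starts at each "<wordform>" line.
            cohorts.append((line.strip(), []))
        elif line[:1] in ("\t", " ") and line.strip() and cohorts:
            # Indented lines are readings of the most recent cohort.
            cohorts[-1][1].append(line.strip())
    return cohorts

def gold_problems(ambiguous, gold):
    """List word forms whose gold readings are not a non-empty subset of the input."""
    problems = []
    pairs = zip(parse_cohorts(ambiguous), parse_cohorts(gold))
    for (word, before), (gold_word, kept) in pairs:
        if word != gold_word or not kept or not set(kept) <= set(before):
            problems.append(word)
    return problems

# Placeholder cohorts (hypothetical word forms and tags).
AMBIGUOUS = '"<w1>"\n\t"lemma1" TAG_A\n\t"lemma1" TAG_B\n"<w2>"\n\t"lemma2" TAG_C\n'
GOLD = '"<w1>"\n\t"lemma1" TAG_A\n"<w2>"\n\t"lemma2" TAG_C\n'
BAD_GOLD = '"<w1>"\n\t"lemma1" TAG_Z\n"<w2>"\n\t"lemma2" TAG_C\n'

ok = gold_problems(AMBIGUOUS, GOLD)
bad = gold_problems(AMBIGUOUS, BAD_GOLD)
```

Here `ok` is empty because the gold only removes readings, while `bad` flags `"<w1>"` because its gold reading never appeared in the ambiguous input.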

While manually disambiguating the sentences parsed above, the teammate analyzing them (a speaker of Ojibwe) noted that sentence #30 of the subset was ungrammatical. We therefore manually replaced it with sentence #300.

### Evaluating the model on the gold standard

We now produce the corresponding system outputs and compute evaluation metrics against the gold standard.

`eval_data/sample_disambig/sample_241.tsv` contains the 241 examples corresponding to the gold standard.

In [3]:
from evaluation.eval_modules.run_eval import write_sys_from_tsv
from pathlib import Path

AMBIG_FILE = ROOT / "evaluation" / "eval_data" / "sample_disambig" / "sample_241.tsv"
DISAMBIG_OUT = ROOT / "evaluation" / "eval_data" / "out" / "sys_241_out.txt"
CG3 = ROOT / "data" / "grammars" / "disambiguation.cg3"
FST = ROOT / "data" / "fst" / "ojibwe7.fomabin"

# System outputs on the 241 examples
write_sys_from_tsv(AMBIG_FILE, DISAMBIG_OUT, CG3, FST)

FST file is /Users/matthias/ELF-Lab Repos/Ojibwe_Constraint_Grammar/data/fst/ojibwe7.fomabin
Wrote 241 sentences to /Users/matthias/ELF-Lab Repos/Ojibwe_Constraint_Grammar/evaluation/eval_data/out/sys_241_out.txt


#### Get general disambiguation stats on the sample of 241 examples

In [4]:
# Write the 241 sentences to a one-sentence-per-line .txt file (disambiguate_with_stats expects this)
from pathlib import Path
import csv

OUT = ROOT / "evaluation" / "eval_data" / "sample_disambig" / "per_line_sample_241.txt"

with AMBIG_FILE.open("r", encoding="utf-8") as fin, OUT.open("w", encoding="utf-8", newline="") as fout:
    reader = csv.DictReader(fin, delimiter="\t")
    for row in reader:
        text = (row.get("Ojibwe") or "").strip()
        if text:
            fout.write(text + "\n")


# Run disambiguation stats on it
from src.stats_disambiguation import disambiguate_with_stats
from src.foma_adapter import FomaFst

after_block, stats = disambiguate_with_stats(
    text_path=OUT,
    grammar=CG3,
    fst=FomaFst(FST),
    verbose=True, 
)

FST file is /Users/matthias/ELF-Lab Repos/Ojibwe_Constraint_Grammar/data/fst/ojibwe7.fomabin
[############################] 4/4 All done!
|-------------------|----------|
| total words       |  954     |
| readings before   | 1409     |
| readings after    | 1165     |
| analyses removed  |  244     |
| ambiguity removed |    0.382 |
| ambiguous before  |    0.297 |
| ambiguous after   |    0.183 |

| type    |   words b |   words a |   readings b |   readings a |   removed |   avg b |   avg a |
|---------|-----------|-----------|--------------|--------------|-----------|---------|---------|
| verb    |       389 |       389 |          714 |          550 |       164 |    1.84 |    1.41 |
| pronoun |       110 |       110 |          146 |          125 |        21 |    1.33 |    1.14 |
| noun    |       192 |       191 |          268 |          222 |        46 |    1.4  |    1.16 |
| adverb  |       187 |       190 |          200 | 
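
Some of the figures in the tables above can be checked by hand. The sketch below recomputes the number of removed readings and the average readings per word for verbs from the printed counts. The `ambiguous_fraction` helper reflects one plausible reading of the "ambiguous before/after" rows (share of words with more than one reading); that definition is an assumption, since the code of `disambiguate_with_stats` is not shown here, and the helper names are hypothetical.

```python
def readings_removed(before, after):
    """Number of readings the grammar discarded."""
    return before - after

def avg_readings_per_word(readings, words):
    """Mean number of readings per word, rounded as in the table."""
    return round(readings / words, 2)

def ambiguous_fraction(reading_counts):
    """Share of words with more than one reading (assumed definition)."""
    return sum(1 for n in reading_counts if n > 1) / len(reading_counts)

# Counts copied from the tables above.
removed = readings_removed(1409, 1165)
verb_avg_before = avg_readings_per_word(714, 389)
verb_avg_after = avg_readings_per_word(550, 389)

# Toy per-word reading counts (hypothetical): two of four words are ambiguous.
toy_fraction = ambiguous_fraction([1, 2, 1, 3])
```

With these inputs, `removed` is 244, and the verb averages come out to 1.84 before and 1.41 after, matching the printed rows.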

#### Compare ambiguity cases to gold standard

In [5]:
from pathlib import Path
from evaluation.eval_modules.auto_contrast_eval import per_contrast_eval_from_paths

AMBIG_FILE = ROOT / "evaluation" / "eval_data" / "sample_disambig" / "sample_241.txt"
GOLD_FILE = ROOT / "evaluation" / "eval_data" / "gold" / "disambig_gold_241.txt"
SYS_FILE = ROOT / "evaluation" / "eval_data" / "out" / "sys_241_out.txt"

df = per_contrast_eval_from_paths(AMBIG_FILE, GOLD_FILE, SYS_FILE)
df.head(20)


|   | Contrast | n | Exact | Fail | Exact% | ContainsGold% | Avg sys kept | Examples |
|---|----------|---|-------|------|--------|---------------|--------------|----------|
| 0 | OBJ{0PlObj, 0SgObj} | 29 | 21 | 8 | 72.413793 | 100.0 | 1.275862 | [{'sent_id': '13', 'token_idx': 1, 'surface': ... |
| 1 | FORM{ChCnj\|Cnj\|Pos, Pcp\|Pos} | 24 | 0 | 24 | 0.0 | 100.0 | 2.166667 | [{'sent_id': '2', 'token_idx': 3, 'surface': '... |
| 2 | OBJ{3PlObvObj, 3SgObvObj} | 24 | 0 | 24 | 0.0 | 100.0 | 2.0 | [{'sent_id': '9', 'token_idx': 2, 'surface': '... |
| 3 | NOUN{ObvPl, ObvSg, Pl} | 23 | 4 | 19 | 17.391304 | 100.0 | 1.826087 | [{'sent_id': '9', 'token_idx': 3, 'surface': '... |
| 4 | NOUN{ObvPl, ObvSg} | 17 | 0 | 17 | 0.0 | 100.0 | 2.0 | [{'sent_id': '38', 'token_idx': 4, 'surface': ... |
| 5 | FORM{ChCnj\|Cnj\|Pos, Cnj\|Pos} | 14 | 1 | 13 | 7.142857 | 100.0 | 2.071429 | [{'sent_id': '35', 'token_idx': 2, 'surface': ... |
| 6 | NOUN{NA, NI} | 13 | 13 | 0 | 100.0 | 100.0 | 1.0 | [{'sent_id': '1', 'token_idx': 3, 'surface': '... |
| 7 | POS{ADVInter, PCInterj} | 12 | 9 | 3 | 75.0 | 100.0 | 1.25 | [{'sent_id': '20', 'token_idx': 1, 'surface': ... |
| 8 | PV{daa, ga} | 11 | 0 | 11 | 0.0 | 100.0 | 2.0 | [{'sent_id': '10', 'token_idx': 2, 'surface': ... |
| 9 | OBJ{3PlProxObj, 3SgProxObj} | 10 | 9 | 1 | 90.0 | 100.0 | 1.1 | [{'sent_id': '41', 'token_idx': 1, 'surface': ... |
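
To make the column names concrete, here is a minimal sketch of how per-contrast scores of this shape could be computed. It assumes that "Exact" means the system kept exactly the gold readings, that "ContainsGold" means the gold readings survive among what the system kept, and that "Avg sys kept" is the mean number of readings left by the system. These definitions are inferred from the column names and are not taken from `per_contrast_eval_from_paths`; `score_contrast` and the toy cases are hypothetical.

```python
def score_contrast(cases):
    """Score a list of (gold_readings, system_readings) pairs, each a set."""
    n = len(cases)
    exact = sum(1 for gold, system in cases if system == gold)
    contains = sum(1 for gold, system in cases if gold <= system)
    kept = sum(len(system) for _, system in cases)
    return {
        "n": n,
        "Exact": exact,
        "Fail": n - exact,
        "Exact%": 100 * exact / n,
        "ContainsGold%": 100 * contains / n,
        "Avg sys kept": kept / n,
    }

# Toy cases shaped like the first row: 21 tokens fully resolved, 8 left with
# both readings. Which reading is gold for each token is made up here.
resolved = [({"0SgObj"}, {"0SgObj"})] * 21
unresolved = [({"0SgObj"}, {"0PlObj", "0SgObj"})] * 8
row = score_contrast(resolved + unresolved)
```

Under these assumptions, a contrast with `ContainsGold%` at 100 and `Exact%` at 0 is one where the grammar never removes the correct reading but also never resolves the ambiguity.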
