# Example 
* This notebook shows one possible way to conduct the translation process.
* The pairs in `some_pairs` were chosen arbitrarily. We can either translate all 110 pairs at once or work in batches, using multiple cells in the Jupyter notebook.

## Translation Task
* Code used for translation, namely `data_management`, `util`, `translators` and `task`, MUST NOT CHANGE during or after translation.
* We have to decide at which commit the code is considered `fixed`; after that, these files must remain untouched.
* If Git still tracks changes to them, those changes must not affect how the code behaves.
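
One way to check that the translation code has stayed fixed is to fingerprint the relevant files at the chosen commit and compare them again later. The sketch below is an assumption and not part of `scripts`: `fingerprint` and `changed_files` are hypothetical helpers, and which paths to pass in depends on where the four modules live in the repository.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Map each file path to the SHA-256 hex digest of its contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(baseline, paths):
    """Return the paths whose current digest differs from the stored baseline."""
    current = fingerprint(paths)
    return sorted(p for p, digest in current.items() if baseline.get(p) != digest)
```

The baseline returned by `fingerprint` could be stored as JSON next to the logs at the `fixed` commit; an empty list from `changed_files` then means the files are byte-identical. This check is stricter than the rule above, since it also flags comment or whitespace edits that do not change behavior.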

In [1]:
from scripts.task import TranslationTask
from scripts.data_management import EPManager
from scripts.translators import GPT4Client
from scripts.util import MyLogger
from os.path import join

some_pairs = [
    ('en', 'de'),
    ('de', 'en'),
]

example_folder = 'exmpl'
mt_folder = join(example_folder, 'gpt41')  # store translations of the specified translator
dm = EPManager()  # choose dataset, in this case EuroParl
logger = MyLogger(logfile=join(example_folder, 'log.jsonl'))  # set up logger
client = GPT4Client(logger=logger)  # choose translator

task = TranslationTask(
    target_pairs=some_pairs,
    dm=dm,
    client=client,
    logger=logger,
    mt_folder=mt_folder,
    num_of_sents=50
)

In [2]:
task.run()

50 translated from en to de
50 translated from de to en


### Logs
* We can print the logs inside the notebook, but it is safer to store them externally.
* This notebook can be re-run after translation: no API calls will be made, but the logs will change.
* Externally stored logs reflect the state at translation time and can be inspected with Python or Unix commands.

In [3]:
!cat $example_folder/log.jsonl

{"translator": "gpt-4.1", "src_lang": "en", "tgt_lang": "de", "start": 1745406813.9225166, "id": "bd108883-8a62-4471-be35-f07d8f34def4", "in_lines": 50, "in_sents": 51, "stamp": "2025-04-23 13:13:33.936517+02:00", "in_chars": 6527, "in_tiktoks": 1329, "dataset": {"name": "Helsinki-NLP/europarl", "num_of_sents": 50, "start_idx": 0, "split": "train[:500]"}, "out_chars": 7259, "out_tiktoks": 1589, "out_sents": 51, "in_toks": 1378, "out_toks": 1590, "out_lines": 50, "end": 1745406839.1053126, "error": null, "error_msg": null, "time": 25.182796001434326}
{"translator": "gpt-4.1", "src_lang": "de", "tgt_lang": "en", "start": 1745406839.1353137, "id": "656f6da5-024a-475c-a1bf-3e959f77e00c", "in_lines": 50, "in_sents": 54, "stamp": "2025-04-23 13:13:59.147312+02:00", "in_chars": 7023, "in_tiktoks": 1546, "dataset": {"name": "Helsinki-NLP/europarl", "num_of_sents": 50, "start_idx": 0, "split": "train[:500]"}, "out_chars": 6353, "out_tiktoks": 1325, "out_sents": 54, "in_toks": 1594, "out_toks": 

In [4]:
from scripts.stats import GPT41_RATE
import json
with open(join(example_folder, 'log.jsonl')) as f:
    log_data = [json.loads(ln) for ln in f]

total_est_cost = 0
total_real_cost = 0
for log in log_data:
    print(log['src_lang'], log['tgt_lang'])
    est_cost = GPT41_RATE[0]*log['in_tiktoks'] + GPT41_RATE[1]*log['out_tiktoks']
    real_cost = GPT41_RATE[0]*log['in_toks']+GPT41_RATE[1]*log['out_toks']
    ratio = est_cost / real_cost
    total_est_cost+=est_cost
    total_real_cost+=real_cost
    
    print(f'Estimated Cost:\t{est_cost:.5f}')
    print(f'Real Cost:\t{real_cost:.5f}')
    print(f'Ratio\t{ratio:.5f}')
    # Double-quoted f-strings so the inner single quotes also parse on Python < 3.12
    print(f"Est Difference Input\t{log['in_tiktoks']-log['in_toks']}")
    print(f"Est Difference Output\t{log['out_tiktoks']-log['out_toks']}\n")

print(f'Total estimated cost:\t{total_est_cost}')
print(f'Total real cost:\t{total_real_cost}')
print(f'Ratio\t{total_est_cost/total_real_cost}')

en de
Estimated Cost:	0.01537
Real Cost:	0.01548
Ratio	0.99315
Est Difference Input	-49
Est Difference Output	-1

de en
Estimated Cost:	0.01369
Real Cost:	0.01380
Ratio	0.99246
Est Difference Input	-48
Est Difference Output	-1

Total estimated cost:	0.029061999999999998
Total real cost:	0.029272
Ratio	0.992825908718229


## Post Processing
* This example is the ideal case: the number of input and output lines stayed the same.
    * For DeepL this is likely.
    * For GPT this can go wrong: we may get back malformed output that we have to align again.
* Because this is the ideal case, we perform a direct alignment.
* Code for post-processing can change at any time; **the last committed version counts**.
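
Before aligning directly, it can help to confirm that we really are in the ideal case. The following is a minimal sketch under assumptions: `needs_realignment` is a hypothetical helper, not a function from `scripts.post_process`, and it only checks line counts and empty lines, not translation quality.

```python
def needs_realignment(src_sents, mt_sents):
    """Return a reason string if direct alignment looks unsafe, else None."""
    if len(src_sents) != len(mt_sents):
        return f"line count mismatch: {len(src_sents)} source vs {len(mt_sents)} translated"
    # An empty output line usually means the translator merged or dropped a sentence
    empty = [i for i, s in enumerate(mt_sents) if not s.strip()]
    if empty:
        return f"empty translations at indices {empty}"
    return None
```

If the helper returns `None`, source, reference and translation can be paired index by index; any other result suggests the output has to be re-aligned first.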

In [5]:
from scripts.post_process import direct_triplet_align
from scripts.util import load_sents

for pair in some_pairs:
    s, t = pair
    src_sents, tgt_sents = dm.get_sentence_pairs(s, t, num_of_sents=50)  # same count as in the TranslationTask
    mt_sents = load_sents(mt_folder, s, t)
    direct_triplet_align(
        mt_sents=mt_sents,
        ref_sents=tgt_sents,
        src_sents=src_sents,
        src_lang=s,
        ref_lang=t,
        folder_path=mt_folder
    )

## Eval
* The eval code I use requires files in COMET format, i.e. JSONL where each line is an object of the form:
    ```json
    {"mt" : "sent", "ref" : "sent", "src" : "sent"}
    ```
* Locally, we only compute BLEU and chrF scores, but we can later upload these files to Colab and compute COMET and BERT-F1 scores as well.
* As with post-processing, the code can change at any time; **the last committed version counts**.
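
To illustrate the file format described above, here is a minimal sketch of writing aligned triplets as JSONL. `write_comet_jsonl` is a hypothetical helper introduced for illustration; the actual files in this notebook are produced by `direct_triplet_align`, whose internals may differ.

```python
import json


def write_comet_jsonl(path, src_sents, ref_sents, mt_sents):
    """Write one {"mt", "ref", "src"} object per line; return the number of lines written."""
    if not (len(src_sents) == len(ref_sents) == len(mt_sents)):
        raise ValueError("src, ref and mt must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        for src, ref, mt in zip(src_sents, ref_sents, mt_sents):
            # ensure_ascii=False keeps umlauts and other non-ASCII characters readable
            f.write(json.dumps({"mt": mt, "ref": ref, "src": src}, ensure_ascii=False) + "\n")
    return len(mt_sents)
```

Each line can then be read back with `json.loads`, the same way the log file is parsed earlier in this notebook.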

In [6]:
from scripts.scoring import ResultProducer
import os
l2f = {f.replace('.jsonl', ''): join(mt_folder, f) for f in os.listdir(mt_folder) if f.endswith('.jsonl')}
rp = ResultProducer(label2files=l2f)
rp.compute_results()
rp.display_results()

   Label       BLEU       chrF
0  de-en  30.641867  56.657114
1  en-de  22.436096  54.381047
