# Example 
* One possible way to conduct the translation process
* `some_pairs` was chosen arbitrarily; it shows that we can either translate all 110 pairs at once or do it in batches across multiple cells of the Jupyter notebook
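If the 110 pairs are all ordered combinations of 11 languages (11 × 10 = 110), they can be generated and split into batches as below. The language list, the batch size and the helper names are assumptions for illustration. Each batch could then be passed as `target_pairs` to its own `TranslationTask` in a separate cell.

```python
from itertools import permutations

# Hypothetical language list: 11 codes give 11 * 10 = 110 ordered pairs.
LANGS = ['da', 'de', 'el', 'en', 'es', 'fi', 'fr', 'it', 'nl', 'pt', 'sv']

def all_pairs(langs):
    """Return every ordered (src, tgt) pair with src != tgt."""
    return list(permutations(langs, 2))

def batched(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

pairs = all_pairs(LANGS)
batches = batched(pairs, 20)  # batch size of 20 is an arbitrary choice
print(len(pairs), len(batches))  # 110 6
```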

## Translation Task
* Code used for translation, namely the modules `data_management`, `util`, `translators` and `task`, MUST NOT CHANGE during or after translation.
* We have to decide at which commit the code is considered `fixed`; after that, these four files must remain untouched.
* If Git still tracks changes to them, those changes must not make the code behave any differently from before.
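One way to check the freeze is to record a content hash of each frozen file at the chosen commit and compare before every run. A minimal sketch; the `scripts/*.py` paths are assumed from the `scripts.*` imports used below, and `fingerprint`/`changed_files` are hypothetical helpers.

```python
import hashlib
from pathlib import Path

# Assumed module paths, inferred from the `scripts.*` imports in this notebook.
FROZEN_FILES = [
    'scripts/data_management.py',
    'scripts/util.py',
    'scripts/translators.py',
    'scripts/task.py',
]

def fingerprint(paths):
    """Map each path to the SHA-256 hex digest of its current contents."""
    return {p: hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}

def changed_files(recorded, paths):
    """Return the paths whose current digest differs from the recorded one."""
    current = fingerprint(paths)
    return [p for p in paths if current[p] != recorded.get(p)]

# recorded = fingerprint(FROZEN_FILES)        # once, at the freeze commit
# assert not changed_files(recorded, FROZEN_FILES)  # before each translation run
```

A hash flags any byte change, including behaviour-neutral ones such as comments, so a flagged file still needs a manual look. With Git, `git diff --quiet <commit> -- <paths>` gives a similar check via its exit code.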

In [6]:
from scripts.task import TranslationTask
from scripts.data_management import EPManager
from scripts.translators import GPT4Client
from scripts.util import MyLogger
from os.path import join

some_pairs = [
    ('en', 'de'),
    ('de', 'en'),
]

example_folder = 'exmpl'
mt_folder = join(example_folder, 'gpt41')  # store translations of the chosen translator
dm = EPManager()  # choose dataset, in this case EuroParl
logger = MyLogger(logfile=join(example_folder, 'log.jsonl'))  # set up logger
client = GPT4Client(logger=logger)  # choose translator

task = TranslationTask(
    target_pairs=some_pairs,
    dm=dm,
    client=client,
    logger=logger,
    mt_folder=mt_folder,
    num_of_sents=50
)

In [7]:
task.run()

Document for pair en-de has been translated already.
50 translated from en to de
Document for pair de-en has been translated already.
50 translated from de to en


### Logs
* We can print our logs within the notebook, but it is safer to store them externally.
* This notebook can be re-run after translation: no API calls will be made, but the logs printed in the notebook will change.
* Externally stored logs reflect the state at translation time and can be viewed with Python or Unix commands.
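The re-run behaviour (no API calls on a second run, external log left as it was) is consistent with a skip-if-exists check. A minimal sketch, assuming one output file per pair; `translate_if_missing`, the `{src}-{tgt}.txt` naming and `translate_fn` are hypothetical and not the actual `TranslationTask` internals.

```python
import json
from pathlib import Path

def translate_if_missing(mt_folder, src, tgt, translate_fn, log_path):
    """Translate src->tgt only if no output exists yet; return True if a translation ran."""
    # Hypothetical file naming; the real layout used by TranslationTask may differ.
    out_file = Path(mt_folder) / f'{src}-{tgt}.txt'
    if out_file.exists():
        # Re-run: no API call, and nothing is appended to the external log.
        print(f'{src}-{tgt}: output exists, skipping')
        return False
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(translate_fn(src, tgt), encoding='utf-8')
    # Only real translations are recorded in the external JSONL log.
    with open(log_path, 'a', encoding='utf-8') as f:
        f.write(json.dumps({'src_lang': src, 'tgt_lang': tgt}) + '\n')
    return True
```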

In [8]:
!cat $example_folder/log.jsonl

{"translator": "gpt-4.1", "src_lang": "en", "tgt_lang": "de", "start": 1745412408.0646613, "id": "1b7f362d-ef57-47fe-ab07-769fcb80f9a5", "in_lines": 50, "in_sents": 51, "timestamp": "2025-04-23 14:46:48.084661+02:00", "in_chars": 6527, "in_tokens": 1329, "dataset": {"name": "Helsinki-NLP/europarl", "num_of_sents": 50, "start_idx": 0, "split": "train[:500]"}, "out_chars": 7226, "out_tokens": 1580, "out_sents": 51, "in_model_tokens": 1378, "out_model_tokens": 1581, "out_lines": 50, "end": 1745412435.163151, "error": null, "error_msg": null, "time": 27.09848976135254}
{"translator": "gpt-4.1", "src_lang": "de", "tgt_lang": "en", "start": 1745412435.2071714, "id": "388a3d29-d86e-41c1-9e69-991a81b65f27", "in_lines": 50, "in_sents": 54, "timestamp": "2025-04-23 14:47:15.219134+02:00", "in_chars": 7023, "in_tokens": 1546, "dataset": {"name": "Helsinki-NLP/europarl", "num_of_sents": 50, "start_idx": 0, "split": "train[:500]"}, "out_chars": 6321, "out_tokens": 1317, "out_sents": 54, "in_model_t
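Besides `cat`, the same file can be checked from Python for failed requests or line-count mismatches. A minimal sketch; the fields `id`, `error`, `in_lines`, `out_lines` and `time` appear in the records above, while `summarize_log` is a hypothetical helper and treating a non-null `error` as a failure is an assumption.

```python
import json

def summarize_log(path):
    """Summarize a JSONL translation log: record count, failed ids, total time, line mismatches."""
    with open(path, encoding='utf-8') as f:
        records = [json.loads(ln) for ln in f if ln.strip()]
    # Assumption: a non-null `error` field marks a failed request.
    failed = [r['id'] for r in records if r.get('error') is not None]
    # Pairs whose output line count differs from the input will need realignment.
    mismatched = [(r['src_lang'], r['tgt_lang']) for r in records
                  if r.get('in_lines') != r.get('out_lines')]
    total_time = sum(r.get('time', 0.0) for r in records)
    return len(records), failed, total_time, mismatched
```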

In [9]:
from scripts.stats import GPT41_RATE
import json
with open(join(example_folder, 'log.jsonl')) as f:
    log_data = [json.loads(ln) for ln in f]

total_est_cost = 0
total_real_cost = 0
for log in log_data:
    print(log['src_lang'], log['tgt_lang'])
    # GPT41_RATE holds the per-token (input, output) prices.
    # in/out_tokens are our local estimates; in/out_model_tokens are the counts reported by the model.
    est_cost = GPT41_RATE[0] * log['in_tokens'] + GPT41_RATE[1] * log['out_tokens']
    real_cost = GPT41_RATE[0] * log['in_model_tokens'] + GPT41_RATE[1] * log['out_model_tokens']
    ratio = est_cost / real_cost
    total_est_cost += est_cost
    total_real_cost += real_cost

    print(f'Estimated Cost:\t{est_cost:.5f}')
    print(f'Real Cost:\t{real_cost:.5f}')
    print(f'Ratio\t{ratio:.5f}')
    # Double quotes inside the f-string keep this valid before Python 3.12
    print(f'Est Difference Input\t{log["in_tokens"] - log["in_model_tokens"]}')
    print(f'Est Difference Output\t{log["out_tokens"] - log["out_model_tokens"]}\n')

print(f'Total estimated cost:\t{total_est_cost}')
print(f'Total real cost:\t{total_real_cost}')
print(f'Ratio\t{total_est_cost/total_real_cost}')

en de
Estimated Cost:	0.01530
Real Cost:	0.01540
Ratio	0.99312
Est Difference Input	-49
Est Difference Output	-1

de en
Estimated Cost:	0.01363
Real Cost:	0.01373
Ratio	0.99243
Est Difference Input	-48
Est Difference Output	-1

Total estimated cost:	0.028926
Total real cost:	0.029136
Ratio	0.9927924217462933
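The cost cell prices tokens linearly: input rate × input tokens + output rate × output tokens. The actual values of `GPT41_RATE` are not shown here; assuming $2 per million input tokens and $8 per million output tokens, the sketch below reproduces the en-de figures printed above.

```python
# Assumed per-token prices in USD: $2 per million input, $8 per million output tokens.
RATE = (2e-6, 8e-6)

def cost(in_tokens, out_tokens, rate=RATE):
    """Linear token pricing: input_rate * in_tokens + output_rate * out_tokens."""
    return rate[0] * in_tokens + rate[1] * out_tokens

# Token counts copied from the en-de log record above.
est = cost(1329, 1580)   # local estimate (in_tokens, out_tokens)
real = cost(1378, 1581)  # model-reported (in_model_tokens, out_model_tokens)
print(f'{est:.5f} {real:.5f} {est / real:.5f}')  # 0.01530 0.01540 0.99312
```

Under these rates the $0.000106 gap splits into 49 × $2e-6 (input) + 1 × $8e-6 (output), so the underestimate comes almost entirely from the input side.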


## Post-Processing
* This example was an ideal case, as the number of input and output lines remained the same.
    * For DeepL this is likely.
    * For GPT, this can go wrong: we may get back malformed output that has to be realigned.
* Since this is an ideal case, we perform a direct alignment.
* Post-processing code may change at any time; **the last committed version counts**.
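A direct alignment pairs the i-th MT, reference and source sentence and is only safe when all three lists have the same length. A minimal sketch of that idea, writing one JSON object per line with keys `mt`, `ref` and `src`; `direct_align_to_jsonl` is hypothetical and not the actual `direct_triplet_align`.

```python
import json

def direct_align_to_jsonl(mt_sents, ref_sents, src_sents, out_path):
    """Pair sentences by position and write them as JSONL; refuse mismatched lengths."""
    if not (len(mt_sents) == len(ref_sents) == len(src_sents)):
        # Malformed MT output (merged or split lines) must be realigned first.
        raise ValueError(
            f'length mismatch: mt={len(mt_sents)}, ref={len(ref_sents)}, src={len(src_sents)}'
        )
    with open(out_path, 'w', encoding='utf-8') as f:
        for mt, ref, src in zip(mt_sents, ref_sents, src_sents):
            f.write(json.dumps({'mt': mt, 'ref': ref, 'src': src}, ensure_ascii=False) + '\n')
```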

In [10]:
from scripts.post_process import direct_triplet_align
from scripts.util import load_sents

for s, t in some_pairs:
    # Load as many sentences as were translated (num_of_sents=50 in TranslationTask)
    src_sents, tgt_sents = dm.get_sentence_pairs(s, t, num_of_sents=50)
    mt_sents = load_sents(mt_folder, s, t)
    direct_triplet_align(
        mt_sents=mt_sents,
        ref_sents=tgt_sents,
        src_sents=src_sents,
        src_lang=s,
        ref_lang=t,
        folder_path=mt_folder
    )

## Eval
* The evaluation code I use requires files in COMET format, i.e. JSONL where each object has the form:
    ```json
    {"mt" : "sent", "ref" : "sent", "src" : "sent"}
    ```
* Locally, we only compute BLEU and chrF scores, but we can later upload these files to Colab and compute COMET and BERT-F1 scores as well.
* As with post-processing, evaluation code may change at any time; **the last committed version counts**.
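Files in this format can be read back into parallel lists before scoring. A minimal sketch; `load_comet_jsonl` is a hypothetical helper that also rejects objects missing one of the three keys.

```python
import json

def load_comet_jsonl(path):
    """Read a COMET-format JSONL file into parallel lists (mt, ref, src)."""
    mt, ref, src = [], [], []
    with open(path, encoding='utf-8') as f:
        for n, ln in enumerate(f, start=1):
            if not ln.strip():
                continue  # tolerate blank lines
            obj = json.loads(ln)
            missing = {'mt', 'ref', 'src'} - set(obj)
            if missing:
                raise KeyError(f'line {n}: missing {sorted(missing)}')
            mt.append(obj['mt'])
            ref.append(obj['ref'])
            src.append(obj['src'])
    return mt, ref, src
```

If `sacrebleu` is installed, `sacrebleu.corpus_bleu(mt, [ref]).score` and `sacrebleu.corpus_chrf(mt, [ref]).score` return corpus-level BLEU and chrF; whether `ResultProducer` computes its scores this way is not stated here.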

In [11]:
from scripts.scoring import ResultProducer
import os
l2f = {f.replace('.jsonl', ''): join(mt_folder, f) for f in os.listdir(mt_folder) if f.endswith('.jsonl')}
rp = ResultProducer(label2files=l2f)
rp.compute_results()
rp.display_results()

   Label       BLEU       chrF
0  de-en  30.760405  56.616469
1  en-de  22.475731  54.351526
