# Example 
* One possible way to conduct the translation process.
* `some_pairs` was chosen arbitrarily. It shows that we can either translate all 110 pairs in one go or work in batches, using multiple cells in the Jupyter notebook.
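
As a rough sketch of the batching idea, the ordered language pairs can be generated and split into chunks that are then passed to separate cells. The helper names `all_pairs` and `batches` and the batch size are hypothetical, and `langs` only lists the codes that appear in this notebook, not the full language set.

```python
from itertools import permutations

# Only the language codes that appear in this notebook (assumption: the
# real run uses a larger list that is not shown here).
langs = ['de', 'el', 'en', 'es', 'fi', 'fr']

def all_pairs(languages):
    """Return every ordered (src, tgt) pair with src != tgt."""
    return list(permutations(languages, 2))

def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

pairs = all_pairs(langs)
pair_batches = batches(pairs, 10)  # hypothetical batch size
print(len(pairs), len(pair_batches))
```

With n languages there are n * (n - 1) ordered pairs, so 11 languages would give the 110 pairs mentioned above. Each element of `pair_batches` could be passed as `target_pairs` in its own cell.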

## Translation Task
* Code used for translation, namely `data_management`, `util` and `translators`, MUST NOT CHANGE during or after translation.
* We have to decide at which commit the code is considered `fixed`; after that commit, those 3 files must remain untouched.
* If Git still records changes to them, those changes must not alter how the code behaves.
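
One way to notice accidental edits to the frozen files is to record their hashes once and compare later. This is only a sketch of that idea, not something the project already does: the function names and the manifest file are hypothetical, and the file paths in the usage comment are assumed from the import paths.

```python
import hashlib
import json
from pathlib import Path

def file_digest(path):
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def snapshot(paths, manifest_path):
    """Record the digests of the frozen files once, at the 'fixed' commit."""
    manifest = {str(p): file_digest(p) for p in paths}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest

def changed_files(manifest_path):
    """Return the recorded files whose content no longer matches the snapshot."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [p for p, digest in manifest.items() if file_digest(p) != digest]

# Usage (paths are assumptions based on the imports in this notebook):
# snapshot(['scripts/data_management.py', 'scripts/util.py',
#           'scripts/translators.py'], 'frozen.json')
# changed_files('frozen.json')  # an empty list means nothing changed
```

A byte-level comparison is stricter than the rule above: it also flags changes that do not affect behavior, such as edited comments, so a reported file still needs a manual look.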

In [1]:
from scripts.data_management import EPManager
from scripts.util import MyLogger, load_sents
from scripts.translators import translate_document
from os.path import join
import os

example_folder = 'exmpl'
os.makedirs(example_folder, exist_ok=True)
logger = MyLogger(logfile=join(example_folder, 'log.jsonl'))
dm = EPManager()

def translation_loop(target_pairs, translator, mt_folder):
    for pair in target_pairs:
        src_lang, tgt_lang = pair
        src_sents, _ = dm.get_sentence_pairs(
            src_lang, tgt_lang, num_of_sents=100)
        logger.add_dataset_info(name='ep', num_of_sents=100)
        try:
            translate_document(
                text=src_sents,
                src_lang=src_lang,
                tgt_lang=tgt_lang,
                logger=logger,
                mt_folder=mt_folder,
                translator=translator
            )
        except Exception as e:
            logger.log_error(
                error=e,
                src_lang=src_lang,
                tgt_lang=tgt_lang,
                translator=translator
            )
            print(str(e))
            continue
        mt_sents = load_sents(mt_folder, src_lang, tgt_lang)
        print(f'{len(mt_sents)} translated from {src_lang} to {tgt_lang}')

In [2]:
some_pairs = [
    ('en', 'de'),
    ('de', 'en'),
    ('el', 'de'),
    ('es', 'en'),
    ('fi', 'fr')
]

In [3]:
mt_folder = join(example_folder, 'gpt41')
os.makedirs(mt_folder, exist_ok=True)

translation_loop(
    target_pairs=some_pairs,
    translator='gpt-4.1',
    mt_folder=mt_folder
)

Document for pair en-de has been translated already.
100 translated from en to de
Document for pair de-en has been translated already.
100 translated from de to en
Document for pair el-de has been translated already.
100 translated from el to de
Document for pair es-en has been translated already.
100 translated from es to en
Document for pair fi-fr has been translated already.
100 translated from fi to fr


### Logs
* We can print our logs within the notebook, but it is safer to store them externally.
* This notebook can be re-run after translation: API calls will not be made, but the logs will change.
* Externally stored logs represent the logs created at translation time and can be viewed through Python or Unix commands.

In [4]:
!cat $example_folder/log.jsonl

{"translator": "gpt-4.1", "src_lang": "en", "tgt_lang": "de", "start": 1745152481.7342436, "id": "9406b04b-f88f-45fe-a6bb-3fa46c5c210c", "in_lines": 100, "in_sents": 105, "stamp": "2025-04-20 14:34:41.760207+02:00", "in_chars": 13477, "in_tiktoks": 2714, "dataset": {"name": "ep", "num_of_sents": 100, "start_idx": 0}, "out_chars": 14872, "out_tiktoks": 3202, "out_sents": 106, "in_toks": 2763, "out_toks": 3203, "out_lines": 100, "end": 1745152527.4710276, "error": null, "error_msg": null, "time": 45.73678398132324}
{"translator": "gpt-4.1", "src_lang": "de", "tgt_lang": "en", "start": 1745152527.7747035, "id": "386a77fe-0582-45fe-a67e-5c40fd448f52", "in_lines": 100, "in_sents": 107, "stamp": "2025-04-20 14:35:27.798743+02:00", "in_chars": 14815, "in_tiktoks": 3199, "dataset": {"name": "ep", "num_of_sents": 100, "start_idx": 0}, "out_chars": 13403, "out_tiktoks": 2743, "out_sents": 107, "in_toks": 3247, "out_toks": 2744, "out_lines": 100, "end": 1745152566.1485453, "error": null, "error_m

In [11]:
from scripts.stats import GPT41_RATE
import json
with open(join(example_folder, 'log.jsonl')) as f:
    log_data = [json.loads(ln) for ln in f]

total_est_cost = 0
total_real_cost = 0
for log in log_data:
    print(log['src_lang'], log['tgt_lang'])
    est_cost = GPT41_RATE[0]*log['in_tiktoks'] + GPT41_RATE[1]*log['out_tiktoks']
    real_cost = GPT41_RATE[0]*log['in_toks']+GPT41_RATE[1]*log['out_toks']
    ratio = est_cost / real_cost
    total_est_cost+=est_cost
    total_real_cost+=real_cost
    
    print(f'Estimated Cost:\t{est_cost:.5f}')
    print(f'Real Cost:\t{real_cost:.5f}')
    print(f'Ratio\t{ratio:.5f}')
    # Double-quoted f-strings so the single-quoted keys also work before Python 3.12
    print(f"Est Difference Input\t{log['in_tiktoks']-log['in_toks']}")
    print(f"Est Difference Output\t{log['out_tiktoks']-log['out_toks']}\n")

print(f'Total estimated cost:\t{total_est_cost}')
print(f'Total real cost:\t{total_real_cost}')
print(f'Ratio\t{total_est_cost/total_real_cost}')

en de
Estimated Cost:	0.03104
Real Cost:	0.03115
Ratio	0.99660
Est Difference Input	-49
Est Difference Output	-1

de en
Estimated Cost:	0.02834
Real Cost:	0.02845
Ratio	0.99634
Est Difference Input	-48
Est Difference Output	-1

el de
Estimated Cost:	0.03997
Real Cost:	0.04008
Ratio	0.99736
Est Difference Input	-49
Est Difference Output	-1

es en
Estimated Cost:	0.03036
Real Cost:	0.03047
Ratio	0.99659
Est Difference Input	-48
Est Difference Output	-1

fi fr
Estimated Cost:	0.03847
Real Cost:	0.03858
Ratio	0.99725
Est Difference Input	-49
Est Difference Output	-1

Total estimated cost:	0.168192
Total real cost:	0.168718
Ratio	0.9968823717682761


## Post Processing
* This example was an ideal case, as the number of input and output lines remained the same.
    * For DeepL this is likely.
    * For GPT, this can go wrong: we may get back malformatted output that we have to realign.
* Because this is an ideal case, we perform a direct alignment.
* Code for post-processing can change at any time; **the last committed version counts**.
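
Before aligning directly, it may help to verify that the ideal case actually holds. The check below is a minimal sketch under assumed criteria (equal lengths and no empty translation); `can_align_directly` is a hypothetical helper and not part of `scripts.post_process`.

```python
def can_align_directly(src_sents, ref_sents, mt_sents):
    """True if all three lists have equal length and no MT line is blank."""
    same_length = len(src_sents) == len(ref_sents) == len(mt_sents)
    return same_length and all(s.strip() for s in mt_sents)

# Toy data, for illustration only
src = ['Hello.', 'Thank you.']
ref = ['Hallo.', 'Danke.']
print(can_align_directly(src, ref, ['Hallo.', 'Danke schön.']))  # lengths match
print(can_align_directly(src, ref, ['Hallo. Danke schön.']))     # one line missing
```

If the check fails for a pair, that pair needs realignment before it can be scored.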

In [6]:
from scripts.post_process import direct_triplet_align
from scripts.util import load_sents

for pair in some_pairs:
    s, t = pair
    src_sents, tgt_sents = dm.get_sentence_pairs(s, t, num_of_sents=100)
    mt_sents = load_sents(mt_folder, s, t)
    direct_triplet_align(
        mt_sents=mt_sents,
        ref_sents=tgt_sents,
        src_sents=src_sents,
        src_lang=s,
        ref_lang=t,
        folder_path=mt_folder
    )

## Eval
* The eval code I use requires files in COMET format, i.e. JSONL where each object has the format:
    ```json
    {"mt" : "sent", "ref" : "sent", "src" : "sent"}
    ```
* Locally, we only compute BLEU and chrF scores, but we can later upload these files to Colab and compute COMET and BERT-F1 scores as well.
* As with post-processing, this code can change at any time; **the last committed version counts**.
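
For illustration, a file in this format could be produced from three aligned sentence lists roughly as follows. This is a sketch, not the implementation of `direct_triplet_align`; the function name `write_comet_jsonl` and the output path are hypothetical.

```python
import json

def write_comet_jsonl(path, src_sents, ref_sents, mt_sents):
    """Write one {"mt", "ref", "src"} JSON object per line."""
    if not (len(src_sents) == len(ref_sents) == len(mt_sents)):
        raise ValueError('src, ref and mt must have the same number of sentences')
    with open(path, 'w', encoding='utf-8') as f:
        for src, ref, mt in zip(src_sents, ref_sents, mt_sents):
            obj = {'mt': mt, 'ref': ref, 'src': src}
            # ensure_ascii=False keeps characters such as umlauts readable
            f.write(json.dumps(obj, ensure_ascii=False) + '\n')

# Usage with toy data (path is an assumption):
# write_comet_jsonl('en-de.jsonl', ['Hello.'], ['Hallo.'], ['Hallo.'])
```

Raising on a length mismatch avoids silently truncating the data, which `zip` would otherwise do.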

In [7]:
from scripts.scoring import ResultProducer
l2f = {f.replace('.jsonl', ''): join(mt_folder, f) for f in os.listdir(mt_folder) if f.endswith('.jsonl')}
rp = ResultProducer(label2files=l2f)
rp.compute_results()
rp.display_results()

   Label       BLEU       chrF
0  de-en  31.044467  58.292028
1  el-de  28.888493  60.285039
2  en-de  23.492979  56.414147
3  es-en  37.936680  64.139182
4  fi-fr  29.306910  58.440390
