# 04b — Fine-tuning with Back-Translated Data

Repeats the fine-tuning process from `03_finetune.ipynb`, but uses the merged dataset `train_plus_bt.tsv`, which combines:
- Original parallel data (`train.tsv`)
- Synthetic back-translated pairs generated in `04_pivot_bt_eval.ipynb`

Goal: Assess whether data augmentation via back-translation improves BLEU and chrF2 scores.
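`train_plus_bt.tsv` is an input to this notebook. As a rough illustration of what combining the two sources involves, here is a minimal sketch. It assumes both files are headerless two-column TSVs (Tagalog source, Cebuano target). The helper names and the file name `bt_pairs.tsv` are hypothetical and are not taken from the project's notebooks.

```python
def read_pairs(path):
    """Read a headerless two-column TSV (source<TAB>target) into (src, tgt) tuples."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            cols = line.rstrip("\r\n").split("\t")
            if len(cols) >= 2 and cols[0] and cols[1]:  # skip blank or malformed rows
                pairs.append((cols[0], cols[1]))
    return pairs


def merge_pairs(real, synthetic):
    """Concatenate real and synthetic pairs, dropping exact duplicates.

    Real pairs come first, so a synthetic pair identical to a real one is dropped.
    """
    seen = set()
    merged = []
    for pair in [*real, *synthetic]:
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged


def write_pairs(pairs, path):
    """Write (src, tgt) tuples back out as a two-column TSV."""
    with open(path, "w", encoding="utf-8") as f:
        for src, tgt in pairs:
            f.write(f"{src}\t{tgt}\n")


# Example usage (bt_pairs.tsv is a hypothetical name for the synthetic pairs file):
# merged = merge_pairs(read_pairs("../data/processed/train.tsv"),
#                      read_pairs("../data/processed/bt_pairs.tsv"))
# write_pairs(merged, "../data/processed/train_plus_bt.tsv")
```

Deduplication is a design choice in this sketch: it prevents a synthetic pair from doubling the weight of an identical real pair. The notebook does not say whether the actual `train_plus_bt.tsv` was deduplicated.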

### Fine-tuning with Back-Translated Data

This command runs the same `finetune.py` training script, but this time uses the augmented dataset that includes synthetic pairs generated through back-translation (BT).

In [1]:
!python ../src/train/finetune.py \
  --train ../data/processed/train_plus_bt.tsv \
  --dev ../data/processed/dev.tsv \
  --out ../experiments/finetune_bt

Resolved codes → src: 'tgl_Latn' as 'tgl_Latn' (id=256174) | tgt: 'ceb_Latn' as 'ceb_Latn' (id=256035)
Train samples: 42,851 | Dev samples: 2,930
{'loss': 2.3216, 'grad_norm': 3.421875, 'learning_rate': 1.9908530894157178e-05, 'epoch': 0.01}
{'loss': 2.2993, 'grad_norm': 3.578125, 'learning_rate': 1.9815195071868583e-05, 'epoch': 0.02}
{'loss': 2.1626, 'grad_norm': 3.5625, 'learning_rate': 1.9721859249579992e-05, 'epoch': 0.03}
{'loss': 2.1358, 'grad_norm': 3.546875, 'learning_rate': 1.9628523427291398e-05, 'epoch': 0.04}
{'loss': 2.1457, 'grad_norm': 3.4375, 'learning_rate': 1.95351876050028e-05, 'epoch': 0.05}
{'loss': 2.1357, 'grad_norm': 3.25, 'learning_rate': 1.944185178271421e-05, 'epoch': 0.06}
{'loss': 2.1019, 'grad_norm': 3.171875, 'learning_rate': 1.9348515960425614e-05, 'epoch': 0.07}
{'loss': 2.0749, 'grad_norm': 3.140625, 'learning_rate': 1.925518013813702e-05, 'epoch': 0.07}
{'loss': 2.0775, 'grad_norm': 3.109375, 'learning_rate': 1.9161844315848422e-05, 'epoch': 0.08}
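The trainer prints one dictionary per logging step. To compare this loss curve with the run in `03_finetune.ipynb`, you can parse those lines from a saved copy of the cell output. The sketch below does this with the standard library. `parse_trainer_log` is a hypothetical helper, and it skips anything that is not a complete dict literal, such as warnings or truncated lines.

```python
import ast


def parse_trainer_log(lines):
    """Return (epoch, loss) pairs from Trainer-style log lines.

    A useful line looks like
    {'loss': 2.3216, 'grad_norm': 3.421875, 'learning_rate': 1.99e-05, 'epoch': 0.01}
    Anything else (warnings, progress bars, truncated lines) is skipped.
    """
    points = []
    for raw in lines:
        line = raw.strip()
        if not (line.startswith("{") and line.endswith("}")):
            continue  # not a complete dict literal
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # e.g. values such as nan, which literal_eval rejects
        if isinstance(record, dict) and "loss" in record and "epoch" in record:
            points.append((record["epoch"], record["loss"]))
    return points


# Two lines copied from the training output above, plus a truncated line.
sample = [
    "{'loss': 2.3216, 'grad_norm': 3.421875, 'learning_rate': 1.9908530894157178e-05, 'epoch': 0.01}",
    "{'loss': 2.0775, 'grad_norm': 3.109375, 'learning_rate': 1.9161844315848422e-05, 'epoch': 0.08}",
    "{'",
]
points = parse_trainer_log(sample)
print(points)  # [(0.01, 2.3216), (0.08, 2.0775)]
```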

As in `03_finetune.ipynb`, this cell:
- Loads the **combined training corpus**:
  - `../data/processed/train_plus_bt.tsv` → includes both real parallel data and synthetic (back-translated) examples.  
  - `../data/processed/dev.tsv` → validation set for monitoring performance.
- Fine-tunes the multilingual **NLLB-200 distilled 600M** model using the same hyperparameters and language codes (`tgl_Latn → ceb_Latn`).
- Adds synthetic Tagalog–Cebuano examples, which are intended to help the model generalize to unseen patterns and reduce overfitting to the limited real data. The evaluation below tests whether they actually do.

This run allows comparison between:
- The **standard fine-tuning** (real parallel data only), and  
- The **BT-augmented fine-tuning**, which adds synthetic examples with the aim of improving translation quality and domain robustness.
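To make the comparison concrete, you can diff the two runs' `metrics.json` files. This is a minimal sketch. It assumes the baseline from `03_finetune.ipynb` wrote a `metrics.json` with the same keys as the one printed further down, and the baseline path in the usage comment is hypothetical. The sketch refuses to compare runs whose decoding settings or test-set sizes differ, because either difference alone can move the scores.

```python
import json


def load_metrics(path):
    """Load a metrics.json file produced by evaluate.py."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def metric_deltas(baseline, augmented, keys=("BLEU", "chrF2")):
    """Return augmented-minus-baseline differences for the given metric keys."""
    # Scores are only comparable if both runs decoded the same test set the same way.
    for field in ("decoding", "n_samples"):
        if baseline.get(field) != augmented.get(field):
            raise ValueError(f"'{field}' differs between runs; scores are not directly comparable")
    return {k: round(augmented[k] - baseline[k], 2) for k in keys}


# Example usage (the baseline path is hypothetical):
# base = load_metrics("../experiments/finetune/metrics.json")
# bt = load_metrics("../experiments/finetune_bt/metrics.json")
# print(metric_deltas(base, bt))  # positive values favour the BT-augmented model
```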

### Evaluate the fine-tuned + back-translation model on the test set

Runs the evaluation script on the model trained with `train_plus_bt.tsv` (real + synthetic pairs). It translates the test set and computes **BLEU** and **chrF2** using SacreBLEU.
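For reference, chrF2 reduces to a short formula. chrF is a character n-gram F-score, and the "2" in chrF2 is β = 2, which treats recall as twice as important as precision. The sketch below gives the definition, where chrP and chrR are character n-gram precision and recall. SacreBLEU uses character n-grams up to length 6 by default, and the way it averages the individual orders has differed slightly between versions.

$$
\mathrm{chrF}_{\beta} = (1 + \beta^{2}) \cdot \frac{\mathrm{chrP} \cdot \mathrm{chrR}}{\beta^{2} \cdot \mathrm{chrP} + \mathrm{chrR}}, \qquad \beta = 2
$$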

In [2]:
!python ../src/eval/evaluate.py \
  --model_dir ../experiments/finetune_bt \
  --test_tsv ../data/processed/test.tsv \
  --out_json ../experiments/finetune_bt/metrics.json \
  --save_hyp ../experiments/finetune_bt/hyp.txt

{
  "BLEU": 27.17,
  "chrF2": 49.26,
  "ref_len": 109569,
  "sys_len": 87749,
  "sacrebleu_version": "2.5.1",
  "n_samples": 2750,
  "model_dir": "../experiments/finetune_bt",
  "codes": {
    "src": "tgl_Latn",
    "tgt": "ceb_Latn"
  },
  "decoding": {
    "beams": 5,
    "max_new_tokens": 200,
    "batch_size": 16
  }
}


`torch_dtype` is deprecated! Use `dtype` instead!


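The `metrics.json` above reports `sys_len` 87,749 against `ref_len` 109,569, so the system output is roughly 20% shorter than the references. BLEU multiplies its n-gram precision term by a brevity penalty, BP = exp(1 - r/c). This penalty applies whenever the hypothesis length c is shorter than the reference length r; otherwise BP = 1. The quick standard-library check below shows how large that penalty is for this run.

```python
import math


def brevity_penalty(sys_len, ref_len):
    """BLEU brevity penalty: 1.0 if the hypotheses are at least as long as the
    references, otherwise exp(1 - ref_len / sys_len)."""
    if sys_len == 0:
        return 0.0
    if sys_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / sys_len)


# Values copied from the metrics.json printed above.
sys_len, ref_len = 87749, 109569
print(f"length ratio (sys/ref): {sys_len / ref_len:.3f}")  # 0.801
print(f"brevity penalty: {brevity_penalty(sys_len, ref_len):.3f}")  # 0.780
```

With these lengths, BP is about 0.78. The precision term alone therefore corresponds to a BLEU of roughly 34.8 (27.17 / 0.78). These numbers do not show whether the shorter output comes from the model itself or from decoding settings such as `max_new_tokens`; that would need a separate check.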