# Fine-tune Whisper for Enenlhet on Google Colab

This notebook takes <https://huggingface.co/blog/fine-tune-whisper#prepare-environment> as a point of departure. Many changes had to be made because of the idiosyncrasies of this dataset.

Enenlhet is a low-resource, endangered language, so it has no pre-built tokenizer or other language-specific resources that Whisper models normally rely on.

My goal here is a notebook that successfully fine-tunes a Whisper model. To do that, I need to:

1. Set up the environment properly
    a. Install packages
    b. Make sure we're using the GPU
    c. Create directories
2. Prepare the dataset
3. Download and configure the model
4. Write a custom data collator
5. Set up a `compute_metrics` function to use Word Error Rate to measure the model's performance
6. Set up and initialize a trainer with arguments optimized for GPU
7. Train the model
8. Save the output
9. Evaluate the model

## Set up the environment

### Install packages

Several packages are not installed by default on Google Colab, so they must be added.

In [1]:
# Install necessary packages
!pip install transformers datasets accelerate evaluate huggingface_hub codecarbon jiwer --quiet
# If you don't update datasets and fsspec, there will be errors when you load the dataset.
!pip install -U datasets fsspec --quiet


### Import the necessary libraries

Most of the libraries I'll use are from the Hugging Face API:

- `datasets`: for importing and providing datasets to the model
- `evaluate`: standardizes the way evaluation metrics are implemented
- `transformers`: contains the Whisper libraries

I will also make use of the following individual Python libraries:

- `dataclasses`: for defining the data collator as a class
- `numpy`: for handling some of the mathematical operations
- `os`: for interacting with the operating system
- `random`: for generating a random seed (if I decide to do that)
- `torch`: standard library for machine learning functionality
- `typing`: necessary for the data collator
- `tqdm`: for showing a progress meter

In [2]:
# Import necessary libraries
#from codecarbon import EmissionsTracker

# Hugging Face API
from datasets import (
    Audio,
    Dataset,
    DatasetDict,
    load_dataset
)
import evaluate
from transformers import (
    WhisperProcessor,
    WhisperForConditionalGeneration,
    WhisperFeatureExtractor,
    WhisperTokenizer
)
from transformers.trainer_callback import EarlyStoppingCallback
from transformers.trainer_seq2seq import Seq2SeqTrainer
from transformers.training_args_seq2seq import Seq2SeqTrainingArguments

# Python Libraries
from dataclasses import dataclass
import numpy as np
import os
import random
import torch
from typing import Any, Dict, List, Union
import tqdm

### Make sure that we're using a GPU

In [3]:
# Check if GPU is available
if not torch.cuda.is_available():
    raise RuntimeError("GPU is not available. Please enable GPU in 'Runtime > Change runtime type'.")
else:
    print("GPU is available:", torch.cuda.get_device_name(0))

GPU is available: NVIDIA L4


### Create directories

In [4]:
# Create necessary directories
output_dir = "./enenlhet-whisper-model"
log_dir = "./logs"
dataset_dir = "./enenlhet-dataset"

# If directories do not exist, create them
os.makedirs(output_dir, exist_ok=True)
os.makedirs(log_dir, exist_ok=True)
os.makedirs(dataset_dir, exist_ok=True)

### Download the dataset

In [5]:
dataset = load_dataset("enenlhet-asr/enenlhet-whisper-dataset")


## Set up the model

This needs some explanation, since I have had to implement a lot of custom configuration just to get Whisper to train. Indeed, this step was so difficult to get right that I resorted to three different LLMs to deal with all the errors that arose. Ultimately, Claude Sonnet 4 was able to provide a clean function that would configure the model correctly.

The `setup_whisper_model()` function sets some options of the `whisper-small` model and alters others. First, it loads the model and the model's processor. Second, it sets the language, which I've set to Spanish for now, since Enenlhet is not one of Whisper's supported languages. If I leave the selection to Whisper's auto-detection, I'll end up with a lot of poor attempts to force the inputs into whichever language Whisper guesses. Third, the function tries to give the `pad_token` a value different from the `eos_token` (EOS = end-of-sequence). Note that the output below reports `Pad != EOS: False`: the tokenizer's `unk_token` has the same ID (50257) as the EOS token, so this step leaves the two IDs identical. Fourth, the function adjusts the configuration for the model's generation side. Fifth, the function cleans up some deprecated settings. Finally, the function moves the model to the GPU (if one is available) and verifies the settings.

When I call the function, I set the `model_name`, `target_language`, and `max_length`. I chose `max_length` by inspecting the distribution of label lengths in the training split:

```
label_lengths = []
for sample in dataset["train"]:
    if 'labels' in sample:
        label_lengths.append(len(sample['labels']))

# Calculate statistics
lengths = np.array(label_lengths)
print(f"Min length: {lengths.min()}")
print(f"Max length: {lengths.max()}")
print(f"Mean length: {lengths.mean():.1f}")
print(f"Median length: {np.median(lengths):.1f}")
print(f"95th percentile: {np.percentile(lengths, 95):.1f}")
print(f"99th percentile: {np.percentile(lengths, 99):.1f}")
```

In [6]:
# Clean Whisper Model Loading and Configuration
def setup_whisper_model(model_name="openai/whisper-small", target_language="es", max_length=60):
    """
    Load and configure Whisper model for fine-tuning.

    Args:
        model_name: Hugging Face model identifier
        target_language: Language code (e.g., "es" for Spanish)
        max_length: Maximum generation length based on your data analysis

    Returns:
        model, processor: Configured model and processor
    """
    print(f"Loading model: {model_name}")

    # 1. Load model and processor
    model = WhisperForConditionalGeneration.from_pretrained(model_name)
    processor = WhisperProcessor.from_pretrained(model_name, language=target_language, task="transcribe")

    # 2. Set language and task (fixes the tokenizer prefix to `target_language` transcription)
    # Passing target_language=None leaves the language choice to Whisper's auto-detection
    if target_language:
        processor.tokenizer.set_prefix_tokens(language=target_language, task="transcribe")
        print(f"Set language to: {target_language}")
    else:
        print("Using auto-detection for language")

    # 3. Fix pad token issue (pad token != eos token)
    original_pad = processor.tokenizer.pad_token_id
    original_eos = processor.tokenizer.eos_token_id

    processor.tokenizer.pad_token = processor.tokenizer.unk_token
    model.config.pad_token_id = processor.tokenizer.pad_token_id

    print(f"Pad token changed: {original_pad} -> {processor.tokenizer.pad_token_id}")
    print(f"EOS token: {original_eos}")
    print(f"Pad != EOS: {processor.tokenizer.pad_token_id != processor.tokenizer.eos_token_id}")

    # 4. Configure generation settings
    model.generation_config.max_length = max_length
    model.generation_config.pad_token_id = processor.tokenizer.pad_token_id
    model.generation_config.eos_token_id = processor.tokenizer.eos_token_id
    model.generation_config.forced_decoder_ids = None
    model.generation_config.suppress_tokens = []
    model.generation_config.begin_suppress_tokens = []
    model.generation_config.do_sample = False
    model.generation_config.num_beams = 1

    # Use the requested language (Spanish, "es", for now); None means auto-detect.
    model.generation_config.language = target_language
    model.generation_config.task = "transcribe"

    # 5. Clean up deprecated config attributes
    deprecated_attrs = ['max_length', 'suppress_tokens', 'begin_suppress_tokens', 'forced_decoder_ids']
    for attr in deprecated_attrs:
        if hasattr(model.config, attr):
            delattr(model.config, attr)
            print(f"Removed deprecated config.{attr}")

    # 6. Move to GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    print(f"Model moved to: {device}")

    # 7. Verify configuration
    print("\n=== Configuration Summary ===")
    print(f"Max generation length: {model.generation_config.max_length}")
    print(f"Language: {model.generation_config.language or 'auto-detect'}")
    print(f"Task: {model.generation_config.task}")
    print(f"Pad token ID: {processor.tokenizer.pad_token_id}")
    print(f"EOS token ID: {processor.tokenizer.eos_token_id}")
    print(f"Has Whisper task_to_id: {hasattr(model.generation_config, 'task_to_id')}")
    print("Configuration complete!")

    return model, processor

# Usage
model, processor = setup_whisper_model(
    model_name="openai/whisper-small",
    target_language="es",
    max_length=60
)

Loading model: openai/whisper-small


Set language to: es
Pad token changed: 50257 -> 50257
EOS token: 50257
Pad != EOS: False
Removed deprecated config.max_length
Removed deprecated config.suppress_tokens
Removed deprecated config.begin_suppress_tokens
Removed deprecated config.forced_decoder_ids
Model moved to: cuda

=== Configuration Summary ===
Max generation length: 60
Language: es
Task: transcribe
Pad token ID: 50257
EOS token ID: 50257
Has Whisper task_to_id: True
Configuration complete!


### Create the data collator

A data collator takes elements from the prepared datasets and creates batches for passing to the model. It also applies extra processing steps, like padding and masking here, to ensure that all the inputs are the same length.

Note that `input_features` and `label_features` correspond to the audio and the text transcriptions, respectively, in the original dataset.

Note, too, that `-100` is a special value that PyTorch loss functions will ignore. The data collator replaces the padding token with `-100` so that the padding will be ignored. Otherwise, the model would try to interpret the padding token, which is meaningless.
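One subtlety is worth illustrating. The setup output above reports the same ID (50257) for both the pad token and the EOS token, so replacing every occurrence of the pad ID with `-100` would also hide a genuine end-of-sequence token from the loss. The sketch below compares masking by token ID with masking by attention mask, using plain Python lists. The label sequences are made up for illustration, and it assumes that each label sequence ends with EOS.

```python
PAD_ID = 50257  # pad token ID reported in the setup output
EOS_ID = 50257  # EOS token ID reported in the setup output (same value)
IGNORE = -100   # label value that PyTorch's cross-entropy loss skips


def pad_to_longest(sequences, pad_id):
    """Right-pad token ID lists to the longest one and build an attention mask."""
    longest = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


def mask_by_id(padded, pad_id):
    """Replace every token equal to pad_id, wherever it occurs."""
    return [[IGNORE if tok == pad_id else tok for tok in row] for row in padded]


def mask_by_attention(padded, mask):
    """Replace only the positions that were added as padding."""
    return [
        [tok if keep else IGNORE for tok, keep in zip(row, row_mask)]
        for row, row_mask in zip(padded, mask)
    ]


# Two made-up label sequences, each ending in EOS (the other IDs are illustrative)
labels = [[11, 12, 13, EOS_ID], [21, EOS_ID]]
padded, attention = pad_to_longest(labels, PAD_ID)

by_id = mask_by_id(padded, PAD_ID)
by_attention = mask_by_attention(padded, attention)

print(by_id)         # [[11, 12, 13, -100], [21, -100, -100, -100]]
print(by_attention)  # [[11, 12, 13, 50257], [21, 50257, -100, -100]]
```

With ID-based masking the EOS token never contributes to the loss, which may leave the model without a training signal for when to stop. Masking by attention mask keeps EOS and hides only the padding.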

In [7]:
# Define the custom DataCollator class

@dataclass
class DataCollatorForWhisper:
    processor: Any
    padding: Union[bool, str] = "longest"

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
        # Separate inputs and labels
        input_features = [{"input_features": f["input_features"]} for f in features]
        label_features = [{"input_ids": f["labels"]} for f in features]

        # Collate audio features
        batch = self.processor.feature_extractor.pad(
          input_features,
          padding=self.padding,
          return_tensors="pt"
        )

        # Collate labels (token IDs)
        labels_batch = self.processor.tokenizer.pad(
            label_features,
            padding=self.padding,
            return_tensors="pt"
        )

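        # Replace padded positions with -100 so the loss ignores them. Use the
        # attention mask rather than the pad token ID: pad and EOS share an ID
        # here, so matching on the ID would also mask the real EOS token.
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100
        )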

        batch["labels"] = labels
        return batch


# Initialize the data collator
data_collator = DataCollatorForWhisper(processor=processor, padding="longest")

### Define a custom `compute_metrics()` function

The standard metric for an ASR model is Word Error Rate (WER), so the `compute_metrics()` function must focus on that.
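To make the metric concrete, here is a minimal pure-Python sketch of WER for a single utterance: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is an illustration, not the implementation behind `evaluate.load("wer")`, and the function name `word_error_rate` is my own. The example strings are taken from the prediction listings at the end of this notebook.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two extra words in front of an otherwise correct transcription
print(word_error_rate("sekhek nak ma'a", "alha m'o sekhek nak ma'a"))

# Nine hypothesis words, none matching the six reference words
print(word_error_rate(
    "sekla ktemo nak kelvesai'a konalhma ie",
    "se acla a la ptema, ptema na pkelvesa konalhmaiehe'",
))  # 1.5
```

Because insertions count as errors, WER is not capped at 1.0: a hypothesis that is longer than the reference and mostly wrong can score well above it.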

In [8]:
# Evaluation metric
wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    # Unpack logits from tuple
    if isinstance(pred.predictions, tuple):
        logits = pred.predictions[0]
    else:
        logits = pred.predictions

    pred_ids = torch.argmax(torch.tensor(logits), dim=-1)

    # Decode predictions
    pred_str = processor.tokenizer.batch_decode(pred_ids, skip_special_tokens=True)

    # Handle labels
    label_ids = pred.label_ids
    label_ids = [
        [token if token != -100 else processor.tokenizer.pad_token_id for token in label]
        for label in label_ids
    ]
    label_str = processor.tokenizer.batch_decode(label_ids, skip_special_tokens=True)

    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}


### Define the model's settings

The `training_args` variable holds many important settings that affect the outcome of the fine-tuning. The settings here are optimized for use when fine-tuning on a GPU.

I'm going to explain the settings, even though they are [documented on Hugging Face](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments), since I want to be sure that I understand them. 🤓

- `run_name`: This is just a way to keep track of the training runs.
- `output_dir`: This is where the model's files will be stored. The output directory was defined earlier in this notebook.
- `per_device_train_batch_size`: The number of input and label pairs included per batch sent to the device (GPU) for training.
- `gradient_accumulation_steps`: Gradients from two consecutive batches are accumulated before the optimizer updates the weights, so the effective batch size is 16 × 2 = 32. This lets a small per-device batch behave like a larger one without using more GPU memory. It trades off against `per_device_train_batch_size`: increase it by 2x for every 2x decrease in batch size to keep the effective batch size constant.
- `per_device_eval_batch_size`: The number of input and label pairs included per batch sent to the device (GPU) for evaluation. It is not set in the arguments below, so the default (8) applies.
- `learning_rate`: This is the, well, learning rate for the optimizer. I have selected `1.25e-5` (i.e., 1.25 × 10⁻⁵), as suggested at <https://github.com/vasistalodagala/whisper-finetune>.
- `predict_with_generate`: This is set to `False` in this run, so evaluation passes teacher-forced logits to `compute_metrics()` rather than autoregressively generated transcriptions. Setting it to `True` would score generated text, which is closer to real inference; `generation_max_length` only takes effect in that case.
- `warmup_steps`: The learning rate ramps up from zero to its target value over the first 500 optimizer steps. This keeps large, noisy updates from destabilizing the pretrained weights early in training.
- `fp16`: This is a performance boost. It enables mixed-precision training, which performs most calculations with 16-bit floating-point numbers instead of the default 32-bit ones, making training faster and lighter on GPU memory.
- `eval_strategy`: The evaluation (validation loss and WER) will be performed after each epoch.
- `num_train_epochs`: I have set the total number of epochs to 15 so that the model has a chance to learn. The early stopping callback (see below) will probably make sure that it never reaches that limit.
- `save_strategy`: Set to the same as `eval_strategy`.
- `save_total_limit`: I'm setting this to be the same as `num_train_epochs` so that `load_best_model_at_end` will have the full range of checkpoints from which to select.
- `logging_dir`: This is where the logs will be saved. I set it earlier in the notebook.
- `logging_steps`: How often, in steps, information will be logged.
- `report_to`: This will send the log data to TensorBoard, which is a nice way of visualizing the information.
- `load_best_model_at_end`: This reloads the best checkpoint (according to `metric_for_best_model`) when training ends, so that is the model that gets saved.
- `metric_for_best_model`: Set to `eval_loss`, so the validation loss (not WER) determines the best model in this run.
- `greater_is_better`: This is set to `False` because a lower loss is better. The same would hold for WER.
- `hub_model_id`: Identifies the model repo on Hugging Face Hub.
- `hub_strategy`: Set to `end` to push to the hub when the trainer has finished.
- `push_to_hub`: Pushes the model to the hub at the end of the training.

I have used epochs instead of steps to let the Hugging Face API calculate the number of steps per epoch.
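The interaction between batch size, gradient accumulation, and warmup is easier to see with numbers. The sketch below works through the arithmetic for the values used in this notebook. The dataset size (`num_train_examples`) is a hypothetical placeholder, since the real number of training examples is not stated here, and a single GPU is assumed. The helper `lr_during_warmup` models only the linear ramp, not what the scheduler does afterwards.

```python
import math

# Values copied from the training arguments
per_device_train_batch_size = 16
gradient_accumulation_steps = 2
num_train_epochs = 15
warmup_steps = 500
learning_rate = 1.25e-5

# HYPOTHETICAL dataset size, used only to make the arithmetic concrete
num_train_examples = 4000

# Examples that contribute to each optimizer update (single GPU assumed)
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Batches per epoch, then optimizer updates per epoch
batches_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
max_updates = updates_per_epoch * num_train_epochs


def lr_during_warmup(step: int) -> float:
    """Learning rate while it ramps linearly from 0 to the peak value."""
    return learning_rate * min(1.0, step / warmup_steps)


print(effective_batch_size)              # 32
print(updates_per_epoch)                 # 125
print(max_updates)                       # 1875
print(warmup_steps / updates_per_epoch)  # warmup length in epochs: 4.0
print(lr_during_warmup(250))             # halfway through warmup: half the peak rate
```

With a dataset of this hypothetical size, 500 warmup steps would span four full epochs, so it is worth repeating the calculation with the real number of training examples.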

In [9]:
training_args = Seq2SeqTrainingArguments(
    run_name="enenlhet-whisper-model",
    output_dir=output_dir,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=1.25e-5,
    predict_with_generate=False,
    generation_max_length=80,
    warmup_steps=500,
    fp16=True,
    eval_strategy="epoch",
    num_train_epochs=15,
    save_strategy="epoch",
    save_total_limit=15,
    logging_dir=log_dir,
    logging_steps=10,
    report_to="tensorboard",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    hub_model_id="enenlhet-asr/enenlhet-whisper-model",
    hub_strategy="end",
    push_to_hub=True,
)

### Initialize the trainer

The `trainer` gets some additional settings here, including the splits to use for training and validation. Note that most of the settings just point back to previously defined variables. The new part is the implementation of `EarlyStoppingCallback`, which monitors `metric_for_best_model` (the validation loss here) and halts the training after `early_stopping_patience` consecutive evaluations (3 here) that fail to improve on the best value by more than `early_stopping_threshold` (0.1 here).
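The patience and threshold logic can be sketched in a few lines of plain Python. This is a simplified re-implementation for illustration, not the library's code, and the function name `first_stopping_epoch` is my own. It assumes the monitored metric is the validation loss (lower is better) and is fed the validation losses reported in the training log further down.

```python
def first_stopping_epoch(eval_losses, patience=3, threshold=0.1):
    """Return the 1-based epoch at which training would stop, or None."""
    best = None   # best (lowest) loss seen in earlier evaluations
    stalled = 0   # consecutive evaluations without a sufficient improvement
    for epoch, loss in enumerate(eval_losses, start=1):
        # An evaluation counts as progress only if it beats the best value
        # by more than the threshold.
        if best is None or (loss < best and abs(loss - best) > threshold):
            stalled = 0
        else:
            stalled += 1
        # The best value itself is updated on any improvement.
        if best is None or loss < best:
            best = loss
        if stalled >= patience:
            return epoch
    return None


# Validation losses per epoch, copied from the training log in this notebook
validation_losses = [
    4.126627, 2.034356, 1.563745, 1.298294, 1.159142,
    0.97849, 0.981134, 1.003611, 1.028683,
]

print(first_stopping_epoch(validation_losses))  # 9
```

The loss keeps improving by more than 0.1 through epoch 6, then fails to improve for three evaluations in a row, which is consistent with the run ending after epoch 9.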

In [10]:
# Initialize the trainer
trainer = Seq2SeqTrainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=processor.tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3, early_stopping_threshold=0.1)]
)

## Train the model

In [11]:
train_result = trainer.train()

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


| Epoch | Training Loss | Validation Loss | WER |
| --- | --- | --- | --- |
| 1 | 4.5862 | 4.126627 | 1.473272 |
| 2 | 2.1185 | 2.034356 | 1.810952 |
| 3 | 1.3809 | 1.563745 | 1.767927 |
| 4 | 1.0208 | 1.298294 | 1.794003 |
| 5 | 0.5986 | 1.159142 | 1.520209 |
| 6 | 0.2743 | 0.97849 | 2.185137 |
| 7 | 0.1248 | 0.981134 | 2.087353 |
| 8 | 0.0572 | 1.003611 | 1.865711 |
| 9 | 0.0223 | 1.028683 | 1.9309 |


There were missing keys in the checkpoint model loaded: ['proj_out.weight'].


In [12]:
trainer.save_model()  # Save the final model


In [16]:
import pandas as pd

df = pd.DataFrame(trainer.state.log_history)
print(df[["epoch", "eval_loss", "loss"]].dropna())

Empty DataFrame
Columns: [epoch, eval_loss, loss]
Index: []
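The empty result has a likely explanation: the trainer records training loss and evaluation loss in separate log entries, so no single row has both `loss` and `eval_loss`, and `dropna()` across both columns discards everything. A minimal sketch of one way around this is to pull the two series out separately. The log entries below are made up to mimic the shape of `trainer.state.log_history`, and the helper name `split_log_history` is my own.

```python
def split_log_history(log_history):
    """Separate training-loss entries from evaluation-loss entries."""
    train_rows = [(e["epoch"], e["loss"]) for e in log_history if "loss" in e]
    eval_rows = [(e["epoch"], e["eval_loss"]) for e in log_history if "eval_loss" in e]
    return train_rows, eval_rows


# MADE-UP entries shaped like Trainer logs: training and evaluation
# values never appear in the same dictionary.
example_log_history = [
    {"epoch": 0.5, "loss": 5.1, "learning_rate": 1.0e-6},
    {"epoch": 1.0, "loss": 4.6, "learning_rate": 2.0e-6},
    {"epoch": 1.0, "eval_loss": 4.1, "eval_wer": 1.47},
    {"epoch": 2.0, "loss": 2.1, "learning_rate": 4.0e-6},
    {"epoch": 2.0, "eval_loss": 2.0, "eval_wer": 1.81},
    {"epoch": 2.0, "train_runtime": 100.0, "train_loss": 3.9},
]

train_rows, eval_rows = split_log_history(example_log_history)
print(train_rows)  # [(0.5, 5.1), (1.0, 4.6), (2.0, 2.1)]
print(eval_rows)   # [(1.0, 4.1), (2.0, 2.0)]
```

The pandas equivalent is to call `dropna()` on each pair of columns separately, for example `df[["epoch", "eval_loss"]].dropna()` and `df[["epoch", "loss"]].dropna()`.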


## Evaluate

In [13]:
model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

pred_strs = []
label_strs = []

forced_decoder_ids = processor.get_decoder_prompt_ids(language="es", task="transcribe")

for example in tqdm.tqdm(dataset["test"]):
    # Already pre-extracted features → convert to tensor and batch
    input_features = torch.tensor(example["input_features"]).unsqueeze(0).to(model.device)

    with torch.no_grad():
        predicted_ids = model.generate(
            input_features,
            forced_decoder_ids=forced_decoder_ids
        )

    pred_str = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]

    # Decode labels
    label_ids = example["labels"]
    label_ids = [token if token != -100 else processor.tokenizer.pad_token_id for token in label_ids]
    label_str = processor.tokenizer.decode(label_ids, skip_special_tokens=True)

    pred_strs.append(pred_str)
    label_strs.append(label_str)

  0%|          | 0/170 [00:00<?, ?it/s]The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
100%|██████████| 170/170 [01:44<00:00,  1.63it/s]


In [14]:
wer = wer_metric.compute(predictions=pred_strs, references=label_strs)
print(f"Test WER: {wer:.4f}")

Test WER: 1.2486


In [15]:
def show_prediction_mismatches(pred_strs, label_strs, max_examples=10):
    mismatches = []

    for pred, ref in zip(pred_strs, label_strs):
        pred_clean = pred.strip()
        ref_clean = ref.strip()
        if pred_clean != ref_clean:
            mismatches.append((ref_clean, pred_clean))

    print(f"\nShowing {min(len(mismatches), max_examples)} mismatches out of {len(label_strs)} total samples:\n")

    for i, (ref, pred) in enumerate(mismatches[:max_examples]):
        print(f"[{i+1}]")
        print(f"REF : {ref}")
        print(f"PRED: {pred}")
        print("-" * 40)

    if len(mismatches) == 0:
        print("✅ No mismatches found! (unlikely if WER > 0)")

show_prediction_mismatches(pred_strs, label_strs, max_examples=10)



Showing 10 mismatches out of 170 total samples:

[1]
REF : nempai'akha mamma
PRED: nempaiakha mamma'a
----------------------------------------
[2]
REF : mo'ok alhta ngke kvai'o nak
PRED: mo'ok alhta ngkekvai'o nak'a
----------------------------------------
[3]
REF : kanhan entengiai'anhan peia'
PRED: canha nentengiai'anha peia'a.
----------------------------------------
[4]
REF : sekla ktemo nak kelvesai'a konalhma ie
PRED: se acla a la ptema, ptema na pkelvesa konalhmaiehe'
----------------------------------------
[5]
REF : netamen kelpaqmetek amelhanhan ma'a
PRED: netamen, kelpaqmetekamelhamanma'a,
----------------------------------------
[6]
REF : akiamasma aktoma niekhe' ngkelvana koka'
PRED: akiam asmaktoma niekhe ngkelvana koka'a
----------------------------------------
[7]
REF : ngvai'a lhta niekhe nemmaskema m'a nenekev'
PRED: y altengakha nemmaskema'a Nenekev'
----------------------------------------
[8]
REF : sekhek nak ma'a
PRED: alhano, sequehek nak ma'a'a'a'a'a'a'a'a'a'a'

First complete run:

```
[1]
REF : nempai'akha mamma
PRED: nengpaiakha mamma'a
----------------------------------------
[2]
REF : mo'ok alhta ngke kvai'o nak
PRED: もかしたんけ コアイオナ
----------------------------------------
[3]
REF : kanhan entengiai'anhan peia'
PRED: kanhan nentengiai'a i'a nha peia'a
----------------------------------------
[4]
REF : sekla ktemo nak kelvesai'a konalhma ie
PRED: Секла, тема, тема на кервеса, кунахма, ехе
----------------------------------------
[5]
REF : netamen kelpaqmetek amelhanhan ma'a
PRED: "نتام" "كل" "بقمتek" "a mellamhan ma'a"
----------------------------------------
[6]
REF : akiamasma aktoma niekhe' ngkelvana koka'
PRED: akiamasma'a tomaniek hengkelvana koka'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'
----------------------------------------
[7]
REF : ngvai'a lhta niekhe nemmaskema m'a nenekev'
PRED: アルタニアクhene マスケマ ア ネネクト
----------------------------------------
[8]
REF : sekhek nak ma'a
PRED: alha m'o sekhek nak ma'a
----------------------------------------
[9]
REF : selha apvaneiam kelha
PRED: Tiemme, c'est l'aie à la vanneiam pqlha'aie
----------------------------------------
[10]
REF : iamalheng alhta ngkolhek akto
PRED: namalhega lhta ngko lha ektok'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'
----------------------------------------
```

Not much improvement during the second round:

```
[1]
REF : nempai'akha mamma
PRED: Nempaiakha mamma'a
----------------------------------------
[2]
REF : mo'ok alhta ngke kvai'o nak
PRED: no vocalta, no que cuaiona
----------------------------------------
[3]
REF : kanhan entengiai'anhan peia'
PRED: ganhan en teniai anhapeia'a.
----------------------------------------
[4]
REF : sekla ktemo nak kelvesai'a konalhma ie
PRED: se aclaro que no, que no se acuerves a connalma ieh
----------------------------------------
[5]
REF : netamen kelpaqmetek amelhanhan ma'a
PRED: netamen kelpaqmetekamelha mehanma'a
----------------------------------------
[6]
REF : akiamasma aktoma niekhe' ngkelvana koka'
PRED: aquí amasma tomaniek hengkelvana koka
----------------------------------------
[7]
REF : ngvai'a lhta niekhe nemmaskema m'a nenekev'
PRED: y altañekha nemmaskema'a nenekem'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a'a
----------------------------------------
[8]
REF : sekhek nak ma'a
PRED: alhama se quehek nak ma'a
----------------------------------------
[9]
REF : selha apvaneiam kelha
PRED: tiempiempsela acuaneiamkela'a
----------------------------------------
[10]
REF : iamalheng alhta ngkolhek akto
PRED: namalhegaltangko lhektong'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'o'
----------------------------------------
```