<a href="https://colab.research.google.com/github/jimregan/wav2vec2-sprint/blob/comparison/Irish_comparisons.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pocketsphinx

The pocketsphinx model doesn't have a real language model: it was built from a dictionary, so it covers single words only, and headwords only. To show it in the best light, I'm using the data from the website for [Fuaimeanna na Gaeilge](http://www.fuaimeanna.ie/en/) *("The Sounds of Irish")*, which has equivalent pronunciation examples. I've put an old scraper for the site [here](https://github.com/jimregan/wav2vec2-sprint/blob/main/irish/fuaimeanna.pl); it writes a .tsv file with the data, and a shell script that uses wget to download the sounds. I already have the data, so I'm just uploading it, but during the sprint I wrote a [script](https://github.com/jimregan/wav2vec2-sprint/blob/main/irish/convert-fuaimeanna-csv.pl) to convert the .tsv to a .csv that `datasets` could read more easily.

Setting up is easy:

In [None]:
!apt-get install pocketsphinx

Next, grab the pretrained model:

In [None]:
!wget https://github.com/jimregan/irish-asr-data/releases/download/teanglann-0.1/cmusphinx-ga-teanglann-0.1.zip

In [None]:
!unzip cmusphinx-ga-teanglann-0.1.zip

In [None]:
!unzip fuaimeanna.zip

Pocketsphinx comes from the bad old days before audio libraries could be relied on to be present, so the files need to be 16 kHz mono `.wav`:

In [None]:
!for i in fuaimeanna/mp3/*.mp3;do ffmpeg -i "$i" -acodec pcm_s16le -ac 1 -ar 16000 "$i.wav";done
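
The same conversion can be sketched in Python, which makes the individual ffmpeg options easier to see. This is only an illustration: `ffmpeg_command` and `convert_all` are hypothetical helper names, and the sketch assumes `ffmpeg` is on the path and that the files live in `fuaimeanna/mp3`, as in the shell loop.

```python
import subprocess
from pathlib import Path

def ffmpeg_command(src):
    """Build the ffmpeg call for one file: 16-bit PCM, mono, 16 kHz."""
    src = Path(src)
    # Keep the "name.mp3.wav" naming that the rest of this notebook relies on.
    dst = src.with_name(src.name + ".wav")
    return ["ffmpeg", "-i", str(src),
            "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM
            "-ac", "1",              # one channel (mono)
            "-ar", "16000",          # 16 kHz sampling rate
            str(dst)]

def convert_all(directory="fuaimeanna/mp3"):
    """Convert every .mp3 in the directory, stopping on the first failure."""
    for mp3 in sorted(Path(directory).glob("*.mp3")):
        subprocess.run(ffmpeg_command(mp3), check=True)

print(ffmpeg_command("fuaimeanna/mp3/aaine_i1_s1.mp3"))
```

Calling `convert_all()` would then do the same work as the loop above.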

In [None]:
!for i in fuaimeanna/mp3/*.wav; do f=$(echo $i|awk -F/ '{print $NF}');printf "%s\t" $f >> ps-output; pocketsphinx_continuous -infile $i -hmm cmusphinx-ga-teanglann-0.1/ -dict cmusphinx-ga-teanglann-0.1/ga.dic -lm cmusphinx-ga-teanglann-0.1/ga.lm.DMP >> ps-output;done

In [None]:
!pip install jiwer

In [48]:
import csv
def get_lists(filea, fileb="/content/fuaimeanna/all-fuaimeanna-data.tsv"):
  """Pair each recogniser output in filea with the orthographic form from fileb."""
  # Map each sound file name to its orthographic form.
  # Columns 1, 3 and 5 of the .tsv hold the three sound file paths for a word.
  data = dict()
  with open(fileb) as file:
    reader = csv.reader(file, delimiter="\t", quotechar=None)
    for row in reader:
      if row[0] == 'Orthographic':
        continue  # header row
      for column in (1, 3, 5):
        data[row[column].replace('/sounds/', '')] = row[0]
  merged = list()
  with open(filea) as file:
    ps = csv.reader(file, delimiter="\t", quotechar=None)
    for row in ps:
      if len(row) != 2:
        continue  # skip anything that is not "filename<TAB>output"
      # "name.mp3.wav" -> "name.mp3", which is the key used in data
      filename = row[0].replace('.wav', '')
      merged.append((row[1], data[filename]))
  # lista: recogniser outputs; listb: orthographic forms from the .tsv
  lista = [a[0] for a in merged]
  listb = [a[1] for a in merged]
  return (lista, listb)

In [49]:
from jiwer import wer
lista, listb = get_lists("ps-output")
result = wer(lista, listb)
'{:.2f}'.format(result)

'0.99'
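
For reading the numbers in this notebook, it helps to remember that WER is the word-level edit distance divided by the number of words in the reference, so it is not capped at 1.0 (or 100%). The sketch below is a plain-Python illustration, not jiwer's implementation; `word_error_rate` is a hypothetical helper, and the example words are made up for the demonstration.

```python
def word_error_rate(references, hypotheses):
    """Total word-level edit distance divided by total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # Levenshtein distance over word tokens, keeping one row at a time.
        prev = list(range(len(h) + 1))
        for i, ref_word in enumerate(r, start=1):
            cur = [i]
            for j, hyp_word in enumerate(h, start=1):
                cost = 0 if ref_word == hyp_word else 1
                cur.append(min(prev[j] + 1,          # deletion
                               cur[j - 1] + 1,       # insertion
                               prev[j - 1] + cost))  # substitution or match
            prev = cur
        total_errors += prev[-1]
        total_words += len(r)
    return total_errors / total_words

print(word_error_rate(["an bád"], ["an bád"]))  # identical: 0.0
print(word_error_rate(["bád"], ["a bád"]))      # one extra word: 1.0
print(word_error_rate(["bád"], ["a bhí ann"]))  # three words for one: 3.0
```

With single-word references, any output of two or more words costs at least one full error per extra word, which is how scores above 100% come about.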

# DeepSpeech

The DeepSpeech model was trained on an earlier version of Common Voice, so there was about an hour less audio in the training data. It was meant to replicate the Common Voice paper, so it was trained with transfer learning, using the English model provided by Mozilla.

In [None]:
!pip install deepspeech

In [None]:
!wget https://github.com/jimregan/DeepSpeech/releases/download/0.8.2-ga-test/output_graph_ga.pbmm https://github.com/jimregan/DeepSpeech/releases/download/0.8.2-ga-test/kenlm.scorer

In [None]:
!for i in fuaimeanna/mp3/*.wav;do f=$(echo $i|awk -F/ '{print $NF}'); printf "%s\t" $f >> ds-output; deepspeech --model output_graph_ga.pbmm --scorer kenlm.scorer --audio $i >> ds-output;done

In [50]:
from jiwer import wer
lista, listb = get_lists("ds-output")
result = wer(lista, listb)
'{:.2f}'.format(result)

'7.83'

7.83 looks pretty impressive! But it's a false impression:

In [51]:
!head ds-output

aaineas_i1_s1.mp3.wav	
aaineas_i2_s2.mp3.wav	
aaineas_i3_s3.mp3.wav	
aaine_i1_s1.mp3.wav	
aaine_i2_s2.mp3.wav	
aaine_i3_s3.mp3.wav	
aaisiuuil_i1_s1.mp3.wav	is
aaisiuuil_i2_s2.mp3.wav	is 
aaisiuuil_i3_s3.mp3.wav	is
aa_ndiiol_i1_s1.mp3.wav	ní


In [52]:
!cat ds-output |awk -F'\t' 'BEGIN{c=0}($2==""){c++}END{print "Fields: " NR " Without output: " c}'

Fields: 2276 Without output: 1968


In [53]:
!cat ds-output |awk -F'\t' '{print $2}'|sort|uniq


a 
ach
ach 
an
an 
ar an 
i 
is
is 
is as
is as 
is í 
ní
ní 
sa
sa 
seans
sin
tá 
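
The same checks can be done in Python, which may be easier to adapt than the awk one-liners. This is a sketch under assumptions: `summarise_outputs` is a hypothetical helper, the input is expected to be `filename<TAB>output` lines like those in `ds-output`, and trailing spaces are stripped, so `is` and `is ` are counted together here (unlike in the `uniq` listing above).

```python
from collections import Counter

def summarise_outputs(lines):
    """Return (total rows, rows with empty output, Counter of distinct outputs)."""
    total = 0
    empty = 0
    seen = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        _name, _tab, hyp = line.partition("\t")
        hyp = hyp.strip()
        total += 1
        if hyp:
            seen[hyp] += 1
        else:
            empty += 1
    return total, empty, seen

# A few rows in the same shape as the head of ds-output.
sample = [
    "aaine_i1_s1.mp3.wav\t\n",
    "aaisiuuil_i1_s1.mp3.wav\tis\n",
    "aaisiuuil_i2_s2.mp3.wav\tis \n",
    "aa_ndiiol_i1_s1.mp3.wav\tní\n",
]
total, empty, seen = summarise_outputs(sample)
print(total, empty, dict(seen))
```

To run it on the real file, pass the open file object: `summarise_outputs(open("ds-output"))`.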


# Sprint models

In [None]:
!pip install transformers datasets

In [None]:
!pip install torchaudio

In [11]:
from datasets import load_dataset
fuaimeanna = load_dataset('csv', data_files='fuaimeanna.csv', split='train')

Using custom data configuration default-fe4208b278cc54b2
Reusing dataset csv (/root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0)


[manandey/wav2vec2-large-xlsr-_irish](https://huggingface.co/manandey/wav2vec2-large-xlsr-_irish)

In [17]:
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re
test_dataset = fuaimeanna
wer = load_metric("wer")
processor = Wav2Vec2Processor.from_pretrained("manandey/wav2vec2-large-xlsr-_irish")
model = Wav2Vec2ForCTC.from_pretrained("manandey/wav2vec2-large-xlsr-_irish")
model.to("cuda")
chars_to_ignore_regex = '[\\,\\?\\.\\!\\-\\;\\:\\"\\“\\%\\‘\\”\\�\\’\\–\\(\\)]'
resampler = torchaudio.transforms.Resample(48_000, 16_000)
# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch
test_dataset = test_dataset.map(speech_file_to_array_fn)
# Run the model in batches and decode greedily (argmax over the logits)
def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids)
    return batch
result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.


WER: 104.576752


[cpierse/wav2vec2-large-xlsr-53-irish](https://huggingface.co/cpierse/wav2vec2-large-xlsr-53-irish)

In [29]:
!rm -rf /root/.cache/huggingface/
from datasets import load_dataset
fuaimeanna = load_dataset('csv', data_files='fuaimeanna.csv', split='train')

Using custom data configuration default-fe4208b278cc54b2


Downloading and preparing dataset csv/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0...


Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0. Subsequent calls will reuse this data.


In [30]:
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re

test_dataset = fuaimeanna
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("cpierse/wav2vec2-large-xlsr-53-irish") 
model = Wav2Vec2ForCTC.from_pretrained("cpierse/wav2vec2-large-xlsr-53-irish")
model.to("cuda")

chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“\%\‘\”\�\„\«\(\»\)\’\']' 
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
   batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
   speech_array, sampling_rate = torchaudio.load(batch["path"])
   batch["speech"] = resampler(speech_array).squeeze().numpy()
   return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

# Run the model in batches and decode greedily (argmax over the logits)
def evaluate(batch):
   inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

   with torch.no_grad():
      logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits

   pred_ids = torch.argmax(logits, dim=-1)
   batch["pred_strings"] = processor.batch_decode(pred_ids)
   return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)

print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.


WER: 104.414743


[mine](https://huggingface.co/jimregan/wav2vec2-large-xlsr-irish-basic)

In [27]:
!rm -rf /root/.cache/huggingface/
from datasets import load_dataset
fuaimeanna = load_dataset('csv', data_files='fuaimeanna.csv', split='train')

Using custom data configuration default-fe4208b278cc54b2


Downloading and preparing dataset csv/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0...


Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0. Subsequent calls will reuse this data.


In [28]:
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re
test_dataset = fuaimeanna
wer = load_metric("wer")
processor = Wav2Vec2Processor.from_pretrained("jimregan/wav2vec2-large-xlsr-irish-basic")
model = Wav2Vec2ForCTC.from_pretrained("jimregan/wav2vec2-large-xlsr-irish-basic") 
model.to("cuda")
# So, tolower() for Irish is a bit complicated: tAthair -> t-athair
# toupper() is non-deterministic :)
def is_upper_vowel(letter):
    if letter in ['A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú']:
        return True
    else:
        return False
def irish_lower(word):
    if len(word) > 1 and word[0] in ['n', 't'] and is_upper_vowel(word[1]):
        return word[0] + '-' + word[1:].lower()
    else:
        return word.lower()
def irish_lower_sentence(sentence):
    return " ".join([irish_lower(w) for w in sentence.split(" ")])
chars_to_ignore_regex = '[,\?\.\!\;\:\"\“\%\‘\”\(\)\*]'
def remove_special_characters(sentence):
    tmp = re.sub('’ ', ' ', sentence)
    tmp = re.sub("’", '', tmp)
    tmp = re.sub("’$", '', tmp)
    tmp = re.sub('’', '\'', tmp)
    tmp = re.sub(chars_to_ignore_regex, '', tmp)
    sentence = irish_lower_sentence(tmp) + ' '
    return sentence
resampler = torchaudio.transforms.Resample(48_000, 16_000)
# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    batch["sentence"] = remove_special_characters(batch["sentence"])
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch
test_dataset = test_dataset.map(speech_file_to_array_fn)
# Run the model in batches and decode greedily (argmax over the logits)
def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits    
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids)
    return batch
result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.


WER: 105.953827
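
To make the lowercasing rule from the cell above concrete, here is a compact restatement with a few inputs. It is a sketch for illustration: the logic follows `irish_lower` as defined above, `UPPER_VOWELS` is a name introduced here, and the sample words are chosen only to exercise each branch.

```python
UPPER_VOWELS = set("AEIOUÁÉÍÓÚ")

def irish_lower(word):
    """Lowercase a word, hyphenating a prefixed n/t before an uppercase vowel."""
    if len(word) > 1 and word[0] in "nt" and word[1] in UPPER_VOWELS:
        return word[0] + "-" + word[1:].lower()
    return word.lower()

for w in ["tAthair", "nAthair", "Athair", "Teach"]:
    print(w, "->", irish_lower(w))
```

Only a lowercase `n` or `t` directly followed by an uppercase vowel triggers the hyphen; an ordinary capitalised word such as `Teach` is simply lowercased.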


manandey had an earlier model, but seems to have replaced it rather than committing a new revision.

[cpierse/wav2vec2-large-xlsr-53-irish](https://huggingface.co/cpierse/wav2vec2-large-xlsr-53-irish), revision 8d6ded1aa00974aab223273d6109bb94f6889f53

In [23]:
!rm -rf /root/.cache/huggingface/

In [25]:
from datasets import load_dataset
fuaimeanna = load_dataset('csv', data_files='fuaimeanna.csv', split='train')

Using custom data configuration default-fe4208b278cc54b2


Downloading and preparing dataset csv/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0...


Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0. Subsequent calls will reuse this data.


In [26]:
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re

test_dataset = fuaimeanna
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("cpierse/wav2vec2-large-xlsr-53-irish", revision='8d6ded1aa00974aab223273d6109bb94f6889f53') 
model = Wav2Vec2ForCTC.from_pretrained("cpierse/wav2vec2-large-xlsr-53-irish", revision='8d6ded1aa00974aab223273d6109bb94f6889f53')
model.to("cuda")

chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“\%\‘\”\�\„\«\(\»\)\’\']' 
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
   batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
   speech_array, sampling_rate = torchaudio.load(batch["path"])
   batch["speech"] = resampler(speech_array).squeeze().numpy()
   return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

# Run the model in batches and decode greedily (argmax over the logits)
def evaluate(batch):
   inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

   with torch.no_grad():
      logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits

   pred_ids = torch.argmax(logits, dim=-1)
   batch["pred_strings"] = processor.batch_decode(pred_ids)
   return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)

print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.


WER: 109.639530


[mine](https://huggingface.co/jimregan/wav2vec2-large-xlsr-irish-basic), revision cfded2e8b4ce3258977baa5404e6a5cab5928522

In [31]:
!rm -rf /root/.cache/huggingface/
from datasets import load_dataset
fuaimeanna = load_dataset('csv', data_files='fuaimeanna.csv', split='train')

Using custom data configuration default-fe4208b278cc54b2


Downloading and preparing dataset csv/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0...


Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-fe4208b278cc54b2/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0. Subsequent calls will reuse this data.


In [32]:
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re
test_dataset = fuaimeanna
wer = load_metric("wer")
processor = Wav2Vec2Processor.from_pretrained("jimregan/wav2vec2-large-xlsr-irish-basic", revision='cfded2e8b4ce3258977baa5404e6a5cab5928522')
model = Wav2Vec2ForCTC.from_pretrained("jimregan/wav2vec2-large-xlsr-irish-basic", revision='cfded2e8b4ce3258977baa5404e6a5cab5928522') 
model.to("cuda")
# So, tolower() for Irish is a bit complicated: tAthair -> t-athair
# toupper() is non-deterministic :)
def is_upper_vowel(letter):
    if letter in ['A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú']:
        return True
    else:
        return False
def irish_lower(word):
    if len(word) > 1 and word[0] in ['n', 't'] and is_upper_vowel(word[1]):
        return word[0] + '-' + word[1:].lower()
    else:
        return word.lower()
def irish_lower_sentence(sentence):
    return " ".join([irish_lower(w) for w in sentence.split(" ")])
chars_to_ignore_regex = '[,\?\.\!\;\:\"\“\%\‘\”\(\)\*]'
def remove_special_characters(sentence):
    tmp = re.sub('’ ', ' ', sentence)
    tmp = re.sub("’", '', tmp)
    tmp = re.sub("’$", '', tmp)
    tmp = re.sub('’', '\'', tmp)
    tmp = re.sub(chars_to_ignore_regex, '', tmp)
    sentence = irish_lower_sentence(tmp) + ' '
    return sentence
resampler = torchaudio.transforms.Resample(48_000, 16_000)
# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    batch["sentence"] = remove_special_characters(batch["sentence"])
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch
test_dataset = test_dataset.map(speech_file_to_array_fn)
# Run the model in batches and decode greedily (argmax over the logits)
def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits    
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids)
    return batch
result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.


WER: 104.657756


In [33]:
print(result["pred_strings"])

['ibreair', 'a raibh', 'a bribh', 'iblain', 'arimh', 'a bholhain', 'i post', 'as aisc', 'a post', 'a tho', 'ta', 'a cheo', 'an bhfuil', 'a dha', 'i bhear', 'a bró', 'a bheail', 'a bhathar', 'i mótar', 'airlár', 'ar nlor', 'an irlábh', 'airnamhla', 'arlabhra', 'an mbothain', 'airlár', 'an raibh', 'a mbeis', 'an leas', 'abhas', 'an bhair', 'an near', 'an bhfor', 'inneair', 'ar', 'an rur', 'an bhfuil', 'a rala', 'arábh', 'i feil', 'is orah', 'a phoral', 'is oras', 'a harait', 'a charaic', 'a chrodh', 'a phoghr', 'a chu', 'a chu', 'a fhear', 'a chair', 'go phlob', 'a fhear', 'a bhaip', 'a thostáir', 'athur scei', 'a thorstái', 'gebal', 'abal', 'eibal', 'a raibh', 'a beith', 'araiph', 'etaim', 'at', 'aice', 'a balla', 'i baira', 'a bara', 'a baille', 'a deir', 'a baramh', 'adar', 'a bol', 'an bo', 'ad', 'ada', 'ad', 'cad aibh', 'eabal', 'adabh', 'apal', 'an', 'arail', 'abi', 'ar', 'agaim', 'an buip', 'ab', 'agat', 'apa', 'ata', 'atá', 'aaé', 'ada', 'adá', 'ala', 'airne', 'airle', 'oraich', 