# 5. Transcription

Transcription is a vital task in language documentation. State-of-the-art transcription models are trained on huge amounts of recorded and transcribed speech; a low-resource language, however, rarely has enough such data available.

Instead, we will use [Allosaurus](https://github.com/xinjli/allosaurus). Allosaurus is a pretrained universal phone recognizer: it transcribes audio into IPA symbols, which we can then restrict to just the sounds of our language.

In [1]:
!pip install allosaurus



We will use the attached audio file `1.wav`. First, let's run Allosaurus without specifying a phone inventory, so that each sound is transcribed as the closest IPA symbol.

In [2]:
from allosaurus.app import read_recognizer

test_audio_file = '1.wav'

model = read_recognizer()
model.recognize(test_audio_file)

'w o m a b̞ a t̪ i u m a k͡p̚ e tɕ i e ts ɪ ʌ ŋ tʂ ʌ n x ʌ i tsʰ a ɪ t a ɴ t̪ʰ i n iː uə d uə o ʂ y'
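
The filtering idea mentioned at the start of this section can be sketched in plain Python. This is only a minimal sketch and not how Allosaurus itself restricts its output: the helper `filter_phones` and the toy inventory are hypothetical, and phones outside the inventory are simply dropped rather than mapped to a nearby sound.

```python
def filter_phones(transcription, inventory):
    """Keep only the phones that belong to the given inventory.

    Assumes the transcription is a string of space-separated phones,
    as in the output shown above.
    """
    phones = transcription.split()
    return [p for p in phones if p in inventory]

# Hypothetical toy inventory, for illustration only
toy_inventory = {"w", "o", "m", "a", "i", "u"}

# First few phones of the unconstrained transcription
raw = "w o m a b̞ a t̪ i u m a"
print(" ".join(filter_phones(raw, toy_inventory)))
```

Dropping phones loses information, so in practice it is usually preferable to let the recognizer work with the inventory directly, as the following cells do.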

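The `recognize` function can also take a language ID as its second argument, which restricts the output to that language's phone inventory. You can list the available language IDs with the following command (uncomment it to run).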

In [10]:
# !python -m allosaurus.bin.list_lang

In [3]:
# Restrict recognition to the inventory of a supported language (here 'yue', Cantonese)
model = read_recognizer()
model.recognize(test_audio_file, 'yue')

'w o m a m a t̪ i u m a k e t̪ʰ i e t ɪ ŋ t a n kʰ œ i tʰ a ɪ t a n t̪ʰ i n i t o y'
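
To see what the language argument changed, the two transcriptions can be compared phone by phone. The sketch below uses Python's standard `difflib` module; the helper name `phone_changes` is hypothetical, and the alignment is a generic sequence diff, not a phonetically informed one.

```python
from difflib import SequenceMatcher

def phone_changes(before, after):
    """Return (old_phones, new_phones) pairs where two transcriptions differ."""
    a, b = before.split(), after.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# First nine phones of the two outputs shown above
unconstrained = "w o m a b̞ a t̪ i u"
cantonese = "w o m a m a t̪ i u"
for old, new in phone_changes(unconstrained, cantonese):
    print(" ".join(old), "->", " ".join(new))
```

Passing the full output strings instead of these excerpts lists every substitution, insertion, and deletion between the two runs.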

We can also provide a custom inventory. To do this, we specify a file listing all of the sounds in the language. For instance, the following command writes the English inventory to the file `eng.txt`.

In [7]:
!python -m allosaurus.bin.write_phone --lang eng --output eng.txt
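
An inventory file can also be inspected from Python. The following is a minimal sketch that assumes the file contains one phone per line; the helper `load_inventory` is hypothetical and not part of Allosaurus.

```python
from pathlib import Path

def load_inventory(path):
    """Read an inventory file into a set, assuming one phone per line."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}

# eng.txt is the file written by the write_phone command above
inventory_path = Path("eng.txt")
if inventory_path.exists():
    eng_inventory = load_inventory(inventory_path)
    print(len(eng_inventory), "phones in the English inventory")
```

Loading the inventory as a set makes it easy to check whether a particular phone is included before editing the file by hand.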

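We have provided the Uspanteko inventory in the file `usp.txt`; you can create your own inventory file if you are working with a different language. Next, we load an inventory into the model with `update_phone`. Note that the command must be given a specific language ID: the generic `ipa` ID used below is rejected, as the error shows, so choose an ID from `list_lang` instead.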

In [12]:
!python -m allosaurus.bin.update_phone --lang ipa --input eng.txt

Traceback (most recent call last):
  File "/Users/milesper/miniforge3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/milesper/miniforge3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/milesper/miniforge3/lib/python3.10/site-packages/allosaurus/bin/update_phone.py", line 22, in <module>
    assert args.lang != 'ipa', "ipa is not a proper lang to update. use list_lang to find a proper language"
AssertionError: ipa is not a proper lang to update. use list_lang to find a proper language

We can print the phone inventory that Allosaurus uses for a supported language, here `yue`, with `list_phone`.


In [4]:
!python -m allosaurus.bin.list_phone --lang yue

a a̞ e f h i j k kʰ kʷ kʷʰ l l̥ l̪ l̪̥ m m̩ n n̪ o p pʰ r s sʰ t tʰ t̠ t̪ t̪ʰ u w y æ ŋ ŋ̩ œ œ̞ ɐ ɔ ɛ ɪ ɪ̞ ɵ ʃ ʃʰ ʊ ʊ̟ β
