# Welcome to ExKaldi

In this section, we will further process the Kaldi decoding lattice and score the results.

In [1]:
import os
dataDir = "librispeech_dummy"

# On Linux, LD_LIBRARY_PATH entries are separated by ":" (not ";").
os.environ["LD_LIBRARY_PATH"] = "/home/khanh/workspace/miniconda3/envs/kaldi/lib/:/home/khanh/workspace/miniconda3/envs/test/lib/"

import exkaldi
exkaldi.info.reset_kaldi_root("/home/khanh/workspace/projects/kaldi")

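If ExKaldi cannot find your Kaldi installation, set its root directory with `exkaldi.info.reset_kaldi_root(yourPath)`, as in the cell above. Otherwise, errors will occur when some core functions are called.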
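Because `os.environ` is inherited by child processes, setting LD_LIBRARY_PATH in the notebook presumably helps the Kaldi command-line tools that ExKaldi launches find their shared libraries. The sketch below builds the value with the platform's separator (`os.pathsep`) while keeping any entries that were already set. The helper `prepend_library_paths` and the directories are hypothetical; replace them with your own.

```python
import os

def prepend_library_paths(paths, env=None):
    """Return a copy of `env` whose LD_LIBRARY_PATH starts with `paths`.

    Entries are joined with os.pathsep (":" on Linux), and any value that
    was already set is kept at the end instead of being overwritten.
    """
    env = dict(os.environ if env is None else env)
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = list(paths) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env

# Hypothetical conda library directories; replace them with your own.
condaLibs = ["/opt/miniconda3/envs/kaldi/lib", "/opt/miniconda3/envs/test/lib"]
newEnv = prepend_library_paths(condaLibs, env={"LD_LIBRARY_PATH": "/usr/local/lib"})
print(newEnv["LD_LIBRARY_PATH"])
# On Linux: /opt/miniconda3/envs/kaldi/lib:/opt/miniconda3/envs/test/lib:/usr/local/lib
```

To apply it to the current session, assign the result back, e.g. `os.environ["LD_LIBRARY_PATH"] = prepend_library_paths(condaLibs)["LD_LIBRARY_PATH"]`.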


Load the lattice file (generated in 09_decode_back_HMM-GMM_and_WFST).

In [2]:
latFile = os.path.join(dataDir, "exp", "train_delta", "decode_test", "test.lat")

lat = exkaldi.decode.wfst.load_lat(latFile)

lat

<exkaldi.decode.wfst.Lattice at 0x7f27004f34f0>

To keep things simple, we first extract the 1-best result from the lattice. This requires a word-ID table and the HMM model.

The word-ID table can be a __words.txt__ file (if decoding was done at the word level), a __phones.txt__ file (if decoding was done at the phone level), or an ExKaldi __ListTable__ object.

A __LexiconBank__ object is also a convenient source, because both the "words" and the "phones" tables can be obtained from it.
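A __words.txt__ or __phones.txt__ file is a Kaldi symbol table with one `SYMBOL ID` pair per line. The following sketch parses that format into forward and reverse dictionaries, which is essentially the information a word-ID table carries. `read_symbol_table` is a hypothetical helper, not an ExKaldi function, and the sample entries are made up.

```python
import io

def read_symbol_table(lines):
    """Parse Kaldi-style symbol-table lines ("SYMBOL ID") into two dicts."""
    sym2id, id2sym = {}, {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"Expected 'SYMBOL ID', got: {line!r}")
        symbol, idx = fields[0], int(fields[1])
        sym2id[symbol] = idx
        id2sym[idx] = symbol
    return sym2id, id2sym

# A tiny, made-up excerpt in words.txt format (not the real file).
sample = io.StringIO("<eps> 0\n<UNK> 1\nA 2\nWAS 3\n")
word2id, id2word = read_symbol_table(sample)
print(word2id["WAS"], id2word[2])  # 3 A
```

With a real file, pass the open file object instead: `with open(wordsFile) as f: word2id, id2word = read_symbol_table(f)`.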

In [3]:
wordsFile = os.path.join(dataDir, "exp", "words.txt")

hmmFile = os.path.join(dataDir, "exp", "train_delta", "final.mdl")

In [4]:
result = lat.get_1best(symbolTable=wordsFile, hmm=hmmFile, lmwt=1, acwt=0.5)

result.subset(nHead=1)

{'1272-128104-0000': '1376 1308 439 883 1068 1091 215 871 629 4 534 685 1023 1167 653 4 575 1391 4 1390 1409 180 585 261 38'}

___result___ is an ExKaldi __Transcription__ object.

The decoding result is in integer-ID format. To convert it to text, try this:

In [5]:
textResult = exkaldi.hmm.transcription_from_int(result, wordsFile)

textResult.subset(nHead=1)

{'1272-128104-0000': 'WAS TOOK FAULT OR ROOM SCENE CLASSED OLD IN A GOT LA RED SO IS A HER WERE A WENT WILL CAN HIS COST ALL'}

For convenience, we reload the lexicons (a __LexiconBank__ object) from file.

In [6]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.load_lex(lexFile)

In [7]:
del textResult

Besides the __transcription_from_int__ function, a transcription can also be converted with the __Transcription__ object's own `convert` method, like this:

In [8]:
word2id = lexicons("words")
oovID = word2id[lexicons("oov")]
id2word = word2id.reverse()

textResult = result.convert(symbolTable=id2word, unkSymbol=oovID)

textResult.subset(nHead=1)

{'1272-128104-0000': 'WAS TOOK FAULT OR ROOM SCENE CLASSED OLD IN A GOT LA RED SO IS A HER WERE A WENT WILL CAN HIS COST ALL'}
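Both conversions simply map every token through a table. The plain-Python sketch below assumes that a token missing from the table is replaced by the table's entry for the unknown symbol. This is consistent with how this tutorial passes `unkSymbol` (the OOV ID for ID-to-word conversion here, the OOV word for word-to-ID conversion with `transcription_to_int`), but it is not ExKaldi's implementation; `convert_transcription` and the tables are hypothetical.

```python
def convert_transcription(trans, table, unk=None):
    """Map every token of every utterance through `table`.

    trans: dict of utterance ID -> space-separated tokens.
    table: dict of token -> replacement token.
    unk:   optional key of `table`; tokens missing from `table` map to table[unk].
    """
    converted = {}
    for uttID, text in trans.items():
        out = []
        for token in text.split():
            if token in table:
                out.append(str(table[token]))
            elif unk is not None:
                out.append(str(table[unk]))
            else:
                raise KeyError(f"{uttID}: unknown token {token!r}")
        converted[uttID] = " ".join(out)
    return converted

# Hypothetical tables; IDs are kept as strings because transcriptions are text.
word2id = {"<UNK>": "1", "A": "2", "WAS": "3"}
id2word = {i: w for w, i in word2id.items()}

# ID -> word: the fallback is the OOV ID, looked up in the reversed table.
print(convert_transcription({"utt-1": "3 2 99"}, id2word, unk=word2id["<UNK>"]))
# {'utt-1': 'WAS A <UNK>'}

# Word -> ID: the fallback is the OOV word.
print(convert_transcription({"utt-1": "WAS A ZEBRA"}, word2id, unk="<UNK>"))
# {'utt-1': '3 2 1'}
```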

In [9]:
del result

Now we can score the decoding result. The most common metric is the word error rate (WER).

In [10]:
refFile = os.path.join(dataDir, "test", "text")

score = exkaldi.decode.score.wer(ref=refFile, hyp=textResult, mode="present")

score

Score(WER=137.95, words=419, insErr=238, delErr=5, subErr=335, SER=100.0, sentences=20, wrongSentences=20, missedSentences=0)
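WER is the total number of substitutions, deletions, and insertions divided by the number of reference words, times 100. With the counts above, (335 + 5 + 238) / 419 * 100 is approximately 137.95; because insertions are counted too, WER can exceed 100%. The plain-Python sketch below (not ExKaldi's or Kaldi's code) derives those counts from a minimum edit-distance alignment. When several alignments have the same cost, its tie-breaking may differ from other tools, so the individual counts could differ while the total stays the same. `align_errors` and `corpus_wer` are hypothetical names.

```python
def align_errors(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, sub, del, ins) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = (c + miss, s + miss, d, ins)   # match or substitution
            c, s, d, ins = dp[i - 1][j]
            up = (c + 1, s, d + 1, ins)           # deletion
            c, s, d, ins = dp[i][j - 1]
            left = (c + 1, s, d, ins + 1)         # insertion
            dp[i][j] = min(diag, up, left)        # lowest cost wins
    _, sub, dele, ins = dp[n][m]
    return sub, dele, ins

def corpus_wer(refs, hyps):
    """WER in percent over dicts of utterance ID -> text; missing hyps count as empty."""
    totalErr = totalWords = 0
    for uttID, refText in refs.items():
        ref = refText.split()
        sub, dele, ins = align_errors(ref, hyps.get(uttID, "").split())
        totalErr += sub + dele + ins
        totalWords += len(ref)
    return 100.0 * totalErr / totalWords

print(align_errors("THE CAT SAT".split(), "THE BAT SAT DOWN".split()))  # (1, 0, 1)
print(f"{corpus_wer({'u1': 'THE CAT SAT'}, {'u1': 'THE BAT SAT DOWN'}):.2f}")  # 66.67
```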

Alternatively, compute the edit distance.

In [11]:
score = exkaldi.decode.score.edit_distance(ref=refFile, hyp=textResult, mode="present")

score

Score(editDistance=1384, words=1866, SER=1.0, sentences=20, wrongSentences=20, missedSentences=0)

Then compute the word accuracy from these values:

In [12]:
1 - score.editDistance/score.words

0.25830653804930337
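The accuracy above is one minus the edit distance divided by the reference length. The unit being compared matters, because it changes both the distance and the denominator. This sketch computes a standard Levenshtein distance over any sequence, so the same function works on a list of words or on a string of characters; `edit_distance` and `accuracy` here are hypothetical helpers, not ExKaldi's functions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or character strings)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def accuracy(ref, hyp):
    return 1 - edit_distance(ref, hyp) / len(ref)

ref, hyp = "THE CAT SAT", "THE BAT SAT DOWN"
print(edit_distance(ref.split(), hyp.split()))  # 2 (word level)
print(edit_distance(ref, hyp))                  # 6 (character level, spaces included)
print(accuracy(ref.split(), hyp.split()), accuracy(ref, hyp))
```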

In our own test, we obtained a WER of 134.37 and a word accuracy of 27.6%, slightly different from the output above.

ExKaldi also supports further processing of the lattice, for example adding a penalty or scaling it.
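To see why the LM weight and the penalty can change the result, consider a deliberately simplified model (an assumption, not how Kaldi or ExKaldi score a lattice internally): each candidate path has an acoustic cost and an LM cost (negative log-likelihoods, lower is better), and its total is `acwt * AM + lmwt * LM + penalty * numWords`. A positive penalty then favors hypotheses with fewer words. The candidates, their costs, and `best_path` are made up for illustration.

```python
# Hypothetical candidate paths: (words, acoustic cost, LM cost).
# Costs are negative log-likelihoods, so lower is better.
candidates = [
    ("A CAT SAT", 100.0, 12.0),
    ("A CAT SAT ON", 98.0, 12.5),
]

def best_path(cands, lmwt, acwt, penalty):
    """Pick the lowest total cost under the simplified scoring model."""
    def total(cand):
        words, am, lm = cand
        return acwt * am + lmwt * lm + penalty * len(words.split())
    return min(cands, key=total)[0]

print(best_path(candidates, lmwt=1.0, acwt=0.5, penalty=0.0))  # A CAT SAT ON
print(best_path(candidates, lmwt=1.0, acwt=0.5, penalty=1.0))  # A CAT SAT
```

With penalty 0.0 the longer hypothesis wins (61.5 vs. 62.0); with penalty 1.0 the shorter one does (65.0 vs. 65.5).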

Here is an example that tries different language model weights (LMWT) and penalties. This time, instead of the text-format reference, we use an int-format reference file.

In [13]:
refInt = exkaldi.hmm.transcription_to_int(refFile, lexicons("words"), unkSymbol=lexicons("oov"))
refIntFile = os.path.join(dataDir, "exp", "train_delta", "decode_test", "text.int")
refInt.save(refIntFile)

refInt.subset(nHead=1)

{'1272-128104-0000': '801 1000 653 1268 64 865 1268 789 216 53 1381 75 526 1304 1387 585 533'}

In [14]:
for penalty in [0., 0.5, 1.0]:
    for LMWT in range(10, 15):
        
        newLat = lat.add_penalty(penalty)
        result = newLat.get_1best(lexicons("words"), hmmFile, lmwt=LMWT, acwt=0.5)

        score = exkaldi.decode.score.wer(ref=refInt, hyp=result, mode="present")
        
        print(f"Penalty {penalty}, LMWT {LMWT}: WER {score.WER}")

Penalty 0.0, LMWT 10: WER 135.08
Penalty 0.0, LMWT 11: WER 135.08
Penalty 0.0, LMWT 12: WER 134.84
Penalty 0.0, LMWT 13: WER 134.84
Penalty 0.0, LMWT 14: WER 134.84
Penalty 0.5, LMWT 10: WER 134.61
Penalty 0.5, LMWT 11: WER 134.61
Penalty 0.5, LMWT 12: WER 134.61
Penalty 0.5, LMWT 13: WER 134.61
Penalty 0.5, LMWT 14: WER 134.61
Penalty 1.0, LMWT 10: WER 134.61
Penalty 1.0, LMWT 11: WER 134.13
Penalty 1.0, LMWT 12: WER 134.13
Penalty 1.0, LMWT 13: WER 134.13
Penalty 1.0, LMWT 14: WER 134.13
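To pick the best setting programmatically, the WER values can be stored in a dictionary keyed by `(penalty, LMWT)` (for example by adding `werTable[(penalty, LMWT)] = score.WER` inside the loop above) and the minimum taken. Here the values are simply copied from the printed output; `werTable` is a hypothetical name. Note that choosing these values on the same data you report results on gives an optimistic estimate.

```python
# WER values copied from the grid search output above, keyed by (penalty, LMWT).
werTable = {
    (0.0, 10): 135.08, (0.0, 11): 135.08, (0.0, 12): 134.84, (0.0, 13): 134.84, (0.0, 14): 134.84,
    (0.5, 10): 134.61, (0.5, 11): 134.61, (0.5, 12): 134.61, (0.5, 13): 134.61, (0.5, 14): 134.61,
    (1.0, 10): 134.61, (1.0, 11): 134.13, (1.0, 12): 134.13, (1.0, 13): 134.13, (1.0, 14): 134.13,
}

# min() returns the first key with the lowest WER (dicts keep insertion order).
bestConfig = min(werTable, key=werTable.get)
print(bestConfig, werTable[bestConfig])  # (1.0, 11) 134.13
```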


You can also get the phone-level result from the lattice.

In [15]:
phoneResult = lat.get_1best(lexicons("phones"), hmmFile, lmwt=1, acwt=0.5, phoneLevel=True)

phoneResult = exkaldi.hmm.transcription_from_int(phoneResult, lexicons("phones"))

phoneResult.subset(nHead=1)

{'1272-128104-0000': '<SIL> W AH0 Z T UH1 K F AO1 L T ER0 R UW1 M S IY1 N K L AE1 S T OW1 L D IH0 N AH0 G AA1 T L AA1 R EH1 D S OW1 IH0 Z <SIL> AH0 HH ER0 W ER0 AH0 W EH1 N T W AH0 L K AH0 N HH IH0 Z K AA1 S T AO1 L <SIL>'}

N-best results can also be extracted from the lattice.

In [16]:
result = lat.get_nbest(
                        n=3,
                        symbolTable=lexicons("words"),
                        hmm=hmmFile, 
                        acwt=0.5, 
                        phoneLevel=False,
                        requireCost=False,
                )

for re in result:
    print(re.name, type(re))

1-best <class 'exkaldi.core.archive.Transcription'>
2-best <class 'exkaldi.core.archive.Transcription'>
3-best <class 'exkaldi.core.archive.Transcription'>


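___result___ is a list of N-best __Transcription__ objects. If ___requireCost___ is True, the AM and LM scores are returned as well: ___result___ then contains three lists, holding the N-best transcriptions, their AM scores, and their LM scores (as __Metric__ objects).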

In [17]:
result = lat.get_nbest(
                        n=3,
                        symbolTable=lexicons("words"),
                        hmm=hmmFile, 
                        acwt=0.5, 
                        phoneLevel=False,
                        requireCost=True,
                )

for re in result[0]:
    print(re.name, type(re))
    
for re in result[1]:
    print(re.name, type(re))

for re in result[2]:
    print(re.name, type(re))

1-best <class 'exkaldi.core.archive.Transcription'>
2-best <class 'exkaldi.core.archive.Transcription'>
3-best <class 'exkaldi.core.archive.Transcription'>
AM-1-best <class 'exkaldi.core.archive.Metric'>
AM-2-best <class 'exkaldi.core.archive.Metric'>
AM-3-best <class 'exkaldi.core.archive.Metric'>
LM-1-best <class 'exkaldi.core.archive.Metric'>
LM-2-best <class 'exkaldi.core.archive.Metric'>
LM-3-best <class 'exkaldi.core.archive.Metric'>
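Conceptually, the N-best hypotheses are the N lowest-cost paths, where each path's cost combines its AM and LM scores. Over a real lattice this requires an N-shortest-paths search (Kaldi's `lattice-to-nbest` tool performs such a conversion). The sketch below only ranks an explicit, made-up list of complete paths, and the cost formula `acwt * AM + LM` is a simplifying assumption; `nbest` is a hypothetical helper.

```python
import heapq

def nbest(paths, n, acwt=0.5):
    """Return the n lowest-cost paths, named like the "1-best", "2-best", ... lists above."""
    top = heapq.nsmallest(n, paths, key=lambda p: acwt * p[1] + p[2])
    return {f"{rank}-best": words for rank, (words, _, _) in enumerate(top, start=1)}

# Hypothetical paths: (words, acoustic cost, LM cost); lower cost is better.
paths = [
    ("A CAT SAT", 100.0, 12.0),
    ("A CAT SAT ON", 98.0, 12.5),
    ("THE CAT SAT", 101.0, 13.0),
    ("A HAT SAT", 110.0, 15.0),
]
print(nbest(paths, 3))
# {'1-best': 'A CAT SAT ON', '2-best': 'A CAT SAT', '3-best': 'THE CAT SAT'}
```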


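Importantly, alignments can also be returned by setting ___requireAli___ to True. In the example below, ___result[1]___ holds the N-best alignments as __NumpyAliTrans__ objects.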

In [18]:
result = lat.get_nbest(
                        n=3,
                        symbolTable=lexicons("words"),
                        hmm=hmmFile, 
                        acwt=0.5, 
                        phoneLevel=False,
                        requireCost=False,
                        requireAli=True,
                )

for re in result[1]:
    print(re.name, type(re))

1-best <class 'exkaldi.core.archive.NumpyAliTrans'>
2-best <class 'exkaldi.core.archive.NumpyAliTrans'>
3-best <class 'exkaldi.core.archive.NumpyAliTrans'>
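An alignment assigns one label to every frame. The raw Kaldi alignment stores transition IDs, which need the HMM model to be mapped to phones. Assuming that mapping has already been done, a per-frame label sequence can be collapsed into time segments with run-length encoding, as in this sketch. `frames_to_segments` and the frame labels are hypothetical, and the 10 ms frame shift is Kaldi's default, assumed here.

```python
from itertools import groupby

def frames_to_segments(labels, frameShift=0.01):
    """Collapse per-frame labels into (label, start_sec, end_sec) segments."""
    segments, start = [], 0
    for label, run in groupby(labels):  # consecutive identical labels form one run
        length = sum(1 for _ in run)
        segments.append((label,
                         round(start * frameShift, 2),
                         round((start + length) * frameShift, 2)))
        start += length
    return segments

frameLabels = ["SIL", "SIL", "W", "W", "W", "AH0", "Z", "Z", "SIL"]
print(frames_to_segments(frameLabels))
# [('SIL', 0.0, 0.02), ('W', 0.02, 0.05), ('AH0', 0.05, 0.06), ('Z', 0.06, 0.08), ('SIL', 0.08, 0.09)]
```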


We will not train __LDA+MLLT__ or __SAT__ models in this tutorial. For tutorials on them, please see the `examples` directory, where we provide some complete recipes, for example for the __TIMIT__ corpus.