# Welcome to ExKaldi

In this section, we will train a monophone HMM-GMM model.

In [1]:
import os
dataDir = "librispeech_dummy"

# Entries in LD_LIBRARY_PATH are separated by ":" on Linux.
os.environ["LD_LIBRARY_PATH"] = "/home/khanh/workspace/miniconda3/envs/kaldi/lib/:/home/khanh/workspace/miniconda3/envs/test/lib/"

import exkaldi
exkaldi.info.reset_kaldi_root("/home/khanh/workspace/projects/kaldi")

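If ExKaldi cannot locate your Kaldi installation, point it to the Kaldi root directory explicitly, as done above:

exkaldi.info.reset_kaldi_root( yourPath )

Otherwise, errors will occur when calling some core functions.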


First, prepare the lexicons. Restore the LexiconBank from the file generated in step 3.

In [2]:
lexicons = exkaldi.load_lex(os.path.join(dataDir, "exp", "lexicons.lex"))

Next, we need an HMM topology file and the acoustic feature data in order to initialize a monophone HMM-GMM model.

In [3]:
topoFile = os.path.join(dataDir, "exp", "topo")

exkaldi.hmm.make_topology(lexicons, outFile=topoFile, numNonsilStates=3, numSilStates=3)

'librispeech_dummy/exp/topo'

In an earlier step (2_feature_processing), we made the MFCC feature. Now load it.

In [4]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")

feat = exkaldi.load_feat(featFile, name="mfcc")

feat.dim

13

Then append first- and second-order deltas to this feature, which triples its dimension.

In [5]:
feat = feat.add_delta(order=2)

feat.dim

39
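To see why the dimension grows from 13 to 39, here is a minimal pure-Python sketch of delta features. The helper names `compute_deltas` and `append_deltas` are hypothetical and are not part of ExKaldi. The sketch assumes the common regression formula with a window of 2 frames and clamped edges, so its values may differ from Kaldi's at utterance boundaries.

```python
def compute_deltas(frames, window=2):
    """Regression deltas over +/- `window` frames; edge frames are clamped."""
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames, order=2, window=2):
    """Concatenate static features with deltas up to the given order."""
    streams = [frames]
    for _ in range(order):
        streams.append(compute_deltas(streams[-1], window))
    return [sum((s[t] for s in streams), []) for t in range(len(frames))]


# Toy input: 10 frames of 13-dimensional features forming a linear ramp.
toy = [[float(t)] * 13 for t in range(10)]
expanded = append_deltas(toy, order=2)
# 13 static + 13 delta + 13 delta-delta values per frame.
print(len(expanded[0]))
```

Each extra order adds one more copy of the base dimension, which is why `order=2` yields 3 x 13 = 39.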

Now, instantiate an HMM-GMM model.

In [6]:
model = exkaldi.hmm.MonophoneHMM(lexicons=lexicons, name="mono")

model

<exkaldi.hmm.hmm.MonophoneHMM at 0x7f858c7f9430>

___model___ is an exkaldi __MonophoneHMM__ object. ExKaldi has two HMM-GMM APIs.

__MonophoneHMM__: the monophone HMM-GMM model.  
__TriphoneHMM__: the context-dependent phone HMM-GMM model.  

At this point, the __model__ is empty and unusable. We must initialize its architecture and parameters.

In [7]:
model.initialize(feat=feat, topoFile=topoFile)

model.info

GmmHmmInfo(phones=69, pdfs=207, transitionIds=414, transitionStates=207, dimension=39, gaussians=207)
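The numbers reported by `model.info` can be cross-checked with a little arithmetic. The sketch below assumes that every phone uses 3 emitting states (as set by `numNonsilStates=3` and `numSilStates=3`), that each state has two outgoing transitions (a self-loop and a forward transition), and that initialization gives each pdf a single Gaussian. These are assumptions about the topology, not values read from ExKaldi.

```python
num_phones = 69
states_per_phone = 3        # numNonsilStates=3, numSilStates=3
transitions_per_state = 2   # assumption: self-loop + forward transition
gaussians_per_pdf = 1       # assumption: one Gaussian per pdf at initialization

pdfs = num_phones * states_per_phone
transition_ids = pdfs * transitions_per_state
gaussians = pdfs * gaussians_per_pdf
dimension = 13 * 3          # static MFCC + delta + delta-delta

print(pdfs, transition_ids, gaussians, dimension)
```

Under these assumptions the results agree with the `GmmHmmInfo` output above.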

Now we are ready to train this model. ExKaldi provides a high-level API, __model.train(...)__, that runs the whole training procedure in one call, but we first walk through the basic training loop step by step.

### Train HMM-GMM in detail

#### 1. Prepare the int-ID format transcription.

Training uses the transcription in int-ID format, so we first need to convert the text-format transcription to int-ID format.

In [8]:
transFile = os.path.join(dataDir, "train", "text")
oov = lexicons("oov")

trans = exkaldi.hmm.transcription_to_int(transFile, lexicons)

type(trans)

exkaldi.core.archive.Transcription

___trans___ is an exkaldi __Transcription__ object, which is designed to hold the transcription. We save the int-format transcription for later use.

In [9]:
intTransFile = os.path.join(dataDir, "exp", "text.int")

trans.save(intTransFile)

'librispeech_dummy/exp/text.int'

Have a look at this transcription.

In [10]:
trans.subset(nHead=1)

{'103-1240-0000': '201 875 800 1004 744 653 1239 800 1004 744 725 671 1395 1268 96 751 1064 328 348 648 4 724 588 501 1416 36 53 687 367 53 1314 177 4 168'}
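Conceptually, the conversion is a table lookup from each word to its integer ID, with unknown words mapped to the OOV symbol. The following is a minimal sketch with a made-up four-word table. The function name `text_to_int`, the table, and the `<UNK>` symbol are illustrative assumptions and not ExKaldi's implementation.

```python
def text_to_int(utterances, word_to_id, oov_word):
    """Map each word to its integer ID; unknown words map to the OOV ID."""
    oov_id = word_to_id[oov_word]
    result = {}
    for utt_id, text in utterances.items():
        ids = [word_to_id.get(word, oov_id) for word in text.split()]
        result[utt_id] = " ".join(str(i) for i in ids)
    return result


# Hypothetical word table and transcription, for illustration only.
word_to_id = {"<eps>": 0, "<UNK>": 1, "hello": 2, "world": 3}
utterances = {"utt-001": "hello brave world"}

# "brave" is not in the table, so it is replaced by the ID of "<UNK>".
print(text_to_int(utterances, word_to_id, oov_word="<UNK>"))
```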

#### 2. Compile the train graph.

Compiling the training graph requires an L.fst file. We generated one in an earlier step (3_prepare_lexicons), so we use it here.

In [11]:
Lfile = os.path.join(dataDir, "exp", "L.fst")

Even though a decision tree is not really needed when training a monophone HMM-GMM, Kaldi still requires one. 

When the monophone HMM is initialized, a temporary tree is generated automatically. We use it directly.

In [12]:
tree = model.tree

tree

<exkaldi.hmm.hmm.DecisionTree at 0x7f858c70b4f0>

___tree___ is an exkaldi __DecisionTree__ object. The next tutorial step shows how to build a regular decision tree, so we skip that for now.

In [13]:
outDir = os.path.join(dataDir, "exp", "train_mono")

exkaldi.utils.make_dependent_dirs(outDir, pathIsFile=False)

In [14]:
trainGraphFile = os.path.join(outDir, "train_graph")

model.compile_train_graph(tree=tree, transcription=trans, LFile=Lfile, outFile=trainGraphFile)

'librispeech_dummy/exp/train_mono/train_graph'

When training the HMM-GMM model, one iteration of the basic loop is:  

    align features >> accumulate statistics >> update Gaussian parameters

We now go through one iteration of this loop in detail.

#### 3. Align the acoustic features equally to start the first training step.

In the first iteration, Kaldi aligns the features equally, because the model is not yet trained well enough to produce a meaningful alignment.

In [15]:
ali = model.align_equally(feat, trainGraphFile)

ali

<exkaldi.core.archive.BytesAliTrans at 0x7f858c7e4940>

___ali___ is an exkaldi __BytesAliTrans__ object. It holds the alignment at the transition-ID level. 

You can convert it to __NumPy__ format to inspect it.

In [16]:
ali.subset(nHead=1).to_numpy().data

{'103-1240-0000': array([  8,   7,   7, ..., 255, 258, 257], dtype=int32)}
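The idea behind equal alignment can be sketched in a few lines: the frames of an utterance are divided into contiguous, nearly equal-length segments, one per state in the expected sequence. This is a simplified illustration using plain state labels. Kaldi works on compiled graphs and transition-IDs, and the function name `align_equally_sketch` is hypothetical.

```python
def align_equally_sketch(num_frames, state_sequence):
    """Split `num_frames` frames into contiguous, nearly equal segments."""
    num_states = len(state_sequence)
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    alignment = []
    for i, state in enumerate(state_sequence):
        start = i * num_frames // num_states
        end = (i + 1) * num_frames // num_states
        alignment.extend([state] * (end - start))
    return alignment


# Ten frames shared among three states.
print(align_equally_sketch(10, ["a", "b", "c"]))
```

No acoustic information is used here, which is why this alignment only serves to bootstrap the first update.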

#### 4. Use the alignment to accumulate the statistics needed to update the model parameters.

In [17]:
statsFile = os.path.join(outDir, "stats.acc")

model.accumulate_stats(feat=feat, ali=ali, outFile=statsFile)

'librispeech_dummy/exp/train_mono/stats.acc'
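As a rough illustration of what accumulated statistics look like, the sketch below collects, for each state, the frame count, the per-dimension sum, and the per-dimension sum of squares. It assumes one Gaussian per state and a hard alignment. The function name `accumulate_stats_sketch` and the toy data are hypothetical, and Kaldi's actual accumulator format differs.

```python
def accumulate_stats_sketch(frames, alignment):
    """Collect count, sum, and sum of squares per aligned state."""
    dim = len(frames[0])
    stats = {}
    for frame, state in zip(frames, alignment):
        if state not in stats:
            stats[state] = {"count": 0, "sum": [0.0] * dim, "sumsq": [0.0] * dim}
        acc = stats[state]
        acc["count"] += 1
        for d, value in enumerate(frame):
            acc["sum"][d] += value
            acc["sumsq"][d] += value * value
    return stats


# Three 1-dimensional frames: two aligned to state 0, one to state 1.
toy_frames = [[1.0], [3.0], [10.0]]
toy_alignment = [0, 0, 1]
print(accumulate_stats_sketch(toy_frames, toy_alignment))
```

These three quantities are sufficient to re-estimate the mean and variance of each Gaussian.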

#### 5. Use these statistics to update model parameters.

This step can also increase the number of Gaussians. Here we add 10 more.

In [18]:
targetGaussians = model.info.gaussians + 10

model.update(statsFile, numgauss=targetGaussians)

model.info

GmmHmmInfo(phones=69, pdfs=207, transitionIds=414, transitionStates=207, dimension=39, gaussians=217)
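For a single diagonal-covariance Gaussian, the maximum-likelihood update from a frame count, a sum, and a sum of squares is short enough to write out. This sketch shows only the parameter re-estimation, with a variance floor added as an assumption. It does not show how Kaldi splits components to reach a target number of Gaussians, and `update_gaussian_sketch` is a hypothetical name.

```python
def update_gaussian_sketch(count, total, total_sq, var_floor=1e-3):
    """Re-estimate mean and diagonal variance from sufficient statistics."""
    mean = [s / count for s in total]
    var = [max(sq / count - m * m, var_floor) for sq, m in zip(total_sq, mean)]
    return mean, var


# Statistics of two 1-dimensional frames with values 1.0 and 3.0.
mean, var = update_gaussian_sketch(count=2, total=[4.0], total_sq=[10.0])
print(mean, var)
```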

From the next training iteration on, use Viterbi alignment instead of equal alignment.

#### 6. Align the acoustic features with the Viterbi algorithm.

In [19]:
del ali

In [20]:
ali = model.align(feat=feat, trainGraphFile=trainGraphFile)

ali

<exkaldi.core.archive.BytesAliTrans at 0x7f8579cf0a60>

In [21]:
ali.subset(nHead=1).to_numpy().data

{'103-1240-0000': array([122, 124, 123, ..., 254, 256, 258], dtype=int32)}
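To illustrate what Viterbi alignment does, here is a toy forced aligner for a left-to-right state sequence with one 1-dimensional Gaussian per state. It ignores transition probabilities and works on state indices rather than transition-IDs, so it is only a sketch of the dynamic-programming idea and not Kaldi's implementation. The names `log_gauss` and `viterbi_align_sketch` are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_align_sketch(frames, states):
    """Best monotonic assignment of frames to states given as (mean, var)."""
    num_frames, num_states = len(frames), len(states)
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_gauss(frames[0], *states[0])
    for t in range(1, num_frames):
        for j in range(num_states):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else neg_inf
            if move > stay:
                best, back[t][j] = move, j - 1
            else:
                best, back[t][j] = stay, j
            if best > neg_inf:
                score[t][j] = best + log_gauss(frames[t], *states[j])
    # Trace back from the last state at the last frame.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


toy_frames = [0.1, -0.2, 0.0, 5.1, 4.9, 10.2]
toy_states = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]
print(viterbi_align_sketch(toy_frames, toy_states))
```

Unlike equal alignment, the segment boundaries here follow the data, so each state receives the frames it explains best.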

That completes one iteration of the basic training loop. In practice, you can use the high-level API to train the model.

### Train HMM-GMM with high-level API

In [22]:
os.remove(trainGraphFile)
os.remove(statsFile)
del ali
del trans

We train for 10 iterations.

Note that this method expects the transcription in text format, not int-ID format.

In [23]:
finalAli = model.train(feat, transFile, Lfile, tempDir=outDir, numIters=10, maxIterInc=8, totgauss=500)

Start to train monophone model.
Start Time: 2022/09/01-20:35:50
Convert transcription to int value format.
Compiling training graph.
Iter >> 0
Aligning data equally
Accumulate GMM statistics
Update GMM parameters
Used time: 1.4823 seconds
Iter >> 1
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 19.2555 seconds
Iter >> 2
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 6.1158 seconds
Iter >> 3
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 3.5586 seconds
Iter >> 4
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 3.1796 seconds
Iter >> 5
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 3.7082 seconds
Iter >> 6
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 3.6484 seconds
Iter >> 7
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 3.4367 seconds
Iter >> 8
Aligning data
Accumulate GMM statistics
Update GMM parameters
Us

In [24]:
finalAli.subset(nHead=5)

{'103-1240-0000': IndexInfo(frames=1407, startIndex=0, dataSize=7056, filePath='/home/khanh/workspace/projects/exkaldi/tutorials/librispeech_dummy/exp/train_mono/final.ali'),
 '103-1240-0001': IndexInfo(frames=1593, startIndex=7056, dataSize=7986, filePath='/home/khanh/workspace/projects/exkaldi/tutorials/librispeech_dummy/exp/train_mono/final.ali'),
 '103-1240-0002': IndexInfo(frames=1393, startIndex=15042, dataSize=6986, filePath='/home/khanh/workspace/projects/exkaldi/tutorials/librispeech_dummy/exp/train_mono/final.ali'),
 '103-1240-0003': IndexInfo(frames=1469, startIndex=22028, dataSize=7366, filePath='/home/khanh/workspace/projects/exkaldi/tutorials/librispeech_dummy/exp/train_mono/final.ali'),
 '103-1240-0004': IndexInfo(frames=1250, startIndex=29394, dataSize=6271, filePath='/home/khanh/workspace/projects/exkaldi/tutorials/librispeech_dummy/exp/train_mono/final.ali')}

An __Indextable__ object that indexes the final alignment is returned.

In [25]:
model.info

GmmHmmInfo(phones=69, pdfs=207, transitionIds=414, transitionStates=207, dimension=39, gaussians=535)

The final model and alignment are saved to files automatically. You can also save them manually. 

In [26]:
#modelFile = os.path.join(outDir, "mono.mdl")
#model.save(modelFile)
#treeFile = os.path.join(outDir, "tree")
#tree.save(treeFile)