# Welcome to ExKaldi

In this section, we will train a monophone HMM-GMM model.

In [None]:
import exkaldi

import os
dataDir = "librispeech_dummy"

First, prepare the lexicons. Restore the LexiconBank from the file generated in step 3.

In [None]:
lexicons = exkaldi.load_lex(os.path.join(dataDir, "exp", "lexicons.lex"))

Next, we need the HMM topology file and the acoustic feature data in order to initialize a monophone HMM-GMM model.

In [None]:
topoFile = os.path.join(dataDir, "exp", "topo")

exkaldi.hmm.make_topology(lexicons, outFile=topoFile, numNonsilStates=3, numSilStates=3)

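To make the idea of a topology concrete, here is a minimal sketch of a left-to-right HMM written as a transition matrix in plain Python, not with ExKaldi. The helper `left_to_right_topology` and the self-loop probability of 0.75 are illustrative assumptions; they are not read from the generated `topo` file.

```python
def left_to_right_topology(num_states, self_loop=0.75):
    """Build a transition matrix for a left-to-right HMM.

    Each emitting state either stays (self-loop) or moves to the next state.
    One extra, non-emitting final state is appended at the end.
    The probability values are illustrative assumptions.
    """
    size = num_states + 1
    trans = [[0.0] * size for _ in range(size)]
    for s in range(num_states):
        trans[s][s] = self_loop            # stay in the same state
        trans[s][s + 1] = 1.0 - self_loop  # advance to the next state
    return trans


# Three emitting states, matching numNonsilStates=3 above.
topo = left_to_right_topology(3)
for row in topo:
    print(row)
```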
In an earlier step (2_feature_processing), we computed the MFCC features. Now we load them.

In [None]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")

feat = exkaldi.load_feat(featFile, name="mfcc")

feat.dim

Then add first- and second-order deltas to the feature.

In [None]:
feat = feat.add_delta(order=2)

feat.dim

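Adding deltas up to order 2 triples the feature dimension: the static coefficients are followed by their deltas and delta-deltas. The sketch below shows the standard regression-based delta formula in plain Python. The helper names `delta` and `add_delta_sketch`, the window of 2 frames, and the edge clamping are assumptions made for illustration; the exact edge handling in `feat.add_delta` may differ.

```python
def delta(frames, window=2):
    """Regression deltas over +/- `window` frames, clamping indices at the edges."""
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[0])):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_delta_sketch(frames, order=2):
    """Append deltas up to `order` to each frame (static + delta + delta-delta)."""
    blocks = [frames]
    for _ in range(order):
        blocks.append(delta(blocks[-1]))
    return [sum((b[t] for b in blocks), []) for t in range(len(frames))]


# Toy 2-dimensional features that grow linearly over 6 frames.
toy = [[float(t), 2.0 * t] for t in range(6)]
with_deltas = add_delta_sketch(toy, order=2)
print(len(toy[0]), "->", len(with_deltas[0]))
```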
Now, instantiate an HMM-GMM model.

In [None]:
model = exkaldi.hmm.MonophoneHMM(lexicons=lexicons, name="mono")

model

___model___ is an ExKaldi __MonophoneHMM__ object. ExKaldi has two HMM-GMM APIs.

__MonophoneHMM__: the monophone HMM-GMM model.  
__TriphoneHMM__: the context-dependent (triphone) HMM-GMM model.  

At this point, __model__ is empty and not yet usable. We must initialize its architecture and parameters.

In [None]:
model.initialize(feat=feat, topoFile=topoFile)

model.info

Next, we train this model. ExKaldi provides a high-level API, __model.train(...)__, that runs the whole training in one call, but we first walk through the basic training loop step by step.

### Train HMM-GMM in detail

#### 1. Prepare the int-ID format transcription.

Training uses the transcription in int-ID format, so we first need to convert the text-format transcription to int-ID format.

In [None]:
transFile = os.path.join(dataDir, "train", "text")
oov = lexicons("oov")

trans = exkaldi.hmm.transcription_to_int(transFile, lexicons)

type(trans)

___trans___ is an ExKaldi __Transcription__ object, which is designed to hold the transcription. We save the int-format transcription for later use.

In [None]:
intTransFile = os.path.join(dataDir, "exp", "text.int")

trans.save(intTransFile)

Have a look at this transcription.

In [None]:
trans.subset(nHead=1)

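Conceptually, the conversion looks up each word in a word-to-ID table and maps unknown words to the ID of the OOV symbol. The following plain-Python sketch illustrates this; the function `words_to_int`, the toy `word_table`, and the `<UNK>` symbol are hypothetical and are not part of the ExKaldi API.

```python
def words_to_int(line, word_table, oov_word):
    """Convert 'utt-id w1 w2 ...' to (utt-id, [int IDs]).

    Words missing from `word_table` are mapped to the ID of `oov_word`.
    """
    utt_id, *words = line.split()
    oov_id = word_table[oov_word]
    return utt_id, [word_table.get(w, oov_id) for w in words]


# Hypothetical word table; a real one comes from the lexicons.
word_table = {"<eps>": 0, "<UNK>": 1, "HELLO": 2, "WORLD": 3}

utt, ids = words_to_int("utt-001 HELLO BRAVE WORLD", word_table, "<UNK>")
print(utt, ids)
```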
#### 2. Compile the train graph.

Compiling the training graph requires the L.fst file. We generated one in an earlier step (3_prepare_lexicons); now we use it.

In [None]:
Lfile = os.path.join(dataDir, "exp", "L.fst")

Even though a decision tree is not really needed when training a monophone HMM-GMM, Kaldi still requires one. 

When the monophone HMM is initialized, a temporary tree is generated automatically. We use it directly.

In [None]:
tree = model.tree

tree

___tree___ is an ExKaldi __DecisionTree__ object. The next step of this tutorial introduces how to build a regular decision tree; for now, we skip it.

In [None]:
outDir = os.path.join(dataDir, "exp", "train_mono")

exkaldi.utils.make_dependent_dirs(outDir, pathIsFile=False)

In [None]:
trainGraphFile = os.path.join(outDir, "train_graph")

model.compile_train_graph(tree=tree, transcription=trans, LFile=Lfile, outFile=trainGraphFile)

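Roughly speaking, the training graph of an utterance restricts the model to the phone (and HMM state) sequence implied by its transcription. The plain-Python sketch below shows that expansion in a heavily simplified form: it ignores optional silence, pronunciation variants, and the FST machinery that Kaldi actually uses. The `pronunciations` table and the function `expand_to_states` are hypothetical.

```python
def expand_to_states(words, pronunciations, states_per_phone=3):
    """Expand a word sequence into a linear sequence of (phone, state) pairs."""
    sequence = []
    for word in words:
        for phone in pronunciations[word]:
            for state in range(states_per_phone):
                sequence.append((phone, state))
    return sequence


# Hypothetical pronunciation lexicon (word -> phone list).
pronunciations = {
    "HELLO": ["HH", "AH", "L", "OW"],
    "WORLD": ["W", "ER", "L", "D"],
}

state_seq = expand_to_states(["HELLO", "WORLD"], pronunciations)
print(len(state_seq), state_seq[:4])
```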
When training the HMM-GMM model, one basic iteration is:  

    align features >> accumulate statistics >> update Gaussian functions

We now walk through one training iteration in detail.

#### 3. Align the acoustic features equally in order to start the first training step.

Kaldi aligns the features equally in the first step, because the untrained model cannot yet produce a meaningful alignment.

In [None]:
ali = model.align_equally(feat, trainGraphFile)

ali

___ali___ is an ExKaldi __BytesAliTrans__ object. It holds the alignment at the transition-ID level. 

You can convert it to __NumPy__ format to inspect it.

In [None]:
ali.subset(nHead=1).to_numpy().data

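The idea behind equal alignment can be sketched in a few lines of plain Python: the frames of an utterance are divided as evenly as possible among the states of its state sequence. This is only an illustration. The function `align_equally_sketch` is hypothetical, and the real `model.align_equally` works on the compiled training graph and returns transition IDs.

```python
def align_equally_sketch(num_frames, state_sequence):
    """Assign frames to states in order, giving each state a near-equal share."""
    num_states = len(state_sequence)
    if num_frames < num_states:
        raise ValueError("Need at least one frame per state.")
    base, extra = divmod(num_frames, num_states)
    alignment = []
    for i, state in enumerate(state_sequence):
        # The first `extra` states receive one additional frame.
        alignment.extend([state] * (base + (1 if i < extra else 0)))
    return alignment


equal_ali = align_equally_sketch(7, ["s0", "s1", "s2"])
print(equal_ali)
```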
#### 4. Use the alignment to accumulate the statistics needed to update the model parameters

In [None]:
statsFile = os.path.join(outDir, "stats.acc")

model.accumulate_stats(feat=feat, ali=ali, outFile=statsFile)

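For a Gaussian model, the accumulated statistics are essentially per-state counts, sums, and sums of squares of the frames aligned to each state. The sketch below illustrates this for one-dimensional features in plain Python. The function `accumulate_stats_sketch` is hypothetical, and the content of the real `stats.acc` file is richer than this.

```python
from collections import defaultdict


def accumulate_stats_sketch(frames, alignment):
    """Accumulate [count, sum, sum of squares] per state for 1-D features."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for value, state in zip(frames, alignment):
        acc = stats[state]
        acc[0] += 1
        acc[1] += value
        acc[2] += value * value
    return dict(stats)


toy_stats = accumulate_stats_sketch([1.0, 3.0, 10.0], ["a", "a", "b"])
print(toy_stats)
```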
#### 5. Use these statistics to update model parameters.

This step can also increase the number of Gaussians. Here we add 10 more Gaussians.

In [None]:
targetGaussians = model.info.gaussians + 10

model.update(statsFile, numgauss=targetGaussians)

model.info

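As a rough illustration of what an update involves, the plain-Python sketch below re-estimates a one-dimensional Gaussian from accumulated statistics and then splits the heaviest mixture component to obtain one more Gaussian. The functions `ml_update` and `split_heaviest`, the variance floor, and the perturbation factor of 0.2 are assumptions chosen for the example; they are not taken from ExKaldi or Kaldi.

```python
import math


def ml_update(count, total, total_sq, var_floor=1e-3):
    """Maximum-likelihood mean and variance from count, sum, and sum of squares."""
    mean = total / count
    var = max(total_sq / count - mean * mean, var_floor)
    return mean, var


def split_heaviest(mixture, perturb=0.2):
    """Split the heaviest (weight, mean, var) component into two components.

    Each half keeps the variance, takes half the weight, and has its mean
    shifted by +/- `perturb` standard deviations (an illustrative value).
    """
    idx = max(range(len(mixture)), key=lambda i: mixture[i][0])
    weight, mean, var = mixture[idx]
    shift = perturb * math.sqrt(var)
    rest = mixture[:idx] + mixture[idx + 1:]
    return rest + [(weight / 2, mean - shift, var), (weight / 2, mean + shift, var)]


mean, var = ml_update(count=4, total=8.0, total_sq=20.0)
mixture = split_heaviest([(1.0, mean, var)])
print(mean, var, mixture)
```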
In the next training step, use Viterbi alignment instead of equal alignment.

#### 6. Align the acoustic features with the Viterbi algorithm.

In [None]:
del ali

In [None]:
ali = model.align(feat=feat, trainGraphFile=trainGraphFile)

ali

In [None]:
ali.subset(nHead=1).to_numpy().data

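The following plain-Python sketch shows the core of Viterbi forced alignment for a left-to-right state sequence with one-dimensional Gaussian states. It is a simplified illustration: transition probabilities are ignored, each state is a single Gaussian, and the function names `log_gauss` and `viterbi_align_sketch` are hypothetical. The real `model.align` works on the training graph with the full model.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_align_sketch(frames, states):
    """Return the best state index per frame for a left-to-right sequence.

    `states` is a list of (mean, var). The path starts in the first state,
    ends in the last one, and may only stay or advance by one state.
    """
    num_frames, num_states = len(frames), len(states)
    neg_inf = float("-inf")
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_gauss(frames[0], *states[0])
    for t in range(1, num_frames):
        for j in range(num_states):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else neg_inf
            if move > stay:
                best, back[t][j] = move, j - 1
            else:
                best, back[t][j] = stay, j
            if best > neg_inf:
                score[t][j] = best + log_gauss(frames[t], *states[j])
    if score[-1][-1] == neg_inf:
        raise ValueError("Too few frames to visit every state.")
    # Trace back from the last state at the last frame.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


toy_frames = [0.1, -0.2, 0.0, 5.1, 4.9, 10.2]
toy_states = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]
viterbi_path = viterbi_align_sketch(toy_frames, toy_states)
print(viterbi_path)
```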
That is what one basic training iteration looks like. In practice, the high-level API runs this loop for you.

### Train HMM-GMM with high-level API

In [None]:
os.remove(trainGraphFile)
os.remove(statsFile)
del ali
del trans

We train for 10 iterations.

Note that this method expects the transcription in text format.

In [None]:
finalAli = model.train(feat, transFile, Lfile, tempDir=outDir, numIters=10, maxIterInc=8, totgauss=500)

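The arguments `numIters`, `maxIterInc`, and `totgauss` together control how the number of Gaussians grows. The sketch below assumes that `model.train` follows the schedule of Kaldi's `train_mono.sh` recipe, where the Gaussian count grows by a fixed increment on each iteration up to `maxIterInc` and then stays constant. Both that assumption and the starting count of 100 are illustrative; check `model.info` for the real values.

```python
def gaussian_schedule(init_gauss, tot_gauss, num_iters, max_iter_inc):
    """Target number of Gaussians for each iteration (1-based order)."""
    increment = (tot_gauss - init_gauss) // max_iter_inc
    targets = []
    current = init_gauss
    for iteration in range(1, num_iters + 1):
        targets.append(current)
        if iteration <= max_iter_inc:
            current += increment
    return targets


# Hypothetical starting count of 100 Gaussians.
schedule = gaussian_schedule(init_gauss=100, tot_gauss=500, num_iters=10, max_iter_inc=8)
print(schedule)
```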
In [None]:
finalAli.subset(nHead=5)

The method returns an __Indextable__ object for the final alignment.

In [None]:
model.info

The final model and alignment are saved to files automatically. You can also save them manually. 

In [None]:
#modelFile = os.path.join(outDir, "mono.mdl")
#model.save(modelFile)
#treeFile = os.path.join(outDir, "tree")
#tree.save(treeFile)