# Welcome to Exkaldi

In this section, we will train a triphone HMM-GMM with delta features.

In [1]:
CONDA_DIR = "/home/khanh/workspace/miniconda3"
KALDI_ENV = "kaldi"
EXKALDI_ENV = "exkaldi"
KALDI_ROOT = "/home/khanh/workspace/projects/kaldi"

DATA_DIR = "librispeech_dummy"

def import_exkaldi():
    import os

    # add lib paths (LD_LIBRARY_PATH entries are separated by ":" on Linux, i.e. os.pathsep)
    os.environ["LD_LIBRARY_PATH"] = os.pathsep.join([
        os.path.join(CONDA_DIR, "envs", KALDI_ENV, "lib"),
        os.path.join(CONDA_DIR, "envs", EXKALDI_ENV, "lib"),
    ])

    import exkaldi
    exkaldi.info.reset_kaldi_root(KALDI_ROOT)

    return exkaldi
exkaldi = import_exkaldi()

dataDir = DATA_DIR

import os

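If you did not set the Kaldi root when installing ExKaldi, set it with `exkaldi.info.reset_kaldi_root(yourPath)`, as `import_exkaldi()` does above. Otherwise, errors will occur when calling some core functions.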


First, instantiate a triphone model object.

In [2]:
model = exkaldi.hmm.TriphoneHMM()

model

<exkaldi.hmm.hmm.TriphoneHMM at 0x7f3948303a90>

___model___ is not usable yet: we have to initialize its data first. We will use the following files, which were generated in earlier steps:

___tree___ and ___treeStats___: generated in 6_train_decision_tree  
___topo___: generated in 5_train_mono_HMM-GMM

You can also initialize the TriphoneHMM object directly from features, but here we use the tree statistics file.

In [3]:
treeFile = os.path.join(dataDir, "exp", "train_delta", "tree")
treeStatsFile = os.path.join(dataDir, "exp", "train_delta", "treeStats.acc")
topoFile = os.path.join(dataDir, "exp", "topo")

In [4]:
model.initialize(tree=treeFile, treeStatsFile=treeStatsFile, topoFile=topoFile)

model.info

GmmHmmInfo(phones=69, pdfs=264, transitionIds=684, transitionStates=342, dimension=39, gaussians=264)
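Right after initialization the model has exactly one Gaussian per pdf (`gaussians=264` equals `pdfs=264`). Conceptually, each tree leaf's Gaussian can be estimated by pooling the statistics (count, sum, sum of squares) of every context-dependent state that the tree maps to that leaf. The plain-Python sketch below illustrates that pooling; `pool_leaf_gaussians` and its data layout are hypothetical, not part of exkaldi or Kaldi, and the real initialization may apply variance flooring and other details omitted here.

```python
def pool_leaf_gaussians(state_stats, state_to_pdf):
    """Pool per-state sufficient statistics into one diagonal Gaussian per pdf.

    Hypothetical layout (illustration only):
      state_stats:  {state_key: (count, sums, sumsqs)} with per-dimension lists
      state_to_pdf: {state_key: pdf_id}, i.e. the tree's leaf for each state
    Returns {pdf_id: (count, means, variances)}.
    """
    pooled = {}
    for key, (count, sums, sumsqs) in state_stats.items():
        pdf = state_to_pdf[key]
        if pdf not in pooled:
            pooled[pdf] = [0.0, [0.0] * len(sums), [0.0] * len(sums)]
        acc = pooled[pdf]
        acc[0] += count  # zeroth-order stats (frame count)
        for d, (s, q) in enumerate(zip(sums, sumsqs)):
            acc[1][d] += s  # first-order stats
            acc[2][d] += q  # second-order stats
    result = {}
    for pdf, (count, sums, sumsqs) in pooled.items():
        means = [s / count for s in sums]
        # Maximum-likelihood diagonal variance: E[x^2] - E[x]^2
        variances = [q / count - m * m for q, m in zip(sumsqs, means)]
        result[pdf] = (count, means, variances)
    return result
```

For example, two states holding one frame each (x=1 and x=3) that share a leaf pool into a Gaussian with mean 2 and variance 1.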

The training steps for the triphone HMM are almost the same as for the monophone HMM, except that we do not use equal alignment in the first iteration.
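For contrast, equal alignment (used to bootstrap the monophone model) roughly amounts to splitting an utterance's frames evenly across its sequence of HMM states. A minimal sketch of that idea, assuming a single left-to-right state sequence per utterance; `equal_align` is a hypothetical helper, and Kaldi's `align-equal-compiled` works on the compiled graph and is more involved:

```python
def equal_align(num_frames, num_states):
    """Assign frames to a left-to-right state sequence in equal-sized chunks.

    Returns the state index for each frame; every state gets at least one frame.
    """
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    # Frame t falls into chunk floor(t * S / T), which is monotonic in t.
    return [t * num_states // num_frames for t in range(num_frames)]
```

For instance, 10 frames over 3 states give `[0, 0, 0, 0, 1, 1, 1, 2, 2, 2]`. The triphone model skips this step because the monophone model already provides a much better starting alignment.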

We will briefly introduce the training steps.

### Training in detail

First, we need alignment data for the triphone model. One option is to convert the latest alignment produced by the monophone model into an alignment for the triphone model. This requires the monophone alignment and the final monophone model.

In [5]:
aliFile = os.path.join(dataDir, "exp", "train_mono", "final.ali")
monoFile = os.path.join(dataDir, "exp", "train_mono", "final.mdl")
ali = exkaldi.load_ali(aliFile)

In [6]:
newAli = exkaldi.hmm.convert_alignment(
    ali=ali,
    originHmm=monoFile,
    targetHmm=model,
    tree=treeFile,
)
newAli

<exkaldi.core.archive.BytesAliTrans at 0x7f39319c7550>
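Conversion is possible because the triphone tree only needs, for each frame, the phone in context and the HMM state. The sketch below shows just the context part: turning a frame-level phone sequence into per-frame (left, center, right) triples. It is a hypothetical illustration, not exkaldi's implementation: it segments by label changes (so two consecutive identical phones would merge, whereas Kaldi segments using transition IDs), and `"<eps>"` is a made-up marker for utterance boundaries.

```python
from itertools import groupby


def triphone_contexts(frame_phones, boundary="<eps>"):
    """Map frame-level phone labels to per-frame (left, center, right) triples."""
    # Collapse runs of identical labels into (phone, number_of_frames) segments.
    segments = [(phone, len(list(run))) for phone, run in groupby(frame_phones)]
    phones = [phone for phone, _ in segments]
    contexts = []
    for i, (phone, length) in enumerate(segments):
        left = phones[i - 1] if i > 0 else boundary
        right = phones[i + 1] if i + 1 < len(phones) else boundary
        # Every frame of the segment shares the same triphone context.
        contexts.extend([(left, phone, right)] * length)
    return contexts
```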

Alternatively, align the features again directly with the new triphone model.

In the following steps, we take this second approach and review how to train an HMM-GMM model and use it to align acoustic features.

In [7]:
del newAli

Prepare the lexicons (generated in 3_prepare_lexicons).

In [8]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.load_lex(lexFile)

lexicons

<exkaldi.decode.graph.LexiconBank at 0x7f3931a053d0>

Prepare the int-format transcription (generated in 5_train_mono_HMM-GMM).

In [9]:
intTransFile = os.path.join(dataDir, "exp", "text.int")

trans = exkaldi.load_transcription(intTransFile)

trans.subset(nHead=1)

{'103-1240-0000': '201 875 800 1004 744 653 1239 800 1004 744 725 671 1395 1268 96 751 1064 328 348 648 4 724 588 501 1416 36 53 687 367 53 1314 177 4 168'}
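Each integer above is a word ID. Producing this format from a word-level transcription is essentially a table lookup; the high-level `train()` call later does this conversion for you ("Convert transcription to int value format" in its log). A minimal sketch, assuming a `{word: id}` dict and a hypothetical out-of-vocabulary symbol `<UNK>`; the actual OOV symbol in your lexicons may differ:

```python
def words_to_ints(transcript, word_table, oov_symbol="<UNK>"):
    """Convert a space-separated word transcript to space-separated word IDs.

    word_table: {word: int_id}; unknown words map to the ID of oov_symbol.
    """
    ids = []
    for word in transcript.split():
        if word in word_table:
            ids.append(word_table[word])
        else:
            ids.append(word_table[oov_symbol])  # KeyError if the OOV symbol is missing
    return " ".join(str(i) for i in ids)
```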

Prepare the L.fst file (generated in 3_prepare_lexicons).

In [10]:
Lfile = os.path.join(dataDir, "exp", "L.fst")

Prepare the features (generated in 2_feature_processing) and append first- and second-order deltas.

In [11]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")
feat = exkaldi.load_feat(featFile)
feat = feat.add_delta(order=2)

feat.dim

39
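The 39 dimensions are 13 static coefficients plus their first- and second-order deltas (13 × 3 = 39). As a sketch of what `add_delta(order=2)` computes, the code below uses the standard regression formula over ±2 frames, replicating the first and last frames at the edges. We assume this matches Kaldi's `add-deltas` in its default configuration closely but not exactly: Kaldi builds the second-order filter by convolving the first-order window with itself, which this sketch approximates by applying the delta twice, so edge frames may differ.

```python
def compute_deltas(frames, window=2):
    """Regression deltas: sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2), edges replicated."""
    num_frames, dim = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]  # clamp at the last frame
                prv = frames[max(t - n, 0)][d]               # clamp at the first frame
                acc += n * (nxt - prv)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def add_deltas(frames, order=2, window=2):
    """Append deltas up to `order`, so each frame grows from dim to dim * (order + 1)."""
    streams = [frames]
    for _ in range(order):
        streams.append(compute_deltas(streams[-1], window))
    # Concatenate static, delta and delta-delta vectors frame by frame.
    return [sum((s[t] for s in streams), []) for t in range(len(frames))]
```

On a linearly increasing 13-dimensional input, a middle frame gets deltas of exactly 1 and delta-deltas of 0.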

#### 1. Compile a new training graph.

In [12]:
outDir = os.path.join(dataDir, "exp", "train_delta")

exkaldi.utils.make_dependent_dirs(outDir, pathIsFile=False)

In [13]:
trainGraphFile = os.path.join(outDir, "train_graph")

model.compile_train_graph(tree=treeFile, transcription=trans, LFile=Lfile, outFile=trainGraphFile, lexicons=lexicons)

'librispeech_dummy/exp/train_delta/train_graph'
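For each utterance, the training graph encodes the transcribed word sequence expanded through the lexicon (`L.fst`) and then, via the tree, into context-dependent HMM states. The sketch below shows only the lexicon-expansion idea on plain Python lists, taking the first pronunciation of each word. It is a hypothetical simplification: the real graph is an FST that keeps alternative pronunciations (and, typically, optional silence) as parallel paths for the aligner to choose from.

```python
def expand_to_phones(words, lexicon):
    """Expand a word sequence into a phone sequence using each word's first pronunciation.

    lexicon: {word: [pronunciation, ...]} where each pronunciation is a list of phones.
    """
    phones = []
    for word in words:
        prons = lexicon.get(word)
        if not prons:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(prons[0])
    return phones
```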

#### 2. Align acoustic features.

In [14]:
ali = model.align(feat, trainGraphFile, lexicons=lexicons)

ali

<exkaldi.core.archive.BytesAliTrans at 0x7f3931331c40>
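Alignment is a Viterbi search for the most likely assignment of frames to HMM states within each utterance's training graph. The sketch below shows the core dynamic program for the simplest case: one left-to-right state sequence where each frame either stays in the current state or moves to the next one. It is an illustration under stated assumptions (a precomputed log-likelihood matrix, no transition probabilities, no beam pruning, no alternative paths), not what `model.align()` runs internally.

```python
import math


def viterbi_align(loglik):
    """Best monotonic alignment of T frames to S left-to-right states.

    loglik[t][s] is the log-likelihood of frame t under state s.
    Every state covers at least one frame; returns the state index per frame.
    """
    T, S = len(loglik), len(loglik[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    score = [[-math.inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = loglik[0][0]  # must start in the first state
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else -math.inf
            if stay >= move:
                score[t][s], back[t][s] = stay + loglik[t][s], s
            else:
                score[t][s], back[t][s] = move + loglik[t][s], s - 1
    # Backtrack from the last state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```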

#### 3. Accumulate statistics.

In [15]:
statsFile = os.path.join(outDir, "stats.acc")

model.accumulate_stats(feat=feat, ali=ali, outFile=statsFile)

'librispeech_dummy/exp/train_delta/stats.acc'
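With a hard alignment, accumulating statistics means summing, for each pdf, the frame count, the feature sum and the sum of squares; the update step then turns them into maximum-likelihood means and variances. The sketch below assumes one diagonal Gaussian per pdf and a frame-level alignment already mapped to pdf IDs (Kaldi alignments store transition IDs, which also feed transition-probability statistics that are omitted here). The function names and the `var_floor` value are hypothetical.

```python
def accumulate_stats(frames, pdf_ali):
    """Sum count, x and x^2 per pdf from frames and a per-frame pdf alignment."""
    stats = {}
    for frame, pdf in zip(frames, pdf_ali):
        if pdf not in stats:
            stats[pdf] = [0, [0.0] * len(frame), [0.0] * len(frame)]
        acc = stats[pdf]
        acc[0] += 1
        for d, x in enumerate(frame):
            acc[1][d] += x
            acc[2][d] += x * x
    return stats


def estimate_gaussians(stats, var_floor=1e-3):
    """ML estimate of a diagonal Gaussian per pdf, with a simple variance floor."""
    params = {}
    for pdf, (count, sums, sumsqs) in stats.items():
        means = [s / count for s in sums]
        variances = [max(q / count - m * m, var_floor) for q, m in zip(sumsqs, means)]
        params[pdf] = (means, variances)
    return params
```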

#### 4. Update HMM-GMM parameters.

In [16]:
targetGaussians = 300

model.update(statsFile, targetGaussians)

model.info

GmmHmmInfo(phones=69, pdfs=264, transitionIds=684, transitionStates=342, dimension=39, gaussians=300)
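The total went from 264 to 300 Gaussians, so the update also "mixed up" some pdfs by splitting Gaussians. Kaldi's `gmm-est` exposes a `--power` option (0.2 by default, to our knowledge) that allocates Gaussians to pdfs in proportion to occupancy raised to that power. The sketch below implements that proportional allocation with largest-remainder rounding and at least one Gaussian per pdf; the rounding scheme is our own choice, and Kaldi's exact splitting procedure differs in detail.

```python
def allocate_gaussians(occupancies, target_total, power=0.2):
    """Split target_total Gaussians across pdfs in proportion to occupancy**power."""
    n = len(occupancies)
    if target_total < n:
        raise ValueError("need at least one Gaussian per pdf")
    weights = [occ ** power for occ in occupancies]
    total_w = sum(weights)
    if total_w == 0:
        weights, total_w = [1.0] * n, float(n)  # no occupancy info: split evenly
    extra = target_total - n  # Gaussians beyond the guaranteed one per pdf
    quotas = [extra * w / total_w for w in weights]
    alloc = [1 + int(q) for q in quotas]  # int() floors non-negative quotas
    leftover = target_total - sum(alloc)
    # Give the remaining Gaussians to the pdfs with the largest fractional parts.
    order = sorted(range(n), key=lambda i: quotas[i] - int(quotas[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

A power well below 1 flattens the allocation, so rarely seen pdfs still receive a share instead of frequent pdfs absorbing almost all new Gaussians.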

### High-level training

In this part, we show how to train the triphone model directly with a single high-level call.

In [17]:
del model
del ali
del trans

os.remove(trainGraphFile)
os.remove(statsFile)

Some file paths or objects defined above will be used here. 

First, initialize the model. The lexicons can be passed as an optional parameter.

In [18]:
model = exkaldi.hmm.TriphoneHMM(lexicons=lexicons)

model.initialize(tree=treeFile, treeStatsFile=treeStatsFile, topoFile=topoFile)

model.info

GmmHmmInfo(phones=69, pdfs=264, transitionIds=684, transitionStates=342, dimension=39, gaussians=264)

Then train it.

In [19]:
outDir = os.path.join(dataDir, "exp", "train_delta")
transFile = os.path.join(dataDir, "train", "text")

aliIndex = model.train(feat=feat, transcription=transFile, LFile=Lfile, tree=treeFile, tempDir=outDir,
                       numIters=10, maxIterInc=8, totgauss=1500)

Start to train triphone model.
Start Time: 2022/09/05-15:32:16
Convert transcription to int value format.
Compiling training graph.
Iter >> 1
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 2.0075 seconds
Iter >> 2
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 2.8225 seconds
Iter >> 3
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 2.5729 seconds
Iter >> 4
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 2.6493 seconds
Iter >> 5
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 3.5087 seconds
Iter >> 6
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 2.6120 seconds
Iter >> 7
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 2.9597 seconds
Iter >> 8
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 2.6623 seconds
Iter >> 9
Aligning data
Accumulate GMM statistics
Update GMM parameters
Used time: 2

The final model and alignment are saved to files automatically. Let's look at the final model information.

In [20]:
model.info

GmmHmmInfo(phones=69, pdfs=264, transitionIds=684, transitionStates=342, dimension=39, gaussians=1652)
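The `numIters`, `maxIterInc` and `totgauss` parameters suggest a Gaussian-growth schedule like the one in Kaldi's `train_deltas.sh` recipe: start from one Gaussian per leaf, raise the mix-up target by a fixed increment in each of the first `maxIterInc` iterations, then hold it. The sketch below computes such a schedule; it is an assumption about how these parameters interact, not exkaldi's code. The final count reported above (1652) exceeds `totgauss=1500`, so exkaldi's actual mix-up evidently does not follow this sketch exactly.

```python
def gaussian_schedule(init_gauss, tot_gauss, num_iters, max_iter_inc):
    """Mix-up target per iteration: grow linearly for max_iter_inc iterations, then hold."""
    inc = (tot_gauss - init_gauss) // max_iter_inc
    targets = []
    num = init_gauss
    for it in range(1, num_iters + 1):
        targets.append(num)  # target used when updating at this iteration
        if it <= max_iter_inc:
            num += inc       # raise the target for the next iteration
    return targets
```

With the values used here (264 initial Gaussians, `totgauss=1500`, `numIters=10`, `maxIterInc=8`), the increment is 154 and the targets rise from 264 to 1496.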