# Welcome to ExKaldi

In this section, we will train a DNN acoustic model with __TensorFlow 2.x__.

To run this step, please install TensorFlow first.  
In this tutorial, we customize the training loop without using "__fit__".

In [1]:
import os
dataDir = "librispeech_dummy"

os.environ["LD_LIBRARY_PATH"] = "/home/khanh/workspace/miniconda3/envs/kaldi/lib/;/home/khanh/workspace/miniconda3/envs/test/lib/"

import exkaldi
exkaldi.info.reset_kaldi_root("/home/khanh/workspace/projects/kaldi")

Replace the path above with your own Kaldi root directory:

    exkaldi.info.reset_kaldi_root( yourPath )

If the Kaldi root is not set, errors will occur when running some core functions.


We use Keras to build and train the model.

In [2]:
import tensorflow as tf
from tensorflow import keras
import random
import datetime
import numpy as np

Fix the random seed.

In [3]:
seed = 1
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)

### Prepare Dataset

Load the training features.

In [4]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")
feat = exkaldi.load_feat(featFile)
feat = feat.add_delta(order=2)
feat = feat.splice(left=1,right=1)
feat = feat.to_numpy()

feat.dim

117

___feat___ is an exkaldi __NumpyFeat__ object.

This feature is made following these steps:

    compute mfcc (13) >> apply CMVN (13) >> add 2 order deltas (39) >> splice 1-1 frames (117)
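
The dimension arithmetic in this pipeline can be checked with a few lines of NumPy. The following is only a sketch: __np.gradient__ stands in for Kaldi's regression-based delta computation, and __splice_frames__ is a hypothetical helper that pads the edges by repeating the first and last frame, which may differ from what ExKaldi does internally. Only the resulting shapes are meant to match.

```python
import numpy as np

def splice_frames(frames, left=1, right=1):
    """Concatenate every frame with its left/right neighbours (hypothetical helper)."""
    num_frames = frames.shape[0]
    # Pad the edges by repeating the first and last frame (an assumption of this sketch).
    padded = np.concatenate(
        [np.repeat(frames[:1], left, axis=0), frames, np.repeat(frames[-1:], right, axis=0)],
        axis=0,
    )
    # Each shifted view contributes one copy of the feature dimension.
    windows = [padded[offset:offset + num_frames] for offset in range(left + right + 1)]
    return np.concatenate(windows, axis=1)

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((50, 13))   # 50 frames of 13-dim MFCC
delta1 = np.gradient(mfcc, axis=0)     # crude stand-in for first-order deltas
delta2 = np.gradient(delta1, axis=0)   # crude stand-in for second-order deltas
with_deltas = np.concatenate([mfcc, delta1, delta2], axis=1)
spliced = splice_frames(with_deltas, left=1, right=1)

print(with_deltas.shape[1], spliced.shape[1])  # 39 117
```

The middle third of each spliced row is the current frame; the outer thirds are its left and right neighbours.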

We additionally apply global standard normalization to it.

In [5]:
feat = feat.normalize(std=True)
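
As a rough illustration of what "global" normalization means here, the sketch below pools the frames of all utterances, computes one mean and one standard deviation per dimension, and applies them to every utterance. This assumes that is what __normalize(std=True)__ does; __global_standardize__ and the toy data are hypothetical and not part of ExKaldi.

```python
import numpy as np

def global_standardize(utterances, eps=1e-8):
    """Standardize all utterances with statistics pooled over every frame."""
    stacked = np.concatenate(list(utterances.values()), axis=0)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    # eps guards against division by zero for constant dimensions.
    return {utt: (mat - mean) / (std + eps) for utt, mat in utterances.items()}

rng = np.random.default_rng(0)
toy_feats = {
    "utt1": rng.normal(5.0, 2.0, size=(30, 117)),
    "utt2": rng.normal(-1.0, 3.0, size=(40, 117)),
}
normalized = global_standardize(toy_feats)

# After normalization the pooled frames have roughly zero mean and unit variance.
pooled = np.concatenate(list(normalized.values()), axis=0)
print(np.abs(pooled.mean(axis=0)).max(), pooled.std(axis=0).mean())
```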

Then we load the alignment data. It was generated in an earlier step (07_train_triphone_HMM-GMM_delta).

We will use the pdf-ID as the target label. In exkaldi, the transition-ID and phone-ID can also be extracted for multi-task training.

In [6]:
aliFile = os.path.join(dataDir, "exp", "train_delta", "final.ali")
hmmFile = os.path.join(dataDir, "exp", "train_delta", "final.mdl")

ali = exkaldi.load_ali(aliFile)

ali = ali.to_numpy(aliType="pdfID", hmm=hmmFile)

ali

<exkaldi.core.archive.NumpyAliPdf at 0x7f09f0284340>

The alignment will serve as the label data for training the NN model.

Next we group the feature and the alignment to build a dataset for the deep learning framework, using the __tuple_dataset(...)__ function.

Note that this function groups the archives by their names, so make sure each name is a valid Python identifier (that is, it contains only letters, digits, and underscores).
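
Since the grouped records are namedtuples whose fields are taken from the archive names, an unsuitable name would fail when the record type is created. A quick pre-check with the standard library could look like this. __is_valid_archive_name__ is a hypothetical helper, not an ExKaldi function, and it also rejects names that start with a digit or an underscore, or that are Python keywords, because namedtuple fields do not allow them.

```python
import keyword
from collections import namedtuple

def is_valid_archive_name(name):
    """Return True if the name can be used as a namedtuple field."""
    return name.isidentifier() and not keyword.iskeyword(name) and not name.startswith("_")

for candidate in ["mfcc", "pdfID", "mfcc-39", "1st_feat", "class"]:
    print(candidate, is_valid_archive_name(candidate))

# Valid names can be used directly as record fields.
Record = namedtuple("Record", ["key", "frameID", "mfcc", "pdfID"])
print(Record._fields)
```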

In [7]:
feat.rename("mfcc")
ali.rename("pdfID")

dataset = exkaldi.tuple_dataset([feat,ali], frameLevel=True)

datasetSize = len(dataset)
datasetSize

129328

___dataset___ is a list whose members are namedtuples. For example:

In [8]:
oneRecord = dataset[0]

oneRecord

TupledData(key='103-1240-0000', frameID=0, mfcc=array([[-8.1976879e-01, -1.9850516e-01,  6.3038737e-01,  9.0955116e-02,
         8.7195081e-01,  1.2406298e+00,  1.1615741e+00,  3.2836774e-01,
         4.5308763e-01,  6.5059960e-02,  2.0171395e-01,  6.1103612e-01,
         3.3649167e-01,  6.7091221e-04,  2.5476824e-02,  1.7797188e-01,
         1.7894910e-01,  4.5316700e-02, -5.1516163e-01, -4.1636893e-01,
         2.9141966e-01,  6.8266052e-01,  5.2522767e-01, -3.1561017e-01,
        -2.7952522e-01,  1.5361856e-01,  2.5082199e-02,  3.9682295e-02,
         2.3468968e-01, -2.3263440e-01,  3.4785096e-02, -6.7042999e-02,
        -1.5004002e-01, -1.2814204e-01,  4.1532350e-01,  6.2387514e-01,
        -1.9245759e-01, -1.8295710e-01, -5.5213046e-01, -8.1962723e-01,
        -1.9862096e-01,  6.3036972e-01,  9.1023326e-02,  8.7193406e-01,
         1.2405851e+00,  1.1615763e+00,  3.2848203e-01,  4.5308581e-01,
         6.4929694e-02,  2.0160693e-01,  6.1108011e-01,  3.3655649e-01,
         7.04733

Use a field name to get the corresponding data.

In [9]:
oneRecord.pdfID

array([0], dtype=int32)

If you train a sequential NN model, you may want to group the archive data at the __utterance level__ rather than the __frame level__. To do so, try changing the grouping mode (the __frameLevel__ argument used above).
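
The difference between the two modes can be pictured with plain namedtuples. This is only an illustration of the record layout, under the assumption that an utterance-level record holds the whole feature matrix and label sequence of one utterance. The __FrameRecord__ and __UttRecord__ types and the toy data are hypothetical.

```python
from collections import namedtuple
import numpy as np

rng = np.random.default_rng(0)
feats = {"utt1": rng.standard_normal((3, 117)), "utt2": rng.standard_normal((5, 117))}
labels = {
    "utt1": np.array([0, 0, 1], dtype="int32"),
    "utt2": np.array([2, 2, 2, 3, 3], dtype="int32"),
}

FrameRecord = namedtuple("FrameRecord", ["key", "frameID", "mfcc", "pdfID"])
UttRecord = namedtuple("UttRecord", ["key", "mfcc", "pdfID"])

# Frame level: one record per frame, each holding a (1, dim) slice and a single label.
frame_level = [
    FrameRecord(utt, i, feats[utt][i:i + 1], labels[utt][i:i + 1])
    for utt in feats
    for i in range(len(feats[utt]))
]

# Utterance level: one record per utterance, keeping the frame order intact.
utt_level = [UttRecord(utt, feats[utt], labels[utt]) for utt in feats]

print(len(frame_level), frame_level[0].mfcc.shape)  # 8 (1, 117)
print(len(utt_level), utt_level[1].mfcc.shape)      # 2 (5, 117)
```

Frame-level records can be shuffled freely, while utterance-level records preserve the temporal order that sequential models rely on.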

You can group all kinds of exkaldi archives, such as features, CMVN statistics, alignments, probabilities, transcriptions and so on. Different features, such as MFCC and fBank, or different alignments, such as pdf-ID and phone-ID, can also be grouped together. For example, suppose we now want to do multi-task training.

In [10]:
ali2 = exkaldi.load_ali(aliFile)

ali2 = ali2.to_numpy(aliType="phoneID", hmm=hmmFile)

ali2.rename("phoneID")

dataset2 = exkaldi.tuple_dataset([feat,ali,ali2], frameLevel=True)

In [11]:
dataset2[0]

TupledData(key='103-1240-0000', frameID=0, mfcc=array([[-8.1976879e-01, -1.9850516e-01,  6.3038737e-01,  9.0955116e-02,
         8.7195081e-01,  1.2406298e+00,  1.1615741e+00,  3.2836774e-01,
         4.5308763e-01,  6.5059960e-02,  2.0171395e-01,  6.1103612e-01,
         3.3649167e-01,  6.7091221e-04,  2.5476824e-02,  1.7797188e-01,
         1.7894910e-01,  4.5316700e-02, -5.1516163e-01, -4.1636893e-01,
         2.9141966e-01,  6.8266052e-01,  5.2522767e-01, -3.1561017e-01,
        -2.7952522e-01,  1.5361856e-01,  2.5082199e-02,  3.9682295e-02,
         2.3468968e-01, -2.3263440e-01,  3.4785096e-02, -6.7042999e-02,
        -1.5004002e-01, -1.2814204e-01,  4.1532350e-01,  6.2387514e-01,
        -1.9245759e-01, -1.8295710e-01, -5.5213046e-01, -8.1962723e-01,
        -1.9862096e-01,  6.3036972e-01,  9.1023326e-02,  8.7193406e-01,
         1.2405851e+00,  1.1615763e+00,  3.2848203e-01,  4.5308581e-01,
         6.4929694e-02,  2.0160693e-01,  6.1108011e-01,  3.3655649e-01,
         7.04733

In [12]:
del ali2
del dataset2

### Training

Now we start to train the DNN acoustic model. First, build a data iterator from the dataset we prepared.

In [13]:
featureDim = feat.dim
pdfClasses = exkaldi.hmm.load_hmm(hmmFile,hmmType="tri").info.pdfs

del ali
del feat

In [14]:
def data_generater(dataset):

    length = len(dataset)
    while True:
        index = 0
        random.shuffle(dataset)
        while index < length:
            one = dataset[index]
            index += 1
            yield (one.mfcc[0], one.pdfID)

In [15]:
batchSize = 64
tf_datasets = tf.data.Dataset.from_generator(
                                 lambda : data_generater(dataset),
                                 (tf.float32, tf.int32)
                            ).batch(batchSize).prefetch(3)



Then define a simple Dense model.

In [16]:
def make_DNN_model(inputsShape, classes):
    
    inputs = keras.Input(inputsShape)
    h1 = keras.layers.Dense(256, activation="relu", kernel_initializer="he_normal")(inputs)
    h1_bn = keras.layers.BatchNormalization()(h1)
    
    h2 = keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal")(h1_bn)
    h2_bn = keras.layers.BatchNormalization()(h2)
    
    h3 = keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal")(h2_bn)
    h3_bn = keras.layers.BatchNormalization()(h3)
    
    outputs = keras.layers.Dense(classes, use_bias=False)(h3_bn)
    
    return keras.Model(inputs, outputs)

In [17]:
model = make_DNN_model((featureDim,), pdfClasses)

model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 117)]             0         
                                                                 
 dense (Dense)               (None, 256)               30208     
                                                                 
 batch_normalization (BatchN  (None, 256)              1024      
 ormalization)                                                   
                                                                 
 dense_1 (Dense)             (None, 512)               131584    
                                                                 
 batch_normalization_1 (Batc  (None, 512)              2048      
 hNormalization)                                                 
                                                                 
 dense_2 (Dense)             (None, 512)               262656
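
The parameter counts in the summary can be verified by hand: a Dense layer with a bias has inputs × units + units parameters, and a BatchNormalization layer keeps four values per unit (gamma, beta, moving mean, moving variance). A small check, with hypothetical helper names:

```python
def dense_params(inputs, units, use_bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return inputs * units + (units if use_bias else 0)

def batch_norm_params(units):
    """gamma, beta, moving mean and moving variance, one value each per unit."""
    return 4 * units

print(dense_params(117, 256))   # 30208
print(batch_norm_params(256))   # 1024
print(dense_params(256, 512))   # 131584
print(batch_norm_params(512))   # 2048
print(dense_params(512, 512))   # 262656
```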

Here are the optimizer and the metrics.

In [18]:
optimizer = keras.optimizers.Adam(0.001)

losses = keras.metrics.Mean(name="train/loss", dtype=tf.float32)
accs = keras.metrics.Mean(name="train/accuracy", dtype=tf.float32)

Specify the output directory. You can use TensorBoard to check the training results.

In [19]:
outDir = os.path.join(dataDir, "exp", "train_DNN")

stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logDir = os.path.join(outDir, "log", stamp)
file_writer = tf.summary.create_file_writer(logDir)

In [20]:
epochs = 1

epoch_iterations = datasetSize//batchSize

epoch_iterations

2020
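
Because the generator defined earlier loops forever, the number of iterations per epoch has to be fixed by hand. With 129328 samples and a batch size of 64, integer division gives 2020 full batches and leaves 48 samples over. The sketch below shows the arithmetic and one way an endless generator can be bounded; __endless_batches__ is a hypothetical stand-in for the real data pipeline.

```python
import itertools

dataset_size = 129328
batch_size = 64

full_batches, leftover = divmod(dataset_size, batch_size)
print(full_batches, leftover)  # 2020 48

def endless_batches():
    """Yield batch indices forever, like a generator wrapped in `while True`."""
    step = 0
    while True:
        yield step
        step += 1

# Stop after exactly one epoch's worth of batches.
one_epoch = list(itertools.islice(endless_batches(), full_batches))
print(len(one_epoch))  # 2020
```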

To print a progress bar and control where each epoch ends, we use the __tqdm__ package.

In [21]:
! pip install tqdm 2>/dev/null

Collecting tqdm
  Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
Installing collected packages: tqdm
Successfully installed tqdm-4.64.1


In [22]:
from tqdm import tqdm

Start training the model. During training, you can use TensorBoard to visualize the results.

```
tensorboard --logdir=./librispeech_dummy/exp/train_DNN/log --bind_all
```

To keep this tutorial short, we do not validate the model during training, but in a real case you should.

In [23]:
with file_writer.as_default():
    
    for epoch in range(epochs):
        
        for batch,i in zip(tf_datasets, tqdm(range(epoch_iterations))):
            data, label = batch
            
            # Record only the forward pass on the tape
            with tf.GradientTape() as tape:
                logits = model(data, training=True)
                loss = keras.losses.sparse_categorical_crossentropy(label, logits, from_logits=True)

            # Backpropagate and update the weights outside the tape context
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))

            # Accumulate the running loss and accuracy
            losses(loss)
            pred = keras.backend.argmax(logits, axis=1)
            acc = exkaldi.nn.accuracy(label.numpy(), pred.numpy())
            accs(acc.accuracy)
        
            #if int(optimizer.iterations.numpy()) % epoch_iterations == 0:     #<<<< if you don't use tqdm
            #    break
        
        current_loss = losses.result()
        current_acc = accs.result()
        tf.print( f"Epoch {epoch}", f" Loss {current_loss:.6f}", f" Acc {current_acc:.6f}")

        tf.summary.scalar("train/loss", data=current_loss, step=epoch)
        tf.summary.scalar("train/accuracy", data=current_acc, step=epoch)

    tf.print( "Training Done" )

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2020/2020 [04:18<00:00,  7.83it/s]


Epoch 0  Loss 2.133084  Acc 0.454819
Training Done
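
For reference, the loss and accuracy tracked in the loop can be reproduced with NumPy. This sketch assumes the usual definitions (softmax cross-entropy against integer labels, and the fraction of frames whose arg-max matches the label). The function names are hypothetical, and this is not how Keras or ExKaldi implement them.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Per-frame cross-entropy between integer labels and unnormalized logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def frame_accuracy(labels, logits):
    """Fraction of frames whose highest-scoring class equals the label."""
    return float((logits.argmax(axis=1) == labels).mean())

logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 1.5, 0.3]])
labels = np.array([0, 2, 0])

print(sparse_softmax_cross_entropy(labels, logits).mean())
print(frame_accuracy(labels, logits))
```

In this toy batch the first two frames are classified correctly and the third is not, so the accuracy is two thirds.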


Save the model to a file.

In [24]:
tfModelFile = os.path.join(outDir, "dnn.h5")

model.save(tfModelFile, include_optimizer=False)



Now we compute the network output for the test data, which will be used for decoding. We apply the same processing as for the training features.

In [25]:
testFeatFile = os.path.join(dataDir, "exp", "test_mfcc_cmvn.ark")
testFeat = exkaldi.load_feat(testFeatFile)
testFeat = testFeat.add_delta(order=2).splice(left=1,right=1)
testFeat = testFeat.to_numpy()
testFeat = testFeat.normalize(std=True)

testFeat.dim

117

In [26]:
prob = {}
for utt, mat in testFeat.items():
    logits = model(mat, training=False)
    prob[utt] = logits.numpy()

prob = exkaldi.load_prob(prob)

prob

<exkaldi.core.archive.NumpyProb at 0x7f09c58c25e0>

___prob___ is an exkaldi __NumpyProb__ object. Save it to a file. We will decode it in the next step.

In [27]:
probFile = os.path.join(outDir, "amp.npy")

prob.save(probFile)


'librispeech_dummy/exp/train_DNN/amp.npy'
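
Note that the last layer of the model has no activation, so the saved matrices hold raw logits rather than probabilities. If a later step expects log-probabilities, they could be obtained with a log-softmax. The following NumPy sketch is an assumption about what such a conversion might look like, not a description of what ExKaldi's decoder requires; __log_softmax__ and the toy logits are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Convert unnormalized logits into log-probabilities, row by row."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

rng = np.random.default_rng(0)
toy_logits = rng.standard_normal((4, 10))   # 4 frames, 10 classes
log_probs = log_softmax(toy_logits)

# Exponentiating each row gives a probability distribution that sums to one.
row_sums = np.exp(log_probs).sum(axis=1)
print(row_sums)
```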