# Welcome to ExKaldi

In this section, we will train a DNN acoustic model with __TensorFlow 2.x__.

To run this step, please install TensorFlow first.  
In this tutorial, we will write a custom training loop instead of using "__fit__".

In [None]:
import exkaldi

import os
dataDir = "librispeech_dummy"

We use Keras to build and train the model.

In [None]:
import tensorflow as tf
from tensorflow import keras
import random
import datetime
import numpy as np

Fix the random seed.

In [None]:
seed = 1
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
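
The effect of fixing a seed can be checked in isolation. The following is a minimal sketch that uses only `random` and NumPy (TensorFlow is left out to keep it self-contained); the helper name `draw_numbers` is hypothetical and not part of exkaldi.

```python
import random

import numpy as np


def draw_numbers(seed):
    """Reseed both generators, then draw one value from each."""
    random.seed(seed)
    np.random.seed(seed)
    return random.random(), float(np.random.rand())


# Reseeding with the same value reproduces the same draws.
first = draw_numbers(1)
second = draw_numbers(1)
print(first == second)  # → True
```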

### Prepare Dataset

Load the training features.

In [None]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")
feat = exkaldi.load_feat(featFile)
feat = feat.add_delta(order=2)
feat = feat.splice(left=1,right=1)
feat = feat.to_numpy()

feat.dim

___feat___ is an exkaldi __NumpyFeat__ object.

This feature is made following these steps:

    compute mfcc (13) >> apply CMVN (13) >> add 2 order deltas (39) >> splice 1-1 frames (117)
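
The dimension arithmetic above (13 → 39 → 117) can be reproduced with plain NumPy. This is only a sketch under stated assumptions: `add_simple_deltas` uses `np.gradient` as a simplified stand-in for Kaldi's regression-based deltas, `splice` assumes that edge frames are repeated at utterance boundaries, and both function names are hypothetical rather than exkaldi APIs.

```python
import numpy as np


def add_simple_deltas(feats, order=2):
    """Append `order` successive time derivatives (simplified deltas)."""
    blocks = [feats]
    for _ in range(order):
        blocks.append(np.gradient(blocks[-1], axis=0))
    return np.concatenate(blocks, axis=1)


def splice(frames, left=1, right=1):
    """Concatenate each frame with its neighbours; edges repeat the first/last frame."""
    num_frames = frames.shape[0]
    padded = np.concatenate(
        [np.repeat(frames[:1], left, axis=0), frames, np.repeat(frames[-1:], right, axis=0)],
        axis=0,
    )
    windows = [padded[offset:offset + num_frames] for offset in range(left + right + 1)]
    return np.concatenate(windows, axis=1)


mfcc = np.random.rand(5, 13)               # 5 frames of 13-dim MFCC (random stand-in)
with_deltas = add_simple_deltas(mfcc, 2)   # 13 * (2 + 1) = 39
spliced = splice(with_deltas, 1, 1)        # 39 * (1 + 1 + 1) = 117
print(mfcc.shape, with_deltas.shape, spliced.shape)  # → (5, 13) (5, 39) (5, 117)
```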

We additionally apply global standard normalization to it.

In [None]:
feat = feat.normalize(std=True)
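
As an illustration of what global standard normalization means, here is a minimal NumPy sketch. It assumes that "global" refers to per-dimension mean and standard deviation computed over all frames of all utterances; whether `normalize(std=True)` does exactly this has not been verified here, and `global_normalize` is a hypothetical helper.

```python
import numpy as np


def global_normalize(feats, eps=1e-8):
    """Normalize every utterance with one mean/std computed over all frames."""
    all_frames = np.concatenate(list(feats.values()), axis=0)
    mean = all_frames.mean(axis=0)
    std = all_frames.std(axis=0)
    return {utt: (mat - mean) / (std + eps) for utt, mat in feats.items()}


rng = np.random.default_rng(0)
toy_feats = {
    "utt1": rng.normal(3.0, 2.0, size=(40, 4)),
    "utt2": rng.normal(3.0, 2.0, size=(60, 4)),
}
normalized = global_normalize(toy_feats)

# After normalization, the pooled frames have roughly zero mean and unit std.
stacked = np.concatenate(list(normalized.values()), axis=0)
print(np.allclose(stacked.mean(axis=0), 0.0, atol=1e-6))
print(np.allclose(stacked.std(axis=0), 1.0, atol=1e-6))
```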

Then we load the alignment data. It was generated in an earlier step (07_train_triphone_HMM-GMM_delta).

We will use pdf-IDs as the target labels. In exkaldi, transition-IDs and phone-IDs can also be extracted for multiple tasks.

In [None]:
aliFile = os.path.join(dataDir, "exp", "train_delta", "final.ali")
hmmFile = os.path.join(dataDir, "exp", "train_delta", "final.mdl")

ali = exkaldi.load_ali(aliFile)

ali = ali.to_numpy(aliType="pdfID", hmm=hmmFile)

ali

The alignment will serve as the label data for training the NN model.
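
In a Kaldi-style HMM, every transition-ID belongs to exactly one pdf-ID, and several transition-IDs can share the same pdf-ID, so converting an alignment is essentially a table lookup. The sketch below illustrates the idea with a made-up table; in practice the mapping comes from the model file, and `TRANS_TO_PDF` and `transition_to_pdf` are hypothetical names.

```python
import numpy as np

# Hypothetical lookup table: index = transition-ID, value = pdf-ID.
# Index 0 is unused because transition-IDs start from 1.
TRANS_TO_PDF = np.array([-1, 0, 0, 1, 1, 2, 2])


def transition_to_pdf(trans_ali):
    """Map a frame-level transition-ID alignment to pdf-IDs via table lookup."""
    return TRANS_TO_PDF[np.asarray(trans_ali)]


toy_ali = [1, 2, 2, 3, 4, 4, 5, 6]
print(transition_to_pdf(toy_ali).tolist())  # → [0, 0, 0, 1, 1, 1, 2, 2]
```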

Then we pair the features and alignments to generate a dataset for the deep learning framework. We use the __tuple_dataset(...)__ function to group them. 

Note that this function groups the archives by their names, so please ensure the names are valid Python identifiers (that is, only lowercase and uppercase letters, digits, and underscores are allowed).
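
A name can be checked before calling `rename`. The helper below is a hypothetical sketch, not an exkaldi function. It also rejects Python keywords and names with a leading underscore, because `collections.namedtuple` does not accept either as a field name.

```python
import keyword
import re

# Letters, digits, and underscores only; the first character must not be a digit.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_archive_name(name):
    """Return True if `name` can be used as a namedtuple field name."""
    return (
        _NAME_PATTERN.fullmatch(name) is not None
        and not keyword.iskeyword(name)
        and not name.startswith("_")
    )


for candidate in ["mfcc", "pdfID", "mfcc-cmvn", "2nd_feat", "class"]:
    print(candidate, is_valid_archive_name(candidate))
```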

In [None]:
feat.rename("mfcc")
ali.rename("pdfID")

dataset = exkaldi.tuple_dataset([feat,ali], frameLevel=True)

datasetSize = len(dataset)
datasetSize

___dataset___ is a list whose members are namedtuples. For example:

In [None]:
oneRecord = dataset[0]

oneRecord

Use the field name to get the specified data.

In [None]:
oneRecord.pdfID

If you train a sequential NN model, you may want to tuple the archive data at the __utterance level__ rather than the __frame level__. To do so, change the tuple mode (the `frameLevel` argument used above). 
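
The difference between the two modes can be sketched with a toy re-implementation. Everything here is an assumption for illustration: the `Record` layout, the `tuple_toy_dataset` helper, and its `frame_level` flag are hypothetical, and the fields of exkaldi's own namedtuples may differ.

```python
from collections import namedtuple

import numpy as np

# Hypothetical record layout; exkaldi's own namedtuples may carry other fields.
Record = namedtuple("Record", ["key", "mfcc", "pdfID"])


def tuple_toy_dataset(feats, alis, frame_level=True):
    """Pair features and labels per frame or per utterance."""
    records = []
    for utt in sorted(feats.keys() & alis.keys()):
        mat, labels = feats[utt], alis[utt]
        if frame_level:
            # One record per frame: a (1, dim) feature slice and a scalar label.
            for t in range(mat.shape[0]):
                records.append(Record(utt, mat[t:t + 1], labels[t]))
        else:
            # One record per utterance: the whole matrix and label sequence.
            records.append(Record(utt, mat, labels))
    return records


feats = {"utt1": np.zeros((3, 117)), "utt2": np.zeros((2, 117))}
alis = {"utt1": np.array([4, 4, 7]), "utt2": np.array([1, 9])}

frame_records = tuple_toy_dataset(feats, alis, frame_level=True)
utt_records = tuple_toy_dataset(feats, alis, frame_level=False)
print(len(frame_records), len(utt_records))  # → 5 2
```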

You can tuple all kinds of exkaldi archives, such as features, CMVN statistics, alignments, probabilities, and transcriptions. Even different features (such as MFCC and fBank) and different alignments (such as pdf-ID and phone-ID) can be grouped together. For example, suppose we now want to train on multiple tasks. 

In [None]:
ali2 = exkaldi.load_ali(aliFile)

ali2 = ali2.to_numpy(aliType="phoneID", hmm=hmmFile)

ali2.rename("phoneID")

dataset2 = exkaldi.tuple_dataset([feat,ali,ali2], frameLevel=True)

In [None]:
dataset2[0]

In [None]:
del ali2
del dataset2

### Training

Now we start to train the DNN acoustic model. First, design a data iterator over the dataset we prepared.

In [None]:
featureDim = feat.dim
pdfClasses = exkaldi.hmm.load_hmm(hmmFile,hmmType="tri").info.pdfs

del ali
del feat

In [None]:
def data_generater(dataset):

    length = len(dataset)
    while True:
        index = 0
        random.shuffle(dataset)
        while index < length:
            one = dataset[index]
            index += 1
            yield (one.mfcc[0], one.pdfID)

In [None]:
batchSize = 64
tf_datasets = tf.data.Dataset.from_generator(
                                 lambda : data_generater(dataset),
                                 (tf.float32, tf.int32)
                            ).batch(batchSize).prefetch(3)
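
Because the generator never terminates, the training loop has to decide where an epoch ends. The following standard-library sketch shows the same pattern on a toy list. It is an illustration, not the tutorial's pipeline: `endless_shuffled` is a hypothetical name, and unlike the generator above it shuffles a private copy instead of the caller's list.

```python
import itertools
import random


def endless_shuffled(records, rng):
    """Yield records forever, reshuffling at the start of every pass."""
    records = list(records)  # copy so the caller's list is left untouched
    while True:
        rng.shuffle(records)
        yield from records


records = list(range(10))
batch_size = 4
# Floor division: 10 // 4 = 2 full batches; leftover items spill into the next pass.
steps_per_epoch = len(records) // batch_size

stream = endless_shuffled(records, random.Random(1))
for step in range(steps_per_epoch):
    batch = list(itertools.islice(stream, batch_size))
    print(step, batch)
```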

Then define a simple dense model.

In [None]:
def make_DNN_model(inputsShape, classes):
    
    inputs = keras.Input(inputsShape)
    h1 = keras.layers.Dense(256, activation="relu", kernel_initializer="he_normal")(inputs)
    h1_bn = keras.layers.BatchNormalization()(h1)
    
    h2 = keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal")(h1_bn)
    h2_bn = keras.layers.BatchNormalization()(h2)
    
    h3 = keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal")(h2_bn)
    h3_bn = keras.layers.BatchNormalization()(h3)
    
    outputs = keras.layers.Dense(classes, use_bias=False)(h3_bn)
    
    return keras.Model(inputs, outputs)

In [None]:
model = make_DNN_model((featureDim,), pdfClasses)

model.summary()
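
The total parameter count reported by `model.summary()` can be cross-checked by hand. The sketch below assumes the 256-512-512 layout defined above, with four values per unit in each batch normalization layer (two trainable, two non-trainable) and no bias in the output layer. The helper names are hypothetical, and 100 output classes is only a placeholder for `pdfClasses`.

```python
def dense_params(inputs, units, use_bias=True):
    """Weights plus optional biases of a fully connected layer."""
    return inputs * units + (units if use_bias else 0)


def batch_norm_params(units):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * units


def count_dnn_params(feature_dim, classes):
    """Total parameters of a 256-512-512 dense network with a bias-free output layer."""
    total, width = 0, feature_dim
    for units in (256, 512, 512):
        total += dense_params(width, units) + batch_norm_params(units)
        width = units
    return total + dense_params(width, classes, use_bias=False)


# 117 is the spliced feature dimension; 100 classes is a hypothetical placeholder.
print(count_dnn_params(117, 100))  # → 480768
```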

Here are the optimizer and metrics.

In [None]:
optimizer = keras.optimizers.Adam(0.001)

losses = keras.metrics.Mean(name="train/loss", dtype=tf.float32)
accs = keras.metrics.Mean(name="train/accuracy", dtype=tf.float32)
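
A mean metric keeps a running average of every value it has received until its state is reset. The pure-Python class below is a simplified stand-in that shows this behavior; `RunningMean` is a hypothetical name, not a Keras class.

```python
class RunningMean:
    """Accumulate a sum and a count; a simplified stand-in for a mean metric."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total = 0.0
        self.count = 0

    def update(self, values):
        values = list(values)
        self.total += sum(values)
        self.count += len(values)

    def result(self):
        return self.total / self.count if self.count else 0.0


metric = RunningMean()
metric.update([1.0, 3.0])      # first batch
metric.update([5.0])           # second batch
print(metric.result())         # → 3.0
metric.reset()                 # start a fresh average, e.g. for a new epoch
metric.update([2.0])
print(metric.result())         # → 2.0
```

When training for more than one epoch, a metric that is never reset reports the average over all epochs so far rather than over the current epoch only.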

Specify the output directory. You can use TensorBoard to check the training results.

In [None]:
outDir = os.path.join(dataDir, "exp", "train_DNN")

stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logDir = os.path.join(outDir, "log", stamp)
file_writer = tf.summary.create_file_writer(logDir)

In [None]:
epochs = 1

epoch_iterations = datasetSize//batchSize

epoch_iterations

To print a progress bar and control when each epoch ends, we will use the __tqdm__ package.

In [None]:
! pip install tqdm 2>/dev/null

In [None]:
from tqdm import tqdm

Start training the model. During the training loop, you can use TensorBoard to visualize the training results.

```
tensorboard --logdir=./librispeech_dummy/exp/train_DNN/log --bind_all
```

For simplicity, we do not validate the model during training, but in a real case you should.

In [None]:
with file_writer.as_default():
    
    for epoch in range(epochs):
        
        for batch,i in zip(tf_datasets, tqdm(range(epoch_iterations))):
            data, label = batch
            
            with tf.GradientTape() as tape:
                # Only the forward pass and the loss need to be recorded on the tape.
                logits = model(data, training=True)
                loss = keras.losses.sparse_categorical_crossentropy(label, logits, from_logits=True)

            # Compute and apply the gradients outside the tape context.
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))
            losses(loss)

            # Frame-level accuracy of the current batch.
            pred = keras.backend.argmax(logits, axis=1)

            acc = exkaldi.nn.accuracy(label.numpy(), pred.numpy())
            accs(acc.accuracy)
        
            #if int(optimizer.iterations.numpy()) % epoch_iterations == 0:     #<<<< if you don't use tqdm
            #    break
        
        current_loss = losses.result()
        current_acc = accs.result()
        tf.print( f"Epoch {epoch}", f" Loss {current_loss:.6f}", f" Acc {current_acc:.6f}")

        tf.summary.scalar("train/loss", data=current_loss, step=epoch)
        tf.summary.scalar("train/accuracy", data=current_acc, step=epoch)

    tf.print( "Training Done" )
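
To make the quantities tracked in the loop concrete, here is a NumPy sketch of the per-frame cross-entropy computed from logits and of frame accuracy. It is an illustration only, not the Keras or exkaldi implementation, and both function names are hypothetical.

```python
import numpy as np


def sparse_cross_entropy_from_logits(labels, logits):
    """Per-frame negative log-likelihood of the target class."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def frame_accuracy(labels, logits):
    """Fraction of frames whose arg-max class equals the label."""
    return float((logits.argmax(axis=1) == labels).mean())


logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 1.0, 3.0]])
labels = np.array([0, 1, 1])

frame_losses = sparse_cross_entropy_from_logits(labels, logits)
print(frame_losses.round(4))
print(frame_accuracy(labels, logits))
```

For the second frame the logits are uniform over three classes, so its loss equals log(3) regardless of the label.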

Save the model to a file.

In [None]:
tfModelFile = os.path.join(outDir, "dnn.h5")

model.save(tfModelFile, include_optimizer=False)

Now we compute the network outputs on the test data for decoding. We apply the same processing as for the training features.

In [None]:
testFeatFile = os.path.join(dataDir, "exp", "test_mfcc_cmvn.ark")
testFeat = exkaldi.load_feat(testFeatFile)
testFeat = testFeat.add_delta(order=2).splice(left=1,right=1)
testFeat = testFeat.to_numpy()
testFeat = testFeat.normalize(std=True)

testFeat.dim

In [None]:
prob = {}
for utt, mat in testFeat.items():
    logits = model(mat, training=False)
    prob[utt] = logits.numpy()

prob = exkaldi.load_prob(prob)

prob

___prob___ is an exkaldi __NumpyProb__ object. Save it to a file; we will decode it in the next step.

In [None]:
probFile = os.path.join(outDir, "amp.npy")

prob.save(probFile)