# 6.3: Inspecting the `exp` directories

`run_train_phones.sh` generates a new directory in `exp` for each layer of the acoustic model.  We will inspect their contents below.

In [1]:
ls exp

monophones          triphones_aligned      triphones_sat
monophones_aligned  triphones_lda          triphones_sat_aligned
triphones           triphones_lda_aligned


## `exp/monophones`

This directory contains the files generated from the *first* layer of training: `monophones`.

In [2]:
ls exp/monophones

0.mdl     ali.2.gz   final.mdl   fsts.3.gz               num_jobs  tree.png
40.mdl    ali.3.gz   final.occs  fsts.4.gz               tree      tree.ps
40.occs   ali.4.gz   fsts.1.gz   kaldi_config_args.json  tree.dot
ali.1.gz  cmvn_opts  fsts.2.gz   log                     tree.jpg


### `exp/monophones/log`

This directory contains the logs of every step run while training `monophones`.  Each log name carries one or two numeric suffixes before `.log`, as in `acc.26.3.log`.  The **last** suffix identifies the parallel job (thread) that wrote the log.  The **first** suffix, when there are two, identifies a particular iteration (for those steps that are iterative).  Some of these logs are more useful than others, but they are **always** useful when an error occurs.

In [3]:
ls exp/monophones/log

acc.1.1.log   acc.26.3.log  acc.8.1.log     align.4.3.log
acc.1.2.log   acc.26.4.log  acc.8.2.log     align.4.4.log
acc.1.3.log   acc.27.1.log  acc.8.3.log     align.5.1.log
acc.1.4.log   acc.27.2.log  acc.8.4.log     align.5.2.log
acc.10.1.log  acc.27.3.log  acc.9.1.log     align.5.3.log
acc.10.2.log  acc.27.4.log  acc.9.2.log     align.5.4.log
acc.10.3.log  acc.28.1.log  acc.9.3.log     align.6.1.log
acc.10.4.log  acc.28.2.log  acc.9.4.log     align.6.2.log
acc.11.1.log  acc.28.3.log  align.0.1.log   align.6.3.log
acc.11.2.log  acc.28.4.log  align.0.2.log   align.6.4.log
acc.11.3.log  acc.29.1.log  align.0.3.log   align.7.1.log
acc.11.4.log  acc.29.2.log  align.0.4.log   align.7.2.log
acc.12.1.log  acc.29.3.log  align.1.1.log   align.7.3.log
acc.12.2.log  acc.29.4.log  align.1.2.log   align.7.4.log
acc.12.3.log  acc.3.1.log   align.1.3.log   align.8.1.log
acc.12.4.log  acc.3.2.log   align.1.4.log   align.8.2.log
acc.13.1.log  acc.3.3.log   align.10.1.log  align.8.3.log
acc.13.2.log  
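
The naming convention described above can be unpacked mechanically. The following is a minimal sketch, assuming `bash`; `parse_log_name` is a hypothetical helper written for this illustration and is not part of `kaldi`.

```shell
# Hypothetical helper: split <step>.<iteration>.<job>.log (or <step>.<job>.log)
# into its parts using plain bash string handling.
parse_log_name() {
    local name="${1##*/}"     # drop any leading directory
    name="${name%.log}"       # drop the .log extension
    local step first second
    IFS=. read -r step first second <<< "$name"
    if [ -z "$second" ]; then
        # Only one numeric suffix: treat it as the job index.
        echo "step=$step iteration=- job=${first:--}"
    else
        echo "step=$step iteration=$first job=$second"
    fi
}

parse_log_name exp/monophones/log/acc.26.3.log   # prints: step=acc iteration=26 job=3
```

When an error occurs, this makes it easier to tell which iteration and which job produced the log you are reading.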

### `num_jobs`

There will often be a `num_jobs` file in `kaldi` directories.  It contains a single integer: the number of parallel jobs (threads) the step was split across.

In [4]:
cat exp/monophones/num_jobs

4
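
The value in `num_jobs` tells you how many per-job files (such as `ali.*.gz`) to expect. Here is a small sketch of reading it and looping over the job indexes; it uses a temporary stand-in directory (`demo_dir`, an assumption for illustration) so that it runs without a trained model.

```shell
# Stand-in for exp/monophones so the sketch is self-contained.
demo_dir=$(mktemp -d)
echo 4 > "$demo_dir/num_jobs"

# Read the job count, then build the list of per-job alignment files.
nj=$(cat "$demo_dir/num_jobs")
expected=""
for job in $(seq 1 "$nj"); do
    expected="${expected:+$expected }ali.$job.gz"
done
echo "$expected"   # prints: ali.1.gz ali.2.gz ali.3.gz ali.4.gz

rm -r "$demo_dir"
```

Pointing `demo_dir` at `exp/monophones` (and dropping the `echo 4` and `rm` lines) would apply the same loop to the real directory.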


### `cmvn_opts`

You will often see a file ending in `_opts`.  This is an options file that *sometimes* contains hyperparameter settings that will be read by scripts.  Its contents take the same format as the `non_vanilla_*` arguments we can set in `kaldi_config.json`:

```
--variable_name [variable_value]
```

In this case, `cmvn_opts` is empty.


In [5]:
cat exp/monophones/cmvn_opts
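
To make the role of an `_opts` file concrete, here is a hedged sketch of how a script might read one and splice its contents into a command line. The file content (`--norm-vars=true`) and the `<remaining-arguments>` placeholder are assumptions for illustration; the command is only printed, never run.

```shell
# Temporary stand-in files so the sketch runs anywhere.
demo_dir=$(mktemp -d)
echo "--norm-vars=true" > "$demo_dir/cmvn_opts"   # illustrative content only
: > "$demo_dir/empty_opts"                        # an empty options file, as in our case

# Read the options; an empty file simply yields an empty string.
cmvn_opts=$(cat "$demo_dir/cmvn_opts")
empty_opts=$(cat "$demo_dir/empty_opts")

# Splice the options into the command line (printed here, not executed).
cmd="apply-cmvn $cmvn_opts <remaining-arguments>"
echo "$cmd"
echo "empty options file gives: '$empty_opts'"

rm -r "$demo_dir"
```

With an empty file, as in `exp/monophones/cmvn_opts`, nothing is added and the command runs with its defaults.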




### `{40,final}.occs`

This file contains the "per-transition-id occupation counts" and is "rarely needed" (quotes from a post by the main author of `kaldi`).  So we will ignore this file.  

In this case, you see a `40.occs` and a `final.occs`.  This implies that the file was updated iteratively, and that every iteration except the last (in this case, `40`) was deleted.  `final.occs` is then a symbolic link to the highest-numbered file left in the directory.  You can see this represented by the `->` in the output of the `ls -lah` command below.

**Note:** `kaldi` will utilize this structure often, including below with the `.mdl` files.

In [6]:
ls -lah exp/monophones | grep occs

-rw-r--r--  1 root root  811 Nov 29 20:12 40.occs
lrwxrwxrwx  1 root root    7 Nov 29 20:12 final.occs -> 40.occs
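
The `final.*` pattern is an ordinary symbolic link, which can be reproduced in a scratch directory. This is a minimal sketch; `demo_dir` and the empty `40.occs` file are stand-ins, not real model output.

```shell
# Recreate the layout in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/40.occs"
ln -s 40.occs "$demo_dir/final.occs"   # relative link, like the one kaldi leaves behind

# readlink prints the path the link points at.
target=$(readlink "$demo_dir/final.occs")
echo "final.occs -> $target"           # prints: final.occs -> 40.occs

rm -r "$demo_dir"
```

The size of `7` shown for the link in the listing above is the length of the target name `40.occs`, not the size of the file it points to.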


### `{40,final}.mdl`

The `.mdl` file is the actual acoustic model file for this step.  If we were so inclined, we could use this `.mdl` file as one of the arguments passed to our decoding step.  Each "layer" of our acoustic training will generate a `.mdl` file.

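This file stores the trained parameters of the model (the HMM transition model together with the Gaussian mixture parameters), and we'll look at these `.mdl` files in more detail in the next notebook.  In the meantime, the transition model inside it can be printed in "human-readable" form using `show-transitions` (as long as you have `source`d `path.sh`).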


In [7]:
. ${KALDI_INSTRUCTIONAL_PATH}/path.sh
show-transitions

show-transitions 

Print debugging info from transition model, in human-readable form
Usage:  show-transitions <phones-symbol-table> <transition/model-file> [<occs-file>]
e.g.: 
 show-transitions phones.txt 1.mdl 1.occs

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)



: 1

In [8]:
show-transitions \
    data/lang/phones.txt \
    exp/monophones/final.mdl \
    | head

show-transitions data/lang/phones.txt exp/monophones/final.mdl 
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
 Transition-id = 1 p = 0.825838 [self-loop]
 Transition-id = 2 p = 0.01 [0 -> 1]
 Transition-id = 3 p = 0.154166 [0 -> 2]
 Transition-id = 4 p = 0.01 [0 -> 3]
Transition-state 2: phone = SIL hmm-state = 1 pdf = 1
 Transition-id = 5 p = 0.951921 [self-loop]
 Transition-id = 6 p = 0.01 [1 -> 2]
 Transition-id = 7 p = 0.01 [1 -> 3]
 Transition-id = 8 p = 0.0280863 [1 -> 4]
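
One quick sanity check on this output is that the probabilities leaving each transition-state sum to roughly 1. The sketch below does that with `awk` over an excerpt of the output copied from above; `sum_transition_probs` is a hypothetical helper name.

```shell
# Hypothetical helper: sum the "p = ..." values under each Transition-state.
sum_transition_probs() {
    awk '
        /^Transition-state/ { state = $2; sub(":", "", state) }
        /Transition-id/ {
            # The probability is the second field after the literal "p".
            for (i = 1; i <= NF; i++) if ($i == "p") total[state] += $(i + 2)
        }
        END { for (s in total) printf "state %s: %.4f\n", s, total[s] }
    ' | sort
}

result=$(sum_transition_probs <<'EOF'
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
 Transition-id = 1 p = 0.825838 [self-loop]
 Transition-id = 2 p = 0.01 [0 -> 1]
 Transition-id = 3 p = 0.154166 [0 -> 2]
 Transition-id = 4 p = 0.01 [0 -> 3]
Transition-state 2: phone = SIL hmm-state = 1 pdf = 1
 Transition-id = 5 p = 0.951921 [self-loop]
 Transition-id = 6 p = 0.01 [1 -> 2]
 Transition-id = 7 p = 0.01 [1 -> 3]
 Transition-id = 8 p = 0.0280863 [1 -> 4]
EOF
)
echo "$result"
```

To check a whole model, the output of `show-transitions` could be piped into the same function instead of the here-document.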


### `fsts.*.gz`

These files (one for each parallelized thread) contain the `FST`s representing our training data.  We will look at similar `FST`s used during **test** time at a later date, so for now, we'll ignore these files.

### `ali.*.gz`

These files contain the alignment information mapping each frame to a phone.  You may recall that we used a similar `ali.*.gz` file in `4_3-examining_mfccs.ipynb`.  We can use `ali-to-phones` to convert these alignments into a sequence of phones.  We will look at these alignments in more detail later.

In [9]:
ali-to-phones

ali-to-phones 

Convert model-level alignments to phone-sequences (in integer, not text, form)
Usage:  ali-to-phones  [options] <model> <alignments-rspecifier> <phone-transcript-wspecifier|ctm-wxfilename>
e.g.: 
 ali-to-phones 1.mdl ark:1.ali ark:-
or:
 ali-to-phones --ctm-output 1.mdl ark:1.ali 1.ctm
See also: show-alignments lattice-align-phones

Options:
  --ctm-output                : If true, output the alignments in ctm format (the confidences will be set to 1) (bool, default = false)
  --frame-shift               : frame shift used to control the times of the ctm output (float, default = 0.01)
  --per-frame                 : If true, write out the frame-level phone alignment (else phone sequence) (bool, default = false)
  --write-lengths             : If true, write the #frames for each phone (different format) (bool, default = false)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help       

: 1

**Note:** Notice that they are `gzipped` (compressed).  So, in order to access the "actual" binary file, you'll need to decompress the file, either in a separate, initial step or via a `piped` step.  Below you can see how you can decompress "on-the-fly" using `gzip -cd`.

**Note:** You'll also notice we're piping the output through `int2sym.pl`, since the output of `ali-to-phones` consists of integer indexes.  This converts those indexes to their corresponding phone symbols.
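
The on-the-fly decompression from the first note can be tried on a toy file. This is only a sketch: the real `ali.*.gz` files hold `kaldi` binary archives, whereas the file here is a single line of made-up text.

```shell
# Create a small gzipped file to stand in for ali.1.gz.
demo_dir=$(mktemp -d)
printf 'utt1 1 1 2\n' | gzip -c > "$demo_dir/demo.gz"

# gzip -cd writes the decompressed content to stdout and leaves the .gz file in place.
first_line=$(gzip -cd "$demo_dir/demo.gz" | head -n1)
echo "$first_line"    # prints: utt1 1 1 2
ls "$demo_dir"        # prints: demo.gz
```

Because nothing is written to disk, the same compressed file can be read this way as many times as needed.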

In [10]:
ali-to-phones \
    --per-frame=true \
    exp/monophones/final.mdl \
    "ark:gzip -cd exp/monophones/ali.1.gz|" \
    "ark,t:|int2sym.pl -f 2- data/lang/phones.txt" \
    | head -n1

ali-to-phones --per-frame=true exp/monophones/final.mdl 'ark:gzip -cd exp/monophones/ali.1.gz|' 'ark,t:|int2sym.pl -f 2- data/lang/phones.txt' 
1272-128104-0009 SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL SIL HH_B HH_B HH_B HH_B HH_B HH_B HH_B HH_B HH_B HH_B HH_B HH_B HH_B HH_B HH_B IY1_E IY1_E IY1_E IY1_E IY1_E IY1_E IY1_E L_B L_B L_B L_B L_B L_B L_B L_B L_B L_B L_B L_B AH0_I AH0_I AH0_I AH0_I M_I M_I M_I M_I M_I M_I M_I M_I M_I M_I M_I EH1_I EH1_I EH1_I EH1_I EH1_I N_I N_I N_I N_I N_I N_I T_I T_I T_I S_E S_E S_E S_E S_E S_E S_E S_E S_E S_E S_E M_B M_B M_B M_B M_B M_B M_B M_B M_B M_B M_B OW1_I OW1_I OW1_I OW1_I OW1_I OW1_I OW1_I OW1_I OW1_I OW1_I OW1_I OW1_I OW1_I S_I S_I S_I S_I S_I S_I S_I S_I S_I S_I S_I S_I S_I S_I S_I T_E T_E T_E T_E T_E SIL SIL SIL SIL SIL SIL SIL SIL SIL B_B B_B B_B B_B B_B B_B B_B B_B IH1_I IH1_I IH

LOG (ali-to-phones[5.2.191~1-48be1]:main():ali-to-phones.cc:134) Done 68 utterances.
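
The per-frame output is long because every frame repeats its phone label. A compact view can be had by collapsing runs of identical labels into `phone:frames` pairs. The sketch below uses `awk` on a shortened, hand-made version of the line above; `collapse_frames` is a hypothetical helper, and the `--write-lengths` option listed in the help text above may offer a built-in alternative.

```shell
# Hypothetical helper: run-length encode the phone labels that follow the utterance id.
collapse_frames() {
    awk '{
        out = $1; prev = $2; n = 1
        for (i = 3; i <= NF; i++) {
            if ($i == prev) { n++ }
            else { out = out " " prev ":" n; prev = $i; n = 1 }
        }
        if (NF >= 2) out = out " " prev ":" n   # flush the final run
        print out
    }'
}

collapsed=$(echo "1272-128104-0009 SIL SIL SIL HH_B HH_B IY1_E" | collapse_frames)
echo "$collapsed"   # prints: 1272-128104-0009 SIL:3 HH_B:2 IY1_E:1
```

Multiplying each count by the frame shift (`0.01` by default, according to the help text) gives an approximate duration for each phone.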


### `tree`

This file is a representation of the decision tree that will be used to cluster the phones.  We will go into much more detail about this later and we will look at a visual representation of this tree generated by `draw-tree`.

In [11]:
draw-tree

draw-tree 

Outputs a decision tree description in GraphViz format
Usage: draw-tree [options] <phone-symbols> <tree>
e.g.: draw-tree phones.txt tree | dot -Gsize=8,10.5 -Tps | ps2pdf - tree.pdf

Options:
  --gen-html                  : generates HTML boilerplate(useful with SVG) (bool, default = false)
  --query                     : a query to trace through the tree(format: pdf-class/ctx-phone1/.../ctx-phoneN) (string, default = "")
  --use-tooltips              : use tooltips instead of labels (bool, default = false)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)



: 255

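The command below saves a `.jpg` of the tree to `exp/monophones/tree.jpg`, and the next cell renders that `.jpg` using `Markdown` (if you want to see how to render images in `Markdown`, click on the next cell and the `Markdown` command will be revealed).  Both `draw-tree` (available once `path.sh` has been `source`d) and `dot` (part of Graphviz) must be on your `PATH`; otherwise the cell fails with `command not found`, as in the output recorded below.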

In [1]:
draw-tree \
    data/lang/phones.txt \
    exp/monophones/tree \
    | dot -Tjpg -Gsize=8,10.5 > exp/monophones/tree.jpg

bash: draw-tree: command not found
bash: dot: command not found


: 127

![tree](exp/monophones/tree.jpg)

Obviously, it's not too easy to inspect in this form, but you can at least verify that the structure is, in fact, a tree.
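
Since the drawing step depends on two external programs, it can help to check for them before running it. The following is a small sketch using the shell built-in `command -v`; `missing_tools` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of any tools that are not on PATH.
missing_tools() {
    local tool missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
    done
    echo "$missing"
}

# Empty output means both tools were found.
missing_tools draw-tree dot
```

If `draw-tree` is reported, sourcing `path.sh` in the current session is the likely fix; if `dot` is reported, Graphviz probably needs to be installed.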

## `exp/monophones_aligned`

This directory contains the files generated by the alignment step that follows the *first* layer of training (`monophones`).  There are no new file types in this directory that we didn't already see in `monophones`.

In [34]:
ls exp/monophones_aligned

ali.1.gz  ali.3.gz  cmvn_opts  final.occs              log       tree
ali.2.gz  ali.4.gz  final.mdl  kaldi_config_args.json  num_jobs


## `exp/triphones`

This directory contains the files generated from the *second* layer of training: `triphones`.  The only file types here that were not present in `monophones` are `questions.{int,qst}`.

In [35]:
ls -lah exp/triphones

total 25M
drwxr-xr-x  3 root root 4.0K Nov 29 20:31 .
drwxr-xr-x 10 root root 4.0K Nov 29 21:10 ..
-rw-r--r--  1 root root 3.6M Nov 29 20:31 35.mdl
-rw-r--r--  1 root root  11K Nov 29 20:31 35.occs
-rw-r--r--  1 root root 427K Nov 29 20:31 ali.1.gz
-rw-r--r--  1 root root 447K Nov 29 20:31 ali.2.gz
-rw-r--r--  1 root root 414K Nov 29 20:31 ali.3.gz
-rw-r--r--  1 root root 400K Nov 29 20:31 ali.4.gz
-rw-r--r--  1 root root    1 Nov 29 20:25 cmvn_opts
lrwxrwxrwx  1 root root    6 Nov 29 20:31 final.mdl -> 35.mdl
lrwxrwxrwx  1 root root    7 Nov 29 20:31 final.occs -> 35.occs
-rw-r--r--  1 root root 4.7M Nov 29 20:27 fsts.1.gz
-rw-r--r--  1 root root 4.9M Nov 29 20:27 fsts.2.gz
-rw-r--r--  1 root root 4.6M Nov 29 20:27 fsts.3.gz
-rw-r--r--  1 root root 4.5M Nov 29 20:27 fsts.4.gz
-rw-r--r--  1 root root  910 Nov 29 20:31 kaldi_config_args.json

### `questions.{int,qst}`

These files are a representation of the "questions" asked by the `decision tree`.  In other words, these `questions` will represent the phones that we will cluster together.  In the `.int` file below, each line represents a cluster. (The `.qst` file is a `kaldi` `binary` file representing the same thing.)

We will look at the *quality* of these clustering decisions later.

**Note:** We had to `pipe` the file through `int2sym.pl` because the original file consisted of phone indexes.

In [18]:
cat exp/triphones/questions.int | int2sym.pl -f 1- data/lang/phones.txt | head -n3

SIL SIL_B SIL_E SIL_I SIL_S B_B B_E B_I B_S CH_B CH_E CH_I CH_S D_B D_E D_I D_S F_B F_E F_I F_S HH_B HH_E HH_I HH_S IY2_B IY2_E IY2_I IY2_S JH_B JH_E JH_I JH_S K_B K_E K_I K_S OW0_B OW0_E OW0_I OW0_S P_B P_E P_I P_S S_B S_E S_I S_S SH_B SH_E SH_I SH_S T_B T_E T_I T_S TH_B TH_E TH_I TH_S V_B V_E V_I V_S Z_B Z_E Z_I Z_S 
AA0_B AA0_E AA0_I AA0_S AA1_B AA1_E AA1_I AA1_S AA2_B AA2_E AA2_I AA2_S AE0_B AE0_E AE0_I AE0_S AE1_B AE1_E AE1_I AE1_S AE2_B AE2_E AE2_I AE2_S AH0_B AH0_E AH0_I AH0_S AH1_B AH1_E AH1_I AH1_S AH2_B AH2_E AH2_I AH2_S AO0_B AO0_E AO0_I AO0_S AO1_B AO1_E AO1_I AO1_S AO2_B AO2_E AO2_I AO2_S AW0_B AW0_E AW0_I AW0_S AW1_B AW1_E AW1_I AW1_S AW2_B AW2_E AW2_I AW2_S AY0_B AY0_E AY0_I AY0_S AY1_B AY1_E AY1_I AY1_S AY2_B AY2_E AY2_I AY2_S DH_B DH_E DH_I DH_S EH0_B EH0_E EH0_I EH0_S EH1_B EH1_E EH1_I EH1_S EH2_B EH2_E EH2_I EH2_S ER0_B ER0_E ER0_I ER0_S ER1_B ER1_E ER1_I ER1_S ER2_B ER2_E ER2_I ER2_S EY0_B EY0_E EY0_I EY0_S EY1_B EY1_E EY1_I EY1_S EY2_B EY2_E EY2_I EY2_S G_B G_E G_I
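
To see what the index-to-symbol mapping amounts to, here is a rough `awk` stand-in for `int2sym.pl` run against a toy symbol table. Both the helper name `int2sym_sketch` and the four-entry table (including its index values) are made up for illustration; the real script and the real `data/lang/phones.txt` should be used in practice.

```shell
# Toy symbol table in the "<symbol> <index>" layout; the entries are illustrative only.
demo_dir=$(mktemp -d)
printf '<eps> 0\nSIL 1\nSIL_B 2\nSIL_E 3\n' > "$demo_dir/phones.txt"

# Rough stand-in for int2sym.pl: replace every field that is a known index with its symbol.
int2sym_sketch() {
    awk 'NR == FNR { sym[$2] = $1; next }
         { for (i = 1; i <= NF; i++) if ($i in sym) $i = sym[$i]; print }' "$1" -
}

mapped=$(echo "1 2 3" | int2sym_sketch "$demo_dir/phones.txt")
echo "$mapped"   # prints: SIL SIL_B SIL_E

rm -r "$demo_dir"
```

Each line of `questions.int` is a list of such indexes, so mapping every field turns a line into one readable cluster of phones.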

## `exp/triphones_aligned`

As was the case with `monophones_aligned`, this directory contains the files generated from the alignment step of the `triphones` layer of our model.  There are no new file types in this directory that we didn't already see in `triphones`.

In [30]:
ls exp/triphones_aligned

ali.1.gz  ali.3.gz  cmvn_opts  final.occs              log       tree
ali.2.gz  ali.4.gz  final.mdl  kaldi_config_args.json  num_jobs


## `exp/triphones_lda`

This directory contains the files generated from the *third* layer of training: `LDA_MLLT` over `triphones`.

In [32]:
ls -lah exp/triphones_lda

total 30M
drwxr-xr-x  3 root root 4.0K Nov 29 20:51 .
drwxr-xr-x 10 root root 4.0K Nov 29 21:10 ..
-rw-r--r--  1 root root  19K Nov 29 20:43 0.mat
-rw-r--r--  1 root root  19K Nov 29 20:47 12.mat
-rw-r--r--  1 root root 6.3K Nov 29 20:47 12.mat.new
-rw-r--r--  1 root root  19K Nov 29 20:45 2.mat
-rw-r--r--  1 root root 6.3K Nov 29 20:45 2.mat.new
-rw-r--r--  1 root root 7.1M Nov 29 20:51 35.mdl
-rw-r--r--  1 root root  16K Nov 29 20:51 35.occs
-rw-r--r--  1 root root  19K Nov 29 20:46 4.mat
-rw-r--r--  1 root root 6.3K Nov 29 20:46 4.mat.new
-rw-r--r--  1 root root  19K Nov 29 20:46 6.mat
-rw-r--r--  1 root root 6.3K Nov 29 20:46 6.mat.new
-rw-r--r--  1 root root 460K Nov 29 20:50 ali.1.gz
-rw-r--r--  1 root root 480K Nov 29 20:50 ali.2.gz
-rw-r--r--  1 root root 445K Nov 29 20:50 ali.3.gz
-rw-r--r--  1 root root 430K Nov 29 20:50 ali.4.gz
-rw-r--r--  1 root root    1 Nov 29 20:43 cmvn_opts
lrwxrwxrwx  1 root r

### `*.mat`

The only file type we haven't seen before is `.mat`, which is short for `matrix`.  These files simply represent the `matrix` required to perform the `LDA` operation on our existing data.  And, like we have already seen, the `final.mat` is a symbolic link to the largest-indexed `.mat` file (in this case `12.mat`).

We can use `copy-matrix` with the `--binary=false` flag to convert this `kaldi` `binary` into text form.

In [41]:
copy-matrix

copy-matrix 

Copy matrices, or archives of matrices (e.g. features or transforms)
Also see copy-feats which has other format options

Usage: copy-matrix [options] <matrix-in-rspecifier> <matrix-out-wspecifier>
  or: copy-matrix [options] <matrix-in-rxfilename> <matrix-out-wxfilename>
 e.g.: copy-matrix --binary=false 1.mat -
   copy-matrix ark:2.trans ark,t:-
See also: copy-feats

Options:
  --apply-exp                 : This option can be used to apply exp on the matrices (bool, default = false)
  --apply-log                 : This option can be used to apply log on the matrices. Must be avoided if matrix has negative quantities. (bool, default = false)
  --apply-power               : This option can be used to apply a power on the matrices (float, default = 1)
  --apply-softmax-per-row     : This option can be used to apply softmax per row of the matrices (bool, default = false)
  --binary                    : Write in binary mode (only relevant if output is a wxfilename) (bool, def

: 1

In [47]:
copy-matrix \
    --binary=false \
    exp/triphones_lda/final.mat \
    "| head -n3"

copy-matrix --binary=false exp/triphones_lda/final.mat '| head -n3' 
 [
  0.07149278 0.008385221 -0.01232026 -7.043179e-05 -0.002204495 0.005338904 0.002215973 0.003546142 0.0005242043 -0.0009295529 -0.001774614 -0.0007837248 0.000380595 -0.0144319 0.002918942 -0.003731522 -0.000601034 -0.002036393 0.0014272 0.002937004 0.002266583 -7.352007e-05 0.001647709 -0.0002968625 0.0003556351 -0.0001980344 0.05281531 0.009069759 -0.008264982 -0.002705297 -0.001498894 0.001901628 0.001680963 0.002671708 0.0006718803 0.001273278 -0.0006866669 -0.0006726775 -0.0002037716 0.04388294 0.01003785 -0.008299275 -0.003537703 -0.001521669 0.0003429263 0.002231088 0.002696633 0.0008015841 0.001603368 0.001965621 0.0002240839 -0.0002978679 0.09146843 0.01477116 -0.0123357 -0.003690582 -0.00179056 -0.0007164379 0.001799415 0.001465623 0.002091999 0.002410769 0.001265119 0.0002804297 -0.0003286418 0.03134751 0.009644954 -0.009132599 -0.001378151 -0.001029731 -0.0003904275 0.001487889 0.001683231 0.001501346 0

This is not very helpful to us, but we can make sure the matrix dimensions match up with our expectations using `matrix-dim`.

In [42]:
matrix-dim

matrix-dim 

Print dimension info on an input matrix (rows then cols, separated by tab), to
standard output.  Output for single filename: rows[tab]cols.  Output per line for
archive of files: key[tab]rows[tab]cols
Usage: matrix-dim [options] <matrix-in>|<in-rspecifier>
e.g.: matrix-dim final.mat | cut -f 2
See also: feat-to-len, feat-to-dim

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)



: 1

In [43]:
matrix-dim exp/triphones_lda/final.mat

matrix-dim exp/triphones_lda/final.mat 
40	117
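
One plausible reading of these dimensions, stated as an assumption rather than something this notebook confirms: if the input features are 13-dimensional MFCCs and each frame is spliced with 4 frames of context on either side, the spliced vector has 117 dimensions, which the transform then reduces to 40. The arithmetic can be checked directly:

```shell
# Assumed values (not taken from this notebook): 13 MFCCs, 4 frames of context per side.
feat_dim=13
left_context=4
right_context=4

# Splicing concatenates the centre frame with its left and right neighbours.
spliced_dim=$(( feat_dim * (left_context + 1 + right_context) ))
echo "$spliced_dim"   # prints: 117
```

The splicing settings actually used should be recorded in the `splice_opts` file that appears in the `*_lda*` directories, so that is the place to confirm the context width.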


## `exp/triphones_lda_aligned`

As was the case with the other `*_aligned` directories, this directory contains the files generated from the alignment step of the `LDA` layer of our model.  There is only one new file type we haven't seen before: `trans.*`

In [48]:
ls exp/triphones_lda_aligned

ali.1.gz  cmvn_opts   fsts.1.gz  full.mat                splice_opts  trans.4
ali.2.gz  final.mat   fsts.2.gz  kaldi_config_args.json  trans.1      tree
ali.3.gz  final.mdl   fsts.3.gz  log                     trans.2
ali.4.gz  final.occs  fsts.4.gz  num_jobs                trans.3
