# 6.4: Examining the acoustic models

In [1]:
. ${KALDI_INSTRUCTIONAL_PATH}/path.sh

## understanding the `tree` file

The `tree` file is a binary representation of the decision tree built during acoustic training.  This `tree` records which context-dependent phone states were clustered together (*i.e.* "state-tied") to reduce the number of distinct distributions we need to model.

In the end, each `leaf` of the tree represents a probability distribution (`pdf`), and the `tree`'s job is to decide which phone contexts can appropriately be grouped together.

### `tree-info`

We can use `tree-info` to get a few useful pieces of information about the decision tree.

In [10]:
tree-info

tree-info 

Print information about decision tree (mainly the number of pdfs), to stdout
Usage:  tree-info <tree-in>

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)




In [8]:
tree-info exp/monophones/tree

tree-info exp/monophones/tree 
num-pdfs 212
context-width 1
central-position 0


`num-pdfs` is the number of distributions that we end up with in the `tree`.  Remember, these are the `leaves`.  

`context-width` is the total number of phones the tree looks at, including the phone being modeled, and `central-position` says which of those is the "central" phone (the others are "context" phones).  Since the tree above was built from `monophones`, the width is `1` and the `central-position` is `0`: there is no context at all.  Compare this to the `tree` from the `triphones`:

In [11]:
tree-info exp/triphones/tree

tree-info exp/triphones/tree 
num-pdfs 2678
context-width 3
central-position 1


Now we have a `width` of 3, meaning one phone of "context" on each side, and thus the `central-position` is the second phone (remember, we're using `0-indexing` here).
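
To make the relationship between the two fields concrete, here is a small sketch (not a Kaldi tool) that reads the `key value` lines printed by `tree-info` and reports how many context phones sit on each side of the central phone. The function name `tree_context` is hypothetical, and the sample input is the triphone output shown above.

```bash
# Hypothetical helper: summarize the context window described by tree-info output.
# Reads "key value" lines on stdin; only context-width and central-position are used.
tree_context() {
  awk '
    $1 == "context-width"    { width = $2 }
    $1 == "central-position" { center = $2 }
    END {
      # Phones before the central one are left context, phones after it are right context.
      printf "left=%d right=%d\n", center, width - 1 - center
    }
  '
}

# Sample copied from the tree-info exp/triphones/tree output above.
tree_context <<'EOF'
num-pdfs 2678
context-width 3
central-position 1
EOF
```

With Kaldi on the `PATH`, `tree-info exp/triphones/tree | tree_context` should print the same `left=1 right=1`, and the monophone tree (`context-width 1`, `central-position 0`) gives `left=0 right=0`.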

Unsurprisingly, the `triphone` `tree` also has significantly more `pdf`s than the `monophone` one (2678 vs. 212).

Because every combination of a phone with one context phone on each side has to be modeled, we start with `num_phones^3` possible combinations.

In [18]:
cat raw_data/librispeech-phones.txt | wc -l

70


Since we started off with 70 possible phones, there could be up to 70^3 = 343000 distinct triphones.  The purpose of the decision tree is to cluster together as many of these as possible **without losing our modeling power**.  In our case, the `triphones` `tree` reduced that to just 2678 pdfs.
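
The count above is simply the number of phones raised to the `context-width`. Here is a minimal sketch of that arithmetic, assuming the 70 lines of `librispeech-phones.txt` are the full set of phones that can appear in a context (`max_contexts` is a hypothetical name). Keep in mind that each `pdf` models a single `HMM` state rather than a whole triphone, so setting 343000 against 2678 is only a rough measure of how much the tree tied together.

```bash
# Hypothetical helper: upper bound on distinct phone contexts for a window.
# $1 = number of phones, $2 = context-width (as reported by tree-info).
max_contexts() {
  # awk keeps this portable to plain POSIX sh, which has no ** operator.
  awk -v n="$1" -v k="$2" 'BEGIN { r = 1; for (i = 0; i < k; i++) r *= n; printf "%d\n", r }'
}

num_phones=70   # from: cat raw_data/librispeech-phones.txt | wc -l
echo "monophone contexts: $(max_contexts "$num_phones" 1)"
echo "triphone contexts:  $(max_contexts "$num_phones" 3)"
```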

## understanding the `.mdl` file

The `.mdl` file is the final acoustic model: it holds the `HMM` transition model together with the `GMM`s for every pdf.  We can get some general information about this model with a few of Kaldi's `C++` command-line tools (assuming we `source`d `path.sh`).

### `gmm-info`

`gmm-info` gives us some very general statistics about the model, including some information about the `Gaussian Mixture Model` (`GMM`) and some information about the `HMM`.

In [2]:
gmm-info

gmm-info 

Write to standard output various properties of GMM-based model
Usage:  gmm-info [options] <model-in>
e.g.:
 gmm-info 1.mdl
See also: gmm-global-info, am-info

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)




In [3]:
gmm-info exp/monophones/final.mdl

gmm-info exp/monophones/final.mdl 
number of phones 281
number of pdfs 212
number of transition-ids 1746
number of transition-states 853
feature dimension 39
number of gaussians 9658


So we can see:
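 - `number of phones` = how many phones this model represents.  This is 281 rather than the 70 phones counted earlier, most likely because each phone also has word-position-dependent variants (the `_B`, `_I`, `_E` and `_S` suffixes that show up in the alignments further down)
 - `number of pdfs` = how many distinct probability distributions (one `GMM` per leaf of the decision tree) the model contains; this matches `num-pdfs` from `tree-info`.  It is not directly tied to the number of `phones`: the tree decides which (phone, `HMM` state, context) combinations share a pdf, so the monophone model ends up with fewer pdfs than phones while the triphone model below has far more
 - `number of transition-states` = how many distinct `HMM` states the transition model tracks; each one is a (phone, `HMM` state, pdf) combination, as printed by `show-transitions`
 - `number of transition-ids` = how many individual arcs leave those states; each transition-id identifies one arc (a self-loop or a move to another state), and alignments are written as sequences of transition-ids
 - `feature dimension` = the length of each acoustic feature vector (39 here)
 - `number of gaussians` = the total number of Gaussian components, summed over all pdfs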
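
Since `number of pdfs` here and `num-pdfs` from `tree-info` both count the leaves of the same tree, a quick sanity check is to compare them. The sketch below (the name `gmm_field` is hypothetical, not part of Kaldi) pulls one value out of `gmm-info`-style text. Because the field names contain spaces, it treats the last column as the value and everything before it as the name.

```bash
# Hypothetical helper: print the value of one "name value" line from gmm-info output.
# $1 = field name, e.g. "number of pdfs"; the gmm-info text is read on stdin.
gmm_field() {
  awk -v name="$1" '
    {
      value = $NF        # the value is the last column
      $NF = ""           # drop it, leaving the field name plus a trailing space
      sub(/ +$/, "")     # strip that trailing space from $0
      if ($0 == name) print value
    }
  '
}

# Sample copied from the gmm-info exp/monophones/final.mdl output above.
sample='number of phones 281
number of pdfs 212
number of transition-ids 1746
number of transition-states 853
feature dimension 39
number of gaussians 9658'

pdfs=$(printf '%s\n' "$sample" | gmm_field "number of pdfs")
echo "number of pdfs: ${pdfs}"
```

Against a real model this would be `gmm-info exp/monophones/final.mdl | gmm_field "number of pdfs"`, to be compared with the `num-pdfs 212` reported by `tree-info exp/monophones/tree`.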

In [5]:
gmm-info exp/triphones/final.mdl

gmm-info exp/triphones/final.mdl 
number of phones 281
number of pdfs 2678
number of transition-ids 21686
number of transition-states 10823
feature dimension 39
number of gaussians 10019

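
Comparing the two `gmm-info` outputs, the triphone model has more than 12 times as many pdfs (2678 vs. 212) but only slightly more Gaussians in total (10019 vs. 9658), so each pdf gets far fewer mixture components. The sketch below computes the average number of Gaussians per pdf from those numbers; `avg_gauss` is a hypothetical helper, and the results are averages only, since Kaldi can give different pdfs different numbers of components.

```bash
# Hypothetical helper: average Gaussian components per pdf, to two decimals.
# $1 = number of gaussians, $2 = number of pdfs (both from gmm-info).
avg_gauss() {
  awk -v g="$1" -v p="$2" 'BEGIN { printf "%.2f\n", g / p }'
}

echo "monophones: $(avg_gauss 9658 212) gaussians per pdf"
echo "triphones:  $(avg_gauss 10019 2678) gaussians per pdf"
```

That works out to roughly 45.6 Gaussians per pdf for the monophones versus about 3.7 for the triphones: the triphone model spends a similar Gaussian budget on many more, narrower distributions.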

### `gmm-copy`

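`gmm-copy` copies a `GMM`-based model and can change its format along the way.  Passing `--binary=false` writes a human-readable text version, which lets us look inside the model; `--copy-am` and `--copy-tm` control whether the acoustic model (the `GMM`s) and the transition model (the `HMM` topology and transition probabilities) are copied.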

In [16]:
gmm-copy

gmm-copy 

Copy GMM based model (and possibly change binary/text format)
Usage:  gmm-copy [options] <model-in> <model-out>
e.g.:
 gmm-copy --binary=false 1.mdl 1_txt.mdl

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --copy-am                   : Copy the acoustic model (AmDiagGmm object) (bool, default = true)
  --copy-tm                   : Copy the transition model (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)




In [8]:
gmm-copy \
    --binary=false \
    exp/monophones/final.mdl \
    - \
    | head -n30

gmm-copy --binary=false exp/monophones/final.mdl - 
<TransitionModel> 
<Topology> 
<TopologyEntry> 
<ForPhones> 
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 2

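The text model starts with the `<TransitionModel>`, which in turn begins with the `<Topology>`.  Each `<TopologyEntry>` lists, under `<ForPhones>`, the integer ids of the phones that share that `HMM` topology (the list above is cut off by `head`).  Phones 1 to 5 are absent from this first list, so they presumably belong to a different topology entry.  The `GMM`s themselves (the acoustic model proper) come after the transition model in the same file.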
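
To see which phones share a topology without scrolling through the whole dump, a small filter can count the ids between `<ForPhones>` and `</ForPhones>`. This is a sketch under the assumption that the text model puts the ids on the line(s) between those two tokens, as in the truncated output above; `count_for_phones` is a hypothetical name, and the sample below is a shortened, illustrative excerpt rather than the real list.

```bash
# Hypothetical filter: for each <ForPhones> block in a text-format model,
# print how many phone ids it lists and the first/last id.
count_for_phones() {
  awk '
    $1 == "<ForPhones>"  { inside = 1; n = 0; next }
    $1 == "</ForPhones>" {
      inside = 0
      printf "%d phones (%s..%s)\n", n, first, last
      next
    }
    inside {
      for (i = 1; i <= NF; i++) {
        if (n == 0) first = $i
        last = $i
        n++
      }
    }
  '
}

# Shortened, illustrative excerpt (the real list above is much longer).
count_for_phones <<'EOF'
<TopologyEntry>
<ForPhones>
6 7 8 9 10
</ForPhones>
EOF
```

On the real model, `gmm-copy --binary=false exp/monophones/final.mdl - | count_for_phones` would print one summary line per topology entry.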

We can also look more closely at the transitions for each state of the `HMM` by using `show-transitions`.

### `show-transitions`

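`show-transitions` prints the transition model in human-readable form.  It needs the phone symbol table (to turn phone ids into names) and the model; an occupation-count file (`.occs`) can optionally be passed as a third argument.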

In [8]:
show-transitions

show-transitions 

Print debugging info from transition model, in human-readable form
Usage:  show-transitions <phones-symbol-table> <transition/model-file> [<occs-file>]
e.g.: 
 show-transitions phones.txt 1.mdl 1.occs

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)




In [9]:
show-transitions \
    data/lang/phones.txt \
    exp/monophones/final.mdl \
    | head

show-transitions data/lang/phones.txt exp/monophones/final.mdl 
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
 Transition-id = 1 p = 0.825838 [self-loop]
 Transition-id = 2 p = 0.01 [0 -> 1]
 Transition-id = 3 p = 0.154166 [0 -> 2]
 Transition-id = 4 p = 0.01 [0 -> 3]
Transition-state 2: phone = SIL hmm-state = 1 pdf = 1
 Transition-id = 5 p = 0.951921 [self-loop]
 Transition-id = 6 p = 0.01 [1 -> 2]
 Transition-id = 7 p = 0.01 [1 -> 3]
 Transition-id = 8 p = 0.0280863 [1 -> 4]


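Each `Transition-state` is one combination of phone, `HMM` state and pdf, and each `Transition-id` listed under it is one arc leaving that state, together with its probability `p`.  For example, transition-id `1` means: phone `SIL`, `HMM` state `0`, pdf `0`, taking the self-loop (p ≈ 0.83), while transition-id `3` leaves the same state for state `2`.

Training alignments are stored in exactly these terms, as one transition-id per frame: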
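
This output is easier to work with as a flat table with one row per transition-id. Here is a sketch of such a filter, assuming the line layout shown above (`Transition-state N: phone = ... hmm-state = ... pdf = ...` followed by indented `Transition-id` lines); `flatten_transitions` is a hypothetical name, not a Kaldi tool.

```bash
# Hypothetical filter: flatten show-transitions output into one line per
# transition-id: "<transition-id> <phone> <hmm-state> <pdf> <prob> <arc>".
flatten_transitions() {
  awk '
    $1 == "Transition-state" {
      # e.g. "Transition-state 1: phone = SIL hmm-state = 0 pdf = 0"
      for (i = 1; i <= NF; i++) {
        if ($i == "phone")     phone = $(i + 2)
        if ($i == "hmm-state") state = $(i + 2)
        if ($i == "pdf")       pdf   = $(i + 2)
      }
      next
    }
    $1 == "Transition-id" {
      # e.g. " Transition-id = 2 p = 0.01 [0 -> 1]"; keep the text inside [...]
      lb = index($0, "[")
      rb = index($0, "]")
      arc = substr($0, lb + 1, rb - lb - 1)
      gsub(/ /, "", arc)   # "0 -> 1" becomes "0->1" so it stays one column
      print $3, phone, state, pdf, $6, arc
    }
  '
}

# Sample copied from the show-transitions output above.
flatten_transitions <<'EOF'
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
 Transition-id = 1 p = 0.825838 [self-loop]
 Transition-id = 2 p = 0.01 [0 -> 1]
EOF
```

For the full model, `show-transitions data/lang/phones.txt exp/monophones/final.mdl | flatten_transitions` should give a lookup table from each transition-id to its phone, state and pdf.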

In [12]:
copy-int-vector "ark:gunzip -c exp/monophones/ali.1.gz|" \
ark,t:- | head -n 1

copy-int-vector 'ark:gunzip -c exp/monophones/ali.1.gz|' ark,t:- 
1272-128104-0009 3 1 1 1 1 1 1 12 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 18 17 884 883 883 883 883 883 883 883 883 883 886 885 885 888 887 1010 1009 1012 1011 1014 1013 1013 1100 1099 1099 1099 1099 1099 1099 1099 1099 1102 1101 1104 248 250 249 252 1136 1135 1135 1135 1135 1135 1138 1137 1137 1137 1140 656 658 657 657 660 1160 1162 1161 1164 1163 1163 1448 1450 1452 1394 1393 1396 1395 1395 1395 1398 1397 1397 1397 1397 1124 1123 1123 1123 1123 1123 1126 1125 1125 1125 1128 1232 1234 1236 1235 1235 1235 1235 1235 1235 1235 1235 1235 1235 1400 1399 1399 1399 1399 1402 1401 1401 1404 1403 1403 1403 1403 1403 1403 1442 1441 1441 1444 1446 3 1 1 1 1 1 1 12 18 524 523 526 525 525 528 527 527 944 946 945 948 1448 1450 1449 1449 1452 1451 704 706 705 708 707 707 1112 1111 1111 1111 1111 1114 1113 1113 1113 1113 1116 986 985 985 985 985 98

ERROR (copy-int-vector[5.2.191~1-48be1]:Write():util/kaldi-table-inl.h:1515) Error in TableWriter::Write

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::TableWriter<kaldi::BasicVectorHolder<int> >::Write(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<int, std::allocator<int> > const&) const
main
__libc_start_main
_start


gzip: stdout: Broken pipe
ERROR (copy-int-vector[5.2.191~1-48be1]:~TableWriter():util/kaldi-table-inl.h:1539) Error closing TableWriter [in destructor].

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::TableWriter<kaldi::BasicVectorHolder<int> >::~TableWriter()
main
__libc_start_main
_start

The `ERROR` messages and stack traces above are harmless: `head` exits after printing the first line and closes the pipe, so `gzip` and `copy-int-vector` can no longer write the rest of the archive.



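A repeated transition-id means the same arc was taken on consecutive frames, which can only be a self-loop, so collapsing runs makes the raw alignment easier to scan. Below is a minimal run-length sketch over lines in the `copy-int-vector` text format above (utterance id, then one transition-id per frame); `rle_alignment` is a hypothetical name, and the sample is the first eight frames of utterance `1272-128104-0009`.

```bash
# Hypothetical filter: run-length encode text alignments.
# Input lines: "<utt-id> <tid> <tid> ..."; output: "<utt-id> <tid>:<frames> ...".
rle_alignment() {
  awk '
    NF < 2 { next }               # skip lines with no frames
    {
      out = $1
      prev = $2; count = 1
      for (i = 3; i <= NF; i++) {
        if ($i == prev) { count++; continue }
        out = out " " prev ":" count
        prev = $i; count = 1
      }
      print out " " prev ":" count
    }
  '
}

# First eight frames of utterance 1272-128104-0009 from the output above.
echo "1272-128104-0009 3 1 1 1 1 1 1 12" | rle_alignment
```

This prints `1272-128104-0009 3:1 1:6 12:1`, i.e. six consecutive frames on the `SIL` self-loop (transition-id `1`). Piping `copy-int-vector ... ark,t:- 2>/dev/null | head -n 1` into it should work the same way; the `2>/dev/null` only hides the broken-pipe messages.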
In [11]:
show-alignments data/lang/phones.txt exp/monophones/final.mdl \
"ark:gunzip -c exp/monophones/ali.1.gz |" | head -n 2

show-alignments data/lang/phones.txt exp/monophones/final.mdl 'ark:gunzip -c exp/monophones/ali.1.gz |' 
1272-128104-0009  [ 3 1 1 1 1 1 1 12 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 18 17 ] [ 884 883 883 883 883 883 883 883 883 883 886 885 885 888 887 ] [ 1010 1009 1012 1011 1014 1013 1013 ] [ 1100 1099 1099 1099 1099 1099 1099 1099 1099 1102 1101 1104 ] [ 248 250 249 252 ] [ 1136 1135 1135 1135 1135 1135 1138 1137 1137 1137 1140 ] [ 656 658 657 657 660 ] [ 1160 1162 1161 1164 1163 1163 ] [ 1448 1450 1452 ] [ 1394 1393 1396 1395 1395 1395 1398 1397 1397 1397 1397 ] [ 1124 1123 1123 1123 1123 1123 1126 1125 1125 1125 1128 ] [ 1232 1234 1236 1235 1235 1235 1235 1235 1235 1235 1235 1235 1235 ] [ 1400 1399 1399 1399 1399 1402 1401 1401 1404 1403 1403 1403 1403 1403 1403 ] [ 1442 1441 1441 1444 1446 ] [ 3 1 1 1 1 1 1 12 18 ] [ 524 523 526 525 525 528 527 527 ] [ 944 946 945 948 ] [ 1448 1450 1449 1449 1

1272-128104-0009  SIL                                                                                                                                                         HH_B                                                            IY1_E                                  L_B                                                             AH0_I               M_I                                                        EH1_I                   N_I                               T_I                S_E                                                        M_B                                                        OW1_I                                                                S_I                                                                            T_E                          SIL                     B_B                                 IH1_I               T_I                               ER0_I                       L_I                                                        IY0_E  
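
`show-alignments` groups the frames of each phone in brackets on its first line and prints the phone names (with word-position suffixes such as `_B` and `_E`) on its second, so counting the ids in each bracket gives each phone's duration in frames. The sketch below pairs the two lines; `phone_durations` is hypothetical, and it assumes the second line lists exactly one name per bracketed group, in the same order.

```bash
# Hypothetical helper: print "<phone> <frames>" for one utterance.
# $1 = first show-alignments line (bracketed transition-ids),
# $2 = second show-alignments line (phone names).
phone_durations() {
  awk -v ids="$1" -v names="$2" 'BEGIN {
    n = split(ids, tok, " ")
    groups = 0
    for (i = 2; i <= n; i++) {          # tok[1] is the utterance id
      if (tok[i] == "[")      { groups++; frames[groups] = 0 }
      else if (tok[i] != "]") { frames[groups]++ }
    }
    m = split(names, name, " ")
    for (g = 1; g <= groups && g + 1 <= m; g++)
      print name[g + 1], frames[g]      # name[1] is the utterance id
  }'
}

# The HH_B and IY1_E groups of utterance 1272-128104-0009 from the output above.
phone_durations \
  "1272-128104-0009 [ 884 883 883 883 883 883 883 883 883 883 886 885 885 888 887 ] [ 1010 1009 1012 1011 1014 1013 1013 ]" \
  "1272-128104-0009 HH_B IY1_E"
```

This should print `HH_B 15` and `IY1_E 7`: the `HH` at the start of the first word lasts 15 frames and the word-final `IY1` lasts 7.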