# 7.1: Creating `G.fst`

Decoding in `kaldi` (the step where we predict which words we "heard") amounts to searching for the best path through a large graph that encodes most of our learned model. We will go into this in much more detail later, but for now, we will focus on converting our language model into an `FST` (finite-state transducer).

### **TODO** add resources

We need to `source` `path.sh` so that we can call `kaldi`'s compiled `C++` executables without typing their full paths.

In [1]:
. path.sh
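
A quick way to confirm that sourcing worked is to check that the tools we are about to use resolve on `PATH`. The following is a minimal sketch; `check_tools` is a hypothetical helper written for this notebook, not something shipped with `kaldi`.

```shell
# Report which of the given commands can be found on PATH.
# check_tools is a hypothetical helper, not part of kaldi or openFST.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Return non-zero if at least one tool could not be found.
  return "$missing"
}

# arpa2fst and fstinfo are the two tools used in this notebook.
check_tools arpa2fst fstinfo || echo "did you source path.sh?"
```

If either tool is reported as missing, re-run `. path.sh` from the directory that contains it before continuing.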

Next, we'll use the aptly named `arpa2fst` to convert our `ARPA`-formatted language model into an `FST`.

In [2]:
arpa2fst

arpa2fst 

Convert an ARPA format language model into an FST
Usage: arpa2fst [opts] <input-arpa> <output-fst>
 e.g.: arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt lm/input.arpa G.fst

Note: When called without switches, the output G.fst will contain
an embedded symbol table. This is compatible with the way a previous
version of arpa2fst worked.

Options:
  --bos-symbol                : Beginning of sentence symbol (string, default = "<s>")
  --disambig-symbol           : Disambiguator. If provided (e. g. #0), used on input side of backoff links, and <s> and </s> are replaced with epsilons (string, default = "")
  --eos-symbol                : End of sentence symbol (string, default = "</s>")
  --ilabel-sort               : Ilabel-sort the output FST (bool, default = true)
  --keep-symbols              : Store symbol table with FST. Symbols always saved to FST if symbol tables are neither read or written (otherwise symbols would be lost entirely) (bool, default =

: 1

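First, we create a new directory, `resource_files/fst`, to hold the output. The `-p` flag keeps the command from failing if the directory already exists.

In [5]:
mkdir -p resource_files/fst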

In [6]:
arpa2fst resource_files/language_model/animal_lm-2_gram.arpa resource_files/fst/animal_fst-2_gram.fst

arpa2fst resource_files/language_model/animal_lm-2_gram.arpa resource_files/fst/animal_fst-2_gram.fst 
LOG (arpa2fst[5.2.191~1-48be1]:Read():arpa-file-parser.cc:98) Reading \data\ section.
LOG (arpa2fst[5.2.191~1-48be1]:Read():arpa-file-parser.cc:153) Reading \1-grams: section.
LOG (arpa2fst[5.2.191~1-48be1]:Read():arpa-file-parser.cc:153) Reading \2-grams: section.
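
The log lines above mirror the layout of an `ARPA` file: a `\data\` header followed by one `\N-grams:` section per n-gram order. As a rough illustration of that layout, the sketch below builds a toy `ARPA` file and counts the entries in each section. The words and probabilities are made up (this is not the tutorial's `animal_lm-2_gram.arpa`), and `count_ngrams` is a hypothetical helper, not a `kaldi` tool.

```shell
# Toy ARPA file with made-up contents, written to a temporary path.
arpa=$(mktemp)
cat > "$arpa" <<'EOF'
\data\
ngram 1=4
ngram 2=3

\1-grams:
-0.60206 </s>
-99 <s> -0.30103
-0.60206 cat -0.30103
-0.60206 dog -0.30103

\2-grams:
-0.30103 <s> cat
-0.30103 cat </s>
-0.30103 dog </s>

\end\
EOF

# Count the non-blank entries under each "\N-grams:" header.
# count_ngrams is a hypothetical helper, not part of kaldi.
count_ngrams() {
  awk '
    /^\\[0-9]+-grams:/   { order = substr($1, 2) + 0; next }  # "\2-grams:" -> 2
    $1 == "\\end\\"      { order = 0; next }                  # end-of-file marker
    order > 0 && NF > 0  { n[order]++ }                       # one entry per line
    END { for (o in n) print o, n[o] }
  ' "$1" | sort -n
}

# Prints one "order count" pair per line.
counts=$(count_ngrams "$arpa")
echo "$counts"
rm -f "$arpa"
```

Comparing these counts with the `ngram N=` lines in the `\data\` header is a cheap sanity check to run on a language model before handing it to `arpa2fst`.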


`kaldi` uses [`openFST`](http://www.openfst.org/twiki/bin/view/FST/WebHome) for some steps of this process, alongside its own tools (like `arpa2fst`). We will use a `python` wrapper for `openFST` to inspect this `FST` in the next notebook. For now, we can confirm that the `FST` was built and read off some of its properties with the `openFST` command `fstinfo`.

**Note:** Because we sourced `path.sh` above, we can also call `openFST` commands without full paths.

In [9]:
fstinfo --help

Prints out information about an FST.

  Usage: fstinfo [in.fst]

PROGRAM FLAGS:

  --arc_filter: type = string, default = "any"
  Arc filter: one of: "any", "epsilon", "iepsilon", "oepsilon"; this only affects the counts of (co)accessible states, connected states, and (strongly) connected components
  --fst_verify: type = bool, default = true
  Verify FST sanity
  --info_type: type = string, default = "auto"
  Info format: one of: "auto", "long", "short"
  --pipe: type = bool, default = false
  Send info to stderr, input to stdout
  --test_properties: type = bool, default = true
  Compute property values (if unknown to FST)

LIBRARY FLAGS:

Flags from: flags.cc
  --help: type = bool, default = false
  show usage information
  --helpshort: type = bool, default = false
  show brief usage information
  --tmpdir: type = string, default = "/tmp"
  temporary directory
  --v: type = int32, default = 0
  verbosity level

Flags from: fst.cc
  --fst_align: type = bool, default = false
  Write FS

: 1

Now we can run `fstinfo` on the `FST` we wrote to `resource_files/fst`.

In [8]:
fstinfo resource_files/fst/animal_fst-2_gram.fst

fst type                                          vector
arc type                                          standard
input symbol table                                resource_files/fst/animal_fst-2_gram.fst
output symbol table                               resource_files/fst/animal_fst-2_gram.fst
# of states                                       17
# of arcs                                         61
initial state                                     3
# of final states                                 1
# of input/output epsilons                        14
# of input epsilons                               14
# of output epsilons                              14
input label multiplicity                          1
output label multiplicity                         1
# of accessible states                            17
# of coaccessible states                          17
# of connected states                             17
# of connected components                         1
# of strongly conn
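
When only one or two of these properties matter (say, in a script that checks the `FST` is non-trivial), the report can be filtered by property name. The sketch below assumes the two-column layout shown above, with the name and value separated by two or more spaces. `fst_prop` is a hypothetical helper, not part of `openFST`, and the sample text is a few lines copied from the output above so the example runs without `fstinfo` installed.

```shell
# Print the value of one property from fstinfo-style "name   value" lines on stdin.
# fst_prop is a hypothetical helper, not part of openFST.
fst_prop() {
  # Fields are separated by runs of two or more spaces, so names that
  # contain single spaces (e.g. "# of states") stay in one field.
  awk -F'  +' -v key="$1" '$1 == key { print $2 }'
}

# A few lines copied from the fstinfo output above, used as stand-in input.
sample='# of states                                       17
# of arcs                                         61
initial state                                     3
# of input/output epsilons                        14'

states=$(printf '%s\n' "$sample" | fst_prop "# of states")
arcs=$(printf '%s\n' "$sample" | fst_prop "# of arcs")
echo "states=$states arcs=$arcs"
```

With `openFST` on `PATH`, the same helper can read the live report, for example `fstinfo resource_files/fst/animal_fst-2_gram.fst | fst_prop "# of states"`.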

In the next notebook, we'll examine this `FST` in more detail.