Supplementary files for the sequential routing framework

Author of this git repo: Kyungmin Lee (sephiroce@snu.ac.kr)
Git: https://github.com/sephiroce/srf
DOI: https://doi.org/10.1016/j.csl.2021.101228

Highlights

Capsule-network-only structures can successfully map sequences to sequences
Mappings are refined by initializing the routing iterations based on the previous output
Sequence-wise routing iteration allows for non-iterative inference
The structure of a capsule network is more important than the number of parameters
Top-layer capsules become similar to the capsule corresponding to a sequence label

Prerequisites

TensorFlow >= 2.3.0
CUDA >= 10.0
Kaldi: https://github.com/kaldi-asr/kaldi
SCTK: https://github.com/usnistgov/SCTK/blob/master/doc/sclite.htm
Python libraries: Please check requirements.txt in the "tfsr" folder.
- "tfsr" stands for "TensorFlow based Speech Recognition toolkit"

Directory structure

tfsr

Python scripts for training and decoding

egs

conf
- {timit, wsj}.conf: configuration files for the TIMIT and WSJ corpora
data
- timit_62.vocab: 62 label symbols.
- wsj_31.vocab: 31 label symbols.
  - A blank symbol is automatically added during training and decoding.
- sample.json: input file format for generating TFrecords.
script

train_{srf, cnn, lstm, stf}_{timit, wsj}.sh: Bash scripts for training and decoding models.
- See "log2utt.py" for how the 61 symbols are mapped to 39 symbols for the TIMIT corpus.
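
The note above that a blank symbol is added automatically to the vocab files can be pictured with a minimal sketch. It assumes one symbol per line in the vocab file, a blank named "<blank>", and that the blank is appended after the listed symbols; none of these details are stated in this README, so check the tfsr code for the actual behavior.

```python
BLANK = "<blank>"  # hypothetical name for the CTC blank symbol


def load_vocab(lines, blank=BLANK):
    """Map each symbol to an index and append a blank symbol.

    Assumes one symbol per line; the position and name of the blank
    are illustrative assumptions, not taken from tfsr.
    """
    symbols = [line.strip() for line in lines if line.strip()]
    if blank in symbols:
        raise ValueError("the vocab file already contains the blank symbol")
    symbols.append(blank)
    return {symbol: index for index, symbol in enumerate(symbols)}


if __name__ == "__main__":
    # A stand-in for the 31 lines of wsj_31.vocab.
    fake_vocab_lines = [f"sym{i}" for i in range(31)]
    vocab = load_vocab(fake_vocab_lines)
    print(len(vocab), "output units including the blank")
```

Under these assumptions a 31-symbol vocab yields 32 output units and a 62-symbol vocab yields 63.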

How to use

Preparing TFrecords

Generate wav.scp and text files by running the Kaldi script ${KALDI}/egs/timit/s5/run.sh or ${KALDI}/egs/wsj/s5/run.sh.
Extract features to npy files using egs/script/fbank123.sh.
Create JSON files in the format shown in egs/data/sample.json, for example train.json, valid.json, and test.json.
Run script/save_tfr.sh.
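
The JSON step above can be sketched as follows. This is only an illustration: it relies on the Kaldi convention that each line of a text file is "<utt-id> <transcript>", but the JSON keys ("utt_id", "feat_path", "text") and the "<utt-id>.npy" file naming are hypothetical. Use the keys and layout found in egs/data/sample.json for real data.

```python
import json
from pathlib import Path


def read_kaldi_table(lines):
    """Parse Kaldi-style '<utt-id> <value>' lines into a dict."""
    table = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip empty lines
        table[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return table


def build_manifest(text_lines, feat_dir):
    """Pair each transcript with the npy file assumed for its utterance.

    The keys below are hypothetical placeholders, not the sample.json schema.
    """
    transcripts = read_kaldi_table(text_lines)
    return [
        {
            "utt_id": utt_id,
            "feat_path": str(Path(feat_dir) / f"{utt_id}.npy"),
            "text": transcripts[utt_id],
        }
        for utt_id in sorted(transcripts)
    ]


if __name__ == "__main__":
    sample_text = ["utt2 sil b iy sil", "utt1 sil ah sil"]
    print(json.dumps(build_manifest(sample_text, "feats"), indent=2))
```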

Change the path in configurations and training scripts.

egs/conf/{timit, wsj}.conf
- path-{train, valid, test}-ptrn: file patterns of TFrecords
training/decoding scripts
- path-base: base path for TFrecords, vocab files and configuration files.
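
Before training, it can help to confirm that the configured patterns actually match TFrecord files. The sketch below assumes the path-{train, valid, test}-ptrn values are shell-style globs interpreted relative to path-base; this README does not say how tfsr combines them, and the file names used in the demo are made up.

```python
import glob
import os
import tempfile


def resolve_patterns(path_base, patterns):
    """Expand each split's file pattern relative to path_base.

    Assumes shell-style glob patterns relative to path_base (an assumption).
    """
    return {
        split: sorted(glob.glob(os.path.join(path_base, pattern)))
        for split, pattern in patterns.items()
    }


def missing_splits(resolved):
    """Return the splits whose pattern matched no files."""
    return [split for split, files in resolved.items() if not files]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        # Create empty stand-in files; real TFrecords come from save_tfr.sh.
        for name in ("train-0.tfrecord", "train-1.tfrecord", "valid-0.tfrecord"):
            open(os.path.join(base, name), "wb").close()
        patterns = {
            "train": "train-*.tfrecord",
            "valid": "valid-*.tfrecord",
            "test": "test-*.tfrecord",
        }
        resolved = resolve_patterns(base, patterns)
        print({split: len(files) for split, files in resolved.items()})
        print("splits with no files:", missing_splits(resolved))
```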

Run scripts to train and evaluate

Sequential Routing Framework (SRF)
$egs/script/train_srf_{timit, wsj}.sh $LAYER $PH $CH $DIM $LPAD $RPAD $METHOD $ITER
- $PH and $CH: heights of primary and convolutional capsule groups
- $DIM: the depth of all capsule groups
- $LPAD and $RPAD: the left and right context sizes of the window.
- $METHOD: the routing algorithm, either SDR or DR.
- $ITER: the number of routing iterations.
Speech TransFormer (STF)-based CTC network (it always uses the same CNN-FE as the SRF models)
$egs/script/train_stf_{timit,wsj}.sh $LAYER $DIM $INN
- $DIM: the embedding dimension of the STF models.
- $INN: the dimension of the inner layers, i.e. the point-wise feed-forward layers in the STF models.
Bi/Uni-directional Long Short-Term Memory (LSTM)-based CTC network
$egs/script/train_lstm_wsj.sh $LAYER $TYPE $DIM $CNNFE $LR
- $TYPE: blstm or ulstm
- $DIM: the cell size
- $CNNFE: whether to use the same CNN-FE structure as the SRF models (set to True to use the CNN-FE).
- $LR: the learning rate for the Adam optimizer. (We use the Adam optimizer for LSTM-based models.)
Convolutional Neural Network (CNN)-based CTC network
$egs/script/train_cnn_{timit,wsj}.sh $LAYER $FILT_INP $FILT_INN $PROJ_NUM $PROJ_DIM $STRIDE $IS_MP
- $FILT_INP: the number of filters for the first four layers
- $FILT_INN: the number of filters for the remaining layers
- $PROJ_NUM: the number of feed-forward layers
- $PROJ_DIM: the number of neurons in the feed-forward layers
- $STRIDE: the stride of the first two layers
- $IS_MP: whether to use the same CNN-FE structure as the SRF models (set to False to use the CNN-FE).
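
The positional arguments above are easy to mix up, so a small wrapper that assembles and checks a command line may be useful. The sketch below covers the SRF script only. The "egs/script" prefix is inferred from the directory structure, and the numeric values in the demo are placeholders, not settings from the paper.

```python
import shlex

ROUTING_METHODS = ("SDR", "DR")  # the two choices listed for $METHOD
CORPORA = ("timit", "wsj")


def srf_command(corpus, layer, ph, ch, dim, lpad, rpad, method, iters,
                script_dir="egs/script"):
    """Build the argument list for train_srf_{timit, wsj}.sh.

    script_dir is an assumed location; adjust it to your checkout.
    """
    if corpus not in CORPORA:
        raise ValueError(f"corpus must be one of {CORPORA}, got {corpus!r}")
    if method not in ROUTING_METHODS:
        raise ValueError(f"method must be one of {ROUTING_METHODS}, got {method!r}")
    # Order follows the README: $LAYER $PH $CH $DIM $LPAD $RPAD $METHOD $ITER
    args = [layer, ph, ch, dim, lpad, rpad, method, iters]
    return [f"{script_dir}/train_srf_{corpus}.sh"] + [str(a) for a in args]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = srf_command("timit", layer=2, ph=4, ch=4, dim=8,
                      lpad=3, rpad=3, method="SDR", iters=2)
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run as is; printing it first makes it easy to confirm the argument order before launching a long training job.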

License

Apache 2.0
