Neural Acoustic Word Embeddings for Switchboard
- S. Settle and K. Livescu, "Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches," in Proc. SLT, 2016.
- S. Settle, K. Levin, H. Kamper, and K. Livescu, "Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings," in Proc. Interspeech, 2017.
- Python code to create, run, and save the model
- a modified version of Kaldi's swbd/s5c recipe, used to extract features and set up the data as required
- the word/segment list used in prior work (Kamper et al., 2015), found under kaldi/data/kamperh
- the train/dev/test partitioning by Switchboard conversation side (consistent with prior work and extracted from the aforementioned word/segment list)
Ensure access to installed dependencies.
Check that the $KALDI_ROOT variable points to your installed and compiled Kaldi. It can be set in your ~/.bashrc or in kaldi/path.sh.
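The check above can be sketched in Python as a quick sanity test before running anything. This is a minimal illustration, not part of the repository: the function name `check_kaldi_root` is hypothetical, and it only verifies that the variable is set and names an existing directory, not that Kaldi was compiled successfully.

```python
import os
from pathlib import Path


def check_kaldi_root(env=None):
    """Return the Kaldi root as a Path, or raise if it is unset or missing.

    `env` is a mapping of environment variables; it defaults to os.environ.
    (Hypothetical helper for illustration only.)
    """
    env = os.environ if env is None else env
    root = env.get("KALDI_ROOT")
    if not root:
        raise EnvironmentError("KALDI_ROOT is not set")
    path = Path(root)
    if not path.is_dir():
        raise EnvironmentError(f"KALDI_ROOT is not a directory: {root}")
    return path


if __name__ == "__main__":
    print("Using Kaldi at", check_kaldi_root())
```

If the variable is set only in kaldi/path.sh, it is visible to scripts that source that file but not necessarily to a separate Python process, so a failure here may just mean the variable is not exported in your shell.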
- set $swbd to the path of your local Switchboard data
- set $nj to the desired number of jobs (default=8)
- set $stage to the desired starting stage of feature creation (default=1)
- set $min_word_length to the minimum number of characters a word must have to be included (default=6)
- set $min_audio_duration to the minimum duration, in frames, an audio segment must have to be included (default=50)
- set $min_train_occurrence_count to the minimum number of times a word must occur in the training data to be included (default=2; note: this must be >= 2 or siamese training will not work)
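To make the three filtering thresholds concrete, here is a small Python sketch of how they might act on a word/segment list. It is an illustration under stated assumptions, not the repository's code: the `(word, n_frames)` tuple format, the function names, and the order in which the filters are applied are all hypothetical. It also shows one likely reason for the >= 2 constraint: siamese training relies on pairs of segments that share a word label, and a word seen only once cannot form such a pair.

```python
from collections import Counter
from itertools import combinations


def filter_segments(segments, min_word_length=6, min_audio_duration=50,
                    min_train_occurrence_count=2):
    """Filter (word, n_frames) tuples using the three thresholds.

    Assumption: length and duration filters run first, and occurrences
    are counted on what remains.
    """
    kept = [(w, n) for w, n in segments
            if len(w) >= min_word_length and n >= min_audio_duration]
    counts = Counter(w for w, _ in kept)
    return [(w, n) for w, n in kept
            if counts[w] >= min_train_occurrence_count]


def same_word_pairs(segments):
    """Return index pairs (i, j), i < j, of segments with the same word."""
    by_word = {}
    for i, (w, _) in enumerate(segments):
        by_word.setdefault(w, []).append(i)
    return [pair for idxs in by_word.values()
            for pair in combinations(idxs, 2)]


# Toy data, invented for this example.
toy = [("because", 60), ("because", 55), ("people", 40),
       ("people", 70), ("yeah", 80), ("actually", 90)]

train = filter_segments(toy)
pairs = same_word_pairs(train)
```

With the defaults, only the two "because" segments survive and they form a single same-word pair. Lowering `min_train_occurrence_count` to 1 would also keep "people" and "actually", but neither would contribute a same-word pair.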
Navigate to the kaldi directory and run "./run.sh". This should produce the required features.
Navigate to the code directory and run "python main.py". This will train, evaluate, and save the models.