eginhard/speech_correspondence

Correspondence and Autoencoder Networks for Speech

Collaborators

  • Herman Kamper
  • Micha Elsner
  • Sharon Goldwater
  • Aren Jansen
  • Daniel Renshaw

Dependencies

Typical steps

  1. Put all the speech data for pretraining into a single NumPy array and save it in .npy format. In the code below, this matrix is specified using the dataset_npy_fn parameter.
  2. Pretrain the stacked autoencoder (AE) on the raw speech data:

    ./speech_correspondence/train_stacked_dae.py

    The parameter_dict dictionary sets the model parameters. Running the script pretrains a model and saves the output.

  3. Next, put matching frames from word instances into two NumPy arrays and save them separately in .npy format. Each row of the first matrix must match the corresponding row of the second matrix. In the code below, these are specified with the dataset_npy_fn_x and dataset_npy_fn_y parameters.
  4. Train the correspondence AE:

    ./speech_correspondence/train_correspondence_ae.py

    The values in parameter_dict determine which pretrained model is used to initialize the network weights.

  5. Finally, test data can be encoded using ./speech_correspondence/encode.py. Run this program without any command line parameters to see its options.
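
The data preparation in steps 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not code from this repository: the helper names save_pretraining_data and save_frame_pairs, the file names, and the 39-dimensional random features are all hypothetical, and the frame alignment between word instances is assumed to have been done beforehand by whatever method you use.

```python
import os
import tempfile

import numpy as np


def save_pretraining_data(utterances, out_fn):
    """Stack per-utterance feature matrices (n_frames x n_dims) into one
    matrix and save it in .npy format (step 1). Hypothetical helper."""
    data = np.vstack(utterances)
    np.save(out_fn, data)
    return data.shape


def save_frame_pairs(pairs, out_fn_x, out_fn_y):
    """Save aligned frames from pairs of word instances as two matrices
    whose rows correspond one-to-one (step 3). Hypothetical helper.

    `pairs` is a sequence of (frames_x, frames_y) arrays that have already
    been aligned, so both arrays in a pair must have the same shape.
    """
    xs, ys = [], []
    for frames_x, frames_y in pairs:
        if frames_x.shape != frames_y.shape:
            raise ValueError("aligned frame matrices must have equal shapes")
        xs.append(frames_x)
        ys.append(frames_y)
    x = np.vstack(xs)
    y = np.vstack(ys)
    np.save(out_fn_x, x)
    np.save(out_fn_y, y)
    return x.shape


if __name__ == "__main__":
    # Synthetic stand-in data; real features would come from your own front end.
    rng = np.random.RandomState(0)
    out_dir = tempfile.mkdtemp()

    utterances = [rng.randn(120, 39), rng.randn(95, 39)]
    print(save_pretraining_data(utterances, os.path.join(out_dir, "pretrain.npy")))

    pairs = [(rng.randn(40, 39), rng.randn(40, 39))]
    print(save_frame_pairs(
        pairs,
        os.path.join(out_dir, "pairs_x.npy"),
        os.path.join(out_dir, "pairs_y.npy"),
    ))
```

The resulting file paths are what you would then supply as dataset_npy_fn, dataset_npy_fn_x and dataset_npy_fn_y in the training scripts.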

References

If you use this code, please cite:

  • H. Kamper, M. Elsner, A. Jansen, and S. J. Goldwater, "Unsupervised neural network based feature extraction using weak top-down constraints," accepted for presentation at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

About

Correspondence and autoencoder neural network training for speech using Pylearn2.