A basic makefile is provided as a starting point for compiling the code. You will likely need to edit it first (for example, to specify the correct path to the compiler); after that, a simple make should compile the code.


Please note that a large portion of the source code in this repo was contributed by Mark Johnson. I'm very grateful that he made the source code for his NAACL'09 paper available. Please cite his work if you find his portion of the code helpful for your own project.

File Descriptions

For the files that I contributed, I provide a brief description of each here. If you find a description unclear or have any questions, please feel free to send me an email or file a pull request to change it!

To better understand the descriptions, I suggest that the reader refer to this paper. I'll use terminology that is consistent with the terminology used in that paper.

  • represents the boundary variables. A bound object consists of a few speech feature frames.
  • represents the segments. A segment object consists of one or more bound objects.
  • represents an automatically discovered unit. You can think of each cluster as a phone unit.
  • sets up all the experiment configuration.
  • represents an input speech utterance.
  • an implementation of a Gaussian mixture model.
  • a Gaussian mixture component.
  • contains code for doing the sampling-based inference steps.
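The containment relationships described above (frames inside bounds, bounds inside segments, segments scored by a mixture model) can be illustrated with a minimal sketch. This is not the repository's actual code: every class name, field, and function below is hypothetical, and the diagonal-covariance Gaussian is an assumption made for brevity.

```python
import math
from dataclasses import dataclass
from typing import List

Frame = List[float]  # one speech feature vector (hypothetical representation)


@dataclass
class Bound:
    """A few consecutive feature frames between two candidate boundaries."""
    frames: List[Frame]
    is_boundary: bool = False  # hypothetical flag: does a segment end here?


@dataclass
class Segment:
    """One or more consecutive Bound objects, assigned to one cluster."""
    bounds: List[Bound]
    cluster_id: int = -1  # -1 means "not yet assigned to a discovered unit"

    def frames(self) -> List[Frame]:
        # Flatten the frames of all bounds in this segment, in order.
        return [f for b in self.bounds for f in b.frames]


@dataclass
class MixtureComponent:
    """One Gaussian with diagonal covariance (an assumption of this sketch)."""
    weight: float
    mean: List[float]
    var: List[float]

    def log_prob(self, x: Frame) -> float:
        # Sum of per-dimension univariate Gaussian log densities.
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


@dataclass
class GMM:
    components: List[MixtureComponent]

    def log_likelihood(self, x: Frame) -> float:
        # Log-sum-exp over weighted components for numerical stability.
        terms = [math.log(c.weight) + c.log_prob(x) for c in self.components]
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))


def segment_bounds(bounds: List[Bound]) -> List[Segment]:
    """Group bounds into segments, closing a segment at each boundary flag."""
    segments: List[Segment] = []
    current: List[Bound] = []
    for b in bounds:
        current.append(b)
        if b.is_boundary:
            segments.append(Segment(current))
            current = []
    if current:  # trailing bounds without a closing boundary flag
        segments.append(Segment(current))
    return segments
```

In this sketch, a sampling-based inference step would flip the hypothetical `is_boundary` flags and reassign `cluster_id` values; the repository's own inference code may be organized quite differently.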

Test Data

In the exps folder, I've put the data I used to run one of the experiments reported in my TACL paper. You should be able to run it after you change the path to the compiled binary, whose name will be adaptor. I haven't tested it, so let me know by email if it doesn't work.



  1. Added data for all the lectures used in the experiments reported in the TACL paper.
  2. Added the scripts folder, where you can find the scripts that map the discovered PLUs (phone-like units) to words.


Matthew Goldstein has pointed out that the path to the MKL library needs to be changed in some *.d files in order to compile and run the source code.


Source code for "Unsupervised Lexicon Discovery from Acoustic Input", Lee et al., 2015 TACL






