Recipe for applying the embedded segmental k-means model to the ZeroSpeech2017 Track 2 challenge.

Embedded Segmental K-Means for ZeroSpeech2017 Track 2


This is a preliminary version of our system. It is not a final recipe and is still under development.


A description of the challenge can be found here:


The code provided here is not pretty. But I believe that research should be reproducible, and I hope that this repository is sufficient to make this possible for the paper mentioned above. I provide no guarantees with the code, but please let me know if you have any problems, find bugs or have general comments.


Clone the zerospeech repositories:

mkdir ../src/
git clone \
# To-do: add installation and data download instructions
git clone \

Clone the eskmeans repository:

git clone \

Get the surprise data:

cd ../src/zerospeech2017_surprise/
source \
cd -

Update all the paths in to match your directory structure.

Feature extraction

Extract MFCC features by running the steps in features/

Unsupervised syllable boundary detection

We use the unsupervised syllable boundary detection algorithm described in:

  • O. J. Räsänen, G. Doyle, and M. C. Frank, "Unsupervised word discovery from speech using automatic segmentation into syllable-like units," in Proc. Interspeech, 2015.

Obtain the syllable boundaries by running the steps in syllables/

Acoustic word embeddings: downsampling

We use one of the simplest methods to obtain acoustic word embeddings: downsampling. Different types of input features can be used. Run the steps in downsample/
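To make the idea of downsampling concrete, here is a minimal sketch. It is not the code in downsample/. It assumes a segment is given as a `(T, D)` array of frame-level features such as MFCCs, resamples it to a fixed number of frames by linear interpolation, and flattens the result into one vector. The function name `downsample_embedding`, the parameter `n_samples`, and the choice of linear interpolation are illustrative assumptions.

```python
import numpy as np


def downsample_embedding(features, n_samples=10):
    """Map a variable-length (T, D) segment to a fixed vector of size n_samples * D.

    Hypothetical helper for illustration; the interpolation scheme is an assumption.
    """
    features = np.asarray(features, dtype=float)
    n_frames, n_dims = features.shape
    # Equally spaced positions (in frame units) covering the whole segment
    positions = np.linspace(0, n_frames - 1, n_samples)
    frame_idx = np.arange(n_frames)
    # Interpolate each feature dimension independently along time
    resampled = np.stack(
        [np.interp(positions, frame_idx, features[:, d]) for d in range(n_dims)],
        axis=1,
    )
    # Concatenate the resampled frames into a single embedding vector
    return resampled.flatten()
```

With 13-dimensional features and `n_samples=10`, every segment maps to a 130-dimensional vector regardless of its duration, so segments of different lengths can be compared directly.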

Unsupervised segmentation and clustering

Segmentation and clustering are performed using the ESKMeans package. Run the steps in segmentation/
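As a rough illustration of the segmentation half of this step, the sketch below picks which candidate boundaries to keep for a single utterance, given fixed cluster centroids. It does not use the ESKMeans package API. The function `segment`, its arguments, and the duration-weighted squared-distance cost are assumptions made for this example. A full system would alternate this step with re-estimating the centroids from the chosen segments.

```python
import numpy as np


def segment(features, boundaries, centroids, embed, max_span=3):
    """Choose the subset of candidate boundaries with the lowest total cost.

    features:   (T, D) frame-level features for one utterance
    boundaries: increasing frame indices, starting at 0 and ending at T
    centroids:  (K, E) cluster means in the embedding space
    embed:      function mapping a (t, D) segment to an (E,) vector
    max_span:   maximum number of candidate intervals merged into one segment
    All names here are hypothetical and for illustration only.
    """
    n = len(boundaries)
    best = np.full(n, np.inf)    # best[j]: lowest cost of segmenting up to boundary j
    back = np.zeros(n, dtype=int)  # back[j]: previous boundary on the best path
    best[0] = 0.0
    for j in range(1, n):
        for i in range(max(0, j - max_span), j):
            start, end = boundaries[i], boundaries[j]
            x = embed(features[start:end])
            # Squared distance to the closest centroid, weighted by duration
            dists = np.sum((centroids - x) ** 2, axis=1)
            cost = best[i] + (end - start) * dists.min()
            if cost < best[j]:
                best[j] = cost
                back[j] = i
    # Trace back from the final boundary to recover the chosen boundaries
    chosen = [n - 1]
    while chosen[-1] != 0:
        chosen.append(back[chosen[-1]])
    return [boundaries[i] for i in reversed(chosen)], best[-1]
```

Here `boundaries` would come from the syllable detection step and `embed` from the downsampling step, so words are hypothesised as sequences of one or more syllable-like units.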