
Embedded Segmental K-Means Applied to Buckeye English and NCHLT Xitsonga

License: MIT


Unsupervised acoustic word segmentation and clustering of Buckeye English and NCHLT Xitsonga data using the embedded segmental K-means (ES-KMeans) algorithm. The experiments are described in:

H. Kamper, K. Livescu, and S. J. Goldwater, "An embedded segmental K-means model for unsupervised segmentation and clustering of speech," in Proc. ASRU, 2017. [arXiv]

Please cite this paper if you use the code.

This recipe relies on the separate ES-KMeans package, which performs the actual unsupervised segmentation and clustering.

Download datasets

The Buckeye English corpus and portions of the NCHLT Xitsonga corpus are used.

From the complete Buckeye corpus we split off several subsets, the most important of which are labelled devpart1 and zs. These correspond to English1 and English2, respectively, in (Kamper et al., 2016).

Install dependencies

Dependencies can be installed in a conda environment:

conda env create -f environment.yml
conda activate eskmeans

Install the ES-KMeans package:

mkdir ../src/
git clone ../src/eskmeans/

Extract speech features

Extract MFCCs in features/ as follows:

cd features/

More details on the feature file formats are given in features/.
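For readers new to MFCCs, the following minimal NumPy sketch shows the standard pipeline: framing, Hamming window, power spectrum, mel filterbank, log, and DCT. It is not the recipe's extraction script. The 25 ms window, 10 ms shift, 26 filters, and 13 coefficients are common defaults assumed here, and `mfcc` is a hypothetical helper name.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc(signal, sample_rate=16000, frame_ms=25, shift_ms=10,
         n_filters=26, n_ceps=13, n_fft=512):
    """Return an (n_frames, n_ceps) MFCC matrix (hypothetical helper)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    # Zero-pad very short signals so that at least one full frame exists
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft

    # Triangular filters with centres equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    log_energies = np.log(power @ fbank.T + 1e-10)

    # Orthonormal type-II DCT decorrelates the log filterbank energies
    n = np.arange(n_filters)
    dct = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    dct *= np.sqrt(2.0 / n_filters)
    dct[0] /= np.sqrt(2.0)
    return log_energies @ dct.T
```

Called on one second of 16 kHz audio, `mfcc` returns a (98, 13) matrix, that is, one 13-dimensional vector every 10 ms.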

Unsupervised syllable boundary detection

As a preprocessing step, we constrain the allowed word boundary positions to boundaries detected by an unsupervised syllable boundary detection algorithm. We specifically use the algorithm described in:

O. J. Räsänen, G. Doyle, and M. C. Frank, "Pre-linguistic segmentation of speech into syllable-like units," Cognition, 2018.

Extract the syllable boundaries in syllables/ as follows:

cd syllables/
./ buckeye
./ xitsonga
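Restricting word boundaries to detected syllable boundaries shrinks the search space, because a word candidate must start and end at a detected boundary. The sketch below enumerates such candidates. It assumes that boundaries are given as frame indices that include the utterance start and end, and that a candidate may span at most `max_syllables` syllable-like units. Both the representation and the limit are illustrative assumptions, not the recipe's file format or settings, and `candidate_segments` is a hypothetical helper.

```python
from typing import List, Tuple


def candidate_segments(boundaries: List[int], max_syllables: int = 4) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs whose endpoints are detected boundaries.

    `boundaries` is a sorted list of frame indices that includes 0 and the
    utterance length. Each candidate covers between 1 and `max_syllables`
    consecutive syllable-like units (hypothetical helper).
    """
    segments = []
    for i in range(len(boundaries) - 1):
        last = min(i + max_syllables, len(boundaries) - 1)
        for j in range(i + 1, last + 1):
            segments.append((boundaries[i], boundaries[j]))
    return segments


print(candidate_segments([0, 12, 30, 41, 60], max_syllables=2))
# [(0, 12), (0, 30), (12, 30), (12, 41), (30, 41), (30, 60), (41, 60)]
```

With four syllable-like units and at most two units per candidate, only 7 segments need to be embedded, rather than every possible pair of start and end frames in the 60-frame utterance.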

Downsampled acoustic word embeddings

Extract and evaluate downsampled acoustic word embeddings by running the steps in downsample/.
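A downsampled embedding maps a variable-length segment of feature frames to a fixed-length vector. One common way to do this is to keep `n` frames at uniformly spaced positions and concatenate them. The sketch below assumes `n = 10` and nearest-frame sampling; the scripts in downsample/ may differ in these details (for example, by interpolating). `downsample` is a hypothetical helper.

```python
import numpy as np


def downsample(segment: np.ndarray, n: int = 10) -> np.ndarray:
    """Map a (T, D) feature segment to a fixed (n * D,) embedding.

    Keeps n frames at uniformly spaced positions (nearest frame, so short
    segments repeat frames) and flattens them (hypothetical helper).
    """
    T = segment.shape[0]
    indices = np.round(np.linspace(0, T - 1, n)).astype(int)
    return segment[indices].reshape(-1)


rng = np.random.default_rng(0)
short_seg = rng.standard_normal((7, 13))   # 7 frames of 13-dimensional MFCCs
long_seg = rng.standard_normal((53, 13))   # 53 frames
print(downsample(short_seg).shape, downsample(long_seg).shape)  # (130,) (130,)
```

Because every segment maps to the same dimensionality, segments of different durations can be compared with a plain Euclidean distance. This is what allows a K-means-style clustering step to operate on them.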

ES-KMeans: Segmentation and clustering

Segmentation and clustering are performed using the ES-KMeans package. Run the steps in segmentation/.
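To give a rough picture of what ES-KMeans does, the sketch below alternates two steps. First, given K cluster means, it segments each utterance by dynamic programming over syllable-boundary positions, choosing segments whose embeddings lie close to a mean. Second, given the resulting segmentation, it re-estimates each mean from its assigned segments. This is a simplified stand-in, not the package's API. The function names, the weighting of each segment's cost by its duration in frames, the random initialisation, and the `max_syllables` limit are all assumptions made for illustration.

```python
import numpy as np


def downsample(segment, n=10):
    # Fixed-dimensional embedding: n uniformly spaced frames, flattened
    idx = np.round(np.linspace(0, len(segment) - 1, n)).astype(int)
    return segment[idx].reshape(-1)


def segment_utterance(feats, boundaries, means, max_syllables=4):
    """Dynamic-programming segmentation of one utterance (sketch).

    cost[j] is the lowest total cost of segmenting frames boundaries[0] to
    boundaries[j]. Each segment costs its duration times the squared distance
    from its embedding to the nearest mean (the weighting is an assumption).
    Returns a list of (start_frame, end_frame, cluster) tuples.
    """
    n = len(boundaries)
    cost = np.full(n, np.inf)
    cost[0] = 0.0
    back = [None] * n  # back[j] = (i, cluster) of the best segment ending at j
    for j in range(1, n):
        for i in range(max(0, j - max_syllables), j):
            start, end = boundaries[i], boundaries[j]
            dists = np.sum((means - downsample(feats[start:end])) ** 2, axis=1)
            k = int(np.argmin(dists))
            c = cost[i] + (end - start) * dists[k]
            if c < cost[j]:
                cost[j], back[j] = c, (i, k)
    # Trace back from the final boundary to recover the chosen segments
    segments, j = [], n - 1
    while j > 0:
        i, k = back[j]
        segments.append((boundaries[i], boundaries[j], k))
        j = i
    return segments[::-1]


def es_kmeans(utterances, n_clusters, n_iter=5, seed=0):
    """Alternate segmentation and mean updates over (feats, boundaries) pairs."""
    rng = np.random.default_rng(seed)
    # Initialise the means from randomly chosen single-unit segments (assumption)
    pool = [downsample(f[b[i]:b[i + 1]]) for f, b in utterances for i in range(len(b) - 1)]
    means = np.array([pool[i] for i in rng.choice(len(pool), n_clusters, replace=False)])
    for _ in range(n_iter):
        segmentations = [segment_utterance(f, b, means) for f, b in utterances]
        for k in range(n_clusters):
            members = [(downsample(f[s:e]), e - s)
                       for (f, _b), segs in zip(utterances, segmentations)
                       for s, e, c in segs if c == k]
            if members:  # empty clusters keep their previous mean
                embs, durs = zip(*members)
                # A duration-weighted mean minimises the duration-weighted cost
                means[k] = np.average(embs, axis=0, weights=durs)
    return means, segmentations
```

As a toy check, take an utterance made of 10 frames of `[1, 0]` followed by 10 frames of `[0, 1]`, with boundaries `[0, 5, 10, 15, 20]` and one mean per pattern. In this case `segment_utterance` returns `[(0, 10, 0), (10, 20, 1)]`, placing a word boundary exactly where the pattern changes.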
