We support importing Kaldi data directories that contain at least the wav.scp
file, required to create the RecordingSet
. Other files, such as segments
, utt2spk
, etc. are used to create the SupervisionSet
.
We currently do not support the following (but may start doing so in the future):
- Importing Kaldi's extracted features (
feats.scp
is ignored) - Exporting Lhotse manifests as Kaldi data directories.
Python methods related to Kaldi support:
lhotse.kaldi
Converting Kaldi data directory called data/train
, with 16kHz sampling rate recordings, to a directory with Lhotse manifests called train_manifests
:
lhotse convert-kaldi data/train 16000 train_manifests