We support importing Kaldi data directories that contain at least the wav.scp file, which is required to create the lhotse.audio.RecordingSet. Other files, such as segments, utt2spk, etc., are used to create the lhotse.supervision.SupervisionSet. We also support converting feats.scp to lhotse.features.base.FeatureSet, and reading features directly from Kaldi's scp/ark files via the kaldi_native_io library (an optional Lhotse dependency).
We also support exporting a pair of lhotse.audio.RecordingSet and lhotse.supervision.SupervisionSet to a Kaldi data directory.
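To make the file-to-manifest mapping concrete, here is a minimal standard-library sketch of how wav.scp, segments, utt2spk and text relate to recordings and supervisions. It is not Lhotse's importer: the helper names (read_kaldi_map, import_kaldi_dir) are hypothetical, plain dicts stand in for RecordingSet and SupervisionSet, and only the case where a segments file is present is handled (without segments, Kaldi treats each recording as a single utterance).

```python
def read_kaldi_map(lines):
    """Parse Kaldi 'key value...' lines into a dict mapping key -> rest of line."""
    table = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def import_kaldi_dir(wav_scp, segments, utt2spk=(), text=()):
    """Hypothetical helper: build plain-dict recordings and supervisions.

    Each argument is an iterable of lines from the corresponding Kaldi file.
    """
    # wav.scp: <recording-id> <path, or a command ending in '|'>
    recordings = [
        {"id": rec_id, "source": source}
        for rec_id, source in sorted(read_kaldi_map(wav_scp).items())
    ]
    speakers = read_kaldi_map(utt2spk)
    transcripts = read_kaldi_map(text)
    supervisions = []
    # segments: <utterance-id> <recording-id> <start-seconds> <end-seconds>
    for utt_id, value in sorted(read_kaldi_map(segments).items()):
        rec_id, start, end = value.split()
        supervisions.append({
            "id": utt_id,
            "recording_id": rec_id,
            "start": float(start),
            "duration": round(float(end) - float(start), 6),
            "speaker": speakers.get(utt_id),
            "text": transcripts.get(utt_id),
        })
    return recordings, supervisions
```

Keeping the whole remainder of each wav.scp line as the source preserves piped commands such as `sox ... |`, which Kaldi allows in place of a file path.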
We currently do not support the following (but may start doing so in the future):
- Exporting Lhotse-extracted features to Kaldi's feats.scp
- Exporting Lhotse's multi-channel recording sets to Kaldi
We support Kaldi-compatible log-mel filter energies ("fbank") and MFCCs. We provide a PyTorch implementation that is GPU-compatible and supports batching and backpropagation. To learn more about feature extraction in Lhotse, see the features documentation.
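As an illustration of what "Kaldi-compatible" involves, the sketch below reproduces two pieces of Kaldi's feature arithmetic: how many frames a signal yields (Kaldi's defaults are a 25 ms window and a 10 ms shift, with the snip_edges option controlling edge handling) and the mel scale used to place the filterbank. It is plain Python, not Lhotse's PyTorch extractor; whether Lhotse's extractors use exactly these defaults is an assumption to check against their configuration.

```python
import math


def kaldi_num_frames(num_samples, sampling_rate, frame_length_ms=25.0,
                     frame_shift_ms=10.0, snip_edges=True):
    """Number of frames produced by Kaldi-style framing of a signal."""
    window = int(sampling_rate * frame_length_ms / 1000)  # samples per frame
    shift = int(sampling_rate * frame_shift_ms / 1000)    # samples between frames
    if snip_edges:
        # Only frames that fit entirely inside the signal are kept.
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // shift
    # snip_edges=False: the count depends only on the frame shift.
    return (num_samples + shift // 2) // shift


def kaldi_mel(freq_hz):
    """Kaldi's mel scale: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)
```

For example, one second of 16 kHz audio yields 98 frames with snip_edges=True and 100 frames with snip_edges=False.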
Python methods related to Kaldi support are in the lhotse.kaldi module.
Converting a Kaldi data directory called data/train, with 16 kHz sampling rate recordings, to a directory of Lhotse manifests called train_manifests, and back:
```bash
# Convert data/train to train_manifests/{recordings,supervisions}.jsonl.gz
lhotse kaldi import \
  data/train \
  16000 \
  train_manifests
# Convert train_manifests/{recordings,supervisions}.jsonl.gz back to data/train
lhotse kaldi export \
  train_manifests/recordings.jsonl.gz \
  train_manifests/supervisions.jsonl.gz \
  data/train
```
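The export direction can be sketched the same way: each supervision becomes one line in segments, utt2spk and text, and spk2utt is derived by inverting utt2spk. This is a hypothetical helper (export_kaldi_files) working on plain dicts, not Lhotse's exporter. It sorts every file by its first column because Kaldi tools expect sorted data directories, and it formats times with three decimals, which is an arbitrary choice.

```python
from collections import defaultdict


def export_kaldi_files(recordings, supervisions):
    """Hypothetical helper: render Kaldi data-dir files as {filename: contents}."""
    files = {}
    # wav.scp: one line per recording, sorted by recording id.
    files["wav.scp"] = "".join(
        f"{r['id']} {r['source']}\n"
        for r in sorted(recordings, key=lambda r: r["id"])
    )
    sups = sorted(supervisions, key=lambda s: s["id"])
    # segments stores start and end times, so end = start + duration.
    files["segments"] = "".join(
        f"{s['id']} {s['recording_id']} {s['start']:.3f} {s['start'] + s['duration']:.3f}\n"
        for s in sups
    )
    files["utt2spk"] = "".join(f"{s['id']} {s['speaker']}\n" for s in sups)
    files["text"] = "".join(f"{s['id']} {s['text']}\n" for s in sups)
    # spk2utt is the inverse mapping of utt2spk.
    spk2utt = defaultdict(list)
    for s in sups:
        spk2utt[s["speaker"]].append(s["id"])
    files["spk2utt"] = "".join(
        f"{spk} {' '.join(utts)}\n" for spk, utts in sorted(spk2utt.items())
    )
    return files
```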
Hint
Before you continue, make sure you have run pip install kaldi-native-io; otherwise, you won't be able to get the features.jsonl.gz file below.
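A quick way to check this requirement from Python is to ask whether the kaldi_native_io module can be found. This is a generic standard-library check, assuming the package installed by pip install kaldi-native-io is importable under the name kaldi_native_io used above; has_kaldi_native_io is a hypothetical helper name.

```python
import importlib.util


def has_kaldi_native_io():
    """Return True if the optional kaldi_native_io package can be imported."""
    return importlib.util.find_spec("kaldi_native_io") is not None


if not has_kaldi_native_io():
    print("kaldi_native_io not found; run: pip install kaldi-native-io")
```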
In the following, we demonstrate how to import a Kaldi data directory using the yesno dataset.
Assume you have run the following commands with Kaldi:
```bash
cd kaldi/egs/yesno/s5
./run.sh
```
Take the data/train_yesno directory as an example:
```bash
ls data/train_yesno/
cmvn.scp conf feats.scp frame_shift spk2utt split1 text utt2dur utt2num_frames utt2spk wav.scp
```
You can use the following command to import it into Lhotse:
```bash
lhotse kaldi import \
  --frame-shift 0.01 \
  ./data/train_yesno \
  8000 \
  ./data/train_manifests/
```
Hint
You can use lhotse kaldi import --help to view the help information. In the above, 8000 is the sampling rate of the yesno dataset.
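The --frame-shift option gives the spacing, in seconds, between consecutive feature frames in feats.scp; 0.01 corresponds to Kaldi's default 10 ms shift. The arithmetic this enables is sketched below: a frame count times the frame shift approximates an utterance's duration, and at 8000 Hz a 10 ms shift spans 80 samples. The helper names and the utt2num_frames values in the example are hypothetical, and this is not how Lhotse computes durations internally.

```python
def durations_from_utt2num_frames(lines, frame_shift=0.01):
    """Approximate utterance durations (seconds) as num_frames * frame_shift.

    Ignores the extra window length at the end, so it is only an estimate.
    """
    durations = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        utt_id, num_frames = line.split()
        durations[utt_id] = round(int(num_frames) * frame_shift, 6)
    return durations


def shift_in_samples(frame_shift, sampling_rate):
    """Number of audio samples between the starts of consecutive frames."""
    return round(frame_shift * sampling_rate)
```

For instance, hypothetical utt2num_frames lines "utt_a 600" and "utt_b 250" map to roughly 6.0 s and 2.5 s.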
It will generate the following files:
```bash
$ ls data/train_manifests/
features.jsonl.gz recordings.jsonl.gz supervisions.jsonl.gz
```
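These manifests are gzip-compressed JSON Lines files (one JSON object per line), so they can be inspected without Lhotse. A minimal standard-library reader, with the hypothetical helper name read_jsonl_gz:

```python
import gzip
import json


def read_jsonl_gz(path):
    """Yield one dict per line of a gzip-compressed JSON Lines manifest."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate a trailing empty line
                yield json.loads(line)
```

For example, next(read_jsonl_gz("data/train_manifests/supervisions.jsonl.gz")) returns the first supervision entry as a dict.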
To create a CutSet from the above files, you can use:
```bash
lhotse cut simple \
  -r ./data/train_manifests/recordings.jsonl.gz \
  -f ./data/train_manifests/features.jsonl.gz \
  -s ./data/train_manifests/supervisions.jsonl.gz \
  ./yesno_train.jsonl.gz
```
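Conceptually, combining the three manifests means grouping supervisions and features by recording_id and attaching them to a cut that spans each recording. The sketch below assumes one cut per recording and uses plain dicts and a hypothetical helper name (make_simple_cuts); the real command builds Lhotse cut objects and handles details this sketch ignores.

```python
from collections import defaultdict


def make_simple_cuts(recordings, supervisions, features=None):
    """Hypothetical sketch: one cut per recording with its supervisions/features."""
    # Group supervisions by the recording they belong to.
    sups_by_rec = defaultdict(list)
    for s in supervisions:
        sups_by_rec[s["recording_id"]].append(s)
    # Assume at most one feature entry per recording.
    feats_by_rec = {f["recording_id"]: f for f in (features or [])}
    cuts = []
    for r in recordings:
        cuts.append({
            "id": r["id"],
            "recording_id": r["id"],
            "start": 0.0,
            "duration": r["duration"],  # the cut spans the whole recording
            "supervisions": sorted(sups_by_rec[r["id"]], key=lambda s: s["start"]),
            "features": feats_by_rec.get(r["id"]),
        })
    return cuts
```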
Now you can use ./yesno_train.jsonl.gz for training.