Kaldi Interoperability

Data import/export

We support importing Kaldi data directories that contain at least a wav.scp file, which is required to create a lhotse.audio.RecordingSet. Other files, such as segments and utt2spk, are used to create a lhotse.supervision.SupervisionSet. We also support converting feats.scp to a lhotse.features.base.FeatureSet, and reading features directly from Kaldi's scp/ark files via the kaldi_native_io library (an optional Lhotse dependency).
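As a rough illustration of how these files relate, the following pure-Python sketch (not Lhotse's importer) turns the contents of wav.scp, segments and utt2spk into plain recording and supervision records. It assumes the standard Kaldi line formats ("<recording-id> <path-or-command>", "<utterance-id> <recording-id> <start> <end>" with times in seconds, and "<utterance-id> <speaker-id>") and the Kaldi convention that, without a segments file, every recording is a single utterance whose ID equals the recording ID. The names parse_kv and kaldi_to_manifests are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_kv(text: str) -> Dict[str, str]:
    """Parse Kaldi 'key value...' lines; the value keeps any inner spaces."""
    table: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        table[key] = value.strip()
    return table


def kaldi_to_manifests(
    wav_scp: str, utt2spk: str, segments: Optional[str] = None
) -> Tuple[Dict[str, dict], List[dict]]:
    """Hypothetical helper: map Kaldi file contents to recording/supervision records."""
    # wav.scp (required): one recording per line.
    recordings = {
        rec_id: {"id": rec_id, "source": source}
        for rec_id, source in parse_kv(wav_scp).items()
    }
    speakers = parse_kv(utt2spk)
    supervisions: List[dict] = []
    if segments is not None:
        # segments: <utterance-id> <recording-id> <start> <end>, in seconds.
        for line in segments.strip().splitlines():
            utt_id, rec_id, start, end = line.split()
            supervisions.append({
                "id": utt_id,
                "recording_id": rec_id,
                "start": float(start),
                "duration": float(end) - float(start),
                "speaker": speakers.get(utt_id),
            })
    else:
        # No segments: each recording is one utterance with the same ID.
        # Its duration would have to come from elsewhere (e.g. utt2dur or the audio).
        for rec_id in recordings:
            supervisions.append({
                "id": rec_id,
                "recording_id": rec_id,
                "start": 0.0,
                "duration": None,
                "speaker": speakers.get(rec_id),
            })
    return recordings, supervisions


recordings, supervisions = kaldi_to_manifests(
    wav_scp="rec1 /data/rec1.wav\nrec2 /data/rec2.wav",
    utt2spk="utt1 spkA\nutt2 spkB",
    segments="utt1 rec1 0.00 2.50\nutt2 rec2 1.00 3.25",
)
print(supervisions[1]["duration"])  # 2.25
```

Keeping the wav.scp value as an opaque string mirrors the fact that Kaldi allows either a file path or a piped command there.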

Lhotse can also export a pair of lhotse.audio.RecordingSet and lhotse.supervision.SupervisionSet to a Kaldi data directory.
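Going the other way, a Kaldi data directory is a set of sorted text files keyed by recording or utterance ID. The sketch below uses hypothetical record layouts and a hypothetical export_kaldi_files helper (it is not Lhotse's exporter) to show which lines end up in wav.scp, segments, utt2spk, spk2utt and text for single-channel recordings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def _kaldi_lines(pairs: Iterable[Tuple[str, str]]) -> str:
    # Kaldi tools expect files sorted by their first field (C-locale order).
    return "".join(f"{key} {value}\n" for key, value in sorted(pairs))


def export_kaldi_files(recordings: Dict[str, str], supervisions: List[dict]) -> Dict[str, str]:
    """Hypothetical helper: build Kaldi data-dir file contents.

    recordings: recording ID -> audio path (single-channel only).
    supervisions: dicts with id, recording_id, start, duration, speaker, text.
    """
    spk2utt: Dict[str, List[str]] = defaultdict(list)
    for sup in supervisions:
        spk2utt[sup["speaker"]].append(sup["id"])
    return {
        "wav.scp": _kaldi_lines(recordings.items()),
        # segments: <utterance-id> <recording-id> <start> <end>, in seconds.
        "segments": _kaldi_lines(
            (s["id"], f'{s["recording_id"]} {s["start"]:.2f} {s["start"] + s["duration"]:.2f}')
            for s in supervisions
        ),
        "utt2spk": _kaldi_lines((s["id"], s["speaker"]) for s in supervisions),
        "spk2utt": _kaldi_lines((spk, " ".join(sorted(utts))) for spk, utts in spk2utt.items()),
        "text": _kaldi_lines((s["id"], s["text"]) for s in supervisions),
    }


files = export_kaldi_files(
    {"rec1": "/data/rec1.wav"},
    [{"id": "spkA-utt1", "recording_id": "rec1", "start": 0.5,
      "duration": 2.0, "speaker": "spkA", "text": "YES NO"}],
)
print(files["segments"], end="")  # spkA-utt1 rec1 0.50 2.50
```

The example utterance ID starts with its speaker ID because Kaldi's validation scripts expect utt2spk and spk2utt to sort consistently, and a speaker prefix is the usual way to guarantee that.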

We currently do not support the following (but may start doing so in the future):

Exporting Lhotse-extracted features to Kaldi's feats.scp
Exporting Lhotse's multi-channel recording sets to Kaldi

Kaldi feature extractors

We support Kaldi-compatible log-mel filter energies ("fbank") and MFCCs. We provide a PyTorch implementation that can run on GPU and supports batching and backpropagation. To learn more about feature extraction in Lhotse, see the features documentation.
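For reference, here is a small standalone sketch of two Kaldi conventions that a Kaldi-compatible fbank or MFCC implementation has to match: the number of frames produced for a signal (Kaldi's defaults of 25 ms windows, a 10 ms shift and snip_edges=True, with the non-snipped variant for comparison) and the mel scale, mel(f) = 1127 ln(1 + f / 700). The function names are hypothetical, and this is not Lhotse's PyTorch code.

```python
import math


def kaldi_num_frames(num_samples: int, sampling_rate: int,
                     frame_length_ms: float = 25.0, frame_shift_ms: float = 10.0,
                     snip_edges: bool = True) -> int:
    """Frame count following Kaldi's framing rule (hypothetical helper name)."""
    frame_length = int(sampling_rate * 0.001 * frame_length_ms)
    frame_shift = int(sampling_rate * 0.001 * frame_shift_ms)
    if snip_edges:
        # Only windows that fit completely inside the signal are kept.
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    # Otherwise the count depends only on the shift (signal edges are padded).
    return (num_samples + frame_shift // 2) // frame_shift


def kaldi_mel(freq_hz: float) -> float:
    """Kaldi's mel scale: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


print(kaldi_num_frames(16000, 16000))                    # 98
print(kaldi_num_frames(16000, 16000, snip_edges=False))  # 100
```

In other words, one second of 16 kHz audio gives 98 frames with snip_edges=True and 100 frames without it, which is a common source of off-by-a-few mismatches when comparing feature matrices from different toolkits.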

Python

Python functions related to Kaldi support are located in the lhotse.kaldi module.

CLI

To convert a Kaldi data directory called data/train, containing 16 kHz recordings, into a directory of Lhotse manifests called train_manifests (and back again), run:

# Convert data/train to train_manifests/{recordings,supervisions}.jsonl.gz
lhotse kaldi import \
    data/train \
    16000 \
    train_manifests

# Convert train_manifests/{recordings,supervisions}.jsonl.gz back to data/train
lhotse kaldi export \
    train_manifests/recordings.jsonl.gz \
    train_manifests/supervisions.jsonl.gz \
    data/train

Example

Hint

Before you continue, make sure you have run pip install kaldi-native-io; otherwise, features.jsonl.gz will not be created in the steps below.

In the following, we demonstrate how to import a Kaldi data directory using the yesno dataset.

Assume you have run the following commands with Kaldi:

cd kaldi/egs/yesno/s5
./run.sh

Take the data/train_yesno directory as an example:

ls data/train_yesno/
cmvn.scp  conf  feats.scp  frame_shift  spk2utt  split1  text  utt2dur  utt2num_frames  utt2spk  wav.scp
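Only some of these files matter for the import described in Data import/export. The hypothetical plan_import helper below makes that mapping explicit for a directory listing; grouping text under supervisions is an assumption beyond the files named in that section, and files such as cmvn.scp or utt2dur are simply not considered in this sketch.

```python
from typing import Dict, Iterable, List

# Assumed grouping of Kaldi files by the Lhotse manifest they feed. Only wav.scp
# is required; the others are optional and are simply skipped when absent.
MANIFEST_INPUTS: Dict[str, List[str]] = {
    "recordings": ["wav.scp"],
    "supervisions": ["segments", "utt2spk", "text"],
    "features": ["feats.scp"],
}


def plan_import(file_names: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: report which known Kaldi files a listing provides."""
    present = set(file_names)
    if "wav.scp" not in present:
        raise ValueError("wav.scp is required to build a RecordingSet")
    return {
        manifest: [name for name in names if name in present]
        for manifest, names in MANIFEST_INPUTS.items()
    }


listing = ("cmvn.scp conf feats.scp frame_shift spk2utt split1 text "
           "utt2dur utt2num_frames utt2spk wav.scp").split()
print(plan_import(listing))
# {'recordings': ['wav.scp'], 'supervisions': ['utt2spk', 'text'], 'features': ['feats.scp']}
```

Since there is no segments file here, each recording in wav.scp would be treated as a single utterance.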

You can use the following command to import it into Lhotse:

lhotse kaldi import \
  --frame-shift 0.01 \
  ./data/train_yesno \
  8000 \
  ./data/train_manifests/

Hint

Run lhotse kaldi import --help to view all available options. In the command above, 8000 is the sampling rate (in Hz) of the yesno recordings, and --frame-shift 0.01 is the feature frame shift in seconds (10 ms, Kaldi's default).
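The frame shift matters because Kaldi records feature lengths as frame counts (for example in utt2num_frames), not seconds. A minimal sketch, using a hypothetical feature_durations helper rather than Lhotse code, of converting those counts into durations:

```python
from typing import Dict


def feature_durations(utt2num_frames: str, frame_shift: float) -> Dict[str, float]:
    """Hypothetical helper: seconds covered by each feature matrix.

    utt2num_frames lines look like '<utterance-id> <num-frames>';
    frame_shift is in seconds (0.01 in the import command above).
    """
    durations: Dict[str, float] = {}
    for line in utt2num_frames.strip().splitlines():
        utt_id, num_frames = line.split()
        # Round away floating-point noise from the multiplication.
        durations[utt_id] = round(int(num_frames) * frame_shift, 6)
    return durations


print(feature_durations("utt1 98\nutt2 631", frame_shift=0.01))
# {'utt1': 0.98, 'utt2': 6.31}
```

Passing the wrong frame shift would silently scale every duration, which is why the value should match what Kaldi used when computing feats.scp.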

The import command generates the following files:

$ ls data/train_manifests/
features.jsonl.gz  recordings.jsonl.gz  supervisions.jsonl.gz

To create a CutSet from the above files, you can use:

lhotse cut simple \
  -r ./data/train_manifests/recordings.jsonl.gz \
  -f ./data/train_manifests/features.jsonl.gz \
  -s ./data/train_manifests/supervisions.jsonl.gz \
  ./yesno_train.jsonl.gz

Now you can use ./yesno_train.jsonl.gz for training.
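Conceptually, the cut manifest pairs each recording with its supervisions (and features, when available). The following is a simplified, hypothetical model of that grouping, with one cut per recording whose ID is assumed to equal the recording ID; SimpleCut and make_cuts are illustrative names, and this is not how lhotse cut simple is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleCut:
    """Hypothetical stand-in for a cut: one whole recording plus its supervisions."""
    id: str
    recording_id: str
    duration: float
    supervision_ids: List[str] = field(default_factory=list)


def make_cuts(recording_durations: Dict[str, float], supervisions: List[dict]) -> List[SimpleCut]:
    """Group supervisions by recording and emit one cut per recording."""
    by_recording: Dict[str, List[str]] = defaultdict(list)
    for sup in supervisions:
        by_recording[sup["recording_id"]].append(sup["id"])
    return [
        SimpleCut(id=rec_id, recording_id=rec_id, duration=dur,
                  supervision_ids=sorted(by_recording[rec_id]))
        for rec_id, dur in sorted(recording_durations.items())
    ]


cuts = make_cuts(
    {"rec2": 3.0, "rec1": 6.5},
    [{"id": "u2", "recording_id": "rec1"},
     {"id": "u1", "recording_id": "rec1"},
     {"id": "u3", "recording_id": "rec2"}],
)
print([(c.id, c.supervision_ids) for c in cuts])
# [('rec1', ['u1', 'u2']), ('rec2', ['u3'])]
```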
