Audio cuts are one of the main Lhotse features. Cut is a part of a recording, but it can be longer than a supervision segment, or even span multiple segments. The regions without a supervision are just audio that we don't assume we know anything about - there may be silence, noise, non-transcribed speech, etc. Task-specific datasets can leverage this information to generate masks for such regions.
lhotse.cut.Cut
lhotse.cut.CutSet
There are three cut classes: ~lhotse.cut.MonoCut
, ~lhotse.cut.MixedCut
, and ~lhotse.cut.PaddingCut
that are described below in more detail:
lhotse.cut.MonoCut
lhotse.cut.MixedCut
lhotse.cut.PaddingCut
We provide a limited CLI to manipulate Lhotse manifests. Some examples of how to perform manipulations in the terminal:
# Reject short segments
lhotse yaml filter 'duration>=3.0' cuts.jsonl cuts-3s.jsonl
# Pad short segments to 5 seconds.
lhotse cut pad --duration 5.0 cuts-3s.jsonl cuts-5s-pad.jsonl
# Truncate longer segments to 5 seconds.
lhotse cut truncate --max-duration 5.0 --offset-type random cuts-5s-pad.jsonl cuts-5s.jsonl