Extending Lhotse dataloading to text/multimodal data #1295
Merged
This PR adds very basic support for incorporating text-only data into Lhotse samplers to enable text and multimodal dataloading. Highlights:
- `SamplingConstraint`, which generalizes `TimeConstraint` and allows creating other types of constraints that decide when to stop sampling a mini-batch and how to determine the "size" of an example (e.g. for audio it's the duration, but for text it may be something like the number of tokens)
- `constraint`, where `SamplingConstraint` instances may be passed directly
- `TokenConstraint`, which is almost identical to `TimeConstraint` but uses `num_tokens` / `max_tokens`
- `TextExample`, which wraps text/tokens; `CutSet` can be used to yield those (just pass a text iterator to `CutSet`, like `CutSet(text_example_iter)`). It's not super clean but it works; I'm trying to figure out if we can make this cleaner.

This is stretching the original scope of Lhotse a bit, but I feel like it's worth it: we have accumulated a bunch of solid techniques here, and it'd be a pity to have to use something completely different for multimodal modeling, especially when so few changes are required to make it work here. Would love to know your thoughts @danpovey @csukuangfj @desh2608 @m-wiesner
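
To illustrate the constraint idea from the highlights, here is a toy sketch in plain Python. It does not import Lhotse and does not reproduce Lhotse's actual classes or signatures: every `Toy*` class, along with `measure`, `limit`, and `batches`, is a hypothetical stand-in chosen for this example. It only shows how a single abstraction can decide both the "size" of an example and when a mini-batch is full.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator, List


@dataclass
class ToyTextExample:
    """Hypothetical stand-in for a text example: raw text plus its tokens."""
    text: str
    tokens: List[str] = field(default_factory=list)

    @property
    def num_tokens(self) -> int:
        return len(self.tokens)


@dataclass
class ToyAudioExample:
    """Hypothetical stand-in for an audio example with a duration in seconds."""
    duration: float


class ToySamplingConstraint(ABC):
    """Tracks the accumulated 'size' of a mini-batch against a limit."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.current = 0.0

    @abstractmethod
    def measure(self, example: Any) -> float:
        """Return the 'size' of a single example."""

    def add(self, example: Any) -> None:
        self.current += self.measure(example)

    def exceeded(self) -> bool:
        return self.current > self.limit

    def reset(self) -> None:
        self.current = 0.0


class ToyTimeConstraint(ToySamplingConstraint):
    """Size of an example is its duration."""

    def measure(self, example: Any) -> float:
        return example.duration


class ToyTokenConstraint(ToySamplingConstraint):
    """Size of an example is its number of tokens."""

    def measure(self, example: Any) -> float:
        return example.num_tokens


def batches(examples: Iterable[Any], constraint: ToySamplingConstraint) -> Iterator[List[Any]]:
    """Greedily group examples, closing a batch when the constraint is exceeded."""
    batch: List[Any] = []
    constraint.reset()
    for ex in examples:
        constraint.add(ex)
        if constraint.exceeded() and batch:
            # The new example does not fit: emit the batch and start a new one with it.
            yield batch
            batch = []
            constraint.reset()
            constraint.add(ex)
        batch.append(ex)
    if batch:
        yield batch


texts = ["a b c", "d e", "f g h i", "j"]
text_examples = [ToyTextExample(t, t.split()) for t in texts]
text_batches = list(batches(text_examples, ToyTokenConstraint(limit=5)))

audio_examples = [ToyAudioExample(4.0), ToyAudioExample(3.0), ToyAudioExample(5.0)]
audio_batches = list(batches(audio_examples, ToyTimeConstraint(limit=8.0)))

print([[ex.text for ex in b] for b in text_batches])
print([[ex.duration for ex in b] for b in audio_batches])
```

The same `batches` loop serves both modalities because only `measure` differs between the two constraints. In this sketch an example that is larger than the limit on its own still forms a single-example batch; that is a simplification of this toy code, not a statement about how Lhotse handles such cases.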