Faster initialization option in DynamicBucketingSampler + various fixes #1210

Merged
merged 6 commits into from
Nov 10, 2023

pzelasko
Collaborator

Specifically:

  • user may pass precomputed duration_bins into DynamicBucketingSampler to avoid the initial iteration over cuts
  • user may override the default tolerance for mismatch between the loaded audio duration and the manifest metadata via the env var LHOTSE_AUDIO_DURATION_MISMATCH_TOLERANCE=<value-in-seconds>
  • fix torchaudio version detection
  • fault-tolerant Shar audio export
  • fault-tolerant orjson loading
  • IterableDatasetWrapper by default won't reset the sampler when __iter__ is called again, unless the sampler has finished iterating
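
To illustrate the first item: the point of precomputed duration_bins is that bucket boundaries can be estimated once, offline, and reused, so the sampler does not need to scan the cuts at startup. The sketch below is a standalone illustration of that idea using only the standard library. It is not Lhotse's implementation, and `estimate_duration_bins` and `bucket_index` are hypothetical helper names introduced here for the example.

```python
from bisect import bisect_right
from typing import List, Sequence


def estimate_duration_bins(durations: Sequence[float], num_buckets: int) -> List[float]:
    """Hypothetical helper: return up to num_buckets - 1 boundaries (in seconds)
    chosen so that each bucket holds roughly the same total duration."""
    if num_buckets < 2:
        raise ValueError("num_buckets must be at least 2")
    ordered = sorted(durations)
    target = sum(ordered) / num_buckets  # total duration aimed for per bucket
    bins: List[float] = []
    accumulated = 0.0
    for d in ordered:
        # Once the current bucket exceeds its share, start a new one at d.
        if accumulated > target and len(bins) < num_buckets - 1:
            bins.append(d)
            accumulated = 0.0
        accumulated += d
    return bins


def bucket_index(duration: float, bins: Sequence[float]) -> int:
    """Hypothetical helper: map a duration to its bucket using the boundaries."""
    return bisect_right(bins, duration)


# One-off estimation on a sample of durations (assumed values, in seconds).
sample_durations = [float(i) for i in range(1, 11)]
bins = estimate_duration_bins(sample_durations, num_buckets=2)

# Later runs can reuse `bins` directly instead of iterating over the data again.
short_bucket = bucket_index(3.0, bins)
long_bucket = bucket_index(9.5, bins)
```

In Lhotse itself, boundaries obtained ahead of time would be handed to the sampler through the duration_bins argument described above; how they are estimated and stored between runs is left to the user.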

@pzelasko pzelasko added this to the v1.18 milestone Nov 10, 2023
@pzelasko pzelasko merged commit 230c8fc into master Nov 10, 2023
8 of 10 checks passed
@pzelasko pzelasko deleted the feature/dynamic-bucketing-precomputed-bins branch November 10, 2023 03:19