making the kaldi import more robust #1129

Merged

Commits on Aug 24, 2023

  1. making the kaldi import more robust

    get_duration():
    - recover if audio file cannot be loaded for get_duration(), drop such recordings...
    - use chunksize for ProcessPoolExecutor::map (avoid hanging of ProcessPoolExecutor for large RecordingSets)
    KarelVesely84 committed Aug 24, 2023
    3b94fc7
  2. incorporating the PR comments, adding sanity check

    - no more than 20% of utterances can be dropped on `kaldi import`
    KarelVesely84 committed Aug 24, 2023
    c70b20b
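
The two commit messages describe three ideas: tolerate audio files that fail to load, pass `chunksize` to `ProcessPoolExecutor.map`, and refuse to continue if more than 20% of the entries were dropped. The sketch below illustrates those ideas only; it is not the code from this PR. The names `safe_duration` and `load_durations`, the parameters `max_dropped` and `executor_cls`, and the default values are hypothetical, and the standard-library `wave` module is assumed here as a stand-in for whatever audio backend the real import uses.

```python
import wave
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, Optional, Tuple


def safe_duration(item: Tuple[str, str]) -> Tuple[str, Optional[float]]:
    """Return (recording_id, seconds), or (recording_id, None) if the file is unreadable."""
    rec_id, path = item
    try:
        with wave.open(path, "rb") as f:
            return rec_id, f.getnframes() / f.getframerate()
    except (OSError, EOFError, wave.Error):
        # Recover instead of crashing the whole import; the caller drops this entry.
        return rec_id, None


def load_durations(
    paths: Dict[str, str],
    num_jobs: int = 4,
    chunksize: int = 100,       # hypothetical default, not taken from the PR
    max_dropped: float = 0.2,   # the 20% limit from the second commit message
    executor_cls=ProcessPoolExecutor,
) -> Dict[str, float]:
    """Map recording ids to durations, dropping unreadable files."""
    if not paths:
        return {}
    with executor_cls(max_workers=num_jobs) as ex:
        # chunksize batches many items per task sent to a worker process,
        # which cuts inter-process overhead for large inputs.
        results = list(ex.map(safe_duration, paths.items(), chunksize=chunksize))
    durations = {rec_id: dur for rec_id, dur in results if dur is not None}
    dropped = len(paths) - len(durations)
    # Sanity check: a few broken files are tolerated, a large fraction is not.
    if dropped > max_dropped * len(paths):
        raise RuntimeError(
            f"Dropped {dropped} of {len(paths)} recordings, "
            f"more than the allowed {max_dropped:.0%}"
        )
    return durations
```

When the default `ProcessPoolExecutor` is used from a script, the call to `load_durations` should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The `executor_cls` parameter is only there to make the sketch easy to try with `ThreadPoolExecutor`, whose `map` accepts `chunksize` but ignores it.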