Importing CV data in DATA_FORMATTING.md fails due to sox deps not in Docker Hub image #4

Closed
KathyReid opened this issue Feb 10, 2021 · 1 comment · Fixed by #15

@KathyReid

The instructions for importing Common Voice datasets in DATA_FORMATTING.md fail because the DeepSpeech training image on Docker Hub does not include the sox dependencies.

If you try to import Common Voice using the current instructions, it will fail with:

root@c7f3e6f3c302:/DeepSpeech# bin/import_cv2.py deepspeech-data/cv-corpus-6.1-2020-12-11/vi
/bin/sh: 1: sox: not found
SoX could not be found!

    If you do not have SoX, proceed here:
     - - - http://sox.sourceforge.net/ - - -

    If you do (or think that you should) have SoX, double-check your
    path variables.
    
Loading TSV file:  /DeepSpeech/deepspeech-data/cv-corpus-6.1-2020-12-11/vi/test.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
[previous line repeated 15 more times]
This install of SoX cannot process .mp3 files.
[previous line repeated 43 more times]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "bin/import_cv2.py", line 65, in one_sample
    _maybe_convert_wav(mp3_filename, wav_filename)
  File "bin/import_cv2.py", line 185, in _maybe_convert_wav
    transformer.build(mp3_filename, wav_filename)
  File "/usr/local/lib/python3.6/dist-packages/sox/transform.py", line 594, in build
    input_filepath, input_array, sample_rate_in
  File "/usr/local/lib/python3.6/dist-packages/sox/transform.py", line 496, in _parse_inputs
    input_format['channels'] = file_info.channels(input_filepath)
  File "/usr/local/lib/python3.6/dist-packages/sox/file_info.py", line 82, in channels
    output = soxi(input_filepath, 'c')
  File "/usr/local/lib/python3.6/dist-packages/sox/core.py", line 149, in soxi
    stderr=subprocess.PIPE
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'sox': 'sox'
"""
This install of SoX cannot process .mp3 files.
[previous line repeated 55 more times]


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "bin/import_cv2.py", line 221, in <module>
    main()
  File "bin/import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
  File "bin/import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "bin/import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
FileNotFoundError: [Errno 2] No such file or directory: 'sox': 'sox'
This install of SoX cannot process .mp3 files.
[previous line appeared 26 times, interleaved with the traceback above]
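
The log shows two symptoms: the shell cannot find `sox` at all, and the run then reports that SoX cannot process .mp3 files. Below is a minimal pre-flight sketch, separate from `bin/import_cv2.py`, that checks both conditions before an import is started. The function names are hypothetical, and the parsing assumes that `sox --help` prints a line beginning with `AUDIO FILE FORMATS:`, which may not hold for every SoX build.

```python
import shutil
import subprocess


def parse_sox_formats(help_text):
    """Return the formats listed on the 'AUDIO FILE FORMATS:' line of the help text."""
    for line in help_text.splitlines():
        if line.startswith("AUDIO FILE FORMATS:"):
            return set(line.split(":", 1)[1].split())
    # Assumption: if the line is missing, treat it as "no formats known".
    return set()


def check_sox(required_format="mp3"):
    """Return (ok, message) describing whether sox looks usable for the import."""
    sox_path = shutil.which("sox")
    if sox_path is None:
        return False, "sox binary not found on PATH"
    # Capture the help text; stderr is merged in case a build prints it there.
    result = subprocess.run(
        [sox_path, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    formats = parse_sox_formats(result.stdout)
    if required_format not in formats:
        return False, "sox found, but '%s' is not in its format list" % required_format
    return True, "sox found with '%s' support" % required_format
```

Calling `check_sox()` inside the container before running the importer would report which of the two conditions applies, rather than failing once per audio file.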

Upstream PR: mozilla/DeepSpeech#3488

KathyReid added this to the Beta release milestone Feb 28, 2021
KathyReid self-assigned this Feb 28, 2021
KathyReid added the documentation and enhancement labels Feb 28, 2021
@KathyReid

Based on the discussion at mozilla/DeepSpeech#3488, what I am going to do here is:

  • add instructions to the DATA_FORMATTING.md page on installing the sox dependencies
  • add instructions to the ENVIRONMENT.md page on how to extend the base deepspeech-training:xx Docker image
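
As a rough illustration of the second item, the sketch below renders a Dockerfile that layers extra apt packages on top of a base image. It is not the text that ended up in ENVIRONMENT.md. The helper name `render_dockerfile` is hypothetical, the base image reference is left for the caller to supply, and the package names `sox` and `libsox-fmt-mp3` assume a Debian or Ubuntu based image (the `dist-packages` paths in the traceback suggest this, but it is an assumption).

```python
def render_dockerfile(base_image, packages=("sox", "libsox-fmt-mp3")):
    """Return Dockerfile text that installs apt packages on top of base_image.

    Assumption: the base image is Debian/Ubuntu based, so apt-get is available.
    """
    package_list = " ".join(packages)
    return (
        f"FROM {base_image}\n"
        "RUN apt-get update \\\n"
        f"    && apt-get install -y --no-install-recommends {package_list} \\\n"
        "    && rm -rf /var/lib/apt/lists/*\n"
    )
```

Writing the returned text to a file named `Dockerfile` and building it with `docker build` would give a derived image in which `sox` is on the path; the image name and tag to pass in are whatever the training image is published under.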
