fix: dataset-root and dataset-folder-in-archive flag is empty #1060

hoangtnm · 2020-11-28T14:33:35Z

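At lines [151-160](https://github.com/pytorch/audio/blob/master/examples/pipeline_wav2letter/main.py#L151) and line [437](https://github.com/pytorch/audio/blob/fb3ef9ba427acd7db3084f988ab55169fab14854/examples/pipeline_wav2letter/main.py#L437) of `main.py`, `dataset-root` and `dataset-folder-in-archive` default to `None`, so `main.py` cannot tell where the dataset is stored on the machine and fails to load it.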
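
To illustrate the kind of change this implies, here is a minimal sketch of an `argparse` setup where both flags have non-`None` defaults. The default values (`"./"` and `"LibriSpeech"`), the help strings, and the `build_parser` helper are assumptions made for illustration; they are not taken from `main.py` or from this PR's diff.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the two flag names come from the PR description."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--dataset-root",
        type=str,
        default="./",  # assumed default, not confirmed by the PR
        help="directory that contains (or will contain) the dataset",
    )
    parser.add_argument(
        "--dataset-folder-in-archive",
        type=str,
        default="LibriSpeech",  # assumed default, not confirmed by the PR
        help="top-level folder name inside the dataset archive",
    )
    return parser


# Parsing an empty command line now yields usable strings instead of None.
args = build_parser().parse_args([])
print(args.dataset_root, args.dataset_folder_in_archive)
```

With string defaults like these, running the script without either flag would no longer pass `None` down to the dataset loader.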

Moreover, `n-hidden-channels 2000` is not defined in `main.py`, so it needs to be removed.
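
For context on why an undefined flag has to go: `argparse` rejects any option the parser does not declare and exits with status 2. The sketch below shows that behaviour with a stand-in parser. The `build_parser` and `try_parse` helpers are hypothetical, and the single `--batch-size` option merely stands in for the options `main.py` actually defines.

```python
import argparse


def build_parser():
    """Stand-in parser that does NOT define --n-hidden-channels."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch-size", type=int, default=128)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or the exit code if argparse aborts."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse prints "unrecognized arguments: ..." to stderr and exits.
        return exc.code


result = try_parse(["--batch-size", "128", "--n-hidden-channels", "2000"])
print(result)
```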

Error log:

python main.py \
    --reduce-lr-valid \
    --dataset-train train-clean-100 train-clean-360 train-other-500 \
    --dataset-valid dev-clean \
    --batch-size 128 \
    --learning-rate .6 \
    --momentum .8 \
    --weight-decay .00001 \
    --clip-grad 0. \
    --gamma .99 \
    --hop-length 160 \
    --win-length 400 \
    --n-bins 13 \
    --normalize \
    --optimizer adadelta \
    --scheduler reduceonplateau \
    --epochs 30

/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
INFO:root:Namespace(batch_size=128, checkpoint='', clip_grad=0.0, dataset_folder_in_archive=None, dataset_root=None, dataset_train=['train-clean-100', 'train-clean-360', 'train-other-500'], dataset_valid=['dev-clean'], decoder='greedy', distributed=False, epochs=30, eps=1e-08, freq_mask=0, gamma=0.99, hop_length=160, jit=False, learning_rate=0.6, momentum=0.8, n_bins=13, normalize=True, optimizer='adadelta', progress_bar=False, reduce_lr_valid=True, rho=0.95, scheduler='reduceonplateau', seed=0, start_epoch=0, time_mask=0, type='mfcc', weight_decay=1e-05, win_length=400, workers=0, world_size=8)
INFO:root:Start time: 2020-11-28 21:18:22.337478
/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/site-packages/torchaudio/backend/utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False` before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  'The interface of "soundfile" backend is planned to change in 0.8.0 to '
Traceback (most recent call last):
  File "main.py", line 670, in <module>
    spawn_main(main, args)
  File "main.py", line 663, in spawn_main
    main(0, args)
  File "main.py", line 454, in main
    root=args.dataset_root,
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 65, in split_process_vlsp2020asr
    return tuple(create(dataset) for dataset in datasets)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 65, in <genexpr>
    return tuple(create(dataset) for dataset in datasets)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 57, in create
    for tag, transform in zip(tags, transform_list)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 57, in <listcomp>
    for tag, transform in zip(tags, transform_list)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 15, in __init__
    self._path = os.path.join(root, url)
  File "/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
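
The last frames of the traceback can be reproduced in isolation: `os.path.join` raises this `TypeError` whenever its first argument is `None`. The sketch below shows the failure and one possible guard that fails earlier with a clearer message. The `resolve_dataset_path` helper and its error text are hypothetical and are not part of `datasets.py`.

```python
import os


def resolve_dataset_path(root, url):
    """Join root and url, failing early with a readable message if root is unset."""
    if root is None:
        raise ValueError("dataset root is not set; pass --dataset-root")
    return os.path.join(root, url)


# Reproduce the original failure: a None root reaches os.path.join.
try:
    os.path.join(None, "train-clean-100")
except TypeError as exc:
    print("os.path.join failed:", exc)

# The guarded helper reports the missing flag instead.
try:
    resolve_dataset_path(None, "train-clean-100")
except ValueError as exc:
    print("guard failed:", exc)
```

A check like this is only a diagnostic aid; giving the flags sensible defaults addresses the cause.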

facebook-github-bot · 2020-11-28T14:33:51Z

Hi @hoangtnm!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

vincentqb

Thanks!

facebook-github-bot added the CLA Signed label Nov 28, 2020

mthrok requested a review from vincentqb January 21, 2021 01:09

mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021

Merge pull request pytorch#1060 from guyang3532/fix_torchvision_tutor…

c83c23d

…ial_win Modify torchvision_tutorial doc for windows

vincentqb approved these changes Apr 15, 2021

vincentqb merged commit 245da37 into pytorch:master Apr 15, 2021

carolineechen pushed a commit to carolineechen/audio that referenced this pull request Apr 30, 2021

fix: dataset-folder-in-archive flag is empty (pytorch#1060)

ced4907
