
Error found when running librispeech recipe with latest version of espresso #47

Closed
PhenixCFLi opened this issue Nov 11, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@PhenixCFLi

PhenixCFLi commented Nov 11, 2020

🐛 Bug

There are two issues after installing the latest version of espresso:

  1. A SpecAugment parameter parsing error occurs once the specaug function is enabled:
2020-11-11 12:04:42 | INFO | espresso.speech_train | --max-tokens is the maximum number of input frames in a batch
Traceback (most recent call last):
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/examples/asr_librispeech/../../espresso/speech_train.py", line 415, in <module>
    cli_main()
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/examples/asr_librispeech/../../espresso/speech_train.py", line 404, in cli_main
    cfg = convert_namespace_to_omegaconf(args)
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/fairseq/dataclass/utils.py", line 324, in convert_namespace_to_omegaconf
    composed_cfg = compose("config", overrides=overrides, strict=False)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/experimental/compose.py", line 31, in compose
    cfg = gh.hydra.compose_config(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 180, in _load_configuration
    parsed_overrides = parser.parse_overrides(overrides=overrides)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/core/override_parser/overrides_parser.py", line 95, in parse_overrides
    raise OverrideParseException(
hydra.errors.OverrideParseException: mismatched input 'W' expecting <EOF>
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details
  2. Training crashes in the model training step (step 8) without any error message:
2020-11-11 12:38:55 | INFO | espresso.speech_train | task: SpeechRecognitionEspressoTask
2020-11-11 12:38:55 | INFO | espresso.speech_train | model: SpeechLSTMModel
2020-11-11 12:38:55 | INFO | espresso.speech_train | criterion: LabelSmoothedCrossEntropyV2Criterion)
2020-11-11 12:38:55 | INFO | espresso.speech_train | num. model params: 159660204 (num. trained: 159660204)
2020-11-11 12:38:55 | INFO | fairseq.trainer | detected shared parameter: decoder.attention.query_proj.bias <- decoder.attention.value_proj.bias
2020-11-11 12:38:55 | INFO | espresso.speech_train | training on 1 devices (GPUs/TPUs)
2020-11-11 12:38:55 | INFO | espresso.speech_train | max tokens per GPU = 26000 and batch size per GPU = 24
2020-11-11 12:38:55 | INFO | fairseq.trainer | no existing checkpoint found exp/lstm_wsj.specaug.bpe1k/checkpoint_last.pt
2020-11-11 12:38:55 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-11 12:39:05 | INFO | espresso.tasks.speech_recognition | /nfs/mercury-13/u20/cli/src/espresso.latest/espresso/examples/asr_librispeech/data-bulgarian-bpe1k/train.json 33004 examples
./run.sh: line 259:  4839 Segmentation fault      CUDA_VISIBLE_DEVICES=$free_gpu speech_train.py $data_dir --task speech_recognition_espresso --seed 1 --log-interval $((8000/ngpus/update_freq)) --log-format simple --print-training-sample-interval $((4000/ngpus/update_freq)) --num-workers 0 --data-buffer-size 0 --max-tokens 26000 --batch-size 24 --curriculum 1 --empty-cache-freq 50 --valid-subset $valid_subset --batch-size-valid 48 --ddp-backend no_c10d --update-freq $update_freq --distributed-world-size $ngpus --optimizer adam --lr 0.001 --weight-decay 0.0 --clip-norm 2.0 --save-dir $dir --restore-file checkpoint_last.pt --save-interval-updates $((6000/ngpus/update_freq)) --keep-interval-updates 3 --keep-last-epochs 5 --validate-interval 1 --best-checkpoint-metric wer --criterion label_smoothed_cross_entropy_v2 --label-smoothing 0.1 --smoothing-type uniform --dict $dict --bpe sentencepiece --sentencepiece-model ${sentencepiece_model}.model --max-source-positions 9999 --max-target-positions 999 $opts --specaugment-config "$specaug_config" 2>&1
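Because this segfault leaves no Python traceback, one generic way to see where the process dies is Python's built-in faulthandler, which can be switched on through the PYTHONFAULTHANDLER environment variable. The sketch below is a diagnostic aid, not part of espresso or the recipe. The helper name `run_with_faulthandler` is hypothetical, and the deliberately crashing child command (taken from the Python faulthandler docs) only stands in for the real training command.

```python
import os
import signal
import subprocess
import sys


def run_with_faulthandler(argv):
    """Run a command with PYTHONFAULTHANDLER=1 and report whether it died from SIGSEGV.

    Returns (crashed_with_segfault, captured_stderr).
    """
    # PYTHONFAULTHANDLER=1 makes every Python interpreter started with this
    # environment dump a Python-level traceback on fatal signals such as SIGSEGV.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(argv, env=env, capture_output=True, text=True)
    # On POSIX, subprocess reports "killed by signal N" as a return code of -N.
    return proc.returncode == -signal.SIGSEGV, proc.stderr


if __name__ == "__main__":
    # Stand-in for the training command: reading from a NULL pointer via ctypes
    # triggers a real segmentation fault in the child interpreter.
    crashed, stderr = run_with_faulthandler(
        [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"]
    )
    print("segfault:", crashed)
    print(stderr)
```

In run.sh itself, the equivalent (assumed, not tested against this recipe) would be to `export PYTHONFAULTHANDLER=1` before the speech_train.py line, so the crash log shows which Python call was executing when the native code segfaulted.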

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run cmd: ./run.sh
  2. See error: listed above

Expected behavior

Able to train a model with the recipe.

Environment

  • fairseq Version (e.g., 1.0 or master): 1.0.0a0+d966482
  • PyTorch Version (e.g., 1.0): 1.4.0
  • OS (e.g., Linux): CentOS Linux release 7.7.1908 (Core)
  • How you installed fairseq (pip, source): pip install from source
  • Build command you used (if compiling from source): pip install --editable .
  • Python version: 3.8.5
  • CUDA/cuDNN version: py3.8_cuda10.0.130_cudnn7.6.3_0
  • GPU models and configuration:
  • Any other relevant information:

Additional context

@PhenixCFLi PhenixCFLi added the bug Something isn't working label Nov 11, 2020
@PhenixCFLi PhenixCFLi changed the title Error found in librispeech recipe Error found when running librispeech recipe with latest version of espresso Nov 11, 2020
@freewym
Owner

freewym commented Nov 12, 2020

I tested them locally and don't have these issues. Can you please check out the latest version (in the temp branch)? It's possible that fairseq had an issue that has since been fixed.

edit: it's now in master

@PhenixCFLi
Author

Thanks a lot, I will try again and let you know the result.

@PhenixCFLi
Author

I checked the latest version and it also doesn't work.
However, the problem can be resolved by downgrading sentencepiece to version 0.1.91.
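To check quickly whether an environment matches the sentencepiece version reported to work here, a small sketch like the following could be used. It is an assumption-laden helper, not something espresso ships: the names `sentencepiece_advice` and `installed_sentencepiece` are hypothetical, and 0.1.91 is treated only as the version this thread found to work, not as an official pin.

```python
from importlib import metadata
from typing import Optional

# Version reported in this thread to avoid the training crash; not an official pin.
KNOWN_GOOD = "0.1.91"


def sentencepiece_advice(installed: Optional[str], known_good: str = KNOWN_GOOD) -> str:
    """Return a one-line hint comparing the installed version to the known-good one."""
    if installed is None:
        return "sentencepiece is not installed"
    if installed == known_good:
        return f"sentencepiece {installed}: matches the version reported to work"
    return f"sentencepiece {installed}: consider `pip install sentencepiece=={known_good}`"


def installed_sentencepiece() -> Optional[str]:
    """Look up the installed distribution version without importing the package."""
    try:
        return metadata.version("sentencepiece")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(sentencepiece_advice(installed_sentencepiece()))
```

Reading the version through `importlib.metadata` (available from Python 3.8, the version in the environment above) avoids importing the native extension that is suspected of crashing.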

@freewym
Owner

freewym commented Nov 17, 2020

Did sentencepiece cause the 1st issue or the 2nd one? Anyway, if the version causes the problem, I don't know why it didn't happen in the 1st iteration. Are there any special symbols in the iteration where it crashes?

@PhenixCFLi
Copy link
Author

Sorry, let me clarify.
Issue 1) An error occurs when SpecAugment is enabled.
- The ' characters in specaug_config need to be escaped, like this: specaug_config="{\'W\': 80, \'F\': 27, \'T\': 100, \'num_freq_masks\': 2, \'num_time_masks\': 2, \'p\': 1.0}"

Issue 2) Crash without error

  • It is a version issue; downgrading sentencepiece fixes it.
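For issue 1, the escaped string can also be generated rather than typed by hand. The sketch below assumes the SpecAugment settings are kept in a Python dict; the helper name `escaped_specaug_config` is hypothetical. It produces the same characters that `$specaug_config` holds after bash evaluates the double-quoted assignment from this thread, because inside double quotes bash keeps `\'` as a literal backslash followed by a quote.

```python
def escaped_specaug_config(params: dict) -> str:
    """Render a dict with every single quote preceded by a backslash.

    Mirrors the workaround reported in this thread; whether a given
    fairseq/Hydra version needs it is an assumption to verify locally.
    """
    text = repr(params)  # dict repr quotes string keys with single quotes
    return text.replace("'", "\\'")


if __name__ == "__main__":
    params = {"W": 80, "F": 27, "T": 100, "num_freq_masks": 2, "num_time_masks": 2, "p": 1.0}
    print(escaped_specaug_config(params))
```

Generating the string from a dict keeps the keys and values in one place, so changing a mask parameter cannot accidentally leave an unescaped quote behind.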

@freewym
Owner

freewym commented Nov 17, 2020 via email

@PhenixCFLi
Author

thx
