Error found when running librispeech recipe with latest version of espresso #47

PhenixCFLi · 2020-11-11T17:58:49Z

🐛 Bug

There are two issues after installing the latest version of espresso:

1. A specaug parameter parsing error occurs once the specaug function is enabled

2020-11-11 12:04:42 | INFO | espresso.speech_train | --max-tokens is the maximum number of input frames in a batch
Traceback (most recent call last):
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/examples/asr_librispeech/../../espresso/speech_train.py", line 415, in <module>
    cli_main()
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/examples/asr_librispeech/../../espresso/speech_train.py", line 404, in cli_main
    cfg = convert_namespace_to_omegaconf(args)
  File "/nfs/mercury-13/u20/cli/src/espresso-11112020/espresso/fairseq/dataclass/utils.py", line 324, in convert_namespace_to_omegaconf
    composed_cfg = compose("config", overrides=overrides, strict=False)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/experimental/compose.py", line 31, in compose
    cfg = gh.hydra.compose_config(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 180, in _load_configuration
    parsed_overrides = parser.parse_overrides(overrides=overrides)
  File "/nfs/mercury-13/u20/cli/miniconda3/envs/espresso-11112020/lib/python3.8/site-packages/hydra/core/override_parser/overrides_parser.py", line 95, in parse_overrides
    raise OverrideParseException(
hydra.errors.OverrideParseException: mismatched input 'W' expecting <EOF>
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details

2. It crashes in the model training step (step 8) without any error message

2020-11-11 12:38:55 | INFO | espresso.speech_train | task: SpeechRecognitionEspressoTask
2020-11-11 12:38:55 | INFO | espresso.speech_train | model: SpeechLSTMModel
2020-11-11 12:38:55 | INFO | espresso.speech_train | criterion: LabelSmoothedCrossEntropyV2Criterion)
2020-11-11 12:38:55 | INFO | espresso.speech_train | num. model params: 159660204 (num. trained: 159660204)
2020-11-11 12:38:55 | INFO | fairseq.trainer | detected shared parameter: decoder.attention.query_proj.bias <- decoder.attention.value_proj.bias
2020-11-11 12:38:55 | INFO | espresso.speech_train | training on 1 devices (GPUs/TPUs)
2020-11-11 12:38:55 | INFO | espresso.speech_train | max tokens per GPU = 26000 and batch size per GPU = 24
2020-11-11 12:38:55 | INFO | fairseq.trainer | no existing checkpoint found exp/lstm_wsj.specaug.bpe1k/checkpoint_last.pt
2020-11-11 12:38:55 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-11 12:39:05 | INFO | espresso.tasks.speech_recognition | /nfs/mercury-13/u20/cli/src/espresso.latest/espresso/examples/asr_librispeech/data-bulgarian-bpe1k/train.json 33004 examples
./run.sh: line 259:  4839 Segmentation fault      CUDA_VISIBLE_DEVICES=$free_gpu speech_train.py $data_dir --task speech_recognition_espresso --seed 1 --log-interval $((8000/ngpus/update_freq)) --log-format simple --print-training-sample-interval $((4000/ngpus/update_freq)) --num-workers 0 --data-buffer-size 0 --max-tokens 26000 --batch-size 24 --curriculum 1 --empty-cache-freq 50 --valid-subset $valid_subset --batch-size-valid 48 --ddp-backend no_c10d --update-freq $update_freq --distributed-world-size $ngpus --optimizer adam --lr 0.001 --weight-decay 0.0 --clip-norm 2.0 --save-dir $dir --restore-file checkpoint_last.pt --save-interval-updates $((6000/ngpus/update_freq)) --keep-interval-updates 3 --keep-last-epochs 5 --validate-interval 1 --best-checkpoint-metric wer --criterion label_smoothed_cross_entropy_v2 --label-smoothing 0.1 --smoothing-type uniform --dict $dict --bpe sentencepiece --sentencepiece-model ${sentencepiece_model}.model --max-source-positions 9999 --max-target-positions 999 $opts --specaugment-config "$specaug_config" 2>&1
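
Since the segmentation fault above leaves no Python traceback, one way to get more information is Python's standard `faulthandler` module, which dumps the Python stack when a fatal signal such as SIGSEGV arrives. The sketch below is only an illustration: `enable_crash_traces` and the temporary log path are hypothetical names, and calling it near the top of the training entry point is an assumption, not something the recipe does.

```python
import faulthandler
import tempfile


def enable_crash_traces(log_path: str):
    """Write Python tracebacks of all threads to log_path on fatal signals (e.g. SIGSEGV)."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The file object must stay open for as long as faulthandler is enabled.
    return log_file


# Hypothetical log location; any writable path works.
crash_log_path = tempfile.NamedTemporaryFile(suffix=".log", delete=False).name
crash_log = enable_crash_traces(crash_log_path)
print("fault handler enabled:", faulthandler.is_enabled())
```

The same handler can be switched on without code changes by starting the interpreter with `python -X faulthandler`, in which case the traceback goes to stderr.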

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

Run cmd: ./run.sh
See error: listed above

Expected behavior

Able to train model with the recipe

Environment

fairseq Version (e.g., 1.0 or master): 1.0.0a0+d966482
PyTorch Version (e.g., 1.0): 1.4.0
OS (e.g., Linux): CentOS Linux release 7.7.1908 (Core)
How you installed fairseq (pip, source): pip install from source
Build command you used (if compiling from source): pip install --editable .
Python version: 3.8.5
CUDA/cuDNN version: py3.8_cuda10.0.130_cudnn7.6.3_0
GPU models and configuration:
Any other relevant information:

Additional context


freewym · 2020-11-12T00:07:46Z

I tested them locally and don't have these issues. Can you please check out the latest version (in the temp branch)? It's possible that fairseq had some issue and maybe it has been fixed now.

edit: it's now in master

PhenixCFLi · 2020-11-12T07:08:48Z

Thanks a lot, I will try again and let you know the result.

PhenixCFLi · 2020-11-17T14:44:53Z

I have checked the latest version, and it is still not working.
But it can be resolved by downgrading sentencepiece to version 0.1.91.
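
A quick way to see whether an environment has a sentencepiece release newer than the one reported to work here is to query the installed package metadata. This is a minimal sketch: `KNOWN_GOOD`, `installed_version` and `version_tuple` are illustrative names, and the only version fact taken from this thread is that 0.1.91 worked; which newer releases are affected is not known.

```python
from importlib import metadata

KNOWN_GOOD = "0.1.91"  # version reported to work in this thread


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str):
    """Turn '0.1.91' into (0, 1, 91); non-numeric parts are ignored."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


found = installed_version("sentencepiece")
if found is None:
    print("sentencepiece is not installed")
elif version_tuple(found) > version_tuple(KNOWN_GOOD):
    # Pinning back is done with: pip install sentencepiece==0.1.91
    print(f"sentencepiece {found} is newer than {KNOWN_GOOD}; consider pinning")
else:
    print(f"sentencepiece {found} is at or below {KNOWN_GOOD}")
```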

freewym · 2020-11-17T19:45:49Z

Did sentencepiece cause the 1st issue or the 2nd one? But anyways if the version causes the problem, IDK why it didn't happen in the 1st iteration. Are there any special symbols in the iteration where it crashes?

PhenixCFLi · 2020-11-17T22:17:25Z

Sorry, let me clarify.
Issue 1) The error occurs when specaug is enabled.
- The ' symbols in specaug_config need to be escaped, like this: specaug_config="{\'W\': 80, \'F\': 27, \'T\': 100, \'num_freq_masks\': 2, \'num_time_masks\': 2, \'p\': 1.0}"

Issue 2) Crash without an error message
- This is a version issue; downgrading sentencepiece resolves it.
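
The escaping used for Issue 1 puts a backslash in front of every single quote in the config string. A minimal Python sketch of that same transformation follows, assuming the config is written as a dict literal; `escape_single_quotes` is a hypothetical helper and not part of espresso, fairseq, or Hydra.

```python
import ast


def escape_single_quotes(config_literal: str) -> str:
    """Backslash-escape each single quote, mirroring the manual workaround."""
    return config_literal.replace("'", "\\'")


# The SpecAugment settings from this thread, written as a plain dict literal.
specaug_config = (
    "{'W': 80, 'F': 27, 'T': 100, "
    "'num_freq_masks': 2, 'num_time_masks': 2, 'p': 1.0}"
)

# Sanity check that the unescaped string is a well-formed dict literal.
parsed = ast.literal_eval(specaug_config)

escaped = escape_single_quotes(specaug_config)
print(escaped)
```

The printed string matches the value assigned to specaug_config in the workaround, so it can be compared against what run.sh passes to --specaugment-config.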

freewym · 2020-11-17T22:41:38Z

Is it a known issue for sentencepiece, or has anyone reported it to the sentencepiece team?



PhenixCFLi · 2020-11-20T14:34:19Z

thx

PhenixCFLi added the bug (Something isn't working) label Nov 11, 2020

PhenixCFLi changed the title ~~Error found in librispeech recipe~~ Error found when running librispeech recipe with latest version of espresso Nov 11, 2020

PhenixCFLi closed this as completed Nov 20, 2020
