
RuntimeError: unexpected EOF. Corrupted File? #13

Closed
gouldju1 opened this issue Jun 10, 2020 · 4 comments


gouldju1 commented Jun 10, 2020

Hello,

I performed the following:

  1. Cloned the ProphetNet repository
  2. Installed torch and fairseq
  3. Downloaded the ProphetNet-large-160GB pre-trained model
  4. Downloaded the CNN/DM data
  5. Preprocessed the CNN/DM data via preprocess_cnn_dm.py
  6. Ran fairseq-preprocess to generate binaries

When I run fairseq-train, or fairseq-generate for inference, I get the following errors:
Train

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 11, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 333, in cli_main
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 51, in main
    model = task.build_model(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/tasks/fairseq_task.py", line 185, in build_model
    return models.build_model(args, self)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/models/__init__.py", line 48, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 147, in build_model
    states = torch.load(args.load_from_pretrained_model, map_location='cpu')
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 1092436 more bytes. The file might be corrupted.

Inference

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 151, in load_checkpoint_to_cpu
    from fairseq.fb_pathmgr import fb_pathmgr
ModuleNotFoundError: No module named 'fairseq.fb_pathmgr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-generate", line 11, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 47, in main
    task=task,
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 179, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(filenames, arg_overrides, task)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 160, in load_checkpoint_to_cpu
    path, map_location=lambda s, l: default_restore_location(s, "cpu")
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 5239485 more bytes. The file might be corrupted.
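For context on the error class: torch's legacy .pt format is pickle-based, so a file that is shorter on disk than the serializer expects fails partway through deserialization. A minimal stdlib sketch of the same failure mode on a deliberately truncated stream (the key name is illustrative, not from the real checkpoint):

```python
import pickle

# Serialize a payload, then cut it in half to simulate an incomplete download.
payload = pickle.dumps({"decoder.weight": list(range(10000))})
truncated = payload[: len(payload) // 2]

try:
    pickle.loads(truncated)
    ok = True
except Exception as exc:  # pickle raises UnpicklingError/EOFError mid-stream
    ok = False
    print(type(exc).__name__)
```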

Inputs:

Train

fairseq-train \
--fp16 \
--user-dir ./prophetnet --task translation_prophetnet --arch ngram_transformer_prophet_large \
--optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 \
--lr 0.0001 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 1000 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--criterion ngram_language_loss --label-smoothing 0.1 \
--update-freq 32  --max-sentences 2 \
--num-workers 4 \
--load-from-pretrained-model ../prophetnet_large_pretrained_160G_14epoch_model.pt \
--load-sep \
--ddp-backend=no_c10d --max-epoch 10 \
--max-source-positions 512 --max-target-positions 512 \
--skip-invalid-size-inputs-valid-test \
--seed 1 \
--save-dir ./cnndm/finetune_cnndm_checkpoints \
--keep-last-epochs 10 \
--tensorboard-logdir ./cnndm/finetune_cnndm_tensorboard \
./cnndm/processed

Inference

fairseq-generate \
./cnndm/processed \
--path ../prophetnet_large_pretrained_16G_64epoch_model.pt \
--user-dir prophetnet \
--task translation_prophetnet \
--batch-size 32 \
--gen-subset test \
--beam 5 \
--num-workers 4 \
--min-len 45 \
--max-len-b 110 \
--no-repeat-ngram-size 3 --lenpen 1.2 2>&1 > ../logs.output

Any idea how to handle this? Thank you.

yuyan2do (Member) commented:

It looks like the binary data is incomplete. Please check the sizes of your .bin and .idx files; reprocessing the data should resolve this issue.
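To check the sizes as suggested, here is a small sketch that flags suspiciously small fairseq-preprocess outputs in the processed data directory (the helper name and the 1 KiB threshold are my own assumptions, not fairseq constants):

```python
from pathlib import Path

def find_suspect_binaries(data_dir, min_bytes=1024):
    """Return fairseq-preprocess outputs that look truncated.

    A zero-byte or unusually small .bin/.idx file is a strong hint that
    preprocessing was interrupted and should be rerun.
    """
    suspects = []
    for path in sorted(Path(data_dir).glob("*.bin")) + sorted(Path(data_dir).glob("*.idx")):
        if path.stat().st_size < min_bytes:
            suspects.append(path.name)
    return suspects
```

For example, `find_suspect_binaries("./cnndm/processed")` would list any binaries worth regenerating before retrying fairseq-train.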

gouldju1 (Author) commented:

Yes, that resolves the issue. However, now, after entering an input sentence during fairseq-interactive, I get the following:

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-interactive", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-interactive')()
  File "/workspace/fairseq/fairseq_cli/interactive.py", line 213, in cli_main
    main(args)
  File "/workspace/fairseq/fairseq_cli/interactive.py", line 164, in main
    translations = task.inference_step(generator, models, sample)
  File "/workspace/fairseq/fairseq/tasks/fairseq_task.py", line 356, in inference_step
    return generator.generate(models, sample, prefix_tokens=prefix_tokens)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/workspace/fairseq/fairseq/sequence_generator.py", line 161, in generate
    return self._generate(sample, **kwargs)
  File "/workspace/fairseq/fairseq/sequence_generator.py", line 261, in _generate
    tokens[:, : step + 1], encoder_outs, self.temperature
  File "/workspace/fairseq/fairseq/sequence_generator.py", line 726, in forward_decoder
    incremental_state=self.incremental_states[i],
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 590, in forward
    x_list, extra = self.extract_features(prev_output_tokens, encoder_out, incremental_state, **unused)
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 751, in extract_features
    real_positions=real_positions
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 365, in forward
    real_positions=real_positions
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/ProphetNet/src/prophetnet/ngram_multihead_attention.py", line 244, in forward
    saved_state = self._get_input_buffer(incremental_state)
  File "/workspace/ProphetNet/src/prophetnet/ngram_multihead_attention.py", line 418, in _get_input_buffer
    'attn_state',
  File "/workspace/fairseq/fairseq/utils.py", line 91, in get_incremental_state
    return module.get_incremental_state(incremental_state, key)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'NgramMultiheadAttention' object has no attribute 'get_incremental_state'

qiweizhen (Contributor) commented Jun 11, 2020:

@gouldju1 Hi, the "no attribute" error is caused by the fairseq version. The master branch of fairseq keeps changing its API, so we built ProphetNet against v0.9.0.
Please pip install fairseq==0.9.0 and try whether it works.
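A minimal fail-fast check for scripts that depend on this pin (the helper names are my own, not part of ProphetNet or fairseq):

```python
REQUIRED = "0.9.0"  # the fairseq version ProphetNet was built against

def version_tuple(v):
    # Accept both "0.9.0" and "v0.9.0" spellings.
    return tuple(int(p) for p in v.lstrip("v").split("."))

def check_fairseq_version(installed):
    """Raise early if the installed fairseq does not match the pin."""
    if version_tuple(installed) != version_tuple(REQUIRED):
        raise RuntimeError(
            f"ProphetNet expects fairseq {REQUIRED}, found {installed}"
        )
```

In practice you would call `check_fairseq_version(fairseq.__version__)` right after importing fairseq, so a mismatched install fails before training starts rather than deep inside generation.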

gouldju1 (Author) commented:

Yes, that works. Thank you!
