Fairseq-generate giving me the error: 'RuntimeError: Mask Type should be defined' on Colab #4899

Open
FleetAdmiral opened this issue Dec 10, 2022 · 10 comments


@FleetAdmiral

Some background:

I'm working on a translation problem where I am able to get through the fairseq-preprocess and fairseq-train but during the process of fairseq-generate, the operation fails in the middle.

I have not found any mention of this error message online as an issue or in any documentation.

What I've attempted from my end:

Reducing the train/test size.
Increasing the train and/or test size.
Making sure the test dataset has no unknown tokens.
I'm a novice, so this may look elementary, but I'd really appreciate any help here.

!fairseq-generate drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/ \
    --path drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt \
    --batch-size 128 \
    --beam 5 \
    --seed 1 \
    --source-lang mt --target-lang pe \
    --results-path drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/results_text \
    --scoring bleu \
    --wandb-project "Hi to Hi" 

This is the error that is then presented:

2022-12-08 14:55:12 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2022-12-08 14:55:14 | INFO | fairseq_cli.generate | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'log_file': None, 'aim_repo': None, 'aim_run_hash': None, 'tensorboard_logdir': None, 'wandb_project': 'Hi to Hi', 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt', 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/results_text'}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 128, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 128, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 
'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False, 'eos_token': None}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'wav2vec2', 'extractor_mode': 'default', 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': 'gelu', 'layer_type': 'transformer', 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 0, 'layer_norm_first': False, 'conv_feature_layers': '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]', 'conv_bias': False, 'logit_temp': 0.1, 'quantize_targets': False, 'quantize_input': False, 'same_quantizer': False, 'target_glu': False, 'feature_grad_mult': 1.0, 'quantizer_depth': 1, 'quantizer_factor': 3, 'latent_vars': 320, 'latent_groups': 2, 'latent_dim': 0, 'mask_length': 10, 'mask_prob': 0.65, 'mask_selection': 'static', 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'require_same_masks': True, 'mask_dropout': 0.0, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_before': False, 'mask_channel_selection': 'static', 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'num_negatives': 100, 'negatives_from_everywhere': False, 'cross_sample_negatives': 0, 'codebook_negatives': 0, 'conv_pos': 128, 'conv_pos_groups': 16, 'pos_conv_depth': 1, 'latent_temp': [2.0, 0.5, 0.999995], 'max_positions': 100000, 'checkpoint_activations': False, 'required_seq_len_multiple': 1, 'crop_seq_to_multiple': 1, 'depthwise_conv_kernel_size': 31, 
'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}, 'task': {'_name': 'translation', 'data': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/', 'source_lang': 'mt', 'target_lang': 'pe', 'load_alignments': False, 'left_pad_source': True, 'left_pad_target': False, 'max_source_positions': 1024, 'max_target_positions': 1024, 'upsample_primary': -1, 'truncate_source': False, 'num_batch_buckets': 0, 'train_subset': 'train', 'dataset_impl': None, 'required_seq_len_multiple': 1, 'eval_bleu': False, 'eval_bleu_args': '{}', 'eval_bleu_detok': 'space', 'eval_bleu_detok_args': '{}', 'eval_tokenized_bleu': False, 'eval_bleu_remove_bpe': None, 'eval_bleu_print_samples': False}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}}
2022-12-08 14:55:14 | INFO | fairseq.tasks.translation | [mt] dictionary: 130200 types
2022-12-08 14:55:14 | INFO | fairseq.tasks.translation | [pe] dictionary: 62888 types
2022-12-08 14:55:14 | INFO | fairseq_cli.generate | loading model(s) from drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt
2022-12-08 14:55:16 | INFO | fairseq.data.data_utils | loaded 54,997 examples from: drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/test.mt-pe.mt
2022-12-08 14:55:16 | INFO | fairseq.data.data_utils | loaded 54,997 examples from: drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/test.mt-pe.pe
2022-12-08 14:55:16 | INFO | fairseq.tasks.translation | drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/ test mt-pe 54997 examples
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-generate", line 8, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 413, in cli_main
    main(args)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 48, in main
    return _main(cfg, h)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 201, in _main
    hypos = task.inference_step(
  File "/usr/local/lib/python3.8/dist-packages/fairseq/tasks/fairseq_task.py", line 540, in inference_step
    return generator.generate(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 204, in generate
    return self._generate(sample, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 274, in _generate
    encoder_outs = self.model.forward_encoder(net_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 801, in forward_encoder
    return [model.encoder.forward_torchscript(net_input) for model in self.models]
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 801, in <listcomp>
    return [model.encoder.forward_torchscript(net_input) for model in self.models]
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/fairseq_encoder.py", line 55, in forward_torchscript
    return self.forward_non_torchscript(net_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/fairseq_encoder.py", line 62, in forward_non_torchscript
    return self.forward(**encoder_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/transformer/transformer_encoder.py", line 165, in forward
    return self.forward_scriptable(
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/transformer/transformer_encoder.py", line 294, in forward_scriptable
    lr = layer(x, encoder_padding_mask=encoder_padding_mask_out)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/modules/transformer_layer.py", line 319, in forward
    output = torch._transformer_encoder_layer_fwd(
RuntimeError: Mask Type should be defined
@OmarAshrafFathy

I got the same error

@geehaad

geehaad commented Dec 11, 2022

Try downgrading fairseq to the previous version.

@OmarAshrafFathy

You can try the following lines:
!pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
then
!pip install fairseq==0.12.2
This solved the issue for me. The problem was with the newer version of torch that Colab installs by default, so installing the previous torch version fixes it.
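
To confirm the downgrade actually took effect in the Colab runtime (you may need to restart the runtime after reinstalling torch), a quick sanity check is sketched below; the expected version strings simply mirror the install commands above:

import torch
import fairseq

# Versions should match the reinstall commands above before re-running fairseq-generate.
print(torch.__version__)    # expected: 1.12.1+cu113
print(fairseq.__version__)  # expected: 0.12.2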

@arnavmehta7

@OmarAshrafFathy Thank you for this. This error gave me a shock in a piece of code we hadn't touched in months. 😂

@jcheigh

jcheigh commented Nov 1, 2023

Love you @OmarAshrafFathy, you saved me.

@boolmriver

A model I trained with fairseq 0.12.2 and torch 2.1.0 also hit this error. If I downgrade the torch version, does the model need to be retrained? @OmarAshrafFathy Thank you!

@OmarAshrafFathy

@boolmriver No, you don't need to retrain the model.

@krgy12138

Sorry, I don't think downgrading is the optimal way to fix this problem. It looks like an issue with the newer version of torch? Is there a solution that doesn't require downgrading?

@krgy12138

I have a simpler workaround: skip _transformer_encoder_layer_fwd by setting can_use_fastpath to False at generation time, but that doesn't look like a clean fix. A rough sketch is below.
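
A minimal, untested sketch of that workaround, assuming fairseq 0.12.x exposes a can_use_fastpath flag on TransformerEncoderLayerBase as described above; it has to run in a Python session before invoking generation, e.g. before calling fairseq_cli.generate.cli_main():

from fairseq.modules.transformer_layer import TransformerEncoderLayerBase

# Keep a reference to the original forward so the patch only disables the fast path.
_orig_forward = TransformerEncoderLayerBase.forward

def _forward_without_fastpath(self, *args, **kwargs):
    # Assumed flag, per the comment above: forcing it to False makes the layer
    # fall back to the regular Python forward instead of the fused
    # torch._transformer_encoder_layer_fwd kernel that raises the mask-type error.
    self.can_use_fastpath = False
    return _orig_forward(self, *args, **kwargs)

TransformerEncoderLayerBase.forward = _forward_without_fastpath

The same effect could be had by editing fairseq/modules/transformer_layer.py locally, but a patch like this keeps the installed package untouched.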

@udiboy1209

This issue seems to be fixed on the latest main branch.
