Fairseq-generate giving me the error: 'RuntimeError: Mask Type should be defined' on Colab #4899

Open
FleetAdmiral opened this issue Dec 10, 2022 · 10 comments


@FleetAdmiral

Some background:

I'm working on a translation problem where I am able to get through the fairseq-preprocess and fairseq-train but during the process of fairseq-generate, the operation fails in the middle.

I have not found any mention of this error message online as an issue or in any documentation.

What I've attempted from my end:

Reducing the train/test size.
Increasing the train and/or test size.
Making sure the test dataset has no unknown tokens.
I'm a novice, so this may look elementary, but I'd really appreciate any help here.

!fairseq-generate drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/ \
    --path drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt \
    --batch-size 128 \
    --beam 5 \
    --seed 1 \
    --source-lang mt --target-lang pe \
    --results-path drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/results_text \
    --scoring bleu \
    --wandb-project "Hi to Hi" 

This is the error that is then presented:

2022-12-08 14:55:12 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2022-12-08 14:55:14 | INFO | fairseq_cli.generate | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'log_file': None, 'aim_repo': None, 'aim_run_hash': None, 'tensorboard_logdir': None, 'wandb_project': 'Hi to Hi', 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt', 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/results_text'}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 128, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 128, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 
'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False, 'eos_token': None}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'wav2vec2', 'extractor_mode': 'default', 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': 'gelu', 'layer_type': 'transformer', 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 0, 'layer_norm_first': False, 'conv_feature_layers': '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]', 'conv_bias': False, 'logit_temp': 0.1, 'quantize_targets': False, 'quantize_input': False, 'same_quantizer': False, 'target_glu': False, 'feature_grad_mult': 1.0, 'quantizer_depth': 1, 'quantizer_factor': 3, 'latent_vars': 320, 'latent_groups': 2, 'latent_dim': 0, 'mask_length': 10, 'mask_prob': 0.65, 'mask_selection': 'static', 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'require_same_masks': True, 'mask_dropout': 0.0, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_before': False, 'mask_channel_selection': 'static', 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'num_negatives': 100, 'negatives_from_everywhere': False, 'cross_sample_negatives': 0, 'codebook_negatives': 0, 'conv_pos': 128, 'conv_pos_groups': 16, 'pos_conv_depth': 1, 'latent_temp': [2.0, 0.5, 0.999995], 'max_positions': 100000, 'checkpoint_activations': False, 'required_seq_len_multiple': 1, 'crop_seq_to_multiple': 1, 'depthwise_conv_kernel_size': 31, 
'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}, 'task': {'_name': 'translation', 'data': 'drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/', 'source_lang': 'mt', 'target_lang': 'pe', 'load_alignments': False, 'left_pad_source': True, 'left_pad_target': False, 'max_source_positions': 1024, 'max_target_positions': 1024, 'upsample_primary': -1, 'truncate_source': False, 'num_batch_buckets': 0, 'train_subset': 'train', 'dataset_impl': None, 'required_seq_len_multiple': 1, 'eval_bleu': False, 'eval_bleu_args': '{}', 'eval_bleu_detok': 'space', 'eval_bleu_detok_args': '{}', 'eval_tokenized_bleu': False, 'eval_bleu_remove_bpe': None, 'eval_bleu_print_samples': False}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}}
2022-12-08 14:55:14 | INFO | fairseq.tasks.translation | [mt] dictionary: 130200 types
2022-12-08 14:55:14 | INFO | fairseq.tasks.translation | [pe] dictionary: 62888 types
2022-12-08 14:55:14 | INFO | fairseq_cli.generate | loading model(s) from drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/last_mt_pe/hi_to_hi/checkpoint_hi_hi/checkpoint_best.pt
2022-12-08 14:55:16 | INFO | fairseq.data.data_utils | loaded 54,997 examples from: drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/test.mt-pe.mt
2022-12-08 14:55:16 | INFO | fairseq.data.data_utils | loaded 54,997 examples from: drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/test.mt-pe.pe
2022-12-08 14:55:16 | INFO | fairseq.tasks.translation | drive/MyDrive/IITB_small/backward_generation/data-bin_mt_pe/ test mt-pe 54997 examples
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-generate", line 8, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 413, in cli_main
    main(args)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 48, in main
    return _main(cfg, h)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/generate.py", line 201, in _main
    hypos = task.inference_step(
  File "/usr/local/lib/python3.8/dist-packages/fairseq/tasks/fairseq_task.py", line 540, in inference_step
    return generator.generate(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 204, in generate
    return self._generate(sample, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 274, in _generate
    encoder_outs = self.model.forward_encoder(net_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 801, in forward_encoder
    return [model.encoder.forward_torchscript(net_input) for model in self.models]
  File "/usr/local/lib/python3.8/dist-packages/fairseq/sequence_generator.py", line 801, in <listcomp>
    return [model.encoder.forward_torchscript(net_input) for model in self.models]
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/fairseq_encoder.py", line 55, in forward_torchscript
    return self.forward_non_torchscript(net_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/fairseq_encoder.py", line 62, in forward_non_torchscript
    return self.forward(**encoder_input)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/transformer/transformer_encoder.py", line 165, in forward
    return self.forward_scriptable(
  File "/usr/local/lib/python3.8/dist-packages/fairseq/models/transformer/transformer_encoder.py", line 294, in forward_scriptable
    lr = layer(x, encoder_padding_mask=encoder_padding_mask_out)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/modules/transformer_layer.py", line 319, in forward
    output = torch._transformer_encoder_layer_fwd(
RuntimeError: Mask Type should be defined
@OmarAshrafFathy

I got the same error

@geehaad

geehaad commented Dec 11, 2022

Try downgrading fairseq to the previous version.

@OmarAshrafFathy

You can try the following lines:
!pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
then
!pip install fairseq==0.12.2
This solved the issue for me. The problem was with the newer version of torch that Colab installs by default, so installing the previous torch version fixes it.
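
To confirm the downgrade actually took effect in the Colab runtime (you may need to restart the runtime after reinstalling torch), a quick sanity check is sketched below; the expected version strings simply mirror the install commands above:

import torch
import fairseq

# Versions should match the reinstall commands above before re-running fairseq-generate.
print(torch.__version__)    # expected: 1.12.1+cu113
print(fairseq.__version__)  # expected: 0.12.2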

@arnavmehta7

@OmarAshrafFathy Thank you for this. This error gave me a shock in a piece of code we hadn't touched in months. 😂

@jcheigh

jcheigh commented Nov 1, 2023

Love you @OmarAshrafFathy, you saved me.

@boolmriver

A model I trained with fairseq 0.12.2 and torch 2.1.0 also hit this error. If I downgrade the torch version, does the model need to be retrained? @OmarAshrafFathy Thank you!

@OmarAshrafFathy

@boolmriver No, you don't need to retrain the model.

@krgy12138

Sorry, I don't think downgrading is the optimal way to fix this problem. It looks like an issue with the newer version of torch? Is there a solution that doesn't require downgrading?

@krgy12138

I have a simpler workaround: skip _transformer_encoder_layer_fwd by setting can_use_fastpath to False at generation time, but that doesn't look like a clean fix. A rough sketch is below.
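
A minimal, untested sketch of that workaround, assuming fairseq 0.12.x exposes a can_use_fastpath flag on TransformerEncoderLayerBase as described above; it has to run in a Python session before invoking generation, e.g. before calling fairseq_cli.generate.cli_main():

from fairseq.modules.transformer_layer import TransformerEncoderLayerBase

# Keep a reference to the original forward so the patch only disables the fast path.
_orig_forward = TransformerEncoderLayerBase.forward

def _forward_without_fastpath(self, *args, **kwargs):
    # Assumed flag, per the comment above: forcing it to False makes the layer
    # fall back to the regular Python forward instead of the fused
    # torch._transformer_encoder_layer_fwd kernel that raises the mask-type error.
    self.can_use_fastpath = False
    return _orig_forward(self, *args, **kwargs)

TransformerEncoderLayerBase.forward = _forward_without_fastpath

The same effect could be had by editing fairseq/modules/transformer_layer.py locally, but a patch like this keeps the installed package untouched.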

@udiboy1209

This issue seems to be fixed on the latest main branch.
