Computation of mask indices in Wav2vec2Model fails with low probabilities #14524

Closed
nikvaessen opened this issue Nov 25, 2021 · 0 comments

Environment info

  • transformers version: 4.12.2
  • Platform: Linux
  • Python version: 3.8
  • PyTorch version (GPU?): 1.10

Who can help

@patrickvonplaten

Information

I'm trying to reproduce Wav2vec2 fine-tuning on LibriSpeech. However, using a feature mask probability of 0.0012, as in the paper, makes the code crash at some point (after ~3,000 steps).

To reproduce

from transformers.models.wav2vec2.modeling_wav2vec2 import _compute_mask_indices

mask = _compute_mask_indices(
    shape=(10, 500),
    mask_prob=0.0012, # or even lower
    mask_length=10,
)

print(mask)

raises

Traceback (most recent call last):
  File "/home/nik/workspace/phd/repo/w2v2-mt-learning/playground/buggy_mask.py", line 3, in <module>
    mask = _compute_mask_indices(
  File "/home/nik/workspace/phd/repo/w2v2-mt-learning/.venv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 201, in _compute_mask_indices
    dummy_mask_idx = spec_aug_mask_idx[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

Note that passing min_masks=1 avoids this error.
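
A rough sketch of why the repro fails, assuming the number of masked spans is derived roughly as int(mask_prob * sequence_length / mask_length + epsilon), with epsilon drawn uniformly from [0, 1). That formula and the helper name num_masked_spans are assumptions for illustration, not a copy of the library code. With the values from the repro, the count rounds down to zero for most draws, so the sampled index array is empty and taking its first element raises an IndexError like the one in the traceback above. A floor of one span sidesteps this.

```python
import numpy as np


def num_masked_spans(seq_len, mask_prob, mask_length, epsilon, min_masks=0):
    """Hypothetical helper: number of spans to mask for one sequence.

    Assumed formula (not copied from transformers):
    int(mask_prob * seq_len / mask_length + epsilon), floored at min_masks.
    """
    return max(int(mask_prob * seq_len / mask_length + epsilon), min_masks)


# Values from the repro: 0.0012 * 500 / 10 = 0.06 expected spans.
low_draw = num_masked_spans(500, 0.0012, 10, epsilon=0.5)    # int(0.56)
high_draw = num_masked_spans(500, 0.0012, 10, epsilon=0.95)  # int(1.01)
floored = num_masked_spans(500, 0.0012, 10, epsilon=0.5, min_masks=1)

# Sampling zero start indices yields an empty array ...
rng = np.random.default_rng(0)
starts = rng.choice(500 - (10 - 1), size=low_draw, replace=False)

# ... and taking its first element fails, as in the traceback.
try:
    starts[0]
    failed = False
except IndexError:
    failed = True

print(low_draw, high_draw, floored, failed)  # prints: 0 1 1 True
```

Under this assumption, only draws of epsilon above roughly 0.94 produce a span at all for these values.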

Expected behavior

If the probability is so low that no features are masked, the method shouldn't raise an IndexError.
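
A minimal sketch of the behavior I would expect, written as a standalone NumPy function rather than a patch to the library. The name compute_mask_safe, its seed parameter, and the span-count formula are assumptions for illustration only. The point is the early return: when zero spans are requested, an all-False mask comes back instead of an exception.

```python
import numpy as np


def compute_mask_safe(shape, mask_prob, mask_length, min_masks=0, seed=None):
    """Hypothetical stand-in, not the transformers implementation.

    Returns a boolean mask of `shape`; rows are all False when the
    requested number of spans rounds down to zero.
    """
    batch_size, seq_len = shape
    if mask_length < 1 or mask_length > seq_len:
        raise ValueError("mask_length must be between 1 and the sequence length")

    rng = np.random.default_rng(seed)
    mask = np.zeros(shape, dtype=bool)

    # Assumed span-count formula, with probabilistic rounding via epsilon.
    epsilon = rng.random()
    num_spans = max(int(mask_prob * seq_len / mask_length + epsilon), min_masks)
    # Never request more spans than fit into one sequence.
    num_spans = min(num_spans, seq_len // mask_length)

    if num_spans == 0:
        return mask  # nothing to mask, and nothing to index into

    offsets = np.arange(mask_length)
    for row in range(batch_size):
        # Distinct start positions; the resulting spans may still overlap.
        starts = rng.choice(seq_len - mask_length + 1, size=num_spans, replace=False)
        mask[row, (starts[:, None] + offsets[None, :]).ravel()] = True
    return mask


empty = compute_mask_safe((10, 500), mask_prob=0.0, mask_length=10, seed=0)
forced = compute_mask_safe((10, 500), mask_prob=0.0012, mask_length=10, min_masks=1, seed=0)
```

Here empty contains no masked positions, while forced masks exactly one span of 10 positions per row because of the min_masks floor.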

nikvaessen added a commit to nikvaessen/transformers that referenced this issue Nov 25, 2021
@anton-l anton-l closed this as completed in 6645eb6 Dec 2, 2021
Albertobegue pushed a commit to Albertobegue/transformers that referenced this issue Jan 27, 2022
…face#14525)

* fix huggingface#14524 (IndexError when mask prob is too low)

* fix formatting

* correct documentation, add option for setting min_num_masks

* change the semantic meaning of `mask_prob` in _compute_mask_indices

With this commit the meaning of `mask_prob` actually adheres to the probability for each
vector to be the start of a masked span of length.

* fix check_copies test

* fix documentation to semantic meaning of `upper bound of overall masking percentage`, revert changes to _compute_mask_indices

* fix typo