
Pytorch BigBird random attention #23055

Closed
Bearnardd opened this issue Apr 28, 2023 · 2 comments

Comments

@Bearnardd
Contributor

Bearnardd commented Apr 28, 2023

Reproduction

The PyTorch->Flax and Flax->PyTorch equivalence tests were failing. For now they are skipped by #23040.

Expected behavior

While working on #21023 I found a bug in the PyTorch implementation of BigBird: random attention is used regardless of whether the model is in training or eval mode. The correct behaviour is that inference (eval) should not introduce any randomness, so random attention should not be used there.

@Bearnardd
Contributor Author

Hi @sanchit-gandhi @ydshieh! I have opened a PR that fixes the failing tests. I am wondering whether the changes in the PR are okay (using random attention based on the current mode), or whether we want more control over the use of random attention, e.g. adding a deterministic argument to __call__ of BigBirdPreTrainedModel. Secondly, I was wondering what the advantage is of marking _bigbird_block_rand_mask as a staticmethod and then calling it via self._bigbird_block_rand_mask while passing in arguments taken from self, like self.max_seqlen, instead of treating it as a regular method. It looks kind of weird to me. Am I missing something?

@huggingface huggingface deleted a comment from github-actions bot May 30, 2023
@sanchit-gandhi
Contributor

Closed via #23056.
