Reproduction

PyTorch->Flax and Flax->PyTorch equivalence tests were failing. At the moment they are skipped by #23040.
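For context, a cross-framework equivalence check is roughly the following. This is a minimal sketch, assuming the `google/bigbird-roberta-base` checkpoint, an input long enough to trigger block-sparse attention, and a loose tolerance; the actual tests live in the transformers test suite:

```python
import numpy as np
import torch
from transformers import AutoTokenizer, BigBirdModel, FlaxBigBirdModel

model_id = "google/bigbird-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Block-sparse (and hence random) attention only kicks in for long inputs.
inputs = tokenizer("hello world " * 512, return_tensors="np")

pt_model = BigBirdModel.from_pretrained(model_id)
pt_model.eval()
fx_model = FlaxBigBirdModel.from_pretrained(model_id, from_pt=True)

with torch.no_grad():
    pt_out = pt_model(
        **{k: torch.from_numpy(v) for k, v in inputs.items()}
    ).last_hidden_state
fx_out = fx_model(**inputs).last_hidden_state

# If the PyTorch model samples random attention blocks even in eval mode,
# the two outputs diverge and this assertion fails.
np.testing.assert_allclose(np.asarray(fx_out), pt_out.numpy(), atol=4e-2)
```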
Expected behavior
While working on #21023 I found a bug in the PyTorch implementation of BigBird. Namely, random attention is used no matter whether we are in training or eval mode. The correct behaviour is that during inference (eval) we should not introduce any randomness, hence random attention should not be used.
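A minimal sketch of the intended gating (illustrative names, not the actual transformers source): in PyTorch, `nn.Module.training` already tracks `model.train()`/`model.eval()`, so the random block selection can simply be conditioned on it:

```python
import torch
from torch import nn


class RandomBlockAttentionSketch(nn.Module):
    """Toy module showing how random block selection can be gated on mode."""

    def __init__(self, num_blocks: int, num_rand_blocks: int):
        super().__init__()
        self.num_blocks = num_blocks
        self.num_rand_blocks = num_rand_blocks

    def _rand_block_indices(self) -> torch.Tensor:
        # Sample which blocks each query block attends to at random.
        return torch.stack(
            [torch.randperm(self.num_blocks)[: self.num_rand_blocks]
             for _ in range(self.num_blocks)]
        )

    def _deterministic_block_indices(self) -> torch.Tensor:
        # A fixed fallback for eval, e.g. the first num_rand_blocks blocks.
        return torch.arange(self.num_rand_blocks).expand(self.num_blocks, -1)

    def forward(self) -> torch.Tensor:
        # The point of the fix: only introduce randomness while training.
        if self.training:
            return self._rand_block_indices()
        return self._deterministic_block_indices()


attn = RandomBlockAttentionSketch(num_blocks=8, num_rand_blocks=3)
attn.eval()                          # inference mode: no randomness
assert torch.equal(attn(), attn())   # deterministic across calls
attn.train()                         # training mode: random blocks each call
```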
Hi @sanchit-gandhi @ydshieh! I have opened a PR that fixes the failing tests. I am wondering whether the changes in the PR are okay (using random attention based on the current mode), or whether we want more control over the use of random attention, e.g. adding a deterministic argument to __call__ of BigBirdPreTrainedModel. Secondly, I was wondering what the advantage is of marking _bigbird_block_rand_mask as a staticmethod and then calling it as self._bigbird_block_rand_mask, passing it arguments from self such as self.max_seqlen, instead of treating it as a regular method. It looks kind of weird to me. Am I missing something?
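To make the staticmethod question concrete, the pattern being asked about is the following (a toy example with hypothetical names, not the BigBird source): a staticmethod has no access to self, so every attribute must be passed in explicitly, yet calling it through self still resolves.

```python
class Example:
    def __init__(self):
        self.max_seqlen = 1024

    @staticmethod
    def as_static(max_seqlen):
        # No access to self; every attribute must be passed explicitly.
        return max_seqlen // 64

    def as_regular(self):
        # A regular method reads the attribute itself.
        return self.max_seqlen // 64


ex = Example()
# Both calls work; ex.as_static(...) resolves via the instance but self is
# never passed in, which is why the pattern in the issue reads oddly.
assert ex.as_static(ex.max_seqlen) == ex.as_regular()
# The staticmethod is also callable without an instance:
assert Example.as_static(1024) == 16
```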