
Support gradients to attention bias #636

Closed
EBGU opened this issue Jan 12, 2023 · 1 comment

EBGU commented Jan 12, 2023

🚀 Feature

When I use memory_efficient_attention with a bias given as a torch.Tensor, I get the error "No operator found for this attention". I then found a remark in xformers.ops.fmha, class _fMHA: "Only gradients to Q/K/V is implemented. For instance, it's not possible to backpropagate through the attention mask". I take this to mean that gradients for the attention bias are not supported. Could you add a feature to support gradients for the attention bias?
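
For reference, here is a rough sketch of the setup I have in mind (the tensor shapes, dtype, and attn_bias layout below are my assumptions, not a verified reproducer):

```python
import torch
import xformers.ops as xops

B, M, H, K = 2, 128, 4, 64  # batch, sequence length, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.half)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.half)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.half)

# Additive bias, e.g. derived from an AlphaFold2-style pairwise representation
bias = torch.randn(B, H, M, M, device="cuda", dtype=torch.half, requires_grad=True)

out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
out.sum().backward()  # desired: bias.grad gets populated; today a bias that
                      # requires grad is rejected by the operator dispatch
```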

Motivation

In models like AlphaFold2, biased attention is used heavily, and the pairwise representation that produces the bias needs to receive gradients.

Pitch

Support backpropagation through the attention bias in memory_efficient_attention.

Additional context

I also didn't see any argument for key_padding_mask. Is it appropriate to masked_fill the attention bias with float('-inf') to achieve the same effect as a key padding mask?
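
Concretely, I mean something like this (the mask shape, names, and values are placeholders on my part):

```python
import torch

B, H, M = 2, 4, 128
# True marks padded key positions, as in nn.MultiheadAttention's key_padding_mask
key_padding_mask = torch.zeros(B, M, dtype=torch.bool, device="cuda")
key_padding_mask[:, 100:] = True

# Turn the padding mask into an additive attention bias
bias = torch.zeros(B, H, M, M, device="cuda", dtype=torch.half)
bias = bias.masked_fill(key_padding_mask[:, None, None, :], float("-inf"))
# Passing this tensor as attn_bias should give ~zero weight to padded keys
```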

@danthe3rd
Contributor

Hi @EBGU
Thanks for your report.
There is an open PR adding support for exactly that: #587
We plan to merge it after we release 0.0.16, hopefully next week.

EBGU closed this as completed Jan 12, 2023