
[Longformer] Output both local attentions and global attentions when output_attentions=True -> Good Second Issue #7514

Closed
patrickvonplaten opened this issue Oct 1, 2020 · 8 comments · Fixed by #7562
Labels
Good Second Issue Issues that are more difficult to do than "Good First" issues - give it a try if you want!


patrickvonplaten commented Oct 1, 2020

🚀 Feature request

Good Second Issue - A more advanced issue for contributors who want to dive more into Longformer's attention mechanism.

Longformer currently only outputs global attentions, which is suboptimal because users might be interested in the local attentions as well. I propose to change the `output_attentions` logic in Longformer as follows:

`attentions` should correspond to the "local" attentions, and we'll add a new output field `global_attentions` that contains the global attentions. This is consistent with the naming of `attention_mask` and `global_attention_mask` IMO, and the cleanest way to implement the feature.

Implementing this feature means that Longformer will require its own ModelOutput classes:
BaseModelOutput => LongformerBaseModelOutput or BaseModelOutputWithGlobalAttention (I prefer the first name, though)
BaseModelOutputWithPooling => ...
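A rough sketch of what the first new class could look like (a minimal sketch, not the final API; field names follow the proposal above, and tensor types are replaced with plain tuples so the snippet stands alone):

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class LongformerBaseModelOutput:
    """Sketch of a Longformer-specific output class (names assumed).

    `attentions` would hold the local (windowed) attentions per layer;
    `global_attentions` is the new field for the globally attending tokens.
    """
    last_hidden_state: Any = None
    hidden_states: Optional[Tuple[Any, ...]] = None
    attentions: Optional[Tuple[Any, ...]] = None
    global_attentions: Optional[Tuple[Any, ...]] = None

# Both attention kinds travel in one output object.
out = LongformerBaseModelOutput(attentions=("local_0",), global_attentions=("global_0",))
print(out.global_attentions)  # ('global_0',)
```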

Also some tests will have to be adapted.

This is a slightly more difficult issue, so I'm happy to help with it. One should understand the difference between local and global attention, and how Longformer's attention differs from, e.g., BERT's attention in general.

For more detail, check out the discussion here: #5646

patrickvonplaten added the Good Second Issue label, self-assigned this issue, and changed the title to "[Longformer] Output both local attentions and global attentions when output_attentions=True -> Good Second Issue" on Oct 1, 2020
@gui11aume (Contributor)

I am working on a pull request to address this. I don't see any major challenge so far, but it made me realize how different attentions are in BERT-like models and in Longformer. Why not replace `attentions` in Longformer with `local_attentions`?

This would make Longformer's interface incompatible with every other Transformer, but maybe it should be? I don't think there is a way to plug Longformer attentions into code that expects BERT-like attentions and get meaningful results, so users always have to write a special case for Longformer if they use it. As is, the risk is that they get bogus output and won't realize it until they carefully read the doc (which is not yet written).
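To illustrate the structural mismatch, compare the array shapes involved (a sketch with assumed dimensions: the point is only that the last axis of Longformer's local attentions is window-sized, not sequence-sized):

```python
import numpy as np

# Assumed dimensions for illustration only.
batch, heads, seq = 2, 12, 4096
window, n_global = 512, 1  # attention window size and number of global tokens (assumed)

# BERT-style attention: one full seq x seq matrix per head.
bert_like = np.zeros((batch, heads, seq, seq))

# Longformer local attention: each token attends only within its window,
# so the last axis is window-sized rather than seq-sized.
longformer_local = np.zeros((batch, heads, seq, window + n_global + 1))

print(bert_like.shape)         # (2, 12, 4096, 4096)
print(longformer_local.shape)  # (2, 12, 4096, 514)
```

Code that indexes the last axis as "attended position in the sequence" would silently misread the second tensor, which is the bogus-output risk described above.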

What are your thoughts on this @patrickvonplaten?

@gui11aume (Contributor)

I have made the pull request.

I checked that the Longformer tests pass with my changes, and I added one more test to check the output of attention probabilities.

Quite stupidly, I made the pull request against the master branch; I am sorry about this. I left it as is to avoid duplicating pull requests for now. You can reject it and I will make a cleaner pull request from a separate branch.

@patrickvonplaten patrickvonplaten added this to On Hold - Wait for Main Contributor in Medium Contribution Proposals - Advanced Oct 27, 2020
@patrickvonplaten (Contributor, Author)

Sorry to have been so super inactive on this issue :-/ I will find time to solve it in ~1 week :-). This issue is related as well: https://github.com/huggingface/transformers/pull/8007/files#r514633097.

@gui11aume (Contributor)

No worries, there is no hurry on my side. Anyway, the issue is a little trickier than it looks because you guys have to decide how to encode attention probabilities when they are too large to be represented by a dense matrix. Let me know if there is anything I can do to help.

@patrickvonplaten patrickvonplaten moved this from On Hold - Wait for Main Contributor to Done in Medium Contribution Proposals - Advanced Nov 11, 2020
@gui11aume (Contributor)

Hi @patrickvonplaten. I have not used 🤗 Transformers since our discussion in November 2020. Today I came back to it (transformers version 4.4.2) and realized that this issue is still not completely solved. I could open a new issue, but I believe the fix is really simple, so I hope we can address it here: in some models, the global attentions are computed and stored in outputs, but at the very last stage they are not returned.

If I am not mistaken, the issue is in modeling_longformer.py. At lines 1784-1789 the code is

return LongformerMaskedLMOutput(
    loss=masked_lm_loss,
    logits=prediction_scores,
    hidden_states=outputs.hidden_states,
    attentions=outputs.attentions,
)

but I think it should be

return LongformerMaskedLMOutput(
    loss=masked_lm_loss,
    logits=prediction_scores,
    hidden_states=outputs.hidden_states,
    attentions=outputs.attentions,
    global_attentions=outputs.global_attentions,  # <=====
)

The same goes for lines 1876 and 2124 (but it is fine for lines 2029 and 2235).
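A lightweight way to guard against this class of bug is a check that the field actually survives the head's forward pass. A sketch using a stub in place of the real output class (the stub and helper names are assumed, not part of transformers):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StubOutput:
    # Stand-in for LongformerMaskedLMOutput; the real class lives in transformers.
    attentions: Optional[tuple] = None
    global_attentions: Optional[tuple] = None

def has_global_attentions(output) -> bool:
    """True if the model head propagated `global_attentions` instead of dropping it."""
    return getattr(output, "global_attentions", None) is not None

# The buggy return statement above corresponds to the second case.
print(has_global_attentions(StubOutput(attentions=("a",), global_attentions=("g",))))  # True
print(has_global_attentions(StubOutput(attentions=("a",))))                            # False
```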

@patrickvonplaten (Contributor, Author)

This sounds correct to me! Would you mind opening a new PR?

@gui11aume (Contributor)

I will do it, no problem.

gui11aume added a commit to gui11aume/transformers that referenced this issue Mar 25, 2021
@gui11aume (Contributor)

I made a minimal pull request #10906.
