Attention heatmap visualization #90
Hi Stephen, I got it to run today and was able to visualize some heatmaps. For this, I basically copied the code from DINOv1 and pasted it into the DINOv2 code. I wrote a quick how-to, you can find it here: gitlab. Basically, you need to pull the dinov2 repo, change some classes and functions (explained in the linked readme), then use the visualize_attention.py file from DINOv1 (you can use the already changed file in my linked repo). If something is unclear, just let me know. And if someone finds an easier way, please let us know :)
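For readers skimming the linked how-to, the core of the visualization is just taking the last layer's attention from the CLS token to every patch token and reshaping it onto the patch grid. A minimal sketch of that step, assuming a `(1, heads, tokens, tokens)` attention layout as returned by a `get_last_selfattention`-style method (the helper name and `num_extra_tokens` handling are illustrative, not the how-to's exact code):

```python
import torch

def cls_attention_heatmaps(attn, img_h, img_w, patch_size=14, num_extra_tokens=1):
    """Reshape CLS->patch attention into per-head heatmaps.

    attn: (1, heads, tokens, tokens); num_extra_tokens counts the CLS token
    (plus registers, if any) that precede the patch tokens.
    """
    h_feat, w_feat = img_h // patch_size, img_w // patch_size
    cls_attn = attn[0, :, 0, num_extra_tokens:]   # attention of CLS to each patch
    return cls_attn.reshape(-1, h_feat, w_feat)   # (heads, h_feat, w_feat)

# dummy example: a ViT-S/14-sized head count on a 224x224 image -> 16x16 patches
heads, tokens = 6, 1 + 16 * 16
attn = torch.softmax(torch.randn(1, heads, tokens, tokens), dim=-1)
maps = cls_attention_heatmaps(attn, 224, 224)
print(maps.shape)  # torch.Size([6, 16, 16])
```

Each of the resulting per-head maps can then be upsampled back to image resolution and overlaid on the input, as DINOv1's script does.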
A thorough and complete answer to go with such impressive work. Thanks!
Hi @Sntz91, thanks a lot for your answer. To use xFormers, adding a few lines worked for me: in dinov2/dinov2/layers/attention.py (lines 76 to 77 in c3c2683), change the code to:

```python
x = memory_efficient_attention(q, k, v, attn_bias=attn_bias)
if return_attn:
    attn = x.permute(0, 2, 1, 3) @ v.permute(0, 2, 3, 1)
    return attn
x = x.reshape([B, N, C])
```

But I noticed that the visualised attention seems much noisier compared to DINOv1, and the larger the model, the noisier the self-attention.
@ludles I am also noticing the same phenomenon. Any idea why?
Hi @Sntz91, I followed your steps to visualize the attention heatmaps, but the results seem bad compared to DINOv1. Could you advise on why this might be happening? I am using dinov2_vitl14_pretrain.pth.
@ludles I noticed the same thing. Any luck figuring out why this might be happening?
Might have to do with the note in the how-to: "Additionally, note the weird line `attentions[:, 10] = 0`. I found out that the attention…" But I haven't found that line of code (attentions) in the GitLab repo, nor in a few minutes of searching across the other referenced material.

Edit: found it; it was removed in https://gitlab.com/ziegleto-machine-learning/dino/-/commit/cccff01fd1b9a68ec231f6ca6ed13ad3e67afdaa#a8fadc4edbeec1b1b6477a4cb735752ebc6afb49_212_86. I'll try later to run with that line added back, to see whether I get results similar to yours, and whether restoring it (or exchanging it with the max-related one) solves this.
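For context, the noise-suppression step in DINOv1's visualize_attention.py is not a single hard-coded line but cumulative-mass thresholding: per head, keep only the smallest set of patches that together hold a given fraction of the attention. A rough sketch of the idea (the function name, `keep_mass` parameter, and `(heads, patches)` shape are assumptions, not the script's exact interface):

```python
import torch

def threshold_attention(attn_per_head, keep_mass=0.6):
    """Binary mask keeping, per head, the top patches that together hold
    `keep_mass` of the attention (cumulative-mass thresholding)."""
    val, idx = torch.sort(attn_per_head, dim=-1)          # ascending
    cum = (val / val.sum(dim=-1, keepdim=True)).cumsum(dim=-1)
    mask = cum > (1 - keep_mass)                          # only top tokens survive
    out = torch.zeros_like(attn_per_head)
    out.scatter_(-1, idx, mask.to(attn_per_head.dtype))   # back to original order
    return out

attn = torch.tensor([[0.1, 0.2, 0.7]])
print(threshold_attention(attn, keep_mass=0.6))  # tensor([[0., 0., 1.]])
```

In the example, the single 0.7 patch already carries 60% of the mass, so only it is kept.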
The relevant snippet: `print(torch.max(attentions, dim=1))`
@Sntz91 Thanks for this! Is there a particular reason why you set your image size to 952x952? Also, why do you instantiate your ViT with an image size of 526 instead of 224? It does seem that the small and large ViT checkpoints correspond to such a size rather than 224x224, but any idea where this came from?
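On the 952x952 question: whatever resolution is chosen, each spatial dimension has to be a multiple of the patch size, and 952 = 14 * 68 divides cleanly. A tiny helper illustrating the constraint (the function name is just for illustration, not from the linked repo):

```python
def crop_to_patch_multiple(h, w, patch_size=14):
    """Largest size not exceeding (h, w) that divides evenly into patches."""
    return h - h % patch_size, w - w % patch_size

print(crop_to_patch_multiple(952, 952))   # (952, 952): 952 = 14 * 68 already
print(crop_to_patch_multiple(960, 1000))  # (952, 994)
```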
Hi @Sntz91, I followed your steps and got this error (traceback below), wondering why...

Traceback (most recent call last):

Any help would be appreciated!

Edit: OK, I SOLVED this issue via #90 (comment). Since I use dinov2_vits14_pretrain, I changed the model params accordingly.
In my experiment, I got similar results. |
Here is the explanation for your results: https://arxiv.org/pdf/2309.16588.pdf
Hi @Sntz91, I hit an error in your notebook, so I checked the attention shape: it turns out to be [1, xxxx, embedding_dimension] instead of [1, heads, xxxx, xxxx] as in your notebook. Why do my attention results have a different dimension from yours? Could you please help me fix it? Thank you.
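A shape like [1, tokens, embedding_dimension] suggests you are getting the block's output x rather than the attention weights (i.e. the return_attn patch above isn't being hit). One way to inspect per-head attention without editing the library is to capture the fused qkv projection (e.g. with a forward hook on `model.blocks[-1].attn.qkv`) and recompute the softmax weights; this is a sketch under the assumption of DINOv2's usual fused `(B, N, 3 * dim)` qkv layout, with the helper name my own:

```python
import torch

def attn_from_qkv(qkv_out, num_heads):
    """Recompute softmax attention weights from a fused qkv projection output.

    qkv_out: (B, N, 3 * dim), e.g. captured with a forward hook on
    model.blocks[-1].attn.qkv. Returns (B, heads, N, N).
    """
    B, N, three_dim = qkv_out.shape
    dim = three_dim // 3
    head_dim = dim // num_heads
    # (B, N, 3, heads, head_dim) -> (3, B, heads, N, head_dim)
    qkv = qkv_out.reshape(B, N, 3, num_heads, head_dim).permute(2, 0, 3, 1, 4)
    q, k = qkv[0], qkv[1]
    attn = (q * head_dim ** -0.5) @ k.transpose(-2, -1)
    return attn.softmax(dim=-1)

# dummy shape check: 257 tokens (CLS + 16x16 patches), dim 384, 6 heads
out = attn_from_qkv(torch.randn(1, 257, 3 * 384), num_heads=6)
print(out.shape)  # torch.Size([1, 6, 257, 257])
```

Each row of the result sums to 1, as expected of softmax attention weights.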
Any idea how you would visualize the output when block_chunks is not 0?
Greetings everyone! I'm trying to visualise the attention heatmaps for one of the models with register tokens, specifically dinov2_vitl14_reg4_pretrain. There seems to be a mismatch between the shapes/reshapes in the visualize_attention.py file. I'm using image_size = (416, 416) with patch_size = 14 and 4 register tokens, which means the model gives me (416//14) x (416//14) + 4 = 29x29 + 4 = 845 tokens, but the reshape expects a tensor of shape 841 (= 29x29). Do I have to drop these 4 tokens somehow after getting the last layer's attention? Thanks in advance!
If I'm not wrong, it's because the registers should be dropped before visualizing the heatmap, so 845 - 4 = 841.
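A sketch of that slicing, assuming the token order [CLS, reg_1..reg_4, patches] that the register models use (the helper name is hypothetical, and 406 = 29 * 14 is used in the example so the patch grid is exact):

```python
import torch

def cls_to_patch_attention(attn, num_register_tokens=4):
    """Keep only CLS->patch attention, dropping the CLS and register columns.

    attn: (1, heads, tokens, tokens) with token order [CLS, registers, patches].
    """
    start = 1 + num_register_tokens
    return attn[0, :, 0, start:]   # (heads, num_patches)

# ViT-L/14 with 4 registers on a 406x406 image: 29 * 29 = 841 patch tokens
tokens = 1 + 4 + 29 * 29
attn = torch.softmax(torch.randn(1, 16, tokens, tokens), dim=-1)
print(cls_to_patch_attention(attn).shape)  # torch.Size([16, 841])
```

The remaining 841 values then reshape to the 29x29 grid as before.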
Hello everyone! Other than the attention map, has anyone tried to draw the norms plot in Figure 3 (from "Vision Transformers Need Registers")?
This is the only sentence where the authors explain what an outlier patch/token is. FYI, this is my implementation.
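If it helps, a norms plot in the spirit of that figure can be approximated by taking the L2 norm of each patch token from forward_features (DINOv2 exposes them under "x_norm_patchtokens") and reshaping onto the patch grid; a sketch with dummy tensors, where the outlier scaling is fabricated purely for illustration:

```python
import torch

def patch_token_norms(patch_tokens, grid_h, grid_w):
    """L2 norm of every patch token, reshaped onto the patch grid.

    patch_tokens: (num_patches, dim), e.g.
    model.forward_features(x)["x_norm_patchtokens"][0].
    """
    return patch_tokens.norm(dim=-1).reshape(grid_h, grid_w)

torch.manual_seed(0)
tokens = torch.randn(256, 384)      # dummy 16x16 grid of 384-dim features
tokens[10] *= 20                    # fabricated high-norm "outlier" token
grid = patch_token_norms(tokens, 16, 16)
print(grid.shape)  # torch.Size([16, 16])
```

Outlier tokens then show up as grid cells whose norm is far above the rest, which is what the paper's figure highlights.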
Have you solved this problem? |
No I haven't. The problem remains. |
How do you generate the attention heatmap demonstrated in your paper? Could you supply an API?