
Attention heatmap visualization #90

Closed
Stephen0808 opened this issue May 7, 2023 · 20 comments

@Stephen0808

How to generate the heatmap of attention demonstrated in your paper? Could you supply APIs?

@Sntz91

Sntz91 commented May 8, 2023

Hi Stephen,

I got it to run today and was able to visualize some heatmaps. For this, I basically copied the code from dinov1 and pasted it into the dinov2 code. I wrote a quick how-to; you can find it here: gitlab

Basically, you need to pull the dinov2 repo, change some classes and functions (explained in the linked readme), then use the visualize_attention.py file from dinov1 (you can use the already changed file in my linked repo).

If something is unclear, just let me know.

If someone found an easier way, then please let us know :)
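For readers who don't want to diff the two repos, the heart of dinov1's visualize_attention.py is small: take the last block's self-attention, keep the CLS token's row, and reshape it into the patch grid. A minimal sketch of that step, with an illustrative function name and toy sizes (not code from the linked repo):

```python
import torch

def cls_attention_maps(attn: torch.Tensor, h_patches: int, w_patches: int) -> torch.Tensor:
    """attn: [1, heads, tokens, tokens] -> per-head [heads, h_patches, w_patches]
    maps of the CLS token's attention over the patch tokens."""
    nh = attn.shape[1]
    # row 0 is the CLS query; column 0 is the CLS key, so skip it with 1:
    maps = attn[0, :, 0, 1:].reshape(nh, h_patches, w_patches)
    return maps

# toy example: 6 heads, a 4x4 patch grid (16 patches + 1 CLS token = 17)
attn = torch.softmax(torch.randn(1, 6, 17, 17), dim=-1)
maps = cls_attention_maps(attn, 4, 4)
```

Each of the per-head maps can then be upsampled to the input resolution (e.g. with torch.nn.functional.interpolate) and overlaid on the image.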

@Stephen0808
Author

Stephen0808 commented May 9, 2023

A thorough and complete answer for such an impressive work. Thanks!

@ludles

ludles commented Jun 2, 2023

> Hi Stephen,
>
> I got it to run today and was able to visualize some heatmaps. For this, I basically copied the code from dinov1 and pasted it into the dinov2 code. I wrote a quick how-to; you can find it here: gitlab
>
> Basically, you need to pull the dinov2 repo, change some classes and functions (explained in the linked readme), then use the visualize_attention.py file from dinov1 (you can use the already changed file in my linked repo).
>
> If something is unclear, just let me know.
>
> If someone found an easier way, then please let us know :)

Hi @Sntz91, thanks a lot for your answer. To use xformers, adding a few lines worked for me, changing

```python
x = memory_efficient_attention(q, k, v, attn_bias=attn_bias)
x = x.reshape([B, N, C])
```

to

```python
x = memory_efficient_attention(q, k, v, attn_bias=attn_bias)
if return_attn:
    attn = x.permute(0, 2, 1, 3) @ v.permute(0, 2, 3, 1)
    return attn
x = x.reshape([B, N, C])
```

But I noticed that the visualised attention seems much noisier compared to dinov1. And the larger the model, the noisier the self-attention, e.g., vit_small gives visually much better results than vit_giant2. I wonder if you experienced the same?
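One caveat with the patch above: memory_efficient_attention returns the attended output, not the softmax weights, so reconstructing "attention" from its output is lossy and may contribute to the noise. If the weights are only needed for visualization, a sketch that recomputes them directly from q and k (assuming the [B, heads, N, head_dim] layout used inside the attention module; names are illustrative):

```python
import torch

def explicit_attention(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """q, k: [B, heads, N, head_dim] -> softmax attention weights [B, heads, N, N]."""
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale
    return attn.softmax(dim=-1)

# toy shapes: 1 image, 6 heads, 17 tokens, 64-dim heads
q = torch.randn(1, 6, 17, 64)
k = torch.randn(1, 6, 17, 64)
attn = explicit_attention(q, k)
```

This costs an extra O(N²) matmul for the visualized layer only, and yields proper probability rows (each summing to 1).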

@yxchng

yxchng commented Jul 22, 2023

@ludles i am also noticing the same phenomenon. any idea why?

@anas-zafar

Hi @Sntz91, I followed your steps to visualize the attention heat maps, but the results seem bad compared to DinoV1. Could you guide me on why this might be happening? I am using dinov2_vitl14_pretrain.pth

image

@anas-zafar

@ludles I noticed the same thing. Any luck figuring out why this might be happening?

@rainbowpuffpuff

rainbowpuffpuff commented Aug 15, 2023

Might have to do with: "Additionally, note the weird line attentions[:, 10] = 0. I found out that the attention scores are very high for one specific pixel, over all attention heads."

But I haven't found that line of code (attentions) in the gitlab repo, nor in a few minutes of searching across the other referenced stuff.

edit: found it was removed: https://gitlab.com/ziegleto-machine-learning/dino/-/commit/cccff01fd1b9a68ec231f6ca6ed13ad3e67afdaa#a8fadc4edbeec1b1b6477a4cb735752ebc6afb49_212_86

I'll try later to run it with that line added back, to see whether I get results similar to yours, or whether adding it back and exchanging it with the one related to max solves this.

@Sntz91

Sntz91 commented Sep 2, 2023

The following snippet from the visualize_attention.py script should print you the id with the max attention. Then you can just put it in the next line and set it manually to zero. I hope that works! I am not sure though, why we need to do this, and if this is still necessary.

```python
print(torch.max(attentions, dim=1))
attentions[:, 283] = 0
```
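Since the hardcoded token id (283 here, 10 in the older commit) depends on the image and model, a sketch that zeroes whichever token has the maximum attention per head instead (assuming attentions has shape [heads, n_tokens] as in the script at that point):

```python
import torch

def zero_max_token(attentions: torch.Tensor) -> torch.Tensor:
    """Zero each head's single largest attention entry, in place."""
    idx = attentions.argmax(dim=1)
    attentions[torch.arange(attentions.shape[0]), idx] = 0
    return attentions

# tiny example: 2 heads over 2 tokens
a = torch.tensor([[0.1, 0.9], [0.8, 0.2]])
zero_max_token(a)
```

This only suppresses one outlier per head; if several tokens have anomalously high attention, a percentile clip may work better.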

@geomlyd

geomlyd commented Sep 8, 2023

@Sntz91 Thanks for this! Is there a particular reason why you set your image size to 952x952? Also, why do you instantiate your ViT with 526 image size instead of 224? It does seem that the small and large ViT checkpoints correspond to such a size instead of 224x224, but any idea where this came from?

@wrencanfly

wrencanfly commented Oct 14, 2023

> The following snippet from the visualize_attention.py script should print you the id with the max attention. Then you can just put it in the next line and set it manually to zero. I hope that works! I am not sure though, why we need to do this, and if this is still necessary.
>
> ```python
> print(torch.max(attentions, dim=1))
> attentions[:, 283] = 0
> ```

Hi Sntz91, I followed your steps and got this error, wondering why...

OK I SOLVED this issue by this #90 (comment)

Since I use dinov2_vits14_pretrain, I changed the model params:

```python
model = vit_small(
    patch_size=14,
    img_size=518,
    init_values=1.0,
    block_chunks=0,
)
```

then the shapes are:

```
img shape: torch.Size([1, 3, 952, 952])
attentions shape: torch.Size([1, 4625, 384])
```

```
Traceback (most recent call last):
  File "/home/yingwei/saff_v2/saff/dino_utils/visualize_attention_v2.py", line 88, in <module>
    attentions = attentions[0, :, 0, 1:].reshape(nh, -1)
IndexError: too many indices for tensor of dimension 3
```

any help would be appreciated!

@betterze

> Hi @Sntz91, I followed your steps to visualize the attention heat maps, but the results seem bad compared to DinoV1. Could you guide me on why this might be happening? I am using dinov2_vitl14_pretrain.pth
>
> image

In my experiment, I got similar results.

@fedegonzal

> Hi @Sntz91, I followed your steps to visualize the attention heat maps, but the results seem bad compared to DinoV1. Could you guide me on why this might be happening? I am using dinov2_vitl14_pretrain.pth
>
> image

Here is the explanation for your results: https://arxiv.org/pdf/2309.16588.pdf

@fedegonzal

> > Hi @Sntz91, I followed your steps to visualize the attention heat maps, but the results seem bad compared to DinoV1. Could you guide me on why this might be happening? I am using dinov2_vitl14_pretrain.pth
> > image
>
> In my experiment, I got similar results.

Here is the explanation for your results: https://arxiv.org/pdf/2309.16588.pdf

@LichunZhang

> Hi Stephen,
>
> I got it to run today and was able to visualize some heatmaps. For this, I basically copied the code from dinov1 and pasted it into the dinov2 code. I wrote a quick how-to; you can find it here: gitlab
>
> Basically, you need to pull the dinov2 repo, change some classes and functions (explained in the linked readme), then use the visualize_attention.py file from dinov1 (you can use the already changed file in my linked repo).
>
> If something is unclear, just let me know.
>
> If someone found an easier way, then please let us know :)

Hi Sntz91,

I get an error in your notebook:

```
attentions = attentions[0, :, 0, 1:].reshape(number_of_head, -1)
IndexError: too many indices for tensor of dimension 3
```

So I checked the attention shape. It turns out to be [1, xxxx, embedding_dimension] instead of [1, heads, xxxx, xxxx] as in your notebook. Why do my attention results have a different dimension from yours? Could you please help me fix it? Thank you.
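One likely explanation (an assumption on my part, not confirmed above): with xformers installed, DINOv2 routes through MemEffAttention, which never materializes the softmax weights, so without the return_attn patch from earlier in the thread the hook sees the block's [B, tokens, embed_dim] output features rather than [B, heads, tokens, tokens] attention. A quick shape sanity check, with an illustrative helper name:

```python
import torch

def looks_like_attention(t: torch.Tensor) -> bool:
    """Attention weights are [B, heads, tokens, tokens] (square last two dims);
    block output features are [B, tokens, embed_dim] (3 dims)."""
    return t.dim() == 4 and t.shape[-1] == t.shape[-2]

print(looks_like_attention(torch.zeros(1, 4625, 384)))   # output features
print(looks_like_attention(torch.zeros(1, 4, 845, 845)))  # attention weights
```

If the check fails, either apply the return_attn patch to MemEffAttention as well, or run without xformers so the plain Attention class is used.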

@maxmal1

maxmal1 commented Mar 7, 2024

Any idea how you would visualize the output when block_chunks are not 0?

@andrewkof

andrewkof commented Jun 1, 2024

Greetings everyone !

I'm trying to visualise the attention heatmaps for one of the models with register tokens. Specifically, the "dinov2_vitl14_reg4_pretrain".

It seems there is a mismatch between the shape/reshapes on the visualize_attention.py file.

I'm using image_size = (416, 416) with patch_size = 14 and 4 register tokens, which means the model turns the image into (416//14)×(416//14) + 4 = 29×29 + 4 = 845 tokens, but a tensor of shape 841 (= 29×29) is expected.

Do I have to drop somehow these 4 tokens after getting the last layer attention ?

Thanks in advance !

@fedegonzal

> Greetings everyone !
>
> I'm trying to visualise the attention heatmaps for one of the models with register tokens. Specifically, the "dinov2_vitl14_reg4_pretrain".
>
> It seems there is a mismatch between the shape/reshapes on the visualize_attention.py file.
>
> I'm using image_size = (416, 416) with patch_size = 14 and 4 register tokens, which means the model turns the image into (416//14)×(416//14) + 4 = 29×29 + 4 = 845 tokens, but a tensor of shape 841 (= 29×29) is expected.
>
> Do I have to drop somehow these 4 tokens after getting the last layer attention ?
>
> Thanks in advance !

If I'm not wrong, it's because the register tokens should be dropped to visualize the heatmap, so 845 - 4 = 841.
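In code, that means slicing the register columns out before the reshape. A sketch assuming the reg4 token layout [CLS, 4 registers, patches] (the layout and function name are my assumptions, not from visualize_attention.py):

```python
import torch

def cls_to_patch_grid(attn: torch.Tensor, h: int, w: int,
                      num_registers: int = 4) -> torch.Tensor:
    """attn: [1, heads, tokens, tokens] with tokens = 1 + num_registers + h*w.
    Returns the CLS token's attention over patches as [heads, h, w]."""
    nh = attn.shape[1]
    # skip CLS (index 0) and the register tokens (indices 1..num_registers)
    return attn[0, :, 0, 1 + num_registers:].reshape(nh, h, w)

# toy: 29x29 patches + 1 CLS + 4 registers = 846 tokens, 4 heads
attn = torch.softmax(torch.randn(1, 4, 846, 846), dim=-1)
maps = cls_to_patch_grid(attn, 29, 29)
```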

@wylapp

wylapp commented Jun 13, 2024

Hello everyone!

Other than the attention map, has anyone tried to draw the norms plot in Figure 3 (from "Vision Transformers Need Registers")?
I got the opposite result, where there are no outlier patches.

> We observe that an important difference between “artifact” patches and other patches is the norm of their token embedding at the output of the model.

This is the only sentence where the authors explain what an outlier patch/token is.

FYI, this is my implementation.
https://colab.research.google.com/drive/1gHDOi8RL8hHmfAJvF7IqyBF7tmosG0ko

image
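For reference, the norm map itself is essentially one line. A sketch where feats stands in for the patch tokens at the model output (e.g. the "x_norm_patchtokens" entry of DINOv2's forward_features, if I read the hub models correctly; the helper name is mine):

```python
import torch

def patch_norm_map(feats: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """feats: [1, n_patches, dim] patch tokens -> [h, w] map of L2 norms.
    Outlier ("artifact") patches should appear as entries with much larger
    norm than their neighbors."""
    return feats[0].norm(dim=-1).reshape(h, w)

# toy: a 4x4 patch grid with 384-dim tokens
feats = torch.randn(1, 16, 384)
norms = patch_norm_map(feats, 4, 4)
```

One thing worth checking when no outliers appear: whether the norms are taken before or after the final LayerNorm, since normalization can compress exactly the high-norm tokens the paper plots.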

@maywander

> Hello everyone!
>
> Other than the attention map, has anyone tried to draw the norms plot in Figure 3 (from "Vision Transformers Need Registers")? I got the opposite result, where there are no outlier patches.
>
> > We observe that an important difference between “artifact” patches and other patches is the norm of their token embedding at the output of the model.
>
> This is the only sentence where the authors explain what an outlier patch/token is.
>
> FYI, this is my implementation. https://colab.research.google.com/drive/1gHDOi8RL8hHmfAJvF7IqyBF7tmosG0ko
>
> image

Have you solved this problem?

@wylapp

wylapp commented Jul 4, 2024

No, I haven't. The problem remains.
