Attention heatmap visualization #90
Hi Stephen, I got it to run today and was able to visualize some heatmaps. For this, I basically copied the code from DINOv1 and pasted it into the DINOv2 code. I wrote a quick how-to, you can find it here: gitlab. Basically, you need to pull the dinov2 repo, change some classes and functions (explained in the linked readme), then use the visualize_attention.py file from DINOv1 (you can use the already changed file in my linked repo). If something is unclear, just let me know. And if someone finds an easier way, please let us know :)
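For readers skimming the linked how-to, the core of the visualization is just taking the last layer's attention from the CLS token to every patch token and reshaping it onto the patch grid. A minimal sketch of that step, assuming a `(1, heads, tokens, tokens)` attention layout as returned by a `get_last_selfattention`-style method (the helper name and `num_extra_tokens` handling are illustrative, not the how-to's exact code):

```python
import torch

def cls_attention_heatmaps(attn, img_h, img_w, patch_size=14, num_extra_tokens=1):
    """Reshape CLS->patch attention into per-head heatmaps.

    attn: (1, heads, tokens, tokens); num_extra_tokens counts the CLS token
    (plus registers, if any) that precede the patch tokens.
    """
    h_feat, w_feat = img_h // patch_size, img_w // patch_size
    cls_attn = attn[0, :, 0, num_extra_tokens:]   # attention of CLS to each patch
    return cls_attn.reshape(-1, h_feat, w_feat)   # (heads, h_feat, w_feat)

# dummy example: a ViT-S/14-sized head count on a 224x224 image -> 16x16 patches
heads, tokens = 6, 1 + 16 * 16
attn = torch.softmax(torch.randn(1, heads, tokens, tokens), dim=-1)
maps = cls_attention_heatmaps(attn, 224, 224)
print(maps.shape)  # torch.Size([6, 16, 16])
```

Each of the resulting per-head maps can then be upsampled back to image resolution and overlaid on the input, as DINOv1's script does.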
A thorough and complete answer to go with such impressive work. Thanks!
Hi @Sntz91, thanks a lot for your answer. To use xFormers, adding a few lines worked for me: in dinov2/dinov2/layers/attention.py (lines 76 to 77 in c3c2683), change the code to:

```python
x = memory_efficient_attention(q, k, v, attn_bias=attn_bias)
if return_attn:
    attn = x.permute(0, 2, 1, 3) @ v.permute(0, 2, 3, 1)
    return attn
x = x.reshape([B, N, C])
```

But I noticed that the visualised attention seems much noisier compared to DINOv1, and the larger the model, the noisier the self-attention.
@ludles I am also noticing the same phenomenon. Any idea why?
Hi @Sntz91, I followed your steps to visualize the attention heatmaps, but the results seem bad compared to DINOv1. Could you advise on why this might be happening? I am using dinov2_vitl14_pretrain.pth.
@ludles I noticed the same thing. Any luck figuring out why this might be happening?
Might have to do with the note in the how-to: "Additionally, note the weird line `attentions[:, 10] = 0`. I found out that the attention…" But I haven't found that line of code (attentions) in the GitLab repo, nor in a few minutes of searching across the other referenced material.

Edit: found it; it was removed in https://gitlab.com/ziegleto-machine-learning/dino/-/commit/cccff01fd1b9a68ec231f6ca6ed13ad3e67afdaa#a8fadc4edbeec1b1b6477a4cb735752ebc6afb49_212_86. I'll try later to run with that line added back, to see whether I get results similar to yours, and whether restoring it (or exchanging it with the max-related one) solves this.
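For context, the noise-suppression step in DINOv1's visualize_attention.py is not a single hard-coded line but cumulative-mass thresholding: per head, keep only the smallest set of patches that together hold a given fraction of the attention. A rough sketch of the idea (the function name, `keep_mass` parameter, and `(heads, patches)` shape are assumptions, not the script's exact interface):

```python
import torch

def threshold_attention(attn_per_head, keep_mass=0.6):
    """Binary mask keeping, per head, the top patches that together hold
    `keep_mass` of the attention (cumulative-mass thresholding)."""
    val, idx = torch.sort(attn_per_head, dim=-1)          # ascending
    cum = (val / val.sum(dim=-1, keepdim=True)).cumsum(dim=-1)
    mask = cum > (1 - keep_mass)                          # only top tokens survive
    out = torch.zeros_like(attn_per_head)
    out.scatter_(-1, idx, mask.to(attn_per_head.dtype))   # back to original order
    return out

attn = torch.tensor([[0.1, 0.2, 0.7]])
print(threshold_attention(attn, keep_mass=0.6))  # tensor([[0., 0., 1.]])
```

In the example, the single 0.7 patch already carries 60% of the mass, so only it is kept.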
The relevant snippet: `print(torch.max(attentions, dim=1))`
@Sntz91 Thanks for this! Is there a particular reason why you set your image size to 952x952? Also, why do you instantiate your ViT with an image size of 526 instead of 224? It does seem that the small and large ViT checkpoints correspond to such a size rather than 224x224, but any idea where this came from?
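On the 952x952 question: whatever resolution is chosen, each spatial dimension has to be a multiple of the patch size, and 952 = 14 * 68 divides cleanly. A tiny helper illustrating the constraint (the function name is just for illustration, not from the linked repo):

```python
def crop_to_patch_multiple(h, w, patch_size=14):
    """Largest size not exceeding (h, w) that divides evenly into patches."""
    return h - h % patch_size, w - w % patch_size

print(crop_to_patch_multiple(952, 952))   # (952, 952): 952 = 14 * 68 already
print(crop_to_patch_multiple(960, 1000))  # (952, 994)
```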
Hi @Sntz91, I followed your steps and got this error (traceback below), wondering why...

Traceback (most recent call last):

Any help would be appreciated!

Edit: OK, I SOLVED this issue via #90 (comment). Since I use dinov2_vits14_pretrain, I changed the model params accordingly.
In my experiment, I got similar results. |
Here is the explanation for your results: https://arxiv.org/pdf/2309.16588.pdf
Hi @Sntz91, I hit an error in your notebook, so I checked the attention shape: it turns out to be [1, xxxx, embedding_dimension] instead of [1, heads, xxxx, xxxx] as in your notebook. Why do my attention results have a different dimension from yours? Could you please help me fix it? Thank you.
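A shape like [1, tokens, embedding_dimension] suggests you are getting the block's output x rather than the attention weights (i.e. the return_attn patch above isn't being hit). One way to inspect per-head attention without editing the library is to capture the fused qkv projection (e.g. with a forward hook on `model.blocks[-1].attn.qkv`) and recompute the softmax weights; this is a sketch under the assumption of DINOv2's usual fused `(B, N, 3 * dim)` qkv layout, with the helper name my own:

```python
import torch

def attn_from_qkv(qkv_out, num_heads):
    """Recompute softmax attention weights from a fused qkv projection output.

    qkv_out: (B, N, 3 * dim), e.g. captured with a forward hook on
    model.blocks[-1].attn.qkv. Returns (B, heads, N, N).
    """
    B, N, three_dim = qkv_out.shape
    dim = three_dim // 3
    head_dim = dim // num_heads
    # (B, N, 3, heads, head_dim) -> (3, B, heads, N, head_dim)
    qkv = qkv_out.reshape(B, N, 3, num_heads, head_dim).permute(2, 0, 3, 1, 4)
    q, k = qkv[0], qkv[1]
    attn = (q * head_dim ** -0.5) @ k.transpose(-2, -1)
    return attn.softmax(dim=-1)

# dummy shape check: 257 tokens (CLS + 16x16 patches), dim 384, 6 heads
out = attn_from_qkv(torch.randn(1, 257, 3 * 384), num_heads=6)
print(out.shape)  # torch.Size([1, 6, 257, 257])
```

Each row of the result sums to 1, as expected of softmax attention weights.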
Any idea how you would visualize the output when block_chunks is not 0?
Greetings everyone! I'm trying to visualise the attention heatmaps for one of the models with register tokens, specifically dinov2_vitl14_reg4_pretrain. There seems to be a mismatch between the shapes/reshapes in the visualize_attention.py file. I'm using image_size = (416, 416) with patch_size = 14 and 4 register tokens, which means the model gives me (416//14) x (416//14) + 4 = 29x29 + 4 = 845 tokens, but the reshape expects a tensor of shape 841 (= 29x29). Do I have to drop these 4 tokens somehow after getting the last layer's attention? Thanks in advance!
If I'm not wrong, it's because the registers should be dropped before visualizing the heatmap, so 845 - 4 = 841.
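A sketch of that slicing, assuming the token order [CLS, reg_1..reg_4, patches] that the register models use (the helper name is hypothetical, and 406 = 29 * 14 is used in the example so the patch grid is exact):

```python
import torch

def cls_to_patch_attention(attn, num_register_tokens=4):
    """Keep only CLS->patch attention, dropping the CLS and register columns.

    attn: (1, heads, tokens, tokens) with token order [CLS, registers, patches].
    """
    start = 1 + num_register_tokens
    return attn[0, :, 0, start:]   # (heads, num_patches)

# ViT-L/14 with 4 registers on a 406x406 image: 29 * 29 = 841 patch tokens
tokens = 1 + 4 + 29 * 29
attn = torch.softmax(torch.randn(1, 16, tokens, tokens), dim=-1)
print(cls_to_patch_attention(attn).shape)  # torch.Size([16, 841])
```

The remaining 841 values then reshape to the 29x29 grid as before.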
Hello everyone! Other than the attention map, has anyone tried to draw the norms plot in Figure 3 (from "Vision Transformers Need Registers")?
This is the only sentence where the authors explain what an outlier patch/token is. FYI, this is my implementation.
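If it helps, a norms plot in the spirit of that figure can be approximated by taking the L2 norm of each patch token from forward_features (DINOv2 exposes them under "x_norm_patchtokens") and reshaping onto the patch grid; a sketch with dummy tensors, where the outlier scaling is fabricated purely for illustration:

```python
import torch

def patch_token_norms(patch_tokens, grid_h, grid_w):
    """L2 norm of every patch token, reshaped onto the patch grid.

    patch_tokens: (num_patches, dim), e.g.
    model.forward_features(x)["x_norm_patchtokens"][0].
    """
    return patch_tokens.norm(dim=-1).reshape(grid_h, grid_w)

torch.manual_seed(0)
tokens = torch.randn(256, 384)      # dummy 16x16 grid of 384-dim features
tokens[10] *= 20                    # fabricated high-norm "outlier" token
grid = patch_token_norms(tokens, 16, 16)
print(grid.shape)  # torch.Size([16, 16])
```

Outlier tokens then show up as grid cells whose norm is far above the rest, which is what the paper's figure highlights.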
Have you solved this problem? |
No I haven't. The problem remains. |
How do you generate the attention heatmap demonstrated in your paper? Could you supply an API?