Attention Map Generation #18
It is not very convenient, but it should be possible this way:
It could have been more elegant with forward hooks, but PyTorch does not currently allow modifying the keyword arguments of forward_pre_hooks. Another caveat is that the multihead attention implementation only returns the attention weights summed over all heads, so you might want to delete that reduction if you need per-head maps.
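For reference, a minimal sketch of the workaround described above. The `patch_attention` helper is hypothetical; it assumes the standard CLIP ViT, whose residual blocks call `nn.MultiheadAttention` with `need_weights` passed as a keyword argument:

```python
import inspect

import torch.nn as nn


def patch_attention(attn: nn.MultiheadAttention):
    """Force an attention module to return (and stash) its attention weights.

    Hypothetical helper: we wrap forward() directly because forward_pre_hooks
    cannot modify keyword arguments such as need_weights.
    """
    original_forward = attn.forward
    sig = inspect.signature(nn.MultiheadAttention.forward)

    def forward_with_weights(*args, **kwargs):
        kwargs["need_weights"] = True
        # Newer PyTorch versions reduce the weights over heads by default;
        # ask for per-head maps when the argument is available.
        if "average_attn_weights" in sig.parameters:
            kwargs["average_attn_weights"] = False
        output, weights = original_forward(*args, **kwargs)
        attn.stored_attention = weights  # stash for later retrieval
        return output, weights

    attn.forward = forward_with_weights


# Usage sketch (assuming the openai/CLIP module layout):
#   model, preprocess = clip.load("ViT-B/32")
#   for block in model.visual.transformer.resblocks:
#       patch_attention(block.attn)
#   _ = model.encode_image(image)
#   maps = [b.attn.stored_attention for b in model.visual.transformer.resblocks]
```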
Hi @betterze and @Hugh0120, I implemented the attention map based on the suggestions above. Here is my implementation, which may be helpful to you.
@haofanwang Thank you for sharing this nice implementation with us.
Does anyone have any insight on which layers are best to use as the saliency layer for Grad-CAM (for the visual transformer, ViT, models)? For ResNet, people use the ReLU from the last visual layer, and I'm wondering what the best choice is for the visual transformer. EDIT: realizing that Grad-CAM is not meant for transformers but mainly for CNN/ResNet-based models. Feel free to ignore the above question.
@g-luo I don't think Grad-CAM on the activations is a good idea; the patch setting makes it too coarse to be useful. In general, when dealing with a ViT, people usually use the output of the last norm layer of the last transformer block. So far, though, I have had mixed results with CLIP.
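To make that concrete, here is a rough sketch of a Grad-CAM-style map computed from the output of the last block's final norm layer. The attribute names (`blocks`, `norm2`) and the classifier head are assumptions about a generic timm-style ViT, not CLIP's API, and as noted above the map is only as fine as the patch grid:

```python
import torch


def vit_gradcam(vit, image, target_index):
    feats, grads = {}, {}

    layer = vit.blocks[-1].norm2  # last norm layer of the last block (assumed names)
    h1 = layer.register_forward_hook(lambda m, i, o: feats.update(x=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(x=go[0]))

    logits = vit(image)                  # assumes a classifier head: (1, num_classes)
    logits[0, target_index].backward()
    h1.remove(); h2.remove()

    tokens, dz = feats["x"][0, 1:], grads["x"][0, 1:]  # drop the CLS token
    weights = dz.mean(dim=0)                           # per-channel importance
    cam = torch.relu((tokens * weights).sum(dim=-1))   # one score per patch
    side = int(cam.numel() ** 0.5)                     # e.g. 14 for a 14x14 grid
    return cam.reshape(side, side)                     # coarse, patch-level map
```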
Does anyone have an implementation on the visual side using CLIP with a ViT?
Since this is the top result from Google, it may be easier for me to find my fork here than by using the search box on GitHub 😆
Hello @ricardodeazambuja, thank you for your recommendation. However, when I run this file, this error occurs. Do you know how to solve it?
@zhanjiahui, are you using the fork, or the original clip package?
@zhanjiahui, here everything works just fine (I had a problem related to a change in how Pillow and NumPy exchange data). I would bet that when you do `import clip`, you are still importing the original package instead of the fork.
Hi @ricardodeazambuja, I uninstalled the normal clip package and moved this ipynb file into the root dir of your project, but this error still happens. I double-checked that the encode_image function has exactly one return value in your repository. This is so confusing to me.
@zhanjiahui, keep following the flow of the code and you will find it here: try to pull the latest version of the repo and check out the
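As a generic diagnostic for this kind of shadowing problem (not specific to the fork), you can ask Python which clip module it is actually loading:

```python
import clip

# If this prints a path inside site-packages rather than the forked
# repo's directory, the pip-installed clip is shadowing the fork.
print(clip.__file__)
```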
Thanks for the release of the pretrained models!
I was wondering if it is possible to show attention maps for input images using the released ViT-B/32 model?
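Once per-layer attention weights are available (e.g. via the patching approach earlier in the thread), a minimal sketch for turning the CLS token's attention into an image-sized heatmap might look like this. The shape assumptions (one layer's weights of shape `(heads, tokens, tokens)` for a 224x224 input to ViT-B/32, i.e. a 7x7 patch grid plus one CLS token) are mine, not from this repo:

```python
import torch
import torch.nn.functional as F


def cls_attention_heatmap(attn: torch.Tensor, image_size: int = 224) -> torch.Tensor:
    """Upsample the CLS token's attention over patches to image resolution."""
    cls_row = attn.mean(dim=0)[0, 1:]      # CLS attention to patches, heads averaged
    side = int(cls_row.numel() ** 0.5)     # 7 for ViT-B/32 at 224px
    grid = cls_row.reshape(1, 1, side, side)
    heat = F.interpolate(grid, size=(image_size, image_size),
                         mode="bilinear", align_corners=False)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)  # scale to [0, 1]
    return heat[0, 0]                      # (image_size, image_size), overlayable
```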