Thanks for your great work and code.
I am a little confused about your implementation of nonlocality in main.py (L346-351).
Here is the code:
batch = next(iter(data_loader_val))[0]
batch = batch.to(device)
batch = model_without_ddp.patch_embed(batch)
for l in range(len(model_without_ddp.blocks)):
    attn = model_without_ddp.blocks[l].attn
    nonlocality[l] = attn.get_attention_map(batch).detach().cpu().numpy().tolist()
It seems that you always feed the original patch embeddings to all 12 blocks.
Shouldn't the inputs to attn.get_attention_map be [original patch embeddings, output of block 1, ..., output of block 11]?
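To make the question concrete, here is a rough sketch of what I would expect (just an illustration on my side, assuming each block can be called as x = block(x) as in timm-style ViTs, and omitting any cls-token/positional-embedding handling to stay close to your snippet):

nonlocality = {}
batch = next(iter(data_loader_val))[0]
batch = batch.to(device)
x = model_without_ddp.patch_embed(batch)
for l, block in enumerate(model_without_ddp.blocks):
    # Block l's attention map would be computed on the input it actually sees:
    # the patch embeddings for l = 0, otherwise the output of block l - 1.
    nonlocality[l] = block.attn.get_attention_map(x).detach().cpu().numpy().tolist()
    # Advance the representation so the next block receives the correct input.
    x = block(x)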
If I have misunderstood something, please correct me.
Looking forward to your reply.