
Question about Vit #6

Closed
scott870430 opened this issue Sep 9, 2021 · 6 comments

@scott870430

Thanks for the great work.
I want to apply your ViT work (both the CVPR 2021 and the ICCV 2021 methods) from vit_base_patch16_224 to vit_base_patch16_384, because I think it will produce better relevancy maps.
Can I directly change the config here to a 384 x 384 configuration and download the pre-trained weights for the 384 version?
Or do I need to make other changes?

Thank you in advance for your help.

@hila-chefer
Owner

Hi @scott870430, thanks for your interest in our work!
Yes, I think you can make some configuration modifications to make it work; there's no reason why it shouldn't. You may also need to change some additional code to accommodate the new shapes of the attention maps and the bilinear interpolation, but it should definitely work fine :)
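
To make the shape change concrete, here is a small bookkeeping sketch (plain arithmetic, not tied to any specific file in this repo): with 16 x 16 patches, a 384 x 384 input gives a 24 x 24 token grid instead of 14 x 14, so any code that reshapes the per-token relevancy into a square map before the bilinear interpolation needs 24 instead of a hard-coded 14.

# Token-grid bookkeeping when moving from 224 to 384 inputs
# (assuming patch_size = 16, as in vit_base_patch16_*).
patch_size = 16
for img_size in (224, 384):
    grid = img_size // patch_size     # 14 for 224, 24 for 384
    num_tokens = grid * grid + 1      # +1 for the CLS token: 197 vs. 577
    print(f"{img_size}: {grid}x{grid} grid, {num_tokens} tokens")

# For example, a relevancy map over the patch tokens would be reshaped as
#   relevance.reshape(1, 1, grid, grid)
# and then upsampled by a factor of patch_size back to the input resolution.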

@scott870430
Author

scott870430 commented Sep 13, 2021

Thank you for your help.

vit_base_patch16_224 needs _conv_filter; where can I check which models need the conv_filter? I've been referring to timm, but I can't tell which models require it...
This is my config now:

def vit_base_patch16_384(pretrained=False, **kwargs):
    model = VisionTransformer(
        patch_size=16, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4, qkv_bias=True, **kwargs)
    model.default_cfg = default_cfgs['vit_base_patch16_384']
    if pretrained:
        load_pretrained(model, num_classes=model.num_classes, in_chans=kwargs.get('in_chans', 3))
    return model

And I want to make sure: can weights fine-tuned with the ViT_LRP architecture also be used with ViT_new? I think it should work, since both methods only read the attention maps from the model and don't modify the forward pass?

Thanks!

@hila-chefer
Owner

Hi @scott870430!
We used the implementation from timm, so as long as you follow the code and the config there, it should be equivalent to what we did. Does this help?
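
For reference, a rough sketch of what that 384 factory would look like. It roughly mirrors the corresponding function in timm and is meant to sit in the same file as the 224 version, so VisionTransformer, default_cfgs, and load_pretrained are assumed to be in scope; if I remember correctly, timm's version also passes norm_layer=partial(nn.LayerNorm, eps=1e-6), which you may or may not need depending on your copy of the code.

def vit_base_patch16_384(pretrained=False, **kwargs):
    # img_size=384 is the important addition: it makes the positional embedding
    # match the 384 checkpoint (a 24 x 24 patch grid plus the CLS token).
    model = VisionTransformer(
        img_size=384, patch_size=16, embed_dim=768, depth=12, num_heads=12,
        mlp_ratio=4, qkv_bias=True, **kwargs)
    model.default_cfg = default_cfgs['vit_base_patch16_384']
    if pretrained:
        # As far as I remember, only checkpoints whose patch-embedding weights
        # are stored in linear rather than conv layout need _conv_filter;
        # timm's 384 factory does not pass it, so it is omitted here as well.
        load_pretrained(model, num_classes=model.num_classes,
                        in_chans=kwargs.get('in_chans', 3))
    return model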

@scott870430
Author

Hi @hila-chefer!
I have a question about the dataset gtsegs_ijcv.mat. Following your command, I can reproduce the LRP result. In your code, the class with the highest predicted probability is selected for each image, but I would like to know the actual category of each image.
However, I can't connect to the official website of the dataset... Is there any way to find the image categories and the training/validation split of the dataset?

Thank you in advance for your help.

@hila-chefer
Owner

hila-chefer commented Nov 14, 2021

Hi @scott870430 :)
This is the link to the official download and yes, for some reason the explanations about this dataset have been removed from their site, unfortunately.
I haven't looked into these details since my code for the segmentation tests is adapted from other repositories, but I think it only contains the original image and ground truth segmentation.
The distinction between train/val/test is not too critical here, since the model was trained for classification and all explainability methods benefit from having the model predict the correct class.
I'm sorry I don't have a more informative answer, does this help?
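
If it helps, you can inspect the file directly to see exactly what it stores. A quick sketch (assuming the .mat is a MATLAB v7.3 file, which h5py can read; for older formats scipy.io.loadmat would be the fallback):

import h5py

# Print every group/dataset in gtsegs_ijcv.mat along with its shape, to check
# whether any per-image class labels are stored next to the images and masks.
with h5py.File('gtsegs_ijcv.mat', 'r') as f:
    def show(name, obj):
        print(name, getattr(obj, 'shape', None))
    f.visititems(show)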

@hila-chefer
Owner

@scott870430 closing due to inactivity, but feel free to reopen if you have additional questions.
