
GradCAM for SwinTransformer #84

Closed
AmbiTyga opened this issue May 4, 2021 · 10 comments

AmbiTyga (Contributor) commented May 4, 2021

I am using image models from the timm package.
Similar to ViT, I tried accessing the normalization layer of the last block of the last layer in the SwinTransformer. After applying GradCAM++, the results from ViT and the Swin Transformer are hugely different: the Swin Transformer's accuracy is better than ViT's, but its gradient map looks very different. I would like to know whether I am using the right layer of the Swin Transformer, or whether I should change some configuration in the GradCAMPlusPlus module.

import cv2
import numpy as np

from timm.models import swin_base_patch4_window7_224_in22k
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.image import show_cam_on_image, preprocess_image
def reshape_transform(tensor, height=6, width=8):
    result = tensor[:, 1:, :].reshape(tensor.size(0),
        height, width, tensor.size(2))

    # Bring the channels to the first dimension,
    # like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result
model = swin_base_patch4_window7_224_in22k(pretrained=True, num_classes=5)
model.cuda()
target_layer = model.layers[-1].blocks[-1].norm1
cam = GradCAMPlusPlus(model=model, 
                      target_layer=target_layer, 
                      reshape_transform=reshape_transform)

rgb_img = cv2.imread(img_path, 1)[:, :, ::-1]
rgb_img = cv2.resize(rgb_img, (224, 224))
rgb_img = np.float32(rgb_img) / 255
input_tensor = preprocess_image(rgb_img, mean=[0.5, 0.5, 0.5], 
                                          std=[0.5, 0.5, 0.5]).cuda()
grayscale_cam = cam(input_tensor=input_tensor,
                    target_category=0,
                    eigen_smooth=True,
                    aug_smooth=True)

grayscale_cam = grayscale_cam[0, :]
cam_image = show_cam_on_image(rgb_img, grayscale_cam)
cv2.imwrite('save_path.jpg', cam_image)

Original image (attached): Hookworm_rhabditiform_2_0_7

ViT GradCAM++ (attached): Ancyclostoma=Hookworm_rhabditiform_2_0_7

Swin Transformer GradCAM++ (attached): Ancyclostoma=Hookworm_rhabditiform_2_0_7

jacobgil (Owner) commented May 4, 2021

What class_index was used in the example above?

AmbiTyga (Contributor, Author) commented May 4, 2021

I fine-tuned the model with a classifier head on a specific dataset; it's not part of ImageNet.

jacobgil (Owner) commented May 4, 2021

OK, there are several different issues here.

The Swin Transformer deviates from ViT:

  • No class token, and the activations are 7x7, so the reshape should keep all tokens:
def reshape_transform(tensor, height=7, width=7):
    # No class token to strip: every token maps to a 7x7 spatial position.
    result = tensor.reshape(tensor.size(0),
        height, width, tensor.size(2))
    # Bring the channels to the first dimension, like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result
  • It seems they use the ImageNet normalization and not the ViT normalization (this may not affect you; however, I was using the model weights from timm). A consolidated sketch follows below.
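
Putting the two points together, here is a minimal consolidated sketch, not the library's official example: it assumes the same timm model and the single-target_layer GradCAMPlusPlus API used earlier in this thread, plus the standard ImageNet mean/std values; "input.jpg" is a placeholder path.

import cv2
import numpy as np
from timm.models import swin_base_patch4_window7_224_in22k
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.image import show_cam_on_image, preprocess_image

def reshape_transform(tensor, height=7, width=7):
    # Keep all tokens: Swin has no class token to strip.
    result = tensor.reshape(tensor.size(0), height, width, tensor.size(2))
    # Bring channels first, like a CNN feature map: [B, C, H, W].
    return result.transpose(2, 3).transpose(1, 2)

model = swin_base_patch4_window7_224_in22k(pretrained=True, num_classes=5).cuda().eval()
target_layer = model.layers[-1].blocks[-1].norm1
cam = GradCAMPlusPlus(model=model,
                      target_layer=target_layer,
                      reshape_transform=reshape_transform)

rgb_img = cv2.imread("input.jpg", 1)[:, :, ::-1]
rgb_img = np.float32(cv2.resize(rgb_img, (224, 224))) / 255
# ImageNet statistics (assumed here) instead of the 0.5/0.5 ViT values.
input_tensor = preprocess_image(rgb_img,
                                mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225]).cuda()

grayscale_cam = cam(input_tensor=input_tensor, target_category=0)[0, :]
cv2.imwrite("swin_gradcam.jpg", show_cam_on_image(rgb_img, grayscale_cam))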

AmbiTyga (Contributor, Author) commented May 5, 2021

Yes, unlike ViT, the Swin Transformer doesn't have a CLS token.
Since the Swin Transformer has no concept of a CLS token, the 0th token is part of the input, not a class token.
I would like to create a PR with a Swin Transformer example that generates gradient maps, and also update the README with this information. Please allow me to do this.
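
A quick way to double-check this (just a sketch, not part of the planned PR): hook the target layer and inspect its output shape for a 224x224 input. The last stage should report 7x7 = 49 tokens with no extra class token; depending on the timm version, the hooked tensor may come out as [B, 49, C] or already as [B, 7, 7, C].

import torch
from timm.models import swin_base_patch4_window7_224_in22k

model = swin_base_patch4_window7_224_in22k(pretrained=True).eval()
target_layer = model.layers[-1].blocks[-1].norm1

shapes = []
hook = target_layer.register_forward_hook(
    lambda module, inputs, output: shapes.append(tuple(output.shape)))
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
hook.remove()

# Expect 49 = 7x7 tokens and no leading class token,
# e.g. (1, 49, 1024) in older timm releases or (1, 7, 7, 1024) in newer ones.
print(shapes)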

jacobgil (Owner) commented May 5, 2021

Cool!

  • In the Readme it can be added to the "How it works with Vision Transformers" section.
  • Maybe add a new directory called usage_examples, and put the example for swin transformers there.

jacobgil (Owner) commented May 5, 2021

Closing this issue, since it seems resolved.

jacobgil closed this as completed May 5, 2021

scott870430 commented

Hi @AmbiTyga, @jacobgil!
I used Grad-CAM to get CAMs from the Swin Transformer. For the Swin Transformer I used this implementation, which I think is the same as the timm package.
Following the Swin Transformer example, I get some weird results like the following:
(images: 2008_002536_myGradCAM_ 16, 2008_003379_myGradCAM_ 14)
I fine-tuned the Swin Transformer on the PASCAL VOC 2012 dataset.
The class of each heatmap is correct (horse and person). However, the CAM also covers square regions outside the object, which look like Swin Transformer patches/windows.
Are these correct results for Grad-CAM on the Swin Transformer?
Have you ever encountered such a problem?

Thank you in advance for your help.
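
One exploratory check, sketched under the timm model and single-target_layer API used earlier in this thread (adapt the module path if you are using the other Swin implementation), not a confirmed fix: target an earlier Swin stage, whose tokens form a 14x14 grid for 224x224 inputs, and see whether the square blobs track the coarse 7x7 grid of the last stage. Preprocessing and the cam(...) call stay as in the earlier example.

from functools import partial

from timm.models import swin_base_patch4_window7_224_in22k
from pytorch_grad_cam import GradCAMPlusPlus

def reshape_transform(tensor, height, width):
    # Same reshape as before, just parameterized by the stage's grid size.
    result = tensor.reshape(tensor.size(0), height, width, tensor.size(2))
    return result.transpose(2, 3).transpose(1, 2)

model = swin_base_patch4_window7_224_in22k(pretrained=True).cuda().eval()

# The second-to-last stage of swin_base keeps a 14x14 token grid at 224x224.
target_layer = model.layers[-2].blocks[-1].norm1
cam = GradCAMPlusPlus(model=model,
                      target_layer=target_layer,
                      reshape_transform=partial(reshape_transform,
                                                height=14, width=14))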

SKBL5694 commented Jul 5, 2023

Hi @scott870430, I had a similar problem to yours; have you figured out why?
(screenshot: 捕获2)

scott870430 commented

@SKBL5694 Unfortunately, I couldn't figure out how to solve it, and ultimately, I gave up.

SKBL5694 commented Jul 5, 2023

@scott870430 OK, I'll explore and let you know if I have any results. Thanks for the reply.
