
GradCAM for SwinTransformer #84

Closed
AmbiTyga opened this issue May 4, 2021 · 10 comments

AmbiTyga (Contributor) commented May 4, 2021

I am using image models from the timm package.
Similar to ViT, I tried accessing the normalization layer of the last block of the last layer in the SwinTransformer. After applying GradCAM++, the results from ViT and the Swin Transformer are hugely different: the Swin Transformer's accuracy is better than ViT's, but its gradient map looks very different. I would like to know whether I am using the right layer of the Swin Transformer, or whether I should change some configuration in the GradCAMPlusPlus module.

import cv2
import numpy as np

from timm.models import swin_base_patch4_window7_224_in22k
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.image import show_cam_on_image, preprocess_image
def reshape_transform(tensor, height=6, width=8):
    result = tensor[:, 1:, :].reshape(tensor.size(0),
        height, width, tensor.size(2))

    # Bring the channels to the first dimension,
    # like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result
model = swin_base_patch4_window7_224_in22k(pretrained=True, num_classes=5)
model.cuda()
target_layer = model.layers[-1].blocks[-1].norm1
cam = GradCAMPlusPlus(model=model, 
                      target_layer=target_layer, 
                      reshape_transform=reshape_transform)

rgb_img = cv2.imread(img_path, 1)[:, :, ::-1]
rgb_img = cv2.resize(rgb_img, (224, 224))
rgb_img = np.float32(rgb_img) / 255
input_tensor = preprocess_image(rgb_img, mean=[0.5, 0.5, 0.5], 
                                          std=[0.5, 0.5, 0.5]).cuda()
grayscale_cam = cam(input_tensor=input_tensor,
                    target_category=0,
                    eigen_smooth=True,
                    aug_smooth=True)

grayscale_cam = grayscale_cam[0, :]
cam_image = show_cam_on_image(rgb_img, grayscale_cam)
cv2.imwrite('save_path.jpg', cam_image)

Original image (attached): Hookworm_rhabditiform_2_0_7

ViT GradCAM++ (attached): Ancyclostoma=Hookworm_rhabditiform_2_0_7

Swin Transformer GradCAM++ (attached): Ancyclostoma=Hookworm_rhabditiform_2_0_7

jacobgil (Owner) commented May 4, 2021

What class_index was used in the example above?

AmbiTyga (Contributor, Author) commented May 4, 2021

I fine-tuned the model with a classifier head on a specific dataset; it's not part of ImageNet.

jacobgil (Owner) commented May 4, 2021

OK, there are several different issues here.

The Swin Transformer deviates from ViT:

  • No class token, and the activations are 7x7, so the reshape should keep all tokens:
def reshape_transform(tensor, height=7, width=7):
    # No class token to strip: every token maps to a 7x7 spatial position.
    result = tensor.reshape(tensor.size(0),
        height, width, tensor.size(2))
    # Bring the channels to the first dimension, like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result
  • It seems they use the ImageNet normalization and not the ViT normalization (this may not affect you; however, I was using the model weights from timm). A consolidated sketch follows below.
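
Putting the two points together, here is a minimal consolidated sketch, not the library's official example: it assumes the same timm model and the single-target_layer GradCAMPlusPlus API used earlier in this thread, plus the standard ImageNet mean/std values; "input.jpg" is a placeholder path.

import cv2
import numpy as np
from timm.models import swin_base_patch4_window7_224_in22k
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.image import show_cam_on_image, preprocess_image

def reshape_transform(tensor, height=7, width=7):
    # Keep all tokens: Swin has no class token to strip.
    result = tensor.reshape(tensor.size(0), height, width, tensor.size(2))
    # Bring channels first, like a CNN feature map: [B, C, H, W].
    return result.transpose(2, 3).transpose(1, 2)

model = swin_base_patch4_window7_224_in22k(pretrained=True, num_classes=5).cuda().eval()
target_layer = model.layers[-1].blocks[-1].norm1
cam = GradCAMPlusPlus(model=model,
                      target_layer=target_layer,
                      reshape_transform=reshape_transform)

rgb_img = cv2.imread("input.jpg", 1)[:, :, ::-1]
rgb_img = np.float32(cv2.resize(rgb_img, (224, 224))) / 255
# ImageNet statistics (assumed here) instead of the 0.5/0.5 ViT values.
input_tensor = preprocess_image(rgb_img,
                                mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225]).cuda()

grayscale_cam = cam(input_tensor=input_tensor, target_category=0)[0, :]
cv2.imwrite("swin_gradcam.jpg", show_cam_on_image(rgb_img, grayscale_cam))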

AmbiTyga (Contributor, Author) commented May 5, 2021

Yes, unlike ViT, the Swin Transformer doesn't have a CLS token.
Since the Swin Transformer has no concept of a CLS token, the 0th token is part of the input, not a class token.
I would like to create a PR with a Swin Transformer example that generates gradient maps, and also update the README with this information. Please allow me to do this.
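
A quick way to double-check this (just a sketch, not part of the planned PR): hook the target layer and inspect its output shape for a 224x224 input. The last stage should report 7x7 = 49 tokens with no extra class token; depending on the timm version, the hooked tensor may come out as [B, 49, C] or already as [B, 7, 7, C].

import torch
from timm.models import swin_base_patch4_window7_224_in22k

model = swin_base_patch4_window7_224_in22k(pretrained=True).eval()
target_layer = model.layers[-1].blocks[-1].norm1

shapes = []
hook = target_layer.register_forward_hook(
    lambda module, inputs, output: shapes.append(tuple(output.shape)))
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
hook.remove()

# Expect 49 = 7x7 tokens and no leading class token,
# e.g. (1, 49, 1024) in older timm releases or (1, 7, 7, 1024) in newer ones.
print(shapes)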

jacobgil (Owner) commented May 5, 2021

Cool!

  • In the Readme it can be added to the "How it works with Vision Transformers" section.
  • Maybe add a new directory called usage_examples, and put the example for swin transformers there.

jacobgil (Owner) commented May 5, 2021

Closing this issue, since it seems resolved.

jacobgil closed this as completed May 5, 2021

scott870430 commented

Hi @AmbiTyga, @jacobgil!
I used Grad-CAM to get CAMs from the Swin Transformer. For the Swin Transformer I used this implementation, which I think is the same as the timm package.
Following the Swin Transformer example, I get some weird results like the following:
(images: 2008_002536_myGradCAM_ 16, 2008_003379_myGradCAM_ 14)
I fine-tuned the Swin Transformer on the PASCAL VOC 2012 dataset.
The class of each heatmap is correct (horse and person). However, the CAM also covers square regions outside the object, which look like Swin Transformer patches/windows.
Are these correct results for Grad-CAM on the Swin Transformer?
Have you ever encountered such a problem?

Thank you in advance for your help.
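
One exploratory check, sketched under the timm model and single-target_layer API used earlier in this thread (adapt the module path if you are using the other Swin implementation), not a confirmed fix: target an earlier Swin stage, whose tokens form a 14x14 grid for 224x224 inputs, and see whether the square blobs track the coarse 7x7 grid of the last stage. Preprocessing and the cam(...) call stay as in the earlier example.

from functools import partial

from timm.models import swin_base_patch4_window7_224_in22k
from pytorch_grad_cam import GradCAMPlusPlus

def reshape_transform(tensor, height, width):
    # Same reshape as before, just parameterized by the stage's grid size.
    result = tensor.reshape(tensor.size(0), height, width, tensor.size(2))
    return result.transpose(2, 3).transpose(1, 2)

model = swin_base_patch4_window7_224_in22k(pretrained=True).cuda().eval()

# The second-to-last stage of swin_base keeps a 14x14 token grid at 224x224.
target_layer = model.layers[-2].blocks[-1].norm1
cam = GradCAMPlusPlus(model=model,
                      target_layer=target_layer,
                      reshape_transform=partial(reshape_transform,
                                                height=14, width=14))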

SKBL5694 commented Jul 5, 2023

Hi @scott870430, I had a similar problem to yours; have you figured out why?
(screenshot: 捕获2)

scott870430 commented

@SKBL5694 Unfortunately, I couldn't figure out how to solve it, and ultimately, I gave up.

SKBL5694 commented Jul 5, 2023

@scott870430 OK, I'll explore and let you know if I have any results. Thanks for the reply.
