
Question about DenseCLIP for Any Visual Backbone #47

Open
needsee opened this issue Nov 30, 2023 · 6 comments

Comments

needsee commented Nov 30, 2023

No description provided.

needsee changed the title from "uestion about DenseCLIP for Any Visual Backbone" to "Question about DenseCLIP for Any Visual Backbone" on Nov 30, 2023

needsee commented Nov 30, 2023

Congratulations on your great work! @raoyongming
I have some questions about the any-backbone experiments and would like to know more details about them.
Could you provide the code for the any-backbone experiments? That would help a lot with understanding. Thanks!


needsee commented Nov 30, 2023

If I use Swin Transformer-T as the image encoder, the output image feature is [B, 768, 16, 12]. Is the attention pooling layer used to map the image features into the embedding space ([B, 512, 16, 12]), which are then used to compute similarity with the text features? Can I replace it with a linear layer?

raoyongming (Owner) commented:

Yes, we use a randomly initialized attention pooling layer to map the image features into the embedding space. It might be okay to use a simpler linear layer, but we haven't tried it in our experiments.
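To make the two options concrete, here is a minimal numpy sketch of both projections discussed above: a per-position linear (1x1-conv-style) projection, and a simplified single-head attention layer over spatial tokens, followed by per-pixel cosine similarity with text embeddings. This is not the authors' code; the single-head simplification, random weights, and all variable names are illustrative assumptions (the real CLIP-style attention pooling is multi-head with separate q/k/v/output projections and a positional embedding).

```python
import numpy as np

rng = np.random.default_rng(0)
B, C_in, H, W, d = 2, 768, 16, 12, 512  # Swin-T feature map -> 512-d embedding space

# Backbone feature map, e.g. from Swin-T: [B, 768, 16, 12]
feat = rng.standard_normal((B, C_in, H, W)).astype(np.float32)

# Option A: a plain linear projection applied independently at each position
# (equivalent to a 1x1 convolution), the simpler alternative asked about.
W_proj = rng.standard_normal((d, C_in)).astype(np.float32) / np.sqrt(C_in)
lin = np.einsum('ec,bchw->behw', W_proj, feat)            # [B, 512, 16, 12]

# Option B: single-head self-attention over the H*W spatial tokens,
# a simplified stand-in for the randomly initialized attention pooling layer.
Wq = rng.standard_normal((d, C_in)).astype(np.float32) / np.sqrt(C_in)
Wk = rng.standard_normal((d, C_in)).astype(np.float32) / np.sqrt(C_in)
Wv = rng.standard_normal((d, C_in)).astype(np.float32) / np.sqrt(C_in)

tokens = feat.reshape(B, C_in, H * W).transpose(0, 2, 1)  # [B, HW, 768]
q, k, v = tokens @ Wq.T, tokens @ Wk.T, tokens @ Wv.T     # [B, HW, 512] each
attn = q @ k.transpose(0, 2, 1) / np.sqrt(d)              # [B, HW, HW]
attn = np.exp(attn - attn.max(-1, keepdims=True))         # stable softmax
attn /= attn.sum(-1, keepdims=True)
dense = (attn @ v).transpose(0, 2, 1).reshape(B, d, H, W)  # [B, 512, 16, 12]

# Per-pixel similarity with K L2-normalized text embeddings [K, 512]
K = 3
text = rng.standard_normal((K, d)).astype(np.float32)
text /= np.linalg.norm(text, axis=1, keepdims=True)
img = dense / np.linalg.norm(dense, axis=1, keepdims=True)
score_map = np.einsum('kd,bdhw->bkhw', text, img)          # [B, 3, 16, 12]
```

Either option yields a [B, 512, 16, 12] map that can be compared with text features position by position; the attention variant additionally mixes information across spatial locations before the comparison.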


needsee commented Dec 1, 2023

Yes, we use a randomly initialized attention pooling layer to map the image features into the embedding space. It might be okay to use a simpler linear layer but we haven't tried it in our experiments
Thanks for your reply. Could you please provide the code for the any-backbone experiments? My email is liuliyuan2023@bupt.edu.cn. Thanks.


needsee commented Dec 13, 2023

@raoyongming Hello, when you ran the any visual backbone experiments, did you also try an ImageNet pre-trained ViT? I tried using an ImageNet pre-trained ViT, but the results did not improve. What do you think the reason might be?

raoyongming (Owner) commented:

Hi, we only ran experiments on the ResNet and Swin backbones reported in the paper.
