Some questions about ViT-B-DenseCLIP #6
Comments
Hi, thanks for your interest in our work. Directly applying the ViT-B model to the detection task is difficult: since the complexity of self-attention is O(H^2W^2), the large input images used in detection (e.g., 800x1200) lead to considerable GPU memory consumption. Therefore, we only tested the ViT-B-DenseCLIP model on the semantic segmentation task on ADE20K. The training config file and results are provided in the Segmentation section of the README.
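For a rough sense of the scale involved, here is a back-of-the-envelope sketch (my own illustrative numbers, not figures from the maintainers): with a 16x16 patch embedding, an 800x1216 detection input yields 50x76 = 3800 tokens, so storing the attention maps alone already costs several GiB per forward pass.

```python
# Back-of-the-envelope memory for the N x N self-attention maps of a
# ViT-B/16-style backbone (12 heads, 12 layers). Illustrative only.
def attn_map_gib(h, w, patch=16, heads=12, layers=12, bytes_per_el=4):
    n = (h // patch) * (w // patch)  # patch tokens (ignoring the CLS token)
    return n * n * heads * layers * bytes_per_el / 2**30

# Detection-scale input (~800x1216, padded to a multiple of the patch size):
print(f"800x1216 input: {attn_map_gib(800, 1216):.1f} GiB of attention maps")
# Segmentation-scale crop such as 512x512 on ADE20K:
print(f"512x512 input:  {attn_map_gib(512, 512):.2f} GiB of attention maps")
```

Because the cost grows with the fourth power of the image side, the detection-scale input is roughly an order of magnitude more expensive than a typical segmentation crop, which matches the maintainers' reasoning above.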
How do we use your ViT-B-DenseCLIP model? Is there a simple demo, like the example code that CLIP provides?
Our code is built on top of mmseg, which itself provides many testing and visualization tools. For example, you can test our model with mmseg's standard inference utilities, as sketched below.
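As an illustration of what such a demo could look like, here is a minimal single-image sketch using mmseg's (0.x) Python inference API. The config and checkpoint paths are placeholders I made up, not files confirmed in this thread; substitute the ViT-B-DenseCLIP config and weights released in the repository.

```python
# Minimal single-image demo in the style of mmseg's inference API
# (mmsegmentation 0.x). Paths below are placeholders.
from mmseg.apis import init_segmentor, inference_segmentor, show_result_pyplot

config = 'configs/denseclip_fpn_vit-b_640x640_80k_ade20k.py'  # placeholder path
checkpoint = 'checkpoints/denseclip_vit-b_ade20k.pth'         # placeholder path

model = init_segmentor(config, checkpoint, device='cuda:0')  # build model, load weights
result = inference_segmentor(model, 'demo.jpg')              # list with one HxW label map
show_result_pyplot(model, 'demo.jpg', result)                # overlay prediction on image
```

Equivalently, mmseg's `tools/test.py` entry point can evaluate a whole dataset (e.g., with `--eval mIoU`) or dump visualizations with `--show-dir`.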
1. I would like to know the performance of ViT-B-DenseCLIP (vs. RN101-DenseCLIP); can you share its specific numbers, and how to train ViT-B-DenseCLIP on COCO or ADE20K?
2. Is ViT-B-DenseCLIP based on ViT-B-16.pt rather than ViT-B-32.pt?