
Some questions about ViT-B-DenseCLIP #6

Closed
lixiangMindSpore opened this issue Dec 21, 2021 · 6 comments

Comments

@lixiangMindSpore

1. I would like to know how ViT-B-DenseCLIP performs compared with RN101-DenseCLIP. Could you share its specific results, and how can I train ViT-B-DenseCLIP on COCO or ADE20K?
2. Is ViT-B-DenseCLIP based on ViT-B-16.pt rather than ViT-B-32.pt?

@raoyongming
Owner

Hi, thanks for your interest in our work.

Directly applying the ViT-B model to the detection task is difficult. Since the complexity of self-attention is O(H^2W^2), the large input images used in detection (e.g., 800x1200) lead to considerable GPU memory consumption. Therefore, we only tested the ViT-B-DenseCLIP model on the semantic segmentation task on ADE20K. The training config file and results are provided in the Segmentation section of the README.
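To make the memory argument concrete, here is a rough back-of-the-envelope sketch (not from the repo) of how much memory the self-attention maps alone require for a ViT-B/16 backbone; the 12-layer/12-head counts are the standard ViT-B values, fp16 storage is assumed, and the 512x512 crop is used only as a typical segmentation crop size for comparison.

```python
# Rough estimate of self-attention map memory for a ViT-B/16 backbone.
# Assumptions (not taken from the DenseCLIP repo): 12 layers, 12 heads,
# fp16 attention maps (2 bytes per element); activations, gradients and
# the rest of the network are ignored, so real usage is much higher.
def attention_map_memory_gb(height, width, patch=16, layers=12, heads=12, bytes_per_elem=2):
    tokens = (height // patch) * (width // patch)      # number of patch tokens
    per_layer = heads * tokens ** 2 * bytes_per_elem   # one tokens x tokens map per head
    return layers * per_layer / 1024 ** 3

print(f"512x512 crop (typical segmentation input): {attention_map_memory_gb(512, 512):.2f} GB")   # ~0.28 GB
print(f"800x1200 image (typical detection input):  {attention_map_memory_gb(800, 1200):.2f} GB")  # ~3.77 GB
```

Because the number of tokens grows linearly with H*W, the attention maps grow with (H*W)^2, which is why a detection-sized input is roughly an order of magnitude more expensive than a segmentation crop before counting anything else.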


@lixiangMindSpore
Author

lixiangMindSpore commented Dec 22, 2021


Hello, I have a few questions:
1. The README has training instructions for RN50-CLIP and RN101-CLIP, but no information on how to train ViT-B-DenseCLIP; the name only appears in the results table. How is ViT-B-DenseCLIP trained?
2. When I use the ViT-B-DenseCLIP model in the example program provided by CLIP, I get the error shown in Figure 1; the CLIP example code is shown in Figure 2. If I replace the path on line 6 of the code with ViT-B-32.pt, there is no problem.
[Figure 1: screenshot of the error message]
[Figure 2: screenshot of the CLIP example code]

@raoyongming
Owner

raoyongming commented Dec 22, 2021

  1. We provide config files for all the settings. To train a model, you only need to run bash dist_train.sh configs/<config>.py 8, where <config>.py is the config name listed in the table.
  2. Our model is built for the detection and segmentation tasks and contains several parts, including the backbone, the text encoder, and the decoder, so it cannot be used directly with CLIP's example code.

@lixiangMindSpore
Author


How do we use your ViT-B-DenseCLIP model? Is there a simple demo for it, like the example program provided by CLIP?

@raoyongming
Owner

Our code is built on mmseg, which itself provides many testing and visualization tools. For example, you can test our model with bash dist_test.sh configs/<config>.py /path/to/checkpoint 8 --eval mIoU --aug-test; visualization is also possible by adjusting the arguments. Please refer to the mmseg documentation for details.
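For a quick single-image demo in the spirit of CLIP's example script, something along the following lines should work through mmseg's Python API (mmsegmentation 0.x). This is only a hedged sketch, not code from the repo: the config, checkpoint, and image paths are placeholders, and the import denseclip line assumes the repo's segmentation package is on PYTHONPATH so that its custom backbones and heads are registered with mmseg.

```python
# Minimal inference sketch built on mmseg's high-level API (mmsegmentation 0.x).
# All paths are placeholders; `import denseclip` is an assumption about how the
# custom DenseCLIP modules get registered and may differ from the actual repo layout.
import denseclip  # noqa: F401  -- registers the DenseCLIP backbone/segmentor (assumption)
from mmseg.apis import init_segmentor, inference_segmentor

config_file = 'configs/<config>.py'          # same config name as in the results table
checkpoint_file = '/path/to/checkpoint.pth'  # downloaded ViT-B-DenseCLIP weights
img = 'demo.jpg'                             # any test image

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, img)     # list containing one H x W label map
model.show_result(img, result, out_file='demo_seg.png', opacity=0.5)
```

The same model and result objects can also be passed to mmseg's visualization helpers (e.g., show_result_pyplot) if an interactive display is preferred.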
