Some questions about ViT-B-DenseCLIP #6
Comments
Hi, thanks for your interest in our work. Directly applying the ViT-B model to the detection task is difficult: since the complexity of self-attention is O(H^2W^2), the large input images used in detection (e.g., 800x1200) lead to considerable GPU memory consumption. Therefore, we only tested the ViT-B-DenseCLIP model on the semantic segmentation task on ADE20K. The training config file and results are provided in the Segmentation section of the README.
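For a rough sense of the scale involved, here is a back-of-the-envelope sketch (my own illustrative numbers, not figures from the maintainers): with a 16x16 patch embedding, an 800x1216 detection input yields 50x76 = 3800 tokens, so storing the attention maps alone already costs several GiB per forward pass.

```python
# Back-of-the-envelope memory for the N x N self-attention maps of a
# ViT-B/16-style backbone (12 heads, 12 layers). Illustrative only.
def attn_map_gib(h, w, patch=16, heads=12, layers=12, bytes_per_el=4):
    n = (h // patch) * (w // patch)  # patch tokens (ignoring the CLS token)
    return n * n * heads * layers * bytes_per_el / 2**30

# Detection-scale input (~800x1216, padded to a multiple of the patch size):
print(f"800x1216 input: {attn_map_gib(800, 1216):.1f} GiB of attention maps")
# Segmentation-scale crop such as 512x512 on ADE20K:
print(f"512x512 input:  {attn_map_gib(512, 512):.2f} GiB of attention maps")
```

Because the cost grows with the fourth power of the image side, the detection-scale input is roughly an order of magnitude more expensive than a typical segmentation crop, which matches the maintainers' reasoning above.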
How do we use your ViT-B-DenseCLIP model? Is there a simple demo, like the example code that CLIP provides?
Our code is built on top of mmseg, which itself provides many testing and visualization tools. For example, you can test our model with mmseg's standard inference utilities, as sketched below.
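As an illustration of what such a demo could look like, here is a minimal single-image sketch using mmseg's (0.x) Python inference API. The config and checkpoint paths are placeholders I made up, not files confirmed in this thread; substitute the ViT-B-DenseCLIP config and weights released in the repository.

```python
# Minimal single-image demo in the style of mmseg's inference API
# (mmsegmentation 0.x). Paths below are placeholders.
from mmseg.apis import init_segmentor, inference_segmentor, show_result_pyplot

config = 'configs/denseclip_fpn_vit-b_640x640_80k_ade20k.py'  # placeholder path
checkpoint = 'checkpoints/denseclip_vit-b_ade20k.pth'         # placeholder path

model = init_segmentor(config, checkpoint, device='cuda:0')  # build model, load weights
result = inference_segmentor(model, 'demo.jpg')              # list with one HxW label map
show_result_pyplot(model, 'demo.jpg', result)                # overlay prediction on image
```

Equivalently, mmseg's `tools/test.py` entry point can evaluate a whole dataset (e.g., with `--eval mIoU`) or dump visualizations with `--show-dir`.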
1. I would like to know the performance of ViT-B-DenseCLIP (vs. RN101-DenseCLIP); can you share its specific numbers, and how to train ViT-B-DenseCLIP on COCO or ADE20K?
2. Is ViT-B-DenseCLIP based on ViT-B-16.pt rather than ViT-B-32.pt?