I have collected the download addresses for all the training data and posted them here for others to download conveniently. #46

Anymake · 2023-09-07T08:39:52Z

I am reproducing the model on V100 GPU. If anyone is doing the same, I hope we can communicate and exchange ideas together. My wechat : Anymake_ren
1、Flickr 30k ：
http://shannon.cs.illinois.edu/DenotationGraph/data/index.html

2、The Visual Genome Dataset
VG数据集主要由4个部分组成：
Region Description：图片被划分成一个个region，每个region都有与其对应的一句自然语言描述。
Region Graph：每个region中的object、attribute、relationship被提取出来，构成局部的“Scene Graph”。
Scene Graph：把一张图片中的所有Region Graph合并成一个全局的Scene Graph。
QA：每张图片会有多对QA，分为两种类型：region-based和freeform。前者基于Region Description提出，与局部region的内容直接相关；后者则基于整张图片来提出。
https://homes.cs.washington.edu/~ranjay/visualgenome/api.html

3、LLaVA-CC3M-Pretrain-595K
https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K/tree/main

4、LLaVA-Instruct-150K
图片是COCO2014
https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/tree/main

5、CLEVR：
该数据集为合成数据集，是由一些简单的几何形状构成的视觉场景。数据集中的问题总是需要一长串的推理过程，为了对推理能力进行详细评估，所有问题分为了5类：属性查询（querying attribute），属性比较（comparing attributes），存在性（existence），计数（counting），整数比较（integer comparison）。所有的问题都是程序生成的。该数据集的人为标注数据子集为CLEVR-Humans
https://cs.stanford.edu/people/jcjohns/clevr/

6、GQA
图片20G，
https://cs.stanford.edu/people/dorarad/gqa/download.html

7、Visual7W: Grounded Question Answering in Images
Visual7W 是一个图像内容理解的数据集，通过对图像区域的文字描述和互相之间的关联，进行视觉问答 (Visual Question Answering) 任务，数据集中不仅包含图像本身，还包括图像区域内容相关的问答。
Visual7W 是 Visual Genome 数据集的一个子集，包含 47,300 张 COCO 数据集图像，327,929 个问答对，1,311,756 个人类生成的多选题，以及涵盖 36,579 个类别的 561,459 个 object groundings。
Visual7W 的问题主要由 What, Where, How, When, Who,Why, 以及 Which 构成。问题为多选，每个问题都有四个候选答案。
http://ai.stanford.edu/~yukez/visual7w/

8、VCR：Visual Commonsense Reasoning
VCR 全称 Visual Commonsense Reasoning，是一个用于视觉常识推理的大规模数据集。该数据集提出了关于图像的具有挑战性的问题，机器需要完成两个子任务：正确回答问题以及提供理由证明其答案的合理性。
VCR 数据集包含大量问题，其中 212K 个用于训练，26K 个用于验证，25K 个用于测试。答案和理由来自超过 110K 个不重复的电影场景。
https://visualcommonsense.com/download/

9、VQAv2 dataset
https://visualqa.org/download.html

10、VQA-E
全称 Visual Question Answering with Explanation，是带有解析的视觉问答数据集，其涉及的模型需要预测并生成答案解析。它是由 VQA v2 数据集自动衍生出来的，为每个 “图像-问题-答案三要素” 合成为一个文本解析，这使得问答过程更容易理解和可追溯。
COCO Images: Training images [83K/13GB], Validation Images [41K/6GB]
https://github.com/liqing-ustc/VQA-E

11、VQA-X （2018）
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
VQA-X是一个既有文字解释又有Visual grounding的数据集, 图片是coco2014

GaoXiaoshan · 2023-11-07T07:52:55Z

补充一个 coco2014 国内下载地址，https://developer.aliyun.com/article/797577?accounttraceid=0c07a70a5c3b40df97d3692b1fb519d7ckem

GaoXiaoshan · 2023-11-07T08:31:00Z

Visual7W dataset。https://pan.baidu.com/s/1kVNUTrL 网盘密码：6wge

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

I have collected the download addresses for all the training data and posted them here for others to download conveniently. #46

I have collected the download addresses for all the training data and posted them here for others to download conveniently. #46

Anymake commented Sep 7, 2023 •

edited

Loading

GaoXiaoshan commented Nov 7, 2023

GaoXiaoshan commented Nov 7, 2023

I have collected the download addresses for all the training data and posted them here for others to download conveniently. #46

I have collected the download addresses for all the training data and posted them here for others to download conveniently. #46

Comments

Anymake commented Sep 7, 2023 • edited Loading

GaoXiaoshan commented Nov 7, 2023

GaoXiaoshan commented Nov 7, 2023

Anymake commented Sep 7, 2023 •

edited

Loading