OLIVE: Open-world Language Instruction for Visual-language Evaluation

This repository provides the download instructions for the OLIVE dataset, introduced in the NAACL'24 paper titled "What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases."

About OLIVE

The OLIVE dataset is a highly diverse, human-corrected multimodal collection designed to simulate the variety and idiosyncrasies of user queries vision-language models (VLMs) face in real-world scenarios. It supports the training and evaluation of VLMs in conditions that more closely resemble their ultimate use cases.

The dataset contains 9,450 images, 30,120 unique instructions, and 47,250 responses. The images are randomly sampled from LAION-Aesthetics. The instructions and responses are generated using ChatGPT and subsequently refined through human curation.

Each image is paired with five instruction-response pairs. Every pair has a unique response, although the same instruction may be reused across pairs. The instruction-response pairs fall into four broad categories: visual recognition, creative writing, knowledge-based, and elaborated description.

The dataset is split into 6,750 instruction-response pairs for training, 6,750 pairs for validation, and the remaining 33,750 pairs for testing. The table below breaks each split down by category.

Split	Visual Recognition	Creative Writing	Knowledge-Based	Elaborated Description	Total
Train	1905	1415	1750	1680	6750
Validation	1900	1455	1755	1640	6750
Test_v1.0	7285	5910	6805	7285	27000
Test_v2.0	9145	7220	8595	8790	33750

Note: Test_v1.0 is a subset of Test_v2.0
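
The per-category counts in the table can be recomputed from an annotation file by tallying the `category` field of each record. The snippet below is a minimal sketch that runs on in-memory sample records; only `visual_recognition` is a category label confirmed by this README, and the spelling `creative_writing` is an assumption made for illustration.

```python
from collections import Counter


def count_by_category(records):
    """Count instruction-response pairs per value of the `category` field."""
    return Counter(record["category"] for record in records)


# Tiny stand-in for a loaded annotation file (real files hold thousands of records).
sample = [
    {"category": "visual_recognition"},
    {"category": "visual_recognition"},
    {"category": "creative_writing"},  # assumed label spelling, not confirmed
]

counts = count_by_category(sample)
for category, n in sorted(counts.items()):
    print(f"{category}\t{n}")
print(f"Total\t{sum(counts.values())}")
```

Running the same function on a full split should give one row of the table, with the total equal to the number of records in the file.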

Download OLIVE

The OLIVE dataset can be downloaded via the following links:

Images: train, validation, test

Annotations: train, validation, test_v1.0, test_v2.0

The annotations are in the following format:

[
  {
    "image": "filename for the image, e.g. -1612194712933037756",
    "category": "category of the instruction-response pair, e.g. visual_recognition",
    "instruction": "task instruction related to the image, e.g. What is the item in the image?",
    "output": "response to the instruction, e.g. The item in the image is a solar sail, which is a device that is designed to harness the energy from sunlight to propel a spacecraft through space without the use of fuel. It is a square shaped piece of cloth that acts like a sail and captures the radiation pressure from the sun to propel the spacecraft forward.",
    "id": "composite index that uniquely identifies each instruction-response pair associated with a specific image, e.g. res_3_1486, where 3 is the id of the instruction-response pair and 1486 is the unique id for the image"
  }
]
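
As a rough illustration of how these records might be consumed, the sketch below loads an annotation file, splits the composite `id` into its two parts, and groups records by image. The function names `load_annotations`, `parse_id`, and `group_by_image` are hypothetical helpers, not part of any released OLIVE tooling, and the `res_<pair>_<image>` pattern is inferred from the single example above.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_annotations(path):
    """Read one annotation file, assumed to be a JSON list of records."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def parse_id(record_id):
    """Split an id such as 'res_3_1486' into (pair_id, image_id)."""
    prefix, pair_id, image_id = record_id.split("_")
    if prefix != "res":
        raise ValueError(f"unexpected id format: {record_id!r}")
    return int(pair_id), int(image_id)


def group_by_image(records):
    """Map each image filename to the list of records that refer to it."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["image"]].append(record)
    return dict(grouped)


# In-memory example mirroring the documented fields (text shortened).
example_records = [
    {
        "image": "-1612194712933037756",
        "category": "visual_recognition",
        "instruction": "What is the item in the image?",
        "output": "The item in the image is a solar sail.",
        "id": "res_3_1486",
    }
]

by_image = group_by_image(example_records)
pair_id, image_id = parse_id(example_records[0]["id"])
```

With a downloaded file, `group_by_image(load_annotations(path))` should yield five records per image, given that each image has five instruction-response pairs.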

Performance

The table below reports the zero-shot performance of several models on the test_v1.0 split, with CIDEr as the evaluation metric.

Model	Vision Encoder	Language Model	Size	CIDEr
BLIP-2	ViT-G	FlanT5-XL	4B	5.2
MiniGPT-4	ViT-G	Vicuna-7B	8B	1.6
mPLUG-Owl	ViT-L	LLaMA-7B	7B	4.4
LLaVA	ViT-L	LLaMA-7B	7B	29.6
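
For readers who want to score their own model outputs, the following is a minimal sketch of one way to compute CIDEr against the `output` field of the annotations. It assumes the third-party `pycocoevalcap` package and a hypothetical `predictions` dictionary that maps each record `id` to a generated response. The tokenization, preprocessing, and score scaling behind the numbers in the table are not specified in this README, so results from this sketch may not be directly comparable.

```python
def build_cider_inputs(annotations, predictions):
    """Arrange references and model outputs as id -> [text] dictionaries.

    `annotations` is a list of OLIVE records; `predictions` maps a record
    id to the model's generated response (a hypothetical structure).
    """
    gts, res = {}, {}
    for record in annotations:
        record_id = record["id"]
        if record_id not in predictions:
            continue  # skip records the model did not answer
        gts[record_id] = [record["output"]]      # one reference per pair
        res[record_id] = [predictions[record_id]]  # exactly one hypothesis
    return gts, res


def cider_score(gts, res):
    """Corpus-level CIDEr using pycocoevalcap (imported lazily)."""
    # Assumes `pip install pycocoevalcap`; not required to build the inputs.
    from pycocoevalcap.cider.cider import Cider

    score, _per_item_scores = Cider().compute_score(gts, res)
    return score


# Illustrative data only; not taken from the dataset.
demo_annotations = [
    {"id": "res_0_1", "output": "a red kite flying over a beach"},
    {"id": "res_1_1", "output": "the kite is red"},
]
demo_predictions = {"res_0_1": "a kite flying above the beach"}

demo_gts, demo_res = build_cider_inputs(demo_annotations, demo_predictions)
```

Calling `cider_score(demo_gts, demo_res)` then returns a single float. CIDEr relies on corpus-wide n-gram statistics, so scores from a handful of examples are not meaningful.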

Citation

If you find this work useful for your research, please consider citing it.

@inproceedings{tiong2024we,
  title={What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases},
  author={Tiong, Anthony Meng Huat and Zhao, Junqi and Li, Boyang and Li, Junnan and Hoi, Steven CH and Xiong, Caiming},
  booktitle={Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2024}
}
