Dataset and starting code for visual entailment dataset

SNLI-VE: Visual Entailment Dataset

Ning Xie, Farley Lai, Derek Doran, Asim Kadav

This is the SNLI-VE dataset we propose for the Visual Entailment (VE) task in the paper "Visual Entailment Task for Visually-Grounded Language Learning" (accepted by the NeurIPS 2018 ViGIL workshop).


SNLI-VE is a dataset for the VE task, built on top of SNLI and Flickr30k. The goal of VE is to reason about the relationship between an image premise Pimage and a text hypothesis Htext.

Specifically, given an image as the premise and a natural language sentence as the hypothesis, one of three labels (entailment, neutral or contradiction) is assigned based on the relationship conveyed by the pair (Pimage, Htext):

  • entailment holds if there is enough evidence in Pimage to conclude that Htext is true.
  • contradiction holds if there is enough evidence in Pimage to conclude that Htext is false.
  • Otherwise, the relationship is neutral, implying the evidence in Pimage is insufficient to draw a conclusion about Htext.
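
The three-way labelling above can be represented in code roughly as follows. This is only a minimal sketch: the `Label` enum, the `VEExample` class and its field names are illustrative assumptions rather than names taken from this repository, and the example instance is made up.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The three possible relationships between Pimage and Htext."""
    ENTAILMENT = "entailment"        # enough evidence that Htext is true
    NEUTRAL = "neutral"              # evidence is insufficient either way
    CONTRADICTION = "contradiction"  # enough evidence that Htext is false


@dataclass(frozen=True)
class VEExample:
    # Hypothetical field names, chosen for illustration only.
    image_id: str    # identifies the image premise Pimage
    hypothesis: str  # the text hypothesis Htext
    label: Label


# A made-up instance, not taken from the dataset.
example = VEExample("0000000001", "Two dogs are playing outside.", Label.NEUTRAL)
print(example.label.value)
```

Using an enum rather than raw strings means a misspelled label fails loudly when the data is loaded instead of silently creating a fourth class.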

This repository contains the following:

Examples from SNLI-VE

SNLI-VE Generator

The generator script automatically adds a Flickr30kID to each data item and splits the dataset into train, dev and test splits with no image overlap between them.

Generating the SNLI-VE dataset requires the SNLI dataset and the Flickr30k split files (flickr30k_train_val.lst and flickr30k_test.lst).

After downloading them and revising the paths in the generator script to match your settings, run the script to generate the dataset.
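
To make the idea concrete, here is a rough sketch of how such a generation step could work. It is not the repository's generator. It assumes that SNLI items are JSON lines whose `captionID` field looks like `3416050480.jpg#4` (the part before `.jpg` being the Flickr30k image ID), that the three image-ID sets have already been loaded from the split files, and that items without a gold label (`-` in SNLI) are skipped. The function names and the sample data are hypothetical.

```python
import json


def flickr30k_id(caption_id):
    """Extract the image ID from an SNLI captionID such as '3416050480.jpg#4'."""
    return caption_id.split(".jpg")[0]


def split_by_image(snli_lines, train_ids, dev_ids, test_ids):
    """Attach a Flickr30kID to each item and route it to a split by image."""
    # The image sets must be disjoint so that no image appears in two splits.
    assert not (train_ids & dev_ids or train_ids & test_ids or dev_ids & test_ids)
    id_sets = {"train": train_ids, "dev": dev_ids, "test": test_ids}
    splits = {name: [] for name in id_sets}
    for line in snli_lines:
        item = json.loads(line)
        if item["gold_label"] == "-":  # assumption: drop items without a gold label
            continue
        image_id = flickr30k_id(item["captionID"])
        item["Flickr30kID"] = image_id
        for name, ids in id_sets.items():
            if image_id in ids:
                splits[name].append(item)
                break  # items whose image is in no list are dropped
    return splits


# Made-up SNLI-style lines, for illustration only.
lines = [
    json.dumps({"captionID": "111.jpg#0", "sentence2": "A man sleeps.", "gold_label": "contradiction"}),
    json.dumps({"captionID": "222.jpg#1", "sentence2": "A dog runs.", "gold_label": "entailment"}),
    json.dumps({"captionID": "333.jpg#2", "sentence2": "A child is happy.", "gold_label": "neutral"}),
    json.dumps({"captionID": "333.jpg#2", "sentence2": "Someone exists.", "gold_label": "-"}),
]
splits = split_by_image(lines, train_ids={"111"}, dev_ids={"222"}, test_ids={"333"})
print({name: len(items) for name, items in splits.items()})
```

Splitting by image rather than by sentence pair is what guarantees that no image seen in training reappears in dev or test.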


SNLI-VE Statistics

Some highlighted dataset statistics are given below; details can be found in our paper.

Data Split Distribution

The details of the train, dev and test splits are shown below. Instances of the three labels (entailment, neutral and contradiction) are evenly distributed within each split.

| | Train | Dev | Test |
|---|---|---|---|
| #Image | 29783 | 1000 | 1000 |
| #Entailment | 176932 | 5959 | 5973 |
| #Neutral | 176045 | 5960 | 5964 |
| #Contradiction | 176550 | 5939 | 5964 |
| Vocabulary Size | 29550 | 6576 | 6592 |
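
As a quick sanity check on the "evenly distributed" claim, the label counts above can be summed and turned into percentages. The snippet below only restates the numbers from the split table; the variable names are illustrative.

```python
# Label counts copied from the split table above.
counts = {
    "train": {"entailment": 176932, "neutral": 176045, "contradiction": 176550},
    "dev": {"entailment": 5959, "neutral": 5960, "contradiction": 5939},
    "test": {"entailment": 5973, "neutral": 5964, "contradiction": 5964},
}

totals = {}
shares = {}
for split, by_label in counts.items():
    totals[split] = sum(by_label.values())
    # Percentage of each label within the split, rounded to two decimals.
    shares[split] = {
        label: round(100 * n / totals[split], 2) for label, n in by_label.items()
    }
    print(split, totals[split], shares[split])
```

The totals come out to 529527, 17858 and 17901 instances, matching the SNLI-VE partition sizes in the dataset comparison, and every label accounts for between 33% and 34% of its split.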

Dataset Comparison

Below is a dataset comparison summary among SNLI-VE, VQA-v2.0 and CLEVR datasets.

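| | SNLI-VE | VQA-v2.0 | CLEVR |
|---|---|---|---|
| Partition Size: Training | 529527 | 443757 | 699989 |
| Partition Size: Validation | 17858 | 214354 | 149991 |
| Partition Size: Testing | 17901 | 555187 | 149988 |
| Question Length: Mean | 7.4 | 6.1 | 18.4 |
| Question Length: Median | 7 | 6 | 17 |
| Question Length: Mode | 6 | 5 | 14 |
| Question Length: Max | 56 | 23 | 43 |
| Vocabulary Size | 32191 | 19174 | 87 |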

Question Length Distribution

For the SNLI-VE dataset, the question is the hypothesis. As shown in the figure, the question length distribution of SNLI-VE is heavy-tailed (median 7, maximum 56).
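
Length statistics of this kind (mean, median, mode, max) could be computed along the following lines. This is a sketch under the assumption that length is counted in whitespace-separated tokens, which may differ from the tokenization used for the reported numbers; the function name and the sample hypotheses are made up for illustration.

```python
from statistics import mean, median, mode


def length_stats(hypotheses):
    """Summarize hypothesis lengths, counted in whitespace-separated tokens."""
    lengths = [len(h.split()) for h in hypotheses]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "mode": mode(lengths),
        "max": max(lengths),
    }


# Made-up hypotheses, not taken from the dataset.
sample = [
    "A man sleeps.",
    "A dog runs.",
    "Two children are playing in a park.",
    "A woman reads.",
]
stats = length_stats(sample)
print(stats)
```

A maximum far above the median, as in the small sample here, is the kind of gap that indicates a heavy tail.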


We also provide a sample script to parse the SNLI-VE dataset. Please feel free to adapt it to your own settings.
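
For orientation, a parser for the generated data might look roughly like the sketch below. It is not the sample script from this repository. It assumes the generated files are JSON lines in which each item carries the `Flickr30kID` added by the generator alongside SNLI's `sentence2` (hypothesis) and `gold_label` fields; the function name and the inline sample data are hypothetical.

```python
import io
import json
from collections import Counter


def parse_snli_ve(stream):
    """Yield (image_id, hypothesis, label) triples from a JSON-lines stream."""
    for line in stream:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        item = json.loads(line)
        yield item["Flickr30kID"], item["sentence2"], item["gold_label"]


# An in-memory stand-in for a generated split file (made-up content).
data = io.StringIO(
    '{"Flickr30kID": "111", "sentence2": "A man sleeps.", "gold_label": "contradiction"}\n'
    '{"Flickr30kID": "111", "sentence2": "A person is indoors.", "gold_label": "neutral"}\n'
    '\n'
    '{"Flickr30kID": "222", "sentence2": "A dog runs.", "gold_label": "entailment"}\n'
)
records = list(parse_snli_ve(data))
label_counts = Counter(label for _, _, label in records)
print(len(records), dict(label_counts))
```

Because the function accepts any iterable of lines, the same code works on a file opened with `open(path, encoding="utf-8")`.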

SNLI-VE Extensions

The Flickr30k Entities dataset is an extension of Flickr30k that contains detailed annotations.

It is easy to extend our SNLI-VE dataset with Flickr30k Entities if your experiments require fine-grained annotations.


  title={Visual Entailment Task for Visually-Grounded Language Learning},
  author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
  journal={arXiv preprint arXiv:1811.10582},

Thank you for your interest in our dataset! Please contact me with any questions, comments, or suggestions! :-)