
SNLI-VE: Visual Entailment Dataset

Ning Xie, Farley Lai, Derek Doran, Asim Kadav

NEC Laboratories America

SNLI-VE is the dataset proposed for the Visual Entailment (VE) task investigated in Visual Entailment Task for Visually-Grounded Language Learning (accepted to the NeurIPS 2018 ViGIL workshop). Refer to our full paper for detailed analysis and evaluations.




  • The Flickr images download is updated and now hosted by AllenNLP
  • The Flickr features download link is updated, but the archive may require a newer version of unzip to decompress on Linux


  • The data remains hosted by external parties and is subject to change


Check out the leaderboard on Papers with Code


e-SNLI-VE-2.0 relabels the neutral class in the dev and test splits and evaluates the resulting performance under three configurations, in order: original, val-correction and val/test-correction.


SNLI-VE is built on top of SNLI and Flickr30K. The VE task is to reason about the relationship between an image premise P_image and a text hypothesis H_text.

Specifically, given an image as the premise and a natural language sentence as the hypothesis, one of three labels (entailment, neutral or contradiction) is assigned based on the relationship conveyed by the pair (P_image, H_text):

  • entailment holds if there is enough evidence in P_image to conclude that H_text is true.
  • contradiction holds if there is enough evidence in P_image to conclude that H_text is false.
  • Otherwise, the relationship is neutral, implying the evidence in P_image is insufficient to draw a conclusion about H_text.
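
The three-way decision above can be written down as a small sketch. The function name `ve_label` and its two boolean inputs are hypothetical stand-ins for a judgment about the image evidence; they are not part of the dataset or the repository code.

```python
from enum import Enum


class VELabel(Enum):
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


def ve_label(supports_true: bool, supports_false: bool) -> VELabel:
    """Map two hypothetical evidence judgments to a VE label.

    supports_true:  the image holds enough evidence that the hypothesis is true.
    supports_false: the image holds enough evidence that the hypothesis is false.
    """
    if supports_true and supports_false:
        raise ValueError("evidence cannot support both truth and falsity")
    if supports_true:
        return VELabel.ENTAILMENT
    if supports_false:
        return VELabel.CONTRADICTION
    # Neither conclusion is supported: the evidence is insufficient.
    return VELabel.NEUTRAL
```

Neutral is the fallback case here, which mirrors the definition: it applies whenever the image does not settle the hypothesis either way.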

Examples from SNLI-VE


SNLI-VE Statistics

Below are some highlighted dataset statistics; details can be found in our paper.

Distribution by Split

Details of the train, dev and test splits are shown below. The instances of the three labels (entailment, neutral and contradiction) are roughly evenly distributed within each split.

| | Train | Dev | Test |
|---|---|---|---|
| #Image | 29783 | 1000 | 1000 |
| #Entailment | 176932 | 5959 | 5973 |
| #Neutral | 176045 | 5960 | 5964 |
| #Contradiction | 176550 | 5939 | 5964 |
| Vocabulary Size | 29550 | 6576 | 6592 |
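
As a quick check of the balance claim, the label counts from the table can be turned into per-split shares. This is a minimal sketch that uses only the numbers listed above; the helper name `label_shares` is ours.

```python
# Label counts per split, copied from the table above.
counts = {
    "train": {"entailment": 176932, "neutral": 176045, "contradiction": 176550},
    "dev": {"entailment": 5959, "neutral": 5960, "contradiction": 5939},
    "test": {"entailment": 5973, "neutral": 5964, "contradiction": 5964},
}


def label_shares(split_counts):
    """Return each label's fraction of the pairs in one split."""
    total = sum(split_counts.values())
    return {label: n / total for label, n in split_counts.items()}


for split, split_counts in counts.items():
    shares = {label: round(s, 4) for label, s in label_shares(split_counts).items()}
    print(split, sum(split_counts.values()), shares)
```

The per-split totals come out to 529527, 17858 and 17901 pairs, which agree with the SNLI-VE partition sizes in the dataset comparison, and every label share stays within half a percentage point of one third.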

Dataset Comparison

Below is a dataset comparison among SNLI-VE, VQA-v2.0 and CLEVR.

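| | SNLI-VE | VQA-v2.0 | CLEVR |
|---|---|---|---|
| Partition Size: | | | |
| Training | 529527 | 443757 | 699989 |
| Validation | 17858 | 214354 | 149991 |
| Test | 17901 | 555187 | 149988 |
| Question Length: | | | |
| Mean | 7.4 | 6.1 | 18.4 |
| Median | 7 | 6 | 17 |
| Mode | 6 | 5 | 14 |
| Max | 56 | 23 | 43 |
| Vocabulary Size | 32191 | 19174 | 87 |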

Question Length Distribution

For the SNLI-VE dataset, the "question" is the hypothesis. As shown in the figure, the distribution of question lengths in SNLI-VE has a long tail.

Question length distribution
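
The length statistics reported in the comparison (mean, median, mode and max) can be computed from a list of hypotheses along the following lines. This is a sketch under assumptions: it counts whitespace-separated tokens, which may differ from the tokenization used for the reported numbers, and the example sentences are made up for illustration rather than taken from the dataset.

```python
from collections import Counter
from statistics import mean, median


def length_stats(hypotheses):
    """Summarize hypothesis lengths in whitespace-separated tokens."""
    lengths = [len(h.split()) for h in hypotheses]
    # most_common(1) returns the most frequent length and its count.
    mode_length, _ = Counter(lengths).most_common(1)[0]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "mode": mode_length,
        "max": max(lengths),
    }


# Illustrative sentences only (lengths 4, 4 and 7 tokens).
example_hypotheses = [
    "A man is sleeping.",
    "Two dogs run outside.",
    "A woman is outdoors near a tree.",
]
stats = length_stats(example_hypotheses)
print(stats)
```

A long tail shows up in such a summary as a maximum far above the mean and median, as in the SNLI-VE column (max 56 against a mean of 7.4).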


To check the quality of the SNLI-VE dataset, we randomly sampled 217 pairs from all three splits (565286 pairs in total). Of the sampled pairs, 20 (about 9.2%) are incorrectly labeled, most of them in the neutral class. This is consistent with the analysis reported by GTE in its Table 2.
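
The arithmetic behind the 9.2% figure is shown below, together with a rough sense of how much a sample of 217 pairs can pin down the true error rate. The Wilson score interval is our addition for illustration and is not part of the original analysis; `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


sampled, mislabeled = 217, 20
rate = mislabeled / sampled            # observed error rate in the sample
low, high = wilson_interval(mislabeled, sampled)
print(f"observed error rate: {rate:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval spans roughly 6% to 14%, so the sample supports "around one in ten" rather than a precise rate.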

It is worth noting that the original SNLI dataset is not perfectly labeled either: 8.8% of the sampled data were not assigned a gold label, which implies disagreement among human labelers. SNLI-VE is no exception, but we believe this is common in other large-scale datasets. However, if dataset quality is a major concern to you, we suggest dropping the neutral class and using only entailment and contradiction examples.
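
Dropping the neutral class can be done with a small filter over JSON Lines entries. The sketch below assumes each entry stores its label in a `gold_label` field, as in the original SNLI files; check the generated SNLI-VE files for the actual field name. The sample entries are made up for illustration.

```python
import json


def drop_neutral(lines, label_field="gold_label"):
    """Yield only entailment and contradiction entries from JSON Lines text.

    label_field is an assumption borrowed from the SNLI format.
    """
    keep = {"entailment", "contradiction"}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get(label_field) in keep:
            yield entry


# Illustrative entries only, not taken from the dataset.
sample_lines = [
    '{"gold_label": "entailment", "sentence2": "A person is outdoors."}',
    '{"gold_label": "neutral", "sentence2": "A person waits for a bus."}',
    '{"gold_label": "contradiction", "sentence2": "Nobody is outside."}',
]
filtered = list(drop_neutral(sample_lines))
print([entry["gold_label"] for entry in filtered])
```

With real data, pass an open file handle instead of `sample_lines`, since iterating over a text file yields one line at a time.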

SNLI-VE Creation generates the SNLI-VE dataset in train, dev and test splits with disjoint image sets. Each entry contains a Flickr30kID field that associates it with the original Flickr30K image id. The accompanying parser parses entries in SNLI-VE for applications and may be revised freely.
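
One way to confirm that the splits use disjoint image sets is to collect the image ids per split and intersect them. This is a sketch, not the repository's parser: it assumes JSON Lines input and the `Flickr30kID` field name as written above, and the entries below are made up for illustration.

```python
import json


def image_ids(lines, id_field="Flickr30kID"):
    """Collect the set of image ids referenced by JSON Lines entries."""
    return {json.loads(line)[id_field] for line in lines if line.strip()}


def overlapping_images(ids_by_split):
    """Return image ids shared by any pair of splits (empty dict if disjoint)."""
    names = sorted(ids_by_split)
    overlaps = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = ids_by_split[first] & ids_by_split[second]
            if shared:
                overlaps[(first, second)] = shared
    return overlaps


# Illustrative entries only; real ids come from the generated files.
toy_splits = {
    "train": ['{"Flickr30kID": "1001"}', '{"Flickr30kID": "1002"}'],
    "dev": ['{"Flickr30kID": "2001"}'],
    "test": ['{"Flickr30kID": "3001"}'],
}
ids_by_split = {name: image_ids(lines) for name, lines in toy_splits.items()}
print(overlapping_images(ids_by_split))
```

An empty result means no image appears in more than one split.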

Follow the instructions below to set up the environment and generate SNLI-VE:

  1. Set up the conda environment and dependencies

    conda create -n vet37 python=3.7
    conda activate vet37
    conda install jsonlines
    # conda install -c NECLA-ML ml
  2. Clone the repo

    git clone
  3. Generate SNLI-VE in data/

    cd SNLI-VE
    python -m
  4. Download dependent datasets: Flickr30K, Entities, SNLI, and RoI features

    cd data
    ./download # y to all if necessary

SNLI-VE Extensions

The Flickr30k Entities dataset is an extension to Flickr30k that contains grounded RoI and entity annotations.

It is easy to extend our SNLI-VE dataset with Flickr30k Entities if fine-grained annotations are required in your experiments.


The first is our full paper while the second is the ViGIL workshop version.

  title={Visual Entailment: A Novel Task for Fine-grained Image Understanding},
  author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
  journal={arXiv preprint arXiv:1901.06706},

  title={Visual Entailment Task for Visually-Grounded Language Learning},
  author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
  journal={arXiv preprint arXiv:1811.10582},

Thank you for your interest in our dataset!
Please contact us for any questions, comments, or suggestions!

