Datasets

This folder contains example datasets that can be used for training and evaluating sentence embedding methods.

To download these datasets, run:

python get_data.py

It will download the datasets and unzip them into this directory.
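The download step amounts to fetching an archive and extracting it in place. Below is a minimal sketch of that pattern using only the standard library. The function name `download_and_unzip` is hypothetical, and the `url` argument is a placeholder: the real archive locations are defined in get_data.py and are not reproduced here.

```python
import os
import urllib.request
import zipfile


def download_and_unzip(url: str, target_dir: str) -> list[str]:
    """Hypothetical helper: fetch a zip archive from `url` and extract it.

    `url` is a placeholder; the actual dataset URLs live in get_data.py.
    Returns the names of the archive members that were extracted.
    """
    os.makedirs(target_dir, exist_ok=True)
    zip_path = os.path.join(target_dir, os.path.basename(url))
    # Skip the download if the archive is already present locally.
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    # Extract every member of the archive into target_dir.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

Checking for an existing archive first makes repeated runs cheap; whether the real script behaves this way is not stated in this README.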

AllNLI Dataset

The AllNLI dataset is the concatenation of the SNLI dataset (https://nlp.stanford.edu/projects/snli/) and the MultiNLI dataset (https://www.nyu.edu/projects/bowman/multinli/).
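Conceptually, building AllNLI means appending the MultiNLI sentence pairs to the SNLI pairs. The sketch below assumes each corpus has already been loaded as a list of (premise, hypothesis, label) triples; the `NLIExample` type, the `source` tag, and the filtering of pairs without one of the three standard NLI labels (raw SNLI uses "-" when annotators reached no consensus) are illustrative assumptions, not a description of how the distributed file was produced.

```python
from dataclasses import dataclass

# The three labels shared by SNLI and MultiNLI.
LABELS = {"contradiction", "entailment", "neutral"}


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str
    source: str  # hypothetical tag recording the origin corpus


def concat_nli(snli: list[tuple[str, str, str]],
               mnli: list[tuple[str, str, str]]) -> list[NLIExample]:
    """Concatenate SNLI and MultiNLI pairs into one AllNLI-style list.

    Assumption: pairs whose label is not in LABELS (e.g. SNLI's "-")
    are dropped.
    """
    combined = []
    for source, pairs in (("snli", snli), ("mnli", mnli)):
        for premise, hypothesis, label in pairs:
            if label in LABELS:
                combined.append(NLIExample(premise, hypothesis, label, source))
    return combined
```

Keeping a per-example source tag makes it easy to split the combined data back into its parts later, for example to evaluate on each corpus separately.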

STS Benchmark

The STS Benchmark (http://ixa2.si.ehu.eus/stswiki) contains sentence pairs annotated with human gold scores (from 0 to 5) rating their semantic similarity.