This repository contains the Birds-to-Words dataset, a collection of paragraph-length descriptions of the differences between pairs of iNaturalist bird photographs.
The Birds-to-Words dataset was introduced in the paper:
Please see the Neural Naturalist project page for an overview of the research project and publication.
The data is provided in the file `birds-to-words-v1.0.tsv` in this repository.
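As a starting point, here is a minimal sketch of loading the file with Python's standard `csv` module. It assumes the first row of the file is a header and that fields are not wrapped in quote characters; neither is confirmed here, so adjust the arguments if the file differs. The helper name `read_tsv` is hypothetical.

```python
import csv
from pathlib import Path


def read_tsv(path, has_header=True):
    """Read a tab-separated file and return (header, rows).

    `has_header` is an assumption about the file layout: when True, the
    first row is returned separately as the header; otherwise the header
    is None and every row is treated as data.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        # Assumption: fields are not quoted, so QUOTE_NONE keeps any quote
        # characters inside the free-text descriptions as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = list(reader)
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows
```

With the file in the working directory, `header, rows = read_tsv("birds-to-words-v1.0.tsv")` would return the header fields and one list of strings per annotation.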
| Animal 1 | Animal 2 |
|---|---|
| photo: John Ratzlaff (CC BY-NC-ND 4.0) | photo: Jessica (CC BY-NC 4.0) |
Comparative Descriptions (four different writers):
Animal 1 is brown and white with a squatty body with a light brown head. Animal 2 is multi-colored with a light blue and black head.
Animal 1 has a brown head and wings, with a pale breast. The breast also has darker brown speckles on it. Animal 2 has a bright blue area around its eye, with a black patch right along the eye. Animal 2 also has a darker brown breast and greenish wings and back of its head.
Animal 1 has a brown and white face, animal 2 has a black and bright blue face. Animal 1 has a white breast with black spots, animal 2 has a brown breast. Animal 1 has brown wings, animal 2 has green wings.
Animal 1 is much smaller and shorter. Animal 2 has a larger head and longer tail feathers. Animal 1 has extensive spotting on the neck, chest, and belly. Animal 2 has turquoise head patches and brown coloring on the chest and belly.
The TSV file is tab-separated and contains the following eleven columns:
| Column | Type | Description |
|---|---|---|
| | string | URL of the iNaturalist photo record (including metadata) corresponding to the left image in the pair |
| | string | URL of the left image itself |
| | string | Scientific species name for the animal in the left image |
| | string | How the left image was selected in the "pivot-branch" stratified sampling procedure described in the paper. Value is one of: |
| | string | URL of the iNaturalist photo record (including metadata) corresponding to the right image in the pair |
| | string | URL of the right image itself |
| | string | Scientific species name for the animal in the right image |
| | string | How the right image was selected in the "pivot-branch" stratified sampling procedure described in the paper. Value is one of: |
| | string | Split for training models and reporting results. One of: |
| | int | We collect up to five annotations of each image pair. This is the annotation number of this instance. Value is one of: |
| | string | A natural language paragraph describing the differences between the animals in the two photographs |
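Because each image pair can carry up to five annotations, it is often convenient to collect all descriptions written for the same pair. The sketch below groups rows by their two image URLs. It assumes the columns appear in the file in the order listed above, so that the left image URL is at zero-based index 1, the right image URL at index 5, and the description at index 10. These positions are an assumption rather than something confirmed by the file, and `group_descriptions` is a hypothetical helper. `rows` is any iterable of 11-field lists, for example as produced by `csv.reader` with `delimiter="\t"`.

```python
from collections import defaultdict

# Assumed zero-based column positions, taken from the order in which the
# columns are listed above. Verify against the actual file before relying on them.
LEFT_IMAGE_URL = 1
RIGHT_IMAGE_URL = 5
DESCRIPTION = 10
EXPECTED_COLUMNS = 11


def group_descriptions(rows):
    """Map (left image URL, right image URL) to its list of descriptions."""
    grouped = defaultdict(list)
    for row_number, row in enumerate(rows, start=1):
        # Fail early if a row does not have the eleven documented fields.
        if len(row) != EXPECTED_COLUMNS:
            raise ValueError(
                f"row {row_number} has {len(row)} columns, "
                f"expected {EXPECTED_COLUMNS}"
            )
        pair = (row[LEFT_IMAGE_URL], row[RIGHT_IMAGE_URL])
        grouped[pair].append(row[DESCRIPTION])
    return dict(grouped)
```

Keying on the image URLs rather than the photo record URLs is a design choice for this sketch; either pair of columns should identify an image pair if the assumed ordering holds.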
The Birds-to-Words dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International License. For the full license, see