RankME: Reliable Human Ratings for NLG

Authors: Jekaterina Novikova, Ondřej Dušek and Verena Rieser

This repository contains the dataset and code released with our NAACL 2018 paper "RankME: Reliable Human Ratings for Natural Language Generation".

Contents

crowdflower:

This folder contains the instructions and the CML, CSS and JS code used in our CrowdFlower tasks.

data:

This folder contains data files with human evaluation ratings collected via CrowdFlower.
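For a quick look at the ratings, the data files can be loaded with standard CSV tooling; a minimal Python sketch is shown below. The file name used here is a hypothetical placeholder, so check the data folder for the actual file names and column labels.

import pandas as pd

# Hypothetical file name -- see the data folder for the actual CSV files
# and their column labels.
ratings = pd.read_csv("data/setup_1_rankme.csv")

# Summary statistics of the collected human evaluation ratings.
print(ratings.describe())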

Description

Setup 1 corresponds to the experimental setup in which the human evaluation ratings of informativeness, naturalness and quality are collected together. As in the paper, the folder crowdflower/setup_1 contains three code versions of Setup 1: Likert, PlainME and RankME. Screenshots of the corresponding CrowdFlower tasks are shown in Fig.1:


Fig.1. Screenshots of the three methods used with Setup 1 to collect human evaluation data. Left to right: Likert, PlainME and RankME.
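For orientation, here is a minimal sketch of what a Likert-style rating question can look like in CML (the CrowdFlower Markup Language); the question text and the 5-point scale are illustrative assumptions, and the authoritative task code is in crowdflower/setup_1.

<!-- Minimal Likert-style rating question in CML. The label text and the
     5-point scale are assumptions for illustration, not the exact task wording. -->
<cml:radios label="How natural is the system utterance?" validates="required">
  <cml:radio label="1 (very unnatural)" value="1"/>
  <cml:radio label="2" value="2"/>
  <cml:radio label="3" value="3"/>
  <cml:radio label="4" value="4"/>
  <cml:radio label="5 (very natural)" value="5"/>
</cml:radios>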

Setup 2 corresponds to the experimental setup in which the human evaluation ratings of informativeness, naturalness and quality are collected separately. The folder crowdflower/setup_2 provides CrowdFlower code for the three collection methods (Likert, PlainME and RankME) for each rated criterion (informativeness, naturalness and quality). Screenshots of the RankME method in Setup 2 for informativeness, naturalness and quality are shown in Fig.2:


Fig.2. Screenshots of the RankME method in Setup 2 used to collect human evaluation data. Left to right: informativeness, naturalness, quality.

Citing

If you use this code or data in your work, please cite the following paper:

@inproceedings{novikova2018rankME,
  title={Rank{ME}: Reliable Human Ratings for Natural Language Generation},
  author={Novikova, Jekaterina and Du{\v{s}}ek, Ond{\v{r}}ej and Rieser, Verena},
  booktitle={Proceedings of the 2018 Conference of the North American Chapter
             of the Association for Computational Linguistics: Human Language
             Technologies, Volume 2 (Short Papers)},
  address={New Orleans, Louisiana},
  pages={72--78},
  year={2018},
  url={http://aclweb.org/anthology/N18-2012},
}

License

Distributed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).

Acknowledgements

This research received funding from the EPSRC projects DILiGENt (EP/M005429/1) and MaDrIgAL (EP/N017536/1).
