
EMNLP 2017 submission

This repository contains the dataset and statistical analysis code released with the EMNLP 2017 paper "Why We Need New Evaluation Metrics for NLG".

File descriptions:

  • emnlp_data_individual_hum_scores.csv - the dataset with system outputs and the evaluation ratings of three crowd-workers for each output
  • emnlp_data_medians.csv - the dataset with system outputs, the original human references, automatic metric scores, and the medians of the human ratings
  • analysis_emnlp.R - R code with the statistical analysis discussed in the paper (see the loading sketch below)
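
To take a quick look at the data before running the full analysis, the CSV files can be loaded directly in R. The snippet below is a minimal sketch: it assumes only the file names listed above, and the column names used in the correlation example ("bleu", "informativeness") are hypothetical placeholders rather than the actual headers.

```r
# Minimal sketch for loading the released data in R.
# Only the file names from this repository are assumed.
individual <- read.csv("emnlp_data_individual_hum_scores.csv")
medians    <- read.csv("emnlp_data_medians.csv")

# Inspect the structure to see the actual column names before
# reproducing the analysis in analysis_emnlp.R.
str(individual)
str(medians)

# Example: a Spearman rank correlation between an automatic metric score
# and a median human rating. "bleu" and "informativeness" are hypothetical
# column names; substitute the columns actually present in the CSV.
cor.test(medians$bleu, medians$informativeness, method = "spearman")
```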

Citing the paper:

Jekaterina Novikova, Ondrej Dusek, Amanda Cercas-Curry and Verena Rieser (2017): Why We Need New Evaluation Metrics for NLG. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark.