Data and analyses for our ACL 2021 work "The statistical advantage of automatic NLG metrics at the system level".
- WMT16-19 metrics shared task, pickled dataframe
- SummEval, raw + scored (Thank you Alex Fabbri!)
- SummEval, pickled dataframe
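
The pickled dataframes above can be opened with pandas. Below is a minimal sketch, not the code the notebooks use: the helper `load_pickled_frame`, the file name `toy.pkl`, and the column names are all hypothetical, and a toy frame stands in for the real data so the snippet runs on its own. Since unpickling can execute arbitrary code, only load pickle files you trust.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_pickled_frame(path):
    """Load a pickled pandas DataFrame and fail loudly on any other object."""
    obj = pd.read_pickle(path)
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(obj).__name__}")
    return obj


# Stand-in data: the real files' names and columns are not documented here,
# so this toy frame and its column names are purely hypothetical.
toy = pd.DataFrame({"system": ["A", "B"], "metric_score": [0.31, 0.42]})

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy.pkl"
    toy.to_pickle(toy_path)                # write the toy frame to disk
    loaded = load_pickled_frame(toy_path)  # read it back as a DataFrame

print(loaded.shape)
print(list(loaded.columns))
```

To inspect one of the real files, pass its path to `load_pickled_frame` and start with `loaded.head()` and `loaded.columns`.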
In the main text:
- The notebooks to reproduce Table 1 are `bvnd_wmt.ipynb` (WMT, left) and `bvnd_summeval.ipynb` (SummEval, right).
- The notebook to reproduce Figure 2 is `human_comparison_wmt.ipynb` (the same analysis for SummEval is in the appendix).
- The notebook to reproduce Table 2 is `power_analysis_wmt.ipynb` (the same analysis for SummEval is in the appendix).
In the appendix:
- The notebook to reproduce Table 3 is `variance_analysis_wmt.ipynb`.
- The notebook to reproduce Table 4 is `variance_analysis_summeval.ipynb`.
- The notebook to reproduce Figure 5 is `human_comparison_summeval.ipynb`.
- The notebook to reproduce Table 5 is `power_analysis_summeval.ipynb`.
If you have any questions, please email us at jtwei@usc.edu and robinjia@usc.edu!