This is the analysis code, data extraction code, and raw result data for the Edinburgh Napier University lab in the 2024 ReproNLP Challenge.
This study reproduces Data-to-text Generation with Entity Modeling (Puduppully et al., ACL 2019).
We completed the Human Evaluation Datasheet (HEDS), which can be found in napiernlp-reprohum2024-datasheet.json
or in the ReproHum2024 Central HEDs Repo.
The analysis folder consists of:
Chart.py and Chart-original.py
- Used to generate the graphs in the paper publishing our results.
analysis-withdecrem.py
- This is the analysis code used to calculate the relative preference percentage for each system.
full_data.csv
- This is a copy of the raw data for use with analysis-withdecrem.py.
type1_analysis_cv.py
- This is the code used to calculate the coefficient of variation (CV) adjusted for small samples. See paper here and code here.
type2_analysis.py
- This is the code used to calculate Pearson's r and Spearman's rho.
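The statistics named above can be sketched with the standard library alone. This is an illustrative sketch, not the contents of type1_analysis_cv.py or type2_analysis.py: it assumes the small-sample correction CV* = (1 + 1/(4n)) * CV with the sample standard deviation, and all function names and example values are hypothetical.

```python
import math
import statistics


def cv_small_sample(values):
    """Coefficient of variation in percent, corrected for small samples.

    Assumption: CV* = (1 + 1/(4n)) * (s / |mean|) * 100, where s is the
    sample standard deviation (n - 1 in the denominator).
    """
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return (1 + 1 / (4 * n)) * (s / abs(mean)) * 100


def pearson_r(x, y):
    """Pearson's r between two equally long sequences of numbers."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as Pearson's r on the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical scores for the same systems in two studies.
original = [10.0, 20.0, 35.0, 50.0]
reproduction = [12.0, 18.0, 40.0, 47.0]
print(cv_small_sample([original[0], reproduction[0]]))
print(pearson_r(original, reproduction))
print(spearman_rho(original, reproduction))
```

The CV is computed per system over the original and reproduction scores, while the two correlation coefficients compare the full sets of scores across studies.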
The data extraction folder consists of:
database.db
- This is the raw database file extracted from the webapp we used to host the reproduction.
author_data.csv
- This is the data provided to us by the original authors, consisting of the task and raw model outputs.
fromdatabase.py
- This script takes the database.db file and creates combined_results.csv, which contains the results and task data from the database.
full_generate_2.py
- This script merges the combined results with the author-provided data (into full_data.csv), which allows us to query which system the participant selected in the results.
attentioncheck.py
- This script checks whether any tasks in the full_data.csv file fail the attention check.
remove_attention_fail.py
- This script uses the output of attentioncheck.py to tell the researcher which IDs should be removed from database.db and prevented from taking part in further reruns on Prolific.
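The extraction steps above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the actual scripts: the table name `results`, every column name, the layout of the author data, and the way an attention check is marked are all hypothetical, since the real database.db schema is not described here.

```python
import sqlite3

# Hypothetical stand-in for database.db (the real schema may differ).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (participant_id TEXT, task_id INTEGER, choice TEXT)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("p1", 1, "A"), ("p1", 2, "B"), ("p2", 1, "B"), ("p2", 2, "A")],
)

# Hypothetical stand-in for author_data.csv: which system produced each
# option, plus the expected answer when the task is an attention check.
author_data = {
    1: {"A": "system_x", "B": "system_y", "attention_answer": None},
    2: {"A": "system_y", "B": "system_x", "attention_answer": "B"},
}


def merge_results(conn, author_data):
    """Join each database row with the author data for its task."""
    rows = conn.execute(
        "SELECT participant_id, task_id, choice FROM results "
        "ORDER BY participant_id, task_id"
    ).fetchall()
    merged = []
    for participant_id, task_id, choice in rows:
        task = author_data[task_id]
        expected = task["attention_answer"]
        merged.append(
            {
                "participant_id": participant_id,
                "task_id": task_id,
                "selected_system": task[choice],
                # Only attention-check tasks can fail the check.
                "attention_failed": expected is not None and choice != expected,
            }
        )
    return merged


def failed_participants(merged):
    """Sorted IDs of participants with at least one failed attention check."""
    return sorted({r["participant_id"] for r in merged if r["attention_failed"]})


merged = merge_results(conn, author_data)
print(failed_participants(merged))
```

Keeping the merge and the attention check as separate functions mirrors the split between the scripts listed above, so the list of failing IDs can be regenerated after each rerun.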
We also provide study metadata.txt, which contains additional metadata about the study.
Additionally, we provide the original and reproduction result charts as PNG files.
Finally, the results data can be found in Data Extraction/full_data.csv.
Identifiable information has been replaced with 'XXX'.