MONSERRATE Corpus

MONSERRATE is a dataset created specifically for the automatic evaluation of Question Generation systems. It has, on average, 26 questions associated with each source sentence, in an attempt to be an "exhaustive" reference.

But why?

Despite the growing interest in Question Generation, evaluating these systems remains notably difficult. Many authors rely on automatic metrics like BLEU or ROUGE instead of manual evaluation, as these metrics are nearly free to compute. However, the corpora generally used as references are very incomplete, containing just a couple of hypotheses per source sentence. For example, the most widely used large datasets, SQuAD and MS MARCO, have at most a single reference question per source sentence.

Dataset

In corpus you can find the full dataset, broken down into the following files:

Source sentences (73): sourceSentences.txt;
Full reference (over 1900 questions): fullReference.txt;
Reference sentences and questions aligned in separate files: referenceSentences.txt, referenceQuestions.txt.
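
As a minimal sketch of how the aligned files might be consumed, the snippet below groups questions by source sentence. It assumes that referenceSentences.txt and referenceQuestions.txt are UTF-8, hold one entry per line, and that line i of one file corresponds to line i of the other; the function names `group_questions` and `load_references` are hypothetical and not part of this repository.

```python
from collections import defaultdict
from pathlib import Path


def group_questions(sentences, questions):
    """Map each source sentence to the list of its reference questions."""
    if len(sentences) != len(questions):
        raise ValueError("aligned inputs must have the same number of lines")
    grouped = defaultdict(list)
    for sentence, question in zip(sentences, questions):
        grouped[sentence].append(question)
    return dict(grouped)


def load_references(corpus_dir="corpus"):
    """Read the aligned files (assumption: UTF-8, one entry per line)."""
    corpus = Path(corpus_dir)
    sentences = (corpus / "referenceSentences.txt").read_text(encoding="utf-8").splitlines()
    questions = (corpus / "referenceQuestions.txt").read_text(encoding="utf-8").splitlines()
    return group_questions(sentences, questions)


# Toy in-memory demo (made-up strings), so the sketch runs without the corpus files.
demo = group_questions(
    ["The estate was rented.", "The estate was rented.", "You will receive a map."],
    ["Was the estate ever rented?", "What was rented?", "What will you receive?"],
)
for sentence, sentence_questions in demo.items():
    print(len(sentence_questions), sentence)
```

With the repository checked out, `load_references("corpus")` should return one entry per distinct source sentence, provided the layout assumptions above hold.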

Examples:

Sentence	Questions
When you buy the ticket, you will receive a map which allows you to go around easily by yourself.	How can I get a map? How can I get a map of the palace? What does one receive upon buying the ticket? What will you receive when you buy a ticket? Why is a map useful?
The estate of Monserrate was rented by Gerard de Visme (1789), a wealthy English merchant, who built a house there in the neo-Gothic style.	Who was Gerard de Visme? What did Gerard de Visme build? What was Gerard de Visme's profession? What was Gerard de Visme's nationality? Was the estate of Monserrate ever rented? What style was Gerard de Visme's house? When did Gerard de Visme rent the estate?

Benchmark

We benchmarked three available state-of-the-art systems, each with a different approach to the problem of Question Generation (QG):

H&S: Heilman, M. and Smith, N. (2010);
D&A: Du, X. et al. (2017);
GEN: Rodrigues, H. et al. (2018).

System	ROUGE	METEOR	BLEU1	BLEU4	EACS	GMS	STCS	VECS
H&S	69.00	46.38	83.71	45.56	92.51	86.11	73.26	77.92
D&A	63.71	37.58	77.40	26.63	92.52	85.51	74.47	77.54
GEN (args)	65.81	46.44	81.80	40.61	92.25	85.86	71.17	80.89

Contact us with your results to appear in the table!

Usage

We used the Maluuba project in our experiments. The script folder contains a script (requires a Maluuba installation) to automatically evaluate your system's output on MONSERRATE. The dataset is also publicly available to be used as you see fit.
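
To give a feel for how the size of the reference set affects an overlap metric, here is a minimal, self-contained sketch of clipped unigram precision against multiple references (the core of BLEU-1, without the brevity penalty). It is not the evaluation script from this repository and will not reproduce the benchmark numbers; the tokenizer is deliberately naive, and the names `tokenize` and `clipped_unigram_precision` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Naive tokenizer (assumption): lowercase, detach '?', split on whitespace."""
    return text.lower().replace("?", " ?").split()


def clipped_unigram_precision(hypothesis, references):
    """Modified unigram precision of one hypothesis against several references."""
    hyp_counts = Counter(tokenize(hypothesis))
    if not hyp_counts:
        return 0.0
    # For each token, keep the highest count observed in any single reference.
    max_ref_counts = Counter()
    for reference in references:
        for token, count in Counter(tokenize(reference)).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    # Credit each hypothesis token at most as often as its best reference allows.
    matched = sum(min(count, max_ref_counts[token]) for token, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())


hypothesis = "What did Gerard de Visme build?"
single_reference = ["Who was Gerard de Visme?"]
several_references = single_reference + [
    "What did Gerard de Visme build?",
    "What was Gerard de Visme's profession?",
]

print(clipped_unigram_precision(hypothesis, single_reference))
print(clipped_unigram_precision(hypothesis, several_references))
```

The same valid question scores lower against the single reference than against the larger set, which is the effect an "exhaustive" reference is meant to reduce.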

Citation

Hugo Rodrigues, Eric Nyberg, Luísa Coheur, Towards the benchmarking of question generation: introducing the Monserrate corpus, Language Resources and Evaluation, Springer, pages 1-19, doi: https://doi.org/10.1007/s10579-021-09545-5, June 2021

Acknowledgements

Hugo Rodrigues was supported by the Carnegie Mellon-Portugal program (SFRH/BD/51916/2012). This work was also supported by national funds through Fundação para a Ciência e Tecnologia (FCT) with reference UIDB/50021/2020.

Contact Information

Hugo Rodrigues: hugo.p.rodrigues@tecnico.ulisboa.pt
