Language Understanding Test Sets
The objective of LUTEST is to create test sets and an evaluation methodology that provide significant evidence about the linguistic generalization capabilities of deep learning methods applied to natural language processing. In recent years, a number of works have built test sets and evaluation methods to assess the language understanding capabilities of deep neural models and to determine what information they select and encode. However, much work remains to be done, in particular from a linguistically motivated perspective and for languages other than English.
LUTEST has delivered the EsCOLA dataset: Spanish Corpus of Linguistic Acceptability. You can find it (only one partition, without test data) in the EsCOLA directory. The EsCOLA dataset is documented in the following publication, a draft of which is also included in the EsCOLA directory:
Núria Bel, Marta Punsola, Valle Ruíz-Fernández. 2024. EsCoLA: Spanish Corpus of Linguistic Acceptability. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy.
Project PID2019-104512GB-I00, funded by the Ministerio de Ciencia e Innovación (Spain).