EVALITA 2016 Datasets



EVALITA is a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language.

The general objective of EVALITA is to promote the development of language and speech technologies for the Italian language, providing a shared framework where different systems and approaches can be evaluated in a consistent manner.

The diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for NLP and speech sciences. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that it is worth pursuing such goals for the Italian language.

As a side effect of the evaluation campaign, both training and test data are available to the scientific community as benchmarks for future improvements.

EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC) and it is endorsed by the Italian Association for Artificial Intelligence (AI*IA) and the Italian Association for Speech Sciences (AISV).



The 5th evaluation campaign, EVALITA 2016, was organized around the following selected tasks:

  • ArtiPhon - Articulatory Phone Recognition
  • FactA - Event Factuality Annotation
  • NEEL-IT - Named Entity rEcognition and Linking in Italian Tweets
  • PoSTWITA - POS tagging for Italian Social Media Texts
  • QA4FAQ - Question Answering for Frequently Asked Questions
  • SENTIPOLC - SENTIment POLarity Classification

EVALITA 2016 is an initiative of AILC (Associazione Italiana di Linguistica Computazionale).

Proceedings are available on the CEUR open access platform and on the aAccademia University Press website.

Read the Storify story on the final workshop.

Follow EVALITA 2016 on Twitter and Facebook, and use the hashtag #EVALITA2016 to spread the word about the initiative.


Data repository structure:

|-artiphone: ArtiPhon data
|-facta: FactA data
|-neelit: NEEL-IT data
|-----neel-it16_dev-set_v4: training set folder
|-----neel-it_evalita2016-nil.gold.idfix: gold standard annotations
|-----neel-it_evalita2016_v3.data.gold.test: test set
|-postwita: PoSTWITA data
|-----goldDEVset-2016_09_05.txt: training set data
|-----goldTESTset-2016_09_12.txt: test set data
|-qa4faq: QA4FAQ data
|-----qa4faq_dev_v3: training data folder
|-----qa4faq_qrel: relevance judgments
|-----qa4faq_qrel.trec: relevance judgments (TREC format)
|-----qa4faq_question: questions for testing
|-sentipolc: SENTIPOLC data
|-----sentipolc16_gold2000.csv: test set
|-----sentipolc16_officialdistrib_train.csv: training set
|-shared: files in this directory contain tweet IDs shared between tasks
|-EVALITA 2016 overview: Overview of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian

Each task folder contains a PDF document that describes the task and data format. You can find further information on the website of each task.
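
The layout listed above can be checked programmatically after cloning. The following is a minimal sketch, not part of the official distribution: it assumes the repository has been cloned to a local directory, and the names `EVALITA_2016_LAYOUT`, `expected_paths` and `missing_paths` are hypothetical helpers introduced here for illustration. Folder and file names are copied verbatim from the listing; the sketch only tests whether each path exists and makes no assumption about file contents or formats, which are described in each task's PDF.

```python
from pathlib import Path

# Folder and file names copied verbatim from the repository listing.
# Empty lists mean the listing names the folder but none of its contents.
EVALITA_2016_LAYOUT = {
    "artiphone": [],
    "facta": [],
    "neelit": [
        "neel-it16_dev-set_v4",
        "neel-it_evalita2016-nil.gold.idfix",
        "neel-it_evalita2016_v3.data.gold.test",
    ],
    "postwita": [
        "goldDEVset-2016_09_05.txt",
        "goldTESTset-2016_09_12.txt",
    ],
    "qa4faq": [
        "qa4faq_dev_v3",
        "qa4faq_qrel",
        "qa4faq_qrel.trec",
        "qa4faq_question",
    ],
    "sentipolc": [
        "sentipolc16_gold2000.csv",
        "sentipolc16_officialdistrib_train.csv",
    ],
    "shared": [],
}


def expected_paths(root):
    """Return every folder and entry named in the listing, under `root`."""
    root = Path(root)
    paths = []
    for folder, entries in EVALITA_2016_LAYOUT.items():
        paths.append(root / folder)
        paths.extend(root / folder / entry for entry in entries)
    return paths


def missing_paths(root):
    """Return the listed paths that do not exist under `root`."""
    return [p for p in expected_paths(root) if not p.exists()]
```

Calling `missing_paths("path/to/clone")` (with the placeholder replaced by the actual clone location) returns an empty list when every listed folder and file is present, and otherwise lists the paths that were not found.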

If you use these data in scientific papers, or in any other medium serving scientists or students (e.g. websites, CD-ROMs), please include the following citation:

author={Basile, P. and Cutugno, F. and Nissim, M. and Patti, V. and Sprugnoli, R.},
title={EVALITA 2016: Overview of the 5th evaluation campaign of natural language processing and speech tools for Italian},
journal={CEUR Workshop Proceedings},


Attribution-NonCommercial-ShareAlike 3.0 Italy (CC BY-NC-SA 3.0 IT)

You are free to:

  • Share — copy and redistribute the material in any medium or format
  • Adapt — remix, transform, and build upon the material

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • NonCommercial — You may not use the material for commercial purposes.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.


Notices:

  • You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
  • No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.


Regarding Twitter datasets: any Content provided to third parties remains subject to Twitter's Developer Agreement & Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy before receiving such downloads.