GermEval 2019 Task 3: Shared task on the lemmatization of German web and social media texts (EmpiriST-lemmatization 2019)
Note that the shared task has been cancelled due to an insufficient number of participants.
The goal of the shared task is to encourage the developers of NLP applications to adapt their tools and resources to the lemmatization of German Web pages and written German discourse in genres of computer-mediated communication (CMC). Examples of CMC genres include chats, forums, wiki talk pages, tweets, blog comments, social networks, SMS and WhatsApp dialogues.
The shared task is a follow-up to the EmpiriST 2015 shared task, which focused on tokenization and POS-tagging. The current task focuses on the next fundamental step in the NLP pipeline. Lemmatization is crucial for general corpus indexing purposes as well as for many applications in lexicography, text classification, discourse analysis, etc.
Participants will receive pre-tokenized and pre-tagged text files and will have to provide surface-oriented lemmata and/or normalized lemmata. Surface-oriented lemmata are mainly based on the inflectional suffixes of the token and retain, as far as possible, any non-standard orthographical features of the token. For normalized lemmata, on the other hand, obvious spelling errors are corrected and non-standard forms are treated as standard forms.
Subtask 1: Surface-oriented lemmatization
XD        EMOASC  XD
du        PPER    du
killst    VVFIN   killen
mich      PPER    mich
!         $.      !
Soooo     PTKIFG  soooo
herrlich  ADJD    herrlich
xDD       EMOASC  xDD
Subtask 2: Normalized lemmatization
XD        EMOASC  XD
du        PPER    du
killst    VVFIN   killen
mich      PPER    mich
!         $.      !
Soooo     PTKIFG  so
herrlich  ADJD    herrlich
xDD       EMOASC  xDD
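To make the difference between the two subtasks concrete, the following minimal sketch reads the two examples above as whitespace-separated token, POS tag and lemma triples and lists the tokens whose lemma differs. The names `Token`, `parse_triples` and `differing_lemmata` are hypothetical helpers introduced for illustration only, and the whitespace-separated layout is an assumption; the actual file format of the released data may differ.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """One annotated token: surface form, POS tag, lemma."""
    form: str
    pos: str
    lemma: str


def parse_triples(text: str) -> List[Token]:
    """Group a whitespace-separated stream into (form, pos, lemma) triples.

    Assumption: no field contains whitespace, so plain splitting is safe.
    """
    fields = text.split()
    if len(fields) % 3 != 0:
        raise ValueError("expected a multiple of three fields")
    return [Token(*fields[i:i + 3]) for i in range(0, len(fields), 3)]


def differing_lemmata(
    surface: List[Token], normalized: List[Token]
) -> List[Tuple[str, str, str]]:
    """Return (form, surface lemma, normalized lemma) where the lemmata differ."""
    return [
        (s.form, s.lemma, n.lemma)
        for s, n in zip(surface, normalized)
        if s.lemma != n.lemma
    ]


# The two examples from the task description, copied verbatim.
SURFACE = (
    "XD EMOASC XD du PPER du killst VVFIN killen mich PPER mich "
    "! $. ! Soooo PTKIFG soooo herrlich ADJD herrlich xDD EMOASC xDD"
)
NORMALIZED = (
    "XD EMOASC XD du PPER du killst VVFIN killen mich PPER mich "
    "! $. ! Soooo PTKIFG so herrlich ADJD herrlich xDD EMOASC xDD"
)

surface = parse_triples(SURFACE)
normalized = parse_triples(NORMALIZED)
diffs = differing_lemmata(surface, normalized)
print(diffs)  # [('Soooo', 'soooo', 'so')]
```

In this example only "Soooo" is affected: the surface-oriented lemma keeps the letter repetition, while the normalized lemma maps it to the standard form "so".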
- 2019-04-26: Release of training data
- 2019-07-15 to 2019-07-25: Evaluation period
- 2019-07-31: Publication of results
- 2019-08-15: Submission of system description papers
- 2019-08-31: Notification of acceptance
- 2019-10-01: Camera-ready papers due
- 2019-10-08: Workshop in Erlangen, Germany

The shared task will be a pre-conference workshop of the Conference on Natural Language Processing (“Konferenz zur Verarbeitung natürlicher Sprache”, KONVENS) hosted on October 8, 2019 at FAU Erlangen-Nuremberg, see http://2019.konvens.org/.
Participants in the shared task need to register by sending an e-mail with the following information to firstname.lastname@example.org:
- Team name (will be used to identify submissions)
- Name(s) of team member(s)
- Contact person and e-mail address
All participants and other interested parties are invited to subscribe to our mailing list.
The training data were individually lemmatized by four student annotators according to our lemmatization guidelines. Unclear cases were decided in group meetings with the task organizers.
- training data (2019-04-26)
The shared task is organized by: