title |
---|
CzechIT! - A linguistic corpus of Czech learners acquiring Italian |
Browse the texts here.
Second Language Acquisition (SLA) is a fertile field of research in linguistics, both from applied and empirical standpoints and from theoretical and general perspectives. This corpus supports comparative and contrastive analyses of the linguistic structures and patterns that learners exhibit across languages along their acquisitional path.
The project is based on quantitative analyses of the corpus, which comprises several kinds of data in order to capture a wide range of linguistic behaviors and styles:
- Email communications
- Text messages (SMS, Chat)
- Oral production
- Self-judgements of grammaticality
The data are marked up and annotated with NLP tools running in a Python environment.
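As a rough illustration of what annotated, countable corpus records could look like, here is a minimal sketch using only the Python standard library. It is not the project's actual pipeline: the text-type labels, the `AnnotatedText` record, and the regex tokenizer are all hypothetical stand-ins for whatever NLP tools the project really uses.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical labels mirroring the four kinds of data listed above.
TEXT_TYPES = {"email", "sms_chat", "oral", "judgement"}

# Naive tokenizer (an assumption): runs of word characters, or single
# punctuation marks. A real pipeline would use a proper Italian tokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


@dataclass
class AnnotatedText:
    """One corpus text with its metadata and token list (hypothetical schema)."""
    text_id: str
    text_type: str
    raw: str
    tokens: list = field(default_factory=list)


def annotate(text_id, text_type, raw):
    """Attach a text-type label and a token list to a raw learner text."""
    if text_type not in TEXT_TYPES:
        raise ValueError(f"unknown text type: {text_type!r}")
    return AnnotatedText(text_id, text_type, raw, TOKEN_RE.findall(raw))


def token_counts_by_type(texts):
    """Count tokens per text type, a basic quantitative summary."""
    counts = Counter()
    for text in texts:
        counts[text.text_type] += len(text.tokens)
    return counts


# Invented sample sentences, for illustration only (not corpus data).
sample = [
    annotate("t1", "email", "Ciao, come stai?"),
    annotate("t2", "sms_chat", "Arrivo subito!"),
]
summary = token_counts_by_type(sample)
```

Keeping the text type on every record makes it straightforward to compare learner behavior across emails, text messages, oral production, and grammaticality judgements.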
The project runs from July 2017 with no fixed end date, so please check the news for updates.
The corpus itself will be released as soon as possible in an open file format under a CC0 license.