CzechIT! - A linguistic corpus of Czech learners acquiring Italian

Browse

Browse the texts here.

Aims

Second Language Acquisition (SLA) is a fertile field of research in linguistics, from applied and empirical standpoints as well as from theoretical and general perspectives. This corpus is intended to support comparative and contrastive analyses of the linguistic structures and patterns that emerge across languages along the learner's acquisitional path.

Data

The project is based on quantitative analyses of the corpus, which comprises several kinds of data in order to capture a wide range of linguistic behaviors and styles:

  • Email communications
  • Text messages (SMS, Chat)
  • Oral production
  • Self-judgements of grammaticality

Methods

Data is marked up and annotated with NLP tools running in a Python environment.
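The project does not name its annotation tools, so the following is only a minimal sketch of what a first annotation pass might look like, using the Python standard library alone. The `annotate` function, the `Token` record, the `TEXT_TYPES` labels, and the tokenization rule are all hypothetical and chosen for illustration; they are not the project's actual pipeline or schema.

```python
import re
from dataclasses import dataclass, asdict

# Hypothetical labels mirroring the data types listed under "Data".
TEXT_TYPES = {"email", "text_message", "oral", "self_judgement"}

# Assumed tokenization rule: a word (optionally ending in an apostrophe,
# as in the Italian elided article "l'") or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+'?|[^\w\s]")


@dataclass
class Token:
    form: str   # surface form as written by the learner
    start: int  # character offset where the token begins
    end: int    # character offset just past the token


def annotate(text: str, text_type: str) -> dict:
    """Return a record holding the raw text, its type, and its tokens."""
    if text_type not in TEXT_TYPES:
        raise ValueError(f"unknown text type: {text_type!r}")
    tokens = [
        Token(m.group(), m.start(), m.end())
        for m in TOKEN_RE.finditer(text)
    ]
    return {
        "text_type": text_type,
        "text": text,
        "tokens": [asdict(t) for t in tokens],
    }


if __name__ == "__main__":
    record = annotate("L'amico è qui.", "email")
    for token in record["tokens"]:
        print(token)
```

Keeping character offsets alongside each token leaves the learner's original text untouched, so later layers (for example part-of-speech tags or error labels from a dedicated NLP library) could be attached without altering the source.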

Timeline

The project started in July 2017 and has no fixed end date, so please check the news for updates.

Usage

The corpus itself will be released as soon as possible in an open file format under a CC0 license.