This project develops a pretrained domain-specific language model for Turkish Twitter content. Domain-adapted language models have been shown to outperform general-purpose base language models.
It will be the first large-scale pretrained model for Turkish Twitter, obtained by extending a monolingually trained Turkish BERT model through continued pretraining.
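Before continued pretraining, Twitter corpora are commonly normalized so that user mentions and URLs do not dominate the vocabulary. A minimal sketch of such a step, assuming placeholder tokens and regexes chosen for illustration (not the project's actual pipeline):

```python
import re

# Patterns for Twitter-specific elements; both patterns and the
# placeholder tokens ("@user", "http") are illustrative assumptions.
USER_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")

def normalize_tweet(text: str) -> str:
    """Mask user mentions and URLs, then collapse repeated whitespace."""
    text = USER_RE.sub("@user", text)
    text = URL_RE.sub("http", text)
    return " ".join(text.split())

print(normalize_tweet("@ayse merhaba! detaylar: https://t.co/abc123"))
# → @user merhaba! detaylar: http
```

Each normalized tweet would then be fed to a standard masked-language-modeling pretraining loop initialized from the Turkish BERT checkpoint.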
The adapted model will be evaluated extrinsically on several Turkish Twitter-based datasets.