This repository contains the code and text of my diploma thesis, together with the best models for all task variants. The models are published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) licence.
The best models are temporarily available as checkpoints on the AIC cluster:
- tagging and lemmatization - tl_18 index data
- csfd index data mappings
- mall index data
- facebook index data
- joint index data
or on Lindat:
A demo notebook showing how to use the pretrained models for tagging and lemmatization is available here. A demo for sentiment analysis is available here.
If you wish to replicate the training experiments, the scripts with their hyperparameters are listed in run_scripts.
Input data should be in the following format: each line contains one input word, its gold lemma and its gold tag, separated by tabs, as in the following example.
Faxu fax NNIS3-----A----
škodí škodit_:T VB-P---3P-AA---
především především Db-------------
přetížené přetížený_^(*3it) AAFP1----1A----
telefonní telefonní AAFP1----1A----
linky linka NNFP1-----A----
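To check a file against this format before training, a small reader along the following lines may help. It is only a sketch and not part of the repository: the names `Token` and `parse_tagged_lines` are hypothetical, and skipping blank lines is an assumption, since the format description above does not say how they are treated.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One input line: word form, gold lemma, gold tag (hypothetical helper type)."""
    word: str
    lemma: str
    tag: str


def parse_tagged_lines(lines: Iterable[str]) -> Iterator[Token]:
    """Yield one Token per non-empty line of tab-separated input."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines carry no token and are skipped.
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        yield Token(*fields)


# Two lines taken from the example above, with explicit tab separators.
sample = [
    "Faxu\tfax\tNNIS3-----A----\n",
    "škodí\tškodit_:T\tVB-P---3P-AA---\n",
]
tokens = list(parse_tagged_lines(sample))
```

To read a real file, pass an open file handle instead of the list, for example `parse_tagged_lines(open(path, encoding="utf-8"))`, where `path` points to your own data.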
The model also requires the same embeddings as those used in the demo notebooks.