Diploma thesis, Faculty of Math and Physics of the Charles University

flower-go/DiplomaThesis

DiplomaThesis

Czech NLP with Contextualized Embeddings

This repository contains the code and text of my diploma thesis, along with the best models for all task variants. The models are published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence.

The best models are available as checkpoints, temporarily on the AIC cluster:

or on Lindat:

A demo notebook showing how to use the pretrained models for tagging and lemmatization is available here. A demo for sentiment analysis is available here.

If you wish to replicate the training experiments, the scripts with their hyperparameters are listed in run_scripts.

Input data should be in the following format: every line contains one input word, its gold lemma, and its gold tag, separated by tabs, as in the following example.
Faxu fax NNIS3-----A----
škodí škodit_:T VB-P---3P-AA---
především především Db-------------
přetížené přetížený_^(*3it) AAFP1----1A----
telefonní telefonní AAFP1----1A----
linky linka NNFP1-----A----
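
The input format above can be read with a few lines of Python. This is only a minimal sketch, not the loader used in the thesis: the names `Token` and `read_sentences` are hypothetical, and treating a blank line as a sentence boundary is an assumption that this README does not state.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    """One input line: word form, gold lemma, gold tag (hypothetical helper type)."""
    form: str
    lemma: str
    tag: str


def read_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Parse tab-separated 'form<TAB>lemma<TAB>tag' lines into sentences.

    Assumption: a blank line ends the current sentence. The README only
    specifies the three tab-separated columns, not the sentence separator.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        current.append(Token(*fields))
    # Keep the last sentence when the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences
```

To check a data file before training, one could call `read_sentences(open(path, encoding="utf-8"))`; a line that does not have exactly three columns raises a `ValueError` carrying the line number.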

The model also needs the same embeddings as in the demo notebooks.
