This project aims to evaluate how changes to a corpus annotation affect POS tagging, using cross-validation.
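To make the cross-validation idea concrete, here is a minimal sketch of how a tagged corpus could be split into folds and scored. It is not the code used in this project: the names `k_fold_splits` and `token_accuracy`, the round-robin assignment of sentences to folds, and the default of 10 folds are all assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple

# A sentence is a list of (word, tag) pairs; this layout is an assumption.
Sentence = List[Tuple[str, str]]


def k_fold_splits(sentences: Sequence, k: int = 10) -> Iterator[Tuple[list, list]]:
    """Yield (train, test) pairs; sentence i goes to test fold i % k."""
    if not 2 <= k <= len(sentences):
        raise ValueError("k must be between 2 and the number of sentences")
    for fold in range(k):
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        test = [s for i, s in enumerate(sentences) if i % k == fold]
        yield train, test


def token_accuracy(gold: Sequence[Sentence], predicted: Sequence[Sentence]) -> float:
    """Share of tokens whose predicted tag equals the gold tag."""
    total = 0
    correct = 0
    for gold_sentence, predicted_sentence in zip(gold, predicted):
        for (_, gold_tag), (_, predicted_tag) in zip(gold_sentence, predicted_sentence):
            total += 1
            if gold_tag == predicted_tag:
                correct += 1
    return correct / total if total else 0.0
```

In such a setup, the tagger would be trained on each `train` part, run on the matching `test` part, and the per-fold accuracies compared between the original and the modified annotation.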
MIT licensed, except for the third-party files, which have their own licences.
Currently, the code includes parts of my (unreleased) chart parser “ijáček”. The parser should be released as well, and the common code should then be shared across the projects.
To be written, but you need at least Python 3.5, RFTagger, and GNU Make, plus the DESAM corpus or any other corpus that uses the Czech attributive tagset. This tagset is used by the free morphological analyzer Majka.
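Until proper instructions are written, a small check like the one below could tell you whether the basic prerequisites are present. It is only a sketch and not part of this project: the function name `missing_prerequisites` is hypothetical, and the executable names `make`, `rft-train`, and `rft-annotate` are assumptions about how GNU Make and RFTagger are installed on your system.

```python
import shutil
import sys

# Minimum Python version named in the requirements above.
REQUIRED_PYTHON = (3, 5)

# Assumed executable names; GNU Make may be called "gmake" on some systems,
# and the RFTagger binaries may be installed under different names.
REQUIRED_TOOLS = ("make", "rft-train", "rft-annotate")


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=REQUIRED_PYTHON,
                          version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all found."""
    problems = []
    version = sys.version_info if version is None else version
    if tuple(version[:2]) < tuple(min_python):
        problems.append("Python {}.{} or newer is required".format(*min_python))
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append("'{}' not found on PATH".format(tool))
    return problems
```

Calling `missing_prerequisites()` with no arguments checks the running interpreter and the current PATH. It does not check for the corpus, which has to be obtained separately.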
There may be some useful description in readme.html.
(Optional) Python 3 packages, available in Arch Linux AUR:
python-beautifulsoup4 4.5.1-1 (required by convert_to_latex.py, just a helper script)
(a package required by pygments_lexer.py, also an unnecessary part)
Firefox >= 51 is advised for colourful emojis, which give better visual cues than plain shapes/glyphs when navigating the generated HTML tables.
The Czech comments in the code do not contain anything important.