This project implements NLTK taggers for the Slovene language.
The taggers require Python 2.7 and NLTK.
Until these taggers are built into NLTK, you can download them from the slovene_taggers/ folder and use them in NLTK.
An example showing how to use the Slovene taggers is in example.py.
A Slovene-language explanation of the tags is in jos1M/josMSD-canon-sl.tbl.
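
The snippet below is a minimal sketch of how a downloaded tagger might be loaded and applied. It assumes the files in slovene_taggers/ are pickled tagger objects (the format nltk-trainer normally writes) and that the loaded object has the usual NLTK tag() method. The helper names load_tagger and tag_sentence are hypothetical and do not come from example.py, which remains the authoritative usage example.

```python
import pickle


def load_tagger(path):
    """Load a pickled object from `path` (assumed to be an NLTK tagger)."""
    # Pickle files must be opened in binary mode.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def tag_sentence(tagger, sentence):
    """Split `sentence` on whitespace and tag the tokens.

    Whitespace splitting is a simplification; a real tokenizer would
    also separate punctuation from words.
    """
    return tagger.tag(sentence.split())
```

With NLTK installed, usage would look roughly like tagger = load_tagger("slovene_taggers/NAME.pickle") followed by tag_sentence(tagger, "To je primer"), where NAME stands for one of the files actually present in that folder. Only unpickle files from a source you trust, since loading a pickle can run arbitrary code.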
Description of folders and files
jos100k/ : Slovene corpus from the JOS project with 100,000 tagged words.
jos1M/ : Slovene corpus from the JOS project with one million tagged words.
pos/jos1M.pos : the input file for the trainer program in trainer/.
slovene_taggers/ : the result of this project. The Slovene taggers stored here can be used in NLTK.
trainer/ : code forked from https://github.com/japerk/nltk-trainer, used to train the taggers.
example.py : an example showing how to use the Slovene taggers in NLTK.
generateTaggers.sh : commands for generating the taggers. Generation uses the data in pos/jos1M.pos and the program trainer/train_tagger.py.
evaluateTaggers.sh : commands for accuracy evaluation of the taggers.
transformJOS.py : code for transforming all .xml corpora in jos1M/ into pos/jos1M.pos.
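
To illustrate the kind of transformation transformJOS.py performs, here is a rough sketch, not the actual script. It assumes the JOS XML files mark sentences with <s>, words with <w msd="..."> and punctuation with <c>, and that the .pos file holds one sentence per line as word/tag pairs. The function name, the "/" separator, the choice to tag punctuation with itself, and the sample tags are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not taken from the corpus; the msd values are
# illustrative placeholders only.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <s><w msd="Pd-nsn">To</w> <w msd="Va-r3s-n">je</w> <w msd="Ncmsn">primer</w><c>.</c></s>
</text>"""


def local_name(tag):
    """Strip an XML namespace such as '{http://...}w' down to 'w'."""
    return tag.rsplit("}", 1)[-1]


def sentences_to_pos_lines(xml_text, sep="/"):
    """Return one 'word/tag word/tag ...' line per <s> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "s":
            continue
        pairs = []
        for child in elem.iter():
            name = local_name(child.tag)
            text = (child.text or "").strip()
            if name == "w":
                # Words carry their tag in the msd attribute (assumption).
                pairs.append(text + sep + child.get("msd", "UNK"))
            elif name == "c" and text:
                # Punctuation is tagged with itself (assumption).
                pairs.append(text + sep + text)
        lines.append(" ".join(pairs))
    return lines


for line in sentences_to_pos_lines(SAMPLE):
    print(line)  # To/Pd-nsn je/Va-r3s-n primer/Ncmsn ./.
```

The namespace is stripped from element names so the same code works whether or not the input declares the TEI namespace.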