Slovene NLTK tagger


This project implements NLTK taggers for the Slovene language.


For these taggers to work, you need Python 2.7 and NLTK.


Until these taggers are built into NLTK, you can download them from the folder slovene_taggers/ and use them in NLTK.
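
A minimal sketch of loading and applying one of the downloaded taggers. It assumes the files in slovene_taggers/ are pickled NLTK tagger objects, which this README does not state, so NLTK must be installed for unpickling to succeed. The helper names `load_tagger` and `tag_sentence` and the path in the comment are placeholders for illustration only.

```python
import pickle


def load_tagger(path):
    """Load a pickled tagger object from disk (assumed storage format)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def tag_sentence(tagger, sentence):
    """Split on whitespace and return the (word, tag) pairs from tagger.tag()."""
    return tagger.tag(sentence.split())


# Hypothetical usage; replace the placeholder with a real file from slovene_taggers/:
# tagger = load_tagger("slovene_taggers/<tagger-file>")
# print(tag_sentence(tagger, "To je stavek ."))
```

Whitespace splitting is the simplest possible tokenization; a proper tokenizer would handle punctuation attached to words.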

An example showing how to use the Slovene taggers is in file

A Slovenian explanation of the tags is in jos1M/josMSD-canon-sl.tbl

## Folders and files description

  • jos100k/ : Slovene corpus from the JOS project with 100,000 tagged words.

  • jos1M/ : Slovene corpus from the JOS project with one million tagged words.

  • pos/jos1M.pos : this file is used as input for the trainer program from trainer/

  • slovene_taggers/ : the result of this project. The Slovene taggers stored here can be used in NLTK.

  • trainer/ : forked code that is used to train the taggers.

  • : an example showing how to use the Slovene taggers in NLTK.

  • : commands for generating the taggers. Generation uses the data in pos/jos1M.pos and the program in trainer/

  • : commands for evaluating the accuracy of the taggers.

  • : the code for transforming all .xml corpora from jos1M/ into pos/jos1M.pos.
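
The data preparation and evaluation steps listed above can be sketched roughly as follows. This is not the project's own code. It assumes the corpus XML marks sentences with `<s>` elements and words with `<w>` elements carrying the tag in an `msd` attribute, and that the .pos file holds one sentence per line as `word/TAG` tokens; none of these details are confirmed here. The sample sentence, its tags, and all function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names are assumptions.
SAMPLE_XML = """<text>
  <s><w msd="Pd-nsn">To</w> <w msd="Va-r3s-n">je</w> <w msd="Ncmsn">stavek</w><c>.</c></s>
</text>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}w'."""
    return tag.rsplit("}", 1)[-1]


def xml_to_pos_lines(xml_text):
    """Return one 'word/TAG word/TAG ...' line per sentence."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "s":
            continue
        pairs = []
        for child in elem.iter():
            # Only <w> elements are kept; punctuation elements are skipped.
            if local_name(child.tag) == "w" and child.text:
                pairs.append("%s/%s" % (child.text.strip(), child.get("msd", "UNK")))
        if pairs:
            lines.append(" ".join(pairs))
    return lines


def parse_pos_line(line):
    """Split a 'word/TAG' line back into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in line.split()]


def accuracy(tag_function, gold_sentences):
    """Share of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for sentence in gold_sentences:
        words = [word for word, _ in sentence]
        predicted = tag_function(words)
        for (_, gold_tag), (_, predicted_tag) in zip(sentence, predicted):
            correct += int(gold_tag == predicted_tag)
            total += 1
    return float(correct) / total if total else 0.0


pos_lines = xml_to_pos_lines(SAMPLE_XML)
gold = [parse_pos_line(line) for line in pos_lines]
```

Here `tag_function` is any callable that maps a list of words to (word, tag) pairs, so the `tag` method of a loaded tagger could be passed in directly. For a meaningful score, the evaluation sentences should be held out from the data used for training.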