Slovene NLTK tagger


##About

This project implements NLTK taggers for the Slovene language.

##Requirements

For this tagger to work, you need Python 2.7 and NLTK.

##Usage

Until these taggers are built into NLTK, you can download them from the slovene_taggers/ folder and use them in NLTK.

An example showing how to use the Slovene taggers is in the file example.py.

An explanation of the tags, written in Slovenian, is in jos1M/josMSD-canon-sl.tbl.
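
As a rough illustration of the usage described above, the sketch below loads a tagger from disk and tags a whitespace-tokenized sentence. It assumes the files in slovene_taggers/ are pickled NLTK tagger objects exposing the standard `tag(tokens)` method, which is not confirmed here; the file name `tagger.pickle` and the helper names `load_tagger` and `tag_sentence` are hypothetical. See example.py for the project's own example.

```python
import pickle


def load_tagger(path):
    """Load a pickled tagger object from disk (assumes a pickle file)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def tag_sentence(tagger, sentence):
    """Split the sentence on whitespace and tag the resulting tokens."""
    tokens = sentence.split()
    # NLTK taggers return a list of (token, tag) pairs from tag().
    return tagger.tag(tokens)


if __name__ == "__main__":
    import os

    # Hypothetical file name: replace it with the tagger file you downloaded.
    path = "slovene_taggers/tagger.pickle"
    if os.path.exists(path):
        tagger = load_tagger(path)
        print(tag_sentence(tagger, "To je preprost stavek ."))
```

Unpickling a real tagger requires NLTK to be installed, and pickle files should only be loaded from sources you trust.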

##Folder and file descriptions

  • jos100k/ : Slovene corpus from the JOS project with 100,000 tagged words.

  • jos1M/ : Slovene corpus from the JOS project with one million tagged words.

  • pos/jos1M.pos : the input file for the trainer program in trainer/.

  • slovene_taggers/ : the result of this project. The Slovene taggers stored here can be used in NLTK.

  • trainer/ : the code forked from https://github.com/japerk/nltk-trainer. This trainer is used to train the taggers.

  • example.py : an example showing how to use the Slovene taggers in NLTK.

  • generateTaggers.sh : commands for generating the taggers. Generation uses the data in pos/jos1M.pos and the program trainer/train_tagger.py.

  • evaluateTaggers.sh : commands for accuracy evaluation of the taggers.

  • transformJOS.py : code for transforming all .xml corpora from jos1M/ into pos/jos1M.pos.
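
To make the accuracy evaluation mentioned above concrete, here is a minimal sketch of token-level tagging accuracy: the share of tokens whose predicted tag matches the gold tag. It is not taken from evaluateTaggers.sh, whose exact commands are not reproduced here; the function name `tagging_accuracy` is hypothetical, and the only assumption about the tagger is that it offers the `tag(tokens)` method of NLTK taggers.

```python
def tagging_accuracy(tagger, gold_sentences):
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    gold_sentences is a list of sentences, each a list of (word, tag) pairs.
    """
    correct = 0
    total = 0
    for sentence in gold_sentences:
        # Strip the gold tags so the tagger only sees the words.
        tokens = [word for word, _ in sentence]
        predicted = tagger.tag(tokens)
        for (_, gold_tag), (_, predicted_tag) in zip(sentence, predicted):
            total += 1
            if gold_tag == predicted_tag:
                correct += 1
    # Avoid division by zero on an empty evaluation set.
    return float(correct) / total if total else 0.0
```

The gold sentences should come from data the tagger was not trained on; otherwise the resulting figure overstates how well the tagger generalizes.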