Slovene NLTK tagger

About

This project implements NLTK taggers for the Slovene language.

Requirements

The taggers require Python 2.7 and NLTK.

Usage

Until these taggers are built into NLTK, you can download them from the slovene_taggers/ folder and load them in NLTK.

An example of how to use the Slovene taggers is in example.py.
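Below is a minimal sketch of loading and applying one of the taggers. It is not the code in example.py. It assumes that the files in slovene_taggers/ are pickled NLTK tagger objects, as saved by nltk-trainer. NLTK must be installed so that unpickling can import the tagger classes. The file name in the usage comment is a placeholder. Every NLTK tagger provides a `tag(tokens)` method that returns a list of (word, tag) pairs. The helper names `load_tagger` and `tag_sentence` are hypothetical, and splitting on whitespace is a simplification of real tokenization.

```python
import pickle


def load_tagger(path):
    """Unpickle a tagger object (assumes the file was written with pickle)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def tag_sentence(tagger, sentence):
    """Naively split on whitespace, then tag; returns (word, tag) pairs."""
    tokens = sentence.split()
    return tagger.tag(tokens)


# Example usage (requires NLTK and a real file from slovene_taggers/;
# the file name below is a placeholder, not an actual file in the repo):
#   tagger = load_tagger("slovene_taggers/<tagger-file>.pickle")
#   print(tag_sentence(tagger, "Danes je lep dan ."))
```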

An explanation of the tags (in Slovene) is in jos1M/josMSD-canon-sl.tbl.

Folders and files description

  • jos100k/ : a Slovene corpus from the JOS project with 100,000 tagged words.

  • jos1M/ : a Slovene corpus from the JOS project with one million tagged words.

  • pos/jos1M.pos : the input file for the trainer program in trainer/.

  • slovene_taggers/ : the result of this project, the Slovene taggers, stored so that they can be used in NLTK.

  • trainer/ : code forked from https://github.com/japerk/nltk-trainer, used to train the taggers.

  • example.py : an example that shows how to use the Slovene taggers in NLTK.

  • generateTaggers.sh : commands for generating the taggers. Generation uses the data in pos/jos1M.pos and the program trainer/train_tagger.py.

  • evaluateTaggers.sh : commands for accuracy evaluation of the taggers.

  • transformJOS.py : code that transforms all .xml corpora in jos1M/ into pos/jos1M.pos.
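The XML-to-.pos conversion can be sketched as follows. This is an illustration, not the actual transformJOS.py, and it rests on three assumptions. First, the JOS XML marks each word as a `<w>` element whose `msd` attribute holds the morphosyntactic tag. Second, words are grouped into `<s>` sentence elements, and elements without an `msd` (such as punctuation) are skipped here. Third, pos/jos1M.pos uses a one-sentence-per-line "word/TAG" layout, the default read by NLTK's TaggedCorpusReader. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.tei-c.org/ns/1.0}'."""
    return tag.rsplit("}", 1)[-1]


def sentences_from_xml(xml_text):
    """Yield one list of (word, msd) pairs per <s> element.

    Assumption: words are <w msd="...">form</w> elements inside <s>;
    any element without an msd attribute is skipped.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) != "s":
            continue
        pairs = []
        for child in elem.iter():
            if local_name(child.tag) == "w" and child.get("msd"):
                pairs.append(((child.text or "").strip(), child.get("msd")))
        if pairs:
            yield pairs


def to_pos_lines(xml_text, sep="/"):
    """Format each sentence as 'word/TAG word/TAG ...' (assumed .pos layout)."""
    return [" ".join(word + sep + tag for word, tag in sentence)
            for sentence in sentences_from_xml(xml_text)]
```

For the full million-word corpus, `ET.iterparse` would avoid holding a whole file's tree in memory at once.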
