Quick Start to Tools

eaxelson edited this page Aug 15, 2017 · 1 revision

Tool User Quick Start

Download and compile a lexicon

Once HFST is installed, download a Finnish lexicon text file from:

http://hfst.github.io/downloads/finntreebank.lexc

and run the commands listed at the beginning of the file:

hfst-lexc -v -f foma finntreebank.lexc -o finntreebank.inverted.hfst
hfst-invert -v finntreebank.inverted.hfst -o finntreebank.debug.hfst
hfst-fst2fst -v finntreebank.debug.hfst -f olw -o finntreebank.hfst
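
Before running the commands above, it may help to confirm that the tools are actually on your PATH. The following is a minimal sketch, not part of HFST: missing_tools is a hypothetical helper name, and it relies only on the standard shell built-in command -v.

```shell
# Print the name of every given command that is not found on PATH.
# (missing_tools is a hypothetical helper, not an HFST command.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The tool names below are the ones used in this quick start.
missing_tools hfst-lexc hfst-invert hfst-fst2fst hfst-lookup hfst-proc
```

If nothing is printed, all five tools were found; any name that is printed still needs to be installed or added to PATH.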

Precompiled lexicons for various languages can also be downloaded from:

https://sourceforge.net/projects/hfst/files/resources/morphological-transducers/

Use the lexicon

Try out the Finnish lexicon with a word, e.g. "testi":

echo "testi" | hfst-lookup finntreebank.hfst

and you should get the line:

testi testi<N><sg><nom> 0.000000

Now try a non-word:

echo "xtesti" | hfst-lookup finntreebank.hfst

and you should get:

xtesti xtesti+? inf
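
The two outputs above suggest a simple way to pick out unrecognized words: their analysis ends in "+?". The sketch below assumes each output line has three whitespace-separated fields (word, analysis, weight), as in the samples shown here; unknown_words is a hypothetical helper name, and the sample input is copied from the lines above rather than produced by running hfst-lookup.

```shell
# Print each input word whose analysis ends in "+?" (no analysis found).
# Reads hfst-lookup output on stdin; assumes three whitespace-separated
# fields per line: word, analysis, weight. Blank lines are skipped.
unknown_words() {
    awk 'NF >= 3 && $2 ~ /\+\?$/ { print $1 }'
}

# Sample input copied from the two output lines shown above.
printf 'testi\ttesti<N><sg><nom>\t0.000000\n\nxtesti\txtesti+?\tinf\n' | unknown_words
```

With HFST installed, the same function can be placed at the end of a real pipeline, e.g. echo "xtesti" | hfst-lookup finntreebank.hfst | unknown_words. If a lexicon produces analyses that contain spaces, the field assumption no longer holds and the awk pattern would need adjusting.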

Other lookup tools

The hfst-proc tool does some useful things with capital letters, but may be slightly slower than hfst-lookup. It accepts running text, not only single words:

cat your-text | hfst-proc [--xerox] finntreebank.hfst

On the other hand, if you need speed, e.g. when you have millions of words to analyze, you may wish to feed your list of words to the lookup command:

cat your-list | hfst-lookup finntreebank.hfst
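
If you start from running text rather than a ready-made list, one way to build the list is to split the text into tokens and keep each distinct token once, so that no word is analyzed twice. This is a rough sketch using only standard Unix tools; word_list is a hypothetical helper name, and splitting on whitespace alone is a simplifying assumption (punctuation stays attached to the tokens).

```shell
# Turn running text on stdin into a sorted list of distinct tokens,
# one per line. Splits on whitespace only.
word_list() {
    tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u
}

# Small inline sample; with real data use: word_list < your-text > your-list
printf 'testi xtesti\ntesti\n' | word_list
```

The resulting file can then be fed to hfst-lookup as shown above. Deduplicating loses token frequencies, so keep the original text if you need counts later.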

All commands take various parameters that change the formatting of the output. The --help option of each command describes them, e.g. hfst-lookup --help.
