Omorfi–Open morphology of Finnish
This is a free/libre open source morphology of Finnish: a database, tools and APIs. It provides everything you need to build NLP applications that process Finnish words and texts.
For more detailed information, see the GitHub pages for omorfi.
Citing and academic works
Downloading and further information
Omorfi packages can be downloaded from GitHub, or the most current version can be fetched using git. For more information, see Release policy.
- hfst-3.15 or greater
- python-3.5 or greater
- C++ compiler and libtool
- GNU autoconf-2.64, automake-1.12; compatible pkg-config implementation
- VISL CG 3
- hfst-ospell-0.2.0 or greater, needed for spell-checking
- Java 7 or greater, for Java bindings
For further information, see Installation instructions
You can either download the language models of the previous release from the internet (Minimal installation) or compile them from the database (Normal installation). The former is recommended for new users and the latter for advanced users.
Normal installation (recommended)
./configure
make
make install
For further instructions, see Installation instructions.
Minimal installation
autoreconf -i
./configure
src/bash/omorfi-download.bash
This will download some of the pre-compiled dictionaries into your current working directory.
It is possible to install only the Python bindings via pip or anaconda. Dependencies that are not available in pip or anaconda will not be used, so features that rely on them, e.g. syntactic analysis and disambiguation using VISL CG 3, are unavailable.
Omorfi can be used from the command line using the following commands:
omorfi-disambiguate-text.sh: analyse and disambiguate
omorfi-spell.sh: spell-check and correct
omorfi-segment.sh: morphologically segment
omorfi-conllu.bash: analyse in CONLL-U format
omorfi-freq-evals.bash: analyse coverage and statistics
omorfi-ftb3.bash: analyse in FTB-3 format (CONLL-X)
omorfi-factorise.bash: analyse in Moses-SMT factorised format
omorfi-vislcg.bash: analyse in VISL CG 3 format
omorfi-analyse-tokenised.sh: analyse word per line (faster)
omorfi-generate.sh: generate word-forms from omor descriptions
omorfi-download.bash: download language models from latest release
For further details please refer to:
Omorfi can be used via very simple programming APIs; the design is detailed in omorfi API design.
Using binary models
There are various binaries for language models that can be used with specialised tools like HFST. For further details, see our usage examples.
For full descriptions and archived problems, see Troubleshooting in the GitHub pages.
hfst-lexc: Unknown option
ImportError (or other Python problems)
For the Python scripts to work, you need to install them to the same prefix as
Python, or set PYTHONPATH to the directory where they were installed.
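A minimal sketch of the PYTHONPATH approach, assuming the modules were installed under a typical `lib/pythonX.Y/site-packages` layout. The variable names `OMORFI_PREFIX` and `PY_VER` and their default values are hypothetical and only serve this illustration; the actual directory depends on your prefix, Python version and distribution.

```shell
# Hypothetical values: adjust both to match your own installation.
OMORFI_PREFIX="${OMORFI_PREFIX:-/usr/local}"   # prefix that was given to ./configure
PY_VER="${PY_VER:-3.11}"                       # major.minor version of your Python

# Prepend the site-packages directory under that prefix,
# keeping any entries that are already in PYTHONPATH.
PYTHONPATH="$OMORFI_PREFIX/lib/python$PY_VER/site-packages${PYTHONPATH:+:$PYTHONPATH}"
export PYTHONPATH
```

Putting these lines in your shell start-up file makes the setting persistent; some distributions use `dist-packages` instead of `site-packages`, in which case the path has to be changed accordingly.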
Processing text gets stuck / takes long
This can easily happen for legitimate reasons. It can be mitigated by filtering out overlong tokens or by processing texts in smaller pieces.
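One way to do such filtering before running the analysers is sketched below. The function name `filter_overlong_tokens` and the default limit of 64 are assumptions made for this example and are not part of omorfi; the sketch only relies on standard `awk`.

```shell
# Drop whitespace-separated tokens longer than max_len from stdin.
# max_len defaults to 64, an arbitrary illustrative cut-off.
# Note: awk's length() counts characters or bytes depending on the
# awk implementation and locale, which matters for letters like ä and ö.
filter_overlong_tokens() {
    max_len="${1:-64}"
    awk -v max="$max_len" '{
        out = ""
        for (i = 1; i <= NF; i++)
            if (length($i) <= max)
                out = out (out == "" ? "" : " ") $i
        print out
    }'
}

# Usage sketch (input.txt is a hypothetical file name):
#   filter_overlong_tokens 40 < input.txt > filtered.txt
#   split -l 10000 filtered.txt piece-    # then process each piece separately
```

Each input line is kept as one output line, so line-based tools still see the same line count; a line whose tokens are all too long becomes an empty line.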
Make gets killed
Get more RAM or swap space.
Omorfi code and data are free/libre open source and community-driven. To participate, read CONTRIBUTING for further information.
- Issues and problems, including support questions, may be filed in our GitHub issue tracker
- IRC channel #omorfi on Freenode is particularly good for live support questions, suggestions and discussions
- omorfi-devel mailing list is good for longer, more involved discussions
You can always discuss in English or Finnish on any of the channels.
Code of conduct
See our code of conduct.