Omorfi–Open morphology of Finnish

This is a free/libre open source morphology of Finnish: a database, tools and APIs. It provides everything you need to build NLP applications that process Finnish words and texts.

Documentation

For more detailed information, see the GitHub pages for omorfi.

Citing and academic works

Citation information can be found in the file CITATION. For further details, see omorfi articles.

Downloading and further information

Omorfi packages can be downloaded from GitHub, or you can get the most current version using git. For more information, see Release policy.

Dependencies

  • hfst-3.15 or greater
  • python-3.5 or greater
  • libhfst-python
  • C++ compiler and libtool
  • GNU autoconf-2.64, automake-1.12 and a compatible pkg-config implementation

Optionally:

  • VISL CG 3
  • hfst-ospell-0.2.0 or greater, for spell-checking
  • Java 7 or greater, for Java bindings

For further information, see Installation instructions.

Installation

You can either download the language models of the previous release from the internet (Minimal installation) or compile them from the database (Normal installation). The former is recommended for new users and the latter for advanced users.

Normal installation (recommended)

./configure
make
make install

For further instructions, see Installation instructions.

Minimal installation

autoreconf -i
./configure
src/bash/omorfi-download.bash

This will download some of the pre-compiled dictionaries into your current working directory.

Python-bindings only

It is possible to install only the Python bindings via pip or anaconda. Features whose dependencies are not available in pip or anaconda, e.g. syntactic analysis and disambiguation using VISL CG 3, will not be available.

Anaconda

Usage

Omorfi can be used from the command line using the following commands:

  1. omorfi-disambiguate-text.sh: analyse and disambiguate
  2. omorfi-analyse-text.sh: analyse
  3. omorfi-spell.sh: spell-check and correct
  4. omorfi-segment.sh: morphologically segment
  5. omorfi-conllu.bash: analyse in CoNLL-U format
  6. omorfi-freq-evals.bash: analyse coverage and statistics
  7. omorfi-ftb3.bash: analyse in FTB-3 format (CoNLL-X)
  8. omorfi-factorise.bash: analyse in Moses-SMT factorised format
  9. omorfi-vislcg.bash: analyse in VISL CG 3 format
  10. omorfi-analyse-tokenised.sh: analyse word per line (faster)
  11. omorfi-generate.sh: generate word-forms from omor descriptions
  12. omorfi-download.bash: download language models from latest release
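
After installation it can be useful to confirm that the commands listed above are actually found on your PATH. The following is a minimal sketch, not part of omorfi: the helper name missing_commands is hypothetical, and it relies only on the standard shell built-in command -v.

```shell
# Hypothetical helper (not shipped with omorfi).
# Prints the name of every given command that is NOT found on PATH,
# one per line; prints nothing when all of them are found.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Check a few of the scripts from the list above.
missing_commands omorfi-analyse-text.sh omorfi-disambiguate-text.sh omorfi-download.bash
```

If any names are printed, the scripts were probably installed under a prefix whose bin directory is not on your PATH.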

For further details please refer to:

Programming APIs

Omorfi can be used via very simple programming APIs; the design is detailed in omorfi API design.

Using binary models

There are various binaries for language models that can be used with specialised tools like HFST. For further details, see our usage examples.

Troubleshooting

For full descriptions and archived problems, see Troubleshooting in the GitHub pages.

hfst-lexc: Unknown option

Update HFST.

ImportError (or other Python problems)

For the Python scripts to work, you need to install them to the same prefix as Python, or define PYTHONPATH, e.g. export PYTHONPATH=/usr/local/lib/python3.5/site-packages/ (adjust the prefix and the Python version in the path to match your system).
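
The path can also be derived from the running interpreter instead of hard-coding the Python version. This is a sketch under assumptions: OMORFI_PREFIX is a hypothetical variable name standing for the prefix you passed to configure (defaulting to /usr/local), and the lib/pythonX.Y/site-packages layout is assumed; some distributions use lib64 or dist-packages instead.

```shell
# Assumption: omorfi was installed under this prefix; override it if needed.
# OMORFI_PREFIX is a name chosen for this example, not an omorfi setting.
OMORFI_PREFIX="${OMORFI_PREFIX:-/usr/local}"

# Ask python3 for its own major.minor version, e.g. "3.9".
PYVER="$(python3 -c 'import sys; print("{}.{}".format(*sys.version_info[:2]))')"

# Assumed layout; check that this directory exists on your system.
SITE_DIR="${OMORFI_PREFIX}/lib/python${PYVER}/site-packages"

# Prepend the directory, keeping any entries already in PYTHONPATH.
export PYTHONPATH="${SITE_DIR}${PYTHONPATH:+:${PYTHONPATH}}"
```

Put these lines in your shell start-up file if you want the setting to persist across sessions.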

Processing text gets stuck / takes long

This can easily happen for legitimate reasons. It can be mitigated by filtering out overlong tokens or by processing texts in smaller pieces.
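
One way to filter overlong tokens before analysis is a small awk filter. This is a sketch, not an omorfi tool: the function name filter_overlong_tokens and the limit of 40 are assumptions, and tokens are taken to be whitespace-separated.

```shell
# Hypothetical limit; tune it for your data.
MAX_TOKEN_LEN="${MAX_TOKEN_LEN:-40}"

# Read text on stdin and drop every whitespace-separated token longer than
# MAX_TOKEN_LEN. Remaining tokens are joined with single spaces. A line that
# loses all of its tokens is printed as an empty line, so line counts are kept.
# Note: some awk implementations count bytes rather than characters, so words
# with letters such as ä or ö may count as slightly longer than they look.
filter_overlong_tokens() {
    awk -v max="$MAX_TOKEN_LEN" '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if (length($i) <= max) {
                out = (out == "" ? $i : out " " $i)
            }
        }
        print out
    }'
}
```

To process a text in smaller pieces as well, the filtered output can be piped to the standard split utility, e.g. filter_overlong_tokens < input.txt | split -l 1000 - piece. which writes files of at most 1000 lines each (piece.aa, piece.ab, and so on) that can then be analysed one at a time. The file name input.txt is a placeholder.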

Make gets killed

Get more RAM or swap space.

Contributing

Omorfi code and data are free/libre open source and community-driven. To participate, read the further information in CONTRIBUTING.

Contact

  • Issues and problems, including support questions, may be filed in our GitHub issue tracker
  • IRC channel #omorfi on Freenode is particularly good for live chat: support questions, suggestions and discussions
  • omorfi-devel mailing list is good for longer, more involved discussions

You can always discuss in English or Finnish on any of the channels.

Code of conduct

See our code of conduct.

Donations

A lot of omorfi development has been done in spare time and by volunteers. If you want to support Flammie, you can use GitHub's ❤️ Sponsor button or any of the services below:

Donate using Liberapay

Become a Patron!