🇫🇮Omorfi–Open morphology of Finnish

This is a free/libre open source morphology of Finnish: a database, tools and APIs. It provides everything you need to build NLP applications that process Finnish words and texts.

🇫🇮 high-quality Finnish text analysis and generation
🩸 bleeding edge
⚡ blazing fast

Documentation

I try to keep this README very condensed for github. For more detailed information, see github pages for omorfi.

Citing and academic works

Citation information is available in github's cite this repository function, backed by the CITATION.cff. For further details, see omorfi articles.

Downloading and further information

Omorfi source packages can be downloaded from github:

omorfi releases

or the most current version using git. For more information, see Release policy

Dependencies

hfst-3.15 or greater
python-3.5 or greater
libhfst-python
C++ compiler and libtool
GNU autoconf-2.64, automake-1.12; compatible pkg-config implementation

Optionally:

VISL CG 3
hfst-ospell-0.2.0 or greater needed for spell-checking
Java 7, or greater, for Java bindings
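Before building, it can help to check that the basic tools are on your PATH. The following is a minimal sketch, not part of omorfi: the tool names in REQUIRED_TOOLS are an assumption based on the commands mentioned in this README (hfst-lexc, autoreconf, make), and it only checks that the tools exist, not that their versions meet the minimums listed above.

```python
import shutil
import sys

# Assumed executable names; adjust to match your system's packages.
REQUIRED_TOOLS = ["hfst-lexc", "autoreconf", "make"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_recent_enough(minimum=(3, 5)):
    """Check the running interpreter against the minimum Python version."""
    return sys.version_info >= minimum


if not python_is_recent_enough():
    print("Python is older than 3.5")
for tool in missing_tools(REQUIRED_TOOLS):
    print("not found on PATH:", tool)
```

If nothing is printed, the listed tools were found. Version requirements such as hfst-3.15 still need to be checked separately.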

Installation

For detailed instructions and explanations of different options, see Installation instructions on the github pages site. This readme is a quick reference.

Full installation

Requires all dependencies to be installed.

autoreconf -i
./configure
make
make install

This installs binaries and scripts for all users in typical environments.

Minimal "installation"

Use this to skip language model building and run some of the scripts locally:

autoreconf -i
./configure
src/bash/omorfi-download.bash

This will download some of the pre-compiled dictionaries into your current working directory.

Python installation

It is possible to install within python via pip or anaconda. The dependencies that are not available in pip or anaconda will not be usable, e.g. syntactic analysis and disambiguation using VISL CG 3.

pip install omorfi

conda install -c flammie omorfi

Docker

It is possible to use omorfi with a ready-made docker container. There is a Dockerfile in src/docker/Dockerfile for that.

docker build -t "omorfi:Dockerfile" .
docker run -it "omorfi:Dockerfile" bash

Usage

Omorfi can be used from the command line using the following commands:

omorfi-disambiguate-text.sh: analyse and disambiguate
omorfi-analyse-text.sh: analyse
omorfi-spell.sh: spell-check and correct
omorfi-segment.sh: morphologically segment
omorfi-conllu.bash: analyse in CONLL-U format
omorfi-freq-evals.bash: analyse coverage and statistics
omorfi-ftb3.bash: analyse in FTB-3 format (CONLL-X)
omorfi-factorise.bash: analyse in Moses-SMT factorised format
omorfi-vislcg.bash: analyse in VISL CG 3 format
omorfi-analyse-tokenised.sh: analyse word per line (faster)
omorfi-generate.sh: generate word-forms from omor descriptions
omorfi-download.bash: download language models from latest release

For further details please refer to:

Programming APIs

Omorfi can be used via very simple programming APIs. The design is detailed in omorfi API design.

Using binary models

There are various binaries for language models that can be used with specialised tools like HFST. For further details, see our usage examples.

Troubleshooting

For full descriptions and archived problems, see: Troubleshooting in github pages

hfst-lexc: Unknown option

Update HFST.

ImportError (or other Python problems)

For the Python scripts to work, you need to install them to the same prefix as Python, or set PYTHONPATH, e.g. export PYTHONPATH=/usr/local/lib/python3.11/site-packages/
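To see whether Python can find the module at all, a small check like the one below may help. It is a sketch that assumes the package is importable under the name omorfi; the helper module_location is hypothetical and not part of omorfi. Run it with the same Python interpreter you use for the omorfi scripts.

```python
import importlib.util
import sys


def module_location(name):
    """Return the path Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "omorfi" is the assumed import name of the Python package.
location = module_location("omorfi")
if location is None:
    print("omorfi not found; Python searched these directories:")
    for path in sys.path:
        print("  ", path)
else:
    print("omorfi would be imported from", location)
```

If the site-packages directory of your installation prefix is missing from the printed list, adding it to PYTHONPATH as described above is the likely fix.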

Processing text gets stuck / takes long

This can easily happen for legitimate reasons. It can be reduced by filtering out overlong tokens or by processing texts in smaller pieces.
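Both workarounds can be applied as a preprocessing step before the text reaches omorfi. The sketch below is illustrative only: the cutoff of 64 characters and the piece size of 1000 lines are arbitrary assumptions, the function names are hypothetical, and tokens are split on whitespace, which is cruder than omorfi's own tokenisation.

```python
MAX_TOKEN_LENGTH = 64  # assumed cutoff, tune for your data
CHUNK_SIZE = 1000      # assumed number of lines per piece


def drop_overlong_tokens(line, max_length=MAX_TOKEN_LENGTH):
    """Remove whitespace-separated tokens longer than max_length from a line."""
    return " ".join(tok for tok in line.split() if len(tok) <= max_length)


def chunked(lines, size=CHUNK_SIZE):
    """Yield consecutive lists of at most `size` lines."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # last, possibly shorter, piece
        yield chunk
```

Each piece yielded by chunked can then be written to its own file and analysed separately, so a single problematic passage does not stall the whole run.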

Make gets killed

Get more RAM or swap space.

Contributing

Omorfi code and data are free and libre open source, and community-driven. To participate, read further information in CONTRIBUTING.

Contact

Issues and problems may be filed in our github issue tracker, including support questions
IRC channel #omorfi on OFTC is particularly good for live chat for support questions, suggestions and discussions
omorfi-devel mailing list is good for longer more involved discussions

You can always discuss in English or Finnish on any of the channels.

Code of conduct

See our code of conduct.

Donations

A lot of omorfi development has been done in spare time and by volunteers. If you want to support Flammie, you can use GitHub's ❤️Sponsor button or any of the services below:

Become a Patron!
