Find multiwords in text using an Aho-Corasick automaton. Works for Mandarin and Finnish.
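A minimal sketch of the idea, not this package's actual API: the class name `AhoCorasick` and its `find` method are hypothetical, chosen for illustration. It assumes patterns are sequences of symbols, so the same automaton can run over characters (useful for unsegmented Mandarin text) or over tuples of lemmas (useful for Finnish, where surface forms inflect heavily).

```python
from collections import deque


class AhoCorasick:
    """Illustrative Aho-Corasick automaton over arbitrary hashable symbols."""

    def __init__(self, patterns):
        # Per node: transition table, failure link, patterns that end here.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        for pattern in patterns:
            self._add(pattern)
        self._build_links()

    def _add(self, pattern):
        node = 0
        for symbol in pattern:
            nxt = self.goto[node].get(symbol)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[node][symbol] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            node = nxt
        self.out[node].append(pattern)

    def _build_links(self):
        # Breadth-first, so every failure target is finished before it is used.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for symbol, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and symbol not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(symbol, 0)
                # Inherit matches that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def find(self, sequence):
        """Return (start_index, pattern) for every occurrence in sequence."""
        node = 0
        hits = []
        for i, symbol in enumerate(sequence):
            while node and symbol not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(symbol, 0)
            for pattern in self.out[node]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


# Character-level matching on unsegmented Mandarin text.
print(AhoCorasick(["乱七八糟"]).find("房间乱七八糟"))
# Lemma-level matching on a lemmatized Finnish sentence.
print(AhoCorasick([("ottaa", "huomio")]).find(["hän", "ottaa", "huomio", "kaikki"]))
```

All patterns are matched in a single pass over the input, which is the reason to prefer an automaton over checking each multiword separately.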
Find multiwords in text using the rarest lemma as a key. Can find contiguous multiwords in tokenized text or discontinuous ones from a dependency tree.
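A rough sketch of rarest-lemma indexing, under stated assumptions: the function names (`build_index`, `find_contiguous`, `find_in_tree`), the frequency table, and the head-index encoding of the dependency tree are all hypothetical and not taken from the package. The discontinuous matcher additionally assumes that a multiword's tokens must form a connected piece of the tree, which is one plausible criterion among several.

```python
from collections import defaultdict


def build_index(mwes, lemma_freq):
    """Map each multiword's rarest lemma to the multiwords containing it."""
    index = defaultdict(list)
    for mwe in mwes:
        # Lemmas missing from the frequency table count as rarest (assumption).
        key = min(mwe, key=lambda lemma: lemma_freq.get(lemma, 0))
        index[key].append(mwe)
    return index


def find_contiguous(lemmas, index):
    """Return sorted (start, mwe) pairs for multiwords found as adjacent tokens."""
    hits = set()
    for i, lemma in enumerate(lemmas):
        for mwe in index.get(lemma, ()):
            # The key lemma may sit at any position inside the multiword.
            for offset, part in enumerate(mwe):
                start = i - offset
                if (part == lemma and start >= 0
                        and tuple(lemmas[start:start + len(mwe)]) == mwe):
                    hits.add((start, mwe))
    return sorted(hits)


def find_in_tree(lemmas, heads, index):
    """Return (token_indices, mwe) pairs whose tokens are connected in the tree.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    neighbours = defaultdict(set)
    for child, head in enumerate(heads):
        if head >= 0:
            neighbours[child].add(head)
            neighbours[head].add(child)

    def extend(chosen, remaining):
        # Grow the chosen token set along tree edges until no lemma is left.
        if not remaining:
            return chosen
        frontier = {n for c in chosen for n in neighbours[c]} - set(chosen)
        for tok in sorted(frontier):
            if lemmas[tok] in remaining:
                rest = list(remaining)
                rest.remove(lemmas[tok])
                found = extend(chosen + [tok], rest)
                if found:
                    return found
        return None

    hits = []
    for i, lemma in enumerate(lemmas):
        for mwe in index.get(lemma, ()):
            rest = list(mwe)
            rest.remove(lemma)
            found = extend([i], rest)
            if found:
                hits.append((tuple(sorted(found)), mwe))
    return hits


# Made-up frequencies; "huomio" is rarer than "ottaa", so it becomes the key.
index = build_index([("ottaa", "huomio")], {"ottaa": 5000, "huomio": 300})
# "Hän otti asian huomioon": the multiword is split by "asia".
lemmas = ["hän", "ottaa", "asia", "huomio"]
heads = [1, -1, 1, 1]
print(find_contiguous(lemmas, index))
print(find_in_tree(lemmas, heads, index))
```

Keying on the rarest lemma means a candidate multiword is only checked when its least frequent member actually appears, so common lemmas such as "ottaa" trigger no lookups on their own.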
Processing pipeline for FinnMWE.
For now, the only documentation is the tests and a few docstrings.