A pure-Java reimplementation of Morfeusz, a morphological analyzer for Polish.
The vast majority of code in this repository was written by an LLM (Claude Opus 4.6). The code was then lightly scrutinized and vetted by a human (Daniel Janus) to verify that it does what it says on the label; however, hallucinations are still possible.
Use JMorfeusz at your own risk. If you do so, you are strongly encouraged to do your own code review before use.
At the moment, JMorfeusz only supports inflectional analysis; the synthesizer part of the original Morfeusz is not implemented.
JMorfeusz’s output has been cross-validated against the original Morfeusz on the full text of “Quo Vadis”, yielding a 100% match (see tests).
The interface is very similar (though not identical) to the one offered by the official (JNI-based) wrappers for the original Morfeusz. See also the original documentation (in Polish).
import java.nio.file.Paths;
import java.util.List;
// Create analyzer instance (searches default paths for dictionary)
Morfeusz morfeusz = Morfeusz.createInstance();
// Or specify the dictionary path explicitly (use this instead of the line above)
// Morfeusz morfeusz = Morfeusz.createInstance(Paths.get("/path/to/sgjp-a.dict"));
// Analyze text - returns a DAG of morphological interpretations
List<MorphInterpretation> results = morfeusz.analyze("Miałem miał.");
// Process results
for (MorphInterpretation interp : results) {
    String tag = morfeusz.getTag(interp.getTagId());
    System.out.printf("%d->%d: %s / %s [%s]\n",
            interp.getStartNode(), interp.getEndNode(),
            interp.getOrth(), interp.getLemma(), tag);
}
Example output:
0->1: Miał / mieć [praet:sg:m1.m2.m3:imperf]
0->2: Miałem / miał [subst:sg:inst:m3]
1->2: em / być [aglt:sg:pri:imperf:wok]
2->3: miał / miał [subst:sg:nom.acc:m3]
2->3: miał / mieć [praet:sg:m1.m2.m3:imperf]
3->4: . / . [interp]
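Because the result is a DAG rather than a flat list, one segmentation of the input corresponds to one path from the first node to the last. The sketch below enumerates those paths for the example output above. It is self-contained and does not use JMorfeusz itself: the `Edge` record and the `DagPaths` class are hypothetical stand-ins (requiring Java 16 or later), and in real code the same fields would presumably come from the `MorphInterpretation` getters shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class DagPaths {
    // Hypothetical stand-in for MorphInterpretation; not part of JMorfeusz.
    record Edge(int start, int end, String orth, String lemma, String tag) {}

    // The six interpretations from the example output above.
    static List<Edge> sample() {
        return List.of(
            new Edge(0, 1, "Miał", "mieć", "praet:sg:m1.m2.m3:imperf"),
            new Edge(0, 2, "Miałem", "miał", "subst:sg:inst:m3"),
            new Edge(1, 2, "em", "być", "aglt:sg:pri:imperf:wok"),
            new Edge(2, 3, "miał", "miał", "subst:sg:nom.acc:m3"),
            new Edge(2, 3, "miał", "mieć", "praet:sg:m1.m2.m3:imperf"),
            new Edge(3, 4, ".", ".", "interp"));
    }

    // Returns every path of edges leading from node `from` to node `to`.
    static List<List<Edge>> paths(List<Edge> edges, int from, int to) {
        List<List<Edge>> out = new ArrayList<>();
        walk(edges, from, to, new ArrayList<>(), out);
        return out;
    }

    // Depth-first search; terminates because every edge has start < end.
    private static void walk(List<Edge> edges, int node, int to,
                             List<Edge> prefix, List<List<Edge>> out) {
        if (node == to) {
            out.add(new ArrayList<>(prefix)); // copy, since prefix is mutated below
            return;
        }
        for (Edge e : edges) {
            if (e.start() == node) {
                prefix.add(e);
                walk(edges, e.end(), to, prefix, out);
                prefix.remove(prefix.size() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        for (List<Edge> path : paths(sample(), 0, 4)) {
            System.out.println(path.stream()
                .map(e -> e.orth() + "/" + e.lemma())
                .collect(Collectors.joining(" ")));
        }
    }
}
```

For the sample data this yields four paths: two that split “Miałem” into “Miał” + “em”, and two that treat it as a single noun form, each combined with the two readings of “miał”.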
JMorfeusz includes a simple command-line interface, akin to the morfeusz_analyzer program that's part of the original Morfeusz (and producing output in the same format). Try:
$ echo 'Miałem miał.' | java -cp jmorfeusz-0.1.0-SNAPSHOT-sgjp.jar pl.sgjp.jmorfeusz.Analyzer
Binary packages of JMorfeusz are distributed via GitHub Packages. There are two flavours: one with a bundled dictionary (SGJP), and one without.
Follow this guide to configure an access token in your settings.xml, then add to your pom.xml:
<dependency>
    <groupId>pl.sgjp</groupId>
    <artifactId>jmorfeusz</artifactId>
    <version>0.1.0-SNAPSHOT</version>
    <classifier>sgjp</classifier>
</dependency>
Remove the classifier if you want a thin dependency without a bundled dictionary.
To be written...
The person responsible for JMorfeusz (the equivalent of “author” for human-written projects, overseeing LLMs and verifying their output) is Daniel Janus.
JMorfeusz is based on Morfeusz. The copyright holder of the original Morfeusz is the Institute of Computer Science, Polish Academy of Sciences.
Some binary artifacts of JMorfeusz bundle the SGJP dictionary, which is copyrighted by Zygmunt Saloni, Włodzimierz Gruszczyński, Marcin Woliński, Robert Wołosz, and Danuta Skowrońska, and distributed under the same 2-clause BSD license as JMorfeusz (see below).
2-clause BSD, the same as Morfeusz. See LICENSE.txt and LICENSE.pl.txt.