nathell/jmorfeusz

jmorfeusz

A pure-Java reimplementation of Morfeusz, a morphological analyzer for Polish.

LLM disclosure

The vast majority of code in this repository was written by an LLM (Claude Opus 4.6). The code was then lightly scrutinized and vetted by a human (Daniel Janus) to verify that it does what it says on the label; however, hallucinations are still possible.

Use JMorfeusz at your own risk. If you do so, you are strongly encouraged to do your own code review before use.

Status

At the moment, JMorfeusz only supports inflectional analysis; the synthesizer part of the original Morfeusz is not implemented.

JMorfeusz’s output has been cross-validated against the original Morfeusz on the full text of “Quo Vadis”, yielding a 100% match (see tests).

Usage

As a library

The interface is very similar (albeit not identical) to the one offered by the official (JNI-based) wrappers for the original Morfeusz. See also the original documentation (in Polish).

// Needs java.util.List and java.nio.file.Paths, plus the JMorfeusz classes.

// Create analyzer instance (searches default paths for dictionary)
Morfeusz morfeusz = Morfeusz.createInstance();

// Or specify the dictionary path explicitly (use instead of the line above,
// since both declare the same variable):
// Morfeusz morfeusz = Morfeusz.createInstance(Paths.get("/path/to/sgjp-a.dict"));

// Analyze text - returns the edges of a DAG of morphological interpretations
List<MorphInterpretation> results = morfeusz.analyze("Miałem miał.");

// Process results
for (MorphInterpretation interp : results) {
    String tag = morfeusz.getTag(interp.getTagId());
    System.out.printf("%d->%d: %s / %s [%s]\n",
        interp.getStartNode(), interp.getEndNode(),
        interp.getOrth(), interp.getLemma(), tag);
}

Example output:

0->1: Miał / mieć [praet:sg:m1.m2.m3:imperf]
0->2: Miałem / miał [subst:sg:inst:m3]
1->2: em / być [aglt:sg:pri:imperf:wok]
2->3: miał / miał [subst:sg:nom.acc:m3]
2->3: miał / mieć [praet:sg:m1.m2.m3:imperf]
3->4: . / . [interp]
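
Each output line is one edge of the DAG: the two numbers are the start and end nodes, and alternative segmentations share nodes (here, the edge 0->2 "Miałem" competes with 0->1 "Miał" followed by 1->2 "em"). As a rough illustration of how such a graph can be read, the sketch below enumerates the distinct segmentations it encodes. It deliberately does not use JMorfeusz itself: the Edge record and the DagPaths class are hypothetical stand-ins that only mirror the start node, end node and orth values printed above, and the code assumes that node 0 is the start and the highest node number is the end.

```java
import java.util.ArrayList;
import java.util.List;

public class DagPaths {
    /** Hypothetical stand-in for one printed edge; NOT a JMorfeusz class. */
    public record Edge(int start, int end, String orth) {}

    /** Returns every distinct segmentation from node 0 to the highest end node. */
    public static List<String> segmentations(List<Edge> edges) {
        List<String> out = new ArrayList<>();
        if (edges.isEmpty()) {
            return out;
        }
        // Assumption: the highest end node is the final node of the graph.
        int last = edges.stream().mapToInt(Edge::end).max().getAsInt();
        walk(edges, 0, last, new ArrayList<>(), out);
        return out;
    }

    // Depth-first walk that follows every edge leaving the current node.
    private static void walk(List<Edge> edges, int node, int last,
                             List<String> prefix, List<String> out) {
        if (node == last) {
            String path = String.join(" ", prefix);
            // Several interpretations can share one orth; keep each path once.
            if (!out.contains(path)) {
                out.add(path);
            }
            return;
        }
        for (Edge e : edges) {
            if (e.start() == node) {
                prefix.add(e.orth());
                walk(edges, e.end(), last, prefix, out);
                prefix.remove(prefix.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        // Edges copied from the example output above (lemma and tag omitted).
        List<Edge> edges = List.of(
            new Edge(0, 1, "Miał"),
            new Edge(0, 2, "Miałem"),
            new Edge(1, 2, "em"),
            new Edge(2, 3, "miał"),
            new Edge(2, 3, "miał"),
            new Edge(3, 4, "."));
        segmentations(edges).forEach(System.out::println);
    }
}
```

For the edges shown in the example output, this yields two segmentations: "Miał em miał ." and "Miałem miał .". A real consumer would iterate over the MorphInterpretation objects returned by analyze() rather than over hand-built records.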

Command line interface

JMorfeusz includes a simple command-line interface, akin to the morfeusz_analyzer program that's part of the original Morfeusz (and producing output in the same format). Try:

$ echo 'Miałem miał.' | java -cp jmorfeusz-0.1.0-SNAPSHOT-sgjp.jar pl.sgjp.jmorfeusz.Analyzer

Distribution

Binary packages of JMorfeusz are distributed via GitHub Packages. There are two flavours: one with a bundled dictionary (SGJP), and one without.

Follow the GitHub Packages documentation for Apache Maven to configure an access token in your settings.xml, then add this dependency to your pom.xml:

<dependency>
  <groupId>pl.sgjp</groupId>
  <artifactId>jmorfeusz</artifactId>
  <version>0.1.0-SNAPSHOT</version>
  <classifier>sgjp</classifier>
</dependency>

Remove the classifier if you want a thin dependency without a bundled dictionary.

Why use JMorfeusz instead of the original?

To be written...

Authors

The person responsible for JMorfeusz (the equivalent of “author” for human-written projects, overseeing LLMs and verifying their output) is Daniel Janus.

JMorfeusz is based on Morfeusz. The copyright holder of the original Morfeusz is the Institute of Computer Science, Polish Academy of Sciences.

Some binary artifacts of JMorfeusz bundle the SGJP dictionary, which is copyrighted by Zygmunt Saloni, Włodzimierz Gruszczyński, Marcin Woliński, Robert Wołosz, and Danuta Skowrońska, and distributed under the same 2-clause BSD license as described below.

License

2-clause BSD, the same as Morfeusz. See LICENSE.txt and LICENSE.pl.txt.
