A pure Clojure reimplementation of Concraft-pl, a morphosyntactic tagger for Polish based on constrained conditional random fields.
The vast majority of code in this repository was written by an LLM (Claude Opus 4.6). The code was then lightly scrutinized and vetted by a human (Daniel Janus) to verify that it does what it says on the label; however, hallucinations are still possible.
Note that this is not a clean-room implementation: the LLM had access to the original source code. Thus, clj-concraft should be viewed as an LLM-facilitated translation rather than an original implementation.
Use clj-concraft at your own risk. If you do so, you are strongly encouraged to do your own code review before use.
At the moment, clj-concraft only supports the tagging pipeline; it is unable to train models. It is, however, able to reuse models trained by the original Concraft-pl (it supports the same binary format).
The output of clj-concraft has been cross-validated with original Concraft-pl on the example input, yielding a 100% match.
There are no packaged releases yet, but you can use clj-concraft as a Git dependency from within a deps.edn-based project. Add this to your deps.edn:
clj-concraft/clj-concraft {:git/url "https://github.com/nathell/clj-concraft.git"
                           :git/sha "dae363bdc73473106f1158ef4bb673926c2c21e6"}
Replace the :git/sha with the most recent commit ID from this repo.
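For orientation, here is a minimal sketch of how a complete deps.edn might look with this entry in place. It assumes a project with no other dependencies; the coordinate and URL are the ones given above, and the :git/sha is only the example value, which you should replace with a current commit ID.

```clojure
;; deps.edn (sketch: assumes clj-concraft is the project's only dependency)
{:deps
 {clj-concraft/clj-concraft
  {:git/url "https://github.com/nathell/clj-concraft.git"
   ;; Example SHA only; replace with the most recent commit ID from the repo.
   :git/sha "dae363bdc73473106f1158ef4bb673926c2c21e6"}}}
```

If your deps.edn already has a :deps map, add only the clj-concraft entry to it rather than replacing the whole file.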
Like the original, clj-concraft is coupled with Morfeusz, the morphological analyzer for Polish. For ease of interoperation and to keep everything on the JVM, you may want to use it with JMorfeusz instead.
See the szlauch project for an example.
The person responsible for clj-concraft (the equivalent of “author” for human-written projects, overseeing LLMs and verifying their output) is Daniel Janus.
clj-concraft is based on Concraft-pl and Concraft, which were written and are copyrighted by Jakub Waszczuk.
clj-concraft is licensed under the 2-clause BSD license, the same as Concraft-pl (see LICENSE.txt).