Java implementation of LemmaGen project


JLemmaGen is a Java implementation of the LemmaGen project. It is an open source lemmatizer with 15 prebuilt European lexicons. You can also build your own lexicon.

The LemmaGen project aims to provide a standardized open source multilingual platform for lemmatisation.

The project contains 2 libraries:

  • lemmagen.jar - implementation of the lemmatizer and an API for building your own lemmatizers
  • lemmagen-lang.jar - prebuilt lemmatizers from Multext Eastern dictionaries. IMPORTANT: see the License chapter.

Sample Usage

Lemmatizer lm = LemmatizerFactory.getPrebuilt("mlteast-en");
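Once a lemmatizer is loaded, a typical next step is to run it over every word of a text. The following is a minimal sketch of that loop, not part of the library. The lemmatizer is passed in as a plain function so the example compiles without lemmagen.jar. The class name `LemmatizeTokens`, the helper `lemmatizeAll` and the toy lemmatizer are hypothetical, and the idea of passing a method reference such as `lm::lemmatize` assumes the `Lemmatizer` type exposes a word-level method of that name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class LemmatizeTokens {

    // Applies a word-level lemmatizer to each whitespace-separated token.
    // With lemmagen.jar on the classpath one would pass something like
    // lm::lemmatize here (assumed method name, not confirmed by this README).
    static List<String> lemmatizeAll(String text, Function<String, String> lemmatizer) {
        List<String> lemmas = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                lemmas.add(lemmatizer.apply(token));
            }
        }
        return lemmas;
    }

    public static void main(String[] args) {
        // Toy stand-in for a real lemmatizer, for demonstration only.
        Function<String, String> toy = w -> w.equals("are") ? "be" : w;
        System.out.println(lemmatizeAll("They are here", toy));
    }
}
```

Keeping the lemmatizer behind a `Function<String, String>` also makes it easy to swap in a different lexicon or a test double without touching the tokenizing loop.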



    <name>JLemmaGen snapshot repository</name>



Additionally you can add language dictionaries:


Lucene (Solr)

You need these jars to integrate with Lucene/Solr:

  • lemmagen-lucene.jar
  • lemmagen.jar
  • lemmagen-lang.jar
  • SLF4J API and implementation (e.g. slf4j-jdk14.jar)

Example of a Solr filter definition in the schema (e.g. Slovak):

<filter class="org.apache.lucene.analysis.lemmagen.LemmagenFilterFactory" lexicon="mlteast-sk"/>

Making a release

mvn clean release:prepare release:perform -Darguments='-Dmaven.javadoc.failOnError=false'
git push --follow-tags


License

All source code is licensed under the Apache License 2.0. Note that the binary rule tree files (*.lem) are NOT licensed under the Apache License 2.0 and can be used only for non-commercial projects.
