
sbsdev/wordhierarchy

 
 


wordhierarchy

This project provides a word hierarchy builder. It builds a tree from a set of words; a WordProcessor can then traverse the tree to generate, for example, regexps that match any of the words in the given set.

Example:

java -jar dist/wordhierarchy.jar Euch Euer Eure Eurer
 Eu -
  er 
  ch 
  re 
   r 

(?:Eu(?:er|ch|re(?:r)?))

The output of the command line program is the input partitioned into common parts of words. If a part of a word does not complete a word, a - is appended (above: Eu -). If a part of a word does indeed complete a word, no - is appended (above: er, ch, re, r).
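The partitioning described above can be approximated with a plain character trie whose single-child chains are merged when printing. The following is a minimal sketch under that assumption, not the project's actual implementation: the class `HierarchyPrintSketch`, the `Node` type and the method names are hypothetical. Children are kept in a `TreeMap`, so siblings come out alphabetically sorted, which differs from the order shown in the example output.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names are taken from the wordhierarchy library.
class HierarchyPrintSketch {
    // One node per character; completesWord marks the end of an input word.
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean completesWord;
    }

    // Inserts every word character by character into a fresh trie.
    static Node build(String... words) {
        Node root = new Node();
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.completesWord = true;
        }
        return root;
    }

    // Renders the hierarchy, one word part per line, indented by depth.
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            render(String.valueOf(e.getKey()), e.getValue(), 1, out);
        }
        return out.toString();
    }

    private static void render(String label, Node node, int depth, StringBuilder out) {
        // Merge chains of single children that do not complete a word into one part.
        while (!node.completesWord && node.children.size() == 1) {
            Map.Entry<Character, Node> only = node.children.entrySet().iterator().next();
            label += only.getKey();
            node = only.getValue();
        }
        for (int i = 0; i < depth; i++) {
            out.append(' ');
        }
        // Parts that do not complete a word get the " -" marker.
        out.append(label).append(node.completesWord ? "" : " -").append('\n');
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            render(String.valueOf(e.getKey()), e.getValue(), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(build("Euch", "Euer", "Eure", "Eurer")));
    }
}
```

For the four example words this prints `Eu -` followed by `ch`, `er`, `re` and the nested `r`, each indented by its depth.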

The last line of the output is a Java regexp that matches the set of words. It is built by the included RegexWordProcessor, which can easily be adapted to other regexp dialects (e.g. Perl).
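To illustrate how a regexp can be derived from such a tree, here is a rough sketch that walks a character trie and emits non-capturing groups. It is not the RegexWordProcessor itself: `RegexFromTrieSketch`, `Node`, `build`, `toRegex` and `escape` are hypothetical names, siblings are sorted alphabetically, and the outermost group is omitted, so the result differs slightly from the output shown above while matching the same words. The sketch also assumes the words contain no characters outside the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the API of the wordhierarchy library.
class RegexFromTrieSketch {
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean completesWord;
    }

    static Node build(String... words) {
        Node root = new Node();
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.completesWord = true;
        }
        return root;
    }

    // Letters and digits are emitted as-is; everything else is backslash-escaped.
    static String escape(char c) {
        return Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
    }

    // Returns a regex for all word endings reachable below the given node.
    static String toRegex(Node node) {
        if (node.children.isEmpty()) {
            return "";
        }
        List<String> alternatives = new ArrayList<>();
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            alternatives.add(escape(e.getKey()) + toRegex(e.getValue()));
        }
        // A single mandatory continuation needs no group.
        if (alternatives.size() == 1 && !node.completesWord) {
            return alternatives.get(0);
        }
        String group = "(?:" + String.join("|", alternatives) + ")";
        // If a word already ends here, whatever follows is optional.
        return node.completesWord ? group + "?" : group;
    }

    public static void main(String[] args) {
        // prints Eu(?:ch|er|re(?:r)?)
        System.out.println(toRegex(build("Euch", "Euer", "Eure", "Eurer")));
    }
}
```

Targeting another regexp dialect would mostly mean changing the group syntax and the escaping rule in `escape`.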

The command-line invocation shown above is intended merely for demonstration purposes. The project is mainly meant to be used as a library.

Todo

  • works best for words that share a common prefix. Could be improved to handle common substrings anywhere in the word.

  • look at Trie

  • simplify using ideas from this post

  • improve when there's a row of single characters: Instead of e.g. (?:3|8|1|6|4) it could generate [38164].

  • improve command line: offer options to generate different regexp dialects.
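The single-character item in the list above could be approached roughly as follows. This is only a sketch of the idea, not code from the project: `CharClassSketch` and `alternation` are hypothetical names, and the helper assumes each alternative is a plain literal made of letters or digits, so no escaping inside the character class is needed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed improvement; not part of the library.
class CharClassSketch {
    // Joins alternatives into a character class when all of them are single
    // letters or digits, and into a non-capturing group otherwise.
    static String alternation(List<String> alternatives) {
        boolean allSingleChars = alternatives.size() > 1;
        for (String alternative : alternatives) {
            if (alternative.length() != 1
                    || !Character.isLetterOrDigit(alternative.charAt(0))) {
                allSingleChars = false;
            }
        }
        if (allSingleChars) {
            return "[" + String.join("", alternatives) + "]";
        }
        return "(?:" + String.join("|", alternatives) + ")";
    }

    public static void main(String[] args) {
        // prints [38164]
        System.out.println(alternation(Arrays.asList("3", "8", "1", "6", "4")));
        // prints (?:er|ch|re)
        System.out.println(alternation(Arrays.asList("er", "ch", "re")));
    }
}
```

Mixed cases, such as a few single characters next to longer alternatives, fall back to the group form here; splitting them would be a further refinement.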

Build

mvn -B package

Release

mvn release:prepare
mvn release:perform

Authors

Bernhard Wagner

License

Copyright 2011 Bernhard Wagner.

Licensed under the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
