Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
C++ Java
branch: master
Failed to load latest commit information.
build-dawg-cpp Clean up
src First commit!
README.md Update README.md

README.md

dawg-java

DAWGs (directed acyclic word graphs) are minimal, highly compressed dictionaries implemented using deterministic acyclic finite state automata, providing lightning fast lookup speeds.

Some properties:

  • Prefix data structure (just like a trie)
  • Optimal lookup complexity O(length)
  • Small size, fast loading (stored as an array of ints)

This Java class provides read-only access to files created by the dawgdic C++ library and DAWG Python module.

Taken from: http://goo.gl/OyUPJV

Building in C++

A single C++ source file "dawgdic-build.cpp" which uses the dawgdic library is included in this package to build a dawg from a lexicon file, which has sorted lexicons separated by line feeds (not CRLF), and write it to disk. This code is not rigorously tested, so may give nasty suprises!

g++ -O3 dawgdic-build.cpp -o dawgdic-build
dawgdic-build lexicons.txt lexicon-dawg.dawg

Building in Python

Using the DAWG module, building the dawg is as simple as

import dawg
d = dawg.DAWG(data)
d.save('lexicon-dawg.dawg')
Something went wrong with that request. Please try again.