DAWGs (directed acyclic word graphs) are minimal, highly compressed dictionaries implemented using deterministic acyclic finite state automata, providing lightning fast lookup speeds.
- Prefix data structure (just like a trie)
- Optimal lookup complexity O(length)
- Small size, fast loading (stored as an array of ints)
Taken from: http://goo.gl/OyUPJV
Building in C++
A single C++ source file "dawgdic-build.cpp" which uses the dawgdic library is included in this package to build a dawg from a lexicon file, which has sorted lexicons separated by line feeds (not CRLF), and write it to disk. This code is not rigorously tested, so may give nasty suprises!
g++ -O3 dawgdic-build.cpp -o dawgdic-build dawgdic-build lexicons.txt lexicon-dawg.dawg
Building in Python
Using the DAWG module, building the dawg is as simple as
import dawg d = dawg.DAWG(data) d.save('lexicon-dawg.dawg')