A barebones implementation of a Directed Acyclic Word Graph (DAWG) in C.
Run make and then ./yodawg <prefix> to test.
To quote Wikipedia, a DAWG, or Directed Acyclic Word Graph, "is a data structure that represents a set of strings, and allows for a query operation that tests whether a given string belongs to the set in time proportional to its length. In these respects, a DAWG is very similar to a trie, but it is much more space efficient."
You can add a set of strings to a DAWG, then, for example, test to see if a string is in the set in O(n) time, where n is the length of the tested string.
You can also, in much slower time, but fairly compact space, get all the members of the set that begin with a given prefix. This is useful, for example, for autocomplete. And that is why I started researching DAWGs and eventually wrote yodawg.
A few reasons:
- To practice some slightly non-trivial C.
- To better understand DAWGs.
- In hopes that I might build a space-efficient autocomplete tool.
My goal with #3 is to build a Python module around yodawg that can store a DAWG very space-efficiently, and then write a simple web service that, given a prefix, returns the possible completions.
By writing yodawg in C, I hope it might be able to form the core of modules for other languages, like Ruby or Node.js, too.
- I am, shall we say, a beginner at C.
- I wouldn't use this in critical or production code.
- I haven't done much (any) performance (memory or time) testing yet.
- Patches very welcome, but please be nice (remember #1).
- Some form of unicode support.