A kind of Radix Tree or Trie data structure baptized 'Sofia Tree' in honour to my beloved daughter. A multinode tree with a level for each letter with fast lookup operations. See Anottated Source
- Non-recursive tree traverse implementation to keep memory usage low and speed up on lookup operations.
- Efficient in-memory data structure, no redis, no dependencies
- Optional built-in cache for even faster lookups.
- Optionally case sensitive
- Optionally limit number of results to get matches even faster
- Optionally exclude the given prefix if it's a valid word
- Batteries included example to benchmark its performance
npm install sofia-tree
var SofiaTree=require('sofia-tree');
var dictionary=["f","foo","foobar","b","bar","barbar"];
var sofiaTree= new SofiaTree({
useCache: true,
caseSensitive: false
});
dictionary.forEach(function(word){
sofiaTree.insert(word);
});
//Retrieve all the completions including the prefix
console.log(sofiaTree.getCompletions("foo"));
["foo","foobar"]
//Retrieve all the completions without the prefix itself
console.log(sofiaTree.getCompletions("foo",true));
["foobar"]
//Limit the number of results without excluding the prefix (defaults to false)
console.log(sofiaTree.getCompletions("f",2)))
["f","foobar"]
//Prints the whole dictionary
console.log(sofiaTree.getCompletions(""));
["f","foo","foobar","b","bar","barbar"]
See tests for more complete use cases
Clone the whole repo and run npm run example
to load a micro search engine, in another console run cd example && ./test.sh
to load a 350,000 words dictionary and launch 1,500 searchs with 100 threads in parallel. See Anottated Source and shell scripts
Actually when limiting the maximum number or matches to be retorned it won't necessarely return the first ones in alphabetical order. When returning all the results they won't be alphabetically sorted either. This is due the nature of javascript objects properties which are unordered by definition. Perhaps I'll switch properties for an array in the future.
Define nice jasmine tests and setup TravisComment the example API usage- Reduce prefixes stack memory comsuption
Partial cache invalidation mechanism when inserting new words (it actually resets the whole cache upon insertion of new words.)- Implement remove method
- Return results as stream
Case-sensitive lookups? Not sure if it's a valuable adition- Add weight on nodes to enable sort based on some business logic
- Any ideas? :)
Made with ❤ in Spain