Skip to content

rasmuserik/trie-experiment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Trie experiments

This is an experimental implementation of a trie, which is a key/value-mapping and a persistent data structure. It is more cache aware / sensitive for access patterns than HAMT, so it is difficult to benchmark versus other persistent data structures, but initial experiments indicates it might be twice as fast as immutable-js, which seems faster than mori.

The vision is a compact key/value-datastructure/database, that can be synchronised to disk and across network, and is also useful for functional-reactive-programming. The intention is to merge ideas from the Trie, the B-Tree and compressed/succint datastructures.

Currently this is just initial steps.

Bon - Binary Object Notation

The trie works only with binary data. To store strings and other objects, they need to be encoded into byte arrays. The following properties are desired:

  • Fast, - for every key lookup/store, the key has to be converted to a byte array, so the encoding needs to be fast, - preliminary benchmark shows that Bon is about twice as fast as JSON.stringify.
  • Prefix-free, - No encoded object should be a prefix of another, - this makes the trie-implementation simpler, as data is only stored in the leafs. Example: the string "the" is a prefix of "there", but zero-terminated strings "the\0" is not a prefix of "there\0".
  • Compact - the shorter the better, though for tradeofs it is generally a higher priority to be fast.
  • Different data types, - not only strings, but also integers, numbers, objects, arrays, ... should be encodeable
  • Order preserving. The lexicograhical sorted binary strings should preserver the order of the sorted strings, ie "hello" < "hi", and -10 < -5 < 20 < 300 < 1000. For performance reason order of floating point numbers are not preserved.

Trie

Nodes:

  • prefix-free strings are easily enforced, and thus data is only stored in leafs / empty nodes
  • special case for unary nodes and binary nodes for performance yields better performance
  • sorted-node ie. linear list of symbols, binary search during lookup.
  • 4+4-node, ie two levels of 16-entry tables (second level only allocated when needed).

Performance:

  • performance of 4+4 and linear nodes are quite similar, though for < 50 symbols linear are typically peforms better, and vice versa.
  • large performance degradation when linear nodes crosses the 127-symbol size on spidermonkey/firefox.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published