k-dimensional B tree backed to a chunk store
This code is based on the original kdb tree paper and the algorithm description in "Data Structures and Algorithms in C++, 4th edition".
For an in-memory version of this algorithm, look at the kdb-tree package.
var kdbtree = require('kdb-tree-store')
var fdstore = require('fd-chunk-store')
var tmpdir = require('os').tmpdir()
var path = require('path')
var file = path.join(tmpdir, 'kdb-tree-' + Math.random())
var n = 5000
var kdb = kdbtree({
types: [ 'float32', 'float32', 'float32', 'uint32' ],
store: fdstore(1024, file)
})
var pending = n
for (var i = 0; i < n; i++) (function () {
var x = Math.random() * 200 - 100
var y = Math.random() * 200 - 100
var z = Math.random() * 200 - 100
var loc = Math.floor(Math.random() * 1000)
kdb.insert([x,y,z], loc, function (err) {
if (--pending === 0) check()
})
})()
function check () {
kdb.query([[-100,0],[0,5],[-50,-40]], function (err, pts) {
console.log(pts)
})
}
var kdbtree = require('kdb-tree-store')
Create a new kdb tree instance kdb
given opts
:
opts.types
- array of data types for each dimension plus the payload type at the endopts.store
- chunk store instanceopts.available
- next free chunk index to use, set if loading a previously saved file with data from'available'
events
Query for results with q
, an array of [min,max]
arrays for each dimension.
The results are given as an array of points in cb(err, results)
. Each element
in results
has a point
and value
property.
opts.depth
- add depth information to each matching point when true in adepth
property (default:false
)opts.index
- add[chunkIndex,pointIndex]
pairs to each matching point when true in anindex
property (default:false
)
Return a readable stream
of query results from the query q
.
Insert value
(a 32-bit integer) at a point pt
.
Remove all the points in a query q
, modified by these options:
opts.value
- only remove points that value this valueopts.filter(pt)
- only remove points where this function returns true. Points havepoint
andvalue
properties. Precedence overopts.value
.opts.index
- remove exactly one item by its[chunkIndex,pointIndex]
. Highest precedence.
Index n
of the next available chunk to use.
Save n
and pass as opts.available
to future kdb instances that load from the
same file.
These data types are provided under string aliases:
float
(float32
)double
(float64
)uint8
uint16
uint32
int8
int16
int32
buffer[BYTES]
- ex:buffer[10]
for 10 bytes
Otherwise, a data type must be an object with these properties:
t.read(buf, offset)
t.write(buf, value, offset)
t.size
(in bytes)t.min
t.max
t.cmp.eq(a, b)
t.cmp.lt(a, b)
t.cmp.lte(a, b)
t.cmp.gt(a, b)
t.cmp.gte(a, b)
The combined size of all the types in a chunk must be below the chunkLength of
the opts.store
given in the kdbtree()
constructor.
The kdb tree paper describes the resulting tree as balanced, but this module does not yet generate very balanaced trees in practice. Some help on this part would be great!
The splitting plane is not yet chosen very well, looking only at the median of the presently overfull point page along the depth modulo dimension axis.
Here is a histogram of depths (right column) for 15000 points under the current implementation:
$ node example/depth.js 15000 | uniq -c
2876 2
2487 4
2825 5
274 6
1204 7
1990 8
1223 9
1092 10
338 11
242 13
124 14
208 15
117 17
npm install kdb-tree-store
BSD