Support lazy evaluation and iterable input sources for performance #76

kofrasa · 2017-12-15T19:40:08Z

Problem

Mingo has a great feature set for transforming collections using MongoDB queries; however, it is still quite slow for large datasets (#71) compared to other JavaScript utility libraries.

It is also currently unsuitable for use cases that require streaming query results as they are computed (see #38). The current iterator interface on Cursor, with next() and hasNext(), is only a facade: it processes the entire input source on first use.
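The difference between the current facade and a truly lazy cursor can be sketched as follows. This is a minimal illustration, assuming a filter-only cursor; `EagerCursor`, `LazyCursor` and the `evaluated` counter are hypothetical names for this example, not Mingo's actual `Cursor` class. The eager version processes the whole input on the first `hasNext()`/`next()` call, while the lazy version pulls from the source only until it finds the next match.

```typescript
// Hypothetical sketch: neither class is Mingo's real Cursor; names are illustrative.
type Predicate<T> = (item: T) => boolean;

// Eager "facade": the first hasNext()/next() call filters the WHOLE source.
class EagerCursor<T> {
  private results: T[] | null = null;
  private pos = 0;
  private readonly source: T[];
  private readonly pred: Predicate<T>;
  evaluated = 0; // number of source items examined so far

  constructor(source: T[], pred: Predicate<T>) {
    this.source = source;
    this.pred = pred;
  }

  private materialize(): T[] {
    if (this.results === null) {
      this.results = this.source.filter((x) => {
        this.evaluated++;
        return this.pred(x);
      });
    }
    return this.results;
  }

  hasNext(): boolean {
    return this.pos < this.materialize().length;
  }

  next(): T | undefined {
    return this.hasNext() ? this.materialize()[this.pos++] : undefined;
  }
}

// Lazy cursor: examines source items only until the next match is found.
class LazyCursor<T> {
  private readonly it: Iterator<T>;
  private readonly pred: Predicate<T>;
  private buffered: IteratorResult<T> | null = null;
  evaluated = 0;

  constructor(source: Iterable<T>, pred: Predicate<T>) {
    this.it = source[Symbol.iterator]();
    this.pred = pred;
  }

  // Pull from the source only until the next match (or the end) is reached.
  private peek(): IteratorResult<T> {
    if (this.buffered !== null) return this.buffered;
    for (;;) {
      const r = this.it.next();
      if (r.done) {
        this.buffered = r;
        return r;
      }
      this.evaluated++;
      if (this.pred(r.value)) {
        this.buffered = r;
        return r;
      }
    }
  }

  hasNext(): boolean {
    return !this.peek().done;
  }

  next(): T | undefined {
    const r = this.peek();
    if (r.done) return undefined;
    this.buffered = null; // consume the buffered match
    return r.value;
  }
}

// Usage: fetch only the first matching item from 1000 inputs.
const data = Array.from({ length: 1000 }, (_, i) => i);
const isEven = (n: number): boolean => n % 2 === 0;
const eager = new EagerCursor(data, isEven);
const lazy = new LazyCursor(data, isEven);
eager.next();
lazy.next();
console.log(eager.evaluated, lazy.evaluated); // 1000 1
```

Both cursors return the same first result, but the eager one has already scanned all 1000 inputs, which is exactly what makes streaming impossible with the facade.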

The initial goal of making Mingo as feature-complete as possible with regard to queries has mostly been achieved. It currently supports most of the queries needed for object transformations, with the exception of geospatial queries. In view of that, it is worth directing focus toward improving the performance of the library to make it competitive with existing alternatives.

Following the major refactoring (34c5ac7) and the port to ES6, the library is now particularly well structured for optimization with little disruption.

Solution

Support streaming for pipeline operators by transforming the input sequence lazily. Operators that need to transform the whole sequence (e.g. $group and $sort) should also produce iterators over their output.
Support any iterator-like object (i.e. Object{next:Function}) as an input source that produces values lazily for processing.
Keep the implementation minimal, with a small footprint, in keeping with the spirit of taking no dependencies.
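The goals above can be sketched with plain generator functions. This is a minimal illustration, assuming that an input source is any object whose `next()` follows the JavaScript iterator protocol (returning `{ value, done }`); the stage functions `match`, `project`, `limit` and `sortBy`, and the helper `fromSource`, are hypothetical names chosen to mirror `$match`, `$project`, `$limit` and `$sort`, not Mingo's internals. Streaming stages pull one document at a time, while `sortBy` has to buffer its whole input but still hands back an iterator.

```typescript
// Hypothetical sketch: stage names mirror MongoDB operators but are NOT Mingo's API.

// Assumption: a source is any object whose next() follows the JS iterator protocol.
interface NextSource<T> {
  next(): IteratorResult<T>;
}

// Adapt a bare { next } object so that for...of and generators can consume it.
function fromSource<T>(src: NextSource<T>): Iterable<T> {
  return { [Symbol.iterator]: () => src };
}

// $match-like stage: yields matching documents one at a time.
function* match<T>(input: Iterable<T>, pred: (d: T) => boolean): Generator<T> {
  for (const d of input) if (pred(d)) yield d;
}

// $project-like stage: transforms each document only when it is pulled.
function* project<T, U>(input: Iterable<T>, fn: (d: T) => U): Generator<U> {
  for (const d of input) yield fn(d);
}

// $limit-like stage: stops pulling from upstream once n documents are emitted.
function* limit<T>(input: Iterable<T>, n: number): Generator<T> {
  if (n <= 0) return;
  let count = 0;
  for (const d of input) {
    yield d;
    if (++count >= n) return; // closes the upstream generators
  }
}

// $sort-like stage: must consume all input before emitting anything,
// but still returns an iterator so downstream stages remain lazy.
function* sortBy<T>(input: Iterable<T>, key: (d: T) => number): Generator<T> {
  const all = Array.from(input);
  all.sort((a, b) => key(a) - key(b));
  yield* all;
}

// Usage: an infinite { next } source is safe because limit() stops early.
type Doc = { n: number };
let i = 0;
let pulled = 0; // how many documents the pipeline actually requested
const counter: NextSource<Doc> = {
  next: (): IteratorResult<Doc> => {
    pulled++;
    return { value: { n: i++ }, done: false };
  },
};

const firstDoubled = Array.from(
  limit(
    project(
      match(fromSource(counter), (d) => d.n % 2 === 0),
      (d) => ({ double: d.n * 2 }),
    ),
    3,
  ),
);
console.log(JSON.stringify(firstDoubled), pulled);
// [{"double":0},{"double":4},{"double":8}] 5

const sorted = Array.from(sortBy([{ n: 3 }, { n: 1 }, { n: 2 }], (d) => d.n));
console.log(sorted.map((d) => d.n)); // [ 1, 2, 3 ]
```

Because `limit` stops after three results, the infinite `counter` source is advanced only five times. Generators are built into the language, so this style of pipeline needs no external dependencies.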

Initial Work

A first stab at replacing the current pipeline operators with lazy versions has yielded a significant performance gain (~60% less total run time) when benchmarked with the unit tests.

// command: for i in `seq 3`; do time tape test/**/*.js >/dev/null; done

// Current: master
// tape test/**/*.js > /dev/null  3.61s user 0.13s system 100% cpu 3.733 total
// tape test/**/*.js > /dev/null  3.45s user 0.11s system 102% cpu 3.487 total
// tape test/**/*.js > /dev/null  3.44s user 0.11s system 102% cpu 3.461 total

// Lazy: (ae497acb0fae6ec92546d9339f10d8e7a6dacd1a)
// tape test/**/*.js > /dev/null  2.58s user 0.11s system 103% cpu 2.605 total
// tape test/**/*.js > /dev/null  2.54s user 0.10s system 103% cpu 2.557 total
// tape test/**/*.js > /dev/null  2.55s user 0.10s system 103% cpu 2.562 total

// Lazy: (latest)
// tape test/**/*.js > /dev/null  1.54s user 0.08s system 113% cpu 1.423 total
// tape test/**/*.js > /dev/null  1.52s user 0.08s system 113% cpu 1.411 total
// tape test/**/*.js > /dev/null  1.47s user 0.07s system 111% cpu 1.382 total
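To check the ~60% figure, the snippet below averages the `total` wall-clock seconds from the three runs of each build above and compares each build with master. It is plain arithmetic on the posted numbers; the labels `master`, `lazyFirst` and `lazyLatest` are just names for the three groups of runs.

```typescript
// Wall-clock "total" seconds from the three runs of each build posted above.
const runs = {
  master: [3.733, 3.487, 3.461],
  lazyFirst: [2.605, 2.557, 2.562], // first lazy commit (ae497acb)
  lazyLatest: [1.423, 1.411, 1.382],
};

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;
const base = mean(runs.master);

const summary = Object.entries(runs).map(([name, times]) => {
  const m = mean(times);
  return {
    name,
    meanSeconds: m,
    reductionPct: (1 - m / base) * 100, // % less wall-clock time than master
    speedup: base / m, // how many times faster than master
  };
});

for (const s of summary) {
  console.log(
    `${s.name}: ${s.meanSeconds.toFixed(3)}s, ${s.reductionPct.toFixed(1)}% less time, ${s.speedup.toFixed(2)}x`,
  );
}
```

Run as-is, it reports about 27.7% less time (1.38x) for the first lazy commit and about 60.5% less time (2.53x) for the latest build, so the ~60% is best read as a reduction in run time, roughly a 2.5x speedup.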


kofrasa · 2018-01-11T07:43:57Z

v2.1.1 adds support for lazy evaluation under the hood.

More performance improvements will follow in v2.2.0.

kofrasa added the enhancement label Dec 15, 2017

kofrasa closed this as completed Jan 11, 2018
