Problem
Mingo has a great feature set for transforming collections using MongoDB queries; however, it is still quite slow on large datasets (#71) compared to other JavaScript utility libraries.
It is also currently unsuitable for use cases that require streaming query results as they are computed (see #38). The current iterator interface on Cursor, with next() and hasNext(), is only a facade: it processes the entire input source on first use.
The initial goal of making Mingo as feature-complete as possible with regard to queries has mostly been achieved. It currently supports most of the useful queries required for object transformations, with the exception of geospatial queries. In view of that, it is useful to direct focus toward improving the performance of the library to make it competitive with existing alternatives.
Following the major refactoring (34c5ac7) and the port to ES6, the library is now particularly well structured for optimization with little disruption.
Solution
Support streaming for pipeline operators by transforming the input sequence lazily. Operations that need to consume the whole sequence (e.g. $group and $sort) should also produce iterators on their output.
Support any iterable object (i.e. Object{next:Function}) as input, producing values lazily for processing.
Keep the implementation minimal, with a small footprint, in keeping with the spirit of taking no dependencies.
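To make the proposal concrete, here is a minimal sketch of what lazy pipeline stages could look like using ES6 generators. All function names (toSequence, lazyMatch, lazyProject, lazyLimit, lazySort) are hypothetical and are not part of Mingo's API; the sketch only illustrates the idea of pulling values on demand from an Object{next:Function} source, and of a whole-sequence stage that still exposes an iterator on its output.

```javascript
// Hypothetical helpers for illustration only; not Mingo's actual API.

// Wrap an array, an iterable, or a bare { next } object in a generator.
function* toSequence(source) {
  if (source != null && typeof source[Symbol.iterator] === "function") {
    yield* source;
    return;
  }
  if (source != null && typeof source.next === "function") {
    for (let r = source.next(); !r.done; r = source.next()) yield r.value;
    return;
  }
  throw new TypeError("source must be iterable or expose next()");
}

// Streaming stages: each one pulls a single item at a time from upstream.
function* lazyMatch(seq, predicate) {
  for (const item of seq) if (predicate(item)) yield item;
}

function* lazyProject(seq, fn) {
  for (const item of seq) yield fn(item);
}

function* lazyLimit(seq, n) {
  if (n <= 0) return;
  let count = 0;
  for (const item of seq) {
    yield item;
    if (++count >= n) return; // stop pulling from upstream once satisfied
  }
}

// A whole-sequence stage: it must buffer its input, but it still hands
// its output to the next stage as an iterator.
function* lazySort(seq, compare) {
  yield* Array.from(seq).sort(compare);
}

// Usage: a source that counts how many items were actually pulled.
let pulled = 0;
const source = {
  next() {
    pulled += 1;
    return pulled <= 1000
      ? { value: { n: pulled }, done: false }
      : { value: undefined, done: true };
  },
};

const pipeline = lazyLimit(
  lazyProject(lazyMatch(toSequence(source), (d) => d.n % 2 === 0), (d) => d.n * 10),
  3
);
const result = Array.from(pipeline);
console.log(result, pulled); // [ 20, 40, 60 ] 6

const sorted = Array.from(lazySort(toSequence([3, 1, 2]), (a, b) => a - b));
console.log(sorted); // [ 1, 2, 3 ]
```

In this sketch only 6 of the 1000 available items are pulled from the source, because the limit stage stops requesting values once it has three results. An eager implementation would process the whole input at every stage.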
Initial Work
A first stab at replacing the current pipeline operators with lazy versions has yielded a significant performance gain (roughly 60% less run time, from about 3.5s to about 1.4s in total) when benchmarking with the unit tests.
// command: for i in `seq 3`; do time tape test/**/*.js >/dev/null; done
// Current: master
// tape test/**/*.js > /dev/null 3.61s user 0.13s system 100% cpu 3.733 total
// tape test/**/*.js > /dev/null 3.45s user 0.11s system 102% cpu 3.487 total
// tape test/**/*.js > /dev/null 3.44s user 0.11s system 102% cpu 3.461 total
// Lazy: (ae497acb0fae6ec92546d9339f10d8e7a6dacd1a)
// tape test/**/*.js > /dev/null 2.58s user 0.11s system 103% cpu 2.605 total
// tape test/**/*.js > /dev/null 2.54s user 0.10s system 103% cpu 2.557 total
// tape test/**/*.js > /dev/null 2.55s user 0.10s system 103% cpu 2.562 total
// Lazy: (latest)
// tape test/**/*.js > /dev/null 1.54s user 0.08s system 113% cpu 1.423 total
// tape test/**/*.js > /dev/null 1.52s user 0.08s system 113% cpu 1.411 total
// tape test/**/*.js > /dev/null 1.47s user 0.07s system 111% cpu 1.382 total
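For reference, the gain can be derived from the "total" column of the timings above. The snippet below is only a small arithmetic sketch; the variable and function names are made up for illustration.

```javascript
// Wall-clock totals (seconds) copied from the three runs of each variant above.
const totals = {
  master: [3.733, 3.487, 3.461],
  lazyFirst: [2.605, 2.557, 2.562], // ae497acb
  lazyLatest: [1.423, 1.411, 1.382],
};

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Compare a candidate against a baseline using mean run time.
function compareRuns(baseline, candidate) {
  const b = mean(baseline);
  const c = mean(candidate);
  return {
    reductionPct: (1 - c / b) * 100, // how much less time the candidate takes
    speedupFactor: b / c,            // how many times faster the candidate is
  };
}

const first = compareRuns(totals.master, totals.lazyFirst);
const latest = compareRuns(totals.master, totals.lazyLatest);

console.log(first.reductionPct.toFixed(1), first.speedupFactor.toFixed(2));
console.log(latest.reductionPct.toFixed(1), latest.speedupFactor.toFixed(2));
```

On these numbers the latest lazy version takes about 60% less time than master, which corresponds to running roughly 2.5 times faster.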