Support lazy evaluation and iterable input sources for performance #76

Closed
kofrasa opened this issue Dec 15, 2017 · 1 comment

Comments

@kofrasa
Owner

kofrasa commented Dec 15, 2017

Problem

Mingo has a great feature set for transforming collections using MongoDB queries. However, it is still quite slow on large datasets (#71) compared to other JavaScript utility libraries.

It is also currently unsuitable for use cases that require streaming query results as they are computed (see #38). The current iterator interface on Cursor, with next() and hasNext(), is only a facade: it processes the entire input source on first use.
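
To make the difference concrete, here is a minimal sketch of an eager "facade" cursor next to a truly lazy one. It is not Mingo's actual code: `eagerCursor`, `lazyCursor`, and `countCalls` are hypothetical names invented for illustration, and the only thing assumed from Mingo is the next()/hasNext() shape described above.

```javascript
// Hypothetical illustration only; none of these names are Mingo's real API.

// Eager "facade": looks like an iterator, but the first call does all the work.
function eagerCursor(source, transform) {
  let results = null;
  let index = 0;
  const materialize = () => {
    if (results === null) results = source.map(transform);
  };
  return {
    hasNext() { materialize(); return index < results.length; },
    next() { materialize(); return results[index++]; },
  };
}

// Lazy cursor: transforms exactly one item per call to next().
function lazyCursor(source, transform) {
  let index = 0;
  return {
    hasNext() { return index < source.length; },
    next() { return transform(source[index++]); },
  };
}

// Count how many times the transform runs in order to fetch a single result.
function countCalls(makeCursor) {
  let calls = 0;
  const cursor = makeCursor([1, 2, 3, 4, 5], (x) => { calls++; return x * 2; });
  const first = cursor.next();
  return { first, calls };
}

const eager = countCalls(eagerCursor); // transform runs for all 5 items
const lazy = countCalls(lazyCursor);   // transform runs for 1 item
```

Both cursors return the same first value, but the eager one has already paid for the whole input by the time it does.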

The initial goal for Mingo to be as feature complete as possible with regard to queries has mostly been achieved. It currently supports most of the query operators needed for object transformations, with the exception of geospatial queries. In view of that, it is useful to direct focus toward improving the performance of the library to make it competitive with existing alternatives.

Following the major refactoring (34c5ac7) and the port to ES6, the library is now well structured and can be optimized with little disruption.

Solution

  • Support streaming for pipeline operators by transforming the input sequence lazily. Operators that need to see the whole sequence (e.g. $group and $sort) should also produce iterators as their output.
  • Support any iterable object (i.e. Object{next:Function}) as input, so that values are produced lazily for processing.
  • Keep the implementation minimal, with a small footprint, in the spirit of taking no dependencies.
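
The points above could be sketched roughly as follows. This is a dependency-free illustration, not the implementation that landed in Mingo: every helper name here (`fromArray`, `lazyMap`, `lazyFilter`, `take`, `lazySort`, `toArray`, `naturals`) is hypothetical, and a "source" is assumed to be any object whose next() returns `{ value, done }`.

```javascript
// Hypothetical sketch; these helpers are not Mingo's API.
// A "source" is any object with next() returning { value, done }.

function fromArray(items) {
  let i = 0;
  return {
    next: () => (i < items.length
      ? { value: items[i++], done: false }
      : { value: undefined, done: true }),
  };
}

// Drains a source into an array (only safe for finite sources).
function toArray(source) {
  const out = [];
  for (let r = source.next(); !r.done; r = source.next()) out.push(r.value);
  return out;
}

// Streaming operators: each next() pulls only as much input as it needs.
function lazyMap(source, fn) {
  return {
    next() {
      const r = source.next();
      return r.done ? r : { value: fn(r.value), done: false };
    },
  };
}

function lazyFilter(source, pred) {
  return {
    next() {
      let r = source.next();
      while (!r.done && !pred(r.value)) r = source.next();
      return r;
    },
  };
}

function take(source, n) {
  let taken = 0;
  return {
    next() {
      if (taken >= n) return { value: undefined, done: true };
      taken++;
      return source.next();
    },
  };
}

// Whole-sequence operator (in the spirit of $sort): it must consume all of its
// input, but it defers that work to the first next() and still yields an iterator.
function lazySort(source, compare) {
  let sorted = null;
  return {
    next() {
      if (sorted === null) sorted = fromArray(toArray(source).sort(compare));
      return sorted.next();
    },
  };
}

// An unbounded source: only usable because downstream operators are lazy.
function naturals() {
  let n = 0;
  return { next: () => ({ value: n++, done: false }) };
}

// Wrap the source to count how many items are actually pulled from it.
const base = naturals();
let pulled = 0;
const counted = { next() { pulled++; return base.next(); } };

const firstThreeEvenSquares = toArray(
  take(lazyMap(lazyFilter(counted, (n) => n % 2 === 0), (n) => n * n), 3)
);
const sortedOut = toArray(lazySort(fromArray([3, 1, 2]), (a, b) => a - b));
```

Because every stage pulls on demand, the pipeline over the unbounded `naturals()` source terminates after reading only the handful of items needed to produce three results. `lazySort` still has to read everything, so it only makes sense on finite input.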

Initial Work

A first stab at replacing the current pipeline operators with lazy versions has yielded a significant performance gain (~60% reduction in run time) when benchmarking with the unit tests.

// command: for i in `seq 3`; do time tape test/**/*.js >/dev/null; done

// Current: master
// tape test/**/*.js > /dev/null  3.61s user 0.13s system 100% cpu 3.733 total
// tape test/**/*.js > /dev/null  3.45s user 0.11s system 102% cpu 3.487 total
// tape test/**/*.js > /dev/null  3.44s user 0.11s system 102% cpu 3.461 total

// Lazy: (ae497acb0fae6ec92546d9339f10d8e7a6dacd1a)
// tape test/**/*.js > /dev/null  2.58s user 0.11s system 103% cpu 2.605 total
// tape test/**/*.js > /dev/null  2.54s user 0.10s system 103% cpu 2.557 total
// tape test/**/*.js > /dev/null  2.55s user 0.10s system 103% cpu 2.562 total

// Lazy: (latest)
// tape test/**/*.js > /dev/null  1.54s user 0.08s system 113% cpu 1.423 total
// tape test/**/*.js > /dev/null  1.52s user 0.08s system 113% cpu 1.411 total
// tape test/**/*.js > /dev/null  1.47s user 0.07s system 111% cpu 1.382 total
@kofrasa
Owner Author

kofrasa commented Jan 11, 2018

v2.1.1 adds support for lazy evaluation under the hood.

More performance improvements will follow in v2.2.0.

@kofrasa kofrasa closed this as completed Jan 11, 2018