Added .group() function #153

Open · wants to merge 28 commits
@adityamukho

Ref: http://docs.mongodb.org/manual/reference/method/db.collection.group/

Added a function to the Cursor prototype that lets users reduce the result set returned by find(), enabling operations like aggregation (sum, avg, count, etc.) or any other reducer. It works in conjunction with limit(), skip() and sort(). The order of the execution pipeline is: find -> group -> sort -> slice (skip + limit).

The function is invoked on the Cursor object returned by find(), with a spec object that looks like:

{
  key: {
    'field1': 1,
    'nested.field.2': 1
  },
  reduce: function (curr, result) {
    result.count++;
    result.size += curr.size;
  },
  initial: {
    count: 0,
    size: 0
  },
  finalize: function (result) {
    result.avg = Math.round(result.size / result.count);
  }
}

Unlike MongoDB's group(), this does not need a cond parameter, since the find() operation takes care of that filtering beforehand.

The function returns the Cursor on which it was invoked, allowing for chaining sort, skip/limit ops later.
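
For illustration, here is a minimal end-to-end sketch based on the description above. It is not taken from the PR itself: the db variable, the find() filter and the document shape are hypothetical, while the group spec is the one shown above.

// Hypothetical documents: { field1: 'A', nested: { field: { 2: 'b' } }, size: 2 }
db.find({ size: { $gt: 0 } })
  .group({
    key: { 'field1': 1, 'nested.field.2': 1 },
    reduce: function (curr, result) {
      result.count++;
      result.size += curr.size;
    },
    initial: { count: 0, size: 0 },
    finalize: function (result) {
      result.avg = Math.round(result.size / result.count);
    }
  })
  .sort({ field1: 1, nested_field_2: 1 }) // sort on the flattened key names (explained below)
  .skip(0)
  .limit(10)
  .exec(function (err, results) {
    if (err) { return console.error(err); }
    console.log(results);
  });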

The reduction outputs a result set that looks like:

[
  {
    field1: 'A',
    nested_field_2: 'b',
    count: 2,
    size: 4,
    avg: 2
  },
  {
    field1: 'A',
    nested_field_2: 'b1',
    count: 3,
    size: 9,
    avg: 3
  },
  {
    field1: 'B',
    nested_field_2: 'b',
    count: 1,
    size: 4,
    avg: 4
  },
  {
    field1: 'B',
    nested_field_2: 'b2',
    count: 5,
    size: 1,
    avg: 0
  },
...
...
]

The dots in the groupBy key field names are replaced by underscores (e.g. nested.field.2 becomes nested_field_2) so that a subsequent sort, if present, works properly.

I have tried my best to ensure the function is optimized and robust. The test suite passes all tests, and I have tested this function on a DB with ~380,000 records using the same reducer as shown above; it works fine. I haven't added test cases directly to the project yet, though - I will do so if this looks likely to be merged :).

PS. The browser versions were auto-generated using the build script. I haven't modified them manually. The test page shows no errors there either.

@Ivshti

Ivshti commented Mar 9, 2014

I haven't tested this much, but first I tried to use it like:
col.group({ key: { firstName: 1 } }).exec(function(err, res) { console.log(res) }) and no result was ever returned

My collection has two objects, both of which have a firstName field. What am I doing wrong? Can this be used without the reduce, finalize, etc.?

@adityamukho

adityamukho commented Mar 9, 2014

You need to invoke col.find(...).group(...) rather than col.group(). It is a slight departure from the Mongo invocation, but I felt this made more sense in the nedb context, keeping it in line with the other result-set post-processors like sort, skip, etc.

Also, this way one doesn't need to support the 'cond' attribute, since find() takes care of filtering already. In this implementation, grouping is only possible on the result of the find() call, and so must be invoked on the cursor (hence you should omit passing a callback to find()).

finalize() is optional, but a reduce() is necessary - in a way it is the whole point of the group function :-).
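
For example, a minimal grouping that only counts documents per key might look like the sketch below (col and firstName are taken from the snippet quoted above; the output shown is hypothetical):

col.find({})
  .group({
    key: { firstName: 1 },
    reduce: function (curr, result) { result.count++; },
    initial: { count: 0 }
  })
  .exec(function (err, res) {
    if (err) { return console.error(err); }
    console.log(res); // e.g. [{ firstName: 'Alice', count: 2 }, { firstName: 'Bob', count: 1 }]
  });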


@Ivshti

Ivshti commented Mar 9, 2014

Sorry, I did invoke .find() - I just forgot to write it here. Otherwise it would have thrown an error (.group would be undefined on the collection, since it's defined on the cursor).

However, in my case there was no error - the callback was simply never invoked. That behavior is not ideal: if reduce is required, an error should be thrown when it's not passed.

@Ivshti

Ivshti commented Mar 9, 2014

It seems to work when I add both initial and reduce.

I'm sorry - the error is actually being passed to the callback. My mistake, please ignore my initial comment.

@adityamukho

adityamukho commented Mar 10, 2014

Hmmm, it would have been really strange if it hadn't thrown an error for the missing input - the validation code that checks for a valid key, reduce and initial is present.

@louischatriot

louischatriot (Owner) commented Mar 12, 2014

Thanks for the PR, this seems very interesting. Will try to review and merge in the coming days.


@adityamukho

adityamukho commented Mar 13, 2014

Added a commit that allows the caller to omit specifying the key. This will result in the reducer being applied to the entire find() result, followed by finalize(), if defined.
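
A sketch of such a key-less aggregation (the db variable and the size field are hypothetical):

db.find({})
  .group({
    reduce: function (curr, result) { result.total += curr.size; },
    initial: { total: 0 }
  })
  .exec(function (err, res) {
    if (err) { return console.error(err); }
    // With no key, the reducer runs over the entire result set, so a single
    // aggregate is expected here rather than one entry per group.
    console.log(res);
  });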


@Azema

Azema commented Apr 2, 2014

Hi,

It's a very useful piece of functionality - thank you, @adityamukho. I've merged it on my fork (along with the new sort function from #159) and it works correctly so far.

The thing that tripped me up is that array values are not accepted, even when manipulating the data in the reduce function. Is this expected?

@adityamukho

adityamukho commented Apr 4, 2014

Thanks for the feedback @Azema
I'm guessing you're talking about the case where the path indicated by the key points to an array rather than a scalar. Yes, in that case it wouldn't work as expected, since the grouping algorithm doesn't perform a 'deep compare'.

In any case, can you share the dataset and operators you used in your query? It may be useful to take a closer look.


@Azema

Azema commented Apr 4, 2014

Hi @adityamukho,

Here is the information you asked for.

I tried removing the check that rejects array values for the key field, and in my case that worked, because the arrays were concatenated. But I don't think it would work in all cases.

An example of the data used:

{ "title": "A", "genre": ["a","b"] }
{ "title": "B", "genre": ["c"] }
{ "title": "C", "genre": ["d","e"] }
{ "title": "D", "genre": [] }

My query:

  var sort = {genre: 1};
  db.find({})
    .group({
      key: {'genre': 1},
      reduce: function(curr, result) {
        if (curr.genre instanceof Array && curr.genre.length > 0) {
          result.genre = curr.genre[0];
        }
        return result;
      },
      initial: {}
    }).sort({genre: 1}).exec(function(err, genres) {
      res.json({results: genres});
    });

The results of the query:

[
  { "genre": "a" }
]

Here are the expected results:

[
  { "genre": "a" },
  { "genre": "c" },
  { "genre": "d" }
]

@adityamukho

adityamukho commented Apr 6, 2014

I'm looking into enabling array values in key fields, but in the meantime I think the expected result should be:

[
  { "genre": "a" },
  { "genre": "c" },
  { "genre": "d" },
  { "genre": [] }
]

...since the actual keys - genre: ["a","b"], genre: ["c"], genre: ["d","e"], genre: [] - are all being overwritten by the reducer, except the last one. I'm not sure where the sort ought to put that last element, but it should be there somewhere.


@adityamukho

adityamukho commented Apr 11, 2014

Hi @Azema
I've added support for non-primitive keys in a separate branch - 7030704

I'm converting non-primitive keys to a fixed-length string hash during the intermediate collection stage. This is less than ideal, since the number of possible non-primitive keys is greater than the number of possible fixed-length hashes. However, in situations where no hash collision occurs, it does cover the use case you have given.

I would refrain from merging this into the master branch until I or someone else comes up with a better way to uniquely and efficiently identify (possibly large) non-primitive objects (perhaps using a variable-length hash).


@Azema

Azema commented Apr 11, 2014

Hi @adityamukho,

Thank you for your efforts; I will test your changes and come back to you if I find a problem.

I think hashing the keys is a good idea, but it may increase processing time. That said, I don't have anything better to suggest for now.

@adityamukho

adityamukho commented Apr 11, 2014

I have updated the code to use a variable-length hash. It uses the djb2 algorithm, which may not be quite as fast as the previous xxhash, but it ensures uniqueness and is still much faster than any crypto hash or two-way hash. The included implementation, provided by the es-hash module, has the added advantage of being browser-compatible.

The default behaviour is to hash all keys during the collection stage, ensuring correct output in all cases at the cost of a slight performance penalty. Hashing can be disabled by passing noHashKeys: true in the group object, for a performance boost. This should be done ONLY if you are sure that all groupBy keys in the result set are either primitives or small arrays of primitives. (For large arrays, the hashed representations may be faster than their stringified versions.)

https://github.com/adityamukho/nedb/tree/feature-array-keys
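
For reference, a plain 32-bit djb2 string hash looks roughly like the sketch below. This is only an illustration of the algorithm family; the branch above uses a variable-length variant via the es-hash module, so the actual implementation differs, and a non-primitive key would first be serialized before hashing.

// Classic djb2 over a string (illustrative only, not the branch's implementation).
function djb2(str) {
  var hash = 5381;
  for (var i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash) + str.charCodeAt(i); // hash * 33 + charCode
    hash = hash >>> 0;                               // keep it in unsigned 32-bit range
  }
  return hash;
}

// A non-primitive group key could be reduced to a string first, e.g.:
var keyHash = djb2(JSON.stringify({ genre: ['a', 'b'] }));
console.log(keyHash);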


@adityamukho

adityamukho commented Apr 14, 2014

Another small validation - I just finished writing an NeDB ORM adapter for the Sails framework (https://github.com/adityamukho/sails-nedb), based on this fork. The Waterline (Sails' persistence layer) tests are fairly rigorous and extensive, and so far all tests have passed, including the aggregation tests.


@Pheo-Player

Pheo-Player commented Aug 15, 2014

This would be a great feature to have. Is it still planned to merge this?


@adityamukho

adityamukho commented Aug 16, 2014

I haven't looked at this PR in a while. It looks like some files have changed in the master repo in the meantime, which would now require manual conflict resolution. Also, I still have to push myself to write some test cases. Other than these two points, I think this PR is still ripe for a merge: I have used this function extensively in several applications and have not seen any errors or incorrect behavior so far.


@iclems

iclems commented Oct 21, 2014

@adityamukho congrats, it's really useful -- I hope it'll be merged some day!


@FaynePerera

FaynePerera commented Dec 4, 2014

Hi, @adityamukho!
Just a simple request: it would be nice to have this function fully documented. I know it has a lot in common with Mongo's group function, but I'm sure there are some differences worth mentioning. It turns out to be very useful, at least for the count case, which is the only one documented well enough in this PR. The sum case, however, is not so clear - could you please give an example with a sum?
Nevertheless, congrats - I'm sure this could be a great feature.
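
For what it's worth, a sum along the lines of the spec in the PR description could be sketched as follows (the db variable and the field names are hypothetical):

db.find({})
  .group({
    key: { field1: 1 },
    reduce: function (curr, result) { result.sum += curr.size; },
    initial: { sum: 0 }
  })
  .exec(function (err, res) {
    if (err) { return console.error(err); }
    console.log(res); // e.g. [{ field1: 'A', sum: 13 }, { field1: 'B', sum: 5 }]
  });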


itayw referenced this pull request in joola/nedb on Feb 18, 2015

Merged

Add .sort() and .group() func #2

@Glavin001

Glavin001 commented May 28, 2015

Any progress on this? It is rather worrying that a pull request like this does not get merged while merge conflicts pile up; it makes it tough for contributors to consider contributing. It looks like other users are forking nedb because the original repo does not contain the functionality they want.


@luislobo

luislobo commented Nov 11, 2015

+1 This should be merged...


@luislobo

luislobo commented Jan 6, 2016

Any news on merging this?


@JamesMGreene

JamesMGreene (Contributor) commented Jan 8, 2016

👍


@tcurdt

tcurdt commented Feb 21, 2016

Urgh - open for 9 months now :-/
Any chance for a merge?


@louischatriot

louischatriot (Owner) commented Feb 21, 2016

Unfortunately not - I don't expect to have time for it in the coming months ...


@coderofsalvation

coderofsalvation commented Apr 17, 2016

@louischatriot will you consider adding a contributor to this project? (PRs which take > 9 months seem quite extreme.)

@fjeddy

fjeddy commented Oct 24, 2016

Should this project (NeDB) be considered dead?


@louischatriot

louischatriot (Owner) commented Oct 25, 2016

No, it still works well. Features are not being added regularly, but that is pretty different from being a dead project.


pi0 added a commit to nedbhq/nedb-core that referenced this pull request Nov 30, 2016

Added .group() function #153 by @adityamukho
This commit is a cleaned-up version of the original PR.
@pi0


pi0 commented Nov 30, 2016

JamesMGreene referenced this pull request in JamesMGreene/nestdb on Aug 15, 2017

Open

Consider adding `.group(...)` function a la MongoDB #12
