
Added .group() function #153

Open
wants to merge 28 commits into base: master

Conversation

adityamukho

Ref: http://docs.mongodb.org/manual/reference/method/db.collection.group/

Added a function to the Cursor prototype that allows users to reduce the result set (after find()), enabling operations like aggregation (sum, avg, count, etc.) or any other reducer. This function works in conjunction with limit(), skip() and sort(). The order of the execution pipeline is: find -> group -> sort -> slice (skip + limit).

The function is invoked on the Cursor object returned by find(), with an object that looks like:

{
  key: {
    'field1': 1,
    'nested.field.2': 1
  },
  reduce: function (curr, result) {
    result.count++;
    result.size += curr.size;
  },
  initial: {
    count: 0,
    size: 0
  },
  finalize: function (result) {
    result.avg = Math.round(result.size / result.count);
  }
}

Unlike the group() function in MongoDB, this does not need a cond part, since the find() operation takes care of that beforehand.

The function returns the Cursor on which it was invoked, allowing for chaining sort, skip/limit ops later.
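For example, a full chain might look like this (the query and the sample documents are illustrative; the API is as described above):

// Illustrative end-to-end chain; field names match the spec above
db.find({ size: { $gt: 0 } })          // filtering replaces Mongo's 'cond'
  .group({
    key: { 'field1': 1, 'nested.field.2': 1 },
    reduce: function (curr, result) {
      result.count++;
      result.size += curr.size;
    },
    initial: { count: 0, size: 0 },
    finalize: function (result) {
      result.avg = Math.round(result.size / result.count);
    }
  })
  .sort({ nested_field_2: 1 })         // note the flattened key name (see below)
  .skip(0)
  .limit(10)
  .exec(function (err, groups) {
    // groups is the reduced resultset shown next
  });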

The reduction outputs a resultset that looks like:

[
  {
    field1: 'A',
    nested_field_2: 'b',
    count: 2,
    size: 4,
    avg: 2
  },
  {
    field1: 'A',
    nested_field_2: 'b1',
    count: 3,
    size: 9,
    avg: 3
  },
  {
    field1: 'B',
    nested_field_2: 'b',
    count: 1,
    size: 4,
    avg: 4
  },
  {
    field1: 'B',
    nested_field_2: 'b2',
    count: 5,
    size: 1,
    avg: 0
  },
...
]

The field names in the groupBy keys have their . replaced by _ so that the subsequent sort function, if present, works properly.

I have tried my best to ensure the function is optimized and robust. The test suite passes all tests. I have tested this function on a DB with ~380,000 records using the same reducer as shown above, and it works fine. I haven't added test cases directly to the project yet, though. Will do so if this looks likely to be merged :).

PS. The browser versions were auto-generated using the build script. I haven't modified them manually. The test page shows no errors there either.

@Ivshti commented Mar 9, 2014

I haven't tested this much, but first I tried to use it like:
col.group({ key: { firstName: 1 } }).exec(function(err, res) { console.log(res) }) and no result was ever returned

My collection has two objects, both have firstName field. What am I doing wrong? Can this be used without the reduce, finalize, etc?

@adityamukho (Author)


You need to invoke col.find(...).group(...) rather than col.group(). It is a slight departure from the mongo invocation, but I felt this made more sense in the nedb context, keeping things in line with the other resultset post-processors like sort, skip, etc.

Also, this way one doesn't need to support the 'cond' attribute, since find() takes care of filtering already. In this implementation, grouping is only possible on the result of the find() call, and so must be invoked on the cursor (hence you should omit passing a callback to find).

finalize() is optional, but a reduce() is necessary - in a way it is the whole point of the group function :-).
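A minimal working call would therefore look something like this (firstName matches your example; the count accumulator is illustrative):

col.find({}).group({
  key: { firstName: 1 },
  reduce: function (curr, result) { result.count++; },
  initial: { count: 0 }
}).exec(function (err, res) {
  if (err) { return console.error(err); } // surfaces a missing reduce/initial
  console.log(res);
});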


@Ivshti commented Mar 9, 2014

Sorry, I did invoke .find() - just forgot to write it here. Otherwise it would have thrown an error (.group would be undefined on the collection, since it's defined on the cursor).

However, in my case there was no error - the callback was simply never invoked. That behavior is not ideal: if reduce is required, an error should be thrown when it's not passed.

@Ivshti commented Mar 9, 2014

It seems to work once I added both initial: and reduce:.

I'm sorry, the error is being passed to the callback. My mistake, ignore the initial comment.

@adityamukho (Author)

Hmmm, it would've been really strange if it hadn't thrown an error for the missing input - the validation code to check for valid keys, reduce and initial is present.
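For the record, not the actual PR code, but a rough sketch of what that validation amounts to (property names like _groupSpec are hypothetical; per the comment above, the error surfaces through exec()'s callback):

// Hypothetical sketch of the input validation; not the PR's exact code
Cursor.prototype.group = function (spec) {
  if (!spec || typeof spec.reduce !== 'function' || spec.initial === undefined) {
    this._groupError = new Error('group() requires a reduce function and an initial object');
  } else {
    this._groupSpec = spec;
  }
  return this; // still chainable; exec() passes this._groupError to its callback if set
};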

@louischatriot (Owner)

Thanks for the PR, this seems very interesting. Will try to review and merge in the coming days.

@adityamukho (Author)

Added a commit that allows the caller to omit specifying the key. This will result in the reducer being applied to the entire find() result, followed by finalize(), if defined.
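Usage would look something like this (a sketch, assuming the API above; the single-element result shape is my reading of "the reducer being applied to the entire find() result"):

// Keyless grouping: one accumulator over the whole find() result
db.find({ size: { $gt: 0 } })
  .group({
    reduce: function (curr, result) {
      result.count++;
      result.total += curr.size;
    },
    initial: { count: 0, total: 0 },
    finalize: function (result) {
      result.avg = result.total / result.count;
    }
  })
  .exec(function (err, res) {
    console.log(res); // e.g. [{ count: 11, total: 22, avg: 2 }] -- shape assumed
  });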

@Azema commented Apr 2, 2014

Hi,

It's a very useful piece of functionality, thank you @adityamukho. I've merged this on my fork (along with the new sort function from #159) and it works correctly so far.

The thing that tripped me up is that array values are not accepted, even when manipulating the data in the reduce function. Is this normal?

@adityamukho (Author)

Thanks for the feedback @Azema
I'm guessing you're talking about the case where the path indicated by the key points to an array rather than a scalar. Yes, in that case it wouldn't work as expected, since the grouping algorithm doesn't perform a 'deep compare'.

In any case, can you share the dataset and operators you used in your query? It may be useful to take a closer look.

@Azema commented Apr 4, 2014

Hi @adityamukho,

Here is the information you asked for.

I tried deleting the condition that checks whether the value indicated by the key is an array; in my case that worked, because the array got concatenated. But I don't think it would work in all cases.

An example of the data used:

{ "title": "A", "genre": ["a","b"] }
{ "title": "B", "genre": ["c"] }
{ "title": "C", "genre": ["d","e"] }
{ "title": "D", "genre": [] }

My query:

  db.find({})
    .group({
      key: {'genre': 1},
      reduce: function (curr, result) {
        if (curr.genre instanceof Array && curr.genre.length > 0) {
          result.genre = curr.genre[0];
        }
        return result;
      },
      initial: {}
    })
    .sort({genre: 1})
    .exec(function (err, genres) {
      // res is the Express response object in my route handler
      res.json({results: genres});
    });

The results of the query:

[
  { "genre": "a" }
]

Here are the expected results:

[
  { "genre": "a" },
  { "genre": "c" },
  { "genre": "d" }
]

@adityamukho (Author)

I'm looking into enabling array values in key fields, but in the meantime I think the expected result should be:

[
  { "genre": "a" },
  { "genre": "c" },
  { "genre": "d" },
  { "genre": [] }
]

...since the actual keys (genre: ["a","b"], genre: ["c"], genre: ["d","e"], genre: []) all have their genre value overwritten by the reducer, except the last one, where the guard in the reduce function leaves the empty array untouched. Not sure where the sort ought to put that last element, but it should be there somewhere.

@adityamukho (Author)

Hi @Azema
I've added support for non-primitive keys in a separate branch - 7030704

I'm converting non-primitive keys to a fixed-length string hash during the intermediate collection stage. This is less than ideal, since the number of possible non-primitive keys is greater than the number of possible fixed-length hashes. However, in situations where such a hash collision does not occur, it does cover the use case you have given.

I would refrain from merging this into the master branch until I or someone else can come up with a better way to uniquely and efficiently identify (possibly large) non-primitive objects (using a variable length hash perhaps).

@Azema commented Apr 11, 2014

Hi @adityamukho,

Thank you for your efforts; I will test your changes and come back to you if I find a problem.

I think your hashing idea is good, but it may increase processing time. However, I don't have anything better to offer for now.

@adityamukho (Author)

Have updated the code to use a variable-length hash. It uses the djb2 algorithm, which may not be quite as fast as the previous xxhash, but ensures uniqueness, and is still much faster than any crypto hash or 2-way hash. The included implementation, provided by the es-hash module, has the added advantage of being browser-compatible.

The default behaviour would be to hash all keys during the collection stage, ensuring correct output in all cases, at the cost of a slight performance penalty. The hashing can be disabled by passing noHashKeys: true in the group object, thereby getting a performance boost. This should be done ONLY if one is sure that all groupBy keys in the result set are either primitives or small arrays (of primitives). (For large arrays, it may be faster to use hashed representations than their stringified versions.)
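For reference, a minimal sketch of the classic 32-bit djb2 string hash (the es-hash variant mentioned above presumably extends this to variable-length output):

// djb2: hash = hash * 33 + charCode, seeded with 5381
function djb2(str) {
  var hash = 5381;
  for (var i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash) + str.charCodeAt(i); // hash * 33 + c
  }
  return hash >>> 0; // coerce to unsigned 32-bit
}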

https://github.com/adityamukho/nedb/tree/feature-array-keys

@adityamukho (Author)

Another small validation - I just finished writing an NeDB ORM adapter for the Sails framework (https://github.com/adityamukho/sails-nedb), based on this fork. The Waterline (Sails' persistence layer) tests are fairly rigorous and extensive, and so far all tests have passed, including the aggregation tests.

@Pheo-Player

This would be a great feature to have. Is it still planned to merge this?

@adityamukho (Author)

Haven't looked into this PR in a while. Looks like some files have changed in the master repo while I was gone, which would now require manual conflict resolution. Also, I have to push myself to write some test cases. But other than these two points, I think this PR is still ripe for a merge. I have used this function extensively in several applications and have not seen any errors or incorrect behavior so far.

@iclems commented Oct 21, 2014

@adityamukho congrats, it's really useful -- I hope it'll be merged some day!

@FaynePerera

Hi, @adityamukho!
Just a simple request: it would be nice to have this function fully documented. I know it has a lot in common with the Mongo group function, but there are some differences worth mentioning, I'm sure. It turns out to be very useful, at least the count option, which is the only one documented well enough in this PR. But the sum option is not so clear. Could you please give an example with sum?
Nevertheless, congrats, I'm sure this could be a great feature.

@Glavin001

Any progress on this? It is rather worrying that a pull request like this does not get merged while merge conflicts pile up. It makes it tough for contributors to consider contributing. It looks like other users are forking nedb because the original repo does not contain the functionality they want.

@luislobo

+1 This should be merged...

@luislobo commented Jan 6, 2016

Any news on merging this?

@JamesMGreene (Contributor)

👍

@tcurdt commented Feb 21, 2016

Urgh - open for 9 months now :-/
Any chance for a merge?

@louischatriot (Owner)

Unfortunately not, no time to do it in the coming months I expect ...

@coderofsalvation

@louischatriot will you consider adding a contributor to this project? (PRs that take > 9 months seem quite extreme.)

@ghost commented Oct 24, 2016

Should this project (NeDB) be considered dead?

@louischatriot (Owner)

No, it still works well. Features are not being added regularly, but that is pretty different from being a dead project.


pi0 added a commit to nedbhq/nedb-core that referenced this pull request Nov 30, 2016
This commit is a cleaned version of the original PR.
@pi0 commented Nov 30, 2016

Just made a clean commit (631bad4) on nedb3
@adityamukho @coderofsalvation @fjeddy @Glavin001 @luislobo
