Does findRandom return an array of sequential documents? #8

Closed
joeytwiddle opened this issue Sep 1, 2016 · 4 comments

@joeytwiddle

joeytwiddle commented Sep 1, 2016

Would I be right in thinking that the "find many" function selects the first document randomly but the rest are simply those documents that directly follow it in mongo's sequence?

If so, it might be good to document that clearly. For some applications it would be an undesirable limitation. (For example pulling 5 random playing cards from an unshuffled deck would nearly always return a straight run!)
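
To make the concern concrete, here is a minimal in-memory sketch of the behaviour described above, assuming "find many" picks one random starting offset and then returns the documents that directly follow it. The name `sequentialPick` and the array standing in for a collection are hypothetical; this is not the plugin's actual code.

```javascript
// Hypothetical stand-in for the behaviour described above: one random
// starting offset, then the n items that directly follow it.
function sequentialPick(items, n, rng = Math.random) {
  const size = Math.min(n, items.length);
  // Choose a start so that the whole run fits inside the array.
  const start = Math.floor(rng() * (items.length - size + 1));
  return items.slice(start, start + size);
}

// An unshuffled "deck" of 52 cards, numbered 0..51.
const deck = Array.from({ length: 52 }, (_, i) => i);
const hand = sequentialPick(deck, 5);

// True when each card is exactly one higher than the previous card.
const isStraightRun = hand.every((card, i) => i === 0 || card === hand[i - 1] + 1);
console.log(hand, isStraightRun);
```

In this simplified model only the starting position is random, so every hand is a run of consecutive cards.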

@larryprice
Owner

I think that's how it currently works, and I agree that it's probably non-optimal as far as functionality goes. I'll happily look at any submitted PR.

@joeytwiddle
Author

joeytwiddle commented Nov 19, 2016

No solution here I'm afraid. The approach I landed on was to choose one random document n times, using your skip approach, and then handle the risk of the same document being chosen twice. That's n queries!
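
A rough sketch of that idea, assuming the duplicate risk is handled by redrawing whenever an offset has already been used. The name `pickDistinctOffsets` is hypothetical, and each returned offset would still cost one skip query against the collection.

```javascript
// Draw one random offset at a time and redraw on collisions, until we
// have n distinct offsets (or as many as the collection can provide).
function pickDistinctOffsets(count, n, rng = Math.random) {
  const wanted = Math.min(n, count); // cannot pick more documents than exist
  const used = new Set();
  while (used.size < wanted) {
    // A Set ignores values it already holds, so a repeated offset
    // simply triggers another draw.
    used.add(Math.floor(rng() * count));
  }
  return Array.from(used);
}

// Example: five distinct offsets into a collection of 52 documents.
console.log(pickDistinctOffsets(52, 5));
```

Redraws are rare when n is small relative to the collection, but they become frequent as n approaches the document count.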

From the research I did: an option some people used was to include a random value in a field of every document in the DB, and then use that field for ordering. But it seems to me that those values should be updated each time (at least for the retrieved documents), otherwise later queries may see similar repeated blocks of documents.
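
The random-field idea can be sketched in memory as follows. This assumes each document carries a hypothetical `rand` field in [0, 1), that a query takes the n documents at or after a fresh random threshold in `rand` order (wrapping around at the end), and that the retrieved documents then get new `rand` values. It illustrates the scheme only and makes no claim about how any real implementation does it.

```javascript
// docs: array of objects with a numeric `rand` field (hypothetical name).
function pickByRandomField(docs, n, rng = Math.random) {
  const threshold = rng();
  // Order by the stored random value, as an index on `rand` would.
  const sorted = docs.slice().sort((a, b) => a.rand - b.rand);
  // First document at or above the threshold; wrap to the start if none.
  const found = sorted.findIndex((d) => d.rand >= threshold);
  const from = found === -1 ? 0 : found;
  const picked = [];
  for (let i = 0; i < Math.min(n, sorted.length); i++) {
    picked.push(sorted[(from + i) % sorted.length]);
  }
  // Refresh the stored values of the retrieved documents, so the same
  // block of neighbours is less likely to come back together later.
  picked.forEach((d) => { d.rand = rng(); });
  return picked;
}

const sampleDocs = Array.from({ length: 10 }, (_, i) => ({ id: i, rand: Math.random() }));
console.log(pickByRandomField(sampleDocs, 3).map((d) => d.id));
```

Without the refresh step, documents that are neighbours in `rand` order stay neighbours, which is the repeated-block problem mentioned above.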

I wonder if a mongo aggregate query could somehow hash the objid of each document combined with a fresh random value supplied with each query. In theory that could provide a random and unpredictable value we could sort on. It might be pretty heavy on large collections though.
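
The hashing idea can at least be tried out in plain Node.js. The sketch below is in-memory only and is not a mongo aggregate pipeline; whether the same computation can be done server-side is left open. The name `pickBySaltedHash` and the string ids are hypothetical.

```javascript
const nodeCrypto = require("crypto");

// Sort ids by a hash of (fresh salt + id) and keep the first n.
// A new salt on each call gives a new, unpredictable ordering.
function pickBySaltedHash(ids, n, salt = nodeCrypto.randomBytes(16).toString("hex")) {
  return ids
    .map((id) => ({
      id,
      key: nodeCrypto.createHash("sha256").update(salt + ":" + id).digest("hex"),
    }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .slice(0, n)
    .map((entry) => entry.id);
}

console.log(pickBySaltedHash(["a", "b", "c", "d", "e"], 3));
```

Every id has to be hashed and sorted on each call, which matches the worry above about cost on large collections.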

@lchenay

lchenay commented Jan 13, 2017

I don't use this plugin because of the sequential issue.

I quickly coded this and it works as wanted.
There is zero chance of picking the same document twice (cf. _.sample(_.range(count), nb);)

Since I use external dependencies (async and underscore), I can't make a PR with this code.

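var async = require("async");
var _ = require("underscore");

// Mongoose plugin: adds a static pickRandom(query, nb, next) to the schema.
module.exports = exports = function(schema) {
    schema.statics.pickRandom = function(query, nb, next) {
        // Count the matching documents to know the range of valid skip offsets.
        return this.count(query, (err, count) => {
            if (err) {
                return next(err);
            }
            // Draw up to nb distinct offsets from [0, count), so no offset repeats.
            var randomSkips = _.sample(_.range(count), nb);
            // Run one findOne query per offset.
            return async.map(randomSkips, (skip, done) => {
                return this.findOne(query).skip(skip).exec(done);
            }, function(err, docs) {
                return next(err, docs, count);
            });
        });
    };
};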
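
Since the external dependencies were the stated obstacle to a PR, the offset-sampling step could in principle be written without underscore. A minimal sketch, assuming a partial Fisher-Yates shuffle is an acceptable substitute for _.sample(_.range(count), nb); the name `sampleOffsets` is hypothetical.

```javascript
// Return up to nb distinct offsets from [0, count) using a partial
// Fisher-Yates shuffle: only the first nb positions are shuffled.
function sampleOffsets(count, nb, rng = Math.random) {
  const offsets = Array.from({ length: count }, (_, i) => i);
  const k = Math.max(0, Math.min(nb, count));
  for (let i = 0; i < k; i++) {
    // Swap position i with a random position in [i, count).
    const j = i + Math.floor(rng() * (count - i));
    [offsets[i], offsets[j]] = [offsets[j], offsets[i]];
  }
  return offsets.slice(0, k);
}

console.log(sampleOffsets(10, 4));
```

Like the underscore version, this builds an array with one entry per matching document, so memory use grows with the size of the collection.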

@larryprice
Owner

OK I've started looking into this, and although we get a significant slowdown, I believe that having more random docs will be better. Will update in a few days once I've got the algo down.
