Score Aware Pagination #223

dfjones · 2014-02-16T22:50:45Z

The current method of pagination does not take score sorting into account. News items are primarily sorted by their created date when a page is fetched from the DB:

function getNewsItems(query, user, callback, page) {
  page = typeof page !== 'undefined' ? page: 1;
  NewsItem
  .find(query)
  .sort('-created')
  .skip((page-1)*newsItemsPerPage)
  .limit(newsItemsPerPage)
  .populate('poster')
  .exec(function (err, newsItems) {

    if(err) return callback(err);

    addVotesAndCommentCountToNewsItems(newsItems, user, callback);
  });
}

The page is then sorted by score.

Ideally we'd order by score before we paginate.

marcofiset · 2014-02-17T18:08:46Z

Do you have any idea how we could achieve this without loading the whole database every time?

I can't think of something other than limiting results to posts of the last X days or less. But then a post submitted X + 1 days ago that still has a lot of interest won't show up anymore.

Perhaps store the score in the database so we can sort on it? And on every vote, update the score, or something like that. But score is also based on time so that will require more thinking.

dfjones · 2014-02-17T18:53:57Z

Sure, here are some thoughts.

I think we will need to persist the score information somewhere. Right now the database is the obvious choice, but a cache or something like Redis with sorted sets could be options as well.

Assuming we go with the database, I think we will want to store the score either on the news item or perhaps in its own collection. When a vote is written, we should update the score as well. Since the score is time based, I think we will need a background tasks that updates the scores in the database at some frequency (every minute, for example). To keep this from writing to all news items, I think we will want to define a policy to limit the number of documents that are updated. For example, news items with a score < 1 will no longer be updated. This way, scores will naturally become frozen as an item ages. To choose the right value, I think we will want to do some analysis of the score function and what values it emits as an item ages. We don't want to prevent fresh items from being updated.

treygriffith · 2014-02-18T23:36:51Z

As far as I'm aware, there is no good way of doing this in MongoDB, although there is a planned (but not scheduled) feature supporting it: https://jira.mongodb.org/browse/SERVER-153

Constantly updating the database with new scores as time passes seems like a pretty limited solution.

My vote is to pull all the records from the database and perform the sort and slice in the application until it becomes a problem (which I don't expect until we have a lot more users than we currently do), at which point we can either do some caching (in Redis or similar) or determine the age of the set we should be fetching from the database to perform the sort and slice. (i.e. determine X in @marco-fiset's answer with a high level of confidence).

dfjones · 2014-02-18T23:58:18Z

Even if the Mongo feature @treygriffith mentioned was implemented, running any sort of custom JavaScript in Mongo isn't a long-term scalable solution either.

I think every solution to this problem will have some sort of tradeoffs associated with them. What @treygriffith suggested will work, but won't scale long term. However, I think if we limited the query to something like 1000 news items, the query & sorting will probably be fast enough for a long time.

treygriffith · 2014-02-19T01:07:41Z

Totally true that it's not scalable. It's basically a punt on this issue until we have more data about which trade offs are most appropriate given the application's real world usage.

That said, doing a 1000 item limit would do effectively for the first page. I'm not sure how that would work for the nth page though, since you can't do a .skip
—
Sent from Mailbox for iPhone

On Tue, Feb 18, 2014 at 6:58 PM, Doug Jones notifications@github.com
wrote:

Even if the Mongo feature @treygriffith mentioned was implemented, running any sort of custom JavaScript in Mongo isn't a long-term scalable solution either.

I think every solution to this problem will have some sort of tradeoffs associated with them. What @treygriffith suggested will work, but won't scale long term. However, I think if we limited the query to something like 1000 news items, the query & sorting will probably be fast enough for a long time.

Reply to this email directly or view it on GitHub:
#223 (comment)

dfjones · 2014-02-19T02:46:18Z

I think if we are going to load 1000 items at a time, you'd use a query like the one we already have and I've copied in the original description. Then, sort all 1000 items by score, then select page N of the items from the sorted array. So, the most pages we could support would be 1000/M where M is the number of items on a page.

The big downside is that every page view has to query for and sort 1000 items.

chall8908 · 2014-02-19T05:11:32Z

Continuing from @dfjones, posts beyond this 1000 limit could be "archived" and viewable from a separate, differently sorted (e.g. only post time) list view.

treygriffith · 2014-02-19T06:48:50Z

@dfjones I see what you're saying now.

I don't think that querying for 1000 records and sorting is actually a big problem - I imagine that most of the time spent in the query is overhead in the actual request itself, and doesn't vary much with the size of the result set (at least for 1000 results). The sort itself is very fast (I'd bet less than 2 tenths of a millisecond for 1000 items).

dfjones · 2014-02-20T05:31:35Z

Something to keep in mind is that for each news item we have to perform a query to join in the item's vote count, comment count, and author info. So we are seemingly in the neighborhood of 3000 queries to build 1 page. (Although, Mongoose might be batching or doing something to reduce query overhead. I don't know it well enough to say one way or the other.)

This probably won't be an issue for the immediate future. But when the day comes to optimize, this probably should be the first place we look at :-)

dfjones · 2014-02-20T05:39:37Z

Also, we should probably close this issue. I don't think I have the rights to do it. @megamattron ?

treygriffith · 2014-02-20T16:26:56Z

@dfjones The queries for vote count and comment count are batched together, so that's only one extra query each. I'm also pretty sure that Mongoose batches populate queries, so that's one more query for author info. For those queries, we should be doing pretty well (4 queries total, regardless of the number of news items).

I did just notice that a recent PR to get the latest comment time performs a separate query for each news item, so that's 899 unnecessary queries that can be eliminated.

But I agree that this issue is resolved.

treygriffith · 2014-02-20T16:48:12Z

I take back that bit about the comment counts - those are separate queries. I'm going to see what I can do about the comment counts as well as last comment time queries - Performing ~2000 queries to build the front page is really not a great idea.

megamattron · 2014-02-20T22:27:24Z

Looks like we've got a reasonable compromise in there now, so I'll close this.

treygriffith mentioned this issue Feb 19, 2014

Score-based Pagination #236

Merged

megamattron closed this as completed Feb 20, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Score Aware Pagination #223

Score Aware Pagination #223

dfjones commented Feb 16, 2014

marcofiset commented Feb 17, 2014

dfjones commented Feb 17, 2014

treygriffith commented Feb 18, 2014

dfjones commented Feb 18, 2014

treygriffith commented Feb 19, 2014

dfjones commented Feb 19, 2014

chall8908 commented Feb 19, 2014

treygriffith commented Feb 19, 2014

dfjones commented Feb 20, 2014

dfjones commented Feb 20, 2014

treygriffith commented Feb 20, 2014

treygriffith commented Feb 20, 2014

megamattron commented Feb 20, 2014

Score Aware Pagination #223

Score Aware Pagination #223

Comments

dfjones commented Feb 16, 2014

marcofiset commented Feb 17, 2014

dfjones commented Feb 17, 2014

treygriffith commented Feb 18, 2014

dfjones commented Feb 18, 2014

treygriffith commented Feb 19, 2014

dfjones commented Feb 19, 2014

chall8908 commented Feb 19, 2014

treygriffith commented Feb 19, 2014

dfjones commented Feb 20, 2014

dfjones commented Feb 20, 2014

treygriffith commented Feb 20, 2014

treygriffith commented Feb 20, 2014

megamattron commented Feb 20, 2014