-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Score Aware Pagination #223
Comments
Do you have any idea how we could achieve this without loading the whole database every time? I can't think of something other than limiting results to posts of the last X days or less. But then a post submitted X + 1 days ago that still has a lot of interest won't show up anymore. Perhaps store the score in the database so we can sort on it? And on every vote, update the score, or something like that. But score is also based on time so that will require more thinking. |
Sure, here are some thoughts. I think we will need to persist the score information somewhere. Right now the database is the obvious choice, but a cache or something like Redis with sorted sets could be options as well. Assuming we go with the database, I think we will want to store the score either on the news item or perhaps in its own collection. When a vote is written, we should update the score as well. Since the score is time based, I think we will need a background tasks that updates the scores in the database at some frequency (every minute, for example). To keep this from writing to all news items, I think we will want to define a policy to limit the number of documents that are updated. For example, news items with a score < 1 will no longer be updated. This way, scores will naturally become frozen as an item ages. To choose the right value, I think we will want to do some analysis of the score function and what values it emits as an item ages. We don't want to prevent fresh items from being updated. |
As far as I'm aware, there is no good way of doing this in MongoDB, although there is a planned (but not scheduled) feature supporting it: https://jira.mongodb.org/browse/SERVER-153 Constantly updating the database with new scores as time passes seems like a pretty limited solution. My vote is to pull all the records from the database and perform the sort and slice in the application until it becomes a problem (which I don't expect until we have a lot more users than we currently do), at which point we can either do some caching (in Redis or similar) or determine the age of the set we should be fetching from the database to perform the sort and slice. (i.e. determine X in @marco-fiset's answer with a high level of confidence). |
Even if the Mongo feature @treygriffith mentioned was implemented, running any sort of custom JavaScript in Mongo isn't a long-term scalable solution either. I think every solution to this problem will have some sort of tradeoffs associated with them. What @treygriffith suggested will work, but won't scale long term. However, I think if we limited the query to something like 1000 news items, the query & sorting will probably be fast enough for a long time. |
Totally true that it's not scalable. It's basically a punt on this issue until we have more data about which trade offs are most appropriate given the application's real world usage. That said, doing a 1000 item limit would do effectively for the first page. I'm not sure how that would work for the nth page though, since you can't do a .skip On Tue, Feb 18, 2014 at 6:58 PM, Doug Jones notifications@github.com
|
I think if we are going to load 1000 items at a time, you'd use a query like the one we already have and I've copied in the original description. Then, sort all 1000 items by score, then select page N of the items from the sorted array. So, the most pages we could support would be 1000/M where M is the number of items on a page. The big downside is that every page view has to query for and sort 1000 items. |
Continuing from @dfjones, posts beyond this 1000 limit could be "archived" and viewable from a separate, differently sorted (e.g. only post time) list view. |
@dfjones I see what you're saying now. I don't think that querying for 1000 records and sorting is actually a big problem - I imagine that most of the time spent in the query is overhead in the actual request itself, and doesn't vary much with the size of the result set (at least for 1000 results). The sort itself is very fast (I'd bet less than 2 tenths of a millisecond for 1000 items). |
Something to keep in mind is that for each news item we have to perform a query to join in the item's vote count, comment count, and author info. So we are seemingly in the neighborhood of 3000 queries to build 1 page. (Although, Mongoose might be batching or doing something to reduce query overhead. I don't know it well enough to say one way or the other.) This probably won't be an issue for the immediate future. But when the day comes to optimize, this probably should be the first place we look at :-) |
Also, we should probably close this issue. I don't think I have the rights to do it. @megamattron ? |
@dfjones The queries for vote count and comment count are batched together, so that's only one extra query each. I'm also pretty sure that Mongoose batches I did just notice that a recent PR to get the latest comment time performs a separate query for each news item, so that's 899 unnecessary queries that can be eliminated. But I agree that this issue is resolved. |
I take back that bit about the comment counts - those are separate queries. I'm going to see what I can do about the comment counts as well as last comment time queries - Performing ~2000 queries to build the front page is really not a great idea. |
Looks like we've got a reasonable compromise in there now, so I'll close this. |
The current method of pagination does not take score sorting into account. News items are primarily sorted by their created date when a page is fetched from the DB:
The page is then sorted by score.
Ideally we'd order by score before we paginate.
The text was updated successfully, but these errors were encountered: