Pagination with materialized paths #58

Open
jakepaul opened this Issue Jun 21, 2011 · 2 comments

2 participants

@jakepaul

I've been exploring pagination with materialized paths. I'm adding pagination to a threaded comments system that sometimes receives thousands of comments per post. What I'm looking to do is return the page of comments in the order that would appear in the tree, top to bottom, to create a page.

Ancestry's ordered_by_ancestry scope produces an ORDER BY clause like this:

SELECT id, body, ancestry FROM comments ORDER BY (case when ancestry is null then 0 else 1 end), ancestry LIMIT 10;

With this query, you get all root comments first (in no particular order without another condition). I'm assuming this was the intended outcome for the purposes of the arrange method?

To get it sorted in tree order this is the first thing I tried:

SELECT id, body, ancestry FROM comments ORDER BY (case when ancestry is null then id else ancestry end), ancestry LIMIT 10;

This is better but you still end up with comments sorted by tier, so all second-level comments will come before all third-level comments, regardless of where they would appear in the actual tree. Still closer:

SELECT id, body, ancestry FROM comments ORDER BY (case when ancestry is null then id else CONCAT(ancestry,"/",id) end) LIMIT 10;

Adding a comment's own ID to the end of it's ancestry produces the expected result, but it's still not perfect, because sorting strings of integers behaves such that if you have 1, 2, 3, and 21, it would sort 1, 2, 21, 3. One solution to that is to pad the number with zeroes, but another is to translate the number to base 36 like Drupal and at least one Django add-on does.

I'm planning on implementing a system similar to Drupal/Django. Obviously making the ancestry string use base 36 instead of the actual IDs means some of the methods in ancestry would break and need to be modified to work, though that is fine for this use case.

I'm primarily wondering whether this is something that has been raised before. It seems like ancestry is primarily focused on traversing and assembling complete trees. Is that a correct assessment? I suspect that paginating huge trees is a fairly uncommon problem, but still I'm thinking about doing this work with an eye towards creating a gem, so of course I'd like to know if it's something that you think would actually be appropriate for inclusion in ancestry.

@nazar

One solution to that is to pad the number with zeroes, but another is to translate the number to base 36 like Drupal and at least one Django add-on does.

I think there is one more problem the base 36 encoding of ancestry does not solve and that is the issue of limit boundaries hitting an ancestry. Hopefully an example will illustrate this.

id: 105, Ancestry: Null
id: 117, Ancestry: 105
id: 118, Ancestry: 105/117
id: 119, Ancestry: 105/117/118

A LIMIT 0,3 (for the sake of the example above) would return the first three records, which will render correctly. The subsequent LIMIT 3,3 will return id: 119 which will not render correctly as its parents are not present.

I'm starting to think that pagination can only happen via two queries: the first paginates the roots; the second returns any children based on parents in the first query. The second query won't be pretty in that there will be a lot of (ancestry like '105%') or (ancestry like '109%') or (ancestry like '115%') and so on.

More thoughts are very welcome....

@nazar

I managed to come up with the following solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment