Fix cursor pagination over ordered queries #412
I'm not really happy with this large query in the pagination logic. This abomination is only for the pagination logic (when sorting the result):
AND EXISTS (
SELECT
operation_fields_v1.value
FROM
operation_fields_v1
WHERE
operation_fields_v1.name = 'message'
AND operation_fields_v1.operation_id = document_view_fields.operation_id
AND (
LOWER(operation_fields_v1.value) > (
SELECT
LOWER(operation_fields_v1.value)
FROM
operation_fields_v1
LEFT JOIN document_view_fields ON document_view_fields.operation_id = operation_fields_v1.operation_id
WHERE
document_view_fields.document_view_id = (
SELECT
document_view_fields.document_view_id
FROM
operation_fields_v1
LEFT JOIN document_view_fields ON document_view_fields.operation_id = operation_fields_v1.operation_id
WHERE
operation_fields_v1.cursor = 'fef4b411cd17ce1018d9720dc8d6135c5fd2f65e9e2f74fed5469b28051eab45'
LIMIT
1
) AND operation_fields_v1.name = 'timestamp'
LIMIT
1
) OR (
LOWER(operation_fields_v1.value) = (
SELECT
LOWER(operation_fields_v1.value)
FROM
operation_fields_v1
LEFT JOIN document_view_fields ON document_view_fields.operation_id = operation_fields_v1.operation_id
WHERE
document_view_fields.document_view_id = (
SELECT
document_view_fields.document_view_id
FROM
operation_fields_v1
LEFT JOIN document_view_fields ON document_view_fields.operation_id = operation_fields_v1.operation_id
WHERE
operation_fields_v1.cursor = 'fef4b411cd17ce1018d9720dc8d6135c5fd2f65e9e2f74fed5469b28051eab45'
LIMIT
1
) AND operation_fields_v1.name = 'timestamp'
LIMIT
1
) AND operation_fields_v1.cursor > 'fef4b411cd17ce1018d9720dc8d6135c5fd2f65e9e2f74fed5469b28051eab45'
)
)
)
🤯 These sub-queries always return the same result. This is thanks to the […] We can simplify this immensely by running one "pre" SQL query before the "main" SQL query to retrieve the values the cursor points at. This is not only better for our brain cells but also for performance; otherwise we're running these expensive sub-queries repeatedly for nothing. The bug is fixed. I'll add this "pre" query, and then it should be ready for merging.
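The "pre" query idea from the comment above can be sketched in Rust as follows. This is only an illustration under assumptions: `FieldRow` and `resolve_cursor_value` are hypothetical names, and an in-memory slice stands in for the joined `operation_fields_v1` and `document_view_fields` tables. The point is that the cursor is resolved to the value of the ordering field once, and the result can then be passed into the "main" query as a plain parameter.

```rust
/// Hypothetical, simplified stand-in for one row of `operation_fields_v1`
/// joined with `document_view_fields` (not the project's real types).
#[derive(Debug, Clone)]
pub struct FieldRow {
    pub document_view_id: String,
    pub name: String,
    pub value: String,
    pub cursor: String,
}

/// "Pre" query: resolve a cursor to the value of the ordering field of the
/// document view the cursor belongs to. Returns `None` for unknown cursors
/// or when the document view has no field with that name.
pub fn resolve_cursor_value(
    rows: &[FieldRow],
    cursor: &str,
    order_field: &str,
) -> Option<String> {
    // Step 1: find the document view the cursor points into.
    let view_id = &rows.iter().find(|row| row.cursor == cursor)?.document_view_id;

    // Step 2: read the ordering field of that same document view. The
    // cursor's own field may be a different one, so we look it up by name.
    rows.iter()
        .find(|row| &row.document_view_id == view_id && row.name == order_field)
        .map(|row| row.value.clone())
}
```

Because the lookup goes through the document view and the field name, the comparison value comes from the ordering field even when the cursor itself sits on another field of the same document.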
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #412 +/- ##
==========================================
+ Coverage 92.64% 92.70% +0.06%
==========================================
Files 70 70
Lines 6660 6786 +126
==========================================
+ Hits 6170 6291 +121
- Misses 490 495 +5
I've added more docstrings to explain this whole
AND EXISTS (
SELECT
operation_fields_v1.value
FROM
operation_fields_v1
WHERE
operation_fields_v1.name = 'message'
AND operation_fields_v1.operation_id = document_view_fields.operation_id
AND (
LOWER(operation_fields_v1.value) > LOWER($1)
OR (
LOWER(operation_fields_v1.value) = LOWER($1)
AND operation_fields_v1_list.cursor > 'c7679c82e8db45bf294c5a239484b108d98a2d3943f172582a7389e09b2327c4'
)
)
)
The PR is ready for review and merging now! I think I'll make a separate PR for moving the
Really tricky area. Good work unwrapping it, and thanks for presenting the solution in as easy-to-understand a way as possible 👍
/// Generate SQL for cursor-based pagination.
///
/// Read more about cursor-based pagination here:
/// https://brunoscheufler.com/blog/2022-01-01-paginating-large-ordered-datasets-with-cursor-based-pagination
///
/// ## Cursors
///
/// Our implementation follows the principle mentioned in that article, with a couple of
/// specialities due to our SQL table layout:
///
/// * We don't have auto incrementing `id` but `cursor` fields
/// * We can have duplicate, multiple document id and view id values since one row represents only
/// one document field. A document can consist of many fields. So pagination over document id's or
/// view id's is non-trivial and needs extra aid from our `cursor`
/// * Cursors _always_ need to point at the _last_ field of each document. This is assured by the
/// `convert_rows` method which returns that cursor to the client via the GraphQL response
///
/// ## Ordering
///
/// Pagination is strictly connected to the chosen ordering by the client of the results. We need
/// to take the ordering into account to understand which "next page" to show.
///
/// ## Pre-Queries
///
/// This method is async as it does some smaller "pre" SQL queries before the "main" query. This is
/// an optimization over the fact that cursors sometimes point at values which stay the same for
/// each SQL sub-SELECT, so we just do this query once and pass the values over into the "main"
/// query.
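The comparison that decides whether a row belongs to the "next page" can be sketched as a small Rust predicate. This is a sketch under assumptions, not the code from this PR: `is_after_cursor` is a hypothetical name, ascending order is assumed, and `str::to_lowercase` only approximates SQL `LOWER`. It mirrors the shape of the simplified SQL condition: the ordering value decides first, and the unique cursor breaks ties.

```rust
/// Returns `true` when a row sorts strictly after the cursor position,
/// assuming ascending order by (lower-cased value, cursor).
///
/// * `row_value` / `row_cursor`: ordering value and cursor of the candidate row
/// * `cursor_value`: ordering value the cursor resolved to (from the "pre" query)
/// * `cursor`: the cursor sent by the client
pub fn is_after_cursor(
    row_value: &str,
    row_cursor: &str,
    cursor_value: &str,
    cursor: &str,
) -> bool {
    // Case-insensitive comparison, approximating `LOWER(...)` in the SQL.
    let row_value = row_value.to_lowercase();
    let cursor_value = cursor_value.to_lowercase();

    // Either the value is greater, or the values are equal and the unique
    // cursor acts as the tie-breaker.
    row_value > cursor_value || (row_value == cursor_value && row_cursor > cursor)
}
```

Without the tie-breaker, documents sharing the same ordering value as the cursor's document would either be skipped or repeated across pages.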
This is for sure not an easy topic to wrap one's head around, but this description really helps. At least to me it makes sense, and I can understand the issue we faced and the solution.
"Hello, Panda!".into(),
"Oh, howdy, Pengi!".into(),
"How are you?".into(),
"I miss Pengolina. How about you?".into(),
"I am cute and very hungry".into(),
"(°◇°) !!".into(),
],
❤️
We paginate with the help of a unique `cursor`; there is one per document field.

When the client chooses to order the resulting documents by some field (for example "timestamp"), it is important that the comparison value for pagination comes from this field.

At the same time, we always need to return the last `cursor` of each document to the client, since this is the point from which we can safely paginate to the next document.

The bug resulted from us using the cursor to determine the field, which was only accidentally sometimes the same as the one chosen by the ordering.

Also, this fix uncovered a second bug: we did not always return the last cursor per document, since the SQL sometimes orders the fields slightly differently, in a non-deterministic way.
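One way to make the "last cursor per document" independent of the row order returned by SQL is sketched below. It rests on an assumption this PR does not confirm: that "last" can be defined as the greatest cursor string within a document. `last_cursor_per_document` is a hypothetical helper and not the actual `convert_rows` implementation.

```rust
use std::collections::BTreeMap;

/// Picks one cursor per document from `(document_id, cursor)` pairs.
/// Assumption: the "last" cursor is the lexicographically greatest one, so
/// the result does not depend on the order in which rows arrive.
pub fn last_cursor_per_document(rows: &[(&str, &str)]) -> BTreeMap<String, String> {
    let mut last: BTreeMap<String, String> = BTreeMap::new();

    for (document_id, cursor) in rows {
        let entry = last
            .entry(document_id.to_string())
            .or_insert_with(|| cursor.to_string());

        // Keep the greatest cursor seen so far for this document.
        if *cursor > entry.as_str() {
            *entry = cursor.to_string();
        }
    }

    last
}
```

Feeding the same rows in a different order yields the same map, which is the property that matters when the database gives no ordering guarantee for fields within a document.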
This PR fixes the problem by:
`convert_rows` (before, it was a bit random, as the SQL query sometimes returns fields in a different order)

Closes #381
📋 Checklist
CHANGELOG.md