-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cursor pagination improvements #34
Cursor pagination improvements #34
Conversation
This looks good - I'll play around with it. I did some research into cursor pagination. I think the challenge with using it in Piccolo Admin is it's hard to jump to any page. For example, if you're on page 3, and want to jump to page 5, cursor pagination doesn't work, as it just works for sequential pages. I was reading how Shopify do pagination, and it's quite interesting - they use cursors. When you make a request, the cursor values are returns as headers. https://www.shopify.co.uk/partners/blog/relative-pagination Another interesting thing about cursors is they only make sense if all of the filters and ordering remains consistent between requests. I thought about encoding all of the filters and ordering into the cursor. So a request like this:
Would encode the cursor as:
Which is returned as a header. The subsequent request would be:
If you pass in extra parameters, it invalidates the cursor. The backend then decodes the cursor, and gets the parameters from it. What do you think? This is just a brain dump - I'll review your solution in detail later tonight. |
This is where I got to when I was experimenting with cursor based pagination last time: It's a hard problem for sure. I'm not saying mine is correct - I just pushed it so you can take a look at where I got to last time. |
@dantownsend I totally agree with you. LimitOffset pagination is much easier and you can go to the page you want. Cursor pagination is hard to understand. You have so many edge cases and works only on unique colums like id. I need some time to understand that whole concept and your code. |
response.cursor = cursor | ||
continue | ||
|
||
if key == "__previous": | ||
try: | ||
previous = str(value) | ||
except ValueError: | ||
logger.info(f"Unrecognised __previous argument - {value}") | ||
else: | ||
response.previous = previous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get some type errors for response.cursor = cursor
and response.previous = previous
- we just need to add cursor
and previous
as attributes to Params
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dantownsend Like this:
@dataclass
class Params:
operators: t.Dict[str, t.Type[Operator]] = field(
default_factory=lambda: defaultdict(lambda: Equal)
)
match_types: t.Dict[str, str] = field(
default_factory=lambda: defaultdict(lambda: MATCH_TYPES[0])
)
fields: t.Dict[str, t.Any] = field(default_factory=dict)
order_by: t.Optional[OrderBy] = None
include_readable: bool = False
page: int = 1
page_size: t.Optional[int] = None
cursor: str = ""
previous: str = ""
I can add this in next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, exactly
piccolo_api/crud/endpoints.py
Outdated
try: | ||
# query where limit is equal to page_size plus one | ||
query = query.limit(page_size + 1) | ||
rows = await query.run() | ||
# initial cursor | ||
next_cursor = self.encode_cursor(str(rows[-1]["id"])) | ||
except IndexError: | ||
# for preventing IndexError in piccolo_admin filtering | ||
next_cursor = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about having some way for the user to request cursor pagination?
If we automatically do it, it adds some overhead to each request, and might not be required.
What do you think about only doing it if a cursor is passed in? To initiate it the first time, an empty cursor could be passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dantownsend Like this:
if cursor:
try:
# query where limit is equal to page_size plus one
query = query.limit(page_size + 1)
rows = await query.run()
# initial cursor
next_cursor = self.encode_cursor(str(rows[-1]["id"]))
except IndexError:
# for preventing IndexError in piccolo_admin filtering
next_cursor = ""
Unfortunately if I do that I'm getting wrong results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to try and debug this - it might be easier when we have unit tests.
We might be able to copy over this one, and modify it:
It's not the end of the world, but it would be nice to isolate the cursor logic, so it's only run when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started writing some tests (based on your link) so we'll see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome - if you need any help let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some tests (latest commit) and they all pass locally. I still don't know how to separate the cursor logic. Maybe you have some ideas about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'll try it out later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've played around with it a bit, to try and reduce the amount of cursor logic being run if the cursor isn't required. I've pushed my changes to this pull request.
There are still some more improvements to be made, but it feels a bit cleaner. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your changes looks pretty good and a lot cleaner :)
piccolo_api/crud/endpoints.py
Outdated
cursor_model = Cursor(next_cursor=next_cursor).json() | ||
json = self.pydantic_model_plural(include_readable=include_readable)( | ||
rows=rows | ||
).json() | ||
return CustomJSONResponse(json) | ||
return CustomJSONResponse(f"{json[:-1]}, {cursor_model[1:]}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about returning the cursor as a header, rather than in the body? It might simplify this bit of code?
I'm not saying it's right - just interested in your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dantownsend Whatever you want. I think it’s simpler in the body but you know better :)
Example of Vue frontend
getPosts() {
const path = "/posts/?__page_size=5&__order=id&__cursor=" + this.cursor + "&__readable=true";
axios
.get(path)
.then(res => {
this.posts = res.data.rows;
this.cursor = res.data.next_cursor; // use cursor from request body like any other variable
})
.catch(error => {
console.error(error);
});
},
If you want to return cursor from header, can you implement that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should just be res.headers.Next-Cursor
, assuming the header is called Next-Cursor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean something like this:
# set cursor in header
headers: t.Dict[str, str] = {}
headers["cursor"] = next_cursor
# We need to serialise it ourselves, in case there are datetime
# fields.
json = self.pydantic_model_plural(include_readable=include_readable)(
rows=rows
).json()
return CustomJSONResponse(json, headers=headers)
After that we can access cursor with res.headers['cursor']
(not res.headers.cursor
) but we first need to add expose_headers=["cursor"]
in CORSMiddleware
. If we don't do this the axios doesn't return the cursor from headers. If we use a cursor from the request body we don’t have to do any of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a good point - I didn't consider CORS.
OK, so let's keep it in the response body.
We could potentially tidy it up a bit by modifying pydantic_model_plural
, so it contains the cursor attribute:
def pydantic_model_plural(self, include_readable=False):
"""
This is for when we want to serialise many copies of the model.
"""
base_model = create_pydantic_model(
self.table,
include_default_columns=True,
include_readable=include_readable,
model_name=f"{self.table.__name__}Item",
)
return pydantic.create_model(
str(self.table.__name__) + "Plural",
__config__=Config,
rows=(t.List[base_model], None),
cursor=(t.Optional[str], None)
)
That way we wouldn't need the separate Cursor
model, and to combine the JSON strings. Instead it would just be:
return CustomJSONResponse(
self.pydantic_model_plural(include_readable=include_readable)(
rows=rows,
cursor=cursor
).json()
)
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. I just change one line and everything works great. Thanks.
return CustomJSONResponse(
self.pydantic_model_plural(include_readable=include_readable)(
rows=rows,
cursor=next_cursor // I changed this line
).json()
)
Now we don't need Cursor
model and this weird line return CustomJSONResponse(f"{json[:-1]}, {cursor_model[1:]}")
I read that article - I agree that it's a sensible approach. I left a few comments. We could do with some unit tests too. |
…nto cursor_pagination_improvements
This pull request introduces 1 alert when merging 02f87c5 into ead9b84 - view on LGTM.com new alerts:
|
@dantownsend What do you think about merge this to master? I think this works good and code is pretty clean. |
@sinisaos Yeah, I'll have one more look over it tomorrow, and I'll merge it in. It'll need docs adding at some point. |
…inisaos/piccolo_api into cursor_pagination_improvements
@dantownsend I updated the pagination docs and also added a contribution guide to make it easier for people to get involved in the project. Please can you check these docs as I am not a native English speaker. Thanks. |
Closed due #94 |
Added the ability to move in different directions (forward and backward) from next_cursor.
This is quite tricky with a lot of edge cases. I don't know exactly how to use cursor pagination in piccolo_admin. LimitOffset pagination works good (up to 200,000 rows), and is quite convinient.
We may use cursor pagination above that number (maybe millions of rows, but I’m have never encountered such large data sets), but then we lose the ability to sort by all columns except by id, because the cursor must be unique.
You probably know much better than I how to implement that. Cheers.