Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cursor pagination improvements #34

Closed

Conversation

sinisaos
Copy link
Member

Added the ability to move in different directions (forward and backward) from next_cursor.

Example:
http://localhost:8000/posts/?__page_size=3&__order=id&__cursor=NA== (ASC forward)
http://localhost:8000/posts/?__page_size=3&__order=id&__cursor=NA==&__previous=yes (ASC backward)
http://localhost:8000/posts/?__page_size=3&__order=-id&__cursor=OQ== (DESC forward)
http://localhost:8000/posts/?__page_size=3&__order=-id&__cursor=OQ==&__previous=yes (DESC backward)

This is quite tricky with a lot of edge cases. I don't know exactly how to use cursor pagination in piccolo_admin. LimitOffset pagination works good (up to 200,000 rows), and is quite convinient.
We may use cursor pagination above that number (maybe millions of rows, but I’m have never encountered such large data sets), but then we lose the ability to sort by all columns except by id, because the cursor must be unique.
You probably know much better than I how to implement that. Cheers.

@dantownsend
Copy link
Member

This looks good - I'll play around with it.

I did some research into cursor pagination. I think the challenge with using it in Piccolo Admin is it's hard to jump to any page. For example, if you're on page 3, and want to jump to page 5, cursor pagination doesn't work, as it just works for sequential pages.

I was reading how Shopify do pagination, and it's quite interesting - they use cursors. When you make a request, the cursor values are returns as headers.

https://www.shopify.co.uk/partners/blog/relative-pagination

Another interesting thing about cursors is they only make sense if all of the filters and ordering remains consistent between requests.

I thought about encoding all of the filters and ordering into the cursor.

So a request like this:

/api/tables/directors?ordering=name

Would encode the cursor as:

base64.encode(json.dumps({'ordering': 'name', 'id': 10})
>>> abc123

Which is returned as a header.

The subsequent request would be:

/api/tables/directors?cursor=abc123

If you pass in extra parameters, it invalidates the cursor. The backend then decodes the cursor, and gets the parameters from it.

What do you think?

This is just a brain dump - I'll review your solution in detail later tonight.

@dantownsend
Copy link
Member

This is where I got to when I was experimenting with cursor based pagination last time:

#35

It's a hard problem for sure. I'm not saying mine is correct - I just pushed it so you can take a look at where I got to last time.

@sinisaos
Copy link
Member Author

@dantownsend I totally agree with you. LimitOffset pagination is much easier and you can go to the page you want. Cursor pagination is hard to understand. You have so many edge cases and works only on unique colums like id. I need some time to understand that whole concept and your code.

Comment on lines +392 to +401
response.cursor = cursor
continue

if key == "__previous":
try:
previous = str(value)
except ValueError:
logger.info(f"Unrecognised __previous argument - {value}")
else:
response.previous = previous
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get some type errors for response.cursor = cursor and response.previous = previous - we just need to add cursor and previous as attributes to Params.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dantownsend Like this:

@dataclass
class Params:
    operators: t.Dict[str, t.Type[Operator]] = field(
        default_factory=lambda: defaultdict(lambda: Equal)
    )
    match_types: t.Dict[str, str] = field(
        default_factory=lambda: defaultdict(lambda: MATCH_TYPES[0])
    )
    fields: t.Dict[str, t.Any] = field(default_factory=dict)
    order_by: t.Optional[OrderBy] = None
    include_readable: bool = False
    page: int = 1
    page_size: t.Optional[int] = None
    cursor: str = ""
    previous: str = ""

I can add this in next commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, exactly

Comment on lines 521 to 529
try:
# query where limit is equal to page_size plus one
query = query.limit(page_size + 1)
rows = await query.run()
# initial cursor
next_cursor = self.encode_cursor(str(rows[-1]["id"]))
except IndexError:
# for preventing IndexError in piccolo_admin filtering
next_cursor = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about having some way for the user to request cursor pagination?

If we automatically do it, it adds some overhead to each request, and might not be required.

What do you think about only doing it if a cursor is passed in? To initiate it the first time, an empty cursor could be passed in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dantownsend Like this:

if cursor:
    try:
        # query where limit is equal to page_size plus one
        query = query.limit(page_size + 1)
        rows = await query.run()
        # initial cursor
        next_cursor = self.encode_cursor(str(rows[-1]["id"]))
    except IndexError:
        # for preventing IndexError in piccolo_admin filtering
        next_cursor = ""

Unfortunately if I do that I'm getting wrong results

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to try and debug this - it might be easier when we have unit tests.

We might be able to copy over this one, and modify it:

https://github.com/piccolo-orm/piccolo_api/pull/35/files#diff-9a76db3e8763a9778cd3d34f47be9819e44a3f5e78fa6e80c9d92c692a767b47

It's not the end of the world, but it would be nice to isolate the cursor logic, so it's only run when needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started writing some tests (based on your link) so we'll see what happens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - if you need any help let me know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some tests (latest commit) and they all pass locally. I still don't know how to separate the cursor logic. Maybe you have some ideas about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll try it out later today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've played around with it a bit, to try and reduce the amount of cursor logic being run if the cursor isn't required. I've pushed my changes to this pull request.

There are still some more improvements to be made, but it feels a bit cleaner. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your changes looks pretty good and a lot cleaner :)

Comment on lines 587 to 591
cursor_model = Cursor(next_cursor=next_cursor).json()
json = self.pydantic_model_plural(include_readable=include_readable)(
rows=rows
).json()
return CustomJSONResponse(json)
return CustomJSONResponse(f"{json[:-1]}, {cursor_model[1:]}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about returning the cursor as a header, rather than in the body? It might simplify this bit of code?

I'm not saying it's right - just interested in your thoughts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dantownsend Whatever you want. I think it’s simpler in the body but you know better :)

Example of Vue frontend

getPosts() {
      const path = "/posts/?__page_size=5&__order=id&__cursor=" + this.cursor + "&__readable=true";
      axios
        .get(path)
        .then(res => {
          this.posts = res.data.rows;
          this.cursor = res.data.next_cursor;  // use cursor from request body like any other variable
        })
        .catch(error => {
          console.error(error);
        });
    },

If you want to return cursor from header, can you implement that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should just be res.headers.Next-Cursor, assuming the header is called Next-Cursor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean something like this:

# set cursor in header
headers: t.Dict[str, str] = {}
headers["cursor"] = next_cursor

# We need to serialise it ourselves, in case there are datetime
# fields.
json = self.pydantic_model_plural(include_readable=include_readable)(
    rows=rows
).json()
return CustomJSONResponse(json, headers=headers)

After that we can access cursor with res.headers['cursor'] (not res.headers.cursor) but we first need to add expose_headers=["cursor"] in CORSMiddleware. If we don't do this the axios doesn't return the cursor from headers. If we use a cursor from the request body we don’t have to do any of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a good point - I didn't consider CORS.

OK, so let's keep it in the response body.

We could potentially tidy it up a bit by modifying pydantic_model_plural, so it contains the cursor attribute:

    def pydantic_model_plural(self, include_readable=False):
        """
        This is for when we want to serialise many copies of the model.
        """
        base_model = create_pydantic_model(
            self.table,
            include_default_columns=True,
            include_readable=include_readable,
            model_name=f"{self.table.__name__}Item",
        )
        return pydantic.create_model(
            str(self.table.__name__) + "Plural",
            __config__=Config,
            rows=(t.List[base_model], None),
            cursor=(t.Optional[str], None)
        )

That way we wouldn't need the separate Cursor model, and to combine the JSON strings. Instead it would just be:

return CustomJSONResponse(
    self.pydantic_model_plural(include_readable=include_readable)(
        rows=rows,
        cursor=cursor
    ).json()
)

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I just change one line and everything works great. Thanks.

return CustomJSONResponse(
    self.pydantic_model_plural(include_readable=include_readable)(
        rows=rows,
        cursor=next_cursor // I changed this line 
    ).json()
)

Now we don't need Cursor model and this weird line return CustomJSONResponse(f"{json[:-1]}, {cursor_model[1:]}")

@dantownsend
Copy link
Member

I read that article - I agree that it's a sensible approach.

I left a few comments. We could do with some unit tests too.

@lgtm-com
Copy link

lgtm-com bot commented Mar 27, 2021

This pull request introduces 1 alert when merging 02f87c5 into ead9b84 - view on LGTM.com

new alerts:

  • 1 for Unused import

@sinisaos
Copy link
Member Author

@dantownsend What do you think about merge this to master? I think this works good and code is pretty clean.

@dantownsend
Copy link
Member

@sinisaos Yeah, I'll have one more look over it tomorrow, and I'll merge it in. It'll need docs adding at some point.

@sinisaos sinisaos mentioned this pull request Apr 1, 2021
@sinisaos
Copy link
Member Author

sinisaos commented Apr 1, 2021

@dantownsend I updated the pagination docs and also added a contribution guide to make it easier for people to get involved in the project. Please can you check these docs as I am not a native English speaker. Thanks.

@sinisaos sinisaos closed this Oct 15, 2021
@sinisaos sinisaos deleted the cursor_pagination_improvements branch October 15, 2021 04:52
@sinisaos
Copy link
Member Author

Closed due #94

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants