Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pagination for the _catalog registry endpoint #220

Merged
merged 1 commit into from
Feb 22, 2021

Conversation

mdellweg
Copy link
Member

@mdellweg mdellweg commented Feb 2, 2021

@pulpbot
Copy link
Member

pulpbot commented Feb 2, 2021

Attached issue: https://pulp.plan.io/issues/7974

@mdellweg mdellweg force-pushed the paginate_registry_api branch 4 times, most recently from 3ac79b7 to 247e1b8 Compare February 4, 2021 09:52
@mdellweg mdellweg marked this pull request as ready for review February 4, 2021 12:51
Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I have tested the changes and they seem to work most of the time:

(pulp) [vagrant@pulp3-source-fedora32 backup]$ http :24817/v2/namespace/busybox/tags/list?n=10
HTTP/1.1 200 OK
Access-Control-Expose-Headers: Correlation-ID
Allow: GET, HEAD, OPTIONS
Connection: close
Content-Length: 47
Content-Type: application/json
Correlation-ID: a62ebd01565847d5bb666b0ccf7ac2af
Date: Fri, 05 Feb 2021 12:54:39 GMT
Docker-Distribution-Api-Version: registry/2.0
Link: <http://localhost:24817/v2/namespace/busybox/tags/list?n=10&last=1-glibc>; rel="next"
Server: gunicorn/20.0.4
Vary: Accept
X-Frame-Options: SAMEORIGIN

{
    "name": "namespace/busybox",
    "tags": [
        "1-glibc"
    ]
}


(pulp) [vagrant@pulp3-source-fedora32 backup]$ http :24817/v2/namespace/busybox/tags/list?n=-1
HTTP/1.1 500 Internal Server Error
Access-Control-Expose-Headers: Correlation-ID
Connection: close
Content-Length: 27
Content-Type: text/html
Correlation-ID: 03767513c8ba4814a2fa7d2c13094656
Date: Fri, 05 Feb 2021 12:54:46 GMT
Server: gunicorn/20.0.4
X-Frame-Options: SAMEORIGIN

<h1>Server Error (500)</h1>

Also, I am not sure whether we want to mimic the behavior of dockerhub or not, but when I query the tags/list endpoint, it does not return the Link header if the number of entries is not (presumably) large enough:

(pulp) [vagrant@pulp3-source-fedora32 backup]$ http GET https://registry-1.docker.io/v2/lubosmj/mirror/tags/list?n=1 --auth-type=jwt --auth=blabla
HTTP/1.1 200 OK
Content-Length: 56
Content-Type: application/json
Date: Fri, 05 Feb 2021 12:58:53 GMT
Docker-Distribution-Api-Version: registry/2.0
Strict-Transport-Security: max-age=31536000

{
    "name": "lubosmj/mirror",
    "tags": [
        "123",
        "3.8",
        "latest"
    ]
}

(pulp) [vagrant@pulp3-source-fedora32 backup]$ http GET https://registry-1.docker.io/v2/lubosmj/mirror/tags/list --auth-type=jwt --auth=blabla
HTTP/1.1 200 OK
Content-Length: 56
Content-Type: application/json
Date: Fri, 05 Feb 2021 12:44:48 GMT
Docker-Distribution-Api-Version: registry/2.0
Strict-Transport-Security: max-age=31536000

{
    "name": "lubosmj/mirror",
    "tags": [
        "123",
        "3.8",
        "latest"
    ]
}

Comment on lines 367 to 417
def paginate_queryset(self, queryset, request, view=None):
"""
Analyse the pagination parameters and prepare the queryset.
"""
try:
self.n = int(request.query_params.get("n"))
except Exception:
self.n = self.page_size
last = request.query_params.get("last")
self.url = request.build_absolute_uri()

if last:
queryset = queryset.filter(base_path__gt=last)
return queryset.order_by("base_path")[: self.n]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to create a new base class and extract methods that will be common for both ContainerCatalogPagination and ContainerTagListSerializer? I think we can better generalize (at least to some extent) paginate_queryset() as well as get_paginated_response(). How does that sound?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that and then there were subtle differences. But now that we know how it looks, i might be able to extract a base class.

CHANGES/7974.feature Outdated Show resolved Hide resolved
}
return Response(headers=headers, data={"repositories": list(repositories_names)})

page_size = 16 # TODO what is a reasonable default here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have seen 50 on quay

@ipanova
Copy link
Member

ipanova commented Feb 11, 2021

@mdellweg did you happen to stumble across this? https://github.com/tbeadle/django-rest-framework-link-header-pagination

No, but it looks like it's using pages, and that does not match our needs.

pulp_container/app/registry_api.py Show resolved Hide resolved
pulp_container/app/registry_api.py Show resolved Hide resolved
pulp_container/app/registry_api.py Outdated Show resolved Hide resolved
headers = {
"Link": f'<{url}>; rel="next"',
}
return Response(headers=headers, data={"repositories": list(repositories_names)})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repo_name is already a list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see a problem with this approach - the link header should be present in the response only when it indicates that not all the results have been displayed.
This way we will always have link in the header which is wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(pulp) [vagrant@pulp3-source-fedora32 ~]$ http GET :24817/v2/test/this/tags/list
HTTP/1.1 200 OK
Access-Control-Expose-Headers: Correlation-ID
Allow: GET, HEAD, OPTIONS
Connection: close
Content-Length: 40
Content-Type: application/json
Correlation-ID: 39902f76fc6a4b2696d31040ceeba9ec
Date: Wed, 17 Feb 2021 17:33:41 GMT
Docker-Distribution-Api-Version: registry/2.0
Link: <http://localhost:24817/v2/test/this/tags/list?n=16&last=mytag1.8>; rel="next"
Server: gunicorn/20.0.4
Vary: Accept
X-Frame-Options: SAMEORIGIN

{
    "name": "test/this",
    "tags": [
        "mytag1.8"
    ]
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however i agree on the approach to paginate the response even if the client did not sent request with pagination.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we can get away with this. If you follow the next link here (provided i fixed the empty list problem) you'll get an empty list, and you stop iterating.
But i will try to find a solution without this extra call anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, with the empty list, there will be no next link anymore. But as long as we return something, i do not see, where i can get the information that there would be more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more thing i would suggest is to return in the link header a relative URI instead of absolute. According to RFC both are support though

pulp_container/app/registry_api.py Show resolved Hide resolved
@ipanova
Copy link
Member

ipanova commented Feb 17, 2021

imo, i would make this as simple as possible and:

  1. return a paginated response no matter whether client has asked for it
  2. support only n so a client can ask for a custom pagination other then the one set in our settings( it is 100)
  3. the link header should be present only when not the full response has been produced

@mdellweg mdellweg force-pushed the paginate_registry_api branch 3 times, most recently from 949e43e to e02cd38 Compare February 18, 2021 11:26
Comment on lines 409 to 411
else:
if self.n > 10 * self.page_size:
self.n = 10 * self.page_size
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the greedy.
In the end the pagination is not really a service to the user, but a precautionary measure against DOS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would not know what's the reasonable max page_size, probably 1000 is ok if having 100 as defaults unchanged

@ipanova
Copy link
Member

ipanova commented Feb 18, 2021

@lubosmj how funny or sad it might sounds but i am pretty sure dockehub does not support pagination :) I have tried to list tags from mongo and it returned me 800+ tags. specidying n did not make any difference :) 🤦

CHANGES/7974.feature Outdated Show resolved Hide resolved
headers["Link"] = f'<{url}>; rel="next"'
return Response(headers=headers, data={"name": self.path, "tags": tag_names})

page_size = api_settings.PAGE_SIZE
Copy link
Member

@ipanova ipanova Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason this is defined twice? we can use it as a constant from settings

@ipanova
Copy link
Member

ipanova commented Feb 18, 2021

@mdellweg the pr looks good :octocat: please take a look at this comment #220 (comment)
I would really love to see this pagination code for tags and catalog consolidated into one code path, but let's merge this as is for now.
I would also clean up registry_api.py and leave only viewes/viewsets related stuff and for example extract serializers into a separate file. But we can do this as well in separate PR.

@mdellweg
Copy link
Member Author

I think consolidating the code paths would be possible, once we add renderers to the mix.
ATM the pagination class takes care of the rendering, which is not ideal.

The pagination follows the container registry specification by providing
a link:next header and controls the viewable subset by count (n) and the
last name of the previous page (last).

fixes #7974
https://pulp.plan.io/issues/7974
Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you! 🥂

@mdellweg mdellweg merged commit 4a2a996 into pulp:master Feb 22, 2021
@mdellweg mdellweg deleted the paginate_registry_api branch February 22, 2021 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants