-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pagination for the _catalog registry endpoint #220
Conversation
Attached issue: https://pulp.plan.io/issues/7974 |
3ac79b7
to
247e1b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I have tested the changes and they seem to work most of the time:
(pulp) [vagrant@pulp3-source-fedora32 backup]$ http :24817/v2/namespace/busybox/tags/list?n=10
HTTP/1.1 200 OK
Access-Control-Expose-Headers: Correlation-ID
Allow: GET, HEAD, OPTIONS
Connection: close
Content-Length: 47
Content-Type: application/json
Correlation-ID: a62ebd01565847d5bb666b0ccf7ac2af
Date: Fri, 05 Feb 2021 12:54:39 GMT
Docker-Distribution-Api-Version: registry/2.0
Link: <http://localhost:24817/v2/namespace/busybox/tags/list?n=10&last=1-glibc>; rel="next"
Server: gunicorn/20.0.4
Vary: Accept
X-Frame-Options: SAMEORIGIN
{
"name": "namespace/busybox",
"tags": [
"1-glibc"
]
}
(pulp) [vagrant@pulp3-source-fedora32 backup]$ http :24817/v2/namespace/busybox/tags/list?n=-1
HTTP/1.1 500 Internal Server Error
Access-Control-Expose-Headers: Correlation-ID
Connection: close
Content-Length: 27
Content-Type: text/html
Correlation-ID: 03767513c8ba4814a2fa7d2c13094656
Date: Fri, 05 Feb 2021 12:54:46 GMT
Server: gunicorn/20.0.4
X-Frame-Options: SAMEORIGIN
<h1>Server Error (500)</h1>
Also, I am not sure whether we want to mimic the behavior of dockerhub or not, but when I query the tags/list endpoint, it does not return the Link
header if the number of entries is not (presumably) large enough:
(pulp) [vagrant@pulp3-source-fedora32 backup]$ http GET https://registry-1.docker.io/v2/lubosmj/mirror/tags/list?n=1 --auth-type=jwt --auth=blabla
HTTP/1.1 200 OK
Content-Length: 56
Content-Type: application/json
Date: Fri, 05 Feb 2021 12:58:53 GMT
Docker-Distribution-Api-Version: registry/2.0
Strict-Transport-Security: max-age=31536000
{
"name": "lubosmj/mirror",
"tags": [
"123",
"3.8",
"latest"
]
}
(pulp) [vagrant@pulp3-source-fedora32 backup]$ http GET https://registry-1.docker.io/v2/lubosmj/mirror/tags/list --auth-type=jwt --auth=blabla
HTTP/1.1 200 OK
Content-Length: 56
Content-Type: application/json
Date: Fri, 05 Feb 2021 12:44:48 GMT
Docker-Distribution-Api-Version: registry/2.0
Strict-Transport-Security: max-age=31536000
{
"name": "lubosmj/mirror",
"tags": [
"123",
"3.8",
"latest"
]
}
pulp_container/app/registry_api.py
Outdated
def paginate_queryset(self, queryset, request, view=None): | ||
""" | ||
Analyse the pagination parameters and prepare the queryset. | ||
""" | ||
try: | ||
self.n = int(request.query_params.get("n")) | ||
except Exception: | ||
self.n = self.page_size | ||
last = request.query_params.get("last") | ||
self.url = request.build_absolute_uri() | ||
|
||
if last: | ||
queryset = queryset.filter(base_path__gt=last) | ||
return queryset.order_by("base_path")[: self.n] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to create a new base class and extract methods that will be common for both ContainerCatalogPagination
and ContainerTagListSerializer
? I think we can better generalize (at least to some extent) paginate_queryset()
as well as get_paginated_response()
. How does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that and then there were subtle differences. But now that we know how it looks, i might be able to extract a base class.
7ade67b
to
5b6b118
Compare
pulp_container/app/registry_api.py
Outdated
} | ||
return Response(headers=headers, data={"repositories": list(repositories_names)}) | ||
|
||
page_size = 16 # TODO what is a reasonable default here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have seen 50 on quay
@mdellweg did you happen to stumble across this? https://github.com/tbeadle/django-rest-framework-link-header-pagination No, but it looks like it's using pages, and that does not match our needs. |
pulp_container/app/registry_api.py
Outdated
headers = { | ||
"Link": f'<{url}>; rel="next"', | ||
} | ||
return Response(headers=headers, data={"repositories": list(repositories_names)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repo_name is already a list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see a problem with this approach - the link header should be present in the response only when it indicates that not all the results have been displayed.
This way we will always have link in the header which is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(pulp) [vagrant@pulp3-source-fedora32 ~]$ http GET :24817/v2/test/this/tags/list
HTTP/1.1 200 OK
Access-Control-Expose-Headers: Correlation-ID
Allow: GET, HEAD, OPTIONS
Connection: close
Content-Length: 40
Content-Type: application/json
Correlation-ID: 39902f76fc6a4b2696d31040ceeba9ec
Date: Wed, 17 Feb 2021 17:33:41 GMT
Docker-Distribution-Api-Version: registry/2.0
Link: <http://localhost:24817/v2/test/this/tags/list?n=16&last=mytag1.8>; rel="next"
Server: gunicorn/20.0.4
Vary: Accept
X-Frame-Options: SAMEORIGIN
{
"name": "test/this",
"tags": [
"mytag1.8"
]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however i agree on the approach to paginate the response even if the client did not sent request with pagination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we can get away with this. If you follow the next link here (provided i fixed the empty list problem) you'll get an empty list, and you stop iterating.
But i will try to find a solution without this extra call anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, with the empty list, there will be no next link anymore. But as long as we return something, i do not see, where i can get the information that there would be more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more thing i would suggest is to return in the link header a relative URI instead of absolute. According to RFC both are support though
imo, i would make this as simple as possible and:
|
949e43e
to
e02cd38
Compare
pulp_container/app/registry_api.py
Outdated
else: | ||
if self.n > 10 * self.page_size: | ||
self.n = 10 * self.page_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the greedy.
In the end the pagination is not really a service to the user, but a precautionary measure against DOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would not know what's the reasonable max page_size, probably 1000 is ok if having 100 as defaults unchanged
@lubosmj how funny or sad it might sounds but i am pretty sure dockehub does not support pagination :) I have tried to list tags from mongo and it returned me 800+ tags. specidying |
pulp_container/app/registry_api.py
Outdated
headers["Link"] = f'<{url}>; rel="next"' | ||
return Response(headers=headers, data={"name": self.path, "tags": tag_names}) | ||
|
||
page_size = api_settings.PAGE_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason this is defined twice? we can use it as a constant from settings
@mdellweg the pr looks good please take a look at this comment #220 (comment) |
I think consolidating the code paths would be possible, once we add renderers to the mix. |
The pagination follows the container registry specification by providing a link:next header and controls the viewable subset by count (n) and the last name of the previous page (last). fixes #7974 https://pulp.plan.io/issues/7974
fbb5428
to
c268a9c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you! 🥂
fixes #7974
https://pulp.plan.io/issues/7974