Add pagination for the _catalog registry endpoint #220

mdellweg · 2021-02-02T09:15:41Z

fixes #7974
https://pulp.plan.io/issues/7974

pulpbot · 2021-02-02T09:33:46Z

Attached issue: https://pulp.plan.io/issues/7974

lubosmj

Awesome! I have tested the changes and they seem to work most of the time:

(pulp) [vagrant@pulp3-source-fedora32 backup]$ http :24817/v2/namespace/busybox/tags/list?n=10
HTTP/1.1 200 OK
Access-Control-Expose-Headers: Correlation-ID
Allow: GET, HEAD, OPTIONS
Connection: close
Content-Length: 47
Content-Type: application/json
Correlation-ID: a62ebd01565847d5bb666b0ccf7ac2af
Date: Fri, 05 Feb 2021 12:54:39 GMT
Docker-Distribution-Api-Version: registry/2.0
Link: <http://localhost:24817/v2/namespace/busybox/tags/list?n=10&last=1-glibc>; rel="next"
Server: gunicorn/20.0.4
Vary: Accept
X-Frame-Options: SAMEORIGIN

{
    "name": "namespace/busybox",
    "tags": [
        "1-glibc"
    ]
}


(pulp) [vagrant@pulp3-source-fedora32 backup]$ http :24817/v2/namespace/busybox/tags/list?n=-1
HTTP/1.1 500 Internal Server Error
Access-Control-Expose-Headers: Correlation-ID
Connection: close
Content-Length: 27
Content-Type: text/html
Correlation-ID: 03767513c8ba4814a2fa7d2c13094656
Date: Fri, 05 Feb 2021 12:54:46 GMT
Server: gunicorn/20.0.4
X-Frame-Options: SAMEORIGIN

<h1>Server Error (500)</h1>

Also, I am not sure whether we want to mimic the behavior of dockerhub or not, but when I query the tags/list endpoint, it does not return the Link header if the number of entries is not (presumably) large enough:

(pulp) [vagrant@pulp3-source-fedora32 backup]$ http GET https://registry-1.docker.io/v2/lubosmj/mirror/tags/list?n=1 --auth-type=jwt --auth=blabla
HTTP/1.1 200 OK
Content-Length: 56
Content-Type: application/json
Date: Fri, 05 Feb 2021 12:58:53 GMT
Docker-Distribution-Api-Version: registry/2.0
Strict-Transport-Security: max-age=31536000

{
    "name": "lubosmj/mirror",
    "tags": [
        "123",
        "3.8",
        "latest"
    ]
}

(pulp) [vagrant@pulp3-source-fedora32 backup]$ http GET https://registry-1.docker.io/v2/lubosmj/mirror/tags/list --auth-type=jwt --auth=blabla
HTTP/1.1 200 OK
Content-Length: 56
Content-Type: application/json
Date: Fri, 05 Feb 2021 12:44:48 GMT
Docker-Distribution-Api-Version: registry/2.0
Strict-Transport-Security: max-age=31536000

{
    "name": "lubosmj/mirror",
    "tags": [
        "123",
        "3.8",
        "latest"
    ]
}

lubosmj · 2021-02-05T12:53:05Z

pulp_container/app/registry_api.py

+    def paginate_queryset(self, queryset, request, view=None):
+        """
+        Analyse the pagination parameters and prepare the queryset.
+        """
+        try:
+            self.n = int(request.query_params.get("n"))
+        except Exception:
+            self.n = self.page_size
+        last = request.query_params.get("last")
+        self.url = request.build_absolute_uri()
+
+        if last:
+            queryset = queryset.filter(base_path__gt=last)
+        return queryset.order_by("base_path")[: self.n]


Would it be possible to create a new base class and extract methods that will be common for both ContainerCatalogPagination and ContainerTagListSerializer? I think we can better generalize (at least to some extent) paginate_queryset() as well as get_paginated_response(). How does that sound?

I thought about that and then there were subtle differences. But now that we know how it looks, i might be able to extract a base class.

CHANGES/7974.feature

ipanova · 2021-02-10T15:57:29Z

pulp_container/app/registry_api.py

+        }
+        return Response(headers=headers, data={"repositories": list(repositories_names)})
+
+    page_size = 16  # TODO what is a reasonable default here?


i have seen 50 on quay

ipanova · 2021-02-11T17:54:31Z

@mdellweg did you happen to stumble across this? https://github.com/tbeadle/django-rest-framework-link-header-pagination

No, but it looks like it's using pages, and that does not match our needs.

pulp_container/app/registry_api.py

ipanova · 2021-02-17T17:30:06Z

pulp_container/app/registry_api.py

+        headers = {
+            "Link": f'<{url}>; rel="next"',
+        }
+        return Response(headers=headers, data={"repositories": list(repositories_names)})


repo_name is already a list

i see a problem with this approach - the link header should be present in the response only when it indicates that not all the results have been displayed.
This way we will always have link in the header which is wrong.

(pulp) [vagrant@pulp3-source-fedora32 ~]$ http GET :24817/v2/test/this/tags/list HTTP/1.1 200 OK Access-Control-Expose-Headers: Correlation-ID Allow: GET, HEAD, OPTIONS Connection: close Content-Length: 40 Content-Type: application/json Correlation-ID: 39902f76fc6a4b2696d31040ceeba9ec Date: Wed, 17 Feb 2021 17:33:41 GMT Docker-Distribution-Api-Version: registry/2.0 Link: <http://localhost:24817/v2/test/this/tags/list?n=16&last=mytag1.8>; rel="next" Server: gunicorn/20.0.4 Vary: Accept X-Frame-Options: SAMEORIGIN { "name": "test/this", "tags": [ "mytag1.8" ] }

however i agree on the approach to paginate the response even if the client did not sent request with pagination.

Not sure if we can get away with this. If you follow the next link here (provided i fixed the empty list problem) you'll get an empty list, and you stop iterating.
But i will try to find a solution without this extra call anyway.

Now, with the empty list, there will be no next link anymore. But as long as we return something, i do not see, where i can get the information that there would be more.

one more thing i would suggest is to return in the link header a relative URI instead of absolute. According to RFC both are support though

pulp_container/app/registry_api.py

ipanova · 2021-02-17T17:56:22Z

imo, i would make this as simple as possible and:

return a paginated response no matter whether client has asked for it
support only n so a client can ask for a custom pagination other then the one set in our settings( it is 100)
the link header should be present only when not the full response has been produced

mdellweg · 2021-02-18T11:27:52Z

pulp_container/app/registry_api.py

+        else:
+            if self.n > 10 * self.page_size:
+                self.n = 10 * self.page_size


For the greedy.
In the end the pagination is not really a service to the user, but a precautionary measure against DOS.

i would not know what's the reasonable max page_size, probably 1000 is ok if having 100 as defaults unchanged

ipanova · 2021-02-18T17:19:36Z

@lubosmj how funny or sad it might sounds but i am pretty sure dockehub does not support pagination :) I have tried to list tags from mongo and it returned me 800+ tags. specidying n did not make any difference :) 🤦

CHANGES/7974.feature

ipanova · 2021-02-18T17:36:23Z

pulp_container/app/registry_api.py

+            headers["Link"] = f'<{url}>; rel="next"'
+        return Response(headers=headers, data={"name": self.path, "tags": tag_names})
+
+    page_size = api_settings.PAGE_SIZE


any reason this is defined twice? we can use it as a constant from settings

pulp_container/app/registry_api.py

ipanova · 2021-02-18T18:53:37Z

@mdellweg the pr looks good please take a look at this comment #220 (comment)
I would really love to see this pagination code for tags and catalog consolidated into one code path, but let's merge this as is for now.
I would also clean up registry_api.py and leave only viewes/viewsets related stuff and for example extract serializers into a separate file. But we can do this as well in separate PR.

mdellweg · 2021-02-19T12:29:04Z

I think consolidating the code paths would be possible, once we add renderers to the mix.
ATM the pagination class takes care of the rendering, which is not ideal.

The pagination follows the container registry specification by providing a link:next header and controls the viewable subset by count (n) and the last name of the previous page (last). fixes #7974 https://pulp.plan.io/issues/7974

ipanova

thank you! 🥂

mdellweg force-pushed the paginate_registry_api branch 4 times, most recently from 3ac79b7 to 247e1b8 Compare February 4, 2021 09:52

mdellweg marked this pull request as ready for review February 4, 2021 12:51

lubosmj requested changes Feb 5, 2021

View reviewed changes

mdellweg force-pushed the paginate_registry_api branch from 7ade67b to 5b6b118 Compare February 8, 2021 14:40

ipanova reviewed Feb 10, 2021

View reviewed changes

ipanova reviewed Feb 17, 2021

View reviewed changes

mdellweg force-pushed the paginate_registry_api branch 3 times, most recently from 949e43e to e02cd38 Compare February 18, 2021 11:26

mdellweg commented Feb 18, 2021

View reviewed changes

ipanova reviewed Feb 18, 2021

View reviewed changes

CHANGES/7974.feature Outdated Show resolved Hide resolved

ipanova reviewed Feb 18, 2021

View reviewed changes

pulp_container/app/registry_api.py Show resolved Hide resolved

Add pagination for _catalog and tags

c268a9c

The pagination follows the container registry specification by providing a link:next header and controls the viewable subset by count (n) and the last name of the previous page (last). fixes #7974 https://pulp.plan.io/issues/7974

mdellweg force-pushed the paginate_registry_api branch from fbb5428 to c268a9c Compare February 19, 2021 12:29

ipanova approved these changes Feb 19, 2021

View reviewed changes

mdellweg merged commit 4a2a996 into pulp:master Feb 22, 2021

mdellweg deleted the paginate_registry_api branch February 22, 2021 10:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add pagination for the _catalog registry endpoint #220

Add pagination for the _catalog registry endpoint #220

mdellweg commented Feb 2, 2021

pulpbot commented Feb 2, 2021

lubosmj left a comment

lubosmj Feb 5, 2021

mdellweg Feb 5, 2021

ipanova Feb 10, 2021

ipanova commented Feb 11, 2021 •

edited by mdellweg

Loading

ipanova Feb 17, 2021

ipanova Feb 17, 2021

ipanova Feb 17, 2021

ipanova Feb 17, 2021

mdellweg Feb 18, 2021

mdellweg Feb 18, 2021

ipanova Feb 18, 2021

ipanova commented Feb 17, 2021

mdellweg Feb 18, 2021

ipanova Feb 18, 2021

ipanova commented Feb 18, 2021 •

edited

Loading

ipanova Feb 18, 2021 •

edited

Loading

ipanova commented Feb 18, 2021

mdellweg commented Feb 19, 2021

ipanova left a comment

Add pagination for the _catalog registry endpoint #220

Add pagination for the _catalog registry endpoint #220

Conversation

mdellweg commented Feb 2, 2021

pulpbot commented Feb 2, 2021

lubosmj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ipanova commented Feb 11, 2021 • edited by mdellweg Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ipanova commented Feb 17, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ipanova commented Feb 18, 2021 • edited Loading

ipanova Feb 18, 2021 • edited Loading

Choose a reason for hiding this comment

ipanova commented Feb 18, 2021

mdellweg commented Feb 19, 2021

ipanova left a comment

Choose a reason for hiding this comment

ipanova commented Feb 11, 2021 •

edited by mdellweg

Loading

ipanova commented Feb 18, 2021 •

edited

Loading

ipanova Feb 18, 2021 •

edited

Loading