feat/scale-out: support for proxying GQL queries #2588

Draft
wants to merge 2 commits into
base: main

vrajashkr
Contributor

What type of PR is this?
feature

Which issue does this PR fix:
Towards #2434

What does this PR do / Why do we need it:
Previously, scale-out supported only the dist-spec APIs. In a shared storage environment the metadata is shared as well, so any instance can respond correctly to GQL queries because all the data is available to it.

In a local scale-out cluster deployment, the metadata store, in addition to the file storage, is isolated to each member in the cluster. Because of this, the GQL queries need to be proxied as well for UI and client requests to work as expected.

This change introduces a new GQL proxy and a handler for the GlobalSearch request. The handler proxies the request to all the members and collects their responses into a single response for the client.
Support for other GQL queries is pending.

Testing done on this change:
Unit Tests added.

Will this break upgrades or downgrades?
No, there shouldn't be any impact to upgrades and downgrades.

Does this PR introduce any user-facing change?:

A zot scale-out cluster deployed without using shared storage is now supported. Client queries from the UI and other clients will be proxied amongst the cluster members to fetch data for the request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@vrajashkr
Contributor Author

vrajashkr commented Aug 4, 2024

There's still quite a bit of work to be done on this change. Sharing an early draft for review on the approach and handling.

I'll address the non-TODO style comments in the next commit.

return nil, err
}

resp, err := httpClient.Do(fwdRequest)

Check failure: Code scanning / CodeQL

Uncontrolled data used in network request (Critical)

The URL of this request depends on a user-provided value.
@@ -13,6 +11,7 @@ import (
"zotregistry.dev/zot/pkg/api/constants"
"zotregistry.dev/zot/pkg/cluster"
"zotregistry.dev/zot/pkg/common"
"zotregistry.dev/zot/pkg/proxy"
Contributor

What is this split between pkg/api/proxy.go and pkg/proxy/proxy.go?

Contributor Author

During development, there was a circular import for api and the new gql proxy in the extensions. This led me to break up the generic proxy logic into its own package and call it from the api package as well as from the gql proxy package.

That said, I do agree that the file naming could potentially be better.

StarCount int `json:"starCount"`
DownloadCount int `json:"downloadCount"`
NewestImage ImageSummary `json:"newestImage"`
Name string `json:"Name"` //nolint:tagliatelle // graphQL schema
Contributor

Why was this change needed?

Contributor Author

Previously, the GQL server serialized the data when responding to the client. Now that the proxy handler responds on behalf of the GQL server (after proxying), the standard Go JSON serialization takes place, so the struct tags decide the key names.

The GQL schema starts keys with an uppercase letter, which the linter's convention for struct tags discourages, hence the nolint.

Example:

{
    "errors": [
        {
            "message": "unable to run vulnerability scan on tag v0.0.19.231225-squashfs in repo machine/bootkit/rootfs: error: image 'machine/bootkit/rootfs@sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1' scanning is not supported for given image media type",
            "path": [
                "GlobalSearch"
            ]
        },
        {
            "message": "unable to run vulnerability scan in repo machine/bootkit/rootfs: manifest digest: sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1, error: image 'machine/bootkit/rootfs@sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1' scanning is not supported for given image media type",
            "path": [
                "GlobalSearch"
            ]
        }
    ],
    "data": {
        "GlobalSearch": {
            "Page": {
                "TotalCount": 37,
                "ItemCount": 3
            },
            "Repos": [
                {
                    "Name": "machine/bootkit/rootfs",
                    "LastUpdated": "2023-12-25T15:31:59.110429376Z",
                    "Size": "319472431",
                    "Platforms": [
                        {
                            "Os": "linux",
                            "Arch": "amd64"
                        }
                    ],
                    "IsStarred": false,
                    "IsBookmarked": false,
                    "NewestImage": {
                        "Tag": "v0.0.19.231225-squashfs",
                        "Vulnerabilities": {
                            "MaxSeverity": "",
                            "Count": 0
                        },
                        "Description": "A minimal bootable root filesystem",
                        "IsSigned": false,
                        "SignatureInfo": [],
                        "Licenses": "GPLv2 and others",
                        "Vendor": "project-machine",
                        "Labels": ""
                    },
                    "StarCount": 0,
                    "DownloadCount": 5285
                },
                {
                    "Name": "c3/ubuntu/base-amd64",
                    "LastUpdated": "2024-03-01T00:46:16.186838886Z",
                    "Size": "273201849",
                    "Platforms": [
                        {
                            "Os": "linux",
                            "Arch": "amd64"
                        }
                    ],
                    "IsStarred": false,
                    "IsBookmarked": false,
                    "NewestImage": {
                        "Tag": "jammy",
                        "Vulnerabilities": {
                            "MaxSeverity": "",
                            "Count": 0
                        },
                        "Description": "base is a minimal glibc-based Linux system",
                        "IsSigned": false,
                        "SignatureInfo": [],
                        "Licenses": "GPL-2.0-or-later",
                        "Vendor": "Cisco Systems, Inc.",
                        "Labels": ""
                    },
                    "StarCount": 0,
                    "DownloadCount": 260
                },
                {
                    "Name": "tools/busybox",
                    "LastUpdated": "2022-10-04T18:22:45.289257759Z",
                    "Size": "773920",
                    "Platforms": [
                        {
                            "Os": "linux",
                            "Arch": "amd64"
                        }
                    ],
                    "IsStarred": false,
                    "IsBookmarked": false,
                    "NewestImage": {
                        "Tag": "1.34.1",
                        "Vulnerabilities": {
                            "MaxSeverity": "",
                            "Count": 0
                        },
                        "Description": "",
                        "IsSigned": false,
                        "SignatureInfo": [],
                        "Licenses": "",
                        "Vendor": "",
                        "Labels": ""
                    },
                    "StarCount": 0,
                    "DownloadCount": 112
                }
            ]
        }
    }
}
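To illustrate why the struct tags matter once `encoding/json` does the serialization, here is a small sketch. `repoSummary` is a hypothetical, trimmed-down stand-in for the real response struct, and `marshalRepo` is an illustrative helper, not a function from zot.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// repoSummary is a hypothetical stand-in with only three fields.
// The tags use the uppercase names that the GQL schema exposes.
type repoSummary struct {
	Name          string `json:"Name"`
	StarCount     int    `json:"StarCount"`
	DownloadCount int    `json:"DownloadCount"`
}

// marshalRepo serializes one repo with the standard library encoder,
// which takes the key names from the struct tags.
func marshalRepo(repo repoSummary) (string, error) {
	out, err := json.Marshal(repo)

	return string(out), err
}

func main() {
	out, err := marshalRepo(repoSummary{Name: "tools/busybox", DownloadCount: 112})
	if err != nil {
		panic(err)
	}

	fmt.Println(out) // {"Name":"tools/busybox","StarCount":0,"DownloadCount":112}
}
```

With a tag such as `json:"starCount"` the key would come out lowercase and no longer match what the GQL server itself returns.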

}

proxyBody, err := io.ReadAll(proxyResponse.Body)
proxyResponse.Body.Close()
Contributor

Close it right here, or defer this?

Contributor Author

Since it is completely read after ReadAll, we may not need to keep it open till the end of the function execution. We can close it right away.

}

// aggregate errors
collatedResult.Errors = append(collatedResult.Errors, proxyRespData.Errors...)
Contributor

So we now have a situation where we may have a good result mixed with errors. This could be problematic or confusing.

Contributor Author

I agree, though they may not all be response-level errors. For example, here's a snippet from zothub:

{
    "errors": [
        {
            "message": "unable to run vulnerability scan on tag v0.0.19.231225-squashfs in repo machine/bootkit/rootfs: error: image 'machine/bootkit/rootfs@sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1' scanning is not supported for given image media type",
            "path": [
                "GlobalSearch"
            ]
        },
        {
            "message": "unable to run vulnerability scan in repo machine/bootkit/rootfs: manifest digest: sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1, error: image 'machine/bootkit/rootfs@sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1' scanning is not supported for given image media type",
            "path": [
                "GlobalSearch"
            ]
        }
    ],
    "data": {
        "GlobalSearch": {
            "Page": {
                "TotalCount": 37,
                "ItemCount": 3
            },
            "Repos": [
                {
                    "Name": "machine/bootkit/rootfs",
                    "LastUpdated": "2023-12-25T15:31:59.110429376Z",
                    "Size": "319472431",
                    "Platforms": [
                        {
                            "Os": "linux",
                            "Arch": "amd64"
                        }
                    ],
                    "IsStarred": false,
                    "IsBookmarked": false,
                    "NewestImage": {
                        "Tag": "v0.0.19.231225-squashfs",
                        "Vulnerabilities": {
                            "MaxSeverity": "",
                            "Count": 0
                        },
                        "Description": "A minimal bootable root filesystem",
                        "IsSigned": false,
                        "SignatureInfo": [],
                        "Licenses": "GPLv2 and others",
                        "Vendor": "project-machine",
                        "Labels": ""
                    },
                    "StarCount": 0,
                    "DownloadCount": 5285
                },

The data is valid, but there are some errors.
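A rough sketch of what aggregating such partial results across members could look like is shown below. The types are hypothetical, heavily trimmed versions of the payload above, and `collate` is an illustrative name; pagination fields such as `TotalCount` are deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gqlError mirrors one entry of the "errors" array in the payload above.
type gqlError struct {
	Message string   `json:"message"`
	Path    []string `json:"path"`
}

// repo is a hypothetical, trimmed-down repo summary.
type repo struct {
	Name string `json:"Name"`
}

// searchResponse is a hypothetical, trimmed-down GlobalSearch response.
type searchResponse struct {
	Errors []gqlError `json:"errors,omitempty"`
	Data   struct {
		GlobalSearch struct {
			Repos []repo `json:"Repos"`
		} `json:"GlobalSearch"`
	} `json:"data"`
}

// collate decodes each member's response body and appends both its errors
// and its repos, so valid data is kept even when a member reports errors.
func collate(bodies [][]byte) (searchResponse, error) {
	var collated searchResponse

	for _, body := range bodies {
		var member searchResponse
		if err := json.Unmarshal(body, &member); err != nil {
			return searchResponse{}, fmt.Errorf("decoding member response: %w", err)
		}

		collated.Errors = append(collated.Errors, member.Errors...)
		collated.Data.GlobalSearch.Repos = append(
			collated.Data.GlobalSearch.Repos, member.Data.GlobalSearch.Repos...)
	}

	return collated, nil
}

func main() {
	bodies := [][]byte{
		[]byte(`{"errors":[{"message":"scan not supported","path":["GlobalSearch"]}],
			"data":{"GlobalSearch":{"Repos":[{"Name":"machine/bootkit/rootfs"}]}}}`),
		[]byte(`{"data":{"GlobalSearch":{"Repos":[{"Name":"tools/busybox"}]}}}`),
	}

	collated, err := collate(bodies)
	if err != nil {
		panic(err)
	}

	fmt.Println(len(collated.Errors), len(collated.Data.GlobalSearch.Repos))
}
```

In this sketch a member that returns both errors and data contributes to both lists, which is exactly the mixed outcome raised in the review comment; how clients should interpret it is a separate design question.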

@rchincha
Contributor

rchincha commented Aug 5, 2024

Overall the patch is not super-complicated.

@@ -214,14 +214,14 @@ type GlobalSearchResultResp struct {
}

type GlobalSearchResult struct {
Contributor

@andaaron andaaron Aug 6, 2024


Don't these property renames mean we break backwards compatibility with older zots? Do we care about that?

Contributor Author

From what I gathered, the GlobalSearch and other such payloads that are part of the GQL schema are handled entirely by the GQL server.

Ideally, this implementation should send the same payload as the GQL server except aggregated.

Not sure if I fully got your question though.

Contributor

So you mean this code did not produce the same payload as the GQL server before the change?
In this case we should definitely fix this issue.

@vrajashkr
Contributor Author

Update on this:
As an alternative solution, we were looking into having the fanout and proxy logic inside the resolver code and using closures to hold the request/response data.

However, the resolver has a signature like the following (pkg/extensions/search/resolver.go):

func globalSearch(ctx context.Context, query string, metaDB mTypes.MetaDB, filter *gql_generated.Filter,
	requestedPage *gql_generated.PageInput, cveInfo cveinfo.CveInfo, log log.Logger, //nolint:unparam
) (*gql_generated.PaginatedReposResult, []*gql_generated.ImageSummary, []*gql_generated.LayerSummary, error,
) {

This appears to be called by the internal resolver from the GQL server where we don't have any control of the behaviour.

Next items:

  1. Explore the current GQL server to see if there is any way to expose the request/response data.
  2. Explore whether the current GQL server can support proxying on its own.
  3. See if there are any other projects out there trying to proxy GQL requests.

@vrajashkr
Contributor Author

Started a discussion on the gqlgen repo: 99designs/gqlgen#3295

While looking through their Discord, I came across the following gqlparser library that gqlgen uses internally. I'll take a look and evaluate this for parsing the query and formatting the output.

https://github.com/vektah/gqlparser

Signed-off-by: Vishwas Rajashekar <vrajashe@cisco.com>
@vrajashkr
Contributor Author

gqlparser appears to have worked very well for parsing the request query data and making the information accessible.
In the last commit, a change was made to take in the query name using this approach.

One thing is only partially handled: the spec appears to allow multiple GQL operations in a single request payload, whereas zot appears to use only one at a time. TODO: need to verify this. If it is only one at a time, the current GQL proxy logic can function as is; otherwise, we may need to rethink the deserialization and aggregation part.

I'm currently looking into making the output match exactly what the GQL spec would produce.
The last commit already included a change to set the Content-Type header to application/json and to indent the output.
The next commit will explore trimming the JSON to remove fields that were not in the GQL request. I looked at how the GQL server does this: it is mostly handled by generated code that checks each field, which won't work in our post-aggregation serialization implementation, hence the exploration into a generic field remover.
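One possible shape for such a generic field remover is sketched below, working on decoded JSON rather than on typed structs. Everything here is an assumption for illustration: `selection`, `prune`, and `pruneJSON` are hypothetical names, and the selection tree is written by hand, whereas in practice it would have to be derived from the parsed query.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selection is a hypothetical tree of requested field names.
// A nil child means "leaf": keep the value as is.
type selection map[string]selection

// prune keeps only the selected keys of decoded JSON objects and
// applies the same selection to every element of an array.
func prune(value any, sel selection) any {
	switch typed := value.(type) {
	case map[string]any:
		out := make(map[string]any, len(sel))

		for key, child := range sel {
			fieldValue, ok := typed[key]
			if !ok {
				continue // requested field absent from the data
			}

			if child == nil {
				out[key] = fieldValue
			} else {
				out[key] = prune(fieldValue, child)
			}
		}

		return out
	case []any:
		out := make([]any, len(typed))
		for i, item := range typed {
			out[i] = prune(item, sel)
		}

		return out
	default:
		return value // scalars and null pass through unchanged
	}
}

// pruneJSON decodes raw, prunes it against sel, and encodes it again.
func pruneJSON(raw []byte, sel selection) ([]byte, error) {
	var decoded any
	if err := json.Unmarshal(raw, &decoded); err != nil {
		return nil, err
	}

	return json.Marshal(prune(decoded, sel))
}

func main() {
	raw := []byte(`{"Page":{"TotalCount":37},"Repos":[{"Name":"tools/busybox",
		"Size":"773920","NewestImage":{"Tag":"1.34.1","Vendor":""}}]}`)

	// Hand-written selection: only Repos.Name and Repos.NewestImage.Tag were requested.
	sel := selection{"Repos": {"Name": nil, "NewestImage": {"Tag": nil}}}

	out, err := pruneJSON(raw, sel)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(out)) // {"Repos":[{"Name":"tools/busybox","NewestImage":{"Tag":"1.34.1"}}]}
}
```

This sketch ignores aliases, fragments, and directives, all of which a real implementation would need to handle to match the GQL server's output.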
