Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify documentation of return format for /v1/querybatch API #466

Open
p3pijn opened this issue Jun 22, 2022 · 5 comments
Open

Clarify documentation of return format for /v1/querybatch API #466

p3pijn opened this issue Jun 22, 2022 · 5 comments
Assignees
Labels
api API-related infrastructure documentation Improvements or additions to documentation infra infrastructure bugs/FRs

Comments

@p3pijn
Copy link

p3pijn commented Jun 22, 2022

According to the Swagger documentation at https://osv.dev/docs/#operation/OSV_QueryAffectedBatch the endpoint https://api.osv.dev/v1/querybatch should return a list of osvVulnerability.

Instead it returns a list of vulnerability IDs without any vulnerability details or the actual link to the package the vulnerability belongs to.

This leads to the following issues:

  • The batch endpoint does not return the OSV format, leading to clients having a need to query those vulnerability IDs. This completely goes against the idea of having a batch endpoint where you get all info for a set of libraries in a single HTTP request so that you do not overload the service.
  • It is unclear and undocumented how the current response from the batch endpoint should be used. How to link these vulnerability IDs to the actual packages that were provided in the request?
  • The documentation and implementation are out of sync

Example:

cat <<EOF | curl -X POST -d @- "https://api.osv.dev/v1/querybatch"
{
  "queries": [
    {
      "package": {
        "ecosystem": "Maven",
        "name": "org.apache.logging.log4j:log4j-core"
      },
      "version": "2.13.0"
    },
    {
      "package": {
        "ecosystem": "Packagist",
        "name": "noumo/easyii"
      },
      "version": "0.8"
    }
  ]
}
EOF

Returns:

{
    "results": [
        {
            "vulns": [
                {
                    "id": "GHSA-7rjr-3q55-vv33"
                },
                {
                    "id": "GHSA-jfh8-c2jp-5v3q"
                },
                {
                    "id": "GHSA-p6xc-xr62-6r2g"
                },
                {
                    "id": "GHSA-vwqq-5vrc-xw9h"
                }
            ]
        },
        {}
    ]
}

With this approach, the current service is almost impossible to use without hitting rate limits.

@oliverchang
Copy link
Collaborator

Hi! Thanks for trying our API and filing this issue.

According to the Swagger documentation at https://osv.dev/docs/#operation/OSV_QueryAffectedBatch the endpoint https://api.osv.dev/v1/querybatch should return a list of osvVulnerability.

Instead it returns a list of vulnerability IDs without any vulnerability details or the actual link to the package the vulnerability belongs to.

This leads to the following issues:

  • The batch endpoint does not return the OSV format, leading to clients having a need to query those vulnerability IDs. This completely goes against the idea of having a batch endpoint where you get all info for a set of libraries in a single HTTP request so that you do not overload the service.

Unfortunately since we allow up to 1000 packages per batch query, it's not feasible for us to return the full vulnerability details for every single package.

There is no rate limiting. GETing vulnerability details is very cheap and you can do them in parallel. We may add a batch GET endpoint in the near future if we see enough demand for it.

  • It is unclear and undocumented how the current response from the batch endpoint should be used. How to link these vulnerability IDs to the actual packages that were provided in the request?

We return an array where each element corresponds to the package you provided. I.e. in your example, you provided two packages. The returned response has two entries in "results". The first shows 4 vulnerabilities for the first package you provided. There were no vulnerabilities that were matched for the second package you provided (given by the "{}"). Note that Packagist querying is not fully supported yet (tracked in #230).

We'll clarify this in our API.

  • The documentation and implementation are out of sync

Could you please clarify this? Is this just the same issue with us not returning fully hydrated results?

With this approach, the current service is almost impossible to use without hitting rate limits.

We have no rate limits :)

@p3pijn
Copy link
Author

p3pijn commented Jun 23, 2022

@oliverchang Many thanks! I’ll have a go at the /v1/query endpoint instead of the batch endpoint. Great to hear there is no rate limitting.

The documentation and implementation are out of sync
Could you please clarify this? Is this just the same issue with us not returning fully hydrated results?

Indeed, the Swagger documentation currently suggests that the batch endpoint returns the fully hydrated vulnerabilities. Hence the confusion.

@fviernau
Copy link

fviernau commented Jul 5, 2022

I've just ran into this same issue as well in context of integrating OSV into [1].
Comparing the options

  1. Query the vulnerabilities for each project separately
  2. Batch request the vulnerabilities of all projects M plus requesting the N distinct vulnerabilities details by id.

Which one is actually faster for my use case in [1] is not obvious to me instantly, e.g:
Assuming both requests execute fast on the server, which option is better depends on the size of projects M and distinct vulnerabilities N and how much overlap of vulnerabilities there is across the projects. So, option (2) avoids redundantly transferring the details while it can (in some cases) also lead to a higher amount of total requests.

Anyhow, what I'm after is whether option 2. would allow for caching vulnerability details, which would be beneficial.
That'd work e.g. if the ID was usable as cache key for the vulnerability details. My suspicion however is that the ID is not usable as cache key, because the details have a "modified" flag and also the events seem like they would change over time while the ID remains stable. So, can you tell whether

  1. details may change over time for any given ID
  2. Are there plans on supporting the use case of client side caching of vulnerability details? e.g. by

[1] https://github.com/oss-review-toolkit/ort
[2] #448

@oliverchang
Copy link
Collaborator

Hi @fviernau,

Thanks for trying the OSV API!

Option 2. is definitely the preferred option from our perspective also. GETing vulnerabilities by ID is extremely fast/cheap and you can do this in parallel as well to improve performance. We've also been considering adding a bulk GET (by multiple IDs) to make this easier.

Re your questions specifically,

details may change over time for any given ID
That's correct.

Are there plans on supporting the use case of client side caching of vulnerability details? e.g. by

This makes a lot of sense -- we can certainly include the "modified" timestamp with the ID as well in the batch response. The combination of ID+modified would serve as a reliable cache key. I just filed #492 to track this.

Extending batchQuery by filtering by a last modified date is likely not what we're going to go with given the complexity it would add to the API.

@oliverchang oliverchang added the infra infrastructure bugs/FRs label Jul 6, 2022
@oliverchang oliverchang changed the title /v1/querybatch API does not return vulnerabilities in OSV format Clarify documentation of return format for /v1/querybatch API Jul 6, 2022
@oliverchang oliverchang added the documentation Improvements or additions to documentation label Jul 6, 2022
fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 6, 2022
Add a retrofit based REST API client and a wrapper for simplicity and
information hiding. The implementation is based on [1], even though it
has some inconsistencies with the real behavior, see [2].

[1] https://osv.dev/docs/
[2] google/osv.dev#466

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 6, 2022
Add a retrofit based REST API client and a wrapper for simplicity and
information hiding. The implementation is based on [1], even though it
has some inconsistencies with the real behavior, see [2].

[1] https://osv.dev/docs/
[2] google/osv.dev#466

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 7, 2022
Add a retrofit based REST API client and a wrapper for simplicity and
information hiding. The implementation is based on [1], even though it
has some inconsistencies with the real behavior, see [2].

[1] https://osv.dev/docs/
[2] google/osv.dev#466

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 7, 2022
Add a retrofit based REST API client and a wrapper for simplicity and
information hiding. The implementation is based on [1], even though it
has some inconsistencies with the real behavior, see [2].

[1] https://osv.dev/docs/
[2] google/osv.dev#466

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 7, 2022
Add a Retrofit-based REST API client as well as a wrapper for
simplicity and information hiding. The implementation is based on [1],
even though it has some inconsistencies with the real behavior, see
e.g. [2].

[1] https://osv.dev/docs/
[2] google/osv.dev#466

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 8, 2022
Add a Retrofit-based REST API client as well as a wrapper for
simplicity and information hiding. The implementation is based on [1],
even though it has some inconsistencies with the real behavior, see
e.g. [2].

[1] https://osv.dev/docs/
[2] google/osv.dev#466

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 8, 2022
Add a Retrofit-based REST API client as well as a wrapper for
simplicity and information hiding. The implementation is based on [1],
even though it has some inconsistencies with the real behavior, see
e.g. [2].

[1] https://osv.dev/docs/
[2] google/osv.dev#466

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 8, 2022
Add a Retrofit-based REST API client as well as a wrapper for
simplicity and information hiding. The implementation is based on [1],
even though it has some inconsistencies with the real behavior, see
e.g. [2].

[1] https://osv.dev/docs/
[2] google/osv.dev#466

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
@andrewpollock andrewpollock added the api API-related infrastructure label Oct 18, 2022
Copy link

This issue has not had any activity for 60 days and will be automatically closed in two weeks

@github-actions github-actions bot added the stale The issue or PR is stale and pending automated closure label Jul 29, 2024
@andrewpollock andrewpollock self-assigned this Aug 5, 2024
@andrewpollock andrewpollock removed the stale The issue or PR is stale and pending automated closure label Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API-related infrastructure documentation Improvements or additions to documentation infra infrastructure bugs/FRs
Projects
None yet
Development

No branches or pull requests

4 participants