Clarify documentation of return format for /v1/querybatch API #466

p3pijn · 2022-06-22T11:43:14Z

According to the Swagger documentation at https://osv.dev/docs/#operation/OSV_QueryAffectedBatch the endpoint https://api.osv.dev/v1/querybatch should return a list of osvVulnerability.

Instead it returns a list of vulnerability IDs without any vulnerability details or the actual link to the package the vulnerability belongs to.

This leads to the following issues:

The batch endpoint does not return the OSV format, leading to clients having a need to query those vulnerability IDs. This completely goes against the idea of having a batch endpoint where you get all info for a set of libraries in a single HTTP request so that you do not overload the service.
It is unclear and undocumented how the current response from the batch endpoint should be used. How to link these vulnerability IDs to the actual packages that were provided in the request?
The documentation and implementation are out of sync

Example:

cat <<EOF | curl -X POST -d @- "https://api.osv.dev/v1/querybatch"
{
  "queries": [
    {
      "package": {
        "ecosystem": "Maven",
        "name": "org.apache.logging.log4j:log4j-core"
      },
      "version": "2.13.0"
    },
    {
      "package": {
        "ecosystem": "Packagist",
        "name": "noumo/easyii"
      },
      "version": "0.8"
    }
  ]
}
EOF

Returns:

{
    "results": [
        {
            "vulns": [
                {
                    "id": "GHSA-7rjr-3q55-vv33"
                },
                {
                    "id": "GHSA-jfh8-c2jp-5v3q"
                },
                {
                    "id": "GHSA-p6xc-xr62-6r2g"
                },
                {
                    "id": "GHSA-vwqq-5vrc-xw9h"
                }
            ]
        },
        {}
    ]
}

With this approach, the current service is almost impossible to use without hitting rate limits.

The text was updated successfully, but these errors were encountered:

oliverchang · 2022-06-22T16:24:38Z

Hi! Thanks for trying our API and filing this issue.

According to the Swagger documentation at https://osv.dev/docs/#operation/OSV_QueryAffectedBatch the endpoint https://api.osv.dev/v1/querybatch should return a list of osvVulnerability.

Instead it returns a list of vulnerability IDs without any vulnerability details or the actual link to the package the vulnerability belongs to.

This leads to the following issues:

The batch endpoint does not return the OSV format, leading to clients having a need to query those vulnerability IDs. This completely goes against the idea of having a batch endpoint where you get all info for a set of libraries in a single HTTP request so that you do not overload the service.

Unfortunately since we allow up to 1000 packages per batch query, it's not feasible for us to return the full vulnerability details for every single package.

There is no rate limiting. GETing vulnerability details is very cheap and you can do them in parallel. We may add a batch GET endpoint in the near future if we see enough demand for it.

It is unclear and undocumented how the current response from the batch endpoint should be used. How to link these vulnerability IDs to the actual packages that were provided in the request?

We return an array where each element corresponds to the package you provided. I.e. in your example, you provided two packages. The returned response has two entries in "results". The first shows 4 vulnerabilities for the first package you provided. There were no vulnerabilities that were matched for the second package you provided (given by the "{}"). Note that Packagist querying is not fully supported yet (tracked in #230).

We'll clarify this in our API.

The documentation and implementation are out of sync

Could you please clarify this? Is this just the same issue with us not returning fully hydrated results?

With this approach, the current service is almost impossible to use without hitting rate limits.

We have no rate limits :)

p3pijn · 2022-06-23T05:24:00Z

@oliverchang Many thanks! I’ll have a go at the /v1/query endpoint instead of the batch endpoint. Great to hear there is no rate limitting.

The documentation and implementation are out of sync
Could you please clarify this? Is this just the same issue with us not returning fully hydrated results?

Indeed, the Swagger documentation currently suggests that the batch endpoint returns the fully hydrated vulnerabilities. Hence the confusion.

fviernau · 2022-07-05T10:05:54Z

I've just ran into this same issue as well in context of integrating OSV into [1].
Comparing the options

Query the vulnerabilities for each project separately
Batch request the vulnerabilities of all projects M plus requesting the N distinct vulnerabilities details by id.

Which one is actually faster for my use case in [1] is not obvious to me instantly, e.g:
Assuming both requests execute fast on the server, which option is better depends on the size of projects M and distinct vulnerabilities N and how much overlap of vulnerabilities there is across the projects. So, option (2) avoids redundantly transferring the details while it can (in some cases) also lead to a higher amount of total requests.

Anyhow, what I'm after is whether option 2. would allow for caching vulnerability details, which would be beneficial.
That'd work e.g. if the ID was usable as cache key for the vulnerability details. My suspicion however is that the ID is not usable as cache key, because the details have a "modified" flag and also the events seem like they would change over time while the ID remains stable. So, can you tell whether

details may change over time for any given ID
Are there plans on supporting the use case of client side caching of vulnerability details? e.g. by
- by returning the whole cache key
- extending the batchQuery to allow for filtering by a last modified date, like also discussed in #Support mirroring through API #448.

[1] https://github.com/oss-review-toolkit/ort
[2] #448

oliverchang · 2022-07-06T05:38:26Z

Hi @fviernau,

Thanks for trying the OSV API!

Option 2. is definitely the preferred option from our perspective also. GETing vulnerabilities by ID is extremely fast/cheap and you can do this in parallel as well to improve performance. We've also been considering adding a bulk GET (by multiple IDs) to make this easier.

Re your questions specifically,

details may change over time for any given ID
That's correct.

Are there plans on supporting the use case of client side caching of vulnerability details? e.g. by

This makes a lot of sense -- we can certainly include the "modified" timestamp with the ID as well in the batch response. The combination of ID+modified would serve as a reliable cache key. I just filed #492 to track this.

Extending batchQuery by filtering by a last modified date is likely not what we're going to go with given the complexity it would add to the API.

Add a retrofit based REST API client and a wrapper for simplicity and information hiding. The implementation is based on [1], even though it has some inconsistencies with the real behavior, see [2]. [1] https://osv.dev/docs/ [2] google/osv.dev#466 Signed-off-by: Frank Viernau <frank_viernau@epam.com>

Add a Retrofit-based REST API client as well as a wrapper for simplicity and information hiding. The implementation is based on [1], even though it has some inconsistencies with the real behavior, see e.g. [2]. [1] https://osv.dev/docs/ [2] google/osv.dev#466 Signed-off-by: Frank Viernau <frank_viernau@epam.com>

github-actions · 2024-07-29T18:06:34Z

This issue has not had any activity for 60 days and will be automatically closed in two weeks

oliverchang added the infra infrastructure bugs/FRs label Jul 6, 2022

oliverchang changed the title ~~/v1/querybatch API does not return vulnerabilities in OSV format~~ Clarify documentation of return format for /v1/querybatch API Jul 6, 2022

oliverchang added the documentation Improvements or additions to documentation label Jul 6, 2022

fviernau mentioned this issue Jul 6, 2022

clients/osv: Implement querying the OSV API oss-review-toolkit/ort#5526

Merged

oliverchang mentioned this issue Aug 2, 2022

Fix API documentation #573

Closed

andrewpollock added the api API-related infrastructure label Oct 18, 2022

hayleycd mentioned this issue Mar 31, 2023

Migrating Documentation to GitHub page #1185

Closed

github-actions bot added the stale The issue or PR is stale and pending automated closure label Jul 29, 2024

andrewpollock self-assigned this Aug 5, 2024

andrewpollock removed the stale The issue or PR is stale and pending automated closure label Nov 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify documentation of return format for /v1/querybatch API #466

Clarify documentation of return format for /v1/querybatch API #466

p3pijn commented Jun 22, 2022

oliverchang commented Jun 22, 2022

p3pijn commented Jun 23, 2022

fviernau commented Jul 5, 2022

oliverchang commented Jul 6, 2022

github-actions bot commented Jul 29, 2024

Clarify documentation of return format for /v1/querybatch API #466

Clarify documentation of return format for /v1/querybatch API #466

Comments

p3pijn commented Jun 22, 2022

oliverchang commented Jun 22, 2022

p3pijn commented Jun 23, 2022

fviernau commented Jul 5, 2022

oliverchang commented Jul 6, 2022

github-actions bot commented Jul 29, 2024