Client-Server Mode for osv-scanner #2773
skleeolive
started this conversation in
Ideas
Overview
Sharing an idea I've been thinking about: a client-server mode for osv-scanner. I'm curious whether others have hit the same pain point and whether this direction makes sense.
The Problem
When running osv-scanner at scale (e.g. in a Kubernetes cluster), two things get expensive fast:
Online mode makes multiple round-trips to api.osv.dev per scan:
  POST /v1/querybatch → returns only vulnerability IDs
  GET /v1/vulns/{id} × N → one request per match to fetch the full schema
A single scan can already generate hundreds of API calls. Multiply that across dozens of concurrent pods and it becomes a real bottleneck.
Offline mode (--offline) avoids the API but shifts the cost to GCS: each pod independently downloads per-ecosystem ZIP archives (osv-vulnerabilities.storage.googleapis.com/{ecosystem}/all.zip). In a cluster, every pod ends up fetching the same files on its own schedule.
Neither mode was designed for "many clients scanning at the same time."
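To make the scaling argument concrete, here is a rough back-of-the-envelope model of both modes. It is only a sketch: the function names are made up for illustration, it assumes each scan needs a single querybatch call, and the pod, match, and ecosystem counts a caller plugs in are hypothetical rather than measured.

```python
from typing import List, Sequence

# Archive URL pattern quoted in the post; the https:// scheme is added here.
OSV_ZIP_URL = "https://osv-vulnerabilities.storage.googleapis.com/{ecosystem}/all.zip"


def online_calls_per_scan(num_matches: int, querybatch_calls: int = 1) -> int:
    """POST /v1/querybatch (assumed to fit in one call) plus one
    GET /v1/vulns/{id} per match."""
    return querybatch_calls + num_matches


def cluster_online_calls(pods: int, matches_per_scan: int) -> int:
    """Total API calls when every pod runs one scan in online mode."""
    return pods * online_calls_per_scan(matches_per_scan)


def offline_downloads_per_refresh(pods: int, ecosystems: Sequence[str]) -> int:
    """Every pod fetches every ecosystem archive on its own."""
    return pods * len(ecosystems)


def shared_server_downloads_per_refresh(ecosystems: Sequence[str]) -> int:
    """A single central process fetches each archive once."""
    return len(ecosystems)


def archive_urls(ecosystems: Sequence[str]) -> List[str]:
    """Expand the per-ecosystem URL template for each ecosystem name."""
    return [OSV_ZIP_URL.format(ecosystem=e) for e in ecosystems]
```

With, say, 40 pods, 150 matches per scan, and 3 ecosystems (illustrative numbers only), this gives 6040 API calls in online mode and 120 archive downloads per refresh in offline mode, versus 3 downloads if one process fetched the archives for everyone.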
Prior Art: trivy-operator
The trivy-operator ran into the same issue. They solved it by running a central trivy server pod that holds the vulnerability DB in memory; scanner pods just delegate matching to it. One DB download, many clients.
The Idea
What if osv-scanner had the same mode?
A single server process would pull the DB from GCS once, keep it in memory, and handle matching for any number of clients over HTTP.
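A minimal sketch of what such a server could look like, written in Python for brevity (osv-scanner itself is Go). Everything here is an assumption for illustration: the /match endpoint, the request and response shapes, and the function names are hypothetical, and matching is simplified to the explicit versions list of each OSV record, ignoring range events. Loading the records from the GCS archives is left out.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def build_index(records):
    """Group OSV records by (ecosystem, package name) for fast lookup."""
    index = {}
    for record in records:
        for affected in record.get("affected", []):
            package = affected.get("package", {})
            key = (package.get("ecosystem"), package.get("name"))
            versions = frozenset(affected.get("versions", []))
            index.setdefault(key, []).append((versions, record["id"]))
    return index


def match(index, ecosystem, name, version):
    """Return sorted IDs of records listing this exact version as affected.

    Simplification: only the explicit `versions` list is checked;
    `ranges` (introduced/fixed events) are ignored in this sketch.
    """
    entries = index.get((ecosystem, name), [])
    return sorted({vuln_id for versions, vuln_id in entries if version in versions})


def handle_match(index, body: bytes) -> bytes:
    """Pure request handler: JSON package list in, JSON results out."""
    packages = json.loads(body)["packages"]
    results = [
        {"package": p, "vulns": match(index, p["ecosystem"], p["name"], p["version"])}
        for p in packages
    ]
    return json.dumps({"results": results}).encode("utf-8")


def make_handler(index):
    class MatchHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/match":  # hypothetical endpoint name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", "0"))
            payload = handle_match(index, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return MatchHandler


def start_server(index, host="127.0.0.1", port=0):
    """Serve the index from a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(index))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would POST its whole resolved package list as {"packages": [...]} and get every result back in one response. The sketch returns only IDs to stay short; a real server would presumably return the full records so that clients avoid the per-ID follow-up requests described above.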
The server would auto-refresh from GCS on a configurable interval (incremental, CRC32C-based) and expose /healthz and /readyz endpoints for Kubernetes probes.
Expected Limitations
A few constraints worth calling out before going further:
Commit-based matching: Vendored C/C++ libraries are identified via OSV.dev's DetermineVersion API and matched by commit hash. This requires Git history that a local ZIP DB doesn't contain, so those packages would still need to hit OSV.dev directly from the client.
Transitive dependency resolution: Resolving transitive dependencies (e.g. Maven pom.xml, Python requirements.txt) is done by the client against deps.dev or native registries before matching even begins. The server only receives the already-resolved package list; it doesn't change how transitive resolution works or reduce those upstream calls.
License scanning: License information is sourced from deps.dev independently of vulnerability matching, so it would remain a client-side concern regardless of server mode.
Single point of failure: A central server introduces availability concerns that don't exist in the current per-client model. High-availability deployments would need additional consideration.
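The first and last constraints suggest what the client side might look like. The sketch below is illustrative only: Item, route, and match_with_fallback are hypothetical names, and falling back to OSV.dev when the server is unreachable is just one possible way to soften the single point of failure, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Item:
    """One inventory entry: either a versioned package or a commit-identified library."""
    name: str
    ecosystem: Optional[str] = None
    version: Optional[str] = None
    commit: Optional[str] = None


def route(items: Sequence[Item]) -> Tuple[List[Item], List[Item]]:
    """Split items into (server-matchable, must-go-to-OSV.dev)."""
    to_server, to_osv_dev = [], []
    for item in items:
        # Commit-only entries cannot be matched against a ZIP-based DB.
        if item.commit and not item.version:
            to_osv_dev.append(item)
        else:
            to_server.append(item)
    return to_server, to_osv_dev


Matcher = Callable[[Sequence[Item]], Dict[str, List[str]]]


def match_with_fallback(items: Sequence[Item], server_match: Matcher,
                        osv_dev_match: Matcher) -> Dict[str, List[str]]:
    """Match via the central server, falling back to OSV.dev if it is unreachable."""
    to_server, to_osv_dev = route(items)
    results: Dict[str, List[str]] = {}
    if to_osv_dev:
        results.update(osv_dev_match(to_osv_dev))
    if to_server:
        try:
            results.update(server_match(to_server))
        except OSError:  # covers ConnectionError and urllib's URLError
            results.update(osv_dev_match(to_server))
    return results
```

Passing the two matchers in as plain callables keeps the routing logic independent of any transport, so the same function would work whether the server speaks HTTP or something else.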
Looking for Feedback
Does this match a problem you've run into? I'd like to hear whether this direction seems worth pursuing, and whether there are scenarios or constraints I haven't considered.