Client-Server Mode for osv-scanner #2773

skleeolive · 2026-05-06T02:18:17Z

Overview

Sharing an idea I've been thinking about: a client-server mode for osv-scanner. I'm curious whether others have hit the same pain point and whether this direction makes sense.

The Problem

When running osv-scanner at scale (e.g. in a Kubernetes cluster), two things get expensive fast:

Online mode makes multiple round-trips to api.osv.dev per scan:

POST /v1/querybatch → returns only vulnerability IDs
GET /v1/vulns/{id} × N → one request per match to fetch the full schema

A single scan can already generate hundreds of API calls. Multiply that across dozens of concurrent pods and it becomes a real bottleneck.

Offline mode (--offline) avoids the API but shifts the cost to GCS: each pod independently downloads per-ecosystem ZIP archives (osv-vulnerabilities.storage.googleapis.com/{ecosystem}/all.zip). In a cluster, every pod ends up fetching the same files on its own schedule.

Neither mode was designed for "many clients scanning at the same time."
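
To make the scaling concern concrete, here is a rough back-of-the-envelope sketch in Go. The workload numbers and the batch size of 1000 are assumptions picked for illustration, not measurements, and the helper names are hypothetical.

```go
package main

import "fmt"

// onlineCalls estimates OSV.dev API requests for one scan in online mode:
// one querybatch request per batch of packages, plus one GET per matched
// vulnerability ID to fetch the full record.
func onlineCalls(packages, matches, batchSize int) int {
	batches := (packages + batchSize - 1) / batchSize // ceiling division
	return batches + matches
}

// offlineDownloads estimates all.zip downloads per refresh cycle when every
// pod fetches every ecosystem archive it needs on its own.
func offlineDownloads(pods, ecosystems int) int {
	return pods * ecosystems
}

func main() {
	// Hypothetical workload: 40 pods, each scanning 800 packages with 150 matches.
	// batchSize is an assumed per-request limit.
	perScan := onlineCalls(800, 150, 1000)
	fmt.Println("online calls per scan:", perScan)
	fmt.Println("online calls across 40 pods:", 40*perScan)
	fmt.Println("offline downloads, 40 pods x 5 ecosystems:", offlineDownloads(40, 5))
}
```

In both modes the cost grows linearly with the number of pods, which is the part a shared server would remove.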

Prior Art: trivy-operator

The trivy-operator ran into the same issue. They solved it by running a central trivy server pod that holds the vulnerability DB in memory — scanner pods just delegate matching to it. One DB download, many clients.

The Idea

What if osv-scanner had the same mode?

A single server process would pull the DB from GCS once, keep it in memory, and handle matching for any number of clients over HTTP:

Client pods                          Server pod
──────────────────────────           ──────────────────────────
scan source --server=...   ──────▶  POST /v1/scan
                                     └─ in-memory DB match
                           ◀──────   full vulnerability data

Mode	DB download	Vuln lookup
Online	—	OSV.dev API (querybatch + N hydrations)
Offline	Per-client GCS download	Local ZIP match
Client-Server	Server once	Single POST /v1/scan
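
A minimal sketch of what the server side of POST /v1/scan could look like, assuming a JSON body with a list of resolved packages. The request and response shapes, the type names, and the exact-version lookup are all hypothetical; a real implementation would evaluate the "affected" ranges from the OSV records rather than use an exact key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Package identifies one resolved dependency sent by a client (assumed shape).
type Package struct {
	Ecosystem string `json:"ecosystem"`
	Name      string `json:"name"`
	Version   string `json:"version"`
}

// Vuln is a trimmed stand-in for a full OSV record.
type Vuln struct {
	ID      string `json:"id"`
	Summary string `json:"summary"`
}

// DB maps "ecosystem/name@version" to the vulnerabilities affecting it.
type DB map[string][]Vuln

func key(p Package) string { return p.Ecosystem + "/" + p.Name + "@" + p.Version }

// scanHandler answers POST /v1/scan with full records for every match,
// so the client needs no per-ID follow-up requests.
func scanHandler(db DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		var req struct {
			Packages []Package `json:"packages"`
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		results := make(map[string][]Vuln)
		for _, p := range req.Packages {
			if vulns, ok := db[key(p)]; ok {
				results[key(p)] = vulns
			}
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(results)
	}
}

// scan exercises the handler in-process and returns the decoded response
// together with the HTTP status code.
func scan(db DB, body string) (map[string][]Vuln, int) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/scan", strings.NewReader(body))
	scanHandler(db)(rec, req)
	var out map[string][]Vuln
	json.NewDecoder(rec.Body).Decode(&out)
	return out, rec.Code
}

func main() {
	// Placeholder data; not a real advisory.
	db := DB{"PyPI/example-lib@1.0.0": {{ID: "EXAMPLE-0001", Summary: "placeholder record"}}}
	out, code := scan(db, `{"packages":[{"ecosystem":"PyPI","name":"example-lib","version":"1.0.0"}]}`)
	fmt.Println(code, out)
}
```

The point of the sketch is the shape of the exchange: one request carries the whole package list, and one response carries the full records.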

The server would auto-refresh from GCS on a configurable interval (incremental, CRC32C-based) and expose /healthz and /readyz endpoints for Kubernetes probes.
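
As a sketch of the refresh and probe behavior, assuming the server remembers the CRC32C of each archive it has loaded and compares it with a checksum reported for the remote object. The Server type and its methods are hypothetical, and a real server refreshing on a timer would also need a mutex around the map.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// CRC32C uses the Castagnoli polynomial.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// checksum returns the CRC32C of an archive's bytes.
func checksum(data []byte) uint32 { return crc32.Checksum(data, castagnoli) }

// Server tracks the checksum of each loaded archive and whether the
// initial load has finished.
type Server struct {
	loaded map[string]uint32 // ecosystem -> CRC32C of the archive in memory
	ready  atomic.Bool
}

func NewServer() *Server { return &Server{loaded: map[string]uint32{}} }

// needsRefresh reports whether the remote archive differs from the loaded one.
func (s *Server) needsRefresh(ecosystem string, remoteCRC uint32) bool {
	local, ok := s.loaded[ecosystem]
	return !ok || local != remoteCRC
}

// load records the archive's checksum and marks the server ready.
// Parsing the archive into an in-memory DB is omitted here.
func (s *Server) load(ecosystem string, archive []byte) {
	s.loaded[ecosystem] = checksum(archive)
	s.ready.Store(true)
}

// healthz reports liveness: the process is up.
func (s *Server) healthz(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// readyz reports readiness: at least one archive has been loaded.
func (s *Server) readyz(w http.ResponseWriter, _ *http.Request) {
	if !s.ready.Load() {
		http.Error(w, "database not loaded", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// status calls a handler in-process and returns its HTTP status code.
func status(h http.HandlerFunc, path string) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	s := NewServer()
	fmt.Println("readyz before load:", status(s.readyz, "/readyz"))
	archive := []byte("stand-in for an ecosystem all.zip")
	if s.needsRefresh("PyPI", checksum(archive)) {
		s.load("PyPI", archive)
	}
	fmt.Println("readyz after load:", status(s.readyz, "/readyz"))
	fmt.Println("refresh needed again:", s.needsRefresh("PyPI", checksum(archive)))
}
```

Keeping /readyz failing until the first load completes would let Kubernetes hold traffic back from a server that cannot answer scans yet.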

Expected Limitations

A few constraints worth calling out before going further:

Commit-based matching: Vendored C/C++ libraries are identified via OSV.dev's DetermineVersion API and matched by commit hash. This requires Git history that a local ZIP DB doesn't contain, so those packages would still need to hit OSV.dev directly from the client.
Transitive dependency resolution: Resolving transitive dependencies (e.g. Maven pom.xml, Python requirements.txt) is done by the client against deps.dev or native registries before matching even begins. The server only receives the already-resolved package list — it doesn't change how transitive resolution works or reduce those upstream calls.
License scanning: License information is sourced from deps.dev independently of vulnerability matching, so it would remain a client-side concern regardless of server mode.
Single point of failure: A central server introduces availability concerns that don't exist in the current per-client model. High-availability deployments would need additional consideration.
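
The commit-based constraint suggests the client would have to split its package list before sending anything. A minimal sketch of that split, assuming a hypothetical Package type where vendored code carries a commit hash instead of a version:

```go
package main

import "fmt"

// Package is a hypothetical client-side view of one scanned dependency.
type Package struct {
	Name    string
	Version string // set for packages resolved from a manifest or lockfile
	Commit  string // set for vendored C/C++ code identified by commit hash
}

// partition separates packages the server could match from its in-memory DB
// from those that would still need to go to OSV.dev directly.
func partition(pkgs []Package) (toServer, toOSV []Package) {
	for _, p := range pkgs {
		if p.Commit != "" {
			toOSV = append(toOSV, p) // commit-based matching stays online
		} else {
			toServer = append(toServer, p)
		}
	}
	return toServer, toOSV
}

func main() {
	// Placeholder package names and values for illustration only.
	pkgs := []Package{
		{Name: "example-lib", Version: "1.0.0"},
		{Name: "vendored-c-lib", Commit: "0123abcd"},
	}
	toServer, toOSV := partition(pkgs)
	fmt.Println("to server:", len(toServer), "to OSV.dev:", len(toOSV))
}
```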

Looking for Feedback

Does this match a problem you've run into? I want to hear whether this direction seems worth pursuing — and if there are scenarios or constraints I haven't considered.
