feat(nvd): use go to upload NVD conversion to gcs upon conversion #5099

Open
jess-lowe wants to merge 24 commits into google:master from jess-lowe:refactor/nvd-use-gcs

Contributor

@jess-lowe jess-lowe commented Mar 20, 2026

This PR introduces support for immediately uploading NVD conversion records to GCS instead of saving them locally and then syncing, leveraging helper functions discussed in #4984.
Additionally, it refactors the NVD converter to separate record generation from output handling and reorganizes the project's upload and GCS utilities.

Key Changes

NVD Converter

  • Refactored output logic: Updated nvd.CVEToOSV to return the Vulnerability and Metrics objects instead of writing them to disk directly. This separates the conversion logic from the I/O handling.
  • Added GCS Upload Support: Added -upload-to-gcs, -output-bucket, and -gcs-prefix flags to the NVD converter tool to support direct streaming to GCS.

Package Reorganization & Utilities

  • Moved upload package: Relocated vulnfeeds/upload to vulnfeeds/conversion/writer to better fit the new output handling structure.
  • New gcs-tools package: Added a general GCS utility package in vulnfeeds/gcs-tools providing functions like UploadToGCS, UploadFile, and DownloadBucket.

Other Converters

  • Updated combine-to-osv and other converters (Alpine, Debian, etc.) to use the new writer package instead of the old upload package.

Why this is needed

  • Performance/Storage: Avoids local disk space bottlenecks during large NVD conversions by streaming directly to Cloud Storage.
  • Maintainability: Improves code modularity by separating conversion logic from output methods.

@jess-lowe jess-lowe requested review from another-rex and michaelkedar and removed request for another-rex March 20, 2026 03:09
Contributor

We should make the gcs-tools package generic, covering only uploads to GCS; we shouldn't put CVE-specific logic in here.

Contributor

If uploading to GCS is going to take a while, I would even put the multithreading / concurrency logic in here.
E.g. provide a function that will spin up X number of workers, and a "gcs client" that just contains a channel.

Other code can pass the client to their code to upload.

Probably for a separate PR though.

Contributor Author

> We should make the gcs-tools package generic, covering only uploads to GCS; we shouldn't put CVE-specific logic in here.

Moved these into their own thing in conversion/writer

Contributor Author

> If uploading to GCS is going to take a while, I would even put the multithreading / concurrency logic in here. E.g. provide a function that will spin up X number of workers, and a "gcs client" that just contains a channel.
>
> Other code can pass the client to their code to upload.
>
> Probably for a separate PR though.

Uploading vulnerability records is too nuanced for a generic helper here, so it has its own implementation in writer.VulnWorker. With the NVD data, though, the upload happens in the same thread that converts the record.

Comment thread vulnfeeds/gcs-tools/gcs.go Outdated
@jess-lowe jess-lowe requested a review from another-rex March 20, 2026 05:44
jess-lowe added a commit that referenced this pull request Apr 30, 2026
The nvd-cve-osv cron job doesn't seem to be finishing successfully: it is
currently taking forever to upload and do threshold checks. This should
hopefully speed things up while waiting for #5099.
Contributor

@another-rex another-rex left a comment

Nice, mostly looks good. Have you tested it locally to see how much faster it is compared to the script? (Probably not a big impact here, since locally we have a lot more threads than the cronjob.)

Comment thread vulnfeeds/cmd/converters/cve/nvd-cve-osv/main.go Outdated
Comment thread vulnfeeds/conversion/writer/writer.go Outdated
Comment thread vulnfeeds/conversion/writer/writer.go Outdated
Comment thread vulnfeeds/conversion/writer/writer.go
2 participants