Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add experimental offline mode #183

Merged
merged 28 commits into from
Aug 9, 2023
Merged

feat: add experimental offline mode #183

merged 28 commits into from
Aug 9, 2023

Conversation

G-Rath
Copy link
Collaborator

@G-Rath G-Rath commented Feb 3, 2023

Resolves #81

This is based off a lot of the core of the detector - it's not working yet because I need to figure how to handle passing in the queries to the local db given that the detector takes PackageDetails, but really the key thing there is how to handle PURL which comes from SBOMs that I don't really know how to use 😅 (idk if I'm just dumb or what, but for some reason I've still not been able to figure how to accurately generate one from a Gemfile.lock, package-lock.json, etc)

If someone could provide some sample SBOMs that would be very useful (I'll also do a PR adding tests using them as fixtures), and also happy to receive feedback on the general approach - there are some smaller bits to discuss, like if fields should be omitted from the JSON output vs an empty array, and the Describe related stuff too.

This is now working, though personally it feels pretty awkward codewise - I know I'm bias but I feel like it would be better to trying to bring across the whole database package from the detector, as the API db is pretty much the same and then you'd have support for zips, directories, and the API with extra configs like working directories + an extensive test suite for all three (I don't think it would be as painful as one might first think, even with osvscanner having just been made public because that's relatively small).

Still, this does work as advertised - there's definitely a few things that could do with some cleaning up (including if fields should be omitted from the JSON output vs an empty array, and the Describe related stuff too) but am leaving them for now until I hear what folks think of the general implementation + my above comment.

I've also gone with two boolean flags rather than the url-based flag @oliverchang suggested because I didn't feel comfortable trying to shoehorn that into this PR as well, and now that we're using --experimental it should be fine to completely change these flags in future.

Copy link
Collaborator

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks! some initial comments.

pkg/models/vulnerabilities.go Outdated Show resolved Hide resolved
internal/offline/zip.go Outdated Show resolved Hide resolved
pkg/models/vulnerabilities_test.go Outdated Show resolved Hide resolved
pkg/models/vulnerability.go Outdated Show resolved Hide resolved
pkg/models/vulnerability.go Outdated Show resolved Hide resolved
cmd/osv-scanner/main.go Outdated Show resolved Hide resolved
@G-Rath

This comment was marked as resolved.

@oliverchang
Copy link
Collaborator

@oliverchang thanks for the initial review - a few of the things you've flagged I already know about but have deliberately left to show a possible future and their use-case is actually already being introduced in parallel PRs.

For now I'd like to focus on the overall direction, after which I'll adjust the whole PR to match and then you can review the finer points, but until then you don't need to spend time reviewing those :)

Sure! What kind of feedback re direction are you looking for in particular? The general structure? (It is a little hard to tease out all of these given the unrelated changes this seems to be bringing in :) ). That said I'll review this in a bit more detail later this week.

You've already started to touch on that by asking for a Db interface, but I don't think that's worth doing until there are multiple DBs since it's all internal - having said that, the detector already has three DBs including one for the API, so I'd be happy to port that across in which case you'd have your interface.

That seems like an easy/cheap thing to do today in a way that allows easy future extension, even if these interfaces are all private?

@G-Rath

This comment was marked as resolved.

another-rex pushed a commit that referenced this pull request Jun 27, 2023
This reduces the diff with #183 a bit, and is good practice
@G-Rath G-Rath marked this pull request as ready for review July 2, 2023 21:15
@G-Rath
Copy link
Collaborator Author

G-Rath commented Jul 2, 2023

@oliverchang @another-rex I think you can probably start doing some reviewing if you have time during this week - I've still got to add some CLI tests, and do a maybe final review of the whole thing against our design doc but I don't think there's anything major missing.

If nothing else, I would like to know your thoughts on ScannerActionsExperimental - ideally I'd like to actually land that in its own PR pulling in the existing experimental flags, so if you're happy with that I'll get that done later this week too

pkg/models/vulnerability.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added some initial comments.

Also:

  • What is the performance of the localdb when doing the local vuln matching like, is there any need for further optimisations, or is it already pretty fast as it is?
  • We could add some logging or a loading indicator when downloading the local zip files.

pkg/models/vulnerability.go Outdated Show resolved Hide resolved
pkg/models/vulnerability.go Outdated Show resolved Hide resolved
pkg/models/vulnerability.go Show resolved Hide resolved
pkg/models/vulnerability.go Show resolved Hide resolved
internal/offline/zip.go Outdated Show resolved Hide resolved
internal/offline/zip.go Outdated Show resolved Hide resolved
@G-Rath
Copy link
Collaborator Author

G-Rath commented Jul 3, 2023

What is the performance of the localdb when doing the local vuln matching like, is there any need for further optimisations, or is it already pretty fast as it is?

This hasn't passed the @spencerschrock check yet, but generally downloading the databases (which takes at least 5-10 seconds) is slower than the rest of the process so yes it's pretty fast.

I am interested in hearing how it performs against all the scorecard repos though because while I tried to benchmark I could have missed something 🤷

We could add some logging or a loading indicator when downloading the local zip files.

Whoops yeah adding some logs are another thing I still need to do 😅

@spencerschrock
Copy link
Contributor

What is the performance of the localdb when doing the local vuln matching like, is there any need for further optimisations, or is it already pretty fast as it is?

This hasn't passed the @spencerschrock check yet, but generally downloading the databases (which takes at least 5-10 seconds) is slower than the rest of the process so yes it's pretty fast.

I am interested in hearing how it performs against all the scorecard repos though because while I tried to benchmark I could have missed something 🤷

From the Scorecard weekly analysis side of things, I imagine a local/cached copy could certainly speed things up and would eliminate a lot of API traffic to osv (assuming we can configure it via our current entry point of DoScan).

I'd be interested in seeing how much of a difference the lack of commit based matching hurts.

@oliverchang
Copy link
Collaborator

From the Scorecard weekly analysis side of things, I imagine a local/cached copy could certainly speed things up and would eliminate a lot of API traffic to osv (assuming we can configure it via our current entry point of DoScan).

I'd be interested in seeing how much of a difference the lack of commit based matching hurts.

Today, perhaps not so much. But in a few months we'll have a large chunk of the NVD imported that will be very useful for C/C++ in general where source/commit-based matching is the best we have.

pkg/models/vulnerability.go Outdated Show resolved Hide resolved
pkg/models/vulnerability.go Outdated Show resolved Hide resolved
pkg/models/vulnerability.go Show resolved Hide resolved
Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good! Left some more comments.

cmd/osv-scanner/main.go Outdated Show resolved Hide resolved
internal/offline/check.go Outdated Show resolved Hide resolved
cmd/osv-scanner/main_test.go Outdated Show resolved Hide resolved
internal/offline/check.go Outdated Show resolved Hide resolved
cmd/osv-scanner/main_test.go Show resolved Hide resolved
@G-Rath
Copy link
Collaborator Author

G-Rath commented Jul 29, 2023

Ok this should be good to go now I think, with the main discussion point I'm expecting to be about the introduction of setting the local db path which we'd previously decided would only be configurable by an env variable. I've since realised it makes it harder to test by not having it not just for the scanner as a CLI but anyone using the library that wants to write their own tests.

Also I've deliberately not :

  • added many CLI tests because the core behaviour is already covered by more package-specific tests and it's a lot more work to craft CLI tests to cover the same sorts of things
  • done anything fancy with messaging output or loading handling, like tracking if a db for an ecosystem failed to load as I think there's some pro/cons on the different ways to handle it and that the is good enough for a first implementation, so I'm leaving them as follow-up changes if/when needed (which can be based on user feedback)

Finally, I think this has discovered a bug with the sbom parsing as it looks reports the ecosystem for Alpine packages as APK instead of Alpine.

@another-rex
Copy link
Collaborator

Fixed the alpine issue in #457

Copy link
Collaborator

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!! Just mostly have some nits, otherwise this looks pretty good to me.

internal/offline/zip.go Outdated Show resolved Hide resolved
internal/offline/zip.go Outdated Show resolved Hide resolved
pkg/models/vulnerability.go Show resolved Hide resolved
pkg/models/vulnerability.go Show resolved Hide resolved
pkg/models/vulnerability.go Show resolved Hide resolved
pkg/osvscanner/osvscanner.go Outdated Show resolved Hide resolved
internal/offline/zip.go Outdated Show resolved Hide resolved
internal/offline/check.go Outdated Show resolved Hide resolved
internal/offline/check.go Outdated Show resolved Hide resolved
internal/offline/zip.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! LGTM with some final nits.

internal/offline/zip.go Outdated Show resolved Hide resolved
pkg/models/vulnerability.go Show resolved Hide resolved
internal/offline/check.go Outdated Show resolved Hide resolved
internal/offline/check.go Outdated Show resolved Hide resolved
// the url that the zip archive was downloaded from
ArchiveURL string
// whether this database should make any network requests
Offline bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should name this ShouldUpdate or something similar ? That makes the intent a lot clearer, and currently as it's named it's a bit confusing in the context of this being inside an "offline" package.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about renaming the package itself to something like local? I'm not too fussed about renaming the property either, I just felt that Offline is pretty clear on what should (or should not) happen and that it is conceivable that other features get added that involve talking to a network.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local SGTM.

another-rex pushed a commit that referenced this pull request Aug 2, 2023
This pulls across the experimental struct from #183 which I'm assuming
folks are happy with given that PR is approved - I think it is good to
land this in its own PR as its technically an allowed breaking change
(or "experimental change" if you will), and that way it can be
explicitly linked to in changelogs etc without having to understand #183
as much.
@oliverchang oliverchang merged commit 53107dd into google:main Aug 9, 2023
7 checks passed
@G-Rath G-Rath deleted the support-offline branch August 15, 2023 07:34
@G-Rath G-Rath restored the support-offline branch August 15, 2023 07:34
@G-Rath G-Rath deleted the support-offline branch August 15, 2023 07:34
@G-Rath G-Rath restored the support-offline branch August 15, 2023 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support local DBs
4 participants