feat: add experimental offline mode #183
nice, thanks! some initial comments.
Sure! What kind of feedback re direction are you looking for in particular? The general structure? (It is a little hard to tease out all of these given the unrelated changes this seems to be bringing in :) ). That said I'll review this in a bit more detail later this week.
That seems like an easy/cheap thing to do today in a way that allows easy future extension, even if these interfaces are all private?
This reduces the diff with #183 a bit, and is good practice
@oliverchang @another-rex I think you can probably start doing some reviewing if you have time this week. I've still got to add some CLI tests, and maybe do a final review of the whole thing against our design doc, but I don't think there's anything major missing. If nothing else, I would like to know your thoughts on
Thanks! Added some initial comments.
Also:
- What is the performance of the localdb like when doing the local vuln matching? Is there any need for further optimisations, or is it already pretty fast as it is?
- We could add some logging or a loading indicator when downloading the local zip files.
This hasn't passed the @spencerschrock check yet, but generally downloading the databases (which takes at least 5-10 seconds) is slower than the rest of the process so yes it's pretty fast. I am interested in hearing how it performs against all the scorecard repos though because while I tried to benchmark I could have missed something 🤷
Whoops, yeah, adding some logs is another thing I still need to do 😅
From the Scorecard weekly analysis side of things, I imagine a local/cached copy could certainly speed things up and would eliminate a lot of API traffic to osv (assuming we can configure it via our current entry point of [...]). I'd be interested in seeing how much the lack of commit-based matching hurts.
Today, perhaps not so much. But in a few months we'll have a large chunk of the NVD imported that will be very useful for C/C++ in general, where source/commit-based matching is the best we have.
Mostly looks good! Left some more comments.
OK, I think this should be good to go now. The main discussion point I'm expecting is the introduction of a setting for the local db path, which we'd previously decided would only be configurable by an env variable. I've since realised that not having it makes testing harder, not just for the scanner as a CLI but for anyone using the library who wants to write their own tests. Also I've deliberately not :
Finally, I think this has discovered a bug with the SBOM parsing, as it reports the ecosystem for Alpine packages as APK instead of Alpine.
Fixed the Alpine issue in #457
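For anyone following the Alpine point above: the mismatch is between the PURL type (`apk`) and the OSV ecosystem name (`Alpine`). A minimal sketch of that mapping follows. The function name `ecosystemFromPURL` and the small table are hypothetical and only for illustration; this is not the fix that landed in #457, and it ignores the PURL namespace, which a real mapping would likely need to check.

```go
package main

import (
	"fmt"
	"strings"
)

// purlTypeToEcosystem is an illustrative, incomplete table (an assumption
// for this sketch) mapping PURL types to OSV ecosystem names.
var purlTypeToEcosystem = map[string]string{
	"apk": "Alpine",
	"gem": "RubyGems",
	"npm": "npm",
}

// ecosystemFromPURL is a hypothetical helper: it extracts the type from a
// "pkg:<type>/..." string and looks up the matching ecosystem name.
func ecosystemFromPURL(purl string) (string, bool) {
	if !strings.HasPrefix(purl, "pkg:") {
		return "", false
	}
	rest := strings.TrimPrefix(purl, "pkg:")
	typ, _, found := strings.Cut(rest, "/")
	if !found {
		return "", false
	}
	eco, ok := purlTypeToEcosystem[strings.ToLower(typ)]
	return eco, ok
}

func main() {
	eco, ok := ecosystemFromPURL("pkg:apk/alpine/curl@7.83.1-r2")
	fmt.Println(eco, ok) // Alpine true
}
```

The point is only that the type segment of the PURL is not itself an ecosystem name, so it has to be translated before querying.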
very nice!! Just mostly have some nits, otherwise this looks pretty good to me.
thanks! LGTM with some final nits.
internal/offline/zip.go
// the url that the zip archive was downloaded from
ArchiveURL string
// whether this database should make any network requests
Offline bool
Maybe we should name this ShouldUpdate or something similar? That makes the intent a lot clearer; as currently named, it's a bit confusing in the context of this being inside an "offline" package.
What do you think about renaming the package itself to something like local? I'm not too fussed about renaming the property either; I just felt that Offline is pretty clear on what should (or should not) happen, and that it is conceivable that other features get added that involve talking to a network.
local SGTM.
This pulls across the experimental struct from #183, which I'm assuming folks are happy with given that PR is approved. I think it is good to land this in its own PR as it's technically an allowed breaking change (or "experimental change" if you will), and that way it can be explicitly linked to in changelogs etc. without having to understand #183 as much.
Resolves #81
This is based off a lot of the core of the detector. It's not working yet because I need to figure out how to handle passing in the queries to the local db given that the detector takes PackageDetails, but really the key thing there is how to handle PURL, which comes from SBOMs that I don't really know how to use 😅 (idk if I'm just dumb or what, but for some reason I've still not been able to figure out how to accurately generate one from a Gemfile.lock, package-lock.json, etc).
If someone could provide some sample SBOMs that would be very useful (I'll also do a PR adding tests using them as fixtures). I'm also happy to receive feedback on the general approach. There are some smaller bits to discuss, like whether fields should be omitted from the JSON output vs an empty array, and the Describe related stuff too.
This is now working, though personally it feels pretty awkward codewise. I know I'm biased, but I feel like it would be better to try to bring across the whole database package from the detector, as the API db is pretty much the same and then you'd have support for zips, directories, and the API with extra configs like working directories + an extensive test suite for all three (I don't think it would be as painful as one might first think, even with osvscanner having just been made public, because that's relatively small).
Still, this does work as advertised. There are definitely a few things that could do with some cleaning up (including whether fields should be omitted from the JSON output vs an empty array, and the Describe related stuff too), but I am leaving them for now until I hear what folks think of the general implementation + my above comment.
I've also gone with two boolean flags rather than the url-based flag @oliverchang suggested because I didn't feel comfortable trying to shoehorn that into this PR as well, and now that we're using --experimental it should be fine to completely change these flags in future.
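On the "omitted vs an empty array" question, a small sketch of how Go's `encoding/json` behaves may help the discussion. The struct and field names here are hypothetical and are not the scanner's actual output types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resultAlways always emits the field (hypothetical type for this sketch).
type resultAlways struct {
	Vulns []string `json:"vulns"`
}

// resultOmit drops the field when the slice is nil or empty.
type resultOmit struct {
	Vulns []string `json:"vulns,omitempty"`
}

// encode marshals v and returns the JSON as a string.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(resultAlways{}))                  // {"vulns":null}
	fmt.Println(encode(resultAlways{Vulns: []string{}})) // {"vulns":[]}
	fmt.Println(encode(resultOmit{Vulns: []string{}}))   // {}
}
```

So without `omitempty`, getting `[]` rather than `null` requires initialising the slice, while `omitempty` removes the key entirely; which one is nicer for consumers of the JSON output is the open question.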