diff ingester: Allow configurable wildcards to match ecosystem file names, plus some cleanups #975
Conversation
I dig the cleanup and tests, but let's move the github dependencies to one place as we prepare for the pluggable providers to land
	"github.com/google/go-github/v53/github"
Can we not use the github API directly? Instead, let's include it in the github provider implementation. This way, we'll be able to better abstract this in the future.
This is not the code calling the GH API directly, but just this line:
listOpts := &github.ListOptions{PerPage: 30}
I can modify the ListFiles method to accept a generic parameter struct instead; that would get rid of the import. But note that the REST API currently returns GH-specific types, and there are other methods that accept GH-specific types, so maybe a bigger cleanup is in order?
We are indeed gonna need a bigger cleanup
if err != nil {
    return nil, fmt.Errorf("error getting pull request files: %w", err)
}
logger := zerolog.Ctx(ctx).With().
Do we have a bug for migrating everything to zero log?
I don't think so. I think @evankanderson pointed me to code that showed that calling log already triggers zerolog? So far I've just been adding zerolog calls to code that I'm writing from scratch or cleaning up, but I don't think we track any systematic cleanup.
continue
listOpts := &github.ListOptions{PerPage: 30}
for {
    prFiles, resp, err := di.cli.ListFiles(ctx, pr.RepoOwner, pr.RepoName, int(pr.Number), listOpts)
nit Long term, it would be nice to be able to use a channel to feed the results from reading the paginated list, but this seems fine for now. (Like using a generator in some other languages.)
I want to use channels in the vulnerability evaluator to have a pipeline between the database and the package repository; it actually seems like that will cut down on the wait time quite a bit and even simplify the code. If you don't mind, in this module I would wait and implement that later (maybe we implement the pagination in the REST API module by returning a channel?)
For the moment a simple loop seems fine because 99% of the list results would just be discarded right away (most PRs don't add deps..)
Yeah, my thought was that later we could implement lazy pagination (rather than the eager pagination here) by pulling from a channel and having the channel's capacity block further pagination calls.
Don't do this before November, though. We have bigger fish to fry.
and more fish :-)
I filed #996 in the meantime linking back here so we are reminded of this discussion later.
eco := di.getEcosystemForFile(*file.Filename)
if eco == DepEcosystemNone {
    logger.Debug().
        Str("filename", *file.Filename).
        Msg("No ecosystem found, skipping")
    continue
}

logger.Debug().
    Str("filename", *file.Filename).
    Str("package-ecosystem", string(eco)).
    Msg("No ecosystem found, skipping")

parser := newEcosystemParser(eco)
if parser == nil {
    return nil, fmt.Errorf("no parser found for ecosystem %s", eco)
}

depBatch, err := parser(*file.Patch)
if err != nil {
    return nil, fmt.Errorf("error parsing file %s: %w", *file.Filename, err)
}

for i := range depBatch {
    dep := depBatch[i]
    allDiffs = append(allDiffs, &pb.PrDependencies_ContextualDependency{
        Dep: dep,
        File: &pb.PrDependencies_ContextualDependency_FilePatch{
            Name:     file.GetFilename(),
            PatchUrl: file.GetRawURL(),
        },
    })
}
Do you want to extract this into a separate function?
done locally, will push after a round of testing
logger.Debug().
    Str("filename", *file.Filename).
    Str("package-ecosystem", string(eco)).
    Msg("No ecosystem found, skipping")
Did you mean to have this message here?
oops, I fixed this already locally, but /in a different branch/ (I have already been building more tests atop that other branch).
I did mean to leave a debug message there, but it shouldn't say that no ecosystem matched..
if parser == nil {
    return nil, fmt.Errorf("no parser found for ecosystem %s", eco)
}
Do we need both this and the eco == DepEcosystemNone check?
I guess not at the moment and not strictly needed, but at the same time I think it's better to check each call's return value (or alternatively we might want to encapsulate the two calls into one function)
allDiffs = append(allDiffs, &pb.PrDependencies_ContextualDependency{
    Dep: dep,
    File: &pb.PrDependencies_ContextualDependency_FilePatch{
        Name: file.GetFilename(),
Why do we use file.GetFilename() here, but *file.Filename on lines 88, 91, and 97?
no reason except inconsistency. Thanks, fixed.
if strings.HasPrefix(ecoMapping.Depfile, wildcard) {
    strippedDepfile := strings.TrimPrefix(ecoMapping.Depfile, wildcard)
    if lastComponent == strippedDepfile {
        return DependencyEcosystem(ecoMapping.Name)
    }
} else if filename == ecoMapping.Depfile {
I don't really understand what this is supposed to do -- it feels like this should be a glob (and maybe matching with regexp), but the code doesn't look like the code I'd expect for a glob.
If the code was difficult to parse, then please block the PR merge; that's what reviews are for :-)
Does the current version with filepath.Match read better?
(I wasn't sure initially about filepath.Match matching in the middle of the path, it seemed like a footgun, but in the end the code is readable and more generic with Match...)
-  repeated FilePatch patches = 6; // The list of files changed in the PR. Does not include file contents
-  int64 author_id = 7; // The author of the PR, will be used to check if we can request changes
+  int64 author_id = 6; // The author of the PR, will be used to check if we can request changes
Intentional to renumber these proto fields?
Yes, this protobuf struct is only ever used between the mediator server and the policy engine and is not exposed to the clients, so I think it's better to renumber while removing the FilePatch attribute. When/if we send this struct to the clients, renumbering fields should probably be a hard nack.
Force-pushed cd2dc71 to 932feee
Str("package-ecosystem", string(eco)).
Msg("matched ecosystem")

parser := newEcosystemParser(eco)
(Following up from the earlier comment) My suggestion about the check for eco == DepEcosystemNone was, in fact, to merge the two function calls into one:

if parser == nil {
    logger.Debug().Str("filename", filename).Msg("No ecosystem found, skipping")
    return nil, nil
}
logger.Debug().Str("filename", filename).Str("package-ecosystem", parser.Ecosystem()).
    Msg("matched ecosystem") // Can also use .Send() IIRC
done
Force-pushed 932feee to 3d17288
Force-pushed 3d17288 to 51e6da6
Force-pushed 51e6da6 to 01e26b2
@evankanderson would you mind approving this version again? There are no functional changes, just a rebase to resolve conflicts in the autogenerated protobuf files.
I started adding wildcard support for the package-ecosystem matching in the
diff ingester and ended up doing a little cleanup, since the code had been
rushed a bit to be demoable. I ended up adding pagination support, removed
some code that was only needed after an earlier refactoring, and added the
wildcard support with a basic test. More tests are incoming; there was only
so much time on the plane :-)
Fixes: #954