Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Python requirements.txt scanning for pull requests #1225

Merged
merged 5 commits into from
Oct 19, 2023

Conversation

jhrozek
Copy link
Contributor

@jhrozek jhrozek commented Oct 17, 2023

Adds support for ingesting patches that touch requirements.txt

This adds both support for replying to PRs that add python dependencies as
well as a building block for the Pi integration that will use the same code,
just pointed to Pi.

Fixes: #913

@jhrozek jhrozek changed the title Support Python requirements.txt scanning for pull requests #913 Support Python requirements.txt scanning for pull requests Oct 17, 2023
JAORMX
JAORMX previously requested changes Oct 17, 2023
internal/engine/eval/vulncheck/review.go Outdated Show resolved Hide resolved
@jhrozek jhrozek force-pushed the py_ingest branch 2 times, most recently from f54d224 to bfeafd9 Compare October 17, 2023 18:33
Copy link
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly questions rather than concerns. The tests made be feel much more confident than my own reasoning through the code.

"strings"

pb "github.com/stacklok/mediator/pkg/api/protobuf/go/mediator/v1"
)

var (
pyRequirementsNameRegex = regexp.MustCompile(`\s*(>=|<=|==|>|<|!=)`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to make this greedy so that it will match >= rather than >?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(A test would verify this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way the operators are ordered already ensures that >= would match before >. You had a point though that the way I was trying to find the lowest version was incomplete, I added a fix to the parser and some more tests. Thanks!

Copy link
Member

@evankanderson evankanderson Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that RE2 and regexp are both greedy. (I thought they were lazy!) I'm going to include the rest just because I find it interesting and already typed it out.

Ordering of alternations is irrelevant for regular expression engines (particularly RE2 and Go's regexp module). regexp builds a state machine and marches it through the expression, picking out the longest string (greedy) that matches.

Since the match is greedy, this will match the longest possible string, which works for what you want.

internal/engine/eval/vulncheck/pkgdb.go Show resolved Hide resolved
internal/engine/eval/vulncheck/pkgdb.go Show resolved Hide resolved
internal/engine/ingester/diff/parse.go Outdated Show resolved Hide resolved
internal/engine/ingester/diff/parse.go Outdated Show resolved Hide resolved
internal/engine/ingester/diff/parse.go Outdated Show resolved Hide resolved
internal/engine/ingester/diff/parse.go Outdated Show resolved Hide resolved
internal/engine/ingester/diff/parse.go Outdated Show resolved Hide resolved
internal/engine/ingester/diff/parse.go Outdated Show resolved Hide resolved
@jhrozek
Copy link
Contributor Author

jhrozek commented Oct 18, 2023

Mostly questions rather than concerns. The tests made be feel much more confident than my own reasoning through the code.

I have big respect that you were even able to reason about a regex - for me, regexes are a write-only code :-) I was writing the regex along with the tests

@jhrozek
Copy link
Contributor Author

jhrozek commented Oct 18, 2023

Thanks for the review @evankanderson. I will address your comments through the day, but I want to get the Pi support up for review first, to get at least some high-level feedback there.

Adds support for parsing requirements.txt in PRs.
GH API is again a bit odd here - if the suggestion is only one line,
then it expects only the Line parameter to be set. If you set both to
the same value you'd get:
```
422 Unprocessable Entity [{Resource: Field: Code: Message:Pull request
    review thread start line must precede the end line.}]
```
…rements.txt

Queries pypi for latest package versions and provides suggestions to the
PR.

Fixes: #913
python's requirements.txt allows specifying packages without versions.
Because there' no vulnerability check we can do in absence of a version,
let's just skip those. pip should install the latest package in this
case anyway.
internal/engine/eval/vulncheck/pkgdb.go Show resolved Hide resolved
Comment on lines +144 to +152
func (p *PyPiReply) LineHasDependency(line string) bool {
nameMatch := util.PyRequestsNameRegexp.FindStringIndex(line)
if nameMatch == nil {
return false
}

name := strings.TrimSpace(line[:nameMatch[0]])
return name == p.Info.Name
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like you might want to implement util.ParsePyRequirement(line string) (string, string) that returns the package name and the version constraints (or "", "" for not matching a requirement line). This method could then just return the first argument.

Comment on lines +133 to +141
// IndentedString returns the patch suggestion for a requirement.txt file
// This method satisfies the patchLocatorFormatter interface where different
// package managers have different patch formats and different ways of presenting
// them. Since PyPi doesn't indent, but can specify zero or multiple versions, we
// don't care about the indent parameter. This is ripe for refactoring, though,
// see the comment in the patchLocatorFormatter interface.
func (p *PyPiReply) IndentedString(_ int, oldDepLine string, oldDep *pb.Dependency) string {
return strings.Replace(oldDepLine, oldDep.Version, p.Info.Version, 1)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the use cases here -- are we replacing one version with a new recommended version? If so, it seems like we might not need oldDepLine and oldDepVersion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea was to do a minimal replace on the version, keeping the rest of the line intact, mainly things like extras (package[extra_feature,extra_bloat]. I would like to refactor this interface and the associated structure anyway.

Comment on lines +105 to +117
func pyReqNormalizeLine(line string) string {
if !strings.HasPrefix(line, "+") {
return ""
}
line = strings.TrimPrefix(line, "+")

// Remove inline comments
if idx := strings.Index(line, "#"); idx != -1 {
line = line[:idx]
}

return strings.TrimSpace(line)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function only looking at added lines? Normalize doesn't quite seem like it covers that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyExtractAddedDeps seems like it might be a better name.

version := ""
var lowestVersion string
for _, match := range matches {
if len(match) < 3 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not:

Suggested change
if len(match) < 3 {
if len(match) != util.PyRequestsVersionRegexp.NumSubexp() {

(Which also seems unlikely to happen...)

}

// Extract the name by grabbing everything up to the first operator
nameMatch := util.PyRequestsNameRegexp.FindStringIndex(line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to do this is to split the string into (dep-name)(version-strings) first, and then doing the parsing loop above on the version-string part of the line. Since parsing is linear in the size of the input, there's probably not much performance gain either way.

Comment on lines +274 to +277
Flask
+requests>=2.0,<3
+pandas<0.25.0,>=0.24.0
+numpy==1.16.0`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test!

description: "Single addition, greater or equal version",
content: `
Flask
+requests>=2.19.0`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to test lines that contain - as well?

Suggested change
+requests>=2.19.0`,
-requests>=2.14.0
+requests>=2.19.0`,

@jhrozek
Copy link
Contributor Author

jhrozek commented Oct 19, 2023

Thanks for the review Evan, I filed #1236 to get back to your comments. I think they are all valid, but right now I'd prefer to move faster (and hopefully not break too many things). I'll merge.

@jhrozek jhrozek merged commit c8beb55 into stacklok:main Oct 19, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Python requirements.txt scanning for pull requests
3 participants