Support Python requirements.txt scanning for pull requests #1225

jhrozek · 2023-10-17T14:24:49Z

Adds support for ingesting patches that touch requirements.txt

This adds both support for replying to PRs that add python dependencies as
well as a building block for the Pi integration that will use the same code,
just pointed to Pi.

Fixes: #913

internal/engine/eval/vulncheck/review.go

evankanderson

Mostly questions rather than concerns. The tests made me feel much more confident than my own reasoning through the code.

evankanderson · 2023-10-17T20:10:13Z

internal/engine/eval/vulncheck/pkgdb.go

 	"strings"

 	pb "github.com/stacklok/mediator/pkg/api/protobuf/go/mediator/v1"
 )

+var (
+	pyRequirementsNameRegex = regexp.MustCompile(`\s*(>=|<=|==|>|<|!=)`)


Do you need to make this greedy so that it will match >= rather than >?

(A test would verify this)

The way the operators are ordered already ensures that >= matches before >. You had a point, though: the way I was trying to find the lowest version was incomplete, so I added a fix to the parser and some more tests. Thanks!

It turns out that RE2 and regexp are both greedy. (I thought they were lazy!) I'm going to include the rest just because I find it interesting and already typed it out.

Ordering of alternations is irrelevant for regular expression engines (particularly RE2 and Go's regexp module). regexp builds a state machine and marches it through the expression, picking out the longest string (greedy) that matches.

Since the match is greedy, this will match the longest possible string, which works for what you want.

internal/engine/eval/vulncheck/pkgdb.go

internal/engine/ingester/diff/parse.go

jhrozek · 2023-10-18T06:25:32Z

Mostly questions rather than concerns. The tests made me feel much more confident than my own reasoning through the code.

I have a lot of respect that you were even able to reason about a regex - for me, regexes are write-only code :-) I wrote the regex alongside the tests.

jhrozek · 2023-10-18T06:29:35Z

Thanks for the review @evankanderson. I will address your comments through the day, but I want to get the Pi support up for review first, to get at least some high-level feedback there.

Adds support for parsing requirements.txt in PRs.

The GH API is again a bit odd here: if the suggestion is only one line, it expects only the Line parameter to be set. If you set both StartLine and Line to the same value, you get:

```
422 Unprocessable Entity [{Resource: Field: Code: Message:Pull request review thread start line must precede the end line.}]
```
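A minimal sketch of the workaround, assuming a local stand-in struct whose `StartLine` and `Line` fields are named after GitHub's `start_line` and `line` review-comment parameters (this is not go-github's type): set only `Line` for a one-line suggestion and add `StartLine` only when the range spans several lines. `newSuggestion` is a hypothetical helper.

```go
package main

import "fmt"

// reviewComment is a hypothetical stand-in for a draft review comment;
// only the fields relevant to the line range are modeled.
type reviewComment struct {
	Path      string
	Body      string
	StartLine *int // nil for single-line comments; GitHub rejects StartLine == Line
	Line      int
}

// newSuggestion builds a comment covering lines start..end (inclusive),
// setting StartLine only when the range covers more than one line.
func newSuggestion(path string, start, end int, body string) reviewComment {
	c := reviewComment{Path: path, Body: body, Line: end}
	if start < end {
		s := start
		c.StartLine = &s
	}
	return c
}

func main() {
	single := newSuggestion("requirements.txt", 3, 3, "bump requests")
	multi := newSuggestion("requirements.txt", 3, 5, "bump several pins")
	fmt.Println(single.StartLine == nil, *multi.StartLine, multi.Line) // true 3 5
}
```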

…rements.txt Queries pypi for latest package versions and provides suggestions to the PR. Fixes: #913

Python's requirements.txt allows specifying packages without versions. Because there's no vulnerability check we can do in the absence of a version, let's just skip those. pip should install the latest package in this case anyway.
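A minimal sketch of that skip, assuming a local `dependency` struct that models only the name and version of the PR's `pb.Dependency` (the real type may carry more fields). `withVersions` is a hypothetical helper name.

```go
package main

import "fmt"

// dependency is a local stand-in for pb.Dependency; only Name and Version
// are modeled here.
type dependency struct {
	Name    string
	Version string
}

// withVersions drops dependencies without a version, since there is no
// version to look up in a vulnerability database.
func withVersions(deps []dependency) []dependency {
	var out []dependency
	for _, d := range deps {
		if d.Version == "" {
			continue
		}
		out = append(out, d)
	}
	return out
}

func main() {
	deps := []dependency{{Name: "Flask"}, {Name: "numpy", Version: "1.16.0"}}
	fmt.Println(withVersions(deps)) // [{numpy 1.16.0}]
}
```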

internal/engine/eval/vulncheck/pkgdb.go

evankanderson · 2023-10-18T22:44:02Z

internal/engine/eval/vulncheck/pkgdb.go

+func (p *PyPiReply) LineHasDependency(line string) bool {
+	nameMatch := util.PyRequestsNameRegexp.FindStringIndex(line)
+	if nameMatch == nil {
+		return false
+	}
+
+	name := strings.TrimSpace(line[:nameMatch[0]])
+	return name == p.Info.Name
+}


It feels like you might want to implement util.ParsePyRequirement(line string) (string, string) that returns the package name and the version constraints (or "", "" for a line that isn't a requirement). This method could then just compare the first return value against p.Info.Name.
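A minimal sketch of the suggested helper, assuming the same operator set as the PR's name regex (which does not cover `~=` or `===`). `ParsePyRequirement` follows the signature proposed in the comment, and `lineHasDependency` is a hypothetical free-function version of `LineHasDependency`; a line without an operator is treated as "not a requirement" and yields `"", ""`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pyOpRegexp mirrors the operator regex from the PR: optional whitespace
// followed by a comparison operator.
var pyOpRegexp = regexp.MustCompile(`\s*(>=|<=|==|>|<|!=)`)

// ParsePyRequirement (hypothetical) splits a requirement line into the
// package name and the raw constraint string. It returns "", "" when the
// line has no operator or nothing before the first operator.
func ParsePyRequirement(line string) (string, string) {
	loc := pyOpRegexp.FindStringIndex(line)
	if loc == nil {
		return "", ""
	}
	name := strings.TrimSpace(line[:loc[0]])
	if name == "" {
		return "", ""
	}
	return name, strings.TrimSpace(line[loc[0]:])
}

// lineHasDependency shows LineHasDependency reduced to a name comparison.
func lineHasDependency(line, depName string) bool {
	name, _ := ParsePyRequirement(line)
	return name != "" && name == depName
}

func main() {
	name, constraints := ParsePyRequirement("requests>=2.0,<3")
	fmt.Println(name, constraints)                            // requests >=2.0,<3
	fmt.Println(lineHasDependency("numpy==1.16.0", "numpy")) // true
}
```

Like the original `LineHasDependency`, this compares the raw text before the first operator, so a line with extras such as `requests[security]>=2.0` yields the name `requests[security]`.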

evankanderson · 2023-10-18T22:45:48Z

internal/engine/eval/vulncheck/pkgdb.go

+// IndentedString returns the patch suggestion for a requirement.txt file
+// This method satisfies the patchLocatorFormatter interface where different
+// package managers have different patch formats and different ways of presenting
+// them. Since PyPi doesn't indent, but can specify zero or multiple versions, we
+// don't care about the indent parameter. This is ripe for refactoring, though,
+// see the comment in the patchLocatorFormatter interface.
+func (p *PyPiReply) IndentedString(_ int, oldDepLine string, oldDep *pb.Dependency) string {
+	return strings.Replace(oldDepLine, oldDep.Version, p.Info.Version, 1)
+}


I'm not sure I understand the use cases here -- are we replacing one version with a new recommended version? If so, it seems like we might not need oldDepLine and oldDepVersion.

The idea was to do a minimal replace on the version, keeping the rest of the line intact, mainly things like extras (package[extra_feature,extra_bloat]). I would like to refactor this interface and the associated structure anyway.
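`strings.Replace(line, old, new, 1)` on the whole line rewrites the first occurrence of the old version anywhere, including the name or extras. One way to keep the minimal replace while touching only the constraint part is to split at the first operator first. This is a sketch under that assumption; `replaceVersionInLine` is a hypothetical helper that reuses the operator set from the PR's name regex.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// opRegexp uses the same operator set as the PR's name regex.
var opRegexp = regexp.MustCompile(`\s*(>=|<=|==|>|<|!=)`)

// replaceVersionInLine (hypothetical) swaps the first occurrence of
// oldVersion for newVersion, searching only from the first operator onward,
// so the package name and extras are never rewritten. Lines without an
// operator are returned unchanged.
func replaceVersionInLine(line, oldVersion, newVersion string) string {
	loc := opRegexp.FindStringIndex(line)
	if loc == nil {
		return line
	}
	return line[:loc[0]] + strings.Replace(line[loc[0]:], oldVersion, newVersion, 1)
}

func main() {
	fmt.Println(replaceVersionInLine("requests[security,socks]>=2.19.0  # pinned", "2.19.0", "2.31.0"))
	// requests[security,socks]>=2.31.0  # pinned
}
```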

evankanderson · 2023-10-18T22:52:55Z

internal/engine/ingester/diff/parse.go

+func pyReqNormalizeLine(line string) string {
+	if !strings.HasPrefix(line, "+") {
+		return ""
+	}
+	line = strings.TrimPrefix(line, "+")
+
+	// Remove inline comments
+	if idx := strings.Index(line, "#"); idx != -1 {
+		line = line[:idx]
+	}
+
+	return strings.TrimSpace(line)
+}


Is this function only looking at added lines? Normalize doesn't quite seem like it covers that.

pyExtractAddedDeps seems like it might be a better name.
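A sketch of splitting the two responsibilities so each name says what it does; both helpers are hypothetical. The comment handling here follows pip's documented requirements-file rule that `#` starts a comment only at the beginning of a line or after whitespace, so a URL fragment such as `#egg=` survives, whereas the PR's version cuts at the first `#` anywhere on the line.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pipCommentRegexp matches a "#" at the start of the line or after
// whitespace, together with the rest of the line.
var pipCommentRegexp = regexp.MustCompile(`(^|\s)#.*$`)

// pyReqAddedLine reports whether a patch line is an addition and, if so,
// returns it without the leading "+". Removed ("-") and context (" ")
// lines are rejected.
func pyReqAddedLine(line string) (string, bool) {
	if !strings.HasPrefix(line, "+") {
		return "", false
	}
	return strings.TrimPrefix(line, "+"), true
}

// pyReqStripComment removes a trailing comment and surrounding whitespace.
func pyReqStripComment(line string) string {
	return strings.TrimSpace(pipCommentRegexp.ReplaceAllString(line, ""))
}

func main() {
	for _, l := range []string{" Flask", "-requests>=2.14.0", "+requests>=2.19.0  # pinned"} {
		if added, ok := pyReqAddedLine(l); ok {
			fmt.Println(pyReqStripComment(added)) // requests>=2.19.0
		}
	}
}
```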

evankanderson · 2023-10-18T23:01:55Z

internal/engine/ingester/diff/parse.go

+		version := ""
+		var lowestVersion string
+		for _, match := range matches {
+			if len(match) < 3 {


Why not:

Suggested change

if len(match) < 3 {

if len(match) != util.PyRequestsVersionRegexp.NumSubexp()+1 {

(Which also seems unlikely to happen...)
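Go's `FindAllStringSubmatch` returns, for each match, the full match followed by one entry per parenthesized subexpression, so a complete match always has `NumSubexp()+1` elements; comparing `len(match)` against `NumSubexp()` alone would reject every match. The sketch below uses a hypothetical stand-in for `util.PyRequestsVersionRegexp` with two groups (operator and version), which is consistent with the original `len(match) < 3` check.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRegexp is a hypothetical stand-in for util.PyRequestsVersionRegexp:
// group 1 is the operator, group 2 the version.
var versionRegexp = regexp.MustCompile(`(>=|<=|==|>|<|!=)\s*([^,\s]+)`)

func main() {
	matches := versionRegexp.FindAllStringSubmatch("requests>=2.0,<3", -1)
	for _, match := range matches {
		// match[0] is the whole match, match[1:] are the groups.
		fmt.Println(len(match), versionRegexp.NumSubexp(), match[1], match[2])
	}
	// 3 2 >= 2.0
	// 3 2 < 3
}
```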

evankanderson · 2023-10-18T23:07:20Z

internal/engine/ingester/diff/parse.go

+		}
+
+		// Extract the name by grabbing everything up to the first operator
+		nameMatch := util.PyRequestsNameRegexp.FindStringIndex(line)


Another way to do this is to split the string into (dep-name)(version-strings) first, and then do the parsing loop above on the version-string part of the line. Since parsing is linear in the size of the input, there's probably not much performance gain either way.
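A sketch of the split-first approach: separate the name from the version part at the first operator, then parse each comma-separated clause on its own. All names here are hypothetical, and the operator set matches the PR's regex.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// constraint is one comma-separated clause such as ">=2.0".
type constraint struct {
	Op      string
	Version string
}

// opRegexp locates the first operator, as in the PR's name regex.
var opRegexp = regexp.MustCompile(`\s*(>=|<=|==|>|<|!=)`)

// clauseRegexp parses a single clause; two-character operators are listed
// before their one-character prefixes.
var clauseRegexp = regexp.MustCompile(`^\s*(>=|<=|==|>|<|!=)\s*(\S+)\s*$`)

// parseRequirementLine splits a line into its name and parsed constraints.
// A line without an operator returns the trimmed line and no constraints.
func parseRequirementLine(line string) (string, []constraint) {
	loc := opRegexp.FindStringIndex(line)
	if loc == nil {
		return strings.TrimSpace(line), nil
	}
	name := strings.TrimSpace(line[:loc[0]])
	var cs []constraint
	for _, clause := range strings.Split(line[loc[0]:], ",") {
		m := clauseRegexp.FindStringSubmatch(clause)
		if m == nil {
			continue // skip clauses this sketch does not understand
		}
		cs = append(cs, constraint{Op: m[1], Version: m[2]})
	}
	return name, cs
}

func main() {
	name, cs := parseRequirementLine("pandas<0.25.0,>=0.24.0")
	fmt.Println(name, cs) // pandas [{< 0.25.0} {>= 0.24.0}]
}
```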

evankanderson · 2023-10-18T23:09:30Z

internal/engine/ingester/diff/parse_test.go

+ Flask
+requests>=2.0,<3
+pandas<0.25.0,>=0.24.0
+numpy==1.16.0`,


Great test!

evankanderson · 2023-10-18T23:10:02Z

internal/engine/ingester/diff/parse_test.go

+			description: "Single addition, greater or equal version",
+			content: `
+ Flask
+requests>=2.19.0`,


Do you want to test lines that start with - (removed lines) as well?

Suggested change

+requests>=2.19.0`,

-requests>=2.14.0

+requests>=2.19.0`,

Re-added context parameter.

jhrozek · 2023-10-19T07:26:54Z

Thanks for the review, Evan. I filed #1236 to get back to your comments. I think they are all valid, but right now I'd prefer to move faster (and hopefully not break too many things). I'll merge.

jhrozek changed the title ~~Support Python requirements.txt scanning for pull requests #913~~ Support Python requirements.txt scanning for pull requests Oct 17, 2023

JAORMX previously requested changes Oct 17, 2023

internal/engine/eval/vulncheck/review.go Outdated

jhrozek force-pushed the py_ingest branch 2 times, most recently from f54d224 to bfeafd9 October 17, 2023 18:33

evankanderson reviewed Oct 17, 2023

jhrozek mentioned this pull request Oct 18, 2023

Pi Evaluator that provides a summary of dependencies and their alternatives #1232

Merged

jhrozek force-pushed the py_ingest branch from bfeafd9 to f06f1eb October 18, 2023 21:22

jhrozek added 5 commits October 18, 2023 23:38

diff ingestor: Add support for ingesting requirements.txt

4e8f226

Support reporting python vulnerabilities back to PRs that touch requi…

c644406

vulncheck: Skip packages without a version

337e7ff

fix: Use gh client not a custom http client when locating a PR dep

ad19eb7

jhrozek force-pushed the py_ingest branch from f06f1eb to ad19eb7 October 18, 2023 21:38

evankanderson approved these changes Oct 18, 2023

jhrozek mentioned this pull request Oct 19, 2023

Address remaining code review suggestions for the py requirements.txt parser #1236

Closed

jhrozek merged commit c8beb55 into stacklok:main Oct 19, 2023
13 checks passed
