o/i/apparmorprompting/common: add prompting path patterns #13730

Open
wants to merge 25 commits into master

Conversation

olivercalder
Member

This PR splits out the path pattern matching, expansion, and precedence from the previous PR to add the common package in its entirety: #13668

The path pattern-related work there is itself largely a port of a previous PR against the prompting branch: olivercalder#30

These path patterns will be used by prompt rules to determine whether a
path matches a particular rule, and if so, which rule has precedence.

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
This is necessary so that patterns of the form `/foo/*` match paths of
the form `/foo`.

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Path patterns can be thought of as a base path concatenated with either
a suffix or a group over multiple suffixes.

The list of suffixes (by precedence) is:

1. ``           (no suffix has highest precedence)
2. `/*.ext`     (particular file extension in base path directory)
3. `/**/*.ext`  (particular file extension in any subdirectory of base path)
4. `/.*`        (dotfiles in base path directory)
5. `/**/.*`     (dotfiles in any subdirectory of base path)
6. `/*`         (any file in base path directory)
7. `/**`        (any file in any subdirectory of base path)

When a path is received, it is matched against existing rules, each of
which has a path pattern. If a rule's path pattern contains any groups,
that pattern was expanded (at the time the rule was added) into one
group-free pattern for each option in the groups of the original path
pattern.

When the incoming path is matched against existing rules, it may match
more than one path pattern. If only one of the matching patterns has the
highest-precedence suffix form (among the set of matching patterns),
then that rule has precedence. Otherwise, the longest pattern has
precedence: all matching patterns with a suffix of the same precedence
must have identical suffixes, so the longest pattern must have the
longest, and thus most specific, base path.
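
A rough sketch of the selection rule just described, as it stands at this
point in the PR (helper names are hypothetical, not the snapd
implementation, and later commits rework this logic):

package patterns

import "strings"

// Suffix forms from the list above, paired with their precedence rank
// (lower rank wins). ".ext" stands in for a concrete extension; a real
// implementation would match extensions properly. Longer forms are checked
// first so that, e.g., "/**/*.ext" is not misread as "/*.ext".
var suffixRanks = []struct {
	form string
	rank int
}{
	{"/**/*.ext", 2},
	{"/*.ext", 1},
	{"/**/.*", 4},
	{"/.*", 3},
	{"/**", 6},
	{"/*", 5},
}

func suffixRank(pattern string) int {
	for _, s := range suffixRanks {
		if strings.HasSuffix(pattern, s.form) {
			return s.rank
		}
	}
	return 0 // no suffix: highest precedence
}

// highestPrecedence picks the winner among patterns that all match a path:
// best suffix rank first, with the longest pattern as the tie-breaker.
func highestPrecedence(patterns []string) string {
	best := patterns[0]
	for _, p := range patterns[1:] {
		r, b := suffixRank(p), suffixRank(best)
		if r < b || (r == b && len(p) > len(best)) {
			best = p
		}
	}
	return best
}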

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
In order to support multi-part file extensions, such as `.tar.gz`, in
path patterns, we need to change the way precedence is computed for
patterns in the same suffix-precedence category.

For example, the following have the same suffix type (`/**/*.ext`), but
have equal length, so the old way of computing precedence by pattern
length fails:

* `/foo/bar/baz/**/*.gz`
* `/foo/bar/**/*.tar.gz`

In the above, we see that the first rule has a more specific base path,
while the second rule has a more specific file extension. It is a bit
arbitrary whether base path or file extension is more important, but for
consistency's sake, we introduce a new principle to path matching:

**Within a given suffix-precedence category, base path specificity is
more important than suffix specificity.**

Now, (1) has higher precedence than (2), even though the latter is a
longer pattern overall:

1. `/foo/bar/x/**/*.gz`
2. `/foo/bar/**/*.tar.gz`

The precedence relationship still holds between suffix categories:

1. `/foo/bar/*.gz`
2. `/foo/bar/**/*.tar.gz`
3. `/foo/bar/*`
4. `/foo/bar/**`

As before, (1) has precedence over (2), because it matches files with a
given extension in the base path directory, rather than files with a
(more specific) extension in any subdirectory, and the any-subdirectory
suffix category has lower precedence than the in-directory category.
Rules 1-4 above are listed in order of precedence.

Note: the following should never be compared, since they cannot match
the same path, but if they *were* to be compared, the precedence order
would be as follows:

1. `/foo`
2. `/foo/bar/*.gz`
3. `/foo/*.tar.gz`
4. `/foo/bar/baz/**/*.gz`
5. `/foo/bar/**/*.tar.gz`
6. `/foo/bar/baz/fizz/*`
7. `/foo/bar/baz/*`
8. `/foo/bar/baz/fizz/buzz/**`
9. `/foo/bar/baz/fizz/**`

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Precedence has been adjusted so that *where* has priority over *what*.
That is, base path length takes priority over the specificity of file
extensions or other suffixes. If two patterns have the same base path,
but one follows with `/*` while the other follows with `/**`, the
former has precedence, but if a base+`/*` rule has a shorter base path
than a base+`/**` rule, the latter has precedence.

For example, the pattern `/foo/bar/**` has precedence over `/foo/bar*`,
since the former requires that the path ends in `/` (TODO: this isn't
actually the case yet, since path matching hasn't yet been updated to be
more strict now that paths and path patterns may end in `/` and groups
are allowed in path patterns). Both of these patterns match `/foo/bar/`
(and also, currently, incorrectly, `/foo/bar` -- this will be fixed).

Wildcards are no longer restricted to being at the end of the path
pattern, grouped together and followed by a file extension or nothing at
all. Now, characters and/or directories can appear arbitrarily around
the single `*`, and after the `/**/`. However, there must still be at
most one `/**/` and one `*` in the pattern, with the `*` occurring after
the `/**/` if it exists.

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
There are no longer restrictions on where wildcards or doublestars
may occur in path patterns. Prior modifications to precedence
computation which placed priority on *where* instead of *what* mean that
precedence between any two patterns (without groups or classes) which
match the same path can be decided.

More broadly, the concept of suffixes is no longer used in precedence
computation (due to the previous change) or checking the validity of
path patterns (new in this commit).

As such, the allowed path pattern glob can be made far more flexible,
with wildcards occurring anywhere and groups beginning anywhere.
Additionally, we now allow "mis-use" of `**`, such as being concatenated
to a non-`/` character, which is equivalent to `*`. Since a group may
now begin anywhere, we also allow repeated `/` characters, which are
equivalent to a single `/`, since it is difficult to check for these
prior to expansion. In both cases, `ExpandPathPattern()` replaces these
repeated characters with a single `*` or `/`, respectively, so that
conflicts with other equivalent patterns still occur as usual.

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Re-architected the pattern precedence system, increasing performance,
clarity, and correctness. Notably, adjusted the precedence of patterns
with a glob as the next character compared to other patterns with a path
separator as the next character.

The new precedence order is as follows (highest to lowest precedence):

literals:
- /foo/bar/baz
- /foo/bar/
terminated:
- /foo/bar
singlestars:
- /foo/bar/*/baz
- /foo/bar/*/
- /foo/bar/*/*baz
- /foo/bar/*/*
- /foo/bar/*
- /foo/bar/*/**
glob:
- /foo/bar*baz
- /foo/bar*/baz
- /foo/bar*/baz/**
- /foo/bar*/
- /foo/bar*/*baz
- /foo/bar*/*/baz
- /foo/bar*/*/
- /foo/bar*/*
- /foo/bar*
doublestars:
- /foo/bar/**/baz
- /foo/bar/**/*baz/
- /foo/bar/**/*baz
terminal doublestar:
- /foo/bar/**/        # These are tough... usually, /foo/bar/**/ would have
- /foo/bar/**/*       # precedence over /foo/bar/**/*baz, but in this case,
- /foo/bar/**         # the trailing *baz adds more specificity.
glob with immediate doublestar:
- /foo/bar*/**/baz
- /foo/bar*/**/
- /foo/bar*/**

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
The `strings.CutSuffix` function requires go 1.20, which is not yet
required for snapd.

Furthermore, using the `CutSuffix` function in this case means calling
`HasSuffix` with `"/**/*"`, then constructing a string with the
`"/**/*"` removed, then constructing a new string with the `"/**"`
re-appended. Not very efficient.

Instead, simply check if the pattern has the `"/**/*"` suffix, and if
so, strip only the final `"/*"`.
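
Something along these lines (sketch of the approach described, not the
exact snapd code):

package patterns

import "strings"

// A pattern ending in "/**/*" is equivalent to one ending in "/**", so drop
// only the final "/*" instead of cutting "/**/*" and re-appending "/**".
func stripRedundantStarSuffix(pattern string) string {
	if strings.HasSuffix(pattern, "/**/*") {
		return pattern[:len(pattern)-len("/*")]
	}
	return pattern
}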

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
The `ExpandPathPattern` function has been rewritten to allow arbitrary
usage of groups in path patterns, without restrictions on the number of
groups which may occur or the depth to which groups may be nested.

Now, the only restrictions placed on path patterns are that they must
begin with a `'/'` character and must not use unescaped `'['`, `']'`, or
`'?'` characters, as the current way precedence is computed relies on
being able to fully expand all path patterns into literals and
wildcards, and expanding `'[a-z]'` classes is not yet supported. If
classes were not expanded, they could be treated similarly to a single
`'?'` character with more restrictions, but computing precedence between
two classes is difficult, if not impossible.

As before, a `'\'` in front of any special character causes it to be
treated as a literal.

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Any single character (denoted `'?'`) should have lower precedence than a
literal character, but should have higher precedence than zero or more
characters (denoted `'*'`).

There are some tricky interactions between `'?'` and `'*'`, so added
some unit tests to make sure these are cleaned properly. Additionally,
improved the robustness of path pattern cleaning in general.

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Doublestar v4.6.0 had a bug where patterns ending in `/**/` would not
match paths correctly. Doublestar v4.6.1 fixes this bug, but now allows
patterns like `/path/to/a/**/` to match `/path/to/a/foo`. We want to
restrict matches further so that patterns ending in `/` only match paths
which also end in `/`. However, we do want patterns which don't end in
`/` to match paths which end in `/`, since received paths which are
directories end in `/`, and we don't want to require that the user end
their pattern in `/` or `{,/}` to match directories or {non,}directories,
respectively.

Thus, computing whether a pattern matches a path involves three steps:
1. If the pattern ends in `/` but the path doesn't, return false. This
   is really just to catch the case of patterns ending in `/**/`.
2. If the pattern matches the path, return true.
3. Return whether `pattern+'/'` matches the path. This ensures that
   patterns which do not end in `/` match paths which do end in `/`.
   For example, this makes sure that the pattern `/path/to/a` matches
   the path `/path/to/a/`.
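
A sketch of those three steps using the doublestar package (the helper name
here is illustrative, not necessarily what the PR uses):

package patterns

import (
	"strings"

	doublestar "github.com/bmatcuk/doublestar/v4"
)

// pathPatternMatches is a hypothetical helper showing the three steps above.
func pathPatternMatches(pattern, path string) (bool, error) {
	// 1. A pattern ending in '/' only matches paths that also end in '/'.
	//    This mostly catches patterns ending in "/**/".
	if strings.HasSuffix(pattern, "/") && !strings.HasSuffix(path, "/") {
		return false, nil
	}
	// 2. Direct match.
	matched, err := doublestar.Match(pattern, path)
	if err != nil || matched {
		return matched, err
	}
	// 3. Let a pattern without a trailing '/' match a directory path which
	//    does end in '/', e.g. pattern "/path/to/a" and path "/path/to/a/".
	return doublestar.Match(pattern+"/", path)
}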

Adds some more tests to clarify that this is the case.

Additionally, since `StripTrailingSlashes` is no longer used, remove it,
along with its tests.

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.93%. Comparing base (2b48274) to head (da0e748).


Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13730      +/-   ##
==========================================
+ Coverage   78.91%   78.93%   +0.02%     
==========================================
  Files        1043     1044       +1     
  Lines      134345   134537     +192     
==========================================
+ Hits       106014   106203     +189     
- Misses      21719    21720       +1     
- Partials     6612     6614       +2     
Flag Coverage Δ
unittests 78.93% <100.00%> (+0.02%) ⬆️



Each rune in UTF-8 should be treated as a single "character", even if
it is more than one byte long.

Also, no longer check precedence of patterns ending in "/**/*", since
these are identical to "/**" and should have been converted as such by
`ExpandPathPattern`.
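
For reference, ranging over a string in Go yields runes rather than bytes,
which is the behaviour the comparison should rely on (illustrative only):

package main

import "fmt"

func main() {
	// Ranging over a string yields runes, so the two-byte 'é' counts as a
	// single "character" when comparing patterns.
	for i, r := range "/home/usér/*" {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}
}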

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
@olivercalder olivercalder added the "Needs Samuele review" label Mar 19, 2024
@olivercalder olivercalder requested a review from zyga March 19, 2024 23:03
@olivercalder olivercalder marked this pull request as ready for review March 19, 2024 23:03
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
@zyga
Collaborator

zyga commented Mar 21, 2024

I've started to review this now

@zyga zyga left a comment

Quick note on dependencies

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
@bboozzoo bboozzoo left a comment

couple of initial comments. I need to take another in-depth look, but it will clearly require more ☕

@@ -6,6 +6,7 @@ go 1.18
replace maze.io/x/crypto => github.com/snapcore/maze.io-x-crypto v0.0.0-20190131090603-9b94c9afe066

require (
github.com/bmatcuk/doublestar/v4 v4.6.1
Collaborator

this needs to be added to the reference packaging for Fedora and Debian

Member Author

I see some dependencies in packaging/fedora/snapd.spec, for example, but this isn't something I've worked on before. Is this a build dependency (I would think, since Go statically compiles?) or runtime dependency (but then, this might be why we need the dependency as a package?)?

Collaborator

build dependency, take a look at snapd.spec

%if ! 0%{?with_bundled}
Requires: golang(go.etcd.io/bbolt)
Requires: golang(github.com/coreos/go-systemd/activation)
Requires: golang(github.com/godbus/dbus)
Requires: golang(github.com/godbus/dbus/introspect)
Requires: golang(github.com/gorilla/mux)
Requires: golang(github.com/jessevdk/go-flags)
Requires: golang(github.com/juju/ratelimit)
Requires: golang(github.com/kr/pretty)
Requires: golang(github.com/kr/text)
Requires: golang(github.com/mvo5/goconfigparser)
Requires: golang(github.com/seccomp/libseccomp-golang)
Requires: golang(github.com/snapcore/go-gettext)
Requires: golang(golang.org/x/crypto/openpgp/armor)
Requires: golang(golang.org/x/crypto/openpgp/packet)
Requires: golang(golang.org/x/crypto/sha3)
Requires: golang(golang.org/x/crypto/ssh/terminal)
Requires: golang(golang.org/x/xerrors)
Requires: golang(golang.org/x/xerrors/internal)
Requires: golang(gopkg.in/check.v1)
Requires: golang(gopkg.in/macaroon.v1)
Requires: golang(gopkg.in/mgo.v2/bson)
Requires: golang(gopkg.in/retry.v1)
Requires: golang(gopkg.in/tomb.v2)
Requires: golang(gopkg.in/yaml.v2)
Requires: golang(gopkg.in/yaml.v3)
%else
# These Provides are unversioned because the sources in
# the bundled tarball are unversioned (they go by git commit)
# *sigh*... I hate golang...
Provides: bundled(golang(go.etcd.io/bbolt))
Provides: bundled(golang(github.com/coreos/go-systemd/activation))
Provides: bundled(golang(github.com/godbus/dbus))
Provides: bundled(golang(github.com/godbus/dbus/introspect))
Provides: bundled(golang(github.com/gorilla/mux))
Provides: bundled(golang(github.com/jessevdk/go-flags))
Provides: bundled(golang(github.com/juju/ratelimit))
Provides: bundled(golang(github.com/kr/pretty))
Provides: bundled(golang(github.com/kr/text))
Provides: bundled(golang(github.com/mvo5/goconfigparser))
Provides: bundled(golang(github.com/seccomp/libseccomp-golang))
Provides: bundled(golang(github.com/snapcore/go-gettext))
Provides: bundled(golang(golang.org/x/crypto/openpgp/armor))
Provides: bundled(golang(golang.org/x/crypto/openpgp/packet))
Provides: bundled(golang(golang.org/x/crypto/sha3))
Provides: bundled(golang(golang.org/x/crypto/ssh/terminal))
Provides: bundled(golang(golang.org/x/xerrors))
Provides: bundled(golang(golang.org/x/xerrors/internal))
Provides: bundled(golang(gopkg.in/check.v1))
Provides: bundled(golang(gopkg.in/macaroon.v1))
Provides: bundled(golang(gopkg.in/mgo.v2/bson))
Provides: bundled(golang(gopkg.in/retry.v1))
Provides: bundled(golang(gopkg.in/tomb.v2))
Provides: bundled(golang(gopkg.in/yaml.v2))
Provides: bundled(golang(gopkg.in/yaml.v3))
%endif
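
Presumably doublestar would get analogous entries in both branches, along
these lines (not shown in the excerpt above, so treat this as an assumption):

%if ! 0%{?with_bundled}
Requires: golang(github.com/bmatcuk/doublestar/v4)
%else
Provides: bundled(golang(github.com/bmatcuk/doublestar/v4))
%endif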

}
// Saw start of new group, so get the string from currLiteralStart to
// the opening '{' of the new group. Do this before expanding.
infix := prevStringFromIndex(reader, currLiteralStart)
Collaborator

just wondering, why not the standard FSM based recursive parser with a queue of patterns or expansion?

Member Author

I considered using a queue of subpatterns, but it seemed more difficult to reason about which subpatterns to concatenate later. I believe how this works with a reader reading through the pattern string once is essentially a FSM, but I didn't see any other way to look back and extract a substring up to a given point. So either look forward to next '{', ',', '}' and add those runes to a string builder, or look back once one is seen and use a slice over the existing string. I think these are basically equivalent, and I wanted to minimize memory copying as much as possible, so I decided to look back and extract the slice.

I also considered an approach to construct a list of lists, where each list contains all the subpatterns in a group, and the set of expanded patterns would be the Cartesian product of the list of lists. But then you have to recursively resolve groups within groups in order to flatten the lists of subpatterns. Basically, I opted for the current approach because I believed the cognitive load would be minimal, but I agree there are some messy parts as a result of trying to maintain good performance.

I think storing a recursive list of the start and end indices of subgroups, and then building each final string using a string Builder might decrease the asymptotic runtime and memory usage by a factor of len(pattern) if creating substring slices involved copying the bytes, but afaik it only creates a header with a start pointer and length, so it's equivalent to storing start and end indices but much easier (IMO) to reason about.

The inefficiency with my current approach is that the previous substrings are concatenated into new strings, and that involves copying. So a more efficient approach would not concatenate any strings until the very end, then doing so with a builder for every permutation of subpatterns. Would this be what you mean by using a queue of patterns or expansion?

I'm very open to any preferences or suggestions you might have. Thanks!

// - /foo/bar*/**/baz
// - /foo/bar*/**/
// - /foo/bar*/**
func GetHighestPrecedencePattern(patterns []string) (string, error) {
Collaborator

to keep the PR small, I'd move this to a separate one

Member Author

I already split the pattern-related code from the broader common package PR #13668, and I think the functions in this PR are related enough that it makes sense for them to be in the same PR. Also, the non-test code is only 478 lines. I suppose it could be justified to make a separate PR for precedence, but it should definitely end up in the same file as the rest of the pattern functions, in my opinion.

Collaborator

At the current size splitting is a good idea, this PR as is definitely still too big. There's no rule that the entire content of a file needs to be added in one PR. Also one doesn't need to understand expansion of {} to review a function that says it must have been done before.

return np.nextPatternsList
}

// GetHighestPrecedencePattern determines which of the given path patterns is
Collaborator

out of curiosity, do you recall what apparmor_parser does if there are potentially overlapping file rules?

Member Author

AppArmor denies if there is any deny rule which matches the given path. But we don't want that behavior here -- we want the most specific rule to have precedence here. For example, a user could grant read access to ~/Documents but deny read access to ~/Documents/private.

@zyga zyga left a comment

Sending more comments that were stuck "pending"

//
// Groups are enclosed by '{' '}'. Returns a list of all the expanded path
// patterns, or an error if the given pattern is invalid.
func ExpandPathPattern(pattern string) ([]string, error) {
Collaborator

Does this need some sort of protection to avoid exponential explosion of patterns and memory exhaustion? Will this have to process anything that is not coming from trusted source?

Member Author

Yes, definitely. Path patterns are arbitrary and come from prompt clients or even directly from users. So there is definitely a risk of exponential pattern expansion burning through memory.

I considered adding a dedicated function to ValidatePathPattern which would count the precise number of patterns expansion will produce, but to do so precisely seemed to require computation on the same order as just going ahead with expanding the pattern, so my thought was just to check the number of patterns which have been expanded whenever the subpatterns are concatenated and return an error if that number exceeds some arbitrary limit (16k?). Does this approach sound good to you? Alternatively, we could use a simpler heuristic, or only allow some fixed max number of groups, regardless of whether they interact exponentially {a,b}{c,d} or additively {a,{b,c}}.
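
For illustration, the kind of check described, with 16k as the arbitrary
limit floated here (hypothetical names, not the actual code):

package patterns

import "fmt"

// maxExpandedPatterns is an arbitrary cap on expansion, as discussed above.
const maxExpandedPatterns = 16384

// checkExpandedCount would be called whenever newly concatenated subpatterns
// are appended during expansion.
func checkExpandedCount(expanded []string) error {
	if len(expanded) > maxExpandedPatterns {
		return fmt.Errorf("path pattern expands to more than %d patterns", maxExpandedPatterns)
	}
	return nil
}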

Contributor

Someone could also pass something like {...{foo}...} with two million bracket pairs (which I think the API will still accept) and it will kill the goroutine. So there should also be a recursion limit since in that case the number of expanded subpatterns would never go past 1 AFAICT

Member Author

I've added a maximum group count to the pattern validation which checks that there are no more than 10 groups total, whether they are nested or in series. That should prevent both exponential expansion (beyond N^10 for N options per group) and stack overflow. A similar check is also in ExpandPathPattern, so it can't crash snapd in case it is called with a non-validated pattern.
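
A rough sketch of that validation check (the scanning details here are
illustrative, not the exact snapd code):

package patterns

import "fmt"

const maxPathPatternGroups = 10

// validateGroupCount counts unescaped '{' runes, treating '\'-escaped runes
// as literals, and rejects patterns with more than the allowed number of
// groups, whether nested or in series.
func validateGroupCount(pattern string) error {
	groups := 0
	escaped := false
	for _, r := range pattern {
		switch {
		case escaped:
			escaped = false
		case r == '\\':
			escaped = true
		case r == '{':
			groups++
		}
	}
	if groups > maxPathPatternGroups {
		return fmt.Errorf("path pattern contains %d groups, exceeding the limit of %d", groups, maxPathPatternGroups)
	}
	return nil
}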

@zyga zyga self-requested a review March 22, 2024 12:20
continue
}
if r == ',' || r == '}' {
suffix := prevStringFromIndex(reader, currSubpatternStart)
Contributor

ExpandPathPattern is deduplicating expanded patterns at the end, but by then we've already paid the cost of generating their combinations. We could do it here to avoid generating many combinations that will be dropped anyway.

patternForReader := make(map[*strings.Reader]string, len(patterns))
remainingPatterns := make([]*strings.Reader, 0, len(patterns))
for _, pattern := range patterns {
if alreadySeen[pattern] {
Contributor

There's a comment in the tests saying that duplicate patterns should never be passed into this and these would've come out of ExpandPathPattern. So do we need to deduplicate here?

Member Author

Good catch. There shouldn't be duplicate patterns passed into GetHighestPrecedencePattern, but if there are, it should handle them correctly. The note is just to say that technically the tests are not something which should be seen in real use. I clarified a bit, let me know if you think it could use more adjustment.

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
…icate patterns

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
…anding

Signed-off-by: Oliver Calder <oliver.calder@canonical.com>
@MiguelPires MiguelPires left a comment

Thanks

@olivercalder
Member Author

This should be moved to the interfaces/prompting package, related to #13849 and #13850.

@pedronis pedronis left a comment

did a quick pass on this, it needs splitting at least once more

// Any '\'-escaped '{', ',', and '}' characters are treated as literals.
//
// If the pattern terminates before a non-escaped '}' is seen, returns an error.
func expandPathPatternRecursively(reader *strings.Reader, seenGroups *int) ([]string, error) {
Collaborator

there's a lot of similar code between this and ExpandPathPattern, so it's a bit unclear why ExpandPathPattern cannot be smaller/shallower and invoke this one level up?

// Return the substring from the given start index until the index of the
// previous rune read by the reader.
func prevStringFromIndex(reader *strings.Reader, startIndex int) string {
if err := reader.UnreadRune(); err != nil {
Collaborator

nitpick: there might be a way using Len and Size to avoid the UnreadRune and Seek?
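
For what it's worth, the current byte offset can be derived from Size and Len
alone, something like this (sketch):

package patterns

import "strings"

// consumedBytes returns how many bytes of the underlying string have already
// been read, using only Size and Len (no UnreadRune or Seek).
func consumedBytes(reader *strings.Reader) int {
	return int(reader.Size()) - reader.Len()
}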

return fmt.Errorf("invalid path pattern: must start with '/': %q", pattern)
}
depth := 0
totalGroups := 0
Collaborator

the number of elements in the groups matters too for knowing how many expansions we will have, no?

}

// ValidatePathPattern returns nil if the pattern is valid, otherwise an error.
func ValidatePathPattern(pattern string) error {
Collaborator

it's a bit unclear to me why this is separate and not done early or as well early in the expansion function?

priorityDoublestar
priorityGlob
prioritySinglestar
prioritySingleChar
Collaborator

Singlechar is a bit of a strange name for this? Anysinglechar? Anychar?

// patterns have been previously expanded using ExpandPathPattern(), so there
// are no groups in any of the patterns.
//
// Below are some sample patterns, in order of precedence, though precedence is
Collaborator

the comment doesn't say which are higher or lower? I suppose it is higher to lower?

reader := strings.NewReader(pattern)
patternForReader[reader] = pattern
remainingPatterns = append(remainingPatterns, reader)
}
Collaborator

given that this is meant to be used on patterns that all match the same file, wouldn't it make sense to as first step remove any common prefix from all the patterns?

Labels
⛔ Blocked, Needs Samuele review