Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check ignore matches before Bucket item downloads #337

Merged
merged 3 commits into from
Apr 14, 2021

Conversation

hiddeco
Copy link
Member

@hiddeco hiddeco commented Apr 12, 2021

Fixes #333

This PR makes the BucketReconciler more efficient by looking for
exclusions while downloading files, instead of during the archiving of
the downloaded contents.

It also makes the filtering applied during the archiving
configurable by introducing an optional ArchiveFileFilter
callback argument and a SourceIgnoreFilter implementation.

SourceIgnoreFilter filters out files matching
sourceignore.VCSPatterns and any of the provided patterns.
If an empty gitignore.Pattern slice is given, the matcher is set to
sourceignore.NewDefaultMatcher.

The GitRepositoryReconciler now loads the ignore patterns
before archiving the repository contents by calling
sourceignore.LoadIgnorePatterns and other helpers. The loading
behavior is breaking as .sourceignore files in the (subdirectories of the)
repository are now still taken into account if spec.ignore for a resource
is defined, overwriting is still possible by creating an overwriting rule
in the spec.ignore of the resource.

Signed-off-by: Hidde Beydals <hello@hidde.co>
This commit makes the filtering applied during the archiving
configurable by introducing an optional `ArchiveFileFilter`
callback argument and a `SourceIgnoreFilter` implementation.

`SourceIgnoreFilter` filters out files matching
sourceignore.VCSPatterns and any of the provided patterns.
If an empty gitignore.Pattern slice is given, the matcher is set to
sourceignore.NewDefaultMatcher.

The `GitRepository` now loads the ignore patterns before archiving
the repository contents by calling `sourceignore.LoadIgnorePatterns`
and other helpers. The loading behavior is **breaking** as
`.sourceignore` files in the (subdirectories of the) repository are
now still taken into account if `spec.ignore` for a resource is
defined, overwriting is still possible by creating an overwriting
rule in the `spec.ignore` of the resource.

This change also makes it possible for the `BucketReconciler` to not
configure a callback at all and prevent looking for ignore
matches twice. To finalize the bucket refactor, a change to the
reconciler has been made to look for a `.sourceignore` file in
the root of the bucket to provide an additional way of configuring
(global) exclusions.

Signed-off-by: Hidde Beydals <hello@hidde.co>
@hiddeco hiddeco added area/bucket Bucket related issues and pull requests area/git Git related issues and pull requests enhancement New feature or request area/storage Storage related issues and pull requests labels Apr 13, 2021
@hiddeco hiddeco marked this pull request as ready for review April 13, 2021 14:14
Copy link
Member

@squaremo squaremo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shaping up well I reckon!

return func(p string, fi os.FileInfo) bool {
// The directory is always false as the archiver does already skip
// directories.
return matcher.Match(strings.Split(p, string(filepath.Separator)), false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somehow went looking for this method but could not find it :-S. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or no, it was not. I came across this method as well but it splits them by filepath.ListSeparator, resulting in e.g. [/a/b/c, /d/f/g] instead of the [a, b, c] we are after for the domain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curses! I have made exactly this mistake three or four times now aaaaa

This commit updates Go to 1.16, a required change because of the use of
`os.WriteFile` in one of the tests introduced by commit
b5004a9.

Normally _just_ this would not justify the change, but given the
introduction of breaking changes (and thereby forcing a MINOR update
anyway), and the various file{system, path} improvements introduced in
Go 1.16 like
[`filepath#WalkDir`](https://golang.org/pkg/path/filepath/#WalkDir),
going ahead with this should be fine.

Signed-off-by: Hidde Beydals <hello@hidde.co>
Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @hiddeco

@hiddeco hiddeco merged commit 1494626 into main Apr 14, 2021
@hiddeco hiddeco deleted the efficient-bucket-download branch April 14, 2021 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bucket Bucket related issues and pull requests area/git Git related issues and pull requests area/storage Storage related issues and pull requests enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize Bucket Controller to not download entire bucket contents
3 participants