-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Shared extractor: support file path globs #13969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me in general, let's ensure the wildcards don't match path separators. For example *JsonFile
should not match test/SampleJsonFile
.
Make sure to test path separator handling on Windows too.
// Construct a single globset containing all language globs, | ||
// and a mapping from glob index to language index. | ||
let (globset, glob_language_mapping) = { | ||
let mut builder = GlobSetBuilder::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's set the literal_separator feature, so path separators do not match the *
and ?
wildcards.
let mut builder = GlobSetBuilder::new(); | |
let mut builder = GlobSetBuilder::new().literal_separator(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember now why I didn't do this:
One surprising aspect of this change is that the globs match against the
whole path, rather than just the file name.
If we prevent *
from matching file separators, then *.txt
will match foo.txt
but not bar/foo.txt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but if you don't want that then you should match the globs against the whole paths. Instead of globset.matches(&path);
we could do something like globset.matches(&path.filename());
Alternatively, users could add **/
prefixes to their globs.
2409b71
to
a9115c5
Compare
Replace the `file_extensions` field with `file_globs`, which supports UNIX style glob patterns powered by the `globset` crate. This allows files with no extension (e.g. Dockerfiles) to be extracted, by specifying a glob such as `*Dockerfile`. One surprising aspect of this change is that the globs match against the whole path, rather than just the file name. This is a breaking change.
a9115c5
to
60f2506
Compare
5645322
to
3680613
Compare
Update the tree-sitter extractor to support file globs. This replaces the existing
file_extensions
field with afile_globs
field, which supports UNIX style glob patterns powered by theglobset
crate.This allows files with no extension (e.g. Dockerfiles) to be extracted
by specifying a glob such as
*Dockerfile
.One surprising aspect of this change is that the globs match against the
whole path, rather than just the file name. I'm not sure if this is an issue we should work around, or if it's OK.
@aibaars I'd be interested in your thoughts on this.
Fixes #13964