Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nested .gitignore files in --pants-ignore-use-gitignore #5682

Closed
stuhood opened this issue Apr 9, 2018 · 18 comments
Closed

Support nested .gitignore files in --pants-ignore-use-gitignore #5682

stuhood opened this issue Apr 9, 2018 · 18 comments
Assignees
Labels
onboarding Issues that affect a new user's onboarding experience

Comments

@stuhood
Copy link
Sponsor Member

stuhood commented Apr 9, 2018

The --pants-ignore option causes pants to ignore files and directories: it uses the same syntax as .gitignore because it is implemented using the ignore crate. The ignore crate also supports consuming .gitignore files from disk, but we're currently not using that support.

This ticket would deal with adding a pants level option next to --pants-ignore that would enable using the .gitignore file in addition to the explicit patterns in --pants-ignore. If the .gitignore file patterns were applied first, this would allow for maximum generality: with that option enabled, --pants-ignore would be a superset of .gitignore, but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

--pants-ignore is declared here:

register('--pants-ignore', advanced=True, type=list, fromfile=True,
default=['.*/', default_rel_distdir],
help='Paths to ignore for all filesystem operations performed by pants '
'(e.g. BUILD file scanning, glob matching, etc). '
'Patterns use the gitignore syntax (https://git-scm.com/docs/gitignore).')
and ends up being used here:
let ignore = create_ignore(&ignore_patterns).map_err(|e| {
format!(
"Could not parse build ignore inputs {:?}: {:?}",
ignore_patterns, e
)
})?;

@Eric-Arellano
Copy link
Contributor

but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

How would this work? Mind writing an example please?

@stuhood
Copy link
Sponsor Member Author

stuhood commented Jan 30, 2020

but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

How would this work? Mind writing an example please?

I think that I was thinking...?:

$ cat .gitignore
ignore/these/**/*.py
$ grep -A2 'pants_ignore' pants.ini
pants_ignore: [
    '!ignore/these/but/not/these/*.py',
  ]

Not a big deal though... not sure the usecase is that important.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Feb 25, 2020

Bumping the priority on this. Filesystem specs make it more important than in the past. E.g., ./pants cloc '**' will now see a lot of files it wouldn't previously have, including in our own repo's case large rust build byproducts etc.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Feb 25, 2020

But note that repos can have non top-level .gitignore (this repo does!) and those are evaluated on the relpath from the .gitignore file, so it's more complicated than just tacking .gitignore lines onto the pants ignore list.

For example, we ignore rust build cruft in src/rust/engine/.gitignore, and it's actually quite important to pants-ignore that cruft.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Feb 25, 2020

Or maybe the ignore crate handles multiple nested .gitignore files naturally already?

@gshuflin gshuflin self-assigned this Feb 26, 2020
@gshuflin
Copy link
Contributor

If I'm reading this documentation correctly, it looks like the ignore crate supports parsing hierarchically-defined .gitignore files: https://docs.rs/ignore/0.4.11/ignore/struct.WalkBuilder.html#method.parents

@stuhood
Copy link
Sponsor Member Author

stuhood commented Mar 16, 2020

Just had a user run into a git-ignored directory that contained BUILD files causing them trouble. This would still be a really good thing to do.

gshuflin added a commit that referenced this issue Apr 10, 2020
### Problem

Cf. issue #5682, we would like pants to pay attention to a top-level .gitignore filee in a pants-controlled repository if the user passes in a `--pants-gitignore` flag.

### Solution

Pants already uses the Rust `ignore` crate to handle explicit ignore strings passed in the command line, so we can modify this code to also read `.gitignore` files when we try to do a file access, read the lines contained therein, and use that to decide whether a path should be ignored or not.
@gshuflin
Copy link
Contributor

#9310 gets us part of the way there. We still need to make pants pay attention to all nested .gitignore files, rather than just a top-level one.

@gshuflin gshuflin changed the title Add an option to prepend .gitignore to --pants-ignore Support nested .gitignore files in --pants-ignore-use-gitignore Apr 24, 2020
@benjyw benjyw removed the engine label Sep 9, 2021
@stuhood stuhood added onboarding Issues that affect a new user's onboarding experience and removed good first issue labels Mar 31, 2022
@thejcannon
Copy link
Member

One of my users hit this when I added a recursive glob on a data directory.

Mapping targets took several minutes and we couldn't figure out why. After it completed I asked what the line output of list was: 383k files 🙈

@benjyw
Copy link
Sponsor Contributor

benjyw commented Apr 16, 2022

I forget why this wasn't easily solved by the ignore crate, but I vaguely remember some difficulty. @gshuflin do you happen to recall?

@gshuflin
Copy link
Contributor

It's definitely been a while since I've thought about this - IIRC, the problem was something like, it was difficult to make the code from the rust ignore crate that handled recursive .gitignore files in a filesystem (code that might've lived somewhere around here: https://github.com/pantsbuild/pants/pull/9310/files#diff-a1af714f77a06a94025988e047198428068ac449953a5c0297977b99567e2d63 ) interact with the virtual filesystem abstraction. It expected being able to traverse a real filesystem, and so we would've had to reimplement the logic that ignore had internally in the context of the pants fs module. And we judged this wasn't worth doing at the time since we were able to modify our top-level .gitignore to ignore the nested files we cared about ignoring.

@thejcannon
Copy link
Member

It's not prohibitively hard to glob for the gitignore files from the virtual FS and then construct the ignores by hand, the only blocker is globbing is async and the scheduler is sync. I don't know Rust well enough to bridge that gap

@thejcannon
Copy link
Member

The alternative is looking the gitignores up dynamically as we try and glob paths, and caching the results.

@stuhood
Copy link
Sponsor Member Author

stuhood commented Apr 16, 2022

There are some comments on the implementing ticket: #9310 (review)

The idea of holding onto ignore state as we descend the tree might be related to some of the optimizations mentioned in #14890, but I haven't thought about this particular problem recently.

@jriddy
Copy link
Contributor

jriddy commented Aug 3, 2022

I found this while looking for a feature request for supporting recursive gitignore files. I'm running into this problem too now. So... +1 for doing this 😅

@fathom-parth
Copy link

another +1...

@Eric-Arellano
Copy link
Contributor

See #18654 :) Thanks @cognifloyd for talking through the strategy and haha thanks ChatGPT4 for helping to fill out the implementation

@stuhood
Copy link
Sponsor Member Author

stuhood commented Sep 18, 2023

This issue has a lot of history on it, which masks the remaining work to be done. I'm going to close it in favor of #19860, which has a more focused description, and some suggested implementation.

@stuhood stuhood closed this as completed Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
onboarding Issues that affect a new user's onboarding experience
Projects
Status: Done
Status: Done
Development

Successfully merging a pull request may close this issue.

7 participants