Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting non-root VCS ignore #304

Open
sorcio opened this issue Jul 2, 2022 · 27 comments
Open

Supporting non-root VCS ignore #304

sorcio opened this issue Jul 2, 2022 · 27 comments

Comments

@sorcio
Copy link

sorcio commented Jul 2, 2022

.gitignore files don't have to live at the root of the working directory:

Patterns read from a .gitignore file in the same directory as the path, or in any parent directory (up to the top-level of the working tree), with patterns in the higher level files being overridden by those in lower level files down to the directory containing the file. These patterns match relative to the location of the .gitignore file.

Hatchling assumes that there is at most one .gitignore file, and that it describes paths relative to the project root. This mismatches the behavior of git in a couple of ways:

  1. If .gitignore is found in any of the ancestors of the project root, the paths are interpreted relative to the wrong path. The issue is also propagated to sdists, because the .gitignore file is copied as-is to the source directory.
Example
my-repo-root
├── .gitignore
└── foobar
    ├── LICENSE.txt
    ├── README.md
    ├── ignore_me
    ├── foobar
    │   ├── __about__.py
    │   └── __init__.py
    ├── pyproject.toml
    └── tests
        └── __init__.py

.gitignore might contain the pattern foobar/ignore_me, which would be effective enough for git. But Hatchling would not be able to apply the pattern to exclude ignore_me. If a file with the same name happened to be in the inner foobar directory, it would be accidentally excluded.

  1. Per-directory .gitignore files are ignored.

I don't use Mercurial anymore, but I imagine that 1 would apply there as well. Mercurial restricts .hgignore files to live at the root of the repo, so there can't be multiple ones. But you could have a project in a sub-directory of a hg repo.

It would be nice if Hatch/Hatchling could do the same thing the VCS does.


I don't know if a solution for 1 would involve, at sdist building time, to rewrite .gitignore to make relative patterns (those with / not at the end) relative to the project directory (or write the modified patterns into the sdist pyproject.toml instead of including the .gitignore file).

Per-directory .gitignores complicate things a bit. It implies that when traversing parents you cannot stop at the first .gitignore you find, because there can be a more general one up above. Traversal should stop when you reach the root of the working tree (i.e. when you find .git). But when you unpack an sdist you don't have a .git.

And I don't even want to start thinking about submodules.

@Julian
Copy link

Julian commented Aug 17, 2022

The current behavior also doesn't account for global .gitignore files, i.e. core.excludesFile and/or ~/.gitconfig.

I haven't thought hard enough but perhaps a guess at a solution would be using git ls-files to collect the list of files and let git manage its own behavior? In Rust there's a whole crate dedicated to all the fun involved in ignore files (used by e.g. fd and rg).

Julian added a commit to python-jsonschema/sphinx-json-schema-spec that referenced this issue Aug 17, 2022
Amongst other reasons, to also specifically avoid pypa/hatch#304.
@pradyunsg
Copy link
Member

Flit uses git ls-files and I can confirm that it more closely matches user expectations. :)

@ofek
Copy link
Sponsor Collaborator

ofek commented Aug 17, 2022

I can add more ignore file support but will never use git or hg directly unless behind an option. I won't spend time on an option like that myself but will accept a PR.

Please keep in mind that a VCS checkout is not guaranteed and is in fact the rarest of all situations. If a project is only published on PyPI in CI/CD then that assumption is met but then fails as soon as a user installs the sdist or attempts are made to build for other ecosystems like Conda or Fedora that use either the sdist on PyPI or the tarball release artifact on GitHub.

@sorcio
Copy link
Author

sorcio commented Aug 20, 2022

Note the issue is not a feature request to strictly mimic the behavior of git. It describes a misbehavior in specific conditions, i.e. when ignore files are not at the root of the project. The problem mostly appears when using sdists so using git in Hatchling is not a general solution.

Maybe Hatch has more context and is more likely to be used from a VCS checkout. So an option could be added there.

Supporting per-directory ignore files would be an additional feature. But it could be added as part of the same solution.

@hynek
Copy link

hynek commented Sep 14, 2022

FWIW, I've almost shipped a package with a .mypy_cache, because git magically ignored it. No, I don't have a global .gitignore anymore and I've stopped using .git/info/exclude. I have no idea why – maybe git got smart!? But I can't rely on git complaining about files lying around and that's spooky.


Would it be possible to maybe statically add certain dirs that never should be shipped like .DS_Store, .tox, .nox, .mypy_cache .pytest_cache, __pycache__? build and dist also come to mind.

I know you're opposed to magic, but this would reduce the amount of package screwups significantly.

@ofek
Copy link
Sponsor Collaborator

ofek commented Sep 18, 2022

#493

build is a setuptools thing

@hynek
Copy link

hynek commented Sep 21, 2022

Can I somehow convince you to add .mypy_cache and .pytest_cache too?

Because as I wrote: my Git is ignoring it by default and I suspect I'm not the only one so this can lead to a lot of .[mypy|pytest]_caches shipped since there's no feedback except double-checking the sdists and I suspect I'm 1 of 5 people world-wide who do that.

@ofek
Copy link
Sponsor Collaborator

ofek commented Sep 21, 2022

Can you find out how Git is ignoring those? It's not for me

@hynek
Copy link

hynek commented Sep 21, 2022

Yeah I've found out.

Both directories are created with an .gitignore file inside of them that contains * as an entry.

For mypy you may have to delete the directory and run it again because it doesn't seem like it's adding the files if the dir has been created by an older version.


I guess the trick for both would be to evaluate .gitignore files recursively? It seems like this is become a common pattern (and it's kinda a good idea tbh).

maresb added a commit to maresb/aesara that referenced this issue Jan 2, 2023
maresb added a commit to maresb/aesara that referenced this issue Jan 3, 2023
@maresb
Copy link
Contributor

maresb commented Jan 6, 2023

This was the only Hatch-related pain point / surprise I encountered while migrating a large codebase (aesara-devs/aesara#1384). It would be great to support recursive .gitignore evaluation as per the previous comment.

@effigies
Copy link

Given that git ls-files won't be supported here, I wonder if it would make sense as an additional feature in hatch-vcs? Setuptools_scm does provide this listing to setuptools, so it seems like a good fit for that plugin. Not sure how complicated that would be, though...

@miccoli
Copy link

miccoli commented May 19, 2023

I was hit by this (unintended?) behaviour. By the way I noticed because the source dist was not reproducible, due to the cached files.

For now I add

[tool.hatch.build]
exclude = [
  ".*_cache",
] 

in every pyproject.toml...

While the best option would be to honor .gitignore files also in child directories, I was wondering if it make sense to allow global exclude patterns in config.toml?

@wimglenn
Copy link
Contributor

More sensible default behavior would be desirable here. This bit me too, I don't keep a .gitignore in the sourcetree unless it's excluding stuff generated specifically by that project. Excludes for .venv, pyc etc all go in global gitignore.

In pytest-dev/pytest-freezer#9 it was pointed out that the sdist on PyPI had a whole venv inside. Oops, embarrassing.. :)

@hynek
Copy link

hynek commented Jun 17, 2023

For everyone running into these problems I’d like to propose https://github.com/hynek/build-and-inspect-python-package for your consideration to build and check your packages in a clean CI with introspection – and then ideally upload from CI using the new PyPI trusted publisher support.

This bug in Hatch has finally pushed me over the edge to take this on a few months ago and it’s basically just adding a workflow to your repo and a bit of configuration on PyPI’s side: https://github.com/python-attrs/attrs/blob/main/.github/workflows/pypi-package.yml

It’s very liberating.

@wimglenn
Copy link
Contributor

@hynek Thank you for that, really nice. Just tried it out and found a couple of interesting things: it didn't work on a windows runner (meh?), and the PKG-INFO inside the sdist didn't get the source timestamp like all the other files (also meh?)

As for hatch, after trying+failing multiple times in a row to configure contents in the .tar.gz and the .whl correctly, I think I'm giving up on it for the moment. I'm all for explicit configuration, but the default configuration should not do things that are obviously wrong.

@ofek
Copy link
Sponsor Collaborator

ofek commented Jun 21, 2023

@wimglenn What is not working with your explicit configuration? The default file inclusion for each target is well documented:

Based on that your project seems to have perfectly fine defaults for wheels. Therefore, you should not be using the global [tool.hatch.build] config but rather [tool.hatch.build.targets.sdist] specifically.

@hynek
Copy link

hynek commented Jun 21, 2023

I have to disagree here. It is not a perfectly fine default to ship files to PyPI that aren't part of the repo and aren't shown in git status – with or without a global .gitignore / .git/info/exclude. To me this is Hatch's biggest footgun by far.

@ofek
Copy link
Sponsor Collaborator

ofek commented Jun 21, 2023

I'm not sure where that response came from? I was talking specifically about that user's project.

@hynek
Copy link

hynek commented Jun 21, 2023

I'm sorry, maybe I've misunderstood you both.

@Julian
Copy link

Julian commented Jun 21, 2023

Just because I'm noticing this now,

Given that git ls-files won't be supported here

I'm not sure where this comes from,from what I see the last @ofek opined on the matter was

I can add more ignore file support but will never use git or hg directly unless behind an option. I won't spend time on an option like that myself but will accept a PR.

so AFAICT, @ofek correct me if I'm wrong, but the status of this issue is still "PR welcome to add an option, at least for part of the problem"?

@effigies
Copy link

FWIW I have a PoC patch against hatch-vcs that uses setuptools_scm to determine the file listing from git/hg: ofek/hatch-vcs#40

It seems to be held up by a setuptools_scm refactor, but it can be used now.

@ofek
Copy link
Sponsor Collaborator

ofek commented Jun 21, 2023

so AFAICT, @ofek correct me if I'm wrong, but the status of this issue is still "PR welcome to add an option, at least for part of the problem"?

that is correct

@wimglenn
Copy link
Contributor

wimglenn commented Jun 21, 2023

@ofek There is nothing wrong with that explicit configuration: when using [tool.hatch.build.targets.sdist] it does the correct thing.

My complaint was that the default configuration is not clever enough to know that packing .venv and pycs inside the sdist is obviously wrong. Then I've fumbled, trying to fix the sdist and inadvertently messed up the wheel and hatch doesn't know that putting site-packages/pyproject.toml and site-packages/README.rst etc in the wheel is also obviously wrong.

User error? Yes. But also perhaps a bit too easy to make mistakes without noticing.

@warsaw
Copy link
Contributor

warsaw commented Sep 7, 2023

If .gitignore is found in any of the ancestors of the project root, the paths are interpreted relative to the wrong path. The issue is also propagated to sdists, because the .gitignore file is copied as-is to the source directory.

I've been scratching my head at something related to this... I think.

I'm working in a monorepo where subpackages live under a libs/<subpackage> directory until my repo root. At the repo root, I have a .gitignore and one of the subpackages also has a .gitignore. In a different subpackage, I have the following in my pyproject.toml file:

[tool.hatch.build.targs.sdist]
exclude = [
    '.gitignore',
]

but when I run hatch build from the subpackage directory, the top level .gitignore file always shows up in my sdist. Since there's no .gitignore in this subpackage, I didn't understand why the root .gitignore got added to my .tar.gz sdist file.

FTR, I've tried various combinations of exclude patterns and also tried [tool.hatch.build] section but that top level .gitignore always gets added.

Maybe that's intended behavior based on the docs note that .gitignore cannot be excluded, but it was certainly confusing that a file from outside my subpackage directory -- where I explicitly run hatch build from -- gets copied into my sdist.

@sorcio
Copy link
Author

sorcio commented Sep 8, 2023

when I run hatch build from the subpackage directory, the top level .gitignore file always shows up in my sdist. Since there's no .gitignore in this subpackage, I didn't understand why the root .gitignore got added to my .tar.gz sdist file.

Yes, it's the same issue. It's not literally that the file is copied (so it doesn't matter if it's excluded). Hatch reads the rules, and applies them to a newly-generated .gitignore file. The effect is the same. There are at least a couple issues with this, discussed above: the behavior is surprising; and rules meant for a different path are applied without correction, so it's also wrong.


Although I originally reported this, I'm unlikely to work on this issue because I'm not using Hatch anymore.

LEW21 added a commit to LEW21/python-mundane that referenced this issue Apr 22, 2024
pypa/hatch#304 would make it unnecessary.
petuzk added a commit to petuzk/predeq that referenced this issue Jun 2, 2024
I've accidentally noticed that my .vscode/settings.json got included
into source distribution, even though it's ignored by git
as configured in my global config. This came as a surprise,
but appears to be a design decision (see pypa/hatch#304).

Switch to flit which by default only includes the package contents,
readme and license.
@petuzk
Copy link

petuzk commented Jun 2, 2024

I was exploring the content of my project's sdist, and discovered this issue as well. In my case my editor's config, which is excluded from git globally, got sucked in. It had been configured this way for so long I didn't even remember about it.

While it's true that this behavior is documented under VCS section, I unconsciously made the assumption that only tracked files are selected. It seems it's not just me who made this assumption, so I'd appreciate if you could add a note there explaining this limitation.

There's also a contradictory information given in sdist builder's documentation:

When the user has not set any file selection options, all files that are not ignored by your VCS will be included.

It links to the mentioned VCS section, but still, this statement introduces some confusion. It's not VCS who determines if the file is ignored, but Hatch itself.

@ChrisBarker-NOAA
Copy link

ChrisBarker-NOAA commented Jul 31, 2024

What's the status here? It seems to me that respecting .gitignore, but not embedded .gitignore is a major miss-feature. I agree that requiring git at run-time is not great so:

  1. If we add recursive .gitignore, that would help a lot, and seems to be not that heavy a lift -- I'll try to take a look at the code soon, and see if I can put together a PR.

  2. respecting global git setting would be good, but I think less critical -- maybe that's because I don't make much use of it -- and because a project with more than one person working on it really should put everything in the project itself, and it's not that hard to add it to excludes.

would a PR for (1) be considered?

NOTE: I took a quick look at the source, and can't find where in the code the filelist for the sdist is created -- so I'll need a pointer ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests