Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

glob patterns for data_files #1681

Closed
agateau opened this issue Feb 11, 2019 · 11 comments · Fixed by #2712
Closed

glob patterns for data_files #1681

agateau opened this issue Feb 11, 2019 · 11 comments · Fixed by #2712
Labels
enhancement Needs Discussion Issues where the implementation still needs to be discussed. Needs Investigation Issues which are likely in scope but need investigation to figure out the cause

Comments

@agateau
Copy link

agateau commented Feb 11, 2019

Would be great to be able to use glob patterns for data_files, so that one can write something like this:

[options.data_files]
share/myapp/doc = doc/*.md

This would be consistent with the way package_data works.

@pganssle pganssle added enhancement Needs Triage Issues that need to be evaluated for severity and status. labels Feb 12, 2019
@pganssle
Copy link
Member

I have no great objections to this, though do we need some indication that it's to be interpreted as a glob pattern and not a literal string? Even if we limit it just to supporting *, asterisk is a valid character in a file or folder.

@pganssle pganssle added Needs Discussion Issues where the implementation still needs to be discussed. Needs Investigation Issues which are likely in scope but need investigation to figure out the cause and removed Needs Triage Issues that need to be evaluated for severity and status. labels Feb 17, 2019
@agateau
Copy link
Author

agateau commented Feb 23, 2019

* in file names sounds like a weird edge case to me. It is going to cause trouble for users in many places. I don't think it is worth supporting, but then I haven't checked if it is taken into account in other parts of setuptools.

@pganssle
Copy link
Member

@agateau I'm not really willing to break backwards compatibility by changing the way these strings are interpreted from being paths to being glob patterns. An asterisk is a valid character for a file name, it's definitely something we want to support, regardless of whether it will break in other places as well.

We just need a way to differentiate between "interpret this as a path" and "interpret this as a glob", hopefully it's not terribly difficult.

@hydrargyrum
Copy link

@pganssle * is already interpreted as a wildcard in a number of places, just see the number of occurrences in the documentation all over the place.
It's really not interpreting it as a wildcard in a single place that breaks consistency with all the rest of the API.

@pganssle
Copy link
Member

pganssle commented Apr 7, 2019

@hydrargyrum Like I said above, I'm in favor of adding a way to do this, but it would be a backwards-incompatible change and would be removing the ability to literally specify files with asterisks in their names if we just start interpreting all file paths as glob literals.

As much as I am in favor of making common patterns easier in setup.cfg, it's still pretty easy to use glob.glob in setup.py, whereas if we make this backwards-incompatible change, it will be impossible to get the exact current behavior (likely in either setup.py or setup.cfg).

If we want to make actual progress on this issue, I think we need to come up with possible backwards-compatible interfaces that would allow this to move forward. I think they will broadly fall into four categories:

  1. Special syntax for the specifier string that would not be a valid path on any supported OS, but would allow us to indicate that something is a glob pattern, e.g. ::glob::*.md (not sure if that is an invalid path on all supported OSes, just guessing)
  2. Some sort of wrapper type, like GlobPattern('*.md')
  3. A boolean option toggle defaulting to False that specifies that data_files should be interpreted as glob patterns.
  4. A new option that specifies glob patterns, like data_files_pattern, which is always interpreted as a glob pattern.

Number 1 is probably the best option, but it's not terribly discoverable, and right now at least only data_files would requires this special syntax, which is not great. Number 2 would be great if this were for setup.py only, but I don't think there's any good way to specify a custom wrapper type in setup.cfg.

Numbers 3 and 4 are ugly mainly because you now have two different options to specify the same thing, and it may not be obvious to people what they do.

If anyone has better options or wants to argue for one of these, please do let me know, because none of the options seem terribly appealing to me.

@agateau
Copy link
Author

agateau commented Apr 14, 2019

Number 1 looks like the best option to me as well, because it would work for setup.py and setup.cfg, but this syntax should be usable for all keys were a glob pattern can be used.

While I respect your desire to not break compatibility, I have yet to find a package shipping files with asterisks in their names. I may be wrong on this, but I would be surprised if simply changing the code to accept asterisks would wreak so much havoc. Quoting the Zen of Python:

practicality beats purity.

@eli-schwartz
Copy link
Contributor

The only invalid path on Unix is a null byte (and "/" cannot exist in filenames but validly denotes a directory separator).

It's theoretically possible but highly unlikely people wanted a literal asterisk in their file names, but by the same token that ":" is illegal on Windows, and in fact also illegal even on Unix if you happen to be using fat32 or NTFS without the POSIX namespace (the restriction is part of the filesystem spec)...

... the asterisk is also illegal in filenames on Windows (and at the filesystem layer on fat32 and NTFS).

So, such code depending on literal asterisks would unexpectedly fail on Windows, and also on that one person's USB drive used for copying around a personal python environment from Unix system to Unix system.

One could argue backwards compatibility with something inherently broken is not such an important priority, and even if someone did need that behavior, the asterisk would simply expand to multiple files including the one with the literal asterisk. So the package would collect too many files, but at least the desired file would in fact be there.

If we want to make it super unlikely for anyone to ever stumble across this by using some prefix marker like option 1, I don't think ::glob:: provides additional utility over glob: which would have symmetry with existing directives like file:, attr:, find:, and find_namespace:.

@eli-schwartz
Copy link
Contributor

eli-schwartz commented Jun 27, 2021

I discovered this issue while trying to migrate some other project off of pbr and onto the builtin setup.cfg support (and wondering why the obvious thing did not work). Unfortunately, the project uses pbr's support for

[files]
data_files =
    share/foo = <glob>

so it cannot be mapped to [options.data_files] no matter how much I'd like... I'm not sure if I should propose a change that involves "oh, also reinstate setup.py as the logic driver in order to glob for files with glob.glob()".

@darkvertex
Copy link
Contributor

Recently discovered the same thing and found this thread.

I have begrudgingly reinstituted a setup.py just to do some glob() calls, but I feel a new directive like glob: could be in-line with mannerisms of the setup.cfg syntax.

@darkvertex
Copy link
Contributor

darkvertex commented Jun 30, 2021

So, I've taken a whack at adding a glob: directive: #2712 👀

I feel it's a relatively safe change as the present behaviour has not changed if the new directive is not used.

Open for any constructive feedback from anybody in any case.

@eli-schwartz
Copy link
Contributor

Well, @pganssle's concern was, I think, that people might be using the new directive and actually mean it to be a literal filename instead of a directive... the question is whether that's a practical concern. So yeah, relatively safe.

Thanks for doing the legwork of actually implementing this!

jaraco added a commit that referenced this issue Aug 26, 2021
Implement #1681 (globbing support for `[options.data_files]`)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Needs Discussion Issues where the implementation still needs to be discussed. Needs Investigation Issues which are likely in scope but need investigation to figure out the cause
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants