Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the IgnoreList logic to be more distutils-like #114

Merged
merged 4 commits into from
May 3, 2020

Conversation

mgedmin
Copy link
Owner

@mgedmin mgedmin commented May 3, 2020

This should eliminate most of the inconsistencies with how distutils treats MANIFEST.in files. (Some inconsistencies remain: distutils allows you to add back previously excluded files.)

The way I went about it:

  • explicitly defined the internal canonical format for lists of file names
  • changed it to use / instead of os.path.sep
  • changed it to omit directories and list only files, which makes ignore handling simpler

To elaborate on the last point, here are some examples:

  1. 'exclude foo' in MANIFEST.in works only on regular files, not directories, but how can we know if all we have is a string?
  2. if you exclude all the files in a directory, distutils will omit that directory from the sdist, but it's difficult to do the same when all you have is a list of strings

Next, I changed IgnoreList to avoid fnmatch and be entirely based on regular expressions, and this time I'm using distutils's own translate_pattern() to produce exactly the same regular expressions that distutils itself will use for matching.

This may introduce some surprising behaviour, like 'global-exclude a*.txt' now matches on suffixes of file names, so it would exclude files named bar.txt! See https://bugs.python.org/issue14106.

Then tests started failing on Python 2.7, but all I get from the __repr__ is a bunch of opaque <_sre.SRE_Pattern object at 0xdeadbeef>, which are impossible to debug. Therefore I'm declaring Python 2.7 support officially dropped!

check-manifests's own configuration (--ignore or [check-manifest] ignore=) is now treated as a global-exclude statement in MANIFEST.in, which means:

  • it's matched anywhere in the file tree
  • it's matched against a filename suffix due to the above-mentioned Python bug
  • it's ignored if it matches a directory

You can ignore directories only by ignoring every file inside it. There's no way currently of specifying that that works for arbitrarily nested trees, but you can try --ignore=dir/*,dir/*/*,dir/*/*/* until you get the result you want.

This decision is not cast in stone: I may in the future change the handling of --ignore to match files and directories, because there's no reason it has to be distutils-compatible.

Expect regressions on Windows. I'll fix any that I find.

Closes #98, #113.

This should eliminate most of the inconsistencies with how distutils
treats MANIFEST.in files.  (Some inconsistencies remain: distutils
allows you to add back previously excluded files.)

The way I went about it:

- explicitly defined the internal canonical format for lists of file
  names

- changed it to use / instead of os.path.sep

- changed it to omit directories and list only files, which makes ignore
  handling simpler

To elaborate on the last point, here are some examples:

1. 'exclude foo' in MANIFEST.in works only on regular files,
   not directories, but how can we know if all we have is a string?

2. if you exclude all the files in a directory, distutils will omit that
   directory from the sdist, but it's difficult to do the same when all
   you have is a list of strings

Next, I changed IgnoreList to avoid fnmatch and be entirely based on
regular expressions, and this time I'm using distutils's own
translate_pattern() to produce exactly the same regular expressions
that distutils itself will use for matching.

This may introduce some surprising behaviour, like 'global-exclude a*.txt'
now matches on suffixes of file names, so it would exclude files named
bar.txt!  See https://bugs.python.org/issue14106.

Then tests started failing on Python 2.7, but all I get from the
__repr__ is a bunch of opaque <_sre.SRE_Pattern object at 0xdeadbeef>,
which are impossible to debug.  Therefore I'm declaring Python 2.7
support officially dropped!

check-manifests's own configuration (--ignore or [check-manifest]
ignore=) is now treated as a global-exclude statement in MANIFEST.in,
which means:

- it's matched anywhere in the file tree
- it's matched against a filename suffix due to the above-mentioned
  Python bug
- it's ignored if it matches a directory

You can ignore directories only by ignoring every file inside it.
There's no way currently of specifying that that works for arbitrarily
nested trees, but you can try --ignore=dir/*,dir/*/*,dir/*/*/*
until you get the result you want.

This decision is not cast in stone: I may in the future change the
handling of --ignore to match files and directories, because there's no
reason it has to be distutils-compatible.

Expect regressions on Windows.  I'll fix any that I find.
@mgedmin mgedmin self-assigned this May 3, 2020
distutils's translate_pattern() expects the paths to use os.path.sep,
not slashes.
Do not call normpath on test data: now our canonical form is using /s.

Remove useless isdir() check in find_suggestions, now that our canonical
filelists contain only files and not directories.
@mgedmin mgedmin merged commit bffd7bf into master May 3, 2020
@mgedmin mgedmin deleted the ignore-logic-rewrite branch May 3, 2020 12:38
@YannickJadoul
Copy link

Nice! Thanks a lot :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignoring dir/* does not ignore directory itself
2 participants