Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude .pyc in sdist output #1533

Merged
merged 2 commits into from Oct 27, 2018
Merged

Exclude .pyc in sdist output #1533

merged 2 commits into from Oct 27, 2018

Conversation

gokcennurlu
Copy link
Contributor

@gokcennurlu gokcennurlu commented Oct 27, 2018

Summary of changes

This restricts the recursive-include setuptools/_vendor to contain only .py and .txt files.

Closes #1414

Pull Request Checklist

  • Changes have tests
  • News fragment added in changelog.d. See documentation for details

@benoit-pierre
Copy link
Member

The issue is really with the recursive-include setuptools/_vendor * rule in MANIFEST.in, so you could use a stricter exclusion: recursive-exclude setuptools/_vendor *.pyc.

@benoit-pierre
Copy link
Member

benoit-pierre commented Oct 27, 2018

Or alternatively, fix the original _vendor rule to be less inclusive, like the pkg_resources rule: recursive-include setuptools/_vendor *.py *.txt.

@gokcennurlu
Copy link
Contributor Author

@benoit-pierre Thank you, I will look into fixing it now!

I spent some time to understand the codepath that leads to manifest & sources.txt generation and tried to figure out if this is just a MANIFEST.in fix or a behaviour change is required in setuptools itself. AFAIK, we are just removing .pyc files in this project, right? In that case, we shouldn't really need a test here.

@benoit-pierre
Copy link
Member

I agree that no test is needed.

@pganssle
Copy link
Member

We probably don't need a test, though I think to verify the correctness, we should build the sdist twice, so if we assume we have two tarballs, setuptools_before.tar.gz and setuptools_after.tar.gz, then:

tar -tf setuptools_before.tar.gz > s_before.txt
tar -tf setuptools_after.tar.gz > s_after.txt
diff s_before.txt s_after.txt

The result of this should contain only pyc files removed, and:

tar -tf setuptools_after.tar.gz | grep .pyc

Should return nothing.

@pganssle
Copy link
Member

Actually, maybe we should add a test for this, but only in the release script - if the release script could hard-fail if the sdist contains *.pyc files, that's probably enough to prevent us from making invalid releases.

@gokcennurlu
Copy link
Contributor Author

@pganssle I can verify that the fix works

## Before
$ git clone git@github.com:pypa/setuptools.git before --depth 1 && cd before
$ python2.7 bootstrap.py
adding minimal entry_points
Regenerating egg_info
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'
  warnings.warn(msg)
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
running egg_info
writing requirements to setuptools.egg-info/requires.txt
writing dependency_links to setuptools.egg-info/dependency_links.txt
writing entry points to setuptools.egg-info/entry_points.txt
writing manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'setuptools.egg-info/SOURCES.txt'
...and again.
running egg_info
writing requirements to setuptools.egg-info/requires.txt
writing setuptools.egg-info/PKG-INFO
writing top-level names to setuptools.egg-info/top_level.txt
writing dependency_links to setuptools.egg-info/dependency_links.txt
writing entry points to setuptools.egg-info/entry_points.txt
reading manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'setuptools.egg-info/SOURCES.txt'
$ python2.7 setup.py sdist -d dist-before --formats=tar
...

## After
$ git clone git@github.com:pypa/setuptools.git after --depth 1 && cd after
# Assuming that you fetched the PR ref
$ git checkout f25c12c2f60fe524a92282337c999b64d2c8ba91
$ python2.7 bootstrap.py
adding minimal entry_points
Regenerating egg_info
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'
  warnings.warn(msg)
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
running egg_info
writing requirements to setuptools.egg-info/requires.txt
writing dependency_links to setuptools.egg-info/dependency_links.txt
writing entry points to setuptools.egg-info/entry_points.txt
writing manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'setuptools.egg-info/SOURCES.txt'
...and again.
running egg_info
writing requirements to setuptools.egg-info/requires.txt
writing setuptools.egg-info/PKG-INFO
writing top-level names to setuptools.egg-info/top_level.txt
writing dependency_links to setuptools.egg-info/dependency_links.txt
writing entry points to setuptools.egg-info/entry_points.txt
reading manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'setuptools.egg-info/SOURCES.txt'
$ python2.7 setup.py sdist -d dist-after --formats=tar
$ cd ..

## Verifying
$ tar -tf before/dist-before/setuptools-40.5.0.post20181027.tar > before.txt
$ tar -tf after/dist-after/setuptools-40.5.0.post20181027.tar > after.txt
$ diff before.txt after.txt
72d71
< setuptools-40.5.0.post20181027/setuptools/_vendor/__init__.pyc
75d73
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/__about__.pyc
77d74
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/__init__.pyc
79d75
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/_compat.pyc
81d76
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/_structures.pyc
85d79
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/specifiers.pyc
88d81
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/version.pyc
91d83
< setuptools-40.5.0.post20181027/setuptools/_vendor/six.pyc

I don't know much about the release process though. Where exactly can we add such checks?

@gokcennurlu
Copy link
Contributor Author

Travis's PyPI plugin already calls sdist and I didn't want to call it manually and possibly intervene what's going to do. Yet, all those commands work based on .egg-info folder's content, I think something like this in travis file would work?

before_deploy:
  - python bootstrap.py
  - ! grep pyc setuptools.egg-info/SOURCES.txt

What do you think? It'd be hard to test this check before actually releasing something, though :D

@pganssle
Copy link
Member

@gokcennurlu I think I'm fine with that, it seems like it does the right thing. If run the commands that the deploy script does locally and it fails on master but succeeds in this PR, I trust it.

This should stop the PyPI release stage if a `.pyc` exists in the
generated SOURCES.txt.
Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'm a bit worried that the *.py *.txt inclusion mechanism may cause issues in the future if some of the vendored modules ship something other than *.txt and *.py they won't get packaged correctly in the sdist, so I'd rather try excluding *.pyc, but based on the comments I'm guessing that failed for some reason, and that doesn't seem super likely to cause problems.

@jaraco Do you want to give this a once-over?

Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this change is happening at the right layer. Let me explain:

  1. When setuptools vendors packages, it grabs the wheels and expands them to setuptools/_vendor, including the .pyc files.
  2. The .pyc files are committed to the repository.
  3. Later, when building an sdist, we avoid including the .pyc files.

I wonder if instead the vendoring logic (in pavement.py) shouldn't delete those files after step 1, so that they're not maintained in the repo.

On the other hand, I suspect that even if the .pyc files are not included in the source repo, they may still be generated when the test suite is run, so could end up in the sdist. In that case, it will still be necessary to avoid including those in the sdist. So this proposed approach is probably fine.

A few other considerations:

What if there are non .py or .txt files in the vendored packages? Are those relevant? I notice that pkg_resources, which also has a _vendor, just relies on the recursive declaration from pkg_resources; perhaps setuptools._vendor should also.

All of these things seem like minor concerns and no reason to block the contrib, so let's proceed and we can iterate as needed.

@jaraco jaraco merged commit 72293ec into pypa:master Oct 27, 2018
@gokcennurlu
Copy link
Contributor Author

run_egg_info() in bootstrap.py seems to be the reason why pycs are there - to bootstrap and run egg_info on itself. I guess we'd not see those pyc files if we used an existing setuptools to run setup.py commands.

@gokcennurlu gokcennurlu deleted the dont_copy_pyc branch October 27, 2018 22:37
@pganssle
Copy link
Member

Good thing we added the ! grep pyc ... line, since that failed when making the 40.6.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants