New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude .pyc in sdist output #1533

Merged
merged 2 commits into from Oct 27, 2018

Conversation

Projects
None yet
4 participants
@gokcennurlu
Contributor

gokcennurlu commented Oct 27, 2018

Summary of changes

This restricts the recursive-include setuptools/_vendor to contain only .py and .txt files.

Closes #1414

Pull Request Checklist

  • Changes have tests
  • News fragment added in changelog.d. See documentation for details
@benoit-pierre

This comment has been minimized.

Member

benoit-pierre commented Oct 27, 2018

The issue is really with the recursive-include setuptools/_vendor * rule in MANIFEST.in, so you could use a stricter exclusion: recursive-exclude setuptools/_vendor *.pyc.

@benoit-pierre

This comment has been minimized.

Member

benoit-pierre commented Oct 27, 2018

Or alternatively, fix the original _vendor rule to be less inclusive, like the pkg_resources rule: recursive-include setuptools/_vendor *.py *.txt.

@gokcennurlu

This comment has been minimized.

Contributor

gokcennurlu commented Oct 27, 2018

@benoit-pierre Thank you, I will look into fixing it now!

I spent some time to understand the codepath that leads to manifest & sources.txt generation and tried to figure out if this is just a MANIFEST.in fix or a behaviour change is required in setuptools itself. AFAIK, we are just removing .pyc files in this project, right? In that case, we shouldn't really need a test here.

@benoit-pierre

This comment has been minimized.

Member

benoit-pierre commented Oct 27, 2018

I agree that no test is needed.

@gokcennurlu gokcennurlu force-pushed the bloomberg:dont_copy_pyc branch from fd5a09d to e628b23 Oct 27, 2018

@gokcennurlu gokcennurlu force-pushed the bloomberg:dont_copy_pyc branch from e628b23 to f25c12c Oct 27, 2018

@pganssle

This comment has been minimized.

Member

pganssle commented Oct 27, 2018

We probably don't need a test, though I think to verify the correctness, we should build the sdist twice, so if we assume we have two tarballs, setuptools_before.tar.gz and setuptools_after.tar.gz, then:

tar -tf setuptools_before.tar.gz > s_before.txt
tar -tf setuptools_after.tar.gz > s_after.txt
diff s_before.txt s_after.txt

The result of this should contain only pyc files removed, and:

tar -tf setuptools_after.tar.gz | grep .pyc

Should return nothing.

@pganssle

This comment has been minimized.

Member

pganssle commented Oct 27, 2018

Actually, maybe we should add a test for this, but only in the release script - if the release script could hard-fail if the sdist contains *.pyc files, that's probably enough to prevent us from making invalid releases.

@gokcennurlu

This comment has been minimized.

Contributor

gokcennurlu commented Oct 27, 2018

@pganssle I can verify that the fix works

## Before
$ git clone git@github.com:pypa/setuptools.git before --depth 1 && cd before
$ python2.7 bootstrap.py
adding minimal entry_points
Regenerating egg_info
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'
  warnings.warn(msg)
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
running egg_info
writing requirements to setuptools.egg-info/requires.txt
writing dependency_links to setuptools.egg-info/dependency_links.txt
writing entry points to setuptools.egg-info/entry_points.txt
writing manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'setuptools.egg-info/SOURCES.txt'
...and again.
running egg_info
writing requirements to setuptools.egg-info/requires.txt
writing setuptools.egg-info/PKG-INFO
writing top-level names to setuptools.egg-info/top_level.txt
writing dependency_links to setuptools.egg-info/dependency_links.txt
writing entry points to setuptools.egg-info/entry_points.txt
reading manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'setuptools.egg-info/SOURCES.txt'
$ python2.7 setup.py sdist -d dist-before --formats=tar
...

## After
$ git clone git@github.com:pypa/setuptools.git after --depth 1 && cd after
# Assuming that you fetched the PR ref
$ git checkout f25c12c2f60fe524a92282337c999b64d2c8ba91
$ python2.7 bootstrap.py
adding minimal entry_points
Regenerating egg_info
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'
  warnings.warn(msg)
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
running egg_info
writing requirements to setuptools.egg-info/requires.txt
writing dependency_links to setuptools.egg-info/dependency_links.txt
writing entry points to setuptools.egg-info/entry_points.txt
writing manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'setuptools.egg-info/SOURCES.txt'
...and again.
running egg_info
writing requirements to setuptools.egg-info/requires.txt
writing setuptools.egg-info/PKG-INFO
writing top-level names to setuptools.egg-info/top_level.txt
writing dependency_links to setuptools.egg-info/dependency_links.txt
writing entry points to setuptools.egg-info/entry_points.txt
reading manifest file 'setuptools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'setuptools.egg-info/SOURCES.txt'
$ python2.7 setup.py sdist -d dist-after --formats=tar
$ cd ..

## Verifying
$ tar -tf before/dist-before/setuptools-40.5.0.post20181027.tar > before.txt
$ tar -tf after/dist-after/setuptools-40.5.0.post20181027.tar > after.txt
$ diff before.txt after.txt
72d71
< setuptools-40.5.0.post20181027/setuptools/_vendor/__init__.pyc
75d73
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/__about__.pyc
77d74
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/__init__.pyc
79d75
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/_compat.pyc
81d76
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/_structures.pyc
85d79
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/specifiers.pyc
88d81
< setuptools-40.5.0.post20181027/setuptools/_vendor/packaging/version.pyc
91d83
< setuptools-40.5.0.post20181027/setuptools/_vendor/six.pyc

I don't know much about the release process though. Where exactly can we add such checks?

@gokcennurlu

This comment has been minimized.

Contributor

gokcennurlu commented Oct 27, 2018

Travis's PyPI plugin already calls sdist and I didn't want to call it manually and possibly intervene what's going to do. Yet, all those commands work based on .egg-info folder's content, I think something like this in travis file would work?

before_deploy:
  - python bootstrap.py
  - ! grep pyc setuptools.egg-info/SOURCES.txt

What do you think? It'd be hard to test this check before actually releasing something, though :D

@pganssle

This comment has been minimized.

Member

pganssle commented Oct 27, 2018

@gokcennurlu I think I'm fine with that, it seems like it does the right thing. If run the commands that the deploy script does locally and it fails on master but succeeds in this PR, I trust it.

Add before_deploy checks for pyc files in TravisCI
This should stop the PyPI release stage if a `.pyc` exists in the
generated SOURCES.txt.
@pganssle

pganssle approved these changes Oct 27, 2018 edited

This looks good to me. I'm a bit worried that the *.py *.txt inclusion mechanism may cause issues in the future if some of the vendored modules ship something other than *.txt and *.py they won't get packaged correctly in the sdist, so I'd rather try excluding *.pyc, but based on the comments I'm guessing that failed for some reason, and that doesn't seem super likely to cause problems.

@jaraco Do you want to give this a once-over?

@jaraco

I'm wondering if this change is happening at the right layer. Let me explain:

  1. When setuptools vendors packages, it grabs the wheels and expands them to setuptools/_vendor, including the .pyc files.
  2. The .pyc files are committed to the repository.
  3. Later, when building an sdist, we avoid including the .pyc files.

I wonder if instead the vendoring logic (in pavement.py) shouldn't delete those files after step 1, so that they're not maintained in the repo.

On the other hand, I suspect that even if the .pyc files are not included in the source repo, they may still be generated when the test suite is run, so could end up in the sdist. In that case, it will still be necessary to avoid including those in the sdist. So this proposed approach is probably fine.

A few other considerations:

What if there are non .py or .txt files in the vendored packages? Are those relevant? I notice that pkg_resources, which also has a _vendor, just relies on the recursive declaration from pkg_resources; perhaps setuptools._vendor should also.

All of these things seem like minor concerns and no reason to block the contrib, so let's proceed and we can iterate as needed.

@jaraco jaraco merged commit 72293ec into pypa:master Oct 27, 2018

5 checks passed

codecov/patch Coverage not affected when comparing bcadaf2...525e5ed
Details
codecov/project 81.45% (+<.01%) compared to bcadaf2
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
deploy/netlify Deploy preview ready!
Details
@gokcennurlu

This comment has been minimized.

Contributor

gokcennurlu commented Oct 27, 2018

run_egg_info() in bootstrap.py seems to be the reason why pycs are there - to bootstrap and run egg_info on itself. I guess we'd not see those pyc files if we used an existing setuptools to run setup.py commands.

@gokcennurlu gokcennurlu deleted the bloomberg:dont_copy_pyc branch Oct 27, 2018

@pganssle

This comment has been minimized.

Member

pganssle commented Nov 12, 2018

Good thing we added the ! grep pyc ... line, since that failed when making the 40.6.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment