New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not search excluded directories for packages #733

Merged
merged 2 commits into from Sep 22, 2016

Conversation

Projects
None yet
3 participants
@timheap
Contributor

timheap commented Aug 14, 2016

Previously, PackageFinder.find would search the whole directory tree looking for packages, then remove excluded packages from this list. This made building a package very slow under some circumstances where the file tree was large.

In my case, I have a Python project that includes JavaScript and CSS assets. These assets are compiled by a node.js build chain. Every time the Python package is built, setuptools searches the whole (rather large) node_modules tree for python packages, before rejecting the whole tree because node_modules/__init__.py does not exist. As the build process is happening in a Vagrant virtual machine, and VirtualBox shared directories are notoriously slow, this caused the build process for the Python package to take 30+ seconds just to find the packages!

This change stops PackageFinder.find from descending in to directories that will never be included. The public API should be 100% backwards compatible with previous versions. All the current tests pass without any modifications.

I am unsure if I should write new tests for this specific change, as it might require mocking os.walk to check that the excluded directories are not being searched.

@timheap timheap force-pushed the timheap:faster-package-finder branch 2 times, most recently from 8f0f90b to bc1c798 Aug 14, 2016

@timheap timheap force-pushed the timheap:faster-package-finder branch 2 times, most recently from 08ca1bd to e622362 Aug 29, 2016

@timheap

This comment has been minimized.

Contributor

timheap commented Aug 29, 2016

This has been rebased on top of master, and the commit that restricts py.test to make the tests pass again.

@timheap

This comment has been minimized.

Contributor

timheap commented Sep 13, 2016

The code for this patch is complete, barring any feedback. Is there anything more I can do to help this patch along?

@jaraco

This comment has been minimized.

Member

jaraco commented Sep 20, 2016

Let's go for it. Can you add a one more commit to CHANGES.txt explaining in the target version what's changed?

timheap added some commits Aug 14, 2016

Do not search excluded directories for packages
Previously, PackageFinder.find would search the whole directory tree
looking for packages, then remove excluded packages from this list. This
made building a package very slow under some circumstances where the
file tree was large.

This change stops PackageFinder.find from descending in to directories
that will never be included.

@timheap timheap force-pushed the timheap:faster-package-finder branch from e622362 to 75a78dc Sep 22, 2016

@timheap

This comment has been minimized.

Contributor

timheap commented Sep 22, 2016

I've rebased this on master, and added an entry to CHANGES.txt.

@jaraco jaraco merged commit 9d12348 into pypa:master Sep 22, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@timheap

This comment has been minimized.

Contributor

timheap commented Sep 22, 2016

I've just noticed one potential problem with this. With a package set up like:

pkg/
    subpkg/__init_.py
    __init__.py
setup.py

Using the previous find_packages, find_packages(exclude=['pkg']) == ['pkg.subpkg']. I am unsure whether this is desired behaviour, or even results in a valid package. Using the new implementation find_packages(exclude=['pkg']) == [], as pkg was excluded.

I can easily change it to restore the old behaviour, while still keeping (most of) the speed improvements from this patch.

@jaraco

This comment has been minimized.

Member

jaraco commented Sep 26, 2016

I'm unsure how important this behavior is. I suspect a simpler implementation as is now merged would be preferable, assuming there are no complaints. How about instead update the changelog entry to indicate the backward incompatibility, and we'll field it and see if anyone requires the old behavior?

@timheap

This comment has been minimized.

Contributor

timheap commented Sep 26, 2016

I've updated CHANGES.txt, and added a new test for the change in behaviour, in #798.

@bowlofeggs

This comment has been minimized.

bowlofeggs commented Oct 5, 2016

This change breaks backwards compatibility in order to accommodate an edge case, and even then it's just for performance reasons, not correctness. Additionally, the documentation hints that subpackages do get searched:

find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"])

Note that the documentation had tests.* as well as tests. The way I read it, the current release is not aligned with the documented behavior either.

This change also broke the include argument without noting that change in the release notes or documentation. Additionally, projects that want to be able to build on multiple versions of setuptools cannot use include (it doesn't exist on older versions) and so there isn't a way to get find_packages() to find subpackages without also finding unwanted superpackages.

I would like to suggest that this change be reverted, and that setuptools 29 be released with the previous behavior restored.

@bowlofeggs

This comment has been minimized.

bowlofeggs commented Oct 5, 2016

After writing the above comment, I see that @timheap has written a PR that might address my concerns. Thanks!

#809

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment