Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not search excluded directories for packages #733

Merged
merged 2 commits into from Sep 22, 2016

Conversation

mx-moth
Copy link
Contributor

@mx-moth mx-moth commented Aug 14, 2016

Previously, PackageFinder.find would search the whole directory tree looking for packages, then remove excluded packages from this list. This made building a package very slow under some circumstances where the file tree was large.

In my case, I have a Python project that includes JavaScript and CSS assets. These assets are compiled by a node.js build chain. Every time the Python package is built, setuptools searches the whole (rather large) node_modules tree for python packages, before rejecting the whole tree because node_modules/__init__.py does not exist. As the build process is happening in a Vagrant virtual machine, and VirtualBox shared directories are notoriously slow, this caused the build process for the Python package to take 30+ seconds just to find the packages!

This change stops PackageFinder.find from descending in to directories that will never be included. The public API should be 100% backwards compatible with previous versions. All the current tests pass without any modifications.

I am unsure if I should write new tests for this specific change, as it might require mocking os.walk to check that the excluded directories are not being searched.

@mx-moth mx-moth force-pushed the faster-package-finder branch 2 times, most recently from 8f0f90b to bc1c798 Compare August 14, 2016 06:06
@mx-moth mx-moth force-pushed the faster-package-finder branch 2 times, most recently from 08ca1bd to e622362 Compare August 29, 2016 06:32
@mx-moth
Copy link
Contributor Author

mx-moth commented Aug 29, 2016

This has been rebased on top of master, and the commit that restricts py.test to make the tests pass again.

@mx-moth
Copy link
Contributor Author

mx-moth commented Sep 13, 2016

The code for this patch is complete, barring any feedback. Is there anything more I can do to help this patch along?

@jaraco
Copy link
Member

jaraco commented Sep 20, 2016

Let's go for it. Can you add a one more commit to CHANGES.txt explaining in the target version what's changed?

Previously, PackageFinder.find would search the whole directory tree
looking for packages, then remove excluded packages from this list. This
made building a package very slow under some circumstances where the
file tree was large.

This change stops PackageFinder.find from descending in to directories
that will never be included.
@mx-moth
Copy link
Contributor Author

mx-moth commented Sep 22, 2016

I've rebased this on master, and added an entry to CHANGES.txt.

@jaraco jaraco merged commit 9d12348 into pypa:master Sep 22, 2016
@mx-moth
Copy link
Contributor Author

mx-moth commented Sep 22, 2016

I've just noticed one potential problem with this. With a package set up like:

pkg/
    subpkg/__init_.py
    __init__.py
setup.py

Using the previous find_packages, find_packages(exclude=['pkg']) == ['pkg.subpkg']. I am unsure whether this is desired behaviour, or even results in a valid package. Using the new implementation find_packages(exclude=['pkg']) == [], as pkg was excluded.

I can easily change it to restore the old behaviour, while still keeping (most of) the speed improvements from this patch.

@jaraco
Copy link
Member

jaraco commented Sep 26, 2016

I'm unsure how important this behavior is. I suspect a simpler implementation as is now merged would be preferable, assuming there are no complaints. How about instead update the changelog entry to indicate the backward incompatibility, and we'll field it and see if anyone requires the old behavior?

@mx-moth
Copy link
Contributor Author

mx-moth commented Sep 26, 2016

I've updated CHANGES.txt, and added a new test for the change in behaviour, in #798.

@bowlofeggs
Copy link

This change breaks backwards compatibility in order to accommodate an edge case, and even then it's just for performance reasons, not correctness. Additionally, the documentation hints that subpackages do get searched:

find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"])

Note that the documentation had tests.* as well as tests. The way I read it, the current release is not aligned with the documented behavior either.

This change also broke the include argument without noting that change in the release notes or documentation. Additionally, projects that want to be able to build on multiple versions of setuptools cannot use include (it doesn't exist on older versions) and so there isn't a way to get find_packages() to find subpackages without also finding unwanted superpackages.

I would like to suggest that this change be reverted, and that setuptools 29 be released with the previous behavior restored.

@bowlofeggs
Copy link

After writing the above comment, I see that @timheap has written a PR that might address my concerns. Thanks!

#809

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants