Have a way to ignore specific dirs when building list of files in egg_info #249

Closed
bb-migration opened this Issue Aug 26, 2014 · 7 comments


bb-migration commented Aug 26, 2014

Originally reported by: ionelmc (Bitbucket: ionelmc, GitHub: ionelmc)


I have a very large .tox dir, and I'd like a way to make setuptools ignore it completely. Right now it recurses into it and takes a few minutes to run egg_info and other commands that rely on distutils.filelist.findall.


bb-migration commented Apr 15, 2015

Original comment by reinout (Bitbucket: reinout, GitHub: reinout):


I had a similar problem and finally tracked it down: symlinks inside my project to a different filesystem when running inside a local vmware machine. (Background: buildout's "omelette" recipe that provides symlinks inside a project's parts/omelette folder).

But I've seen the same slowness when there's a filesystem cache directory inside the project, and a similar one with a huge number of JavaScript dependencies that are downloaded locally into the project folder (via 'bower' and 'npm').

The slowness is all because setuptools wants to create a full list of all files, even though, in my case, there is a MANIFEST.in that tells it to grab the local *.rst files and everything in one specific subdirectory.

bb-migration commented May 8, 2015

Original comment by zdexter (Bitbucket: zdexter, GitHub: zdexter):


I am also experiencing this issue. python3 setup.py develop in a directory containing many git submodules stats all the submodules.

bb-migration commented Nov 16, 2015

Original comment by reinout (Bitbucket: reinout, GitHub: reinout):


Note that here's a monkeypatch you can apparently apply locally: https://mail.python.org/pipermail/distutils-sig/2015-April/026175.html

It is starting to look attractive :-) But I don't feel like that's the best way.

Would it be possible to automatically exclude at least directories in setup.py's directory when they're excluded in the MANIFEST.in? It would mean some of the exclusion-code would run before and after walking the directory, but that's preferable (to me) to waiting a couple of minutes or even an hour.
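The pruning idea above can be sketched with os.walk, which skips any directory removed from its dirnames list during a top-down walk. This is only an illustration, not setuptools code: find_files and pruned_dirs are hypothetical names, and the sketch assumes the directory names to skip have already been collected (for example from prune lines in MANIFEST.in).

```python
import fnmatch
import os


def find_files(root, pruned_dirs=()):
    """Yield file paths under root without descending into pruned directories.

    pruned_dirs is a hypothetical collection of glob patterns matched against
    directory names (for example ".tox" or "node_modules").
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to recurse into them,
        # so an excluded tree is never listed or stat'ed at all.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, pat) for pat in pruned_dirs)
        ]
        for name in filenames:
            yield os.path.normpath(os.path.join(dirpath, name))
```

Because the exclusion happens before recursion, the cost of a huge or slow directory (a remote mount, a .tox tree) is never paid, instead of being paid first and filtered afterwards.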

Yes, I've had projects where "setup.py develop ." took an hour! One of the subdirectories was a remote CIFS mount on storage that's used for archive storage.

bb-migration commented Nov 16, 2015

Original comment by reinout (Bitbucket: reinout, GitHub: reinout):


See #450, where there's an effort at fixing it.

bb-migration commented Mar 17, 2016

Original comment by reinout (Bitbucket: reinout, GitHub: reinout):


I've got a monkeypatch (https://gist.github.com/reinout/b453e6fa98289b1a1983) which you can import from the top of your setup.py. This fixes the slowness.

See http://reinout.vanrees.org/weblog/2016/03/17/setuptools-speed.html

bb-migration commented Mar 18, 2016

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Issue #450 was marked as a duplicate of this issue.

Contributor

spookylukey commented May 28, 2016

I'm thinking that a totally new approach to handling the MANIFEST.in file is the answer.

The current approach (inherited from distutils) is to recursively list all the files, bringing the whole lot into memory, and then apply MANIFEST.in commands to that list. This is the problem, and the fix is "don't do that".

It is just the wrong approach. It gives you a small speedup in one case: when you have very few files and you end up processing MANIFEST.in directives that would otherwise cause multiple walks over the directory structure. However, in this case, OS caching means that the second walk is usually going to be very fast.

A far better approach is to just handle MANIFEST.in directives directly - for example, a recursive-include foo *.py would just cause the foo directory to be recursively walked looking for *.py files. This sounds 'dumb', but in reality is going to beat the current approach 99.99% of the time, and by a massive factor in many cases.
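As a rough illustration of handling a directive directly, the sketch below walks only the named directory and matches file names against the given patterns. The function name recursive_include is made up for this example and is not an existing setuptools or distutils API; it also ignores details of the real MANIFEST.in pattern syntax.

```python
import fnmatch
import os


def recursive_include(directory, *patterns):
    """Return files under directory whose names match any of the patterns.

    Hypothetical helper: only `directory` is walked, so unrelated trees
    elsewhere in the project are never touched.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in patterns):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

With this shape, a line such as recursive-include foo *.py would translate to recursive_include("foo", "*.py"), and the size of anything outside foo no longer matters.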

There are two ways to go about this:

  1. Create a class that inherits from distutils.filelist.FileList and change the implementation so that it does not use allfiles at all; instead, include_pattern and exclude_pattern search the filesystem directly. This will break any code that assumes the current implementation of FileList, e.g. the existence of the allfiles attribute. setuptools does this in one place, which would also have to be fixed; I think that should be possible relatively easily. (The offending code is setuptools.command.egg_info.manifest_maker._add_egg_info, and I think the fix is actually to remove the whole function and the one place it is called. The entire function seems to be a workaround for the fact that include_pattern scans not the filesystem but the internally buffered version of it.) However, other issues might crop up.

  2. Start from scratch, with a test-driven approach to implementing MANIFEST.in directive parsing. It should be possible to do this relatively easily, since the language is not that complicated. Further, it shouldn't be that hard to do it with plenty of unit tests, especially if the code that actually walks the file system lives in a separate class for which a mock version can be supplied.
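To make the second option more concrete, here is a minimal sketch of separating the directory walk from the directive handling so that tests can supply a fake file system. All names here (OsWalker, FakeWalker, apply_directive) are hypothetical, only recursive-include is handled, and paths are joined with forward slashes for simplicity.

```python
import fnmatch
import os
import posixpath


class OsWalker:
    """Walks the real file system."""

    def walk(self, top):
        return os.walk(top)


class FakeWalker:
    """In-memory stand-in for tests: maps a directory to (subdirs, files)."""

    def __init__(self, tree):
        self.tree = tree

    def walk(self, top):
        subdirs, files = self.tree[top]
        yield top, list(subdirs), list(files)
        for sub in subdirs:
            yield from self.walk(posixpath.join(top, sub))


def apply_directive(line, walker):
    """Handle one 'recursive-include <dir> <pattern>...' line."""
    command, directory, *patterns = line.split()
    if command != "recursive-include":
        raise ValueError("unsupported directive: " + command)
    found = []
    # Only the directory named in the directive is walked.
    for dirpath, _dirs, files in walker.walk(directory):
        for name in files:
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                found.append(posixpath.join(dirpath, name))
    return sorted(found)
```

A unit test can then pass FakeWalker({"foo": (["sub"], ["a.py"]), "foo/sub": ([], ["b.py"])}) and check the result without creating any files, while production code would pass OsWalker().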

My suggestion is that this should be implemented within setuptools. If successful, it could be submitted to distutils etc. as a replacement implementation.
