New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"build zipmanifest" should be called only once for each ZipProvider #240

Closed
bb-migration opened this Issue Aug 11, 2014 · 11 comments

Comments

Projects
None yet
1 participant
@bb-migration

bb-migration commented Aug 11, 2014

Originally reported by: jun66j5 (Bitbucket: jun66j5, GitHub: jun66j5)


When Trac 1.0-stable (r13065) with message catalogs on setuptools 5.4 is installed as a egg file, "build zipmanifest" is called more than 4 times on setuptools 5.3. Especally, that's too slow on Windows because zipfile.py on Windows is slow more than 10 times on Unix.

After 5.4, ZipProvider.zipinfo property reads egg file each time. Before 5.3, ZipProvider.__init__ method reads egg file and the result wil be stored in its instance variable.

I think the changes lead the issue. The following patch would be the same behavior of 5.3. Thoughts?

(Originally reported at https://groups.google.com/d/topic/trac-users/gX5kYTUFXM4)

#!diff
--- pkg_resources.py.orig     2014-08-09 22:06:34.877375000 +0900
+++ pkg_resources.py  2014-08-09 22:06:37.533625000 +0900
@@ -1636,7 +1636,11 @@

     @property
     def zipinfo(self):
-        return self._zip_manifests.load(self.loader.archive)
+        try:
+            return self._zipinfo
+        except AttributeError:
+            self._zipinfo = self._zip_manifests.load(self.loader.archive)
+            return self._zipinfo

     def get_resource_filename(self, manager, resource_name):
         if not self.egg_name:

How to reproduce:

#!shell
$ easy_install Babel==0.9.6
$ easy_install Genshi==0.6
$ easy_install http://svn.edgewall.org/repos/trac/branches/1.0-stable
$ python -m cProfile -s time $VENV/bin/trac-admin --version 2>/dev/null | head -15

setuptools 5.4 on Windows:

trac-admin-script.py 1.0.2dev-r13065
         2475815 function calls (2471154 primitive calls) in 13.135 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      187    3.756    0.020    7.826    0.042 zipfile.py:799(_RealGetContents)
   119597    1.124    0.000    1.640    0.000 zipfile.py:287(__init__)
   478400    1.027    0.000    1.027    0.000 {method 'read' of 'cStringIO.StringI' objects}
   119784    0.805    0.000    1.826    0.000 pkg_resources.py:1553(<genexpr>)
   266592    0.531    0.000    0.531    0.000 {method 'append' of 'list' objects}
   119597    0.511    0.000    0.749    0.000 zipfile.py:900(getinfo)
   122860    0.321    0.000    0.321    0.000 {_struct.unpack}
   119597    0.301    0.000    0.301    0.000 zipfile.py:383(_decodeExtra)
149422/148669    0.299    0.000    0.303    0.000 {len}

setuptools 5.3 on Windows:

trac-admin-script.py 1.0.2dev-r13065
         864149 function calls (859551 primitive calls) in 4.985 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       42    0.828    0.020    1.723    0.041 zipfile.py:799(_RealGetContents)
      944    0.275    0.000    0.536    0.001 api.py:316(<genexpr>)
     1218    0.273    0.000    0.534    0.000 api.py:317(<genexpr>)
   133269    0.270    0.000    0.270    0.000 {unichr}
    26466    0.247    0.000    0.361    0.000 zipfile.py:287(__init__)
   105875    0.226    0.000    0.226    0.000 {method 'read' of 'cStringIO.StringI' objects}
      132    0.171    0.001    0.171    0.001 {nt.rename}
    79708    0.157    0.000    0.157    0.000 {method 'append' of 'list' objects}
       42    0.138    0.003    2.206    0.053 pkg_resources.py:1533(build_zipmanifest)

setuptools 5.3 on Ubuntu 12.04:

trac-admin 1.0.2dev-r13065
         845470 function calls (840887 primitive calls) in 0.474 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       44    0.108    0.002    0.184    0.004 zipfile.py:755(_RealGetContents)
    27178    0.033    0.000    0.041    0.000 zipfile.py:277(__init__)
       69    0.028    0.000    0.028    0.000 {method 'get_data' of 'zipimport.zipimporter' objects}
       44    0.015    0.000    0.218    0.005 pkg_resources.py:1533(build_zipmanifest)
   108712    0.014    0.000    0.014    0.000 {method 'read' of 'cStringIO.StringI' objects}
       16    0.013    0.001    0.020    0.001 sre_compile.py:301(_optimize_unicode)
      944    0.012    0.000    0.020    0.000 api.py:316(<genexpr>)
     1218    0.012    0.000    0.019    0.000 api.py:317(<genexpr>)
  706/166    0.010    0.000    0.025    0.000 sre_parse.py:379(_parse)

setuptools 5.3 on Ubuntu 12.04:

trac-admin 1.0.2dev-r13065
         2367413 function calls (2362830 primitive calls) in 1.292 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      189    0.503    0.003    0.859    0.005 zipfile.py:755(_RealGetContents)
   120309    0.151    0.000    0.190    0.000 zipfile.py:277(__init__)
   481236    0.065    0.000    0.065    0.000 {method 'read' of 'cStringIO.StringI' objects}
   120498    0.058    0.000    0.117    0.000 pkg_resources.py:1550(<genexpr>)
   123576    0.035    0.000    0.035    0.000 {_struct.unpack}
   120309    0.031    0.000    0.031    0.000 zipfile.py:368(_decodeExtra)
   120309    0.028    0.000    0.037    0.000 zipfile.py:854(getinfo)
       69    0.028    0.000    0.028    0.000 {method 'get_data' of 'zipimport.zipimporter' objects}
   120381    0.025    0.000    0.025    0.000 {method 'find' of 'str' objects}

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown):


Thanks! There was some work on implementing caching of the zipfile process wide, so the try/catch would not had been needed. This was for cases where several packages are in the same zip. However, It was decided to make that behavior non-default. My guess is that this area slipped by in that decision. @jaraco I can take a look to make sure we don't regress the previous memory issues this weekend and merge if that is ok.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Duplicate of #154.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


This issue appears to me to be an exact duplicate of 154. If there is something that distinguishes it, I'm missing it.

Does enabling the environment variable to enable caching fix the issue?

I agree it would be nice to speed the loading of these structures. Perhaps we should revisit that issue to see if there is a better implementation that could cache the values during startup, but expire them over time to allow the memory to be reclaimed.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by jun66j5 (Bitbucket: jun66j5, GitHub: jun66j5):


Does enabling the environment variable to enable caching fix the issue?

If PKG_RESOURCES_CACHE_ZIP_MANIFESTS environment is enabled, the issue would be fixed and faster than 5.3.

However, I think setuptools 5.4 without the environment should have the same speed of 5.3.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


On further consideration, perhaps you're right. I believe my concerns in #154 about memory usage may have been overly conservative. Looking at the 5.3 code, it seems that the 'zipinfo' was always loaded once for each ZipProvider. It should be reasonable to retain that behavior.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


In my original consideration, I believed that the zipinfo was primarily only used on startup and had little performance impact on a running application. The issues reported in the referenced Trac report strongly suggests otherwise. Therefore, I believe the caching mechanism should be enabled by default.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Use MemoizedZipManifests for all operations. Fixes #240.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Released as 5.7. Please use the PKG_RESOURCES_CACHE_ZIP_MANIFESTS for Setuptools 5.4-5.6.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown):


Actually it is not the same. My fix to #154 was a global cache. This put an instance cache back into the ZipProvider. And indeed, prior to the #154 fix it was an instance cache built in the ZipProvider.init.

#!python


    def __init__(self, module):
        EggProvider.__init__(self, module)
        self.zipinfo = build_zipmanifest(self.loader.archive)
        self.zip_pre = self.loader.archive+os.sep

(See https://bitbucket.org/pypa/setuptools/src/bfbccab83c1d/pkg_resources.py - bfbccab83c1d - 2014-05-27)

In the case of #154, the reporter's deployment mechanism was accessing multiple packages from the same zip, but different ZipProviders (different packages). Thus, each Package's init function would had rebuilt the zip manifest from the same zip multiple times. In retrospect, #154 might had been fixable using weak references just so it wasn't guaranteed that the cached manifest would be around for the duration of the process, but I digress.

In any case with a global cache, an instance cache was no longer needed, so it was removed (10cc90d9b828) as part of the original fix. The zipinfo property then called the "build" mechanism directly each time. What is happening here, is that during the lifetime of the object it would seem that zipinfo gets called four or more times on a given instance. So without the global cache the zip manifest is being rebuilt, not just for the same zip but also the same package multiple times.

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown):


So I guess my question would be are ZipProviders around for the direction of the process. If so, adding an instance cache wouldn't be any better than turning on the global cache and may use more memory. If not, then some memory may be saved by reimplementing an instance cache, and leaving the optional global in place. [Edits in italics]

@bb-migration

This comment has been minimized.

bb-migration commented Aug 15, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


@philip_thiem Good analysis. My suspicion is that ZipProviders are around for the life of the process, but I haven't verified that fact. Do feel free to explore the issue further and provide additional analysis or changes if you believe the performance can be better optimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment