"build zipmanifest" should be called only once for each ZipProvider #240
Comments
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Thanks! There was some work on implementing process-wide caching of the zipfile, so the try/catch would not have been needed. This was for cases where several packages are in the same zip. However, it was decided to make that behavior non-default. My guess is that this area slipped by in that decision. @jaraco I can take a look this weekend to make sure we don't regress the previous memory issues, and merge if that is OK.
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): This issue appears to me to be an exact duplicate of #154. If there is something that distinguishes it, I'm missing it. Does setting the environment variable that enables caching fix the issue? I agree it would be nice to speed up the loading of these structures. Perhaps we should revisit that issue to see if there is a better implementation that could cache the values during startup, but expire them over time to allow the memory to be reclaimed.
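The idea of caching during startup but expiring entries so their memory can be reclaimed could be sketched as a small time-based cache. ExpiringCache and its ttl and clock parameters are hypothetical names, not part of pkg_resources; the injectable clock only exists to make the expiry easy to test.

```python
import time


class ExpiringCache:
    """Hypothetical cache whose entries are dropped ttl seconds after creation."""

    def __init__(self, build, ttl=60.0, clock=time.monotonic):
        self._build = build    # callable: key -> value (e.g. a manifest builder)
        self._ttl = ttl
        self._clock = clock
        self._entries = {}     # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        self._purge(now)
        entry = self._entries.get(key)
        if entry is None:
            # Missing or expired: build a fresh value and start a new lifetime.
            entry = (self._build(key), now + self._ttl)
            self._entries[key] = entry
        return entry[0]

    def _purge(self, now):
        # Drop expired entries so the cache no longer keeps them alive.
        expired = [k for k, (_, expiry) in self._entries.items() if expiry <= now]
        for k in expired:
            del self._entries[k]

    def __len__(self):
        return len(self._entries)
```

Purging on every lookup is linear in the number of cached archives, which should be small here; a real implementation might purge less often.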
Original comment by jun66j5 (Bitbucket: jun66j5, GitHub: jun66j5):
If the PKG_RESOURCES_CACHE_ZIP_MANIFESTS environment variable is enabled, the issue would be fixed and loading is faster than on 5.3. However, I think setuptools 5.4 without the environment variable should be as fast as 5.3.
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): On further consideration, perhaps you're right. I believe my concerns in #154 about memory usage may have been overly conservative. Looking at the 5.3 code, it seems that the 'zipinfo' was always loaded once for each ZipProvider. It should be reasonable to retain that behavior.
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): In my original consideration, I believed that the zipinfo was primarily used only on startup and had little performance impact on a running application. The issues reported in the referenced Trac report strongly suggest otherwise. Therefore, I believe the caching mechanism should be enabled by default.
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Actually, it is not the same. My fix to #154 was a global cache. This puts an instance cache back into the ZipProvider. And indeed, prior to the #154 fix it was an instance cache built in ZipProvider.__init__.
(See https://bitbucket.org/pypa/setuptools/src/bfbccab83c1d/pkg_resources.py - bfbccab83c1d - 2014-05-27.) In the case of #154, the reporter's deployment mechanism was accessing multiple packages from the same zip, but through different ZipProviders (different packages). Thus, each package's __init__ would have rebuilt the zip manifest from the same zip multiple times. In retrospect, #154 might have been fixable using weak references, so that the cached manifest was not guaranteed to stay around for the duration of the process, but I digress. In any case, with a global cache an instance cache was no longer needed, so it was removed (10cc90d9b828) as part of the original fix. The zipinfo property then called the "build" mechanism directly each time. What is happening here is that, during the lifetime of the object, zipinfo seems to be called four or more times on a given instance. So without the global cache, the zip manifest is rebuilt multiple times, not just for the same zip but also for the same package.
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): So I guess my question would be: are ZipProviders around for the duration of the process? If so, adding an instance cache wouldn't be any better than turning on the global cache, and may use more memory. If not, then some memory may be saved by reimplementing an instance cache and leaving the optional global cache in place.
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): @philip_thiem Good analysis. My suspicion is that ZipProviders are around for the life of the process, but I haven't verified that fact. Do feel free to explore the issue further and provide additional analysis or changes if you believe the performance can be better optimized.
Originally reported by: jun66j5 (Bitbucket: jun66j5, GitHub: jun66j5)
When Trac 1.0-stable (r13065) with message catalogs is installed as an egg file on setuptools 5.4, "build zipmanifest" is called more than 4 times as often as on setuptools 5.3. This is especially slow on Windows, because zipfile.py is more than 10 times slower on Windows than on Unix.
Since 5.4, the ZipProvider.zipinfo property reads the egg file on every access. Up to 5.3, the ZipProvider.__init__ method read the egg file and stored the result in an instance variable. I think this change causes the issue. The following patch would restore the 5.3 behavior. Thoughts?
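To make the difference concrete, here is a sketch that counts manifest builds for a provider that rebuilds on every zipinfo access (as described for 5.4) and one that builds once in __init__ (as described for 5.3). The class names and the build counter are hypothetical, and this is an illustration rather than the patch from the report.

```python
import zipfile

build_count = [0]  # how many times an archive has been read (demo only)


def build_zipmanifest(path):
    """Read the archive and map each member name to its ZipInfo."""
    build_count[0] += 1
    with zipfile.ZipFile(path) as zf:
        return {info.filename: info for info in zf.infolist()}


class RebuildingProvider:
    """Like the 5.4 behavior: every zipinfo access re-reads the egg."""

    def __init__(self, path):
        self.path = path

    @property
    def zipinfo(self):
        return build_zipmanifest(self.path)


class InstanceCachedProvider:
    """Like the 5.3 behavior: read once in __init__, kept on the instance."""

    def __init__(self, path):
        self.path = path
        self._zipinfo = build_zipmanifest(path)

    @property
    def zipinfo(self):
        return self._zipinfo
```

Accessing zipinfo four times costs one archive read with InstanceCachedProvider but four with RebuildingProvider. functools.cached_property (Python 3.8+) would be a lazy alternative: the manifest is still built at most once per instance, but only on first access.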
(Originally reported at https://groups.google.com/d/topic/trac-users/gX5kYTUFXM4)
How to reproduce:
setuptools 5.4 on Windows:
setuptools 5.3 on Windows:
setuptools 5.3 on Ubuntu 12.04:
setuptools 5.3 on Ubuntu 12.04: