"import pkg_resources" fails with UnicodeDecodeError while parsing /usr/lib/pymodules/python2.7/rpl-1.5.5.egg-info #719
Comments
Upon reflection I think _version_from_file is a better place to catch the error and return None.
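A minimal sketch of that idea, assuming the version is read from a text stream that decodes UTF-8 strictly. The names `version_from_lines` and `open_as_utf8` and the sample bytes are made up for illustration; this is not the actual `_version_from_file` in pkg_resources.

```python
import io


def version_from_lines(lines):
    """Return the first 'Version:' value, or None if missing or undecodable."""
    try:
        for line in lines:
            if line.lower().startswith("version:"):
                return line.partition(":")[2].strip() or None
    except UnicodeDecodeError:
        # Hypothetical handling: treat an undecodable file as "no version".
        return None
    return None


def open_as_utf8(data):
    """Wrap raw bytes in a strict UTF-8 text stream (stand-in for a real file)."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")


good = b"Name: rpl\nVersion: 1.5.5\n"
bad = b"Author: G\xf6ran\nVersion: 1.5.5\n"  # invented sample with a non-UTF-8 byte

good_version = version_from_lines(open_as_utf8(good))
bad_version = version_from_lines(open_as_utf8(bad))
```

Returning None keeps the import working, at the cost of hiding the version of the affected package.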
The stack trace isn't just similar to #531, it's nearly identical... finding the same offending character in the same position. I think your report explains the issue there, and here's what I think was happening:
I suspect the issue is that the offending package wasn't properly packaged (or was packaged with an old or defective build system). I'm not sure we want to patch this by suppressing the error and masking the installed version. I could see augmenting the error to help the user better trace the issue to the source. I'll dig a bit more into rpl to see what we can learn from it.
Hmm. Rpl returns a 404 on PyPI, and it's difficult to find anything about it. I eventually tracked it to its sourceforge home, where the latest release is from 2007 and there's an open ticket for exactly this issue. When I downloaded the 1.5.5 tgz file from the project page, I was unable to execute setup.py due to an encoding error in the script (the Perhaps, though, there's still something setuptools could do here beyond crashing with a nicer error message. Perhaps pkg_resources, when opening a metadata file, should use a lenient decoding, such as utf-8 with surrogate escapes or with some other replacement.
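To make the lenient-decoding options concrete, here is a small sketch comparing Python's standard `strict`, `replace`, and `surrogateescape` error handlers (the last one exists only on Python 3). The sample bytes and the helper `decode_metadata` are invented for illustration and are not the contents of the rpl egg-info.

```python
# Invented sample: a Latin-1 encoded name inside a file expected to be UTF-8.
raw = b"Author: G\xf6ran\nVersion: 1.5.5\n"


def decode_metadata(data, errors="strict"):
    """Decode metadata bytes as UTF-8 using the given codec error handler."""
    return data.decode("utf-8", errors)


# Strict decoding fails on the stray 0xf6 byte.
try:
    decode_metadata(raw)
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'replace' substitutes U+FFFD for the undecodable byte (lossy).
replaced = decode_metadata(raw, "replace")

# 'surrogateescape' maps the byte to a lone surrogate, so the original
# bytes can be recovered by encoding with the same handler.
escaped = decode_metadata(raw, "surrogateescape")
round_trip = escaped.encode("utf-8", "surrogateescape")
```

With `replace` the rest of the file, including the `Version:` line, stays readable; with `surrogateescape` nothing is lost, but the resulting string cannot be encoded or printed strictly without further handling.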
I see this was already done, and was apparent in the error message above.
…ing PKG_RESOURCES_METADATA_ERRORS='replace'. Ref #719.
I've created the issue-719 branch in the repo and pushed a possible workaround. This workaround keeps the existing behavior (fail fast and hard), but provides an environment variable that lets environments affected by the issue bypass it. This mechanism would help maintain the de facto expectation (that packages should be properly encoded), but enable legacy environments or environments with abandoned packages like rpl to continue to function. Thoughts?
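Roughly, an opt-in of this kind could look like the sketch below. The variable name `PKG_RESOURCES_METADATA_ERRORS` and the value `'replace'` come from the commit message above; how the branch actually consumes the variable is not shown in this thread, so the helper `read_metadata` and its use of the value as a codec error handler are assumptions.

```python
import os


def read_metadata(data, environ=os.environ):
    """Decode metadata bytes, failing hard unless the user opted out.

    Assumption: the variable's value is passed straight through as the
    codec error handler, defaulting to 'strict'.
    """
    errors = environ.get("PKG_RESOURCES_METADATA_ERRORS", "strict")
    return data.decode("utf-8", errors)


raw = b"Author: G\xf6ran\n"  # invented sample with a non-UTF-8 byte

# Default behavior: fail fast.
try:
    read_metadata(raw, environ={})
    failed = False
except UnicodeDecodeError:
    failed = True

# Opt-in behavior: decode leniently.
lenient = read_metadata(raw, environ={"PKG_RESOURCES_METADATA_ERRORS": "replace"})
```

In a shell this would correspond to running the affected program with `PKG_RESOURCES_METADATA_ERRORS=replace` set in its environment.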
Your solution does provide a remedy for problems such as mine, and I appreciate that. I think, though, that setuptools could be kinder to its users. I don't see much point in sending users off to google what this problem with UnicodeDecodeError is and what the accepted workaround is ("oh, so I have to set this environment variable and then it works? thanks mate. But why can't things just work? Sigh..."). I mean, what's the benefit of that virtual legwork? setuptools can just catch the error, consult a hard-coded list of "bad egg-infos", see that the egg-info in question is a known troublemaker, and silently disregard the error. I see why you'd be concerned about incentivizing maintainers of other packages to fix the encoding of the egg-info. For that, I think it sufficient to require that elements are not added to the "bad eggs" list until a ticket is opened against the offending package in the appropriate place. If the package is actively maintained, then the owner is very likely to care that his package is listed in a "hall of shame" such as this, so I think this is incentive enough to fix the root cause. Consider that my workstation's OS has reached end-of-life. Even if rpl were fixed in sourceforge, it would not help me much (I neither have root access to my machine nor know how to create and install a custom Ubuntu package to replace the egg-info with a good one). There are lots of users who cannot fix the root cause, and I don't see much point in disrupting their work. Hence my suggestion of a "bad eggs" list.
Or maybe a combination of both of our proposals would be best: have a "bad-eggs" list and allow it to be extended through an environment variable. That way users can work around the problem without involving you (the pkg_resources maintainer).
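As a rough illustration of the combined proposal, a "bad eggs" list extended by an environment variable might look like this. Everything here is hypothetical: the variable name `PKG_RESOURCES_BAD_EGGS`, the helper names, and the choice to return None for a known bad egg are assumptions made for the sketch, not anything that exists in pkg_resources.

```python
import os

# Hypothetical built-in list of egg-infos known to be mis-encoded.
KNOWN_BAD_EGGS = frozenset(["rpl-1.5.5.egg-info"])


def bad_eggs(environ=os.environ):
    """Return the built-in list plus any names from the (hypothetical) env var."""
    extra = environ.get("PKG_RESOURCES_BAD_EGGS", "")
    names = [name for name in extra.split(os.pathsep) if name]
    return KNOWN_BAD_EGGS.union(names)


def decode_egg_info(path, data, environ=os.environ):
    """Decode strictly; ignore the error only for known troublemakers."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        if os.path.basename(path) in bad_eggs(environ):
            return None  # known bad egg: silently disregard
        raise  # unknown package: keep failing loudly


raw = b"Author: G\xf6ran\n"  # invented sample with a non-UTF-8 byte
known = decode_egg_info("/usr/lib/pymodules/python2.7/rpl-1.5.5.egg-info", raw, environ={})
```

Unknown packages still raise, which preserves the pressure on maintainers to fix their metadata.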
You make some good points. I do want to be cognizant of over-complicating the implementation when it may just be one or two packages that are bad eggs. And that leads me to wonder, does it really need to fail fast and hard here, or just be noisy? I've pushed another implementation that I believe will always suppress decoding errors, but will log a warning if such an encoding issue is detected. How does this approach strike you?
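A minimal sketch of the "be noisy instead of failing" approach, assuming the warning is raised through Python's `warnings` module and that a bad byte is detected by looking for the U+FFFD replacement character after decoding. The function name, message text, and sample bytes are illustrative; the implementation that was pushed is not shown in this thread.

```python
import warnings

REPLACEMENT_CHAR = u"\ufffd"


def read_metadata_lenient(data, path="<metadata>"):
    """Always decode, replacing bad bytes, and warn if any were replaced."""
    text = data.decode("utf-8", "replace")
    if REPLACEMENT_CHAR in text:
        warnings.warn("{} could not be properly decoded in UTF-8".format(path))
    return text


raw = b"Author: G\xf6ran\nVersion: 1.5.5\n"  # invented sample with a non-UTF-8 byte

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text = read_metadata_lenient(raw, path="rpl-1.5.5.egg-info")
```

One caveat of this detection: a file that legitimately contains U+FFFD would also trigger the warning, which seems an acceptable false positive for metadata files.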
This seems to me like an excellent solution. Thank you!
My Ubuntu installation has an egg-info that pkg_resources fails to read with a UnicodeDecodeError. This made all sorts of things fail in my virtual environment.
The problematic egg-info file is attached, with an added extension (txt) to fool GitHub into accepting the attachment.
rpl-1.5.5.egg-info.txt
This is with python 2.7.2 and setuptools aad4a69.
To make my virtualenv work properly, I had to patch pkg_resources/__init__.py, replacing the line:
in EggInfoDistribution._reload_version with this:
I can make a pull request out of this if you think this is a good solution.
BTW, the stack trace looks similar to that of #531. You can also see multiple other people running into the same problem on stackoverflow.com, here, here, and here.