Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbundling logic still breaks downstream. #2870

Closed
Lukasa opened this issue Nov 10, 2015 · 29 comments
Closed

Unbundling logic still breaks downstream. #2870

Lukasa opened this issue Nov 10, 2015 · 29 comments

Comments

@Lukasa
Copy link
Member

Lukasa commented Nov 10, 2015

Originally reported by @fasaxc. Interested parties: @sigmavirus24 @dcramer @eriol @ralphbean @warsaw

We added support for a new form of unbundling logic in #2567. Unfortunately, this seems not to work properly. Specifically, the presence of requests in a system makes it impossible to catch urllib3 exceptions correctly: see this Stack Overflow question for more.

This problem seems to specifically affect the urllib3 sub-packages, not urllib3 itself. One proposed change is to add every urllib3 sub-package that gets imported to the list of modules requests explicitly imports in this stub file. This is pretty gross, but might work. Does anyone have other suggestions?

@fasaxc
Copy link

fasaxc commented Nov 10, 2015

The issue I'm seeing is that, after importing first urllib3.exceptions then requests.package.urllib3.exceptions

sys.modules["urllib3"].exceptions is not sys.modules["urllib3.exceptions"]
sys.modules["urllib3"].exceptions is sys.modules["requests.packages.urllib3.exceptions"]

Importing requests.package.urllib3.exceptions results in the exceptions module being loaded a second time because patching sys.modules["requests.packages.urllib3"] doesn't also patch sys.modules["requests.packages.urllib3.exceptions"] so the import mechanism loads a second copy. Then, it assigns

modules["requests.packages.urllib3"].exceptions = sys.modules["requests.packages.urllib3.exceptions"]

but modules["requests.packages.urllib3"] is an alias for modules["urllib3"] so it clobbers modules["urllib3"].exceptions.

@Lukasa
Copy link
Member Author

Lukasa commented Nov 10, 2015

So we definitely broke this. On the other hand, I don't really understand the Python import logic well enough to effectively fix this problem. Thoughts?

@sigmavirus24
Copy link
Contributor

See also https://github.com/kennethreitz/requests/issues/2867

That SO question was about 2.6.0 I thought. Is this still a problem with 2.8.x?

@Lukasa
Copy link
Member Author

Lukasa commented Nov 10, 2015

Oh that's a good point, did this get backported into the distros?

@sigmavirus24
Copy link
Contributor

So 2.6.x has the old VendorAlias logic from pip. 2.7.0 had nothing but some distros (Debian specifically, more specifically @warsaw) backported the patch from @untitaker that was released in 2.8.0. So version information is very important here.

@Lukasa
Copy link
Member Author

Lukasa commented Nov 10, 2015

My recollection is that @fasaxc was using 2.7.0 from Ubuntu 14.04 trusty-liberty, but I may be mistaken.

@sigmavirus24
Copy link
Contributor

Ubuntu may have pulled in Debian's patch from @warsaw. In that case, this is still relevant. (Alternatively, Ubuntu may have continued using the VendorAlias from 2.6.x and we should check that.)

@untitaker
Copy link
Contributor

If it turns out that Debian actually does have the new logic, I'd say it's a good time to think about actually giving up on vendoring urllib3.

On 10 November 2015 16:20:13 CET, Ian Cordasco notifications@github.com wrote:

Ubuntu may have pulled in Debian's patch from @warsaw. In that case,
this is still relevant. (Alternatively, Ubuntu may have continued using
the VendorAlias from 2.6.x and we should check that.)


Reply to this email directly or view it on GitHub:
https://github.com/kennethreitz/requests/issues/2870#issuecomment-155449214

Sent from my phone. Please excuse my brevity.

@Lukasa
Copy link
Member Author

Lukasa commented Nov 10, 2015

@untitaker I can understand that, but I don't believe we can really discuss doing that unless we get the whole team together, which realistically means PyCon 2016.

@warsaw
Copy link

warsaw commented Nov 10, 2015

On Nov 10, 2015, at 07:20 AM, Ian Cordasco wrote:

Ubuntu may have pulled in Debian's patch from @warsaw. In that case, this is
still relevant. (Alternatively, Ubuntu may have continued using the
VendorAlias from 2.6.x and we should check that.)

Ubuntu tracks the Debian version. There are no deltas, except for some
additional fixes backported to the Trusty version.

% rmadison -u debian requests
requests | 0.12.1-1bpo60+1 | squeeze-backports | source
requests | 0.12.1-1+deb7u1 | oldstable | source
requests | 2.0.0-1
bpo70+2 | wheezy-backports | source
requests | 2.4.3-6 | stable | source
requests | 2.4.3-6 | stable-kfreebsd | source
requests | 2.8.1-1 | testing | source
requests | 2.8.1-1 | unstable | source

% rmadison -u ubuntu requests
requests | 0.8.2-1 | precise/universe | source
requests | 2.2.1-1 | trusty | source
requests | 2.2.1-1ubuntu0.2 | trusty-security | source
requests | 2.2.1-1ubuntu0.3 | trusty-updates | source
requests | 2.4.3-6 | vivid | source
requests | 2.7.0-3 | wily | source
requests | 2.7.0-3 | xenial | source
requests | 2.8.1-1 | xenial-proposed | source

@sigmavirus24
Copy link
Contributor

trusty-liberty is 14.04's OpenStack Liberty Archive then?

@Lukasa
Copy link
Member Author

Lukasa commented Nov 10, 2015

I believe so, yes.

@warsaw
Copy link

warsaw commented Nov 10, 2015

On Nov 10, 2015, at 06:24 AM, Cory Benfield wrote:

Oh that's a good point, did this get backported into the distros?

Don't forget, @eriol is really doing most of excellent work on this package
for Debian!

Our 2.8.1-1 removed the devendorizing patch we were carrying separately, so
these days we are really only carrying a few deltas from upstream:

https://anonscm.debian.org/cgit/python-modules/packages/requests.git/tree/debian/patches

Of course, we prefer to reduce the differences between the Debian version and
upstream.

@Lukasa
Copy link
Member Author

Lukasa commented Nov 10, 2015

Hmm. Can we attempt to discover whether this problem exists in Debian's released packages for 2.8.x?

@untitaker
Copy link
Contributor

Lukasa, who do you mean exactly when you say "the whole team"? Isn't Kenneth some sort of BDFL, so his decision would suffice?

On 10 November 2015 16:51:56 CET, Barry Warsaw notifications@github.com wrote:

On Nov 10, 2015, at 06:24 AM, Cory Benfield wrote:

Oh that's a good point, did this get backported into the distros?

Don't forget, @eriol is really doing most of excellent work on this
package
for Debian!

Our 2.8.1-1 removed the devendorizing patch we were carrying
separately, so
these days we are really only carrying a few deltas from upstream:

https://anonscm.debian.org/cgit/python-modules/packages/requests.git/tree/debian/patches

Of course, we prefer to reduce the differences between the Debian
version and
upstream.


Reply to this email directly or view it on GitHub:
https://github.com/kennethreitz/requests/issues/2870#issuecomment-155460186

Sent from my phone. Please excuse my brevity.

@Lukasa
Copy link
Member Author

Lukasa commented Nov 10, 2015

@untitaker He is, correct. However, of the three maintainers he's the one opposed to unbundling.

Ian and I are both unanimous that we'd like to unbundle. However, we have a duty of care over this library, and that includes not changing things that Kenneth cares a great deal about without his explicit say-so.

I don't want to have a detailed discussion with Kenneth about this remotely, because this issue is close to all our hearts and we should discuss it in an environment that makes it easy for us to treat each other like humans, with respect and kindness. Those two qualities are often lost online, and that would be a tragedy: our working relationships are more important than that.

@eriol
Copy link
Contributor

eriol commented Nov 11, 2015

@Lukasa I'm sad to confirm but yes, the problem actually exists in Debian testing (with the new import machinery).
I tried the same setup of https://github.com/kennethreitz/requests/issues/2867 using only system packages (so I took python-etcd from sid) and I can confirm it.
I looked at python-etcd code, but the problem was already explained by @fasaxc and is about exceptions.

On Debian this is what I got:

import requests
import urllib3

from urllib3.exceptions import MaxRetryError

print(urllib3.exceptions.MaxRetryError is MaxRetryError)  # False

I'm real sorry I did not noticed when we discussed https://github.com/kennethreitz/requests/pull/2567.

I'm not neither an import logic expert, but your proposal seems the only way to fix this, without unvendoring. Maybe we can ask to @brettcannon if he can take a look at this: I can volunteer to recap all the story so far.

Also, I just want to say thanks to you, @sigmavirus24 and @kennethreitz because although you can just mark this as wontfix, we started working together talking without forget that we are all human beings.
So, thank you.

@Lukasa
Copy link
Member Author

Lukasa commented Nov 11, 2015

@eriol Thanks for investigating. =) And thanks for working with us on this. Again, I can't reiterate enough, it's really important to us to get to a place where everyone's getting a good experience. We're not there yet, but we'll keep trying.

I'd be delighted to see if someone who knows the import machinery really well can propose a better solution to this problem. If you're willing to do the legwork on ramping up @brettcannon that would be even better @eriol.

Otherwise, we should aim to take that boring manual step or see if we can avoid the problem in requests in some other way.

@sigmavirus24
Copy link
Contributor

Otherwise, we should aim to take that boring manual step or see if we can avoid the problem in requests in some other way.

We should do this as a stop-gap solution for now. It will benefit the users and it's trivial for now. Meanwhile we should investigate longer-term solutions to this.

@brettcannon
Copy link
Member

I'm traveling this week but I can try and take a look next week.

@brettcannon
Copy link
Member

Basically @fasaxc is right in his analysis of the problem. The tricky bit is that this is all influenced by how you actually import something. You have to realize each of the following statements use slightly different logic to get you what you want:

import urllib3.exceptions; HTTPError = urllib3.exceptions.HTTPError
from urllib3 import exceptions; HTTPError = urllib3.exceptions.HTTPError
from urllib3.exceptions import HTTPError

If you use the last approach then that will definitely trip you up in this instance because import will do an explicit import for urllib3.exceptions and then look for the HTTPError attribute. This means that since you are not patching urllib3.exceptions in sys.modules you end up having from requests.packages.urllib3 import HTTPError needing to import requests.packages.urllib3 directly and since that isn't in sys.modules it leads to a fresh import. Now you have two separate modules with two separate classes and thus they won't succeed in a issubclass() check in an except clause.

Now to solve this, @Lukasa was right and you can just patch every module in urllib3. Another option is a custom importer that handles just this case, but I don't know if you want such a magical solution. A third option is to insert an object into sys.modules who uses __getattribute__ to direct to the proper module and handle all magical tweaks to sys.modules. And lastly, the simplest solution is either let Debian handle their own desire to monkeypatch vendored code or stop vendoring stuff entirely.

@Lukasa
Copy link
Member Author

Lukasa commented Nov 17, 2015

Thanks so much @brettcannon. 🍰 is owed. =)

Ok folks, given the set of solutions proposed, does anyone have preferences?

@sigmavirus24
Copy link
Contributor

Thank you @brettcannon!

@sigmavirus24
Copy link
Contributor

So I'm enumerating the options and asking questions to make sure I understand correctly.

you can just patch every module in urllib3

As in, ensuring that sys.modules has both urllib3.submodule and requests.packages.urllib3.submodule? I think this is the least magic way and given that @Lukasa and I are both core developers of urllib3, we will catch any new submodules that need to be added to requests' patching logic. I'm most strongly in favor of this one.

Another option is a custom importer that handles just this case, but I don't know if you want such a magical solution.

We had that option and it broke many a thing. I'm not in favor of going back down that route.

insert an object into sys.modules who uses __getattribute__ to direct to the proper module and handle all magical tweaks to sys.modules.

Hm. This sounds like something in the vein of option 2 but a little less magic-y. It still feels like some abuse of the module system and like it might cause us problems. If we come up with something like this, I would appreciate it if @fasaxc and @eriol would commit to testing this with python-etcd and making sure things still aren't broken before shipping a release with it.

And lastly, the simplest solution is either let Debian handle their own desire to monkeypatch vendored code or stop vendoring stuff entirely.

Breaking this into two sub-suggestions:

  • "let Debian handle their own desire to monkey patch vendored code" No. I don't want @eriol to have to figure that out. Vendoring is a position of this project. As much as I dislike downstreams unvendoring things, it isn't the individual maintainer's fault or decision and I don't want to push this work onto their backs.
  • "stop vendoring stuff entirely" Many people constantly badger (or even, at times, yell) this at us (I know you're not Brett) but it simply just won't happen like that (if it happens at all).

My 2¢:

We can take a short term solution of the first option for a 2.8.2 release and get that updated patch to @eriol, @warsaw, @ralphbean, etc. so distros can fix their packages while taking a look at the feasibility and reliability of option 3. I still prefer the first option to the third, but the third may have benefits I'm not seeing due to the bad taste left in my mouth by the VendorAlias (a.k.a., option 2).

@eriol
Copy link
Contributor

eriol commented Nov 17, 2015

@brettcannon many thanks!

I'm also in favour of the less magic solution.
I will be happy to test before the release python-etcd and all the dependants of requests that can be affected by this: using codesearch.debian.net I noticed that also cinder and proliantutils import urllib3's exceptions using requests.packages.urllib3, so I will check also them.

@sigmavirus24 I really appreciate your words, thanks!

@warsaw
Copy link

warsaw commented Nov 17, 2015

On Nov 17, 2015, at 06:14 AM, Ian Cordasco wrote:

As in, ensuring that sys.modules has both urllib3.submodule and
requests.packages.urllib3.submodule? I think this is the least magic way
and given that @Lukasa and I are both core developers of urllib3, we will
catch any new submodules that need to be added to requests' patching
logic. I'm most strongly in favor of this one.

It's mildly disconcerting that a library would fiddle with another library's
sys.modules namespace, but I agree that this is probably the least magical
(and thus most likely to work) way. Maybe we can convince @brettcannon to
build a nicer foolproof module alias system for 3.6. :)

We can take a short term solution of the first option for a 2.8.2 release
and get that updated patch to @eriol, @warsaw, @ralphbean, etc. so distros
can fix their packages while taking a look at the feasibility and reliability
of option 3. I still prefer the first option to the third, but the third may
have benefits I'm not seeing due to the bad taste left in my mouth by the
VendorAlias (a.k.a., option 2).

Yep, I suspect that #1 may be the best approach, and that #3 would be
difficult to debug if things go south. It may be that no solution is perfect,
given Python's current import semantics and implementation, in which case
doing the best you can with the least magic (i.e. most discoverable and
debuggable) would suck less.

Thanks!

@brettcannon
Copy link
Member

I created http://bugs.python.org/issue25649 for the aliasing idea that @warsaw had (I wouldn't count on it getting a blessed solution, but you never know :) .

And just to toss in my opinion, the explicit aliasing is the safest, most compatible way to do things (it's still not fool-proof since __name__, the spec object, etc. will still be the other name, but it's better than nothing). Getting a custom importer right is hard and using a non-module object is actually done in the wild, but some people make assumptions as to the kind of objects they get back from an import so it really depends on your user base whether it will work or not.

@sigmavirus24
Copy link
Contributor

With the recent unbundling, I'm closing this as it should no longer be an issue.

@eriol
Copy link
Contributor

eriol commented Aug 4, 2017

Just a quick note to say also that I was able to drop all the Debian specific patches: https://tracker.debian.org/news/860709.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants