[WIP] Remove httplib. #1067
Sorry @nateprewitt, I know that you worked hard on this, but h11 handles this for us more effectively than we can, so all of this code is redundant with its own logic. It's good that you did it, though, because it means strict content-length enforcement is finally in place!
Current coverage is 95.74% (diff: 78.58%)
@@             master    #1067    diff @@
========================================
  Files          21       21
  Lines        1900     2233    +333
  Methods         0        0
  Messages        0        0
  Branches        0        0
========================================
+ Hits         1900     2138    +238
- Misses          0       95     +95
  Partials        0        0
I'm beginning to get a bit more aggressive about removing things we no longer need. That'll help with a future refactoring effort: we won't spend our time trying to refactor in a content-length check (for example) that is no longer required because h11 does it for us.
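For anyone wondering what "strict about content-length" means in practice, here is a rough sketch of the kind of check being discussed. This is not h11's code and not the code removed from this branch; the names IncompleteBody and read_exact_body are made up for illustration. The idea is only that a body shorter than its declared Content-Length is treated as an error instead of being returned silently.

```python
import io


class IncompleteBody(Exception):
    """Hypothetical error: the body ended before Content-Length bytes arrived."""


def read_exact_body(stream, content_length):
    """Read exactly content_length bytes from stream, or raise IncompleteBody."""
    chunks = []
    remaining = content_length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # The peer stopped sending before delivering what it promised.
            raise IncompleteBody("expected %d more bytes" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# A body that matches its declared length is returned as-is.
complete = read_exact_body(io.BytesIO(b"hello"), 5)

# A truncated body is reported instead of being passed along.
try:
    read_exact_body(io.BytesIO(b"hel"), 5)
    truncated_detected = False
except IncompleteBody:
    truncated_detected = True
```

With a protocol library doing the equivalent bookkeeping internally, a hand-written check like this in the calling code becomes redundant.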
This is a temporary change that we'll almost certainly revert, but I want to get the tests passing so that others can help out.
So, as I've been cleaning up the builds, I've noticed an issue: App Engine. One major side effect of this refactor is that we lose the tight integration with GAE that we've had in the past, because we are no longer using httplib. On top of that, we've generally been in favour of drastically changing the API we work with, which means it wouldn't necessarily be easy to get GAE working again. For this reason, I'd like to tag @jonparrott to weigh in here, as well as the wider community. How do we foresee our GAE integration working in the future? Will we want to factor our code in such a way that it becomes possible to drop in an httplib-style backend like GAE's?
So for the short term I'm going to mark the GAE builds as "allowed to fail" on this branch, because I'd like us to come back to them but I don't want them standing in the way right now. =D
Woo, ok, tests are green! So, I have opened up a project and shoved a whole lot of cards in there for things that people can do if they want to start getting involved. I'm going to continue pushing forward on some of the major refactoring work, but at this point now that the tests are passing (albeit with missing coverage) we have a decent basis to start work with. For that reason, I'm going to move this to a
Closing in favour of #1068.
Re: GAE, Jon can definitely add more context, but my understanding is that GAE doesn't actually need httplib. Even today, it just straight-up uses Assuming we keep a manager hierarchy like we have today, I don't think GAE support would need to change much.
Echoing what @shazow said: Today we're basically just adapting App Engine's urlfetch API to match urllib3's. As long as that remains a viable option, we can continue to support GAE. Also note that I'm more than willing to help with that. To add some more context to this: I've been continuously pushing for App Engine standard to be "less weird". Some of that is starting to bear fruit, but App Engine's support for sockets is still "weird". I would expect to see App Engine standard's Python runtime become less weird over time. Also keep in mind that App Engine standard is currently limited to Python 2.7 only, and any Python 3 support on standard would likely look closer to "vanilla" Python than "App Engine" Python. All this is to say: don't sweat App Engine that much. As long as the API allows the right level of abstraction we can use the same approach we use today.
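To make "the right level of abstraction" a little more concrete, here is a minimal sketch of the adapter idea. Everything in it is hypothetical: fake_platform_fetch is a stand-in written for this example and is not the real App Engine urlfetch API, and FetchBackend and SimpleResponse are not urllib3 classes. It only shows the shape of wrapping a platform-provided fetch call behind a small request interface.

```python
class SimpleResponse(object):
    """Hypothetical minimal response container."""

    def __init__(self, status, headers, body):
        self.status = status
        self.headers = headers
        self.body = body


def fake_platform_fetch(url, method="GET", headers=None, payload=None):
    """Stand-in for a platform fetch call. NOT a real App Engine API."""
    return {
        "status_code": 200,
        "headers": {"content-type": "text/plain"},
        "content": b"ok",
    }


class FetchBackend(object):
    """Adapts a fetch-style callable to a request()-style interface."""

    def __init__(self, fetch):
        self._fetch = fetch

    def request(self, method, url, headers=None, body=None):
        raw = self._fetch(url, method=method, headers=headers or {}, payload=body)
        # Translate the platform's result into the shape callers expect.
        return SimpleResponse(raw["status_code"], raw["headers"], raw["content"])


backend = FetchBackend(fake_platform_fetch)
response = backend.request("GET", "http://example.com/")
```

As long as the calling code depends only on something like request(), the backend underneath can be sockets plus a protocol library or a platform fetch service.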
This branch contains the most substantial chunk of work I have ever done on urllib3: the beginnings of an effort to remove our dependency on httplib.
Removing our dependency on httplib has been a long-held desire of the urllib3 project. In fact, it's so desired that #776 (our wishlist of features for v2.0) includes in a section entitled "unlikely longshots" the phrase "Get rid of httplib dependence".
This branch does exactly that, by bringing the http.client implementation from Python 3.5 that we were already used to interacting with into our codebase and then ripping its guts out and replacing them with @njsmith's h11 library.
As you can see from the commit log, my work on this began back in October, but I've sat on it for the past two months in order to get it to a place where I had something worth showing. Now I finally do. On my machine, a run of tox -e py27 passes all tests, albeit with incomplete test coverage.
Initially, my goal was to do this in the least-disruptive way possible: that's why I began by trying to replicate the logic of httplib inside urllib3 but on top of h11. I would have succeeded in that goal if I'd been able to avoid rewriting any tests in any substantive way.
Unfortunately, as you can see from this change, I did not succeed at that. In particular, it was simply not possible to continue with some behaviours that urllib3 had previously promised, such as maintaining header case. I also had to bend over backwards to preserve some other behaviours urllib3 users have expected, such as reporting chunk boundaries, which is probably not something we should ever have promised to do.
From my perspective, then, I think that if we want to continue this work it needs to be considered a breaking change for urllib3. Any change this substantial simply cannot be done in a non-breaking way. A huge number of things will change: our tolerance of errors (we're much stricter now because h11 is much more insistent about being spec-compliant), our wire format (h11 does some case normalization on header field names), and maybe even some of our output formats.
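As an illustration of the wire-format point, the sketch below shows what case normalization of header field names looks like from the caller's side. It assumes the normalization is plain lowercasing, which is an assumption made for this example and not a statement about exactly what h11 emits; normalize_header_names is a made-up helper. HTTP header field names are case-insensitive, so a conforming peer should not care, but anything that compared the raw bytes sent would see a difference.

```python
def normalize_header_names(headers):
    """Return (name, value) pairs with field names lowercased.

    Values are left untouched; only the field names are normalized.
    """
    return [(name.lower(), value) for name, value in headers]


# What the caller passed in, with the casing they chose.
sent = [("Content-Type", "text/plain"), ("X-Request-ID", "abc123")]

# What would go out under the lowercasing assumption.
on_the_wire = normalize_header_names(sent)
```

This is why "maintaining header case" cannot be promised once normalization happens below urllib3.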
This is also still very much a work in progress. There are TODO statements everywhere. The code is poorly factored: it's in a wacky in-between state, halfway between trying to maintain the httplib abstraction and totally rejecting it in favour of something much clearer. A whole lot of our code that worked around httplib flaws is no longer necessary and can be removed, but hasn't been yet. A better underlying implementation that doesn't perform socket writes would also be possible, as would some refactoring of code I wrote to pull out common features. And the Python 3 tests don't pass, mostly because right now this new branch emits headers as bytestrings rather than unicode strings.
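For the bytestring problem specifically, one possible shim is sketched below. It is not a decision and not code from this branch: headers_to_text is a hypothetical helper, and the choice of latin-1 is an assumption, made because it maps every byte to a code point and so never fails to decode.

```python
def headers_to_text(headers, encoding="latin-1"):
    """Decode bytes header names and values to text, leaving text untouched."""
    decoded = []
    for name, value in headers:
        if isinstance(name, bytes):
            name = name.decode(encoding)
        if isinstance(value, bytes):
            value = value.decode(encoding)
        decoded.append((name, value))
    return decoded


# Headers as a bytes-oriented layer might hand them back.
raw = [(b"content-type", b"text/html"), (b"server", b"caf\xe9")]
text = headers_to_text(raw)
```

Whether a conversion like this belongs in urllib3 at all depends on how we answer the header-type question for Python 3.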
However, at this point it's no longer feasible for me to work on this branch alone. Further progress requires actual governance decisions, and I am not qualified to make those on my own, nor should I. urllib3 has a great collection of core maintainers and regular contributors whose opinions and work I value, and whose help I would like to solicit. As I see it, we need to answer the following questions:
stream() expose chunk boundaries?
str header field names and values on Python 3?
Response object that contains a httplib Response object within it?
When @shazow originally made me interim lead maintainer, he told me that I had complete authority because he could always undo any change I made that he didn't like. I think this particular change is an exception: I don't think I have any degree of "lead maintainer" authority on this. We're going to need @shazow to express some opinions too.
While I'm here, I'd also like to tag a few community members that are particularly likely to be interested in this proposal:
/cc @njsmith @durin42 @dstufft