Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Remove httplib. #1067

Closed
wants to merge 64 commits into from
Closed

[WIP] Remove httplib. #1067

wants to merge 64 commits into from

Conversation

Lukasa
Copy link
Sponsor Contributor

@Lukasa Lukasa commented Dec 8, 2016

This branch contains the most substantial chunk of work I have ever done on urllib3: the beginnings of an effort to remove our dependency on httplib.

Removing our dependency on httplib has been a long-held desire of the urllib3 project. In fact, it's so desired that #776 (our wishlist of features for v2.0) includes in a section entitled "unlikely longshots" the phrase "Get rid of httplib dependence".

This branch does exactly that, by bringing the http.client implementation from Python 3.5 that we were already used to interacting with into our codebase and then ripping its guts out and replacing them with @njsmith's h11 library.

As you can see from the commit log, my work on this began back in October, but I've sat on it for the past two months in order to get it to a place where I had something worth showing. Now I finally do. On my machine, a run of tox -e py27 passes all tests, albeit with incomplete test coverage:

py27 runtests: commands[1] | nosetests
SSS....................S....................................................................................................................................S.....................................................S........................................................................................................................................................................................................................SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS...............................S..S..............................S................................................................................
Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
urllib3                      20      0   100%   
urllib3._collections        148      0   100%   
urllib3.connection          515    103    80%   80-82, 158, 162, 175, 229, 261, 285, 295-300, 306-338, 341-364, 368-372, 375, 385-392, 396-399, 408, 449, 453, 471, 567, 571-572, 602-611, 615, 651-652, 657, 729-732, 736-740, 755, 781, 830, 855-856, 868-869, 905-906, 972-973, 985, 1005
urllib3.connectionpool      240      1    99%   256
urllib3.contrib               0      0   100%   
urllib3.contrib.socks        35      0   100%   
urllib3.exceptions           73      2    97%   216, 219
urllib3.fields               62      0   100%   
urllib3.filepost             32      0   100%   
urllib3.poolmanager         124      0   100%   
urllib3.request              32      0   100%   
urllib3.response            232      1    99%   453
urllib3.util                  1      0   100%   
urllib3.util.connection      47      0   100%   
urllib3.util.request         37      0   100%   
urllib3.util.response        19      5    74%   54-65
urllib3.util.retry          121      0   100%   
urllib3.util.selectors      295     45    85%   101, 360-423
urllib3.util.ssl_            64      0   100%   
urllib3.util.timeout         48      0   100%   
urllib3.util.url             97      0   100%   
urllib3.util.wait            14      1    93%   21
-------------------------------------------------------
TOTAL                      2256    158    93%   
----------------------------------------------------------------------
Ran 604 tests in 13.646s

OK (SKIP=42)
___________________________________ summary ____________________________________
  py27: commands succeeded
  congratulations :)

Initially, my goal was to do this in the least-disruptive way possible: that's why I began by trying to replicate the logic of httplib inside urllib3 but on top of h11. I would have succeeded in that goal if I'd been able to avoid rewriting any tests in any substantive way.

Unfortunately, as you can see from this change, I did not succeed at that. In particular, it was simply not possible to continue with some behaviours that urllib3 had previously promised, such as maintaining header case. I also had to bend over backwards to preserve some other behaviours urllib3 users have expected, such as reporting chunk boundaries, which is probably not something we should ever have promised to do.

From my perspective, then, I think that if we want to continue this work it needs to be considered a breaking change for urllib3. Any change this substantial simply cannot be done in a non-breaking way. A huge number of things will change: our tolerance of errors (we're much stricter now because h11 is much more insistent about being spec-compliant), our wire format (h11 does some case normalization on header field names), and maybe even some of our output formats.

This is also still very much a work in progress. There are TODO statements everywhere. The code is poorly factored: it's a wacky in-between state between trying to maintain the httplib abstraction and a total rejection of it in favour of something much clearer. A whole lot of our code that worked around httplib flaws is no longer necessary and can be removed, but hasn't yet been. A better underlying implementation that doesn't perform socket writes would also be possible, as would some refactoring of code I wrote to pull out common features. And the Python 3 tests don't pass, mostly because of the fact that right now this new branch emits headers as bytestrings rather than unicode strings.

However, at this point it's no longer feasible for me to work on this branch alone. Further progress requires actual governance decisions, and I am not qualified to make those on my own, nor should I. urllib3 has a great collection of core maintainers and regular contributors whose opinions and work I value, and would like to solicit to help me. As I see it, we need to answer the following questions:

  1. Is this worth doing? Do we want to be rid of httplib so badly that we will perform a massive codebase rewrite to get rid of it?
  2. If so, are we happy with using h11?
    1. If we are, do we depend on it using pip, or vendor it? How does our decision affect Requests?
  3. Should this be the beginning of a v2.0 branch that will contain other bits of work from v2.0 Wishlist #776?
    1. If it should, how many breakages do we want to roll into one change? Do we want to go all in and leave a massive codebase change?
  4. What pieces of functionality do we want to retain from the old codebase? In particular, do we want:
    1. To still have stream() expose chunk boundaries?
    2. To return str header field names and values on Python 3?
    3. To return different types for header field names and values on Python 3?
    4. To implement buffered I/O in the low-level primitives?
    5. To maintain the weird two-level Response structure where we have a urllib3 Response object that contains a httplib Response object within it?
  5. How would we want to manage a rollout of this functionality?
  6. How do we want to review it?
  7. What kind of timeline do we want for releasing it?
  8. If we decide to go ahead with a v2.0, should we freeze the current master branch for everything except bug/security fixes until we complete v2.0?

When @shazow originally made me interim lead maintainer, he told me that I had complete authority because he could always undo any change I made that he didn't like. I think this particular change is an exception: I don't think I have any degree of "lead maintainer" authority on this. We're going to need @shazow to express some opinions too.

While I'm here, I'd also like to tag a few community members that are particularly likely to be interested in this proposal:

/cc @njsmith @durin42 @dstufft

Sorry @nateprewitt, I know that you worked hard on this, but h11 handles
this for us more effectively than we can, so all of this code is
redundant with its own logic. It's good that you did it, though, becuase
it means our transition to being strict about content-length enforcement
is finally present!
@codecov-io
Copy link

codecov-io commented Dec 9, 2016

Current coverage is 95.74% (diff: 78.58%)

Merging #1067 into master will decrease coverage by 4.25%

@@           master      #1067   diff @@
========================================
  Files          21         21          
  Lines        1900       2233   +333   
  Methods         0          0          
  Messages        0          0          
  Branches        0          0          
========================================
+ Hits         1900       2138   +238   
- Misses          0         95    +95   
  Partials        0          0          

Powered by Codecov. Last update e6cae79...48780ee

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Dec 9, 2016

I'm beginning to get a bit more aggressive about removing things we no longer need. That'll help with a future refactoring effort: we won't spend our time trying to refactor in a content-length check (for example) that is no longer required because h11 does it for us.

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Dec 12, 2016

So, as I've been cleaning up the builds, I've noticed an issue: AppEngine. One major side-effect of this refactor is that we lose the tight integration with GAE that has been used in the past, because we are no longer using httplib. On top of that, we've generally be in favour of drastically changing the API we work with, meaning that it wouldn't even necessarily be that easy to get GAE working.

I think, for this reason, I'd like to tag @jonparrott to weigh in here, as well as the wider community. How do we forsee our GAE integration working in the future? Will we want to try to factor our code in such a way that it becomes possible to drop back in a httplib-style backend like GAE?

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Dec 12, 2016

So for the short term I'm going to mark the GAE builds as "allowed to fail" on this branch, because I'd like us to come back to them but I don't want them standing in the way right now. =D

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Dec 12, 2016

Woo, ok, tests are green!

So, I have opened up a project and shoved a whole lot of cards in there for things that people can do if they want to start getting involved. I'm going to continue pushing forward on some of the major refactoring work, but at this point now that the tests are passing (albeit with missing coverage) we have a decent basis to start work with.

For that reason, I'm going to move this to a v2 branch on this repository and encourage people who want to get involved to start making pull requests against that branch. Please create issues and move project cards around to make it clear when something is being worked on by someone to avoid duplicating too much effort. I'm going to continue focusing on excising code that we don't need and trying to complete some of the refactoring so that I can reduce the feature surface, and then I'll look at the internal refactor to try to separate the common protocol code and the bits that are to do with "synchronous I/O", with the goal of enabling future work to support async code.

@Lukasa Lukasa mentioned this pull request Dec 12, 2016
@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Dec 12, 2016

Closing in favour of #1068.

@Lukasa Lukasa closed this Dec 12, 2016
@shazow
Copy link
Member

shazow commented Dec 12, 2016

Re: GAE, Jon can definitely add more context, but my understanding is that GAE doesn't actually need httplib. Even today, it just straight-up uses urlfetch when available (and would have to fall back to plain sockets if not). If we can have a place where we bypass anything that touches a socket and pass it straight into AppEngine's urlfetch when enabled, then that's all it will take. This would mean we'd only support a subset of features, but that's already true today.

Assuming we keep a manager hierarchy like we have today, I don't think GAE support would need to change much.

@theacodes
Copy link
Member

Echoing what @shazow said: Today we're basically just adapting App Engine's urlfetch API to match urllib3's. As long as that remains a viable option, we can continue to support GAE. Also note that I'm more than willing to help with that.

To add some more context to this - I've been continuously pushing for App Engine standard to be "less weird". Some of that is coming to bear fruit, but App Engine's support for sockets is still "weird". I would expect to see App Engine standard's Python runtime become less weird over time. Also keep in mind that App Engine standard is currently limited to Python 2.7 only, and any Python 3 support on standard would likely look closer to "vanilla" Python than "App Engine" Python.

All this is to say - don't sweat App Engine that much. As long as the API allows the right level of abstraction we can use the same approach we use today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet