Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster headers implementation #544

Merged
merged 30 commits into from
Feb 19, 2015
Merged

Faster headers implementation #544

merged 30 commits into from
Feb 19, 2015

Conversation

ml31415
Copy link
Contributor

@ml31415 ml31415 commented Feb 5, 2015

I had been trying to speed up urllib3 by monkeypatching with geventhttpclient. Unfortunately there is not too much gain, as lots of copying of response and header objects is done, which could largely be skipped, when the header and response objects would be sufficiently compatible and directly recognized and accepted as compatible, instead of being converted by lots of copying around.

So for the sake of better compatibility and speedups, I would have liked to directly create a header class - the standard urllib3.HTTPHeaderDict - by geventhttpclients c-parser when monkeypatching, which won't need any further processing from urllib3s response object. When I compared both header container implementations, turned out that the current urllib3 version isn't that fast as it could be. Below some timings comparing the old vs the new version, HTTPHeaderDict_ being the old version:

print "Empty initializations"
%timeit h = HTTPHeaderDict_()
%timeit x = HTTPHeaderDict()

print "Initializations with data"
%timeit h = HTTPHeaderDict_(asdf='ddd', abc='100', ddd='200', efg='300')
%timeit x = HTTPHeaderDict(asdf='ddd', abc='100', ddd='200', efg='300')

from random import choice
from string import ascii_lowercase as asciis
hdrs = dict((''.join(choice(asciis) for _ in range(9)),
             ''.join(choice(asciis) for _ in range(9))) for _ in range(200))

print "Initializations with more data"
%timeit h = HTTPHeaderDict_(hdrs)
%timeit x = HTTPHeaderDict(hdrs)

h = HTTPHeaderDict_(asdf='ddd', abc='100', ddd='200', efg='300')
x = HTTPHeaderDict(asdf='ddd', abc='100', ddd='200', efg='300')

print "Fetching items"
%timeit h.__getitem__('abc')
%timeit x.__getitem__('abc')

print "Copying"
%timeit h.copy()
%timeit x.copy()

print "Adding headers"
%timeit h = HTTPHeaderDict_(); [h.add(k,v) for k, v in hdrs.iteritems()];
%timeit x = HTTPHeaderDict(); [x.add(k,v) for k, v in hdrs.iteritems()];

h = HTTPHeaderDict_(hdrs)
x = HTTPHeaderDict(hdrs)

print "Iteration of keys"
%timeit [k for k in h]
%timeit [k for k in x]

print "Iteration of values"
%timeit [h[k] for k in h]
%timeit [x[k] for k in x]

print "Getting keys"
%timeit h.keys()
%timeit x.keys()

print "Getting values"
%timeit h.values()
%timeit x.values()

For my machine, I get the following timings:

Empty initializations
100000 loops, best of 3: 4.38 µs per loop
1000000 loops, best of 3: 858 ns per loop
Initializations with data
100000 loops, best of 3: 11.3 µs per loop
100000 loops, best of 3: 12 µs per loop
Initializations with more data
1000 loops, best of 3: 252 µs per loop
1000 loops, best of 3: 235 µs per loop
Fetching items
100000 loops, best of 3: 2.34 µs per loop
1000000 loops, best of 3: 1 µs per loop
Copying
100000 loops, best of 3: 11.7 µs per loop
100000 loops, best of 3: 5.36 µs per loop
Adding headers
1000 loops, best of 3: 267 µs per loop
1000 loops, best of 3: 242 µs per loop
Iteration of keys
10000 loops, best of 3: 58.3 µs per loop
100000 loops, best of 3: 17.9 µs per loop
Iteration of values
1000 loops, best of 3: 706 µs per loop
1000 loops, best of 3: 271 µs per loop
Getting keys
10000 loops, best of 3: 44.6 µs per loop
100000 loops, best of 3: 2.25 µs per loop
Getting values
1000 loops, best of 3: 671 µs per loop
1000 loops, best of 3: 267 µs per loop

The new version passes all current tests, though it comes with one downside, which is worth discussing: It's not storing the original case of the header items. While the current version stores that, and therefore can restore the case if desired e.g. when iterating over the keys, this implementation is consequently using lower case, except when pretty printing the object. The standards require case insensitivity, so not preserving the case seems to me like an acceptable, maybe even desireable behaviour. Furthermore, also the current implementation has to drop the case information, when joining same header items together.

In terms of compatibility, I hope this should work with all python versions, though I mainly tested it on 2.7.

python test/test_collections.py
................
----------------------------------------------------------------------
Ran 16 tests in 0.004s

OK

@r4ndsen
Copy link

r4ndsen commented Feb 5, 2015

👍

@sigmavirus24
Copy link
Contributor

The new version passes all current tests, though it comes with one downside, which is worth discussing: It's not storing the original case of the header items.

I for one think this is unacceptable. We're all aware of the specification, but preserving the data as it is received is important and something I'm certain I've observed people relying upon. That said, if there's a way to preserve it, and improve peformance, I'm 100% for it. These improvements are good. What's most surprising to me is how drastic it is for an empty initialization (although I'm uncertain exactly how common that really is).

@sigmavirus24
Copy link
Contributor

Further, in its current state, this PR isn't compatible with Python 3 (according to Travis CI) and doesn't achieve 100% test coverage. Those are 2 things that are going to block this more significantly in the near term than anything else.

@ml31415
Copy link
Contributor Author

ml31415 commented Feb 5, 2015

Just extended the tests for full coverage. Have some issues with the full test run locally, so I haven't uploaded it right now. In case the suggested version would otherwise be fine, I'll fix it for Python3, too.

@sigmavirus24
Copy link
Contributor

@ml31415 I'd rather you fix it and push for Python 3 so I can pull it and verify your benchmarks.

@ml31415
Copy link
Contributor Author

ml31415 commented Feb 5, 2015

Fixing the case preservation? I suppose that would kill most of the speed gains.

@sigmavirus24
Copy link
Contributor

@ml31415 Fixing the code on Python 3: https://travis-ci.org/shazow/urllib3/jobs/49604628

@ml31415
Copy link
Contributor Author

ml31415 commented Feb 5, 2015

The last failing test, test_httplib_headers_case_insensitive, seems to be the exactly the point in question, whether or not case preservation is required or not.

@ml31415
Copy link
Contributor Author

ml31415 commented Feb 6, 2015

I modified the changes, so that the original case information is preserved. After some tuning, the speed improvements are similar as the previous version.

Empty initializations
1000000 loops, best of 3: 1.01 µs per loop
Initializations with data
100000 loops, best of 3: 12.8 µs per loop
Initializations with more data
1000 loops, best of 3: 258 µs per loop
Fetching items
1000000 loops, best of 3: 940 ns per loop
Copying
100000 loops, best of 3: 6.29 µs per loop
Adding headers
1000 loops, best of 3: 237 µs per loop
Iteration of keys
100000 loops, best of 3: 17 µs per loop
Iteration of values
1000 loops, best of 3: 239 µs per loop
Getting keys
100000 loops, best of 3: 2.07 µs per loop
Getting values
1000 loops, best of 3: 251 µs per loop

# Only one item so far, need to convert the tuple to list
_dict_setitem(self, key_lower, [vals[0], vals[1], val])

def update_add(*args, **kwds):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, update_add is a weird name. Maybe load?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with extend. :)

@shazow
Copy link
Member

shazow commented Feb 6, 2015

Looks good overall. update_add is the messiest part, but I'm not sure how to clean it up. Renaming would be appreciated if you feel it makes sense.

Otherwise if everyone is onboard, I'm happy to merge. :)

@shazow shazow added this to the v1.10.1 milestone Feb 7, 2015
@ml31415
Copy link
Contributor Author

ml31415 commented Feb 7, 2015

I named it like that on purpose, in order not to mess with the previous update implementation. Previous update replaces items when hitting double occurances, update_add extends entries when required. It's meant for replacing for-loops containing headerobj.add(key, value), like seen in the request module, or as a general import function of arbitrary other header objects. I also added the from_httplib constructor for that reason.

I also don't care about renaming it, though. In case you're talking about the .add implementation itself, well, it's just what benchmarks say it's the fastest. I played around with it for quite a while to achieve similar timings as before, when not preserving case information. This is the only solution which came close. The idea is, assume the general case i.e. no such item present, and optimize that as good as possible. I also had tried out another implementation using lists all over instead of tuples for single headers, but thatturned out to be too slow, about a factor of 1.5 slower than now.

@shazow
Copy link
Member

shazow commented Feb 7, 2015

Yup, I got all that. :) I just mean that the name "update_add" is confusing. I understand why it exists, and I agree it's necessary. Another possible name would be "extend"? (Its behaviour is similar to list.extend in a way...)

@ml31415
Copy link
Contributor Author

ml31415 commented Feb 7, 2015

I suppose I'm quite uncreative with namings :) Better suggestions cordially welcome!

@shazow
Copy link
Member

shazow commented Feb 7, 2015

@Lukasa @sigmavirus24 Thoughts?


# For some reason, the test above doesn't run into the __eq__ function,
# lacking test coverage for line 165, so comparing directly here
self.assertFalse(self.d == 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean

self.assertNotEqual(len(self.d), 2)

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I referred to line 197 in the comment. Without line 201, the testrun reported missing coverage for _collections, line 163 now. I had expected, this not equal comparison the line above would also call __eq__ and reach that line, but for some reason it didn't.

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Feb 7, 2015

I think extend works for me.

@ml31415
Copy link
Contributor Author

ml31415 commented Feb 7, 2015

What remained a bit undefined is, when to return case-preserved headers and when not. E.g. keys() now returns the actual lower case headers, items() restores it. To me it would have been preferable, to have separate case-preserving functions, but the others are already expected to preserve case in some cases, e.g. test_connectionpool, line 621.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants