The core of Response.content joins a generator of CONTENT_CHUNK_SIZE (10KB) chunks with `b"".join`. That is suboptimal for several reasons:

- All the data has to be read into temporary `bytes` objects, then copied into the joined buffer.
- It's also memory-inefficient: the CPython implementation of `bytes.join` first converts the generator to a sequence, so if the content is 1GB, you will temporarily have 2GB of memory in use.
- 10KB isn't really big enough to amortise all the overheads (increasing it significantly improves performance).

It looks like this used to be done with `self.raw.read`, but it was changed to the current approach 8 years ago. I've tried a quick test switching back to `self.raw.read(decode_content=True)`, but it fails some unit tests, presumably because of subtleties in handling Content-Encoding. If the maintainers agree that this is worth pursuing, I can work on the corner cases and make a PR.

Expected Result

I expect `resp.content` from a non-streamed request to have performance similar to `resp.raw.read()` from a streamed request.

Actual Result

I've benchmarked `response.content` at 590 MB/s and `response.raw.read()` (see sample code below) at 3180 MB/s, which is 5.4x faster. With 10-25 Gb/s networking becoming pretty standard in the data centre, this represents a significant bottleneck.

Reproduction Steps

You'll need to run an HTTP server that can deliver a large file at high bandwidth (I happen to have Minio+Varnish on my local machine, but other servers, e.g. Apache, could be used). Then run the script below as `httpbench-requests.py all http://...`. Note that Python 3.8 (or possibly it was 3.7) improved the performance of `http.client.HTTPResponse.read`, so on older Python versions the difference in performance is less enormous, but still >2x on my machine.

Are you suggesting replacing `b"".join` with `BytesIO` for joining together all the 10KB pieces? It won't avoid having two copies of all the data around at once, because `BytesIO.getvalue` makes a copy. I haven't measured the performance, but I'd be surprised if it's any better.
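The benchmark script referred to above isn't included in this excerpt. The shape of the comparison can be sketched without any networking by contrasting a chunked join against a single large read from an in-memory buffer; the payload and chunk sizes below are illustrative choices, not the issue's actual benchmark:

```python
import io
import time

CHUNK = 10 * 1024                    # mirrors requests' CONTENT_CHUNK_SIZE (10KB)
PAYLOAD = b"x" * (64 * 1024 * 1024)  # 64MB of dummy data

def join_chunks(raw):
    # Roughly what Response.content does: consume small chunks, then join them.
    # Note the join temporarily holds both the chunk sequence and the result.
    return b"".join(iter(lambda: raw.read(CHUNK), b""))

def single_read(raw):
    # Roughly what resp.raw.read() does: one large read into one buffer.
    return raw.read()

for fn in (join_chunks, single_read):
    raw = io.BytesIO(PAYLOAD)
    start = time.perf_counter()
    data = fn(raw)
    elapsed = time.perf_counter() - start
    assert data == PAYLOAD
    print(f"{fn.__name__}: {len(data) / elapsed / 1e6:.0f} MB/s")
```

This removes the network from the picture entirely, so the absolute numbers won't match the issue's figures; it only illustrates the per-chunk overhead and the extra copy that the join-based path pays.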
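On the BytesIO suggestion: `getvalue` really does return an independent copy of the buffer contents, which is why switching to it wouldn't avoid the two-copies problem. A quick check:

```python
import io

buf = io.BytesIO()
buf.write(b"hello")
snapshot = buf.getvalue()  # returns a copy of the current contents
buf.write(b" world")

# Later writes don't affect the earlier snapshot, so at the moment of
# getvalue() both the internal buffer and the returned bytes coexist.
print(snapshot)        # b'hello'
print(buf.getvalue())  # b'hello world'
```

(`BytesIO.getbuffer` returns a zero-copy view instead, but it is a `memoryview`, not the `bytes` object that `Response.content` promises.)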