Performance: Response.content is unnecessarily slow #5503

Open
bmerry opened this issue Jun 18, 2020 · 3 comments

@bmerry (Contributor) commented Jun 18, 2020

The core of Response.content looks like this (where CONTENT_CHUNK_SIZE is 10 KB):

self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''

That is suboptimal for several reasons:

  1. All the data has to be read into temporary bytes chunks, then copied into the joined buffer.
  2. It's also memory-inefficient: the CPython implementation of bytes.join first converts the generator to a sequence, so if the content is 1 GB, you will temporarily have 2 GB of memory in use.
  3. 10 KB isn't really big enough to amortise the per-chunk overheads (increasing it significantly improves performance).

It looks like this used to be done with self.raw.read, but it was changed to the current approach 8 years ago. I've tried a quick test to switch back to self.raw.read(decode_content=True), but it's failing some unit tests, presumably because of subtleties in handling Content-Encoding. If the maintainers agree that this is worth pursuing then I can work on dealing with the corner cases to make a PR.
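
To make this concrete, here is a minimal standalone sketch (the function names and the stream parameter are just for illustration, not from requests): read_joined mimics what Response.content does today, while read_whole mimics the single-read direction, ignoring the Content-Encoding corner cases mentioned above.

import io

CONTENT_CHUNK_SIZE = 10 * 1024  # requests' default chunk size


def read_joined(stream: io.BufferedIOBase) -> bytes:
    # Mirrors what Response.content does today: pull fixed-size chunks and join
    # them. bytes.join materialises the iterator into a sequence first, so the
    # chunk list and the joined result are alive at the same time.
    chunks = iter(lambda: stream.read(CONTENT_CHUNK_SIZE), b'')
    return b''.join(chunks)


def read_whole(stream: io.BufferedIOBase) -> bytes:
    # Mirrors the proposed direction: a single read of the already-decoded
    # stream, with no per-chunk overhead and no intermediate chunk list.
    return stream.read()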

Expected Result

I expect resp.content from a non-streamed request to have similar performance to resp.raw.read() from a streamed request.

Actual Result

I've benchmarked response.content at 590 MB/s and response.raw.read() (see sample code below) at 3180 MB/s, about 5.4x faster. With 10-25 Gb/s networking becoming pretty standard in the data centre, this represents a significant bottleneck.

def load_requests_naive(url: str) -> bytes:
    with requests.get(url) as resp:
        return resp.content

def load_requests_stream(url: str) -> bytes:
    with requests.get(url, stream=True) as resp:
        return resp.raw.read()

Reproduction Steps

You'll need to run an HTTP server that can deliver a large file at high bandwidth (I happen to have Minio+Varnish on my local machine, but other servers, e.g. Apache, could be used). Then run the script below as httpbench-requests.py all http://.... Note that Python 3.8 (or possibly 3.7) improved the performance of http.client.HTTPResponse.read, so on an older Python version the difference is smaller, but still >2x on my machine.

#!/usr/bin/env python

import argparse
import gc
import time
from typing import Callable

import requests
import numpy as np


_Method = Callable[[str], bytes]
METHODS = {}


def method(name: str) -> Callable[[_Method], _Method]:
    def decorate(func: _Method) -> _Method:
        METHODS[name] = func
        return func

    return decorate


@method('requests-naive')
def load_requests_naive(url: str) -> bytes:
    with requests.get(url) as resp:
        return resp.content


@method('requests-stream-read')
def load_requests_stream(url: str) -> bytes:
    with requests.get(url, stream=True) as resp:
        return resp.raw.read()


def measure_method(method: str, args: argparse.Namespace) -> None:
    rates = []
    for i in range(args.passes):
        gc.collect()
        start = time.monotonic()
        data = METHODS[method](args.url)
        stop = time.monotonic()
        elapsed = stop - start
        rates.append(len(data) / elapsed)
        del data
    mean = np.mean(rates)
    std = np.std(rates) / np.sqrt(args.passes - 1)
    print('{}: {:.1f} ± {:.1f} MB/s'.format(method, mean / 1e6, std / 1e6))


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--passes', type=int, default=5)
    parser.add_argument('method')
    parser.add_argument('url')
    args = parser.parse_args()
    if args.method not in METHODS and args.method != 'all':
        parser.error('Method must be "all" or one of {}'.format(set(METHODS.keys())))

    if args.method == 'all':
        for method in METHODS:
            measure_method(method, args)
    else:
        measure_method(args.method, args)


if __name__ == '__main__':
    main()

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.9"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.8.0"
  },
  "platform": {
    "release": "5.4.0-37-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.24.0"
  },
  "system_ssl": {
    "version": "1010100f"
  },
  "urllib3": {
    "version": "1.25.9"
  },
  "using_pyopenssl": false
}

@GoddessLuBoYan commented

So, would using BytesIO be better?

@bmerry (Contributor, Author) commented Jun 30, 2020

> So, would using BytesIO be better?

Are you suggesting replacing b''.join with BytesIO for joining all the 10 KB pieces together? That wouldn't avoid having two copies of all the data around at once, because BytesIO.getvalue makes a copy. I haven't measured the performance, but I'd be surprised if it's any better.
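
To make the copy explicit, here is a small standalone illustration (plain io.BytesIO, nothing requests-specific):

import io

buf = io.BytesIO()
buf.write(b'x' * (100 * 1024 * 1024))  # stand-in for a downloaded body

data = buf.getvalue()   # returns a new bytes object, so the body is held twice
view = buf.getbuffer()  # a memoryview avoids the copy, but the BytesIO can't be
                        # resized while the view is alive

And since .content has to hand back a bytes object, a memoryview wouldn't be a drop-in replacement anyway.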

@GoddessLuBoYan commented

yeah
