Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow buffering for HTTPResponse #49129

Closed
kristjanvalur mannequin opened this issue Jan 8, 2009 · 17 comments
Closed

Allow buffering for HTTPResponse #49129

kristjanvalur mannequin opened this issue Jan 8, 2009 · 17 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@kristjanvalur
Copy link
Mannequin

kristjanvalur mannequin commented Jan 8, 2009

BPO 4879
Nosy @gpshead, @amauryfa, @pitrou, @kristjanvalur, @benjaminp, @bitdancer, @vadmium
Files
  • nobuffer.patch: Suggested patch to enable readline buffering for HTTP headers
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-08-18.12:47:38.450>
    created_at = <Date 2009-01-08.10:43:55.503>
    labels = ['library', 'performance']
    title = 'Allow buffering for HTTPResponse'
    updated_at = <Date 2015-02-02.13:03:17.219>
    user = 'https://github.com/kristjanvalur'

    bugs.python.org fields:

    activity = <Date 2015-02-02.13:03:17.219>
    actor = 'martin.panter'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-08-18.12:47:38.450>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2009-01-08.10:43:55.503>
    creator = 'kristjan.jonsson'
    dependencies = []
    files = ['12649']
    hgrepos = []
    issue_num = 4879
    keywords = ['patch']
    message_count = 17.0
    messages = ['79406', '79552', '79601', '79602', '80064', '80254', '80946', '81020', '81035', '81036', '91597', '91621', '91683', '91684', '91914', '91939', '235254']
    nosy_count = 7.0
    nosy_names = ['gregory.p.smith', 'amaury.forgeotdarc', 'pitrou', 'kristjan.jonsson', 'benjamin.peterson', 'r.david.murray', 'martin.panter']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue4879'
    versions = ['Python 2.7']

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 8, 2009

    This is related to bpo-4336. While that issue finally revolved around
    fixing the Nagle problem for xmlrpc and other http clients, here I
    propose to fix another performance bottleneck with xmlrpc, reading the
    HTTP Headers.
    HTTPResponse creates a socket.fileobject() with zero buffering which
    means that the readline() operations used to read the headers become
    very inefficient since individual socket.recv() calls are used for each
    character.
    The reason for this is to support users who subsequently do
    socket.recv() on the underlying socket, and so we don't want to read too
    much of the headers. Note that there is no example of this in the
    standard library.
    In the provided patch, I have removed some vestigial code from
    xmrlpclib.py which attempted to use recv() even though it never did
    (because the connection has been closed when HTTPConnection returns the
    response). Even so, Guido has expressed his concern that we need to
    keep the support for this recv() behaviour in place.
    Therefore, I have added a new optional argument, nobuffer=True, which
    users that promise not to call recv() can set to false. This will then
    cause the generated fileobject from HTTPResponse to have default
    buffering, greatly increasing the performance of the reading of the
    headers.

    @kristjanvalur kristjanvalur mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Jan 8, 2009
    @kristjanvalur kristjanvalur mannequin changed the title Allo buffering for HTTPResponse Allow buffering for HTTPResponse Jan 8, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Jan 10, 2009

    Why not name the parameter buffering=False rather than nobuffer=True?
    Sounds more "natural" to me.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 11, 2009

    Yes, if you think so. But my intention was to indicate that
    the "nobuffering" is a special feature the user can turn off to resort
    to what could be considered normal, buffered, behaviour.

    But either way is fine, and I'll be happy to change it.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 11, 2009

    Checked in as revision: 68532

    @kristjanvalur kristjanvalur mannequin closed this as completed Jan 11, 2009
    @benjaminp
    Copy link
    Contributor

    Kristján, could you merge this change into the Py3k branch, please?

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 20, 2009

    Adding crossref to http://bugs.python.org/issue4448 regarding the
    porting of this feature to 3.1

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Feb 2, 2009

    Checked in r69209 to py3k

    @amauryfa
    Copy link
    Member

    amauryfa commented Feb 3, 2009

    This causes failures in test_urllib2net.

    The fix is easy: a handful of

    •    self.assertTrue(u.fp.\_sock.gettimeout() is None)
      

    + self.assertTrue(u.fp.raw._sock.gettimeout() is None)

    But doesn't this show a backward-incompatible change?
    or is the _sock member an implementation detail anyway?

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Feb 3, 2009

    The socket.socket.makefile() now returns a quite different kind of
    object, namely a SocketIO thing. This comes as a result of the IO
    refactoring in 3.0.
    The good side to this is that old and naughty apps have been forbidden
    to access the _sock member directly. This was the reason that we had
    to disable read buffering in 2.x: Some clients would read directly
    from the socket, even though it was "hidden".

    As for the test failure, I admit that I didn't enable the "network"
    resource for the testsuite. Will fix.

    Btw, test_xmlrpc_net.py fails, because time.xmlrpc.com doesn't
    resolve. Are there any other good and stable rpc servers out there?

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Feb 3, 2009

    I'm afraid I misunderstood the problem.
    Yes, since u.fp is now a io.BufferedReader() rather than a socket.Socket
    (), the _sock member has moved.

    But looking at the other assertions in test_urllib2net.py, you can see
    the different ways the code uses to get at the socket: There is
    u.fp.raw._sock and there is u.fp.fp.raw._sock.

    IMHO I think we have to assume the _sock member to be an implementation
    detail that is not part of the interface.

    @gpshead
    Copy link
    Member

    gpshead commented Aug 15, 2009

    I believe there will be a problem with the patch committed in r68532.
    If getresponse(buffering=True) is called, extra data on the socket may
    be consumed by the socket.makefile() buffer which will cause problems if
    the connection is not closed immediately after this response.

    I mentioned this in comments on http://bugs.python.org/issue2576 which
    happens to have a way of fixing this already proposed as they are trying
    to solve an almost identical issue to this without yet having seen this
    issue.

    @gpshead
    Copy link
    Member

    gpshead commented Aug 15, 2009

    trunk r74463 now forces the HTTPResponse to close afterwards when
    buffering=True to avoid the issue.

    @bitdancer
    Copy link
    Member

    And that update causes failures in test_xmlrpc.

    @bitdancer bitdancer reopened this Aug 18, 2009
    @bitdancer
    Copy link
    Member

    I think I'll open a new ticket instead.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Aug 24, 2009

    Gregory, please revert your change (74463).
    There is no "extra data" after the response, since such data can only
    be generated as a result of a new "request".

    Your change has disabled the HTTP/1.1 keepalive capability, causing
    test failurres in xmlrpc. Also showing perhaps that we need test cases
    for keepalive in regular httplib.py testsuite.

    If you think there are problems, you a new defect should be created.

    @gpshead
    Copy link
    Member

    gpshead commented Aug 24, 2009

    Already reverted in

    r74522 | gregory.p.smith | 2009-08-18 22:33:48 -0700 (Tue, 18 Aug 2009)

    for that reason.

    @vadmium
    Copy link
    Member

    vadmium commented Feb 2, 2015

    Opened bpo-23377 for the problem of dropping extra buffered data at the end of a response.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants