Issue #12319: Support for chunked encoding of HTTP request bodies
When the body object is a file, its size is no longer determined with
fstat(), since that can report the wrong result (e.g. reading from a pipe).
Instead, determine the size using seek(), or fall back to chunked encoding
for unseekable files.

Also, change the logic for detecting text files to check for TextIOBase
inheritance, rather than inspecting the “mode” attribute, which may not
exist (e.g. BytesIO and StringIO).  The Content-Length for text files is no
longer determined ahead of time, because the original logic could have been
wrong depending on the codec and newline translation settings.

Patch by Demian Brecht and Rolf Krahl, with a few tweaks by me.
vadmium committed Aug 24, 2016
1 parent a790fe7 commit 3c0d0ba
Showing 9 changed files with 531 additions and 150 deletions.
98 changes: 70 additions & 28 deletions Doc/library/http.client.rst
@@ -219,39 +219,62 @@ HTTPConnection Objects
:class:`HTTPConnection` instances have the following methods:


.. method:: HTTPConnection.request(method, url, body=None, headers={})
.. method:: HTTPConnection.request(method, url, body=None, headers={}, *, \
encode_chunked=False)

This will send a request to the server using the HTTP request
method *method* and the selector *url*.

If *body* is specified, the specified data is sent after the headers are
finished. It may be a string, a :term:`bytes-like object`, an open
:term:`file object`, or an iterable of :term:`bytes-like object`\s. If
*body* is a string, it is encoded as ISO-8859-1, the default for HTTP. If
it is a bytes-like object the bytes are sent as is. If it is a :term:`file
object`, the contents of the file is sent; this file object should support
at least the ``read()`` method. If the file object has a ``mode``
attribute, the data returned by the ``read()`` method will be encoded as
ISO-8859-1 unless the ``mode`` attribute contains the substring ``b``,
otherwise the data returned by ``read()`` is sent as is. If *body* is an
iterable, the elements of the iterable are sent as is until the iterable is
exhausted.

The *headers* argument should be a mapping of extra HTTP
headers to send with the request.

If *headers* does not contain a Content-Length item, one is added
automatically if possible. If *body* is ``None``, the Content-Length header
is set to ``0`` for methods that expect a body (``PUT``, ``POST``, and
``PATCH``). If *body* is a string or bytes object, the Content-Length
header is set to its length. If *body* is a :term:`file object` and it
works to call :func:`~os.fstat` on the result of its ``fileno()`` method,
then the Content-Length header is set to the ``st_size`` reported by the
``fstat`` call. Otherwise no Content-Length header is added.
finished. It may be a :class:`str`, a :term:`bytes-like object`, an
open :term:`file object`, or an iterable of :class:`bytes`. If *body*
is a string, it is encoded as ISO-8859-1, the default for HTTP. If it
is a bytes-like object, the bytes are sent as is. If it is a :term:`file
object`, the contents of the file are sent; this file object should
support at least the ``read()`` method. If the file object is an
instance of :class:`io.TextIOBase`, the data returned by the ``read()``
method will be encoded as ISO-8859-1, otherwise the data returned by
``read()`` is sent as is. If *body* is an iterable, the elements of the
iterable are sent as is until the iterable is exhausted.

The *headers* argument should be a mapping of extra HTTP headers to send
with the request.

If *headers* contains neither Content-Length nor Transfer-Encoding, a
Content-Length header will be added automatically if possible. If
*body* is ``None``, the Content-Length header is set to ``0`` for
methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If
*body* is a string or bytes-like object, the Content-Length header is
set to its length. If *body* is a binary :term:`file object`
supporting :meth:`~io.IOBase.seek`, this will be used to determine
its size. Otherwise, the Content-Length header is not added
automatically. In cases where determining the Content-Length up
front is not possible, the body will be chunk-encoded and the
Transfer-Encoding header will automatically be set.
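The size check described above can be sketched as follows. This is a minimal illustration, not the code in :mod:`http.client`; the helper name ``remaining_size`` is hypothetical. It assumes the size of interest is the number of bytes between the current position and the end of the file, and it gives up on text files and on objects that cannot seek.

```python
import io


def remaining_size(fileobj):
    """Return the number of bytes left to read, or None if unknown.

    Hypothetical helper: text files are rejected because their encoded
    length depends on the codec and on newline translation.
    """
    if isinstance(fileobj, io.TextIOBase):
        return None
    try:
        current = fileobj.tell()
        end = fileobj.seek(0, io.SEEK_END)  # seek() returns the new position
        fileobj.seek(current)               # restore the original position
    except (AttributeError, OSError):
        # No tell()/seek() at all, or the stream is unseekable (e.g. a pipe).
        return None
    return end - current


buf = io.BytesIO(b"hello world")
buf.read(6)                                   # consume b"hello "
print(remaining_size(buf))                    # 5
print(remaining_size(io.StringIO("text")))    # None
```

When such a helper returns ``None``, the caller has no Content-Length to send, which is the situation in which the documentation above says chunked encoding is used.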

The *encode_chunked* argument is only relevant if Transfer-Encoding is
specified in *headers*. If *encode_chunked* is ``False``, the
HTTPConnection object assumes that all encoding is handled by the
calling code. If it is ``True``, the body will be chunk-encoded.

.. note::
Chunked transfer encoding was added in HTTP protocol
version 1.1. Unless the HTTP server is known to handle HTTP 1.1,
the caller must either specify the Content-Length or must use a
body representation whose length can be determined automatically.

.. versionadded:: 3.2
*body* can now be an iterable.

.. versionchanged:: 3.6
If neither Content-Length nor Transfer-Encoding is set in
*headers* and Content-Length cannot be determined, *body* will now
be automatically chunk-encoded. The *encode_chunked* argument
was added.
The Content-Length for binary file objects is determined with seek.
No attempt is made to determine the Content-Length for text file
objects.
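A short sketch of the automatic fallback described above, assuming Python 3.6 or later. So that it runs without a network, the connection's ``sock`` attribute is replaced by ``RecordingSocket``, a hypothetical stand-in that only records what would be sent; ``example.invalid`` and ``/upload`` are placeholder values.

```python
import http.client


class RecordingSocket:
    """Hypothetical stand-in for a socket: records bytes instead of sending."""

    def __init__(self):
        self.sent = bytearray()

    def sendall(self, data):
        self.sent += data


def body_parts():
    # A generator has no length that can be known up front.
    yield b"first part, "
    yield b"second part"


conn = http.client.HTTPConnection("example.invalid")
conn.sock = RecordingSocket()   # assumption: skips the real connect()
conn.request("POST", "/upload", body=body_parts())

wire = bytes(conn.sock.sent)
print(wire.decode("iso-8859-1"))
```

Because neither Content-Length nor Transfer-Encoding is supplied and the body is a generator, the recorded request should carry a ``Transfer-Encoding: chunked`` header and one chunk per yielded item.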

.. method:: HTTPConnection.getresponse()

Should be called after a request is sent to get the response from the server.
@@ -336,13 +359,32 @@ also send your request step by step, by using the four functions below.
an argument.


.. method:: HTTPConnection.endheaders(message_body=None)
.. method:: HTTPConnection.endheaders(message_body=None, *, encode_chunked=False)

Send a blank line to the server, signalling the end of the headers. The
optional *message_body* argument can be used to pass a message body
associated with the request. The message body will be sent in the same
packet as the message headers if it is string, otherwise it is sent in a
separate packet.
associated with the request.

If *encode_chunked* is ``True``, the result of each iteration of
*message_body* will be chunk-encoded as specified in :rfc:`7230`,
Section 3.3.1. How the data is encoded is dependent on the type of
*message_body*. If *message_body* implements the :ref:`buffer interface
<bufferobjects>` the encoding will result in a single chunk.
If *message_body* is a :class:`collections.abc.Iterable`, each iteration
of *message_body* will result in a chunk. If *message_body* is a
:term:`file object`, each call to ``.read()`` will result in a chunk.
The method automatically signals the end of the chunk-encoded data
immediately after *message_body*.

.. note:: Due to the chunked encoding specification, empty chunks
yielded by an iterator body will be ignored by the chunk-encoder.
This is to avoid premature termination of the read of the request by
the target server due to malformed encoding.

.. versionadded:: 3.6
Chunked encoding support. The *encode_chunked* parameter was
added.
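The chunk framing described above can be sketched in a few lines. This is an illustration of the wire format, not the implementation in :mod:`http.client`; the function name ``encode_chunks`` is hypothetical. Each chunk is its length in hexadecimal, CRLF, the data, and CRLF; a zero-length chunk ends the body, which is why empty items must be skipped.

```python
def encode_chunks(parts):
    """Yield chunk-encoded frames for an iterable of bytes objects."""
    for part in parts:
        if not part:
            # An empty chunk would read as the terminator, so skip it.
            continue
        yield b"%X\r\n" % len(part) + part + b"\r\n"
    # The zero-length chunk signals the end of the chunk-encoded data.
    yield b"0\r\n\r\n"


encoded = b"".join(encode_chunks([b"hello", b"", b"world!"]))
print(encoded)  # b'5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n'
```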


.. method:: HTTPConnection.send(data)

60 changes: 37 additions & 23 deletions Doc/library/urllib.request.rst
@@ -30,18 +30,9 @@ The :mod:`urllib.request` module defines the following functions:
Open the URL *url*, which can be either a string or a
:class:`Request` object.

*data* must be a bytes object specifying additional data to be sent to the
server, or ``None`` if no such data is needed. *data* may also be an
iterable object and in that case Content-Length value must be specified in
the headers. Currently HTTP requests are the only ones that use *data*; the
HTTP request will be a POST instead of a GET when the *data* parameter is
provided.

*data* should be a buffer in the standard
:mimetype:`application/x-www-form-urlencoded` format. The
:func:`urllib.parse.urlencode` function takes a mapping or sequence of
2-tuples and returns an ASCII text string in this format. It should
be encoded to bytes before being used as the *data* parameter.
*data* must be an object specifying additional data to be sent to the
server, or ``None`` if no such data is needed. See :class:`Request`
for details.

The urllib.request module uses HTTP/1.1 and includes the ``Connection:close`` header
in its HTTP requests.
@@ -192,14 +183,22 @@ The following classes are provided:

*url* should be a string containing a valid URL.

*data* must be a bytes object specifying additional data to send to the
server, or ``None`` if no such data is needed. Currently HTTP requests are
the only ones that use *data*; the HTTP request will be a POST instead of a
GET when the *data* parameter is provided. *data* should be a buffer in the
standard :mimetype:`application/x-www-form-urlencoded` format.
The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
2-tuples and returns an ASCII string in this format. It should be
encoded to bytes before being used as the *data* parameter.
*data* must be an object specifying additional data to send to the
server, or ``None`` if no such data is needed. Currently HTTP
requests are the only ones that use *data*. The supported object
types include bytes, file-like objects, and iterables. If no
``Content-Length`` header has been provided, :class:`HTTPHandler` will
try to determine the length of *data* and set this header accordingly.
If this fails, ``Transfer-Encoding: chunked`` as specified in
:rfc:`7230`, Section 3.3.1 will be used to send the data. See
:meth:`http.client.HTTPConnection.request` for details on the
supported object types and on how the content length is determined.

For an HTTP POST request method, *data* should be a buffer in the
standard :mimetype:`application/x-www-form-urlencoded` format. The
:func:`urllib.parse.urlencode` function takes a mapping or sequence
of 2-tuples and returns an ASCII string in this format. It should
be encoded to bytes before being used as the *data* parameter.

*headers* should be a dictionary, and will be treated as if
:meth:`add_header` was called with each key and value as arguments.
@@ -211,8 +210,10 @@ The following classes are provided:
:mod:`urllib`'s default user agent string is
``"Python-urllib/2.6"`` (on Python 2.6).

An example of using ``Content-Type`` header with *data* argument would be
sending a dictionary like ``{"Content-Type": "application/x-www-form-urlencoded"}``.
An appropriate ``Content-Type`` header should be included if the *data*
argument is present. If this header has not been provided and *data*
is not ``None``, ``Content-Type: application/x-www-form-urlencoded`` will
be added as a default.

The final two arguments are only of interest for correct handling
of third-party HTTP cookies:
@@ -235,15 +236,28 @@ The following classes are provided:
*method* should be a string that indicates the HTTP request method that
will be used (e.g. ``'HEAD'``). If provided, its value is stored in the
:attr:`~Request.method` attribute and is used by :meth:`get_method()`.
Subclasses may indicate a default method by setting the
The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
Subclasses may indicate a different default method by setting the
:attr:`~Request.method` attribute in the class itself.

.. note::
The request will not work as expected if the data object is unable
to deliver its content more than once (e.g. a file or an iterable
that can produce the content only once) and the request is retried
for HTTP redirects or authentication. The *data* is sent to the
HTTP server immediately after the headers. There is no support for
a 100-continue expectation in the library.

.. versionchanged:: 3.3
:attr:`Request.method` argument is added to the Request class.

.. versionchanged:: 3.4
Default :attr:`Request.method` may be indicated at the class level.

.. versionchanged:: 3.6
Do not raise an error if the ``Content-Length`` has not been
provided and could not be determined. Fall back to using chunked
transfer encoding instead.
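As a small illustration of the *data* and *method* behaviour described above, the sketch below builds requests without opening them, so no network access is needed. The URL and form fields are placeholder values.

```python
import urllib.parse
import urllib.request

# urlencode() returns an ASCII str; encode it to bytes before use as data.
form = {"name": "Ada", "lang": "python"}
data = urllib.parse.urlencode(form).encode("ascii")

url = "http://example.invalid/form"  # placeholder URL
post_req = urllib.request.Request(url, data=data)
get_req = urllib.request.Request(url)
put_req = urllib.request.Request(url, data=data, method="PUT")

print(post_req.get_method())  # POST, because data is not None
print(get_req.get_method())   # GET, because data is None
print(put_req.get_method())   # PUT, the explicit method wins
```

The default ``Content-Type`` and the ``Content-Length`` or ``Transfer-Encoding`` headers are filled in by the HTTP handler when the request is opened, so they do not appear on the ``Request`` object at this point.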

.. class:: OpenerDirector()

19 changes: 19 additions & 0 deletions Doc/whatsnew/3.6.rst
@@ -324,6 +324,15 @@ exceptions: see :func:`faulthandler.enable`. (Contributed by Victor Stinner in
:issue:`23848`.)


http.client
-----------

:meth:`HTTPConnection.request() <http.client.HTTPConnection.request>` and
:meth:`~http.client.HTTPConnection.endheaders` now both support
chunked encoding of request bodies.
(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)


idlelib and IDLE
----------------

@@ -500,6 +509,16 @@ The :class:`~unittest.mock.Mock` class has the following improvements:
(Contributed by Amit Saha in :issue:`26323`.)


urllib.request
--------------

If an HTTP request has a non-empty body but no Content-Length header
and the content length cannot be determined up front, rather than
raising an error, :class:`~urllib.request.AbstractHTTPHandler` now
falls back to using chunked transfer encoding.
(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)


urllib.robotparser
------------------
