Support for `chunk_size` #1277

tomchristie · 2020-09-10T15:08:14Z

Closes #394

Add iter_raw(chunk_size: int=None)
Add iter_bytes(chunk_size: int=None)
Add iter_text(chunk_size: int=None)

First bit of work towards supporting chunk_size on the response.iter_[raw|bytes|text] methods,
using a decoder class in line with the other decoding.

Nice thing about this is that it's really easy to unit test, e.g.:

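from httpx._decoders import ByteChunker  # private module, import path assumed


def test_byte_chunker():
    # No chunk_size: every input is passed through unchanged.
    decoder = ByteChunker()
    assert decoder.decode(b"1234567") == [b"1234567"]
    assert decoder.decode(b"89") == [b"89"]
    assert decoder.flush() == []

    # The leftover b"7" is buffered and joined with the next input.
    decoder = ByteChunker(chunk_size=3)
    assert decoder.decode(b"1234567") == [b"123", b"456"]
    assert decoder.decode(b"89") == [b"789"]
    assert decoder.flush() == []

    # Inputs that are exact multiples of chunk_size leave nothing buffered.
    decoder = ByteChunker(chunk_size=3)
    assert decoder.decode(b"123456") == [b"123", b"456"]
    assert decoder.decode(b"789") == [b"789"]
    assert decoder.flush() == []

    # A trailing partial chunk is only emitted on flush().
    decoder = ByteChunker(chunk_size=3)
    assert decoder.decode(b"123456") == [b"123", b"456"]
    assert decoder.decode(b"78") == []
    assert decoder.flush() == [b"78"]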
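
For readers who want to see what kind of class would satisfy the test above, here is a minimal sketch. It is not the code from this PR: the name `SimpleByteChunker` and its internals are assumptions made for illustration, and the real `ByteChunker` may buffer differently.

```python
import typing


class SimpleByteChunker:
    """Hypothetical re-chunker: buffers input and emits fixed-size chunks."""

    def __init__(self, chunk_size: typing.Optional[int] = None) -> None:
        self._chunk_size = chunk_size
        self._buffer = b""

    def decode(self, content: bytes) -> typing.List[bytes]:
        # Without a chunk size, pass the input through unchanged.
        if self._chunk_size is None:
            return [content] if content else []

        self._buffer += content
        size = self._chunk_size
        # Number of buffered bytes that form complete chunks.
        full = len(self._buffer) - len(self._buffer) % size
        chunks = [self._buffer[i : i + size] for i in range(0, full, size)]
        # Keep any trailing partial chunk for the next call.
        self._buffer = self._buffer[full:]
        return chunks

    def flush(self) -> typing.List[bytes]:
        # Emit whatever is left over once the stream has ended.
        remainder, self._buffer = self._buffer, b""
        return [remainder] if remainder else []
```

Keeping the state in a small object with `decode()` and `flush()` is what makes the behaviour testable without any network I/O.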

cdeler · 2020-09-10T15:52:22Z

httpx/_models.py

@@ -912,19 +913,28 @@ def read(self) -> bytes:
            self._content = b"".join(self.iter_bytes())
        return self._content

-    def iter_bytes(self) -> typing.Iterator[bytes]:
+    def iter_bytes(self, chunk_size: int = None) -> typing.Iterator[bytes]:


Are you sure about chunk_size with default None?

I do agree that it looks more suitable, but Requests defaults to chunk_size=1 (iter_content) or chunk_size=512 (iter_lines).

florimondmanca

Superb!

There may be room for factoring in some of the introduced logic within the _decoders.py module (bunch of repetition there), but happy to treat it as a follow-up.

Edit: so as per #1277 (comment), should we add a small note in our Requests compatibility guide on the default chunking behavior?

florimondmanca · 2020-10-10T07:11:36Z

In this PR, when chunk_size=None we just return the input content unchanged, as one single big chunk.

Yes, this would deviate from what Requests seems to do, but:

Setting a non-None default would break backward compatibility on our side.

Defaulting to "transparently pass the chunk sent by the server" is probably the most reasonable approach anyway.

That said, we'd need to add this deviation from Requests to the compatibility guide. 👍
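
To make the two behaviours concrete, here is a rough sketch of the semantics being discussed. It is a standalone generator, not the HTTPX implementation: `iter_chunks` and the sample `received` list are made-up names for illustration.

```python
import typing


def iter_chunks(
    source: typing.Iterable[bytes], chunk_size: typing.Optional[int] = None
) -> typing.Iterator[bytes]:
    """Hypothetical helper mirroring the chunk_size semantics described above."""
    if chunk_size is None:
        # Default: yield each chunk exactly as it was received.
        yield from source
        return

    buffer = b""
    for chunk in source:
        buffer += chunk
        # Emit as many full-size chunks as the buffer currently holds.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    if buffer:
        # The final chunk may be shorter than chunk_size.
        yield buffer


received = [b"12345", b"67", b"89"]  # chunks as they might arrive from the server
print(list(iter_chunks(received)))                # [b'12345', b'67', b'89']
print(list(iter_chunks(received, chunk_size=4)))  # [b'1234', b'5678', b'9']
```

With the `None` default, existing callers see the same chunk boundaries as before, which is the backward-compatibility point made above.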

simonw · 2020-11-16T19:50:16Z

I really like this - it's a great answer to my question in #1392 about safely retrieving a truncated version of a resource when dealing with a URL provided by an untrusted user, as an extra protection against denial-of-service on top of the HTTPX timeouts.
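
As an illustration of that use case, here is a minimal sketch of truncated reading over any iterator of byte chunks. The helper name `read_at_most` is hypothetical. In practice the chunks would presumably come from a streamed response, for example `response.iter_bytes(chunk_size=...)`, so that the rest of the body is never pulled.

```python
import typing


def read_at_most(chunks: typing.Iterable[bytes], max_bytes: int) -> bytes:
    """Hypothetical helper: collect at most max_bytes, then stop iterating."""
    collected = bytearray()
    if max_bytes <= 0:
        return bytes(collected)
    for chunk in chunks:
        # Take only as much of this chunk as still fits under the limit.
        collected += chunk[: max_bytes - len(collected)]
        if len(collected) >= max_bytes:
            break  # stop pulling further chunks from the source
    return bytes(collected)


body = iter([b"abcd", b"efgh", b"ijkl"])
print(read_at_most(body, 6))  # b'abcdef'
```

Because the loop breaks as soon as the limit is reached, a lazy source is never asked for data beyond the chunk that crosses the limit. A small `chunk_size` on the source keeps that overshoot bounded.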

florimondmanca · 2020-11-16T19:53:59Z

And if I'm correct, this would close #394?

piersoh · 2020-11-25T16:08:41Z

Just to clarify: when it says "chunk_size=None we just return the input content unchanged, as one single big chunk", presumably that means it returns each HTTP chunk (which may be big or small) as opposed to the whole response? How does this work for HTTP/2?

tomchristie · 2020-11-30T12:39:52Z

@piersoh Correct, yes. For HTTP/2 that'll mean chunk sizes that correspond to the data frames on the wire.

tomchristie added 2 commits September 10, 2020 15:42

Support iter_raw(chunk_size=...) and aiter_raw(chunk_size=...)

c1d56dc

Unit tests for ByteChunker

2b5d116

tomchristie mentioned this pull request Sep 10, 2020

Added chunk_size to Response.iter_bytes() and Response.aiter_bytes() (#394) #1271

Closed

tomchristie added 2 commits September 10, 2020 16:45

Support iter_bytes(chunk_size=...)

8b6034f

Add TextChunker

7c40709

cdeler reviewed Sep 10, 2020

Support iter_text(chunk_size=...)

bff7dbb

cdeler reviewed Sep 10, 2020

tomchristie and others added 3 commits October 7, 2020 13:45

Merge branch 'master' into chunk-size

f71e545

Fix merge with master

e9160c2

Merge branch 'master' into chunk-size

d3c8542

florimondmanca approved these changes Oct 10, 2020

florimondmanca added the enhancement New feature or request label Oct 10, 2020

florimondmanca mentioned this pull request Nov 16, 2020

Easy way to retrieve just the first X bytes from a URL #1392

Closed

Merge branch 'master' into chunk-size

d61e384

tomchristie merged commit 27df5e4 into master Nov 25, 2020

tomchristie deleted the chunk-size branch November 25, 2020 15:28

tomchristie mentioned this pull request Nov 30, 2020

Version 0.17.0 #1403

Merged

3 tasks

darkdragon-001 mentioned this pull request Sep 16, 2021

Async and sync compatible wrapper python-gitlab/python-gitlab#1036

Draft
