Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow setting chunk_size for Response.iter_bytes() etc... #394

Closed
sethmlarson opened this issue Sep 26, 2019 · 11 comments · Fixed by #1277
Closed

Allow setting chunk_size for Response.iter_bytes() etc... #394

sethmlarson opened this issue Sep 26, 2019 · 11 comments · Fixed by #1277
Labels
requests-compat Issues related to Requests backwards compatibility
Milestone

Comments

@sethmlarson
Copy link
Contributor

Requests allowed setting chunk_size within .iter_content() which is currently not an option for our alternatives .stream() and .stream_text().

For .stream_text() we should go the extra step and fix the issue that users sometimes run into when using this feature and use chunk-size for measuring the decoded text, not the raw bytes.

@sethmlarson sethmlarson added good first issue Good for newcomers requests-compat Issues related to Requests backwards compatibility labels Sep 26, 2019
@tomchristie
Copy link
Member

I seem to remeber this playing into some primitives around streaming bytes vs text that we never ended up digging into?

@sethmlarson sethmlarson removed the good first issue Good for newcomers label Sep 29, 2019
@tomchristie
Copy link
Member

A good first pass onto this would be to change the decoder interface slightly, so that instead of eg. yeilding a byte chunk, they yield a list of byte chunks.

On the first refactoring pass, we don't need to actually change the internal implmentation much - the decoders can just always yield a list with a single item.

We'd then be able to add a chunk_size argument to the decoders, which would return 0, 1, or many properly-sized chunks on each yield.

@florimondmanca florimondmanca changed the title Allow setting chunk_size for Response.stream() and Response.stream_text() Allow setting chunk_size for Response.aiter_*() Dec 21, 2019
@florimondmanca
Copy link
Member

florimondmanca commented Dec 21, 2019

Updated the issue title to reflect the current Response.aiter_* API :-) (see #610).

@b0g3r
Copy link
Contributor

b0g3r commented Mar 12, 2020

How could I help with this issue?

@florimondmanca
Copy link
Member

florimondmanca commented Mar 12, 2020

Hi @b0g3r! I think this is still something we’d like to have, and given discussions in python-gitlab/python-gitlab#1036 it seems like some folks would like to see it too. :)

Ways to move forward would be:

  • Propose an API for this, with context on the existing API on Requests
  • Investigate implementation details (ie how are we going to split chunks: buffering, other? Looking at how Requests/urllib3 do this can help)
  • Draft a PR :)

@b0g3r
Copy link
Contributor

b0g3r commented Mar 13, 2020

Do I understand correctly that we will need to forward chunk_size here?

def response_bytes() -> typing.Iterator[bytes]:
with as_network_error(socket.error):
for chunk in conn.stream(4096, decode_content=False):
yield chunk

@florimondmanca
Copy link
Member

florimondmanca commented Mar 13, 2020

@b0g3r Careful that we're in a sort of transition state w.r.t. urllib3 usage due to #804 (we'll soon use our own sync implementation, though keeping urllib3 as an option). Due to this I wouldn't advise relying on any existing urllib3 functionality — also because we'd want to provide chunk sizing on the async layer too, and it'd be odd to have a different implementation on both sides.

I think we want to look at controlling the chunk size directly from response.iter_bytes()/response.aiter_bytes(), instead…

@tomchristie
Copy link
Member

@b0g3r So, as with comment #394 (comment) - the right place to start with this would be a pull request to https://github.com/encode/httpx/blob/master/httpx/_decoders.py that changes the interface of the decoders, so that they return a list of bytes rather than bytes.

(And correspondingly, changing the places where the response calls the decoder such as

yield self.decoder.decode(chunk)
to deal with a list of bytes as a return result.)

I'd start with that as a foundational pull request, which will then make the remaining work much easier. (Adding chunk sizes to the decoder interface, and through to the response methods.)

@b0g3r
Copy link
Contributor

b0g3r commented Mar 13, 2020

(a)iter_raw(self, chunk_size=1)

for part in self._raw_stream:
    yield part

let's use bytestring as buffer

buffer = b""
for part in self._raw_stream:
    buffer += part
    while len(buffer) >= chunk_size:
        yield buffer[:chunk_size]
        buffer = buffer[chunk_size:]
if buffer:
    yield buffer

(a)iter_bytes(self, chunk_size=ITER_CHUNK_SIZE)

  • chunk_size=ITER_CHUNK_SIZE (512) because requests has it 🌚
  • calls (a)iter_raw

(a)iter_text(self, chunk_size=ITER_CHUNK_SIZE)

  • calls (a)iter_bytes

(a)iter_line(self, chunk_size=ITER_CHUNK_SIZE)

  • calls (a)iter_test
  • current code expects that each chunk containt full line(s), but it's not true (UPD: found code for splitting in LineDecoder)
  • requests has elegant solution

@b0g3r
Copy link
Contributor

b0g3r commented Mar 13, 2020

@tomchristie As I see (a)iter_raw doesn't use any decoder 🤔

@piersoh
Copy link

piersoh commented May 21, 2020

Would be good to have chunk_size=None option so that httpx can return chunks at the HTTP chunk boundaries as per the requests library - this is useful for apps that require timely delivery.

@tomchristie tomchristie added this to the v1.1 milestone Jul 30, 2020
@tomchristie tomchristie modified the milestones: v1.1, v1.0 Aug 7, 2020
@tomchristie tomchristie changed the title Allow setting chunk_size for Response.aiter_*() Allow setting chunk_size for Response.iter_bytes() etc... Aug 7, 2020
@tomchristie tomchristie changed the title Allow setting chunk_size for Response.iter_bytes() etc... Allow setting chunk_size for Response.iter_bytes() etc... Aug 7, 2020
cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020
cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020
cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020
cdeler added a commit to cdeler/httpx that referenced this issue Sep 10, 2020
@tomchristie tomchristie mentioned this issue Sep 22, 2020
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
requests-compat Issues related to Requests backwards compatibility
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants