When stream=True iter_content(chunk_size=None) reads the input as a single big chunk #5536
Comments
@sigmavirus24 I don't think the server sends the file all at once. The example above produces no output for ~30 seconds and then prints 533830860. This starts printing right away:

```python
from requests import get

URL = 'https://dl.fedoraproject.org/pub/alt/iot/32/IoT/x86_64/images/Fedora-IoT-32-20200603.0.x86_64.raw.xz'
r = get(URL, stream=True)
for b in r.iter_content(chunk_size=2**23):
    print(len(b))
```
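The buffering difference can be modeled locally without any network access. A toy sketch (my own illustration, not requests internals; it assumes `chunk_size=None` behaves like a single read-to-EOF while an explicit chunk size bounds each read):

```python
import io

# Toy stand-in for a response body (not a real socket)
stream = io.BytesIO(b"x" * 100)

# chunk_size=None behaves like one unbounded read: everything at once
print(len(stream.read()))  # 100

# An explicit chunk_size bounds each read, so data is yielded piecewise
stream.seek(0)
sizes = []
while chunk := stream.read(32):
    sizes.append(len(chunk))
print(sizes)  # [32, 32, 32, 4]
```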
I have the same issue with 2.24.0. When I use a …
Can confirm the same is occurring. Works as expected with an explicit chunk size. Can try to put together a reproducible example if that's helpful?
As promised, here's a reproducible example against httpbin.org:

Run it and you'll see that the whole body arrives as a single chunk. Change the chunk_size to 1 and everything works nicely (albeit with high overhead). If somebody can point me in the right direction, I'm happy to investigate this and do what is required to fix it.
Any resolution to this? I am also still seeing this on v2.25.1
Hi @stephen-goveia, this is a behavior in urllib3 as noted in urllib3/urllib3#2123. We aren't able to change it in Requests, so the outcome will be determined by whether this makes it into the urllib3 v2 release. |
thanks @nateprewitt! |
Hi. I don't understand why this issue is still open.
Even after setting stream=True this is still an issue:

```python
import requests
import time

chunk_size = None
URL = 'https://httpbin.org/drip?duration=20&numbytes=4'
r = requests.get(URL, stream=True)
t = time.monotonic()
for x in r.iter_content(chunk_size=chunk_size):
    t2 = time.monotonic()
    print(f'{t2 - t}')
    t = time.monotonic()
```

prints:
Please keep in mind that I'm making this comment as a user, not as a contributor. You're right, it is... but please read the documentation.
What should the module do when you ask it not to download everything at once, but also to download chunks of "nothing"? Just check the Content-Length header and set a suitable chunk size when dealing with large files.
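That suggestion can be sketched as a small helper (hypothetical, not part of requests; the name `pick_chunk_size`, the target of ~100 reads, and the bounds are my own assumptions):

```python
def pick_chunk_size(content_length, target_chunks=100,
                    minimum=8192, maximum=2 ** 23):
    """Pick an iter_content chunk size from a Content-Length value.

    Hypothetical helper: aims for roughly `target_chunks` reads,
    clamped to the [minimum, maximum] byte range.
    """
    if content_length is None:  # header absent, e.g. chunked encoding
        return minimum
    return max(minimum, min(maximum, content_length // target_chunks))

# For the ~534 MB Fedora image mentioned above this gives ~5.3 MB chunks
print(pick_chunk_size(533830860))  # 5338308
```

The result could then be passed as `r.iter_content(chunk_size=...)` instead of `None`.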
It is not only about large files, it is also about SSE (server-sent events). They are streamed, and clients expect them to arrive directly after the server sends them.
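To illustrate why the buffering breaks SSE: events are delimited by blank lines and must be parsed incrementally from whatever chunks happen to arrive. A minimal sketch of such a parser (my own simplified version, handling only `data:` fields and `\n` line endings, fed here from a list instead of a live response):

```python
def iter_sse_data(chunks):
    """Yield the data payload of each SSE event from an iterator of
    text chunks. Simplified: 'data:' fields only, '\n' line endings."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A blank line terminates an event
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data = [line[len("data:"):].lstrip()
                    for line in raw_event.split("\n")
                    if line.startswith("data:")]
            if data:
                yield "\n".join(data)

# Chunk boundaries need not align with event boundaries
chunks = ["data: first\n\nda", "ta: second\n\n"]
print(list(iter_sse_data(chunks)))  # ['first', 'second']
```

With working streaming one could feed this from `r.iter_content(chunk_size=None, decode_unicode=True)`; with the buffering behavior described here, no event is yielded until the connection closes.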
No movement on this in ~8 months... Any update? |
Possible workaround using the underlying `resp.raw.stream()`:

```python
resp = requests.get("something", stream=True)
for chunk in resp.raw.stream():
    print(f"chunk size: {len(chunk)}")
```
@mbhynes Not sure what you were doing to have that "work", but it certainly doesn't do what I'd expect...

```python
import requests

url = "https://httpbin.org/drip?duration=2&numbytes=8"
resp = requests.get(url, stream=True)
for chunk in resp.raw.stream():
    print(f"chunk size: {len(chunk)}")
```

just gives me a single 8-byte chunk back after 2 seconds, rather than 8 single-byte chunks every few hundred milliseconds. I'd assume your endpoint happens to be returning the data via a "chunked transfer encoding", which has been able to handle streaming data in chunks for a long time already, but you could check by doing:

```python
print(resp.headers.get("transfer-encoding"))
```

That said, I've created a pull request with
According to the documentation, when stream=True, iter_content(chunk_size=None) "will read data as it arrives in whatever size the chunks are received", but it actually collects all input into a single big bytes object, consuming large amounts of memory and entirely defeating the purpose of iter_content().
Expected Result
iter_content(chunk_size=None) yields "data as it arrives in whatever size the chunks are received".
Actual Result
A single big chunk
Reproduction Steps
prints
System Information