Read upload files using read(CHUNK_SIZE) rather than iter() #1948
Conversation
Okay, this makes even more sense: read(CHUNK_SIZE) rather than iter(). (edited)

Can you show me where you mean?
I have this code (it uses aiofiles.tempfile):

```python
from aiofiles.tempfile import NamedTemporaryFile

async with NamedTemporaryFile() as tmpfile:
    debug(f"Buffering into {tmpfile.name}")
    # Spool the incoming request body to a temporary file first...
    async for data in request.body:
        await tmpfile.write(data)
    await tmpfile.seek(0)
    # ...then upload it from the start of that file.
    debug(f"Uploading to {uri} from {tmpfile.name}")
    return await client.post(uri, content=tmpfile, …)
```
@tomchristie, any ideas?
Rather than digging into this myself, let me point you in the right direction towards figuring it out. First up, you've given me a partial example. Can you give me a complete reproduction, made absolutely as simple as you can? Ideally I ought to be able to copy and paste your example to see the behaviour you're talking about. (Once we've got that we'll work through the next steps...)
Resolves #1911

When digging into this, it turns out that when sending an upload file we're using `iter(file_obj)`, which happens to be a line-by-line iterator and can yield super-large chunks for binary files. That turns out to be slow because you don't really end up streaming the file to the network at all, but rather batching it all up in memory first. The low-hanging fruit here is to cap the size of the chunks that we send to a max of 64k, which from a bit of prodding seems to be a fairly decent value.
It's possible that using .read() on a stream, if it exists, might be beneficial too, but I've not dug into that yet.