Conversation
📝 Documentation updates detected! New suggestion: Add payload size limit and deserialization timeout troubleshooting
jhcipar
left a comment
I wonder if we need a workaround here where, if you pass in arguments larger than a certain size, we upload them to blob storage and then pass the blob path instead.

The current state is that you can write data directly from a local machine to a network volume this way, which is actually pretty cool, and it feels kind of bad to be removing that. But I do get the idea.
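The workaround could look roughly like the sketch below: serialize the argument, inline it if it's small, otherwise upload it and pass a reference. Everything here is hypothetical — `ARG_INLINE_LIMIT`, `pack_argument`, and the `storage` client interface are illustrative names, and `pickle` stands in for whatever serializer the project actually uses:

```python
import pickle  # stand-in for the real serializer in this sketch

# Hypothetical threshold; the real cutoff would come from config.
ARG_INLINE_LIMIT = 10 * 1024 * 1024  # 10 MB

def pack_argument(value, storage):
    """Inline small arguments; upload large ones and pass a reference."""
    blob = pickle.dumps(value)
    if len(blob) <= ARG_INLINE_LIMIT:
        return {"inline": blob}
    key = storage.put(blob)  # e.g. an S3 key or blob path
    return {"ref": key}

def unpack_argument(packed, storage):
    """Inverse of pack_argument, run on the worker side."""
    if "inline" in packed:
        return pickle.loads(packed["inline"])
    return pickle.loads(storage.get(packed["ref"]))
```

The caller-facing API stays unchanged; only the wire format of the argument envelope differs for oversized payloads.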
This uses the old remote syntax, but this is the kind of feature I mean:

```python
import asyncio

@remote(config)
def hello_world_cpu(data):
    with open("/runpod-volume/archive.tar.gz", "wb") as f:
        f.write(data)

async def main():
    with open(".tetra/archive.tar.gz", "rb") as f:
        tar_file = f.read()
    await asyncio.gather(
        hello_world_cpu(tar_file),
    )
```
We could probably implement a way to write directly to network volumes without having to go through an endpoint, which would be a much better method. If we are committing to only enabling flash in S3-enabled datacenters, this could just be a wrapper around S3 in a "flash" way.
ahh yeah great point
Like, imagine:

```python
volume = NetworkVolume(...)
with open("file", "rb") as f:
    await volume.put("path/to/file", f)
```
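Under the assumption that flash runs only in S3-enabled datacenters, that imagined `NetworkVolume.put` could be a thin wrapper over an S3 client. Everything below is a hypothetical sketch, not the project's actual API: the class shape, the injected `s3_client`, and the bucket/prefix mapping are all assumptions.

```python
class NetworkVolume:
    """Hypothetical: one network volume maps to an S3 bucket plus key prefix."""

    def __init__(self, s3_client, bucket, prefix=""):
        self._s3 = s3_client          # e.g. a boto3 S3 client (assumed)
        self._bucket = bucket
        self._prefix = prefix.strip("/")

    def _key(self, path):
        # Join the volume prefix with the caller's path.
        path = path.lstrip("/")
        return f"{self._prefix}/{path}" if self._prefix else path

    def put(self, path, fileobj):
        # Streams the file object to object storage instead of round-tripping
        # the bytes through an endpoint payload. boto3's upload_fileobj takes
        # (fileobj, bucket, key) in this order.
        self._s3.upload_fileobj(fileobj, self._bucket, self._key(path))
```

A real implementation would presumably be async (via an async S3 client or a thread offload) to match the `await volume.put(...)` usage sketched above.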
Deserialization decodes and unpickles the full payload without any pre-flight size check. `MAX_PAYLOAD_SIZE` is defined in config but never enforced. A large base64 payload (e.g. a 500 MB tensor) expands ~3x in memory during decode+unpickle and OOM-kills the container, silently losing the job. A malformed cloudpickle stream can also hang indefinitely, blocking all subsequent requests on that worker.

This enforces the existing `MAX_PAYLOAD_SIZE` (10 MB) at the deserialization boundary, rejecting oversized payloads before base64 decoding begins. It also wraps `cloudpickle.loads` in a thread with a 30s wall-clock timeout so malformed streams can't hang a worker forever.

Both new error types (`PayloadTooLargeError`, `DeserializeTimeoutError`) are subclasses of `SerializationError`, so existing catch blocks in the handler code paths continue to work without changes.

Closes AE-2382
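The described behavior maps to roughly the shape below. This is a sketch of the stated checks, not the PR's actual diff: `pickle` stands in for cloudpickle, and the constants and error names are taken from the description above.

```python
import base64
import pickle  # stand-in for cloudpickle in this sketch
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

MAX_PAYLOAD_SIZE = 10 * 1024 * 1024   # 10 MB, the config value cited above
DESERIALIZE_TIMEOUT = 30.0            # seconds, wall clock

class SerializationError(Exception):
    pass

class PayloadTooLargeError(SerializationError):
    pass

class DeserializeTimeoutError(SerializationError):
    pass

def deserialize(encoded: str):
    # Pre-flight size check on the base64 text, before decoding allocates
    # anything. Oversized payloads are rejected up front.
    if len(encoded) > MAX_PAYLOAD_SIZE:
        raise PayloadTooLargeError(
            f"payload is {len(encoded)} base64 chars, limit is {MAX_PAYLOAD_SIZE}"
        )
    raw = base64.b64decode(encoded)
    # Unpickle in a worker thread so a malformed stream can't block the
    # caller forever. shutdown(wait=False) abandons a hung thread rather
    # than joining it; a real worker would then be recycled.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(pickle.loads, raw)
        try:
            return future.result(timeout=DESERIALIZE_TIMEOUT)
        except FutureTimeout:
            raise DeserializeTimeoutError(
                f"deserialization exceeded {DESERIALIZE_TIMEOUT}s"
            )
    finally:
        pool.shutdown(wait=False)
```

Because both new exceptions subclass `SerializationError`, any existing `except SerializationError` handler catches them without modification.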