
fix: enforce payload size limit and timeout on deserialization #289

Merged
KAJdev merged 4 commits into main from zeke/ae-2382-flash-deserialization-has-no-size-check-large-payload-ooms
Mar 25, 2026

Conversation


@KAJdev KAJdev commented Mar 25, 2026

Deserialization decodes and unpickles the full payload without any pre-flight size check. MAX_PAYLOAD_SIZE is defined in config but never enforced. A large base64 payload (e.g. 500 MB tensor) expands ~3x in memory during decode+unpickle and OOM-kills the container, silently losing the job. A malformed cloudpickle stream can also hang indefinitely, blocking all subsequent requests on that worker.

This enforces the existing MAX_PAYLOAD_SIZE (10 MB) at the deserialization boundary, rejecting oversized payloads before base64 decoding begins. It also wraps cloudpickle.loads in a thread with a 30s wall-clock timeout so malformed streams can't hang a worker forever.

Both new error types (PayloadTooLargeError, DeserializeTimeoutError) are subclasses of SerializationError, so existing catch blocks in the handler code paths continue to work without changes.
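A minimal sketch of the guard described above, assuming placeholder values (the real code reads MAX_PAYLOAD_SIZE from config, and the stdlib pickle stands in for cloudpickle so the snippet is self-contained; names like DESERIALIZE_TIMEOUT are invented here):

```python
import base64
import concurrent.futures
import pickle  # stand-in for cloudpickle so this sketch has no third-party deps

MAX_PAYLOAD_SIZE = 10 * 1024 * 1024   # 10 MB; comes from config in the real code
DESERIALIZE_TIMEOUT = 30              # wall-clock seconds allowed for loads()


class SerializationError(Exception):
    """Base class, so existing `except SerializationError` blocks still catch these."""


class PayloadTooLargeError(SerializationError):
    pass


class DeserializeTimeoutError(SerializationError):
    pass


def deserialize(payload_b64: str):
    # Pre-flight check: reject before any decoding, so an oversized payload
    # never triggers the ~3x memory expansion of decode + unpickle.
    if len(payload_b64) > MAX_PAYLOAD_SIZE:
        raise PayloadTooLargeError(
            f"payload is {len(payload_b64)} bytes, limit is {MAX_PAYLOAD_SIZE}"
        )
    raw = base64.b64decode(payload_b64)

    # Unpickle in a worker thread with a wall-clock timeout so a malformed
    # stream cannot block the worker indefinitely.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(pickle.loads, raw)
    try:
        return future.result(timeout=DESERIALIZE_TIMEOUT)
    except concurrent.futures.TimeoutError as exc:
        raise DeserializeTimeoutError(
            f"deserialization exceeded {DESERIALIZE_TIMEOUT}s"
        ) from exc
    finally:
        # wait=False: don't join a possibly-hung loads() thread on the way out.
        pool.shutdown(wait=False)
```

Because a thread can't be forcibly killed in Python, a timed-out unpickle thread is abandoned rather than joined; the caller gets control back, which is the property the fix is after.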

Closes AE-2382


promptless bot commented Mar 25, 2026

📝 Documentation updates detected!

New suggestion: Add payload size limit and deserialization timeout troubleshooting



@KAJdev KAJdev requested review from deanq and jhcipar March 25, 2026 21:52

@jhcipar jhcipar left a comment


i wonder if we need a workaround for this where we do something like if you pass in arguments larger than a certain size, we upload to blob storage and then pass the blob path or something

the current state is, you can write stuff directly from a local machine to a network volume or something this way. which is actually pretty cool, and it feels kinda bad to be removing that. but i do get the idea


jhcipar commented Mar 25, 2026

this uses the old remote syntax but this is what i mean as a nice feature:

import asyncio

@remote(config)
def hello_world_cpu(data):
    with open("/runpod-volume/archive.tar.gz", "wb") as f:
        f.write(data)

async def main():
    with open(".tetra/archive.tar.gz", "rb") as f:
        tar_file = f.read()
    await asyncio.gather(
        hello_world_cpu(tar_file),
    )

asyncio.run(main())


KAJdev commented Mar 25, 2026

> i wonder if we need a workaround for this where we do something like if you pass in arguments larger than a certain size, we upload to blob storage and then pass the blob path or something
>
> the current state is, you can write stuff directly from a local machine to a network volume or something this way. which is actually pretty cool, and it feels kinda bad to be removing that. but i do get the idea

we could probably implement a way to write directly to network volumes easily without having to go through an endpoint which would probably be a way better method, if we are committing to only enabling flash in S3 enabled datacenters this could just be a wrapper around S3 in a "flash" way.


jhcipar commented Mar 25, 2026

ahh yeah great point


KAJdev commented Mar 25, 2026

like imagine

volume = NetworkVolume(...)

with open("file", "rb") as f:
    await volume.put("path/to/file", f)
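One way the imagined API above could be sketched: a thin async wrapper over an S3-compatible client, in the spirit of the "wrapper around S3 in a 'flash' way" idea from this thread. Everything here is hypothetical, including the NetworkVolume class name, its constructor, and the `put` signature; no such API exists yet, and the injected client is assumed to expose a boto3-style `upload_fileobj(fileobj, bucket, key)`:

```python
import asyncio


class NetworkVolume:
    """Hypothetical sketch: async uploads to a network volume backed by S3."""

    def __init__(self, s3_client, bucket: str):
        self._s3 = s3_client      # any boto3-compatible client (invented assumption)
        self._bucket = bucket

    async def put(self, key: str, fileobj) -> None:
        # Offload the blocking upload to a thread so the event loop stays free.
        await asyncio.to_thread(
            self._s3.upload_fileobj, fileobj, self._bucket, key
        )
```

This would let large arguments bypass the 10 MB payload limit entirely: the client uploads the blob, and the handler receives only the volume path.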

@KAJdev KAJdev merged commit 1240d82 into main Mar 25, 2026
4 checks passed
@KAJdev KAJdev deleted the zeke/ae-2382-flash-deserialization-has-no-size-check-large-payload-ooms branch March 25, 2026 22:21