feat(eval): add Modal and Daytona runtime providers for per-rollout cloud sandboxes by lukass16 · Pull Request #423 · hud-evals/hud-python

lukass16 · 2026-06-17T06:03:39Z

Issue

The engine could place rollouts locally, in Docker, on a borrowed substrate, or
HUD-hosted — but not on on-demand cloud sandboxes. We want isolated, parallel
cloud envs (Modal, Daytona) per rollout.

Solution

Two new Providers in hud/eval/runtime.py, same shape as DockerRuntime
(acquire → yield Runtime → tear down), so rollout()/connect()/scheduler are
unchanged:

- ModalRuntime: Sandbox.create per rollout from a pre-built image, control
  channel over raw TCP (unencrypted_ports), terminate on exit. The image resolves
  once behind a lock (from_name, or a lazy image= build) so concurrent rollouts
  can't race a build.
- DaytonaRuntime: sandbox from a snapshot (built once from image= if
  missing), env server in a background session, reached via an asyncssh
  local-forward (Daytona exposes only HTTPS previews; connect() dials tcp://).
  The SSH token is internal; users only need DAYTONA_API_KEY. workdir defaults
  to /app (the scaffold WORKDIR).
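The provider shape described above (acquire, yield a Runtime, tear down) together with the resolve-once lock can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in hud/eval/runtime.py: `SketchProvider`, `SketchRuntime`, the `builds`/`live` counters, and the fake URL are all hypothetical names, and the sleep stands in for a real image lookup or build.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchRuntime:
    """Hypothetical stand-in for the handle a provider yields."""
    url: str


class SketchProvider:
    """Illustrative provider: resolve the image once, one sandbox per rollout."""

    def __init__(self, image_name: str) -> None:
        self.image_name = image_name
        self._image: Optional[str] = None
        self._image_lock = asyncio.Lock()
        self._ids = itertools.count()
        self.builds = 0  # how many times the image was actually resolved
        self.live = 0    # sandboxes currently alive

    async def _resolve_image(self) -> str:
        # The first caller does the work; later callers wait on the lock
        # and then reuse the cached result, so no two builds can race.
        async with self._image_lock:
            if self._image is None:
                await asyncio.sleep(0.01)  # stands in for a lookup or build
                self.builds += 1
                self._image = "resolved:" + self.image_name
            return self._image

    @asynccontextmanager
    async def acquire(self):
        image = await self._resolve_image()
        self.live += 1  # stands in for creating a sandbox
        try:
            yield SketchRuntime(url=f"tcp://sandbox-{next(self._ids)}/{image}")
        finally:
            self.live -= 1  # stands in for terminate/delete on exit


async def run_rollouts(n: int):
    provider = SketchProvider("libero")

    async def rollout() -> str:
        async with provider.acquire() as runtime:
            await asyncio.sleep(0)  # the rollout itself would run here
            return runtime.url

    urls = await asyncio.gather(*(rollout() for _ in range(n)))
    return provider, urls
```

Calling `asyncio.run(run_rollouts(5))` in this sketch resolves the image a single time even though five rollouts start concurrently, and the `finally` block tears every sandbox down whether the rollout succeeds or raises.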

Single user handle is the image/snapshot name. Both exported from hud.eval, gated
behind optional [modal]/[daytona] extras. Adds modal_deploy.py to build+publish
the libero image.

Outcome / Verification

- Drops in via Taskset.run(runtime=...); no engine/client/protocol changes.
- Lint clean; new deps are optional extras.
- Follow-ups: --runtime modal|daytona CLI flag, ws:// transport (drop the SSH
  hop), warm-pool to amortize cold start.

Note

Medium Risk
Touches core eval placement (container/cloud provisioning, credentials, SSH tunneling) and changes DockerRuntime CLI shape; failures are mostly isolated per rollout but misconfiguration could affect parallel batch runs.

Overview
Adds portable per-task launch requirements via RuntimeConfig (image, RuntimeResources, RuntimeLimits) on Task, wired through platform sync and documented in the types reference. Runtime now carries the effective config after provider defaults merge with row-level overrides via with_overrides.
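One way to picture the merge of provider defaults with row-level overrides is the sketch below. Only the type names (RuntimeConfig, RuntimeResources, RuntimeLimits) and the method name with_overrides come from this PR; the individual fields (`cpu`, `memory_mb`, `gpu_count`, `timeout_s`), the method signature, and the "a set override field wins" rule are assumptions made for illustration.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class RuntimeResources:
    cpu: Optional[float] = None       # hypothetical field
    memory_mb: Optional[int] = None   # hypothetical field
    gpu_count: Optional[int] = None   # hypothetical field


@dataclass(frozen=True)
class RuntimeLimits:
    timeout_s: Optional[int] = None   # hypothetical field


def _merge(base, override):
    """Field-wise merge of two dataclass instances; set override fields win."""
    if override is None:
        return base
    if base is None:
        return override
    updates = {
        f.name: getattr(override, f.name)
        for f in fields(override)
        if getattr(override, f.name) is not None
    }
    return replace(base, **updates)


@dataclass(frozen=True)
class RuntimeConfig:
    image: Optional[str] = None
    resources: Optional[RuntimeResources] = None
    limits: Optional[RuntimeLimits] = None

    def with_overrides(self, override: Optional["RuntimeConfig"]) -> "RuntimeConfig":
        # Assumed semantics: start from the provider defaults (self) and let
        # any field the task row actually sets replace the default.
        if override is None:
            return self
        return RuntimeConfig(
            image=override.image if override.image is not None else self.image,
            resources=_merge(self.resources, override.resources),
            limits=_merge(self.limits, override.limits),
        )


defaults = RuntimeConfig(image="base:latest",
                         resources=RuntimeResources(cpu=2, memory_mb=4096))
row = RuntimeConfig(resources=RuntimeResources(gpu_count=1),
                    limits=RuntimeLimits(timeout_s=600))
effective = defaults.with_overrides(row)
```

Here `effective` keeps the default image, CPU, and memory while picking up the GPU count and timeout from the row, which is the kind of "effective config" a Runtime could carry.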

DockerRuntime is refactored around runtime_config (CPU/memory/GPU count → docker run flags) and surfaces unsupported fields with clear errors. ModalRuntime and DaytonaRuntime are new Providers that spin up isolated cloud sandboxes per rollout (Modal TCP tunnels; Daytona env serve + SSH local-forward), both honoring runtime_config where the backend allows. LocalRuntime and HUDRuntime explicitly reject task-level runtime_config for now.
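The CPU/memory/GPU mapping for Docker might look something like the sketch below. The flags `--cpus`, `--memory`, and `--gpus` are real docker run options; the function name, its parameters, and the choice of a run timeout as the example "unsupported" field are assumptions, since the PR does not spell out which fields DockerRuntime rejects.

```python
from typing import List, Optional


def docker_resource_flags(
    cpu: Optional[float] = None,
    memory_mb: Optional[int] = None,
    gpu_count: Optional[int] = None,
    timeout_s: Optional[int] = None,
) -> List[str]:
    """Translate resource requests into docker run flags (illustrative only)."""
    # Hypothetical example of an unsupported field: fail loudly rather than
    # silently ignoring something the backend cannot enforce.
    if timeout_s is not None:
        raise ValueError(
            "timeout_s is not supported by this Docker sketch; "
            "remove it or use a provider that enforces run timeouts"
        )
    flags: List[str] = []
    if cpu is not None:
        flags += ["--cpus", str(cpu)]
    if memory_mb is not None:
        flags += ["--memory", f"{memory_mb}m"]
    if gpu_count is not None:
        flags += ["--gpus", str(gpu_count)]
    return flags
```

For example, `docker_resource_flags(cpu=2, memory_mb=4096, gpu_count=1)` yields a flag list that can be spliced into a `docker run` argument vector, while passing `timeout_s` raises a `ValueError` that names the offending field.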

Public exports include the runtime config types; optional [modal] and [daytona] extras gate cloud SDK deps. Provider contract tests cover Docker/Modal/Daytona mapping and validation.

Reviewed by Cursor Bugbot for commit ae79946. Bugbot is set up for automated code reviews on this repo.

Add ModalRuntime as a Provider alongside DockerRuntime: resolve image once (from_name or lazy build), create an isolated Sandbox per rollout, expose the env control channel over raw TCP, terminate on exit. Export from hud.eval and add optional [modal] extra.

…oxes Add DaytonaRuntime as a Provider alongside ModalRuntime: resolve snapshot once (build from image if missing), create an isolated sandbox per rollout, start the env server in a background session, reach it via an asyncssh local-forward (Daytona exposes only HTTPS previews, connect dials tcp://), delete on exit. workdir defaults to /app to match the scaffolded Dockerfile.hud. Export from hud.eval and add optional [daytona] extra.

Environment(capabilities=[...]) called add_capability() before _hooks_done was initialized, raising AttributeError; move the flag init above the loop. Also apply ruff format to satisfy CI (runtime.py, claude sdk agent, cli init). Co-authored-by: Cursor <cursoragent@cursor.com>
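The initialization-order bug described in this commit can be reproduced in miniature. The classes below are hypothetical stand-ins rather than the real Environment; only the attribute name `_hooks_done` and the method name `add_capability` come from the commit message, and the guard inside `add_capability` is an assumed reason for the flag being read there.

```python
class _EnvironmentBase:
    def add_capability(self, capability: str) -> None:
        # Reading the flag here is what fails if __init__ has not set it yet.
        if self._hooks_done:
            raise RuntimeError("cannot add capabilities after hooks have run")
        self.capabilities.append(capability)


class BuggyEnvironment(_EnvironmentBase):
    def __init__(self, capabilities=()):
        self.capabilities = []
        for capability in capabilities:
            self.add_capability(capability)  # AttributeError: flag not set yet
        self._hooks_done = False


class FixedEnvironment(_EnvironmentBase):
    def __init__(self, capabilities=()):
        self.capabilities = []
        self._hooks_done = False  # initialize the flag above the loop
        for capability in capabilities:
            self.add_capability(capability)
```

The buggy variant only fails when at least one capability is passed, which is why a constructor call with no capabilities would not have surfaced the problem.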

The env server binds all interfaces inside the sandbox; the tunnel is the only ingress, so the all-interfaces bind is intentional. Co-authored-by: Cursor <cursoragent@cursor.com>

…smatch The default command hardcoded --port 8765 while the SSH forward used the port arg, so a non-default port left the tunnel pointing at a dead port. Build the default command from port; an explicit command still overrides. Co-authored-by: Cursor <cursoragent@cursor.com>
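The port fix amounts to deriving both the server command and the forward target from the same value. A minimal sketch, assuming a placeholder executable name `env-server` (the real default command is not shown in this PR) and hypothetical helper names:

```python
import shlex
from typing import List, Optional, Tuple

DEFAULT_PORT = 8765


def resolve_command(port: int = DEFAULT_PORT, command: Optional[str] = None) -> List[str]:
    """Return the argv for the env server inside the sandbox."""
    if command is not None:
        # An explicit command still overrides the default entirely.
        return shlex.split(command)
    # "env-server" is a placeholder; the point is that --port comes from
    # the port argument instead of a hardcoded 8765.
    return ["env-server", "--port", str(port)]


def forward_target(port: int = DEFAULT_PORT) -> Tuple[str, int]:
    """Destination the SSH local-forward should point at inside the sandbox."""
    return ("127.0.0.1", port)
```

With both helpers fed the same `port`, a non-default value such as 9000 produces a server listening on 9000 and a tunnel aimed at 9000, instead of a tunnel aimed at a port nothing is bound to.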

Add RuntimeConfig to allow tasks to specify runtime images, compute resources, and lifecycle limits. This feature enables more granular control over task execution environments, accommodating varying requirements within the same taskset. Update relevant classes and methods to support this new configuration, including integration into task payloads and validation tests.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit d3775af.

lukass16 added 2 commits June 17, 2026 05:08

cursor Bot reviewed Jun 17, 2026


Comment thread hud/eval/runtime.py

lukass16 and others added 5 commits June 17, 2026 06:08

chore(eval): silence S104 on intentional 0.0.0.0 bind in ModalRuntime

fb27f7f

The env server binds all interfaces inside the sandbox; the tunnel is the only ingress, so the all-interfaces bind is intentional. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(eval): type casting timeout to int for Modal and Daytona

acc264e

fix(eval): make Daytona sandboxes ephemeral by default

5977d5b

cursor Bot reviewed Jun 17, 2026


Comment thread hud/eval/runtime.py Outdated

lukass16 and others added 2 commits June 17, 2026 21:44

fix(eval): fix exception handling in _ensure_snapshot

4a31f50

jdchawla29 force-pushed the lukass/modal-daytona-runtimes branch from 984917b to 420718b Compare June 18, 2026 19:27

cursor Bot reviewed Jun 18, 2026


Comment thread hud/eval/runtime.py

Comment thread hud/eval/runtime.py

fix(eval): address runtime config CI feedback

04c0344

cursor Bot reviewed Jun 18, 2026


Comment thread hud/eval/runtime.py Outdated

adjustments

d466034

cursor Bot reviewed Jun 18, 2026


Comment thread hud/eval/runtime.py

fix(eval): keep docker image shorthand

d3775af

cursor Bot reviewed Jun 18, 2026


Comment thread hud/eval/runtime.py

fix(eval): reject daytona run timeouts consistently

ae79946

jdchawla29 merged commit 566ecfe into v6 Jun 18, 2026
10 checks passed

jdchawla29 merged 13 commits into v6 from lukass/modal-daytona-runtimes
