Add resources config to Daytona sandbox creation, fix OOM on npm install (fixes #40) #43
Merged
…all (fixes #40)

`AutobrinContenderConfig` gains an optional `resources` field (cpu/memory/disk, matching `@daytona/sdk`'s `Resources` shape) for `transport: "daytona"`, threaded through `runDaytonaEngagement` (`src/daytona/launcher.ts` -- the only path connecting the contender config to `createSandbox()`) into the sandbox's `CreateSandboxFromImageParams`. It only applies when creating from `"image"`, not `"snapshot"`: Daytona snapshots fix their resources at snapshot-build time, so `createSandbox()`'s snapshot params have no `resources` field to override -- `createAutobrinRunner()` now rejects that combination with a clear error instead of silently dropping it. `examples/node22-bookworm-computer-use-image.ts` now requests 4GiB by default. Confirmed empirically against real Daytona sandboxes: the platform default (no `resources` override) is a 1GiB cgroup memory limit that reliably OOM-kills a full `npm install` of AutoBrin-flue's dependency tree (2/2 reproductions, exact "Killed" signature from #40); 4GiB completed the same install successfully every time (3/3), and held up through full real CVE-Bench engagements including live computer-use/Chromium exploitation (peak observed cgroup usage ~1.4GB of the 4GiB budget).
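The rejection of `resources` + `snapshot` described above amounts to a fail-fast guard. A minimal sketch, assuming a simplified config shape: `ContenderConfigSketch` and `assertResourcesCompatible` are hypothetical names, not the code in `src/contenders/autobrin.ts`, and `Resources` is a local stand-in for `@daytona/sdk`'s type.

```typescript
// Local stand-in for @daytona/sdk's Resources type (assumption: the real
// type may carry more fields; memory and disk are in GiB).
interface Resources {
  cpu?: number;
  memory?: number;
  disk?: number;
}

// Hypothetical subset of AutobrinContenderConfig; only `transport` and
// `resources` come from the PR, `image`/`snapshot` are illustrative.
interface ContenderConfigSketch {
  transport: string; // e.g. "daytona"
  image?: string;
  snapshot?: string;
  // Mirrors the PR's Pick<Resources, 'cpu' | 'memory' | 'disk'> field shape.
  resources?: Pick<Resources, "cpu" | "memory" | "disk">;
}

// Fail fast instead of silently dropping `resources`: a sandbox created
// from a snapshot inherits resources fixed when the snapshot was built.
function assertResourcesCompatible(config: ContenderConfigSketch): void {
  if (config.resources !== undefined && config.snapshot !== undefined) {
    throw new Error(
      "resources cannot be combined with snapshot: Daytona fixes snapshot resources " +
        "at snapshot-build time. Create from image instead.",
    );
  }
}
```

Rejecting at runner-creation time, rather than ignoring the field in the launcher, means a misconfigured contender fails before any sandbox is provisioned.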
The current version of this PR was reviewed by /review-bugbot on Jul 2, 17:55 GMT+2; it flagged 0 findings.
Summary
- `examples/node22-bookworm-computer-use-image.ts`'s declarative image build never set `resources` (cpu/memory/disk) on sandbox creation, so it got the Daytona platform default -- confirmed empirically to be a 1GiB cgroup memory limit, which reliably gets a full `npm install` of AutoBrin-flue's dependency tree OOM-killed during sandbox bootstrap (`bootstrapAutobrinFlue()`, `src/daytona/bootstrap.ts`), before any modality-specific work even starts.
- `AutobrinContenderConfig` (`src/contenders/autobrin.ts`) gains an optional `resources?: Pick<Resources, 'cpu' | 'memory' | 'disk'>` field (reusing `@daytona/sdk`'s own `Resources` type for the field shape), for `transport: "daytona"` only.
- `src/daytona/launcher.ts` (not in the issue's suggested 3-file list, but the only code path connecting `AutobrinContenderConfig` to `createSandbox()` -- `runDaytonaEngagement`'s own options never had a `resources` field to forward, so this couldn't be wired without touching it) -- `DaytonaRunOptions` gains `resources?: Resources`, applied only on the `"image"` sandbox-creation branch.
- `resources` only applies when creating from `image`, not `snapshot`: Daytona snapshots fix their resources at snapshot-build time (`CreateSnapshotParams.resources` in the SDK), so `createSandbox()`'s snapshot params (`CreateSandboxFromSnapshotParams`) have no `resources` field to override at sandbox-creation time. `createAutobrinRunner()` now rejects `resources` + `snapshot` combined with a clear error instead of silently dropping the setting.
- `src/daytona/client.ts` needed no code change: `createSandbox()` already generically forwards whatever `SandboxCreateInput` it's given (confirmed by reading the SDK's public `CreateSandboxFromImageParams` type, which already declares `resources?: Resources`) -- the gap was purely that nothing upstream ever set it. Added a small test (`tests/daytona.test.ts`) making this explicit.
- `examples/node22-bookworm-computer-use-image.ts` now requests 4GiB by default (`resources: { memory: 4 }`), determined empirically (see below), not guessed.
- `resources`.

Empirical memory investigation
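Confirmed directly against real Daytona sandboxes (same declarative image, real `bootstrapAutobrinFlue()` clone+install+build, mirroring the actual code path):

- Platform default (no `resources` override, 1GiB cgroup memory limit): `npm install` OOM-killed in 2/2 reproductions, with the exact `bash: line 19: NNN Killed npm install >> .../autobrin-flue-install.log` signature from the issue
- `resources: { memory: 4 }` (4GiB): the full bootstrap (`npm install` + `npm run build`) completed successfully in ~35-37s, 3/3 runs

4GiB was the first value tried (per the issue's suggestion) and was sufficient, so no higher value was needed.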
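The OOM signature quoted above is easy to check for mechanically when triaging bootstrap failures. A sketch, assuming the bootstrap's shell output is available as a string and the sandbox uses cgroup v2; `looksLikeInstallOomKill` and `oomKillCount` are hypothetical helpers, not part of this PR.

```typescript
// Matches bash's job-status report for a SIGKILLed `npm install`, e.g.
// "bash: line 19: 4242 Killed                  npm install >> .../autobrin-flue-install.log".
// Line number and PID vary between runs, so they are matched loosely.
const NPM_INSTALL_KILLED = /^bash: line \d+:\s+\d+\s+Killed\s+npm install\b/m;

// Heuristic only: "Killed" just means SIGKILL. Under a cgroup memory limit
// that is usually the OOM killer, but it should be confirmed separately.
function looksLikeInstallOomKill(output: string): boolean {
  return NPM_INSTALL_KILLED.test(output);
}

// Confirmation: cgroup v2's memory.events file has an "oom_kill N" line
// counting processes killed by the OOM killer in that cgroup.
function oomKillCount(memoryEvents: string): number {
  const match = /^oom_kill (\d+)$/m.exec(memoryEvents);
  return match ? Number(match[1]) : 0;
}
```

Pairing the two checks separates a genuine memory-limit kill from other SIGKILL sources (for example a timeout wrapper).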
Real live verification
Ran the actual `autobrin` contender with `transport: "daytona"` + the new `resources: { memory: 4 }` end-to-end against real CVE-Bench Docker targets, bridged to the remote sandbox via temporary Cloudflare quick tunnels (same pattern as #36/#39's own verification: `standUpTarget()` stands up the real target locally, tunnels bridge app + evaluator ports, sandbox reaches the tunnel URLs). Model: `kimi-azure/kimi-k2.6`, 1 cycle / 1 contributor / $3 cap per attempt.

- `stopReason: "maxCycles reached"`, $1.96, 3 hypotheses explored, including a full live computer-use/Chromium exploitation attempt against the evaluator's `webapp-computer` skill -- confirmed directly by inspecting the live sandbox's `result.json` and cgroup memory (peaked at ~1.4GB of the 4GiB budget, comfortable headroom even with a full Chromium browser stack running). My own local orchestration script (not part of this PR) was interrupted before it could read the result back, requiring a manual sandbox/Docker cleanup pass -- infrastructure noise from a throwaway verification tool on my end, not a symptom of the sandbox or the fix.

Across all real engagement attempts, live-inspected sandbox memory never exceeded ~1.4GB of the 4GiB budget, including through the heaviest observed workload (a full Chromium instance for computer-use exploitation) -- confirming 4GiB has comfortable headroom, not just enough to pass once.
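One way to reproduce the peak-versus-budget comparison above from inside a sandbox is to read the cgroup's own counters. A sketch, assuming cgroup v2 is mounted at `/sys/fs/cgroup` and the kernel is recent enough (5.19+) to expose `memory.peak`; the PR does not say how the numbers were actually collected, and `memoryHeadroom` / `readSandboxHeadroom` are hypothetical names.

```typescript
import { readFileSync } from "node:fs";

const GIB = 1024 ** 3;

// cgroup v2 memory files hold a byte count, or the literal "max" for no limit.
function parseCgroupBytes(raw: string): number {
  const value = raw.trim();
  if (value === "max") return Number.POSITIVE_INFINITY;
  const bytes = Number(value);
  if (!Number.isFinite(bytes)) throw new Error(`unexpected cgroup value: ${raw}`);
  return bytes;
}

interface Headroom {
  peakGiB: number;
  limitGiB: number;
  usedFraction: number; // peak / limit; 0 when the cgroup is unlimited
}

// Pure computation, kept separate from file I/O so it can be tested directly.
function memoryHeadroom(peakRaw: string, maxRaw: string): Headroom {
  const peak = parseCgroupBytes(peakRaw);
  const limit = parseCgroupBytes(maxRaw);
  return { peakGiB: peak / GIB, limitGiB: limit / GIB, usedFraction: peak / limit };
}

// Reads the live values (assumes the cgroup v2 layout described above).
function readSandboxHeadroom(root = "/sys/fs/cgroup"): Headroom {
  return memoryHeadroom(
    readFileSync(`${root}/memory.peak`, "utf8"),
    readFileSync(`${root}/memory.max`, "utf8"),
  );
}
```

If the reported ~1.4GB is read as 1.4×10^9 bytes, `usedFraction` comes out around 0.33 against a 4GiB `memory.max`, i.e. roughly 2.7GiB of headroom.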
Sandboxes, Docker containers/networks, and Cloudflare tunnel processes from this verification were all torn down and confirmed clean afterward (no sandboxes of mine left started, `docker ps -a` empty).

Test plan
- `npm run validate` (typecheck + `vitest run`, 289/289 passing)
- `tests/autobrin-contender.test.ts`: `createAutobrinRunner` accepts `resources` with `image`, rejects `resources` + `snapshot`
- `tests/autobrin-daytona-sequencing.test.ts`: `runViaDaytona` forwards `config.resources` into `runDaytonaEngagement`'s options (and omits it when unset)
- `tests/daytona-launcher.test.ts`: `runDaytonaEngagement` passes `resources` to `createSandbox()` on the `"image"` branch, never on the `"snapshot"` branch
- `tests/daytona.test.ts`: `createSandbox()` forwards `resources` (and other fields) through to `daytona.create()`
- `transport: "daytona"` (see above)
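The branch behaviour these tests pin down (`resources` forwarded on the `"image"` path, never on `"snapshot"`, and `createSandbox()` passing its input straight through) can be sketched without the SDK. Everything here is a hypothetical, simplified stand-in: `toCreateInput`, `RunOptionsSketch` and `makeRecordingDaytona` are illustrative names, and the recording fake replaces both the real Daytona client and vitest.

```typescript
// Simplified stand-ins for the @daytona/sdk types named in this PR; the real
// SDK types carry more fields (e.g. image may also be an Image object).
interface Resources {
  cpu?: number;
  memory?: number; // GiB, so { memory: 4 } is 4GiB
  disk?: number;
}

type SandboxCreateInput = { image: string; resources?: Resources } | { snapshot: string };

// Hypothetical run options; the real DaytonaRunOptions shape may differ.
type RunOptionsSketch =
  | { source: "image"; image: string; resources?: Resources }
  | { source: "snapshot"; snapshot: string; resources?: Resources };

// The minimal client surface needed here; the fake below implements it.
interface DaytonaLike {
  create(params: SandboxCreateInput): Promise<{ id: string }>;
}

// Branch choice: resources ride along only on the "image" branch, because
// snapshot params have no resources field. Unset resources are omitted so
// the platform default still applies.
function toCreateInput(opts: RunOptionsSketch): SandboxCreateInput {
  if (opts.source === "image") {
    return opts.resources === undefined
      ? { image: opts.image }
      : { image: opts.image, resources: opts.resources };
  }
  return { snapshot: opts.snapshot };
}

// Pass-through, like the PR's description of client.ts's createSandbox().
function createSandbox(daytona: DaytonaLike, input: SandboxCreateInput): Promise<{ id: string }> {
  return daytona.create(input);
}

// Recording fake: captures every params object handed to create().
function makeRecordingDaytona(): { client: DaytonaLike; calls: SandboxCreateInput[] } {
  const calls: SandboxCreateInput[] = [];
  const client: DaytonaLike = {
    create(params) {
      calls.push(params);
      return Promise.resolve({ id: `fake-sandbox-${calls.length}` });
    },
  };
  return { client, calls };
}
```

Because the fake records synchronously inside `create()`, assertions on `calls` can run right after `createSandbox()` is invoked, without awaiting the returned promise.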