fix(execution): cap isolate memory at 128MB and recycle workers every 200 executions #4543

Merged: waleedlatif1 merged 4 commits into staging from waleedlatif1/memory-leak-check on May 10, 2026

waleedlatif1 (Collaborator) commented May 10, 2026

Summary

  • Revert isolate memoryLimit from 256MB → 128MB in isolated-vm-worker.cjs (both executeCode and executeTask). The 256MB limit was added in #4505 (improvement(sandbox): upgrade pptx/docx/pdf bootstrap with image helpers, MIME guards, and 256 MB isolate limit) for doc generation; 48h of production data shows zero "Reached memory limit" errors, so the headroom was unused. Error messages updated accordingly.
  • Lower MAX_EXECUTIONS_PER_WORKER default from 500 → 200 in both apps/sim/lib/core/config/env.ts (env schema default) and apps/sim/lib/execution/isolated-vm.ts (fallback), so workers recycle 2.5× more often. All 89 worker retirements in the last 48h hit the 500 ceiling, and MemoryTelemetry shows the native context count climbing to 475 with RSS peaks of 15.6GB. Only process.kill() reclaims that native memory, so faster recycling caps steady-state RSS.
  • Both values remain env-overridable (IVM_MAX_EXECUTIONS_PER_WORKER).
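
For readers unfamiliar with the override-then-default pattern described above, here is a minimal sketch. It is not the code in env.ts or isolated-vm.ts: the helper name `resolveMaxExecutionsPerWorker`, the constant name, and the validation rules are assumptions made for illustration. Only the variable name IVM_MAX_EXECUTIONS_PER_WORKER and the default of 200 come from this PR.

```typescript
// Hypothetical helper, not the project's implementation. It illustrates how an
// env override can fall back to the new default of 200.
const DEFAULT_MAX_EXECUTIONS_PER_WORKER = 200;

function resolveMaxExecutionsPerWorker(
  env: Record<string, string | undefined>
): number {
  const raw = env.IVM_MAX_EXECUTIONS_PER_WORKER;
  // Unset or blank: use the default.
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_MAX_EXECUTIONS_PER_WORKER;
  }
  const parsed = Number(raw);
  // Assumed policy: ignore non-integer or non-positive overrides instead of
  // recycling a worker on every execution.
  if (!Number.isInteger(parsed) || parsed <= 0) {
    return DEFAULT_MAX_EXECUTIONS_PER_WORKER;
  }
  return parsed;
}
```

Under these assumptions, `resolveMaxExecutionsPerWorker({ IVM_MAX_EXECUTIONS_PER_WORKER: "100" })` would let an operator drop the cap to 100 without a code change.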

Why

The production app task triggered the sim-production-us-east-1-app-task-high-memory alarm, with memory climbing from 1GB → 13.7GB in the 30 minutes after the v0.6.72 deploy. The arithmetic (13GB ÷ 256MB ≈ 50 isolates) and the telemetry (externalMB 11.7GB peak, nativeContexts 475) both point at native memory accumulating in the parent across executions faster than workers recycle.
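
The back-of-envelope estimate above can be reproduced as follows. This is only a sketch of the arithmetic: the function name is hypothetical, and it assumes binary units (1 GB = 1024 MB), which is why it lands on 52 rather than the rounded ≈ 50 quoted above.

```typescript
// Hypothetical helper reproducing the "13GB ÷ 256MB" estimate.
// Assumes 1 GB = 1024 MB.
const MB_PER_GB = 1024;

function isolateEquivalents(memoryGb: number, isolateLimitMb: number): number {
  return (memoryGb * MB_PER_GB) / isolateLimitMb;
}

// 13 GB expressed in units of the old 256 MB isolate limit.
const atOldLimit = isolateEquivalents(13, 256); // 52
```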

Note on chosen value (200)

Initial fix used 100 (5× reduction). Raised to 200 after weighing tradeoffs:

  • 200 still cuts steady-state native memory accumulation by 2.5× vs the prior 500
  • Less spawn churn under burst load
  • Defensible range from telemetry was 50–250; 200 sits toward the conservative end, leaving the most headroom on tail-latency cost
  • Env override (IVM_MAX_EXECUTIONS_PER_WORKER) lets us drop to 100 without a code change if post-deploy telemetry shows RSS still climbing

Type of Change

  • Bug fix

Testing

Tested manually. The spawn-rate impact at 200 is negligible: the peak rises from 74/hr to ~185/hr across the 4-worker pool, and retirements are distributed via the retiring flag (no in-flight interruption).
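
The ~185/hr figure follows from scaling the observed peak by the ratio of the old cap to the new one. The sketch below assumes that every retirement is caused by hitting the execution cap (consistent with the 89 retirements noted in the summary) and that execution volume stays the same; the function name is hypothetical.

```typescript
// Hypothetical projection: if every retirement is caused by the execution cap,
// spawn rate scales inversely with the cap at constant load.
function projectedSpawnsPerHour(
  observedPerHour: number,
  oldCap: number,
  newCap: number
): number {
  return observedPerHour * (oldCap / newCap);
}

const at200 = projectedSpawnsPerHour(74, 500, 200); // 185
const at100 = projectedSpawnsPerHour(74, 500, 100); // 370
```

The second value shows what the fallback option of 100 would cost in spawn rate under the same assumptions.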

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

cursor Bot commented May 10, 2026

PR Summary

Medium Risk
Touches the isolated execution sandbox and worker lifecycle limits, which can change runtime behavior and cause more OOMs or process churn under load. Changes are small but affect a performance/stability-critical path.

Overview
Lowers isolated-vm memoryLimit from 256MB → 128MB for both code and task execution workers, and updates the associated memory-limit error messaging.

Reduces the default IVM_MAX_EXECUTIONS_PER_WORKER / MAX_EXECUTIONS_PER_WORKER from 500 → 200 (still env-overridable), causing worker processes to retire and respawn more frequently to curb memory accumulation.

Reviewed by Cursor Bugbot for commit eb6068b.

Comment thread apps/sim/lib/execution/isolated-vm.ts Outdated
greptile-apps Bot commented May 10, 2026

Greptile Summary

This PR makes two targeted production fixes to address a memory alarm triggered by native heap accumulation in the isolated-vm worker pool: it reverts the isolate memoryLimit from 256 → 128 MB (production telemetry shows zero memory-limit errors, confirming the headroom was unnecessary) and lowers MAX_EXECUTIONS_PER_WORKER from 500 → 200 so workers recycle 2.5× more often, capping steady-state RSS.

  • isolated-vm-worker.cjs: memoryLimit reduced to 128 MB in both executeCode and executeTask; error messages updated to match.
  • env.ts + isolated-vm.ts: Default and fallback for MAX_EXECUTIONS_PER_WORKER lowered to 200 in both places, keeping them in sync. The env override IVM_MAX_EXECUTIONS_PER_WORKER remains available for further tuning without a code deploy.

Confidence Score: 5/5

Safe to merge — all three files are consistent with each other and the changes are backed by 48h of production telemetry.

The changes are narrow and mutually consistent: the env schema default, the in-code fallback, the isolate memory cap, and the error messages all tell the same story. The 256 → 128 MB revert is validated by zero memory-limit errors in production. The 500 → 200 recycling threshold directly addresses the observed RSS growth (nativeContexts climbing to 475, RSS peaking at 15.6 GB). Both values remain env-overridable, so further tuning requires no code change.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/execution/isolated-vm-worker.cjs | Reverts memoryLimit from 256 → 128 MB in both executeCode and executeTask; error messages updated to match. Both sites changed consistently. |
| apps/sim/lib/core/config/env.ts | Single-line change lowering IVM_MAX_EXECUTIONS_PER_WORKER default from 500 → 200; consistent with isolated-vm.ts fallback. |
| apps/sim/lib/execution/isolated-vm.ts | Fallback for MAX_EXECUTIONS_PER_WORKER lowered from 500 → 200, matching the env.ts schema default exactly. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[New execution request] --> B{Worker available?}
    B -- yes --> C[Assign to worker]
    B -- no --> D[Queue / spawn new worker]
    C --> E[Create ivm.Isolate\nmemoryLimit: 128 MB]
    E --> F[Execute code / task]
    F --> G{Memory exceeded?}
    G -- yes --> H[Return MemoryLimitError\n'128 MB exceeded']
    G -- no --> I[Return result]
    I --> J[Increment worker.execCount]
    J --> K{execCount >= 200?}
    K -- yes --> L[Mark worker retiring\nprocess.kill on idle]
    K -- no --> M[Worker stays in pool]
    L --> N[Spawn replacement worker]
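
The retirement path in the flowchart above can be sketched in a few lines. This is an illustrative model only, not the code in isolated-vm.ts: the class `WorkerPoolSketch`, its fields, and the `killedIds` list (which stands in for actually killing a process) are assumptions. It shows a worker being marked as retiring once it reaches the cap, a replacement being spawned immediately, and the old worker being removed only after its in-flight executions finish.

```typescript
interface PooledWorker {
  id: number;
  execCount: number; // completed executions
  inFlight: number; // executions currently running
  retiring: boolean; // once true, no new work is assigned
}

// Hypothetical in-memory model of the pool; no real processes are spawned.
class WorkerPoolSketch {
  readonly workers: PooledWorker[] = [];
  readonly killedIds: number[] = []; // stands in for process.kill()
  private nextId = 1;
  private readonly maxExecutions: number;

  constructor(size: number, maxExecutions: number) {
    if (size < 1) throw new Error("pool size must be at least 1");
    this.maxExecutions = maxExecutions;
    for (let i = 0; i < size; i++) this.workers.push(this.spawn());
  }

  private spawn(): PooledWorker {
    return { id: this.nextId++, execCount: 0, inFlight: 0, retiring: false };
  }

  // Assign work to the least-busy worker that is not retiring.
  acquire(): PooledWorker {
    const active = this.workers.filter((w) => !w.retiring);
    const worker = active.reduce((a, b) => (b.inFlight < a.inFlight ? b : a));
    worker.inFlight++;
    return worker;
  }

  // Record a finished execution and apply the recycling rule.
  release(worker: PooledWorker): void {
    worker.inFlight--;
    worker.execCount++;
    if (!worker.retiring && worker.execCount >= this.maxExecutions) {
      worker.retiring = true;
      this.workers.push(this.spawn()); // replacement keeps pool capacity steady
    }
    if (worker.retiring && worker.inFlight === 0) {
      this.workers.splice(this.workers.indexOf(worker), 1);
      this.killedIds.push(worker.id); // killed only once idle
    }
  }
}
```

Because the count is incremented on completion in this model, a worker with several executions in flight can finish slightly above the cap before it is removed; whether the real pool behaves the same way is not stated in this PR.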

Reviews (4): Last reviewed commit: "fix(execution): update memory limit erro..."

@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/lib/execution/isolated-vm-worker.cjs
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit eb6068b.

Comment thread apps/sim/lib/core/config/env.ts
Comment thread apps/sim/lib/execution/isolated-vm.ts
waleedlatif1 changed the title from "fix(execution): cap isolate memory at 128MB and recycle workers every 100 executions" to "fix(execution): cap isolate memory at 128MB and recycle workers every 200 executions" on May 10, 2026
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit eb6068b.

@waleedlatif1 waleedlatif1 merged commit aae93f8 into staging May 10, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/memory-leak-check branch May 10, 2026 01:43