uvicorn missing --limit-concurrency allows concurrent requests to exceed pod memory #23

@rdwj

Description

Problem

Containerfile:31 runs uvicorn with no --limit-concurrency. Each /execute request can spawn a subprocess that allocates up to 512 MB (the RLIMIT_AS ceiling from the minimal profile). With the default 704Mi pod memory limit and ~80 MB for the FastAPI parent, the pod can absorb roughly one concurrent execution before the cgroup OOM-killer fires.

In practice:

  • 1 concurrent request: ~80 MB (parent) + 512 MB (subprocess) = ~592 MB — fits in 704Mi
  • 2 concurrent requests: ~80 MB + 2 × 512 MB = ~1104 MB — exceeds 704Mi → cgroup OOM-kill

The current chart implicitly assumes the caller serializes requests, but nothing enforces that.
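For reference, a per-subprocess address-space cap like the 512 MB ceiling above is typically applied via `resource.setrlimit` in the child between fork and exec. A minimal sketch, assuming a POSIX host; the spawned command and helper name are illustrative, not the repository's actual pipeline code:

```python
# Sketch: cap a spawned subprocess's virtual address space at 512 MB.
# The command and helper name are illustrative, not the repo's code.
import resource
import subprocess
import sys

LIMIT_BYTES = 512 * 1024 * 1024  # the RLIMIT_AS ceiling from the minimal profile

def _cap_address_space() -> None:
    # Runs in the child after fork(), before exec(); every allocation in
    # the subprocess is then bounded by RLIMIT_AS, so the worst case per
    # request is ~512 MB on top of the FastAPI parent.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))

proc = subprocess.run(
    [sys.executable, "-c", "print('ok')"],
    preexec_fn=_cap_address_space,
    capture_output=True,
    text=True,
)
print(proc.stdout.strip())
```

The cap bounds each child individually; nothing in it limits how many such children exist at once, which is exactly the gap this issue describes.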

Options

  1. --limit-concurrency 1 on uvicorn — simplest, guarantees only one in-flight request per pod. Scale horizontally via replicas.
  2. Semaphore in pipeline.py — an asyncio.Semaphore(1) around the subprocess spawn, returning 429 to excess callers. More informative to the caller than a connection queue.
  3. Both — semaphore for clean 429s, --limit-concurrency as a backstop.
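Option 1 would be a one-line change to the Containerfile CMD. A sketch, with an illustrative app module path (uvicorn returns 503, not 429, to connections over the limit):

```shell
# Backstop at the server: refuse more than one in-flight request.
# The app module path is illustrative, not the repo's actual entrypoint.
uvicorn app.main:app --host 0.0.0.0 --port 8080 --limit-concurrency 1
```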

Option 3 is belt-and-suspenders but cleanest operationally. The semaphore limit could be made configurable via env var to support pods with higher memory limits.
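The semaphore half of option 3 could look like the sketch below, written framework-free so it stands alone. All names are illustrative (not the repo's actual pipeline.py), and `Busy` is what a route handler would map to an HTTP 429:

```python
# Sketch of the semaphore guard from options 2/3; names are illustrative.
import asyncio
import os

# Env-configurable slot count, as suggested above; default 1 matches the
# 704Mi pod profile.
EXEC_CONCURRENCY = int(os.environ.get("EXEC_CONCURRENCY", "1"))
_exec_slots = asyncio.Semaphore(EXEC_CONCURRENCY)

class Busy(Exception):
    """All slots taken; a route handler would translate this into a 429."""

async def guarded_execute(spawn):
    # Refuse immediately instead of queueing, so the caller gets an
    # explicit 429 rather than waiting behind uvicorn's connection queue.
    if _exec_slots.locked():
        raise Busy
    async with _exec_slots:
        return await spawn()
```

With EXEC_CONCURRENCY=1, a second overlapping call raises Busy while the first subprocess is still running; the uvicorn --limit-concurrency flag then only matters as a backstop if something bypasses the guard.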

Context

Found during review of #20 (Python 3.12 memory fix). This was already true at the old 200 MB default (4 × 200 = 800 > 256Mi), so it predates that PR.
