Problem
Containerfile:31 runs uvicorn with no --limit-concurrency. Each /execute request can spawn a subprocess that allocates up to 512 MB (the RLIMIT_AS ceiling from the minimal profile). With the default 704Mi pod memory limit and ~80 MB for the FastAPI parent, the pod can absorb roughly one concurrent execution before the cgroup OOM-killer fires.
In practice:
- 1 concurrent request: ~80 MB (parent) + 512 MB (subprocess) = ~592 MB — fits in 704Mi
- 2 concurrent requests: ~80 MB + 2 × 512 MB = ~1104 MB — exceeds 704Mi → cgroup OOM-kill
The current chart implicitly assumes the caller serializes requests, but nothing enforces that.
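For reference, the spawn pattern that produces the 512 MB ceiling typically looks something like the sketch below. This is an illustration only, not the actual pipeline.py code: the function names and the preexec_fn mechanism are assumptions; only the 512 MB figure comes from the profile described above.

```python
import asyncio
import resource

# 512 MB ceiling from the minimal profile (figure taken from the issue text).
RLIMIT_AS_BYTES = 512 * 1024 * 1024


def _apply_rlimits() -> None:
    # Runs in the child between fork and exec: caps the address space so the
    # child fails allocation instead of growing past the profile's ceiling.
    resource.setrlimit(resource.RLIMIT_AS, (RLIMIT_AS_BYTES, RLIMIT_AS_BYTES))


async def run_user_code(cmd: list[str]) -> int:
    # Each in-flight call can hold up to RLIMIT_AS_BYTES of address space on
    # top of the FastAPI parent, which is what makes unbounded concurrency
    # unsafe inside a 704Mi pod.
    proc = await asyncio.create_subprocess_exec(*cmd, preexec_fn=_apply_rlimits)
    return await proc.wait()
```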
Options
1. --limit-concurrency 1 on uvicorn: simplest, guarantees only one in-flight request per pod. Scale horizontally via replicas.
2. Semaphore in pipeline.py: an asyncio.Semaphore(1) around the subprocess spawn, returning 429 to excess callers. More informative to the caller than a connection queue.
3. Both: the semaphore for clean 429s, --limit-concurrency as a backstop.
Option 3 is belt-and-suspenders but cleanest operationally. The semaphore limit could be made configurable via env var to support pods with higher memory limits.
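A minimal sketch of option 3, assuming a MAX_CONCURRENT_EXECUTIONS env var and a run_pipeline() helper that does the subprocess spawn; neither name comes from the current code:

```python
import asyncio
import os

from fastapi import FastAPI, HTTPException

app = FastAPI()

# Assumed env var name; a default of 1 matches the 704Mi sizing worked out above.
MAX_CONCURRENT_EXECUTIONS = int(os.environ.get("MAX_CONCURRENT_EXECUTIONS", "1"))
_execute_slots = asyncio.Semaphore(MAX_CONCURRENT_EXECUTIONS)


async def run_pipeline(payload: dict) -> dict:
    # Stand-in for the real pipeline.py call that spawns the RLIMIT_AS-capped
    # subprocess; the actual signature in the repo may differ.
    raise NotImplementedError


@app.post("/execute")
async def execute(payload: dict) -> dict:
    # Reject instead of queueing: the caller gets an explicit 429 rather than
    # waiting behind a connection backlog it cannot see.
    if _execute_slots.locked():
        raise HTTPException(status_code=429, detail="execution slot busy, retry later")
    async with _execute_slots:
        return await run_pipeline(payload)
```

The uvicorn backstop still applies at the connection level, but --limit-concurrency answers overflow requests with a 503 rather than a 429, which is why the in-app semaphore gives callers the clearer signal.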
Context
Found during review of #20 (Python 3.12 memory fix). This was already true at the old 200 MB default (4 × 200 = 800 > 256Mi), so it predates that PR.