fix: align parallel context sizing with slots #69

Merged: inureyes merged 1 commit into main from issue-57-parallel-context-semantics on May 21, 2026

@inureyes
Member

Summary

Implements issue #57 by aligning server parallel context allocation with llama.cpp server semantics:

  • Treat explicit --ctx-size C as a total context budget shared by active request slots.
  • Resolve effective per-slot context as floor(C / N) for --parallel N.
  • When --max-batch-size M is given explicitly, use M as the divisor, because it is the actual limit on concurrent decode sequences.
  • Keep --no-batch on single-slot semantics so it receives the full explicit context budget.
  • Reject startup when the effective per-slot context is below 512 tokens.
  • Derive the plain KV live-window cap from the same effective per-slot context, optionally tightened by explicit --max-kv-size.
  • Report the effective per-slot context through /slots and /health.context_size.
  • Update serve --estimate-memory so the preflight uses the same per-slot context and active-sequence count as runtime startup.
  • Update CLI help text, docs/environment-variables.md, and CHANGELOG.md.
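
The sizing rules in the list above can be sketched as a small pure function. This is an illustration only, not the code merged in this PR: `SlotPlan`, `resolve_slot_plan`, and `MIN_SLOT_CTX` are hypothetical names, and the precedence shown (`--no-batch` first, then an explicit `--max-batch-size`, then `--parallel`) is an assumption read off the bullets. It also assumes `--ctx-size` was passed explicitly.

```rust
/// Minimum effective per-slot context accepted at startup (512 tokens per the summary).
const MIN_SLOT_CTX: usize = 512;

/// Hypothetical result type for this sketch; not a type from the mlxcel codebase.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SlotPlan {
    /// Number of sequences that share the total context budget.
    pub active_slots: usize,
    /// Effective per-slot context, floor(C / divisor).
    pub per_slot_ctx: usize,
    /// Plain KV live-window cap derived from the per-slot context.
    pub kv_window_cap: usize,
}

/// Resolve per-slot context from an explicit `--ctx-size` budget.
/// Assumed precedence: `--no-batch` > explicit `--max-batch-size` > `--parallel`.
pub fn resolve_slot_plan(
    ctx_size: usize,
    parallel: usize,
    max_batch_size: Option<usize>,
    no_batch: bool,
    max_kv_size: Option<usize>,
) -> Result<SlotPlan, String> {
    // --no-batch keeps single-slot semantics, so the slot gets the whole budget.
    let active_slots = if no_batch {
        1
    } else {
        max_batch_size.unwrap_or(parallel)
    };
    if active_slots == 0 {
        return Err("slot count must be at least 1".to_string());
    }

    // Integer division on usize floors, matching floor(C / N).
    let per_slot_ctx = ctx_size / active_slots;
    if per_slot_ctx < MIN_SLOT_CTX {
        return Err(format!(
            "effective per-slot context {per_slot_ctx} is below the minimum {MIN_SLOT_CTX}"
        ));
    }

    // An explicit --max-kv-size can only tighten the cap, never raise it.
    let kv_window_cap = match max_kv_size {
        Some(limit) => per_slot_ctx.min(limit),
        None => per_slot_ctx,
    };

    Ok(SlotPlan { active_slots, per_slot_ctx, kv_window_cap })
}
```

Under these assumptions, `--ctx-size 8192 --parallel 4` resolves to 2048 tokens per slot, while `--ctx-size 1024 --parallel 4` is rejected because 256 is below the 512-token floor.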

Note: the issue references stale doc paths (docs/en/..., docs/ko/..., docs/CONTINUOUS_BATCHING.md, docs/man/...) that do not exist in the current tree, so the operator-facing docs were added to the existing environment/flag reference instead.

Validation

  • cargo fmt --all -- --check
  • cargo clippy -p mlxcel --lib --bin mlxcel --bin mlxcel-server --tests -- -D warnings
  • cargo test -p mlxcel --lib context
  • cargo test -p mlxcel --lib build_server_config
  • cargo test -p mlxcel --bin mlxcel serve_preflight
  • git diff --check

Notes

The full cargo test -p mlxcel --lib run is still known to fail on the pre-existing NVFP4 sanitize test, which was reproduced on origin/main during the preceding issue #52 audit. This PR therefore relies on focused coverage for the changed behavior, plus clippy and fmt.

@inureyes inureyes merged commit 31a03ca into main May 21, 2026
4 checks passed