
Regenerate API client + add forwarder_max_concurrency wrapper (MLI-6875) #180

Merged: diazagasatya merged 2 commits into master from feat/MLI-6875-forwarder-max-concurrency-sdk on May 29, 2026.

@diazagasatya diazagasatya commented May 28, 2026

Summary

Regenerates the API client from the latest llm-engine OpenAPI schema and exposes forwarder_max_concurrency on the user-facing client. Companion to scaleapi/llm-engine#835 (MLI-6876).

This regen also incidentally picks up four months of schema drift since #176 (Jan 2026). All changes are additive.

New fields exposed in the regen (all optional; the three endpoint fields are Optional[int] with default None)

| Field | Schemas | Source |
| --- | --- | --- |
| forwarder_max_concurrency | CreateModelEndpointV1Request, UpdateModelEndpointV1Request, ModelEndpointDeploymentState | MLI-6876 (this PR) |
| queue_message_timeout_seconds | 14 schemas (Create/Update model endpoint variants + Get responses) | prior llm-engine PR |
| task_expires_seconds | same 14 schemas | prior llm-engine PR |
| ctx, input | ValidationError | pydantic v2 native fields |

Non-breaking change verification

| Check | Result |
| --- | --- |
| REST paths removed | 0 (47 → 47) |
| HTTP operations removed | 0 |
| Schemas removed/renamed | 0 (222 → 222) |
| Fields removed from existing schemas | 0 |
| Type changes on existing fields | 0 |
| Fields newly required | 0 |
| Hand-written wrapper imports | All resolve |
| Generated method signatures | Identical (spot-checked) |
| Test suite (pytest tests/) | 20 passed, 4 skipped |
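For readers who want to reproduce this kind of check, here is a minimal sketch of a schema-level diff between two OpenAPI documents loaded as dicts. It is not the script used for this PR; the function name `find_breaking_changes` and the toy `old_spec` / `new_spec` documents are hypothetical, and it only covers the removal, type-change, and newly-required checks listed above.

```python
from typing import Any, Dict, List


def find_breaking_changes(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Describe removals and tightenings between two OpenAPI documents."""
    problems: List[str] = []

    # REST paths and HTTP operations that disappeared.
    old_paths = old.get("paths", {})
    new_paths = new.get("paths", {})
    for path, operations in old_paths.items():
        if path not in new_paths:
            problems.append(f"path removed: {path}")
            continue
        for method in operations:
            if method not in new_paths[path]:
                problems.append(f"operation removed: {method.upper()} {path}")

    # Schemas, fields, declared types, and required lists.
    old_schemas = old.get("components", {}).get("schemas", {})
    new_schemas = new.get("components", {}).get("schemas", {})
    for name, old_schema in old_schemas.items():
        if name not in new_schemas:
            problems.append(f"schema removed: {name}")
            continue
        new_schema = new_schemas[name]
        old_props = old_schema.get("properties", {})
        new_props = new_schema.get("properties", {})
        for field, old_prop in old_props.items():
            if field not in new_props:
                problems.append(f"field removed: {name}.{field}")
            elif old_prop.get("type") != new_props[field].get("type"):
                problems.append(f"type changed: {name}.{field}")
        newly_required = set(new_schema.get("required", [])) - set(
            old_schema.get("required", [])
        )
        for field in sorted(newly_required):
            problems.append(f"newly required: {name}.{field}")

    return problems


# Toy documents (hypothetical): the new spec only adds an optional field.
old_spec = {
    "paths": {"/endpoints": {"post": {}}},
    "components": {
        "schemas": {
            "CreateRequest": {
                "properties": {"name": {"type": "string"}},
                "required": ["name"],
            }
        }
    },
}
new_spec = {
    "paths": {"/endpoints": {"post": {}}},
    "components": {
        "schemas": {
            "CreateRequest": {
                "properties": {
                    "name": {"type": "string"},
                    "forwarder_max_concurrency": {"type": "integer"},
                },
                "required": ["name"],
            }
        }
    },
}
```

With these inputs the function returns an empty list, which matches the "additive only" outcome reported in the table. A real check would also need to handle `$ref`, `anyOf`, and renamed schemas, which this sketch ignores.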

Wrapper changes (launch/client.py)

  • create_model_endpoint: new forwarder_max_concurrency: Optional[int] = None kwarg (pass-through, no client-side validation — server enforces 1 ≤ N ≤ 20), docstring, and forwarded into the request payload.
  • edit_model_endpoint: same.
  • update_if_exists branch inside create_model_endpoint: forwards the kwarg through to edit_model_endpoint.
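The three call paths above can be sketched roughly as follows. This is a simplified stand-in, not the code in launch/client.py: the real methods take many more arguments and call the API, whereas here `existing_endpoints` is a hypothetical parameter replacing the server lookup, the functions return plain dicts, and dropping a `None` kwarg from the payload is an assumption about the wrapper's behavior.

```python
from typing import Any, Dict, Optional, Set


def _build_payload(name: str, forwarder_max_concurrency: Optional[int]) -> Dict[str, Any]:
    payload: Dict[str, Any] = {"name": name}
    # Assumption: an unset kwarg is left out of the request payload.
    # No client-side range check here; the server enforces the bounds.
    if forwarder_max_concurrency is not None:
        payload["forwarder_max_concurrency"] = forwarder_max_concurrency
    return payload


def edit_model_endpoint(
    name: str, forwarder_max_concurrency: Optional[int] = None
) -> Dict[str, Any]:
    return {"operation": "update", "payload": _build_payload(name, forwarder_max_concurrency)}


def create_model_endpoint(
    name: str,
    existing_endpoints: Set[str],
    update_if_exists: bool = False,
    forwarder_max_concurrency: Optional[int] = None,
) -> Dict[str, Any]:
    if update_if_exists and name in existing_endpoints:
        # The update_if_exists branch must forward the kwarg, otherwise it
        # would be silently dropped whenever the endpoint already exists.
        return edit_model_endpoint(name, forwarder_max_concurrency=forwarder_max_concurrency)
    return {"operation": "create", "payload": _build_payload(name, forwarder_max_concurrency)}
```

The point of the third bullet is visible in the `update_if_exists` branch: without the explicit forward, a caller who sets the kwarg on an existing endpoint would see no effect and no error.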

Out of scope

create_llm_model_endpoint is intentionally untouched. The upstream CreateLLM*ModelEndpointRequest family of schemas (vLLM, SGLang, etc.) did not gain forwarder_max_concurrency in MLI-6876 — covering those would require a separate llm-engine PR. Worth tracking as a follow-up if there's a use case (the original Whisper bug that motivated this work used the generic endpoint path, not the LLM-specific one).

Regen procedure

Used the documented justfile workflow:

just fetch-schema main
just generate     # required --skip-validate-spec for pre-existing OAS 3.1 contentMediaType in upload_file body

Version bumped to 0.4.2.

Test plan

  • poetry run pytest tests/ — 20 passed, 4 skipped
  • Schema-level diff vs HEAD — 0 breaking changes
  • Wrapper imports + syntax — clean
  • Smoke-test the new kwarg against a staging Launch deployment once llm-engine MLI-6876 is deployed

🤖 Generated with Claude Code

Greptile Summary

This PR regenerates the auto-generated API client from the latest llm-engine OpenAPI schema and exposes forwarder_max_concurrency on the user-facing LaunchClient wrapper. The regen is additive-only: 4 months of schema drift is picked up without removing any endpoints, operations, schemas, or existing fields.

  • launch/client.py: forwarder_max_concurrency: Optional[int] = None added to both create_model_endpoint and edit_model_endpoint, including the update_if_exists delegate call and the direct-create payload path. Docstrings updated consistently.
  • Generated models: forwarder_max_concurrency (max=20), queue_message_timeout_seconds (min=1, max=43200), and task_expires_seconds (min=1) added as optional nullable fields across CreateModelEndpointV1Request, UpdateModelEndpointV1Request, ModelEndpointDeploymentState, and 14 other schemas; input/ctx added to ValidationError for pydantic v2 compatibility.
  • Version: bumped from 0.4.1 → 0.4.2 in pyproject.toml.

Confidence Score: 5/5

Safe to merge — all changes are additive, no existing fields or endpoints are removed, and the new forwarder_max_concurrency kwarg is consistently threaded through every call path.

The wrapper changes are mechanical and well-scoped: the new kwarg is plumbed identically through the direct-create path, the update_if_exists delegate, and the edit path. The generated model changes are additive-only and match the upstream schema. No breaking changes were introduced.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| launch/client.py | forwarder_max_concurrency added to both create_model_endpoint and edit_model_endpoint signatures, docstrings, update_if_exists branch, and payload dicts; all three paths are consistent. |
| launch/api_client/model/create_model_endpoint_v1_request.py | Adds forwarder_max_concurrency (inclusive_maximum=20 only), queue_message_timeout_seconds, and task_expires_seconds; missing lower-bound constraint on forwarder_max_concurrency (already flagged in previous thread). |
| launch/api_client/model/update_model_endpoint_v1_request.py | Same additions as create_model_endpoint_v1_request.py; forwarder_max_concurrency has only inclusive_maximum=20 with no lower-bound guard. |
| launch/api_client/model/model_endpoint_deployment_state.py | Adds forwarder_max_concurrency as an optional read-only response field, consistent with the create/update schemas. |
| launch/api_client/model/validation_error.py | Adds optional input (AnyTypeSchema) and ctx (DictSchema) fields for pydantic v2 ValidationError compatibility; both are optional with correct Unset defaults. |
| pyproject.toml | Version bumped from 0.4.1 to 0.4.2. |

Reviews (2): Last reviewed commit: "chore: apply isort to regenerated api_cl..."

Regenerates the API client from llm-engine main, picking up four months
of schema drift since PR #176. All changes are additive:

- forwarder_max_concurrency (CreateModelEndpointV1Request /
  UpdateModelEndpointV1Request / ModelEndpointDeploymentState) — MLI-6876
- queue_message_timeout_seconds (14 schemas)
- task_expires_seconds (14 schemas)
- ctx, input on ValidationError (pydantic v2 native fields)

Verified non-breaking:
- 0 paths/operations/schemas removed
- 0 fields removed or type-changed
- 0 fields newly required (Optional with default None)
- Hand-written wrapper imports + signatures unchanged
- Test suite: 20 passed, 4 skipped

Wrapper changes (client.py only):
- create_model_endpoint: add forwarder_max_concurrency kwarg + docstring
- edit_model_endpoint: add forwarder_max_concurrency kwarg + docstring
- update_if_exists path: forward the kwarg through to edit_model_endpoint

Note: create_llm_model_endpoint is intentionally left unchanged because
the upstream CreateLLM*ModelEndpointRequest schemas do not yet expose
forwarder_max_concurrency — that would require a separate llm-engine PR.

Version bumped to 0.4.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


class MetaOapg:
    inclusive_maximum = 20

P2 Missing lower-bound constraint on forwarder_max_concurrency

The OpenAPI spec defines exclusiveMinimum: 0.0 for this field (meaning the value must be ≥ 1), but the generator only emitted inclusive_maximum = 20 — there is no corresponding exclusive_minimum or inclusive_minimum. As a result, forwarder_max_concurrency=0 passes client-side schema validation and only fails at the server. The same gap exists in update_model_endpoint_v1_request.py. Since these are generated files the fix belongs in the upstream schema or the generator invocation, but it's worth tracking so callers get an early error rather than an opaque server rejection.
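To make the gap concrete, here is a small sketch of the two constraint sets. The helper `passes_bounds` is hypothetical and is not the generated client's validation code; it only mirrors the meaning of the `inclusive_maximum` and `exclusive_minimum` keywords named in this comment.

```python
from typing import Optional


def passes_bounds(
    value: int,
    inclusive_maximum: Optional[int] = None,
    exclusive_minimum: Optional[int] = None,
) -> bool:
    """Return True if value satisfies whichever bounds are configured."""
    if inclusive_maximum is not None and value > inclusive_maximum:
        return False
    if exclusive_minimum is not None and value <= exclusive_minimum:
        return False
    return True


# What the generator emitted: only the upper bound.
as_generated = {"inclusive_maximum": 20}
# What the OpenAPI spec describes: upper bound plus exclusiveMinimum 0.
as_specified = {"inclusive_maximum": 20, "exclusive_minimum": 0}

zero_passes_generated = passes_bounds(0, **as_generated)
zero_passes_specified = passes_bounds(0, **as_specified)
```

Under the generated constraints a value of 0 is accepted client-side, while under the constraints the spec describes it is rejected before any request is sent. Values from 1 to 20 pass in both cases.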

Path: launch/api_client/model/create_model_endpoint_v1_request.py, line 356

Author reply (@diazagasatya):

Not a blocker for this PR, but a good catch. This is out of scope, and an input of 0 will get a clear 422 rejection from the server rather than failing silently.

The OpenAPI generator's import ordering doesn't match the repo's isort
config, so CI's `isort --check-only launch` step failed. This commit is
purely the output of `poetry run isort launch` — no semantic changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@diazagasatya diazagasatya requested a review from a team May 28, 2026 20:05
@diazagasatya diazagasatya merged commit a14527f into master May 29, 2026
3 checks passed
@diazagasatya diazagasatya deleted the feat/MLI-6875-forwarder-max-concurrency-sdk branch May 29, 2026 01:02