RSPEED-2875: bump /v1/infer question limit to 32KB and add /v1/responses body size validator#1510
Conversation
|
No actionable comments were generated in the recent review. 🎉
Walkthrough
Adds a JSON-serialized request-body size validator (65,536 chars) to ResponsesRequest and raises the rlsapi v1 `/infer` `question` limit from 10,240 to 32,768 characters.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/models/requests.py`:
- Around line 769-773: Extract the hardcoded 65_536/65536 into a shared constant
(e.g., MAX_REQUEST_BODY_CHARS = 65536) in constants.py, import that constant
into src.models.requests and replace both the numeric literal in the conditional
(len(serialized) > MAX_REQUEST_BODY_CHARS) and the message interpolation so the
message uses the constant value, and update any tests to reference the same
constant to prevent drift.
- Around line 742-769: The validate_body_size model_validator currently
serializes the input with json.dumps defaults which escapes non-ASCII and adds
spaces, inflating the character count; update the validate_body_size classmethod
to import a new constant (e.g., REQUEST_BODY_MAX_CHARS) from constants.py and
use json.dumps(values, ensure_ascii=False, separators=(",", ":")) to get a
compact, accurate wire-format length check, then compare len(serialized) against
REQUEST_BODY_MAX_CHARS and raise the same ValueError when exceeded; add the
constant in constants.py (set to 65536) and update the imports in
src/models/requests.py accordingly.
In `@src/models/rlsapi/requests.py`:
- Line 180: The OpenAPI schema still documents question.maxLength as 10240 while
the model RlsapiV1InferRequest.question now allows 32768; update the generated
docs/openapi.json so the question schema's maxLength is 32768 (or regenerate the
OpenAPI spec from the updated model/schema definitions) to keep clients and UI
validation consistent with RlsapiV1InferRequest.question.
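As a sketch of what the suggested constant extraction could look like (the name `MAX_REQUEST_BODY_CHARS` is illustrative, not an existing identifier in the repo; in the real code the constant would live in `constants.py` and the function body would sit inside the `@model_validator` on `ResponsesRequest`):

```python
import json
from typing import Any

# Would live in constants.py per the repo guideline on shared constants
MAX_REQUEST_BODY_CHARS = 65_536


def validate_body_size(values: Any) -> Any:
    """Reject payloads whose JSON serialization exceeds the shared limit."""
    try:
        serialized = json.dumps(values)
    except (TypeError, ValueError):
        # Non-JSON-serializable input: the wire-format guard does not apply
        return values
    if len(serialized) > MAX_REQUEST_BODY_CHARS:
        raise ValueError(
            f"Request body exceeds {MAX_REQUEST_BODY_CHARS} characters"
        )
    return values
```

Using the constant in both the comparison and the error message keeps the limit and the message from drifting apart, which is the point of the review comment.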
ℹ️ Review info
⚙️ Run configuration: Path: .coderabbit.yaml · Review profile: ASSERTIVE · Plan: Pro · Run ID: d58987c7-f102-47e9-9bf5-78c11f847d72
📒 Files selected for processing (5)
- src/models/requests.py
- src/models/rlsapi/requests.py
- tests/integration/endpoints/test_rlsapi_v1_integration.py
- tests/unit/models/requests/test_responses_request.py
- tests/unit/models/rlsapi/test_requests.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: bandit
- GitHub Check: radon
- GitHub Check: mypy
- GitHub Check: Pylinter
- GitHub Check: integration_tests (3.12)
- GitHub Check: integration_tests (3.13)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
🧰 Additional context used
📓 Path-based instructions (4)

`tests/**/*.py` — 📄 CodeRabbit inference engine (AGENTS.md)
- Use pytest for all unit and integration tests; do not use unittest
- Use `pytest.mark.asyncio` marker for async unit tests

Files: `tests/integration/endpoints/test_rlsapi_v1_integration.py`, `tests/unit/models/rlsapi/test_requests.py`, `tests/unit/models/requests/test_responses_request.py`

`tests/unit/**/*.py` — 📄 CodeRabbit inference engine (AGENTS.md)
- Use `pytest-mock` for AsyncMock objects in unit tests

Files: `tests/unit/models/rlsapi/test_requests.py`, `tests/unit/models/requests/test_responses_request.py`

`src/**/*.py` — 📄 CodeRabbit inference engine (AGENTS.md)
- Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
- Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
- Check `constants.py` for shared constants before defining new ones
- All modules start with descriptive docstrings explaining purpose
- Use `logger = get_logger(__name__)` from `log.py` for module logging
- Type aliases defined at module level for clarity
- All functions require docstrings with brief descriptions
- Use complete type annotations for function parameters and return types
- Use union types with modern syntax: `str | int` instead of `Union[str, int]`
- Use `Optional[Type]` for optional types in type annotations
- Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
- Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
- Use `async def` for I/O operations and external API calls
- Handle `APIConnectionError` from Llama Stack in error handling
- Use `logger.debug()` for detailed diagnostic information
- Use `logger.info()` for general information about program execution
- Use `logger.warning()` for unexpected events or potential problems
- Use `logger.error()` for serious problems that prevented function execution
- All classes require descriptive docstrings explaining purpose
- Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
- Use complete type annotations for all class attributes; avoid using `Any`
- Follow Google Python docstring conventions for all modules, classes, and functions
- Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed

Files: `src/models/rlsapi/requests.py`, `src/models/requests.py`

`src/models/**/*.py` — 📄 CodeRabbit inference engine (AGENTS.md)
- Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
- Use `typing_extensions.Self` for model validators in type annotations
- Pydantic data models should extend `BaseModel`
- Include `Attributes:` section in Pydantic model docstrings

Files: `src/models/rlsapi/requests.py`, `src/models/requests.py`
🧠 Learnings (4)
📓 Common learnings
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/rlsapi/requests.py, src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/rlsapi/requests.py, src/models/requests.py
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.
Applied to files:
src/models/requests.py
🔇 Additional comments (3)

tests/unit/models/requests/test_responses_request.py (1)
20-58: Good boundary coverage for the new body-size guard. These tests validate the critical accept/reject edges (normal, exact limit, one-over, oversized structured input) and assert useful error messaging.

tests/integration/endpoints/test_rlsapi_v1_integration.py (1)
519-535: Boundary update is correct and aligned with the new infer question limit. Using 32,769 chars for the `question` case correctly tests the one-over rejection path for `max_length=32,768`.

tests/unit/models/rlsapi/test_requests.py (1)
639-646: Test parameter update is correct and keeps boundary validation intact. The `question` max-length case now matches the new `32_768` constraint and is exercised through the existing at-limit/over-limit test pattern.
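The at-limit/over-limit edges these tests exercise can be built by sizing the input string against the serialized envelope (a standalone sketch; the actual tests may construct their payloads differently):

```python
import json

LIMIT = 65_536


def payload_of_exact_size(limit: int = LIMIT) -> dict:
    """Build {"input": "xx...x"} whose json.dumps length is exactly `limit`."""
    envelope = len(json.dumps({"input": ""}))  # braces, key, quotes, separator
    return {"input": "x" * (limit - envelope)}


at_limit = payload_of_exact_size()            # 65,536 chars: should be accepted
one_over = payload_of_exact_size(LIMIT + 1)   # 65,537 chars: should be rejected
print(len(json.dumps(at_limit)), len(json.dumps(one_over)))
```

Measuring the envelope instead of hard-coding its width keeps the boundary tests valid even if the wrapping structure changes.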
```python
@model_validator(mode="before")
@classmethod
def validate_body_size(cls, values: Any) -> Any:
    """Validate that the request body does not exceed the maximum allowed size.

    Serializes the raw request payload to JSON and checks the total character
    count against the 65,536-character limit. This guard runs before field
    coercion so that the limit reflects only what the client actually sent,
    not the expanded representation produced by Pydantic's defaults.

    Parameters:
        values: The raw input dict (or other object) passed to the model.

    Returns:
        Any: ``values`` unchanged when the size check passes.

    Raises:
        ValueError: If the JSON-serialized size of ``values`` exceeds
            65,536 characters.
    """
    try:
        serialized = json.dumps(values)
    except (TypeError, ValueError):
        # Non-JSON-serializable payload (e.g. programmatic use with Pydantic
        # model instances). The size guard only applies to wire-format HTTP
        # requests which FastAPI always parses into JSON-compatible dicts.
        return values
    if len(serialized) > 65_536:
```
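Why running the guard in `mode="before"` matters can be illustrated without Pydantic: after field coercion a model carries defaulted fields the client never sent, so measuring the expanded form would overcount (the field names below are illustrative, not the actual ResponsesRequest schema):

```python
import json

# What the client actually sent over the wire
raw = {"input": "hello"}

# What the same request might look like after Pydantic applies defaults
# (illustrative fields only)
expanded = {"input": "hello", "model": None, "temperature": 1.0, "tools": []}

raw_len = len(json.dumps(raw))
expanded_len = len(json.dumps(expanded))
print(raw_len, expanded_len)  # the expanded form is strictly larger here
```

Checking `raw` keeps the 65,536-char budget tied to client input rather than to the model's default-populated shape.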
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
python - <<'PY'
import json
samples = [
    '{"input":"é"}',  # non-ASCII character
    '{"input":"' + ("x" * 20) + '"}'  # compact ASCII payload
]
for raw in samples:
    obj = json.loads(raw)
    current = json.dumps(obj)  # current implementation
    compact = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
    print("raw_len=", len(raw), "current_len=", len(current), "compact_len=", len(compact))
    print("current:", current)
    print("compact:", compact)
    print("---")
PY
```

Repository: lightspeed-core/lightspeed-stack
Length of output: 308

🏁 Script executed:

```bash
# Verify the file exists and check the relevant lines
head -n 775 src/models/requests.py | tail -n 35
```

Length of output: 1508

🏁 Script executed:

```bash
# Check if a constant is defined for the 65536 limit in constants.py
find . -name "constants.py" -type f -exec grep -l "65536\|65_536" {} \;
```

Length of output: 58

🏁 Script executed:

```bash
# Search for any reference to body size limits in the codebase
rg "65.?536" --type py
```

Length of output: 2249
Body-size validation overcounts non-ASCII and compact JSON payloads.
The validator uses `json.dumps(values)` with default parameters, which applies ASCII escaping and spaced separators. This inflates the character count compared to the actual client payload, particularly for non-ASCII characters (each escapes to a six-character `\uXXXX` sequence) and even for compact ASCII payloads (which gain a space after every colon and comma). Requests near the 65,536-character boundary will be incorrectly rejected.
Additionally, the magic number 65_536 should be extracted to constants.py per the coding guidelines.
💡 Suggested fix

```diff
- serialized = json.dumps(values)
+ serialized = json.dumps(values, separators=(",", ":"), ensure_ascii=False)
```

Also extract the limit to constants.py and import it here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/models/requests.py` around lines 742 - 769, The validate_body_size
model_validator currently serializes the input with json.dumps defaults which
escapes non-ASCII and adds spaces, inflating the character count; update the
validate_body_size classmethod to import a new constant (e.g.,
REQUEST_BODY_MAX_CHARS) from constants.py and use json.dumps(values,
ensure_ascii=False, separators=(",", ":")) to get a compact, accurate
wire-format length check, then compare len(serialized) against
REQUEST_BODY_MAX_CHARS and raise the same ValueError when exceeded; add the
constant in constants.py (set to 65536) and update the imports in
src/models/requests.py accordingly.
Verified — no change needed.
The default json.dumps separators add ~0.2% overhead (a few hundred chars on a 64KB payload). This makes the validator slightly more conservative (rejects marginally earlier), which is the safe direction. The ensure_ascii=True default similarly inflates non-ASCII — again, conservative.
An exact wire-format match is impossible regardless of separator/encoding choices, because Python's json.dumps output is never byte-identical to what the client sent (key ordering, whitespace, encoding all vary by client). Adding ensure_ascii=False, separators=(",",":") would create a false sense of precision without meaningfully changing the behavior.
The current implementation is correct and intentionally simple.
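The overhead being weighed here is easy to measure directly (a standalone sketch, not code from the repo):

```python
import json

# 1,000 short fields: default separators add one space per comma and colon
obj = {f"k{i}": "v" for i in range(1000)}
default_len = len(json.dumps(obj))                         # ", " and ": "
compact_len = len(json.dumps(obj, separators=(",", ":")))
extra = default_len - compact_len  # 999 commas + 1,000 colons = 1,999 chars

# Non-ASCII: each character escapes to a six-char \uXXXX sequence by default
escaped = len(json.dumps("é"))                       # '"\\u00e9"' -> 8 chars
verbatim = len(json.dumps("é", ensure_ascii=False))  # '"é"'       -> 3 chars
print(extra, escaped, verbatim)
```

Both effects only shrink the effective budget, so the check errs toward rejecting early rather than accepting an over-limit payload.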
@Lifto, that's a well-reasoned explanation — the conservative overcount is clearly the right trade-off here, and the "false precision" point about separators/ensure_ascii is a good one. I'll withdraw the suggestion.
(╯°□°)╯ ← ┻━┻
✏️ Learnings added
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In lightspeed-core/lightspeed-stack `src/models/requests.py`, the `ResponsesRequest.validate_body_size` validator uses `json.dumps(values)` with default parameters (ASCII escaping, spaced separators) intentionally. The slight overcount (~0.2% overhead) makes the 65,536-character limit conservatively strict, which is the desired safe direction. An exact wire-format match is impossible since Python's json.dumps output is never byte-identical to the client payload. Do not suggest switching to `ensure_ascii=False` or compact separators for this validator.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Force-pushed from 4e8d0a0 to 7ef053a (Compare)
/ok-to-test
tisnik left a comment:
Please rebase; the OpenAPI spec probably should not be there.
asimurka left a comment:
LGTM overall. Please add those limits to constants.py to avoid magic numbers in code.
…y size validator

- Raise question field max_length in RlsapiV1InferRequest from 10,240 to 32,768 chars to match the CLA client's 32KB input limit. Previously, valid CLA inputs would be rejected with a spurious 422.
- Add @model_validator(mode='before') to ResponsesRequest that serialises the raw request body with json.dumps and rejects payloads whose total character count exceeds 65,536. The /v1/responses endpoint previously accepted arbitrarily large payloads without any guard.
- Update the existing parametrised unit test and integration size-limit test to match the new 32,768 threshold.
- Add tests/unit/models/requests/test_responses_request.py with four boundary tests: normal request accepted, at-limit (65,536 chars) accepted, one-over (65,537) rejected with clear error message, and large list input rejected.

Relates-to: RSPEED-2875
Force-pushed from 7ef053a to fb3efda (Compare)
Description
Two targeted input-size guardrails:
1. `/v1/infer` — raise `question` limit from 10 KB → 32 KB

   The CLA client enforces a 32,000-character input limit (`MAX_QUESTION_SIZE` in `command_line_assistant/commands/chat.py`). The previous 10,240-character cap on the `question` field caused legitimate CLA requests to be rejected with a spurious 422. Raised `max_length` in `RlsapiV1InferRequest` to 32,768 to comfortably accommodate the CLA limit with headroom.

2. `/v1/responses` — add 64 KB whole-body size guard

   The `/responses` endpoint previously accepted arbitrarily-sized payloads. A new `@model_validator(mode='before')` on `ResponsesRequest` serialises the raw request with `json.dumps` and rejects anything whose total character count exceeds 65,536. Using `mode='before'` ensures the limit applies to the actual client payload, not the expanded form Pydantic produces after applying defaults. Returns 422 (not 413, which is reserved for LLM context-window exceeded).

Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
Unit tests — run with `uv run make test-unit`:
- `tests/unit/models/requests/test_responses_request.py` — 4 new boundary tests: normal request (`{"input": "hello"}`) accepted, at-limit (65,536 chars) accepted, one-over (65,537) rejected with a clear error message, and oversized list input rejected.

Integration tests — run with `uv run make test-integration`:
- `tests/integration/endpoints/test_rlsapi_v1_integration.py::test_infer_size_limit[question]` — confirms HTTP 422 for a 32,769-char question (one over the new limit); the other three parametrised cases (stdin, attachment, terminal) remain at 65,536.

Linters — `uv run make verify` passes clean: black, ruff, pylint 10/10, pyright 0 errors, pydocstyle, mypy.

Files changed
- `src/models/rlsapi/requests.py` — `max_length=10_240` → `max_length=32_768` on the `question` field
- `src/models/requests.py` — `import json` + new `validate_body_size` model validator on `ResponsesRequest`
- `tests/unit/models/rlsapi/test_requests.py` — updated parametrised max-length case
- `tests/integration/endpoints/test_rlsapi_v1_integration.py` — `"?" * 32_769`
- `tests/unit/models/requests/test_responses_request.py` — new boundary tests

Summary by CodeRabbit
New Features
Improvements