LCORE-1583: Clear reasoning and max_output_tokens in responses #1432
Conversation
Walkthrough: The changes add handling for a known Llama Stack (LCORE) issue: the `reasoning` and `max_output_tokens` parameters of the responses endpoint are cleared before the request is forwarded, and a warning is logged when either is provided. The parameters will be supported from LCORE 0.6.0.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes. 🚥 Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/responses.md`:
- Around line 99-100: Update the docs to consistently mark the `reasoning`
parameter and `max_output_tokens` as "accepted but ignored until LCORE 0.6.0"
wherever they are referenced: update the standalone "reasoning" section, all
examples that show active behavior, and any descriptions that imply they are
supported to explicitly state they are currently cleared/ignored and a warning
is logged; alternatively remove examples demonstrating active `reasoning`
behavior until LCORE 0.6.0 to avoid client confusion.
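The suggested documentation fix could look like the following sketch (the section name and exact wording are illustrative, not taken from the actual `docs/responses.md`):

```markdown
### reasoning

> **Note:** `reasoning` and `max_output_tokens` are accepted but ignored until
> LCORE 0.6.0. Values supplied for either field are cleared before the request
> is forwarded to Llama Stack, and a warning is logged.
```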
In `@src/app/endpoints/responses.py`:
- Around line 162-171: Create and sanitize a copy of the incoming
responses_request instead of mutating the parameter in place: make a shallow
copy (e.g., sanitized_request) at the start of the handler where
responses_request is used, apply the existing checks and warnings to
sanitized_request (clearing reasoning and max_output_tokens there), and then use
or return sanitized_request for downstream processing; update all subsequent
references in this function from responses_request to sanitized_request so the
original argument is not modified.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d5326af3-ed8a-446f-969a-fc992e5bdff3
📒 Files selected for processing (2)
- docs/responses.md
- src/app/endpoints/responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: integration_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: Pylinter
- GitHub Check: mypy
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pyright
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / ci
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g.,
from authentication import get_auth_dependency)
Files:
src/app/endpoints/responses.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use FastAPI imports from the fastapi module: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack client imports
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax `str | int` instead of `Union[str, int]` for type annotations
Use `Optional[Type]` for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for functions performing I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with `@abstractmethod` decorators for interface implementations
Use complete type annotations for all class attributes; avoid using `Any` type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed
Files:
src/app/endpoints/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/responses.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
📚 Learning: 2026-02-25T07:46:39.608Z (same learning as above, from asimurka in PR 1211)
Applied to files:
src/app/endpoints/responses.py
PR Reviewer Guide 🔍: Here are some key observations to aid the review process:
Description
This PR adds explicit clearing of the `reasoning` and `max_output_tokens` attributes of the `responses` endpoint due to their swapped validation in LLS. The attributes will be supported from LCORE 0.6.0.

Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Bug Fixes
- `reasoning` and `max_output_tokens` parameters are now cleared, with warning notifications logged when provided in requests.

Documentation