
RSPEED-2817: add configuration guard to rlsapi_v1 /infer endpoint#1468

Closed
major wants to merge 1 commit into lightspeed-core:main from major:rlsapi-v1-configuration-guard

Conversation

@major (Contributor) commented Apr 7, 2026

Description

Add check_configuration_loaded() call to the rlsapi_v1 /infer endpoint so a missing config returns a clean HTTP 500 instead of an opaque LogicError crash. Matches the pattern used by every other endpoint.

Type of change

  • Bug fix

Tools used to create PR

  • Assisted-by: Claude (Opus)
  • Generated by: N/A

Related Tickets & Documents

Note: This PR is stacked on #1463. Review the last commit only; the base commits belong to that PR.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. uv run make verify - all linters pass
  2. uv run pytest tests/unit/app/endpoints/test_rlsapi_v1.py - 49/49 pass
  3. New test test_infer_endpoint_configuration_not_loaded verifies HTTP 500 when config is unloaded

Summary by CodeRabbit

No user-visible changes detected in this update. This appears to be an internal code improvement with no impact on end-user features or functionality.

@coderabbitai (Bot) commented Apr 7, 2026

Walkthrough

An additional validation call to check_configuration_loaded(configuration) is inserted into the infer_endpoint function immediately before request ID generation, complementing an existing validation call at the endpoint start.

Changes

  • Configuration Validation (src/app/endpoints/rlsapi_v1.py): Added a redundant call to check_configuration_loaded(configuration) before request_id generation in the infer_endpoint function.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately summarizes the main change: adding a configuration guard to the rlsapi_v1 /infer endpoint to handle missing configuration gracefully.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which exceeds the required threshold of 80.00%.


@coderabbitai (Bot) left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

268-305: 🧹 Nitpick | 🔵 Trivial

Consider making model_id required or removing internal resolution.

The model_id fallback at line 294 is never exercised: retrieve_simple_response (line 262) and infer_endpoint (line 620) both resolve the model before calling _call_llm. This redundancy contradicts the "transport-only" claim in the docstring.

♻️ Suggested refactor: make model_id required
 async def _call_llm(
     question: str,
     instructions: str,
     tools: Optional[list[Any]] = None,
-    model_id: Optional[str] = None,
+    model_id: str,
 ) -> OpenAIResponseObject:
     """Call the LLM via the Responses API and return the full response object.

     This is a transport-only function: it calls the LLM and returns the raw
     response. Callers are responsible for token usage extraction and metrics.

     Args:
         question: The combined user input (question + context).
         instructions: System instructions for the LLM.
         tools: Optional list of MCP tool definitions for the LLM.
-        model_id: Fully qualified model identifier in provider/model format.
-            When omitted, the configured default model is used.
+        model_id: Fully qualified model identifier in provider/model format.

     Returns:
         The full OpenAIResponseObject from the LLM.

     Raises:
-        APIConnectionError: If the Llama Stack service is unreachable.
-        HTTPException: 503 if no default model is configured.
+        APIConnectionError: If the Llama Stack service is unreachable.
     """
     client = AsyncLlamaStackClientHolder().get_client()
-    resolved_model_id = model_id or await _get_default_model_id()
-    logger.debug("Using model %s for rlsapi v1 inference", resolved_model_id)
+    logger.debug("Using model %s for rlsapi v1 inference", model_id)

     response = await client.responses.create(
         input=question,
-        model=resolved_model_id,
+        model=model_id,
         instructions=instructions,
         tools=tools or [],
         stream=False,
         store=False,
     )
     return cast(OpenAIResponseObject, response)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` around lines 268 - 305, The _call_llm
function currently resolves a default model_id internally (via
_get_default_model_id) which is redundant because callers like
retrieve_simple_response and infer_endpoint already resolve the model; make
model_id required by changing the signature of _call_llm to accept a
non-optional model_id, remove the internal resolution (and the await
_get_default_model_id call) so the function becomes a true transport-only
wrapper, and update the docstring to reflect that model_id must be provided;
ensure callers (retrieve_simple_response, infer_endpoint) still pass the
resolved model_id and remove any code/comments relying on the internal fallback.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 74ac17e7-3b12-4bc9-955a-baf8d874f038

📥 Commits

Reviewing files that changed from the base of the PR and between 6448468 and 367e0e6.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py
  • tests/unit/app/endpoints/test_rlsapi_v1.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/app/endpoints/test_rlsapi_v1.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_rlsapi_v1.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/app/endpoints/rlsapi_v1.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/rlsapi_v1.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
🔇 Additional comments (8)
src/app/endpoints/rlsapi_v1.py (7)

41-41: LGTM!

Import for check_configuration_loaded is correctly added from utils.endpoints.


257-265: LGTM!

The refactoring of retrieve_simple_response to delegate to _call_llm and unconditionally extract token usage is clean. The model_id resolution is handled consistently for both the LLM call and metrics extraction.


445-462: LGTM!

Clean extraction of the verbose mode decision logic. The dual opt-in requirement (server config + client request) is clearly documented and correctly implemented.


465-515: LGTM!

Well-structured helper that cleanly separates verbose and minimal response construction. The consistent field handling across both branches ensures predictable API responses.


599-600: Configuration guard correctly placed.

Adding check_configuration_loaded(configuration) at the start of the endpoint ensures a clean HTTP 500 response instead of an opaque LogicError crash when configuration is missing. This aligns with the pattern used by other endpoints.


625-641: LGTM on the refactored inference flow.

The changes correctly:

  • Check verbose mode before the try block
  • Extract token usage unconditionally on success (line 637)
  • Guard token extraction in the failure path with if response is not None (lines 640-641), which handles cases where exceptions occur after _call_llm returns but before the try block completes

677-682: LGTM!

Clean delegation to _build_infer_response with the verbose-mode conditional correctly passing response or None.

tests/unit/app/endpoints/test_rlsapi_v1.py (1)

496-518: LGTM! Test correctly validates the configuration guard.

The test properly simulates the unloaded configuration state by patching _configuration = None and verifies the expected HTTP 500 response. The # pylint: disable=protected-access comment appropriately acknowledges the private attribute access required for this test scenario.

Call check_configuration_loaded() before processing requests so a
missing config returns a clean HTTP 500 instead of an opaque crash.
Matches the pattern used by every other endpoint.

Ref: RSPEED-2817
Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the rlsapi-v1-configuration-guard branch from 367e0e6 to 23814cb Compare April 7, 2026 16:33
@coderabbitai (Bot) left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 652-653: Remove the redundant configuration guard call: delete the
second invocation of check_configuration_loaded(configuration) that appears
after the initial call in the same endpoint flow (the duplicate call referenced
in the diff). Keep the first call (the one at the start of the handler) and
ensure there is only a single call to check_configuration_loaded(configuration)
in this function to avoid unnecessary repetition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f03f0183-ae31-46f9-b6c8-fdfe42747c63

📥 Commits

Reviewing files that changed from the base of the PR and between 367e0e6 and 23814cb.

📒 Files selected for processing (1)
  • src/app/endpoints/rlsapi_v1.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (2): same src/**/*.py and src/app/**/*.py guidance as in the previous review, applied to src/app/endpoints/rlsapi_v1.py.
🧠 Learnings (3)
📓 Common learnings
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py

Comment on lines +652 to +653
check_configuration_loaded(configuration)


🧹 Nitpick | 🔵 Trivial

Remove duplicate configuration guard call.

Line 652 repeats the same check_configuration_loaded(configuration) already executed at Line 644. This check is idempotent and has no side effects, so the second call is unnecessary and should be removed to keep the endpoint flow clean.

Proposed diff
-    check_configuration_loaded(configuration)
-
     request_id = get_suid()

@major (Author) commented Apr 7, 2026

Ah, the other PR fixed this.

@major major closed this Apr 7, 2026
@major major deleted the rlsapi-v1-configuration-guard branch April 7, 2026 16:38