
Conversation


@asimurka asimurka commented Nov 23, 2025

Description

  • Adds new modular response models that are used to raise HTTPExceptions consistently across all endpoints.
  • Provides methods to dynamically generate OpenAPI documentation from examples wired directly into the models.
  • Minimizes raising of generic Exceptions; instead, adds new middleware that catches all unexpected exceptions and returns a uniform response for them.
  • Updates unit tests so that almost all endpoints have 100% code coverage.
  • Updates integration tests with the new endpoint status codes.
  • Updates e2e tests to reflect the Swagger documentation.
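The core pattern behind these bullets can be sketched in a few lines. This is an illustrative, framework-free stand-in (a plain dataclass rather than the actual pydantic model in `src/models/responses.py`; the message text mirrors the examples elsewhere in this PR): the model carries its status code plus a structured `{response, cause}` detail, and `model_dump()` produces exactly the keyword arguments FastAPI's `HTTPException` expects.

```python
from dataclasses import dataclass

# Hypothetical, simplified sketch of the response-model pattern described
# above. The real models are pydantic classes; this mimics only the shape.

@dataclass
class UnauthorizedResponse:
    cause: str
    status_code: int = 401
    response: str = "Missing or invalid credentials provided by client"

    def model_dump(self) -> dict:
        # Shape matches HTTPException(status_code=..., detail=...)
        return {
            "status_code": self.status_code,
            "detail": {"response": self.response, "cause": self.cause},
        }

payload = UnauthorizedResponse(cause="No Authorization header found").model_dump()
```

An endpoint would then raise `HTTPException(**payload)`, so every error body shares the same nested `detail` structure.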

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Bug Fixes

    • Authentication errors now return 401 (not 400); service/connectivity failures return 503 (not 500).
  • Improvements

    • Standardized structured error payloads with "response" and "cause" across endpoints for clearer diagnostics.
    • Consistent error handling and examples surfaced in API docs (OpenAPI) for many endpoints.
  • New Features

    • Global exception middleware returns standardized internal-error responses for unexpected failures.



coderabbitai bot commented Nov 23, 2025

Walkthrough

Standardizes API responses across the codebase by introducing rich response models and OpenAPI helpers, replacing inline HTTPException detail payloads with model-based responses, updating endpoint response mappings, adding a global exception middleware, and updating tests to expect the new structured error formats and status codes.

Changes

Cohort / File(s) Summary
Response models & factories
src/models/responses.py
Adds AbstractSuccessfulResponse and AbstractErrorResponse, many concrete success and error response classes (BadRequestResponse, UnauthorizedResponse, ForbiddenResponse, NotFoundResponse, QuotaExceededResponse, InternalServerErrorResponse, ServiceUnavailableResponse, ModelsResponse, ToolsResponse, QueryResponse, ConfigurationResponse, etc.), openapi_response() helpers, factory/constructor classmethods, examples, and new data models (RAGChunk, ToolCall, ConversationData, ReferencedDocument, etc.).
Endpoint response mappings (bulk)
src/app/endpoints/...
authorized.py, config.py, conversations.py, conversations_v2.py, feedback.py, health.py, info.py, metrics.py, models.py, providers.py, query.py, query_v2.py, rags.py, root.py, shields.py, streaming_query.py, streaming_query_v2.py, tools.py
Replaces hard-coded response dicts and inline HTTPException details with .openapi_response() usages and response.model_dump() in raised HTTPException; adds per-endpoint response maps (e.g., conversation_get_responses), minor handler signature additions (Request), new helpers (simplify_session_data, group_providers), and adjusts status mappings (e.g., 400→401, 500→503) where applicable.
Authentication / Authorization
src/authentication/jwk_token.py, src/authentication/k8s.py, src/authentication/utils.py, src/authorization/middleware.py
Adds get_jwk_set cache and KeyNotFoundError; replaces inline auth error HTTPExceptions with UnauthorizedResponse / ForbiddenResponse / InternalServerErrorResponse / ServiceUnavailableResponse via model_dump(); consolidates token and K8s error handling; adds ClusterIDUnavailableError.
Global middleware & app wiring
src/app/main.py
Adds global_exception_middleware to catch unhandled exceptions and return InternalServerErrorResponse via JSONResponse, while re-raising HTTPException unchanged.
Utilities & telemetry
src/metrics/utils.py, src/utils/endpoints.py, src/utils/quota.py
Adds configuration checks and maps API/DB errors to ServiceUnavailableResponse/InternalServerErrorResponse/QuotaExceededResponse; decorates setup_model_metrics with @run_once_async.
Providers / grouping
src/app/endpoints/providers.py
Moves to typed ProviderListResponse / ProviderResponse returns, replaces manual dict transforms with client calls and group_providers typed to ProviderListResponse; maps API errors to ServiceUnavailableResponse / NotFoundResponse.
Tests — unit, integration, e2e
tests/unit/**, tests/integration/**, tests/e2e/features/*
Updates many tests to expect structured error detail objects ({"response":..., "cause":...}) and status-code shifts (notably 400→401 for auth, 500→503 for connection errors); introduces minimal_config fixture and adapts mocks to AppConfig; adds new model response tests and middleware integration tests; removes some legacy tests.
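The `.openapi_response()` helper mentioned in the endpoint rows above can be sketched as follows. This is illustrative only: the real helper in `src/models/responses.py` derives its examples from the pydantic JSON schema, and the message strings here are assumptions modeled on the examples elsewhere in this review.

```python
# Illustrative sketch: a per-model classmethod that returns the fragment
# FastAPI expects in a route's `responses=` mapping. Real models pull
# examples from model_json_schema(); this hard-codes one for clarity.

class ServiceUnavailableResponse:
    status_code = 503
    description = "Service unavailable"
    example = {
        "detail": {
            "response": "Unable to connect to Llama Stack",
            "cause": "Connection error while trying to reach backend service.",
        }
    }

    @classmethod
    def openapi_response(cls) -> dict:
        return {
            cls.status_code: {
                "description": cls.description,
                "content": {"application/json": {"example": cls.example}},
            }
        }

# Endpoints merge these fragments into one response map:
responses = {
    200: {"description": "Successful response"},
    **ServiceUnavailableResponse.openapi_response(),
}
```

A route decorated with `@router.get(..., responses=responses)` would then surface the 503 example in the generated OpenAPI docs.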

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Endpoint
    participant ResponseModel
    participant Middleware
    participant ClientResp as Client Response

    rect rgba(200,220,255,0.6)
    Note over Endpoint,ResponseModel: Success flow (typed responses)
    Client->>Endpoint: HTTP request
    Endpoint->>ResponseModel: build ConfigurationResponse / ToolsResponse ...
    ResponseModel-->>Endpoint: typed object
    Endpoint->>ClientResp: 200 + JSON (model serialized)
    end

    rect rgba(255,220,220,0.6)
    Note over Endpoint,ResponseModel: Auth error (old vs new)
    Client->>Endpoint: HTTP request without auth
    alt Old
        Endpoint-->>ClientResp: 400 {"detail": "No Authorization header found"}
    else New
        Endpoint->>ResponseModel: UnauthorizedResponse(cause="No Authorization header found")
        ResponseModel-->>Endpoint: object (status_code=401)
        Endpoint->>Middleware: raise HTTPException(status_code=401, detail=response.model_dump())
        Middleware-->>ClientResp: 401 {"detail":{"response":"Missing or invalid credentials provided by client","cause":"No Authorization header found"}}
    end
    end

    rect rgba(220,255,220,0.6)
    Note over Endpoint,Middleware: Unhandled exception -> global middleware
    Client->>Endpoint: HTTP request
    Endpoint->>Endpoint: raises ValueError (uncaught)
    Middleware->>ResponseModel: InternalServerErrorResponse.generic()
    ResponseModel-->>Middleware: object
    Middleware-->>ClientResp: 500 {"detail":{"response":"Internal server error","cause":"<trace/cause>"}}
    end
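The control flow in the diagram's last panel, re-raise expected `HTTPException`s, convert everything else into a uniform 500 body, can be shown with a framework-free sketch. `HTTPException` and `JSONResponse` below are minimal stand-ins for the FastAPI/Starlette classes; the real middleware lives in `src/app/main.py`.

```python
import asyncio

# Stand-ins for the FastAPI/Starlette types (illustrative only).
class HTTPException(Exception):
    def __init__(self, status_code: int, detail: dict):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

class JSONResponse:
    def __init__(self, status_code: int, content: dict):
        self.status_code = status_code
        self.content = content

async def global_exception_middleware(request, call_next):
    try:
        return await call_next(request)
    except HTTPException:
        raise  # expected, already-shaped errors pass through unchanged
    except Exception as exc:  # pylint: disable=broad-except
        # Unhandled failures become a uniform 500 body, not a traceback.
        return JSONResponse(
            status_code=500,
            content={"detail": {"response": "Internal server error", "cause": str(exc)}},
        )

async def failing_handler(request):
    raise ValueError("boom")  # simulates an uncaught bug in an endpoint

result = asyncio.run(global_exception_middleware(None, failing_handler))
```

The `except HTTPException: raise` branch is the key detail the review flags below: without it, a broad `except Exception` would also swallow deliberate 401/403 responses.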

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention:

  • src/models/responses.py — large surface change: validate schema, examples, openapi_response outputs, and factory methods.
  • Widespread endpoint edits — ensure consistent use of response.openapi_response() and response.model_dump() and verify status-code semantics changed intentionally (401 vs 400, 503 vs 500).
  • Authentication/K8s/JWK logic — concurrency and caching around get_jwk_set and error mapping.
  • global_exception_middleware in src/app/main.py — ensure middleware re-raises HTTPException correctly and does not swallow expected errors or break streaming endpoints.
  • Tests — large set of updated tests; confirm assertions align with serialized model shapes and that fixtures (minimal_config) are used correctly.

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • eranco74
  • tisnik

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Description Check — Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check — Passed. The title clearly summarizes the main change: a comprehensive exception-handling refactor across the codebase. It directly relates to the primary objective of introducing modular response models and standardized error handling.
  • Docstring Coverage — Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Comment on lines +1032 to +1033
"user_data_collection": {
"feedback_enabled": True,

⚠️ Code scanning / Bandit — Warning: Probable insecure usage of temp file/directory.
"transcripts_storage": "/tmp/data/transcripts",
},
"database": {
"sqlite": {"db_path": "/tmp/lightspeed-stack.db"},

⚠️ Code scanning / Bandit — Warning: Probable insecure usage of temp file/directory.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
tests/unit/authentication/test_noop_with_token.py (1)

60-89: Bring docstring in line with 401 + structured error detail

The test correctly asserts a 401 with a {response, cause} detail for the missing-header case, but the docstring still refers to a 400 status and a flat "No Authorization header found" message. Updating the text will avoid confusion for future readers.

     Test that NoopWithTokenAuthDependency raises an HTTPException when no
     Authorization header is present in the request.
 
-    Asserts that the exception has a status code of 400 and the detail message
-    "No Authorization header found".
+    Asserts that the exception has a status code of 401 and a structured
+    detail message indicating that no Authorization header was found.
src/authorization/middleware.py (1)

77-95: Forbidden response probably should use auth[0] (user ID) instead of username

auth is a 4‑tuple (user_id, username, skip_userid_check, token) (see ensure_test_user_id_and_name in tests/unit/authentication/test_jwk_token.py), but the forbidden branch calls:

response = ForbiddenResponse.endpoint(user_id=auth[1])

ForbiddenResponse.endpoint’s message is phrased in terms of a user ID. Passing the username here makes the error text say “User {username} is not authorized to access this endpoint”, which doesn’t match the model’s field naming and may be confusing.

Switching to auth[0] will align the payload with the documented user ID.

-    if not access_resolver.check_access(action, user_roles):
-        response = ForbiddenResponse.endpoint(user_id=auth[1])
-        raise HTTPException(**response.model_dump())
+    if not access_resolver.check_access(action, user_roles):
+        response = ForbiddenResponse.endpoint(user_id=auth[0])
+        raise HTTPException(**response.model_dump())
src/authentication/k8s.py (2)

174-205: get_user_info docstring and error mapping could be tightened

The new behavior—raising a 503 ServiceUnavailableResponse only when K8sClientSingleton.get_authn_api() fails, and treating create_token_review exceptions as “token invalid” by returning None—is coherent and works with the caller’s 401 mapping.

However, the Raises: section still suggests that any unexpected error triggers an HTTPException, which is no longer true for TokenReview API failures. Consider updating the docstring to clarify that only failures to initialize the Kubernetes auth API result in an immediate 503, while TokenReview errors are logged and treated as an invalid/expired token at the call site.


223-285: SubjectAccessReview 403s are being turned into 503s by the broad except block

In K8SAuthDependency.__call__, the forbidden case:

if not response.status.allowed:
    response = ForbiddenResponse.endpoint(user_id=user_info.user.uid)
    raise HTTPException(**response.model_dump())

is inside a try that’s immediately followed by:

except Exception as e:
    logger.error("API exception during SubjectAccessReview: %s", e)
    response = ServiceUnavailableResponse(...)
    raise HTTPException(**response.model_dump()) from e

Because HTTPException is a subclass of Exception, every 403 you raise for a denied SubjectAccessReview is caught and rethrown as a 503 ServiceUnavailableResponse. Callers will never see a proper 403 Forbidden from this path.

Narrowing the except to ApiException (or restricting the try just to the client call) fixes this while still mapping genuine Kubernetes API failures to 503.

-        try:
-            authorization_api = K8sClientSingleton.get_authz_api()
-            sar = kubernetes.client.V1SubjectAccessReview(
-                spec=kubernetes.client.V1SubjectAccessReviewSpec(
-                    user=user_info.user.username,
-                    groups=user_info.user.groups,
-                    non_resource_attributes=kubernetes.client.V1NonResourceAttributes(
-                        path=self.virtual_path, verb="get"
-                    ),
-                )
-            )
-            response = authorization_api.create_subject_access_review(sar)
-
-            if not response.status.allowed:
-                response = ForbiddenResponse.endpoint(user_id=user_info.user.uid)
-                raise HTTPException(**response.model_dump())
-        except Exception as e:
-            logger.error("API exception during SubjectAccessReview: %s", e)
-            response = ServiceUnavailableResponse(
-                backend_name="Kubernetes API",
-                cause="Unable to perform authorization check",
-            )
-            raise HTTPException(**response.model_dump()) from e
+        authorization_api = K8sClientSingleton.get_authz_api()
+        sar = kubernetes.client.V1SubjectAccessReview(
+            spec=kubernetes.client.V1SubjectAccessReviewSpec(
+                user=user_info.user.username,
+                groups=user_info.user.groups,
+                non_resource_attributes=kubernetes.client.V1NonResourceAttributes(
+                    path=self.virtual_path,
+                    verb="get",
+                ),
+            )
+        )
+
+        try:
+            sar_response = authorization_api.create_subject_access_review(sar)
+        except ApiException as e:
+            logger.error("API exception during SubjectAccessReview: %s", e)
+            error_response = ServiceUnavailableResponse(
+                backend_name="Kubernetes API",
+                cause="Unable to perform authorization check",
+            )
+            raise HTTPException(**error_response.model_dump()) from e
+
+        if not sar_response.status.allowed:
+            error_response = ForbiddenResponse.endpoint(user_id=user_info.user.uid)
+            raise HTTPException(**error_response.model_dump())

Also, the class docstring still claims only a 403 is raised “if the token is invalid, expired, or the user is not authorized”. After introducing UnauthorizedResponse (401) and ServiceUnavailableResponse (503) paths, it would be good to update that summary to reflect the broader set of outcomes.

tests/unit/app/endpoints/test_shields.py (1)

24-42: Reset AppConfig before simulating unloaded config

AppConfig is a process-wide singleton, so if any earlier test initialized it, _configuration stays populated. In that state this test silently exercises the “configuration loaded” branch and never raises the expected 500. Clear _configuration on the instance you patch in so the test reliably hits the failure path.

-    mock_config = AppConfig()
-    mocker.patch("app.endpoints.shields.configuration", mock_config)
+    mock_config = AppConfig()
+    mock_config._configuration = None  # pylint: disable=protected-access
+    mocker.patch("app.endpoints.shields.configuration", mock_config)
tests/unit/app/endpoints/test_config.py (1)

20-44: Ensure the singleton really mimics “not loaded” state

Because AppConfig() returns the shared singleton, any prior init_from_dict call leaves _configuration non-None. Without resetting it here, this test won’t raise and will incorrectly succeed. Clear _configuration on the patched instance before invoking the handler so the guard fires.

-    mock_config = AppConfig()
-    mocker.patch("app.endpoints.config.configuration", mock_config)
+    mock_config = AppConfig()
+    mock_config._configuration = None  # pylint: disable=protected-access
+    mocker.patch("app.endpoints.config.configuration", mock_config)
tests/unit/app/endpoints/test_streaming_query.py (1)

153-171: Override the autouse config fixture when testing the failure path

The autouse setup_configuration fixture initializes AppConfig before every test, so AppConfig() here already has _configuration populated. As a result, the guard never trips and this assertion is false. Explicitly null out _configuration on the instance you patch into the module so this test actually exercises the “configuration not loaded” branch.

-    mock_config = AppConfig()
-    mocker.patch("app.endpoints.streaming_query.configuration", mock_config)
+    mock_config = AppConfig()
+    mock_config._configuration = None  # pylint: disable=protected-access
+    mocker.patch("app.endpoints.streaming_query.configuration", mock_config)
tests/unit/app/endpoints/test_query.py (1)

1777-1860: Fix inaccurate type hints in test_evaluate_model_hints

The parametrized test passes UserConversation | None for user_conversation and tuples for request_values / expected_values, but the function signature annotates them all as list. This is misleading for readers and static analyzers, even though it’s harmless at runtime.

Consider tightening the annotations, e.g.:

-def test_evaluate_model_hints(
-    user_conversation: list,
-    request_values: list,
-    expected_values: list,
-) -> None:
+def test_evaluate_model_hints(
+    user_conversation: UserConversation | None,
+    request_values: tuple[str | None, str | None],
+    expected_values: tuple[str | None, str | None],
+) -> None:
src/app/endpoints/query.py (1)

577-607: Fix return type mismatch in parse_metadata_from_text_item

parse_metadata_from_text_item is annotated to return Optional[ReferencedDocument], but when text_item is not a TextContentItem it returns docs, which is a list[ReferencedDocument]. That’s inconsistent with the signature and could surprise future callers or type checkers.

You can simplify and fix this by removing the unused docs list and returning None in the non-TextContentItem case:

-def parse_metadata_from_text_item(
-    text_item: TextContentItem,
-) -> Optional[ReferencedDocument]:
+def parse_metadata_from_text_item(
+    text_item: TextContentItem,
+) -> Optional[ReferencedDocument]:
@@
-    docs: list[ReferencedDocument] = []
-    if not isinstance(text_item, TextContentItem):
-        return docs
+    if not isinstance(text_item, TextContentItem):
+        return None
@@
-    return None
+    return None

This matches the declared return type and the way the function is used (single document or None).

tests/unit/app/endpoints/test_conversations.py (1)

859-861: Incorrect use of set(Action.X) when patching get_actions

In tests where you restrict authorized actions, e.g.:

mocker.patch(
    "authorization.resolvers.NoopAccessResolver.get_actions",
    return_value=set(Action.GET_CONVERSATION),
)

and similarly for Action.DELETE_CONVERSATION, set(Action.GET_CONVERSATION) creates a set of characters from the enum’s string value (e.g., {"g", "e", "t", ...}), not {Action.GET_CONVERSATION}.

This can cause the authorize decorator to see a completely different set of actions than intended and potentially short‑circuit with an endpoint‑level 403 instead of exercising the conversation‑level forbidden path you’re trying to test.

You likely want:

return_value={Action.GET_CONVERSATION}
# and
return_value={Action.DELETE_CONVERSATION}

instead of set(Action.…).
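The pitfall is easy to reproduce with a tiny str-backed enum (the member value here is hypothetical; the real Action enum lives in the service's authorization code):

```python
from enum import Enum

# Minimal illustration of the set(Action.X) pitfall flagged above.
class Action(str, Enum):
    GET_CONVERSATION = "get_conversation"

# set() iterates the member as a string, yielding its characters:
wrong = set(Action.GET_CONVERSATION)

# A set literal produces the intended one-element set:
right = {Action.GET_CONVERSATION}
```

`wrong` ends up equal to `set("get_conversation")`, a bag of single characters that never contains the enum member itself, which is why the patched resolver sees a completely different action set.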

🧹 Nitpick comments (28)
tests/unit/authentication/test_utils.py (1)

23-41: Prefer cast over # type: ignore for exc.detail

The new 401 + structured-detail checks are correct, but test_extract_user_token_no_header still relies on # type: ignore while the next test uses cast. Mirroring the pattern improves consistency and type-checking.

 def test_extract_user_token_no_header() -> None:
@@
-    try:
-        extract_user_token(headers)
-    except HTTPException as exc:
-        assert exc.status_code == 401
-        assert exc.detail["response"] == (  # type: ignore
-            "Missing or invalid credentials provided by client"
-        )
-        assert exc.detail["cause"] == "No Authorization header found"  # type: ignore
+    try:
+        extract_user_token(headers)
+    except HTTPException as exc:
+        assert exc.status_code == 401
+        detail = cast(dict[str, str], exc.detail)
+        assert detail["response"] == "Missing or invalid credentials provided by client"
+        assert detail["cause"] == "No Authorization header found"
src/authentication/jwk_token.py (1)

37-46: Avoid holding the cache lock across network I/O in get_jwk_set

The error mapping and use of UnauthorizedResponse here looks consistent with the new error model surface, but get_jwk_set currently holds _jwk_cache_lock while performing the entire aiohttp request and JSON/JWK parsing. This unnecessarily serializes all concurrent requests that miss the cache.

Consider narrowing the critical section to just the cache read/write, and doing the HTTP fetch outside the lock (with a simple double‑checked pattern) so that slow or failing key servers don't become a bottleneck:

 async def get_jwk_set(url: str) -> KeySet:
-    """Fetch the JWK set from the cache, or fetch it from the URL if not cached."""
-    async with _jwk_cache_lock:
-        if url not in _jwk_cache:
-            async with aiohttp.ClientSession() as session:
-                async with session.get(url) as resp:
-                    resp.raise_for_status()
-                    _jwk_cache[url] = JsonWebKey.import_key_set(await resp.json())
-        return _jwk_cache[url]
+    """Fetch the JWK set from the cache, or fetch it from the URL if not cached."""
+    cached = _jwk_cache.get(url)
+    if cached is not None:
+        return cached
+
+    async with _jwk_cache_lock:
+        # Re-check under the lock in case another task populated it
+        cached = _jwk_cache.get(url)
+        if cached is not None:
+            return cached
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(url) as resp:
+                resp.raise_for_status()
+                jwk_set = JsonWebKey.import_key_set(await resp.json())
+
+        _jwk_cache[url] = jwk_set
+        return jwk_set

This keeps correctness while improving concurrency characteristics.

Also applies to: 132-165

src/models/responses.py (2)

41-58: Be aware that AbstractSuccessfulResponse.openapi_response now hard‑fails when examples are missing

AbstractSuccessfulResponse.openapi_response() pulls examples from model_json_schema() and raises SchemaError if none are present. This is a nice guardrail, but it means any future subclass that forgets to define model_config["json_schema_extra"]["examples"] will cause an exception at import/OpenAPI generation time.

If that strictness is intentional, it’s fine; otherwise, consider a softer fallback (e.g., returning a response spec without example instead of raising).

Also applies to: 69-87


1087-1141: Validate that AbstractErrorResponse.openapi_response matches how FastAPI actually serializes HTTPException

AbstractErrorResponse models errors as:

  • status_code: int
  • detail: DetailModel(response, cause)

and openapi_response() builds examples like:

named_examples[label] = {"value": {"detail": detail}}

while routes raise errors via HTTPException(**response.model_dump()), meaning the actual HTTP body is {"detail": {"response": ..., "cause": ...}}.

That’s coherent, but a few things to double‑check:

  • The schema for cls (with status_code + detail) will not match the example values (which omit status_code). OpenAPI tooling may complain about examples not matching the schema.
  • When no examples are defined in json_schema_extra, named_examples stays empty and you emit "examples": None, which is a bit odd; you might prefer to either raise (like AbstractSuccessfulResponse) or omit the examples key entirely.

If you want tighter alignment, you could either:

  • Include status_code in the example values, or
  • Change model in the response spec to represent the actual body shape (i.e., just the detail object) rather than the full helper model.

This is largely a documentation/contract concern; runtime behavior with HTTPException(**model_dump()) looks fine.

Also applies to: 1439-1542, 1555-1685

tests/e2e/features/feedback.feature (1)

111-115: Feedback error responses correctly aligned with new response models

The updated 403/401/500 expectations (nested detail with response/cause) match ForbiddenResponse.feedback_disabled, UnauthorizedResponse (“missing header”), and InternalServerErrorResponse.feedback_path_invalid("/invalid") from src/models/responses.py, so the E2E specs look consistent with the refactored error handling.

One small consistency nit: the ForbiddenResponse OpenAPI example for the "feedback" label still uses "response": "Storing feedback is disabled." while this scenario (and feedback_disabled()) use "Feedback is disabled". Consider aligning the example text so docs, models, and tests all show the same message.

Also applies to: 262-269, 277-284, 308-310

src/app/endpoints/shields.py (1)

16-22: Shields endpoint now correctly wired into standardized response models

The shields_responses mapping and the APIConnectionError → ServiceUnavailableResponse translation (raised via HTTPException(**response.model_dump())) look consistent with the new error-handling pattern and the ServiceUnavailableResponse contract.

One optional consideration: cause=str(e) will surface the raw APIConnectionError message to clients. If you prefer to avoid leaking backend details, you might want to replace this with a more generic cause (similar to the "Connection error while trying to reach backend service." example) and log the full exception only server‑side.

Also applies to: 29-37, 79-82

tests/unit/app/endpoints/test_feedback.py (1)

33-53: Config mocking for is_feedback_enabled/is_feedback_disabled is sound; consider isolating from real FS checks

The pattern of creating an AppConfig, setting its _configuration.user_data_collection, and patching app.endpoints.feedback.configuration is appropriate and keeps the test focused on is_feedback_enabled().

Minor suggestion: UserDataCollection’s validator will call checks.directory_check(Path("/tmp"), ...) when feedback_enabled=True. To avoid any dependency on the host filesystem, you could patch checks.directory_check in these tests.

src/app/endpoints/tools.py (1)

29-140: Tools endpoint refactor correctly standardizes responses; minor optional optimizations

The new implementation cleanly:

  • Documents 200/401/403/500/503 via tools_responses with ToolsResponse and the error response models.
  • Maps APIConnectionError from both toolgroups.list() and tools.list() to an HTTP 503 via ServiceUnavailableResponse, matching the PR’s error-handling goals.
  • Skips missing toolgroups on BadRequestError and aggregates tools with a clear server_source split between MCP and builtin using configuration.mcp_servers.

Two small, optional refinements:

  • For large numbers of MCP servers/toolgroups, consider precomputing a name -> server dict once instead of using next(... for s in configuration.mcp_servers ...) on each tool (Lines 107–119).
  • The “toolgroup is not found” case (Lines 89–91) may be more of a warning than an error; downgrading the log level to logger.warning could better reflect that it’s a client/config issue rather than a server failure.

These are non-blocking and can be deferred.

tests/unit/app/endpoints/test_conversations_v2.py (2)

93-107: ReferencedDocument construction in transform tests looks good; # type: ignore likely unnecessary

The docs list using ReferencedDocument(doc_title="Test Doc", doc_url="http://example.com") correctly exercises the referenced_documents path in transform_chat_message, including URL normalization.

The trailing # type: ignore on the list definition seems unnecessary here and could mask real type issues in the future. You can probably drop it without changing behavior.


582-628: Inconsistent semantics between success flag and “cannot be deleted” message

In test_unsuccessful_deletion, when conversation_cache.delete returns False, the test asserts:

  • response.success is True
  • response.response == "Conversation cannot be deleted"

This combination is semantically confusing: the success flag indicates success while the message states it “cannot be deleted”. Unless the intent is to treat “already deleted/no-op” as a success case, consider either:

  • Setting success to False and using a clearly error-like message, or
  • Keeping success as True but rephrasing the message to something like “Conversation already deleted” or “No conversation to delete”.

Right now the API surface is ambiguous for clients interpreting this response.

src/utils/endpoints.py (1)

159-173: System-prompt disabling behavior is explicit and user-friendly

Raising an UnprocessableEntityResponse when disable_query_system_prompt is set and a system_prompt is supplied provides a precise 422 with a helpful cause message telling clients to remove the field. This is a solid UX improvement.

If this pattern is reused elsewhere, you might later factor a small helper like UnprocessableEntityResponse.system_prompt_disabled() to avoid hard‑coding the message in multiple places, but it’s not urgent.
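The suggested factory could look roughly like this. Everything below is an assumption layered on the `{response, cause}` convention used throughout this PR, not the actual class from `src/models/responses.py`:

```python
# Hypothetical sketch of the suggested factory classmethod. Field names
# follow the response/cause convention; the real base class is pydantic.

class UnprocessableEntityResponse:
    status_code = 422

    def __init__(self, response: str, cause: str):
        self.response = response
        self.cause = cause

    @classmethod
    def system_prompt_disabled(cls) -> "UnprocessableEntityResponse":
        # Centralizes the message so handlers don't hard-code it twice.
        return cls(
            response="System prompt customization is disabled",
            cause="Remove the system_prompt field from the request.",
        )

    def model_dump(self) -> dict:
        return {
            "status_code": self.status_code,
            "detail": {"response": self.response, "cause": self.cause},
        }

dumped = UnprocessableEntityResponse.system_prompt_disabled().model_dump()
```

Handlers would then raise `HTTPException(**dumped)`, keeping the 422 wording in one place.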

src/app/endpoints/query.py (2)

214-243: Update docstring to reflect 503 on backend connection failures

query_endpoint_handler_base’s docstring still says connection errors to Llama Stack result in an HTTP 500, but the implementation now wraps APIConnectionError in ServiceUnavailableResponse("Llama Stack", cause) and raises it via HTTPException(**response.model_dump()), i.e., HTTP 503.

Consider updating the docstring text to mention 503 Service Unavailable instead of 500 so behavior and documentation stay in sync.

Also applies to: 417-427


427-435: Handle RateLimitError more defensively when model name is missing

The RateLimitError branch builds a QuotaExceededResponse.model(used_model) using used_model = getattr(e, "model", ""). If RateLimitError lacks a model attribute or it’s empty, this yields a slightly odd message (“for model ”).

A safer pattern would be to fall back to QuotaExceededResponse.from_exception(e) when used_model is falsy, e.g.:

-    except RateLimitError as e:
-        used_model = getattr(e, "model", "")
-        response = QuotaExceededResponse.model(used_model)
-        raise HTTPException(**response.model_dump()) from e
+    except RateLimitError as e:
+        used_model = getattr(e, "model", "") or ""
+        if used_model:
+            response = QuotaExceededResponse.model(used_model)
+        else:
+            response = QuotaExceededResponse.from_exception(e)
+        raise HTTPException(**response.model_dump()) from e

This keeps messages clean even if the upstream error doesn’t expose a model name.

src/app/endpoints/streaming_query_v2.py (1)

316-337: Update streaming v2 docstring to mention 503 instead of 500

The streaming_query_endpoint_handler_v2 docstring still says it “Returns HTTP 500 if unable to connect to the Llama Stack server,” but the responses mapping (and underlying shared handler) now use ServiceUnavailableResponse (503) for backend connectivity issues.

Recommend updating the docstring’s “Raises” section to reference HTTP 503 Service Unavailable instead of 500 to avoid confusion.

src/app/endpoints/conversations_v2.py (3)

96-103: Consider DRY-ing and tightening conversation cache configuration checks

The repeated pattern:

if configuration.conversation_cache_configuration.type is None:
    logger.warning("Converastion cache is not configured")
    response = InternalServerErrorResponse.cache_unavailable()
    raise HTTPException(**response.model_dump())

appears in multiple handlers, with a typo in the log message in three places and a slightly different spelling in the update handler. You may want to:

  • Extract a small helper (e.g., ensure_conversation_cache_configured()) that performs this check and raises the standardized 500 response.
  • Fix the log message spelling uniformly.

This would reduce duplication and keep cache-unavailable behavior consistent across endpoints.

-    if configuration.conversation_cache_configuration.type is None:
-        logger.warning("Converastion cache is not configured")
-        response = InternalServerErrorResponse.cache_unavailable()
-        raise HTTPException(**response.model_dump())
+    ensure_conversation_cache_configured(logger)

and in a shared helper module:

def ensure_conversation_cache_configured(logger: logging.Logger) -> None:
    if configuration.conversation_cache_configuration.type is None:
        logger.warning("Conversation cache is not configured")
        response = InternalServerErrorResponse.cache_unavailable()
        raise HTTPException(**response.model_dump())

Also applies to: 125-129, 160-163, 194-197


176-180: Missing Request parameter on update_conversation_endpoint_handler may affect authorize middleware

Unlike the other endpoints in this module, update_conversation_endpoint_handler does not accept a Request parameter, yet it is decorated with @authorize(Action.UPDATE_CONVERSATION). If the authorize middleware relies on a Request argument to check and attach request.state.authorized_actions, this handler may bypass or break that logic.

Consider adding request: Request as the first parameter for consistency with the other handlers and to avoid surprises in the authorization stack.

-@router.put("/conversations/{conversation_id}", responses=conversation_update_responses)
-@authorize(Action.UPDATE_CONVERSATION)
-async def update_conversation_endpoint_handler(
-    conversation_id: str,
-    update_request: ConversationUpdateRequest,
-    auth: Any = Depends(get_auth_dependency()),
-) -> ConversationUpdateResponse:
+@router.put("/conversations/{conversation_id}", responses=conversation_update_responses)
+@authorize(Action.UPDATE_CONVERSATION)
+async def update_conversation_endpoint_handler(
+    request: Request,  # pylint: disable=unused-argument
+    conversation_id: str,
+    update_request: ConversationUpdateRequest,
+    auth: Any = Depends(get_auth_dependency()),
+) -> ConversationUpdateResponse:

229-241: Double‑check conversation existence check with skip_userid_check semantics

check_conversation_existence hard-codes:

if configuration.conversation_cache_configuration.type is None:
    return
conversations = configuration.conversation_cache.list(user_id, False)

so it ignores the skip_userid_check flag that the list/get/delete/update handlers pass when they call configuration.conversation_cache.list(...) directly.

If skip_userid_check=True can legitimately broaden visibility (e.g., service accounts, admin flows), this existence check could return a false negative for conversations that are accessible but not owned by user_id. If v2 is strictly “per-user only” that may be fine, but it would be good to confirm and, if needed, thread skip_userid_check through this helper.

-def check_conversation_existence(user_id: str, conversation_id: str) -> None:
+def check_conversation_existence(
+    user_id: str, conversation_id: str, skip_userid_check: bool = False
+) -> None:
@@
-    conversations = configuration.conversation_cache.list(user_id, False)
+    conversations = configuration.conversation_cache.list(user_id, skip_userid_check)

and update callers accordingly.

tests/unit/models/responses/test_successful_responses.py (1)

696-739: SchemaError message assertions are tightly coupled to implementation details

Tests like:

with pytest.raises(SchemaError, match="has no label"):
    InvalidResponse.openapi_response()

and:

with pytest.raises(SchemaError, match="Examples not found"):
    NoExamplesResponse.openapi_response()

depend on the exact wording of the SchemaError messages. Any minor phrasing change in src/models/responses.py will break these tests even though behavior is still correct.

If you want the tests to be more resilient while still validating behavior, you could:

  • Assert only on the exception type, or
  • Use a more generic match that looks for a stable token (e.g., the class name) rather than the full English sentence.

Not blocking, but worth considering to reduce test brittleness.

Also applies to: 962-973
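For example, matching on a stable token instead of the full sentence keeps the test valid across copy edits. A self-contained sketch (the SchemaError class and message text below are stand-ins for the real ones):

```python
import re

import pytest


class SchemaError(Exception):
    """Stand-in for the SchemaError raised by openapi_response()."""


def openapi_response() -> None:
    # Simulates the failure path under test; exact wording may change.
    raise SchemaError("Example for InvalidResponse has no label")


def test_openapi_response_reports_offending_class() -> None:
    # Match only the class name, which survives rewording of the sentence.
    with pytest.raises(SchemaError, match=re.escape("InvalidResponse")):
        openapi_response()
```

This still verifies that the right exception is raised for the right model, without pinning the English phrasing.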

tests/unit/app/endpoints/test_conversations.py (2)

299-323: Assertions on exact detail["response"]/detail["cause"] strings may be brittle

Several tests assert substrings on the structured error payload, e.g.:

assert "Invalid conversation ID format" in detail["response"]
assert INVALID_CONVERSATION_ID in detail["cause"]
assert "Conversation not found" in detail["response"]
assert "does not exist" in detail["cause"]

These are generally aligned with the current implementations of BadRequestResponse and NotFoundResponse, but they are quite sensitive to wording and casing changes in src/models/responses.py.

Given this PR’s goal of centralizing message construction in response models, it may be more robust to:

  • Assert against the specific fields you care about (e.g., that resource and resource_id are correct if they are exposed), or
  • At least loosen the expectations (e.g., check only for "conversation" and the ID, or for a stable prefix like "Invalid" / "not found") rather than full phrases.

Not mandatory, but it would make tests less fragile to minor copy changes in the response messages.

Also applies to: 731-733, 397-400, 804-807, 765-773, 781-787, 1004-1008, 1256-1260
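As an illustration, loosened checks that key on stable tokens rather than full phrases (the payload below is a hypothetical example of the structured detail, not the exact runtime output):

```python
# Hypothetical error payload in the shape produced by the new response models.
INVALID_CONVERSATION_ID = "not-a-uuid"
detail = {
    "response": "Invalid conversation ID format",
    "cause": f"Conversation ID {INVALID_CONVERSATION_ID} is not a valid UUID",
}

# Loosened assertions: a stable prefix plus the identifier, not full sentences.
assert detail["response"].startswith("Invalid")
assert "conversation" in detail["response"].lower()
assert INVALID_CONVERSATION_ID in detail["cause"]
```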


808-842: Comment/docstring drift around “general exception” vs APIConnectionError

In test_session_deletion_exception the comment says:

# Mock AsyncLlamaStackClientHolder to raise a general exception
mock_client.agents.session.delete.side_effect = APIConnectionError(...)

but the implementation and assertions now specifically exercise the APIConnectionError → 503 path via ServiceUnavailableResponse.

The behavior is correct and matches the endpoint code, but updating the comment/docstring to mention APIConnectionError explicitly would avoid confusion for future readers.

src/app/endpoints/conversations.py (2)

84-123: simplify_session_data matches tests and intent; consider skipping missing output messages

The transformation logic cleanly strips Llama Stack session turns down to:

  • messages list of {content, type} pairs (role → type),
  • started_at / completed_at timestamps.

This aligns with the tests in TestSimplifySessionData and with the documented intent.

One minor refinement you might consider: if output_message is absent for a turn, you currently append an assistant message with content=None and type=None. Skipping the output message entirely in that case would produce a cleaner history, but it’s not required by the current tests.
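That refinement could look like this per-turn sketch, using dict stand-ins for the Llama Stack turn objects (field names follow the description above and are assumptions about the session payload shape):

```python
def simplify_turn(turn: dict) -> list[dict]:
    """Collapse a session turn into {content, type} pairs, skipping the
    assistant entry entirely when output_message is missing."""
    messages = [
        {"content": m.get("content"), "type": m.get("role")}
        for m in turn.get("input_messages", [])
    ]
    output_message = turn.get("output_message")
    if output_message is not None:  # skip instead of appending content=None
        messages.append(
            {
                "content": output_message.get("content"),
                "type": output_message.get("role"),
            }
        )
    return messages
```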


209-288: GET conversation endpoint correctly standardized to new response models

The refactored get_conversation_endpoint_handler now:

  • Uses BadRequestResponse for invalid IDs,
  • ForbiddenResponse.conversation for access‑denied cases,
  • NotFoundResponse for missing conversations or missing sessions,
  • ServiceUnavailableResponse for APIConnectionError,
  • InternalServerErrorResponse.database_error() for SQLAlchemyError.

This lines up with the unit tests in tests/unit/app/endpoints/test_conversations.py and the broader error‑model strategy. The only small oddity is catching SQLAlchemyError inside the Llama Stack client try block, whereas DB access primarily occurs in helpers like retrieve_conversation that are called before the block; but it’s harmless and may still guard future DB usage in that region.

src/app/endpoints/query_v2.py (1)

198-216: Consider making get_topic_summary less dependent on globals

get_topic_summary currently pulls the system prompt from the global configuration singleton and assumes client.responses.create and response.output always succeed. That’s fine given the current check_configuration_loaded(configuration) at the endpoint level, but it makes this helper harder to reuse or unit test in isolation.

If you expect additional call sites in the future, consider optionally passing AppConfig (or the prompt string) as a parameter and adding a small try/except around client.responses.create to surface a clearer failure (e.g., log + rethrow) while still letting higher-level handlers map to the appropriate InternalServerErrorResponse/ServiceUnavailableResponse.

Also applies to: 218-234

src/app/endpoints/rags.py (1)

147-154: BadRequestError→404 mapping might be too broad

BadRequestError from client.vector_stores.retrieve is treated unconditionally as “rag not found” and mapped to NotFoundResponse(resource="rag", resource_id=rag_id). If llama_stack_client ever uses BadRequestError for other client-side issues (e.g., malformed IDs, validation errors), those would be surfaced as 404s instead of 400-type failures.

If the client library exposes a more specific signal (status code, error type, or message) for the not-found case, it would be safer to branch on that and only construct NotFoundResponse when you’re sure it’s a genuine 404-equivalent scenario. Otherwise this behavior is acceptable but a bit lossy.
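A defensive branch might look like the following sketch; the BadRequestError stand-in and its message/status_code attributes are assumptions about what the client library exposes, not its verified API:

```python
class BadRequestError(Exception):
    """Stand-in for llama_stack_client's BadRequestError."""

    def __init__(self, message: str, status_code: int = 400) -> None:
        super().__init__(message)
        self.message = message
        self.status_code = status_code


def classify_retrieve_error(exc: BadRequestError, rag_id: str) -> dict:
    """Map to 404 only when the error clearly means "not found"."""
    if "not found" in exc.message.lower():
        return {
            "status_code": 404,
            "detail": {"response": f"rag {rag_id} not found"},
        }
    # Anything else stays a 400-style failure instead of being masked as 404.
    return {"status_code": 400, "detail": {"response": exc.message}}
```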

tests/unit/app/endpoints/test_tools.py (1)

211-253: Partial-failure test currently duplicates the full-failure scenario

test_tools_endpoint_partial_failure’s docstring says “one MCP server failing with APIConnectionError”, but mock_client.tools.list.side_effect = api_error causes every tools call to raise the same APIConnectionError, making this effectively identical in behavior to test_tools_endpoint_api_connection_error.

If you want to exercise a genuine partial-failure scenario (where some toolgroups succeed and one fails, and the endpoint’s behavior differs from the all-fail case), consider adjusting the side effect list, for example:

-    api_error = APIConnectionError(request=mocker.Mock())
-    mock_client.tools.list.side_effect = api_error
+    api_error = APIConnectionError(request=mocker.Mock())
+    mock_client.tools.list.side_effect = [
+        [mock_tools_response[0]],  # first toolgroup succeeds
+        api_error,                 # second toolgroup fails
+    ]

and then asserting whatever aggregate behavior you expect (e.g., fail-fast with 503 vs partial results).

Also applies to: 255-289

src/app/endpoints/streaming_query.py (1)

797-824: Docstrings and behavior diverge on connection errors and quotas

streaming_query_endpoint_handler_base and streaming_query_endpoint_handler docstrings still say “HTTP 500 if unable to connect to Llama Stack”, but the implementation now:

  • Maps APIConnectionError to ServiceUnavailableResponse → HTTP 503.
  • Maps RateLimitError to QuotaExceededResponse → HTTP 429.

It would be good to update the docstrings to accurately describe 503 for backend connectivity issues and 429 for quota/RateLimitError conditions, and to mention the standardized error payloads.

-    Raises:
-        HTTPException: Returns HTTP 500 if unable to connect to Llama Stack
+    Raises:
+        HTTPException:
+            - 429: Quota exceeded (RateLimitError from backend)
+            - 500: Lightspeed Stack configuration not loaded
+            - 503: Unable to connect to Llama Stack

and similarly adjust the wrapper endpoint’s docstring.

Also applies to: 828-836, 903-921

src/app/endpoints/providers.py (1)

54-93: Align group_providers docstring with its type hint

The implementation and return shape used to build ProvidersListResponse look correct, but the parameter docstring still talks about a generic "list of ProviderInfo objects" while the type hint is ProviderListResponse. Consider tightening the docstring to match the actual alias to avoid confusion for future readers.

-def group_providers(providers: ProviderListResponse) -> dict[str, list[dict[str, Any]]]:
-    """Group a list of ProviderInfo objects by their API type.
+def group_providers(providers: ProviderListResponse) -> dict[str, list[dict[str, Any]]]:
+    """Group providers returned by Llama Stack by their API type.
@@
-    Args:
-        providers: List of ProviderInfo objects.
+    Args:
+        providers: ProviderListResponse from the Llama Stack client.

Also applies to: 95-113

src/app/endpoints/feedback.py (1)

117-155: Configuration checks and feedback storage error handling; consider guarding /status as well

Using check_configuration_loaded(configuration) in feedback_endpoint_handler and update_feedback_status, and mapping OSError in store_feedback to InternalServerErrorResponse.feedback_path_invalid(...), gives you consistent 500 responses when configuration or storage is broken. One follow-up you might consider is adding the same configuration guard to the /feedback/status endpoint so all three feedback routes return the standardized "configuration not loaded" error when config is missing, rather than depending on whatever exception is_feedback_enabled() might raise.

 @router.get("/status", responses=feedback_get_response)
 def feedback_status() -> StatusResponse:
     """
@@
-    """
-    logger.debug("Feedback status requested")
+    """
+    check_configuration_loaded(configuration)
+    logger.debug("Feedback status requested")

Also applies to: 158-173, 192-193

Comment on lines 28 to 32
try:
model_list = await AsyncLlamaStackClientHolder().get_client().models.list()
except APIConnectionError as e:
response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e))
raise HTTPException(**response.model_dump()) from e

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify the error handling pattern and exception coverage.

The error handling follows the PR's standardized approach. Please verify:

  1. HTTPException construction: Confirm that ServiceUnavailableResponse.model_dump() returns a dict with the correct keys (status_code, detail) expected by HTTPException.__init__().

  2. Exception coverage: Only APIConnectionError is caught. Verify that other exceptions from models.list() (e.g., timeout, authentication errors, unexpected API errors) are either handled by upstream middleware or should be caught here as well.

Run the following script to verify the ServiceUnavailableResponse implementation and broader error handling:


🏁 Script executed:

#!/bin/bash
# Description: Verify ServiceUnavailableResponse structure and error handling patterns

# Search for ServiceUnavailableResponse class definition
ast-grep --lang python --pattern 'class ServiceUnavailableResponse($$$):
  $$$'

# Search for model_dump usage with HTTPException across the codebase
rg -n -C2 'HTTPException\(\*\*.*\.model_dump\(\)' --type=py

# Search for other exceptions that might be raised by models.list()
rg -n -C3 'models\.list\(\)' --type=py

Length of output: 28635



🏁 Script executed:

#!/bin/bash
# Find ServiceUnavailableResponse class definition
fd -t f -e py | xargs rg -l "class ServiceUnavailableResponse"

# Show the class definition
rg -n -A 20 "class ServiceUnavailableResponse" --type=py

Length of output: 1480



🏁 Script executed:

#!/bin/bash
# Find AbstractErrorResponse class definition
rg -n -A 30 "class AbstractErrorResponse" src/models/responses.py

# Also check if model_dump is overridden
rg -n "def model_dump" src/models/responses.py -A 5

Length of output: 1209



🏁 Script executed:

#!/bin/bash
# Check HTTPException imports
rg -n "from.*HTTPException|import.*HTTPException" src/metrics/utils.py

# Check what exceptions are imported in metrics/utils.py
rg -n "from|import" src/metrics/utils.py -B 5 -A 5 | head -50

# Look for other exception handlers for models.list() calls to see patterns
rg -n -B 5 -A 10 "models\.list\(\)" src/app/endpoints/

Length of output: 5243



🏁 Script executed:

#!/bin/bash
# Check exception handling context in query.py around models.list()
rg -n -B 15 -A 20 "await client.models.list\(\)" src/app/endpoints/query.py | head -80

# Check exception handling context in streaming_query.py around models.list()
rg -n -B 15 -A 30 "await client.models.list\(\)" src/app/endpoints/streaming_query.py | head -80

Length of output: 3369



🏁 Script executed:

#!/bin/bash
# Check the full try/except structure in query.py that contains models.list()
rg -n "try:" src/app/endpoints/query.py -A 200 | grep -A 200 "282:"

# Check the full try/except structure in streaming_query.py that contains models.list()
rg -n "try:" src/app/endpoints/streaming_query.py -A 200 | grep -A 200 "858:"

Length of output: 17203



🌐 Web query:

llama_stack_client models.list() exception types what errors

💡 Result:

Short answer — exceptions you should expect from client.models.list():

  • Base classes:

    • llama_stack_client.APIError (base). [1][2]
    • llama_stack_client.APIConnectionError (network / I/O / httpx underlying). [1][2]
    • llama_stack_client.APIStatusError (non-2xx HTTP responses; has .status_code and .response). [1][2]
  • Status-specific subclasses (raised for 4xx/5xx codes):

    • BadRequestError (400)
    • AuthenticationError (401)
    • PermissionDeniedError (403)
    • NotFoundError (404)
    • UnprocessableEntityError (422)
    • RateLimitError (429)
    • InternalServerError (>=500)
    • (other / unexpected status → other APIStatusError variants). [1][2]
  • Client-side parse/serialization errors / invalid data are reported as APIError subclasses (e.g., invalid response parsing). Retries: connection/timeouts, 408, 409, 429 and >=500 are retried by default (configurable). [1][2]

References: official llama-stack-client Python docs / repo and PyPI package docs. [1][2]


Add exception handling for APIStatusError and other APIError subclasses.

The error handling pattern is correct for HTTPException construction, but exception coverage is incomplete. The models.list() call from llama_stack_client can raise APIConnectionError, APIStatusError (with subclasses like BadRequestError, AuthenticationError, NotFoundError, InternalServerError), and APIError. Currently, only APIConnectionError is caught at line 30; other exceptions—particularly API status errors (4xx/5xx responses) and serialization errors—will propagate uncaught.

Add exception handling for APIStatusError and map appropriate HTTP status responses:

  • 5xx errors → ServiceUnavailableResponse
  • 4xx errors → appropriate response type (e.g., NotFoundResponse, UnauthorizedResponse)
  • Generic APIError → InternalServerErrorResponse

This is critical for metrics setup, as uncaught exceptions during initialization could crash the service startup.

🤖 Prompt for AI Agents
In src/metrics/utils.py around lines 28 to 32, the current try/except only
catches APIConnectionError from
AsyncLlamaStackClientHolder().get_client().models.list(); extend the exception
handling to also catch APIStatusError and APIError (and their subclasses) and
map them to appropriate HTTPException responses: for APIStatusError check the
status code and raise ServiceUnavailableResponse for 5xx, NotFoundResponse for
404, UnauthorizedResponse for 401/403, BadRequestResponse for 400, etc.; catch
generic APIError and raise InternalServerErrorResponse; ensure each branch
constructs the matching response object (backend_name="Llama Stack",
cause=str(e)) and raises HTTPException(**response.model_dump()) from e so
initialization failures are handled without crashing.
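The layered mapping described above can be sketched as follows, with stand-in exception classes mirroring the llama_stack_client hierarchy (the status-code thresholds and detail shape are illustrative):

```python
class APIError(Exception):
    """Stand-in base, mirroring llama_stack_client.APIError."""


class APIConnectionError(APIError):
    """Network / I/O failure reaching the backend."""


class APIStatusError(APIError):
    """Non-2xx HTTP response from the backend."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code} from backend")
        self.status_code = status_code


def map_backend_error(exc: APIError) -> dict:
    """Return kwargs suitable for HTTPException(**kwargs)."""
    detail = {"response": "Llama Stack backend error", "cause": str(exc)}
    if isinstance(exc, APIConnectionError):
        return {"status_code": 503, "detail": detail}
    if isinstance(exc, APIStatusError):
        # Pass 4xx through; treat backend 5xx as our 503.
        status = 503 if exc.status_code >= 500 else exc.status_code
        return {"status_code": status, "detail": detail}
    return {"status_code": 500, "detail": detail}  # generic APIError
```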

Comment on lines +47 to +73
@pytest.fixture(name="minimal_config")
def minimal_config_fixture() -> AppConfig:
"""Create a minimal AppConfig with only required fields.
This fixture provides a minimal valid configuration that can be used
in tests that don't need specific configuration values. It includes
only the required fields to avoid unnecessary instantiation.
Returns:
AppConfig: A minimal AppConfig instance with required fields only.
"""
cfg = AppConfig()
cfg.init_from_dict(
{
"name": "test",
"service": {"host": "localhost", "port": 8080},
"llama_stack": {
"api_key": "test-key",
"url": "http://test.com:1234",
"use_as_library_client": False,
},
"user_data_collection": {},
"authentication": {"module": "noop"},
"authorization": {"access_rules": []},
}
)
return cfg

⚠️ Potential issue | 🟠 Major

minimal_config leaks global AppConfig state across tests

AppConfig is a singleton; calling cfg.init_from_dict(...) here mutates the global configuration for the whole process. Because the fixture doesn’t restore the previous state, any test that uses minimal_config can affect later tests that rely on a different configuration or on “no configuration loaded”, making the suite order-dependent.

You can keep tests isolated by snapshotting and restoring the internal state in the fixture teardown.

-@pytest.fixture(name="minimal_config")
-def minimal_config_fixture() -> AppConfig:
+@pytest.fixture(name="minimal_config")
+def minimal_config_fixture() -> Generator[AppConfig, None, None]:
@@
-    cfg = AppConfig()
-    cfg.init_from_dict(
+    cfg = AppConfig()
+
+    # Snapshot current singleton state so we can restore it after the test
+    # pylint: disable=protected-access
+    original_configuration = cfg._configuration
+    original_conversation_cache = cfg._conversation_cache
+    original_quota_limiters = cfg._quota_limiters
+
+    cfg.init_from_dict(
         {
@@
-            "authorization": {"access_rules": []},
+            "authorization": {"access_rules": []},
         }
     )
-    return cfg
+    try:
+        yield cfg
+    finally:
+        # Restore previous configuration and cached resources
+        cfg._configuration = original_configuration
+        cfg._conversation_cache = original_conversation_cache
+        cfg._quota_limiters = original_quota_limiters

(Based on learnings about AppConfig being a singleton with cached state.)

Committable suggestion skipped: line range outside the PR's diff.

@tisnik tisnik left a comment

LGTM, very nice work!

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/authentication/k8s.py (1)

133-158: Confirm initialization of version_data before the try block in _get_cluster_id.

The review comment correctly identifies a potential UnboundLocalError at line 150. If get_cluster_custom_object() raises a TypeError before the variable assignment completes, version_data remains undefined. The subsequent log statement at line 150 would then fail with UnboundLocalError, bypassing the intended ClusterIDUnavailableError handling.

The suggested fix is valid:

@classmethod
def _get_cluster_id(cls) -> str:
    version_data: dict[str, Any] | None = None
    try:
        custom_objects_api = cls.get_custom_objects_api()
        version_data = custom_objects_api.get_cluster_custom_object(
            "config.openshift.io", "v1", "clusterversions", "version"
        )
        cluster_id = version_data["spec"]["clusterID"]
        cls._cluster_id = cluster_id
        return cluster_id
    except TypeError as e:
        logger.error(
            "Failed to get cluster_id, version object is: %s", version_data
        )
        raise ClusterIDUnavailableError("Failed to get cluster ID") from e

Initialize version_data to None before the try block to ensure it is always defined and safe to log.

tests/unit/app/endpoints/test_streaming_query.py (1)

116-145: Fix assertion placement and connection‑error mocking in streaming tests.

Two issues in the new/updated tests:

  1. Assertions inside pytest.raises block (Lines 166‑168)
    In test_streaming_query_endpoint_handler_configuration_not_loaded, the assertions are placed after the awaited call but still inside the with pytest.raises(...) block. Once streaming_query_endpoint_handler raises, the rest of the block is skipped, so the status/detail assertions never execute. Move the asserts outside the with block:

    with pytest.raises(HTTPException) as e:
        await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH)
    
    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
  2. Mocking the wrong attribute for connection errors (Lines 189‑191)
    Production code at src/app/endpoints/streaming_query.py:862 calls await client.models.list(), but the test sets mock_client.models.side_effect = APIConnectionError(...). That won't affect models.list(). Change to:

    mock_client.models.list.side_effect = APIConnectionError(request=query_request)  # type: ignore

    so that the models.list() call actually raises and exercises the connection‑error path.

The rest of the refactor in this file (real QueryRequest usage in multiple tests, the configuration autouse fixture, and the MCP/streaming flow assertions) looks coherent with the new error‑handling design.

♻️ Duplicate comments (3)
tests/unit/app/endpoints/test_providers.py (1)

182-182: Docstring still says 500 while test now expects 503.

The test asserts HTTP_503_SERVICE_UNAVAILABLE and "Unable to connect to Llama Stack" at lines 199 and 202, but the docstring still references HTTP 500.

Apply this diff to update the docstring:

-    """Test that /providers/{provider_id} raises HTTP 500 if Llama Stack connection fails."""
+    """Test that /providers/{provider_id} raises HTTP 503 if Llama Stack connection fails."""
src/metrics/utils.py (1)

5-6: Metrics error handling improved; consider also catching generic APIError.

The new flow (check_configuration_loaded + catching APIConnectionError/APIStatusError and mapping them to ServiceUnavailableResponse via HTTPException(**response.model_dump())) is a solid alignment with the standardized error model.

You still don’t cover generic APIError subclasses from llama_stack_client (e.g., parsing/serialization or unexpected API errors). Those will surface as generic 500s instead of the structured 503-style backend error you now use for the other failure modes. If you want full consistency, consider extending the except clause to include APIError as well and mapping it either to ServiceUnavailableResponse or InternalServerErrorResponse.

Also applies to: 9-10, 16-19, 23-32

src/models/responses.py (1)

720-782: Fix ConversationDeleteResponse.success and align the “not found” example

ConversationDeleteResponse.__init__ ignores the deleted flag and always sets success=True, so clients see success: True even when the conversation was not deleted. The "not found" example also shows success: True and uses a different message text than the runtime path.

Consider updating both the initializer and the example to reflect the real outcome:

     def __init__(self, *, deleted: bool, conversation_id: str) -> None:
@@
         response_msg = (
             "Conversation deleted successfully"
             if deleted
             else "Conversation cannot be deleted"
         )
         super().__init__(
             conversation_id=conversation_id,  # type: ignore[call-arg]
-            success=True,  # type: ignore[call-arg]
+            success=deleted,  # type: ignore[call-arg]
             response=response_msg,  # type: ignore[call-arg]
         )
@@
                 {
                     "label": "not found",
                     "value": {
                         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
-                        "success": True,
-                        "response": "Conversation can not be deleted",
+                        "success": False,
+                        "response": "Conversation cannot be deleted",
                     },
                 },
🧹 Nitpick comments (5)
tests/unit/app/endpoints/test_providers.py (1)

184-184: Consider removing redundant authorization mock.

Unlike other tests that use minimal_config (lines 43, 67, 114, 151), this test additionally calls mock_authorization_resolvers. Since minimal_config is already being used, this call may be redundant.

If the mock is unnecessary, apply this diff:

-    mocker.patch("app.endpoints.providers.configuration", minimal_config)
-    mock_authorization_resolvers(mocker)
+    mocker.patch("app.endpoints.providers.configuration", minimal_config)
tests/e2e/features/feedback.feature (1)

111-115: Structured error response looks good.

The change to a nested detail object with "response" and "cause" fields aligns with the PR's standardization goal.

Minor inconsistency: Line 113 ends with a period ("Storing feedback is disabled.") while line 112 doesn't ("Storing feedback is disabled"). Consider making the punctuation consistent.

src/authentication/k8s.py (1)

174-207: K8s auth flow is consistent with standardized responses; consider distinguishing backend failures from invalid tokens.

The new authentication flow looks good overall:

  • extract_user_token(request.headers) centralizes header parsing and maps missing/malformed headers to structured UnauthorizedResponses.
  • get_user_info now maps failures to obtain the authn API to a ServiceUnavailableResponse("Kubernetes API", ...) via HTTPException(**response.model_dump()), which is appropriate.
  • K8SAuthDependency.__call__:
    • Treats user_info is None as 401 via UnauthorizedResponse(cause="Invalid or expired Kubernetes token").
    • Maps kube:admin cluster‑ID failures to a 500 InternalServerErrorResponse.
    • Maps SAR API failures to a 503 ServiceUnavailableResponse and authorization denials to ForbiddenResponse.endpoint.

One nuance: in get_user_info, any exception raised by auth_api.create_token_review(token_review) is caught broadly and results in None, which then becomes a 401 “Invalid or expired Kubernetes token”. That conflates Kubernetes API outages or server‑side errors with bad credentials. Given the rest of this PR’s push toward differentiating backend failures (5xx/503) from client errors (401/403), you might want to:

  • Catch ApiException (and possibly connection‑level exceptions) from create_token_review and map them to a ServiceUnavailableResponse("Kubernetes API", "Unable to validate token") via HTTPException.
  • Reserve the user_info is None → 401 path for genuinely unauthenticated tokens.

If you’re comfortable preserving the previous behavior where TokenReview failures looked like invalid tokens, the current implementation is still consistent—but calling out this distinction now may save debugging time later.

Also applies to: 228-285

src/models/responses.py (1)

367-370: Prefer default_factory=list for rag_chunks

rag_chunks currently uses Field([], ...) as the default. While Pydantic deep‑copies defaults, using default_factory=list is clearer and avoids any confusion about shared mutable defaults.

-    rag_chunks: list[RAGChunk] = Field(
-        [],
-        description="List of RAG chunks used to generate the response",
-    )
+    rag_chunks: list[RAGChunk] = Field(
+        default_factory=list,
+        description="List of RAG chunks used to generate the response",
+    )

You might apply the same pattern anywhere else you later add list defaults for consistency.
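For illustration, the same convention in stdlib dataclasses (where a bare mutable default is rejected outright) shows why a factory is the unambiguous spelling; pydantic's `Field([])` happens to be safe because defaults are deep-copied, but `default_factory` makes the intent explicit:

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    # field(default_factory=list) gives each instance its own list;
    # a plain `rag_chunks: list = []` would raise ValueError here.
    rag_chunks: list = field(default_factory=list)


a, b = Response(), Response()
a.rag_chunks.append("chunk-1")
print(b.rag_chunks)  # [] -- no sharing between instances
```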

tests/unit/app/endpoints/test_shields.py (1)

170-216: Tighten shields “list” tests and remove unused cfg setup

In both test_shields_endpoint_handler_unable_to_retrieve_shields_list and test_shields_endpoint_handler_success_with_shields_data:

  • A real AppConfig (cfg) is constructed but never used; instead, configuration is patched with a plain Mock.
  • Assertions are very weak (response is not None / minimal shape checks), so these tests would still pass even if the handler returned an unexpected status or payload type.

Consider:

  • Either patch app.endpoints.shields.configuration with cfg (for consistency with other tests), or drop the unused cfg creation.

  • Strengthen assertions along the lines of:

    • assert isinstance(response, ShieldsResponse)
    • assert response.shields == [] for the “unable_to_retrieve_shields_list” case.
    • For the “success_with_shields_data” case, assert identifiers and list length (similar to what you already do, but you can also assert the response type).

This will better validate the endpoint behavior and keep the test setup aligned with the configuration usage pattern.

Also applies to: 285-352

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b42eb24 and ff940a9.

📒 Files selected for processing (14)
  • src/authentication/k8s.py (4 hunks)
  • src/authentication/utils.py (1 hunks)
  • src/authorization/middleware.py (4 hunks)
  • src/metrics/utils.py (2 hunks)
  • src/models/responses.py (19 hunks)
  • tests/e2e/features/feedback.feature (4 hunks)
  • tests/unit/app/endpoints/test_config.py (4 hunks)
  • tests/unit/app/endpoints/test_feedback.py (6 hunks)
  • tests/unit/app/endpoints/test_providers.py (7 hunks)
  • tests/unit/app/endpoints/test_shields.py (8 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (6 hunks)
  • tests/unit/authentication/test_noop_with_token.py (4 hunks)
  • tests/unit/authentication/test_utils.py (3 hunks)
  • tests/unit/models/responses/test_error_responses.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/authentication/test_utils.py
  • tests/unit/app/endpoints/test_config.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/unit/authentication/test_noop_with_token.py
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/unit/authentication/test_noop_with_token.py
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.

Applied to files:

  • src/models/responses.py
🧬 Code graph analysis (10)
src/authentication/k8s.py (2)
src/authentication/utils.py (1)
  • extract_user_token (8-30)
src/models/responses.py (5)
  • ForbiddenResponse (1253-1358)
  • InternalServerErrorResponse (1544-1652)
  • ServiceUnavailableResponse (1655-1685)
  • UnauthorizedResponse (1178-1250)
  • endpoint (1325-1329)
tests/unit/app/endpoints/test_providers.py (2)
src/app/endpoints/providers.py (2)
  • get_provider_endpoint_handler (118-160)
  • providers_endpoint_handler (56-92)
src/configuration.py (2)
  • configuration (73-77)
  • AppConfig (39-181)
src/authentication/utils.py (1)
src/models/responses.py (1)
  • UnauthorizedResponse (1178-1250)
tests/unit/app/endpoints/test_shields.py (4)
src/app/endpoints/shields.py (1)
  • shields_endpoint_handler (42-82)
src/configuration.py (2)
  • configuration (73-77)
  • AppConfig (39-181)
src/models/responses.py (1)
  • ShieldsResponse (128-152)
src/client.py (1)
  • get_client (49-55)
tests/unit/app/endpoints/test_streaming_query.py (2)
src/configuration.py (2)
  • configuration (73-77)
  • AppConfig (39-181)
tests/unit/utils/auth_helpers.py (1)
  • mock_authorization_resolvers (8-26)
src/authorization/middleware.py (2)
src/models/responses.py (4)
  • ForbiddenResponse (1253-1358)
  • InternalServerErrorResponse (1544-1652)
  • generic (1599-1604)
  • endpoint (1325-1329)
src/authorization/resolvers.py (3)
  • check_access (123-124)
  • check_access (134-138)
  • check_access (171-188)
src/metrics/utils.py (3)
src/configuration.py (1)
  • configuration (73-77)
src/models/responses.py (1)
  • ServiceUnavailableResponse (1655-1685)
src/utils/endpoints.py (1)
  • check_configuration_loaded (116-130)
src/models/responses.py (2)
src/quota/quota_exceed_error.py (1)
  • QuotaExceedError (6-38)
src/models/config.py (3)
  • config (140-146)
  • Action (329-377)
  • Configuration (623-649)
tests/unit/models/responses/test_error_responses.py (3)
src/models/responses.py (23)
  • AbstractErrorResponse (1087-1140)
  • BadRequestResponse (1143-1175)
  • DetailModel (1080-1084)
  • ForbiddenResponse (1253-1358)
  • InternalServerErrorResponse (1544-1652)
  • QuotaExceededResponse (1461-1541)
  • ServiceUnavailableResponse (1655-1685)
  • UnauthorizedResponse (1178-1250)
  • UnprocessableEntityResponse (1419-1458)
  • openapi_response (45-58)
  • openapi_response (788-812)
  • openapi_response (1117-1140)
  • conversation (1313-1322)
  • endpoint (1325-1329)
  • feedback_disabled (1332-1337)
  • model (1522-1526)
  • from_exception (1529-1533)
  • generic (1599-1604)
  • configuration_not_loaded (1607-1612)
  • feedback_path_invalid (1615-1620)
  • query_failed (1623-1628)
  • cache_unavailable (1631-1636)
  • database_error (1639-1644)
src/quota/quota_exceed_error.py (1)
  • QuotaExceedError (6-38)
docs/demos/lcore/js/reveal.js (1)
  • result (2444-2444)
tests/unit/app/endpoints/test_feedback.py (5)
src/app/endpoints/feedback.py (6)
  • assert_feedback_enabled (74-89)
  • feedback_endpoint_handler (94-123)
  • feedback_status (159-173)
  • is_feedback_enabled (61-71)
  • store_feedback (126-155)
  • update_feedback_status (178-215)
src/configuration.py (2)
  • configuration (73-77)
  • AppConfig (39-181)
src/models/config.py (2)
  • config (140-146)
  • UserDataCollection (223-256)
src/models/requests.py (2)
  • FeedbackRequest (252-395)
  • FeedbackStatusUpdateRequest (398-423)
tests/unit/utils/auth_helpers.py (1)
  • mock_authorization_resolvers (8-26)
🪛 GitHub Check: Bandit
src/models/responses.py

[warning] 1032-1033:
Probable insecure usage of temp file/directory.



[warning] 1039-1039:
Probable insecure usage of temp file/directory.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: build-pr
🔇 Additional comments (10)
tests/e2e/features/feedback.feature (3)

262-262: Correct status code for authentication failure.

The change from 400 to 401 is correct per HTTP semantics—401 (Unauthorized) is the appropriate status for missing or invalid authentication credentials. The structured error response with distinct "response" (general) and "cause" (specific) fields provides clearer diagnostic information.

Also applies to: 267-268


277-277: Consistent unauthorized handling.

The status code correction to 401 and structured error format match the previous unauthorized scenario, ensuring consistency across the feature file.

Also applies to: 282-283


308-309: Error messages verified—no issues found.

The E2E test error messages match the backend implementation. The backend's InternalServerErrorResponse.feedback_path_invalid() method generates exactly the same messages: "Failed to store feedback" with cause "Failed to store feedback at directory: {path}". Unit and integration tests confirm this format is correctly returned by the endpoint.

src/authorization/middleware.py (1)

4-24: Standardized authorization error responses look correct.

Using InternalServerErrorResponse.generic() for unknown auth modules / missing auth and ForbiddenResponse.endpoint(user_id=auth[0]) for access denial aligns with the new error model and keeps semantics clear. The HTTPException(**response.model_dump()) pattern here is fine as long as the response model continues to dump status_code and detail in the shape FastAPI expects.

Also applies to: 66-69, 84-85, 93-94
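A minimal sketch of why that caveat matters, using a dataclass stand-in rather than the real pydantic models (the class and field names below are hypothetical stand-ins, not the repo's actual definitions):

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ErrorResponseStandIn:
    """Hypothetical stand-in for the error response models; only the
    field names matter for the HTTPException(**...) pattern."""
    status_code: int = 500
    detail: dict = field(default_factory=dict)


resp = ErrorResponseStandIn(
    detail={"response": "Internal server error", "cause": "Unknown auth module"}
)
kwargs = asdict(resp)  # analogous to response.model_dump()
# HTTPException(**kwargs) only keeps working while the dump stays in
# exactly this {status_code, detail} shape:
assert set(kwargs) == {"status_code", "detail"}
```

If a future field were added to the model's dump, the unpacking call would start raising TypeError, so the shape is effectively part of the contract.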

tests/unit/app/endpoints/test_streaming_query.py (1)

1715-1755: Quota‑exceeded test expectations align with the new error response.

test_query_endpoint_quota_exceeded correctly asserts HTTP 429 and checks that the structured detail includes the updated "response": "The model quota has been exceeded" text and mentions the model name in "cause". This matches the standardized quota‑exceeded error semantics introduced in the PR.

tests/unit/app/endpoints/test_feedback.py (2)

33-52: Feedback tests correctly exercise new configuration and structured error responses.

  • test_is_feedback_enabled / test_is_feedback_disabled and the new feedback_status_* tests mock AppConfig._configuration.user_data_collection with UserDataCollection and verify that is_feedback_enabled / feedback_status return the expected booleans and status payloads; this mirrors how production code reads configuration.
  • test_assert_feedback_enabled_disabled now asserts that the 403 detail is a dict with "response": "Storing feedback is disabled" and "cause": "Storing feedback is disabled.", matching ForbiddenResponse.feedback_disabled().
  • test_feedback_endpoint_handler_error and test_store_feedback_on_io_error both simulate filesystem failures and correctly assert a 500 with detail containing "response": "Failed to store feedback" and a "cause" mentioning the directory, in line with InternalServerErrorResponse.feedback_path_invalid(...).
  • test_store_feedback and test_feedback_endpoint_valid_requests validate successful persistence (and the stored JSON shape) and successful endpoint behavior for various payloads.

These updates keep the tests aligned with the new standardized response models without over‑mocking production behavior.

Also applies to: 65-67, 123-155, 177-245, 247-285, 296-310, 312-345


55-77: The review comment is based on incorrect assumptions about the pytest configuration.

The codebase has asyncio_mode = "auto" enabled in pyproject.toml (under [tool.pytest.ini_options]). With this setting, pytest-asyncio automatically handles async tests without requiring the @pytest.mark.asyncio decorator. The tests test_assert_feedback_enabled_disabled and test_assert_feedback_enabled will be properly awaited and executed, despite lacking the decorator.

While there is some stylistic inconsistency in the file (some async tests have the decorator, some don't), this has no functional impact on test execution. The review's warning only applies if asyncio_mode were set to something other than "auto", which is not the case here.

Likely an incorrect or invalid review comment.

src/authentication/utils.py (1)

5-28: extract_user_token error handling now matches the documented contract

The function’s behavior (raising a structured 401 HTTPException via UnauthorizedResponse for missing/malformed Authorization headers) is consistent with the updated docstring and the response model. No further changes needed here.
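For reference, the documented contract can be sketched as follows; the real function raises an HTTPException built from UnauthorizedResponse rather than ValueError, and the exact messages are illustrative:

```python
def extract_user_token(headers) -> str:
    """Sketch of the contract: return the bearer token, or raise for a
    missing/malformed Authorization header (mapped to 401 in the real code)."""
    auth_header = headers.get("Authorization")
    if auth_header is None:
        raise ValueError("No Authorization header found")
    parts = auth_header.split(" ")
    if len(parts) != 2 or parts[0] != "Bearer":
        raise ValueError("No token found in Authorization header")
    return parts[1]
```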

tests/unit/authentication/test_noop_with_token.py (1)

67-88: Tests correctly assert 401 and structured detail for noop-with-token auth failures

The updated tests now verify both the 401 status code and the {response, cause} structure on HTTPException.detail, matching UnauthorizedResponse behavior. The use of cast(dict[str, str], exc_info.value.detail) is appropriate for type checking.

Also applies to: 108-111

tests/unit/models/responses/test_error_responses.py (1)

1-687: Strong, comprehensive coverage for error response models

This test module does a good job exercising constructors, factory helpers, and openapi_response() for all error types, including edge cases like missing labels. It tightly couples tests to the documented examples and status codes, which should make future regressions in error payloads or OpenAPI docs very visible.

Comment on lines 39 to 43
     with pytest.raises(HTTPException) as e:
         await shields_endpoint_handler(request=request, auth=auth)
         assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.detail["response"] == "Configuration is not loaded"
+        assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore


⚠️ Potential issue | 🟡 Minor

Move assertions outside the pytest.raises block

In test_shields_endpoint_handler_configuration_not_loaded, the assertions are inside the with pytest.raises(...) block. Once shields_endpoint_handler raises, the remaining lines in the block are not executed, so the status and detail aren’t actually checked.

Consider:

-    with pytest.raises(HTTPException) as e:
-        await shields_endpoint_handler(request=request, auth=auth)
-        assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+    with pytest.raises(HTTPException) as e:
+        await shields_endpoint_handler(request=request, auth=auth)
+    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
🤖 Prompt for AI Agents
In tests/unit/app/endpoints/test_shields.py around lines 39-43, the test places
assertions inside the with pytest.raises(HTTPException) block so they never run
after the exception is raised; move the assert lines outside the with block: use
the context manager only to capture the exception (e.g., with pytest.raises(...)
as e: await shields_endpoint_handler(...)), then after the block assert
e.value.status_code and e.value.detail["response"] == "Configuration is not
loaded" so the exception is both raised and validated.
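The underlying control-flow pitfall is easy to demonstrate without pytest: once a call raises, nothing after it in the same block executes, so assertions placed there are silently skipped rather than failed.

```python
ran = []


def handler():
    raise RuntimeError("boom")


try:
    handler()
    ran.append("assertions")  # unreachable: handler() raised on the line above
except RuntimeError:
    pass

print(ran)  # prints []
```

This is why `pytest.raises` blocks should contain only the raising call, with all assertions dedented to run after the context manager exits.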

Comment on lines 271 to 276
     with pytest.raises(HTTPException) as e:
         await shields_endpoint_handler(request=request, auth=auth)
-        assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.detail["response"] == "Unable to connect to Llama Stack"
+        assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
+        assert e.value.detail["response"] == "Service unavailable"  # type: ignore
+        assert "Unable to connect to Llama Stack" in e.value.detail["cause"]  # type: ignore


⚠️ Potential issue | 🟡 Minor

Same pytest.raises indentation issue in connection-error test

test_shields_endpoint_llama_stack_connection_error has the same pattern: assertions live inside the with pytest.raises(...) block and are never executed after the exception is raised.

Suggested fix:

-    with pytest.raises(HTTPException) as e:
-        await shields_endpoint_handler(request=request, auth=auth)
-        assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
-        assert e.value.detail["response"] == "Service unavailable"  # type: ignore
-        assert "Unable to connect to Llama Stack" in e.value.detail["cause"]  # type: ignore
+    with pytest.raises(HTTPException) as e:
+        await shields_endpoint_handler(request=request, auth=auth)
+    assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
+    assert e.value.detail["response"] == "Service unavailable"  # type: ignore
+    assert "Unable to connect to Llama Stack" in e.value.detail["cause"]  # type: ignore
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    with pytest.raises(HTTPException) as e:
-        await shields_endpoint_handler(request=request, auth=auth)
-        assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
-        assert e.value.detail["response"] == "Service unavailable"  # type: ignore
-        assert "Unable to connect to Llama Stack" in e.value.detail["cause"]  # type: ignore
+    with pytest.raises(HTTPException) as e:
+        await shields_endpoint_handler(request=request, auth=auth)
+    assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
+    assert e.value.detail["response"] == "Service unavailable"  # type: ignore
+    assert "Unable to connect to Llama Stack" in e.value.detail["cause"]  # type: ignore
🤖 Prompt for AI Agents
In tests/unit/app/endpoints/test_shields.py around lines 271 to 276, the
assertions for the connection-error test are placed inside the with
pytest.raises(HTTPException) block so they never run; move the awaits that are
expected to raise into the with block, then dedent the assert lines so they
execute after the context manager (i.e., use with pytest.raises(...) as e: await
shields_endpoint_handler(...), then on the next lines outside the with assert
e.value.status_code == ..., assert e.value.detail["response"] == ..., and assert
"Unable to connect to Llama Stack" in e.value.detail["cause"]).

@tisnik tisnik merged commit b9c1061 into lightspeed-core:main Nov 24, 2025
21 of 23 checks passed