Skip to content

RSPEED-2689: Record token metrics when verbose infer post-processing fails#1364

Merged
tisnik merged 1 commit intolightspeed-core:mainfrom
major:RSPEED-2689/fix-rlsapi-metrics
Mar 20, 2026
Merged

RSPEED-2689: Record token metrics when verbose infer post-processing fails#1364
tisnik merged 1 commit intolightspeed-core:mainfrom
major:RSPEED-2689/fix-rlsapi-metrics

Conversation

@major
Copy link
Copy Markdown
Contributor

@major major commented Mar 20, 2026

Description

The verbose /infer path records metrics via build_turn_summary() on success, but if extract_text_from_response_items() throws after a successful LLM call, build_turn_summary() is never reached and token metrics (llm_calls_total, llm_token_sent_total, llm_token_received_total) are lost for tokens that were consumed. This adds extract_token_usage() to the exception handler (guarded by verbose_enabled and response is not None) so metrics are captured even when post-processing fails. Also adds response = None initialization before the try block to avoid UnboundLocalError.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude
  • Generated by: N/A

Related Tickets & Documents

  • Closes RSPEED-2689

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Unit tests verify extract_token_usage is called in the except block when the verbose LLM call succeeds but post-processing fails (extract_text_from_response_items raises). A second test verifies the non-verbose exception path does NOT call extract_token_usage again (already called inside retrieve_simple_response()). All 43 existing tests continue to pass. uv run make test-unit and uv run make verify pass with 0 errors.

Summary by CodeRabbit

  • Bug Fixes

    • Improved error-path handling so token usage is captured when a verbose response exists and failures occur, and removed a redundant reset to prevent unintended clearing.
  • Tests

    • Added unit tests and test helpers to verify token-usage extraction is triggered on failures in verbose mode and skipped in non-verbose mode.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 20, 2026

Walkthrough

Initialize response = None before inference in the RLS API endpoint and conditionally extract token usage from a verbose response in the exception handler; add tests validating token-usage extraction behavior for verbose and non-verbose failure paths.

Changes

Cohort / File(s) Summary
Core API Error Handling
src/app/endpoints/rlsapi_v1.py
Initialize response = None prior to the inference try-block; remove redundant reset; call extract_token_usage(response.usage, model_id) in the handled-exception path only when verbose is enabled and response exists.
Test Coverage
tests/unit/app/endpoints/test_rlsapi_v1.py
Add pylint file-level suppression and _setup_config_mock helper; add tests test_infer_verbose_extract_token_usage_on_text_extraction_failure and test_infer_non_verbose_no_extract_token_usage_on_failure to assert token-usage extraction behavior in verbose vs non-verbose failure scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: recording token metrics when verbose infer post-processing fails, which directly aligns with the primary bug fix in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

477-479: Redundant reassignment of response = None.

Line 478 reassigns response = None, but this is already the value from line 460. The reassignment is harmless but unnecessary noise.

♻️ Remove redundant assignment
         else:
-            response = None
             response_text = await retrieve_simple_response(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` around lines 477 - 479, The assignment
response = None before calling retrieve_simple_response is redundant because
response was already set to None earlier; remove that extra assignment to clean
up the code (inside the else branch where response and response_text are set)
and leave the call to retrieve_simple_response(...) intact so behavior does not
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 477-479: The assignment response = None before calling
retrieve_simple_response is redundant because response was already set to None
earlier; remove that extra assignment to clean up the code (inside the else
branch where response and response_text are set) and leave the call to
retrieve_simple_response(...) intact so behavior does not change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: af023a3a-1b35-4c9a-9ac1-12649b93ad35

📥 Commits

Reviewing files that changed from the base of the PR and between 0f4a68c and 6de788a.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py
  • tests/unit/app/endpoints/test_rlsapi_v1.py

Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the RSPEED-2689/fix-rlsapi-metrics branch from 6de788a to 0cb2f9f Compare March 20, 2026 15:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)

5-5: Consider splitting this test module over time instead of broad lint suppression.

too-many-lines suppression is acceptable short-term, but this file is already very large and hard to navigate. Incremental split by endpoint behavior/theme would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/app/endpoints/test_rlsapi_v1.py` at line 5, The module-level
suppression "# pylint: disable=too-many-lines" hides that this test module has
grown too large; instead remove that broad suppression and split the tests by
endpoint/behavior into smaller test modules so each file is under lint limits
(e.g., move groups of tests for a single endpoint or behavior from
test_rlsapi_v1 into separate modules like test_rlsapi_v1_auth.py,
test_rlsapi_v1_items.py, etc.), update any shared fixtures/imports to a common
conftest or helper module, and re-run linting to ensure each new file no longer
triggers too-many-lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/unit/app/endpoints/test_rlsapi_v1.py`:
- Line 5: The module-level suppression "# pylint: disable=too-many-lines" hides
that this test module has grown too large; instead remove that broad suppression
and split the tests by endpoint/behavior into smaller test modules so each file
is under lint limits (e.g., move groups of tests for a single endpoint or
behavior from test_rlsapi_v1 into separate modules like test_rlsapi_v1_auth.py,
test_rlsapi_v1_items.py, etc.), update any shared fixtures/imports to a common
conftest or helper module, and re-run linting to ensure each new file no longer
triggers too-many-lines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 78da4bbd-626f-4430-9300-9f3326bdbbc8

📥 Commits

Reviewing files that changed from the base of the PR and between 6de788a and 0cb2f9f.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py
  • tests/unit/app/endpoints/test_rlsapi_v1.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/app/endpoints/rlsapi_v1.py

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik merged commit 861a8dc into lightspeed-core:main Mar 20, 2026
21 of 22 checks passed
@major major deleted the RSPEED-2689/fix-rlsapi-metrics branch March 20, 2026 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants