LCORE-411: add token usage metrics #489
Conversation
Walkthrough
Propagates provider_id through query handling, extends the retrieve_response signature, and adds per-turn token metrics updates using a new utility. The streaming handler now updates token metrics on turn completion. Tests mock the new metrics calls and add a unit test for the utility.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant QueryHandler as query_endpoint_handler
participant Retriever as retrieve_response
participant LLM as LLM Service
participant Metrics as Metrics Utils
Client->>QueryHandler: POST /query (model hint)
QueryHandler->>QueryHandler: select_model_and_provider_id()
QueryHandler->>Retriever: retrieve_response(model_id, provider_id, request, token, ...)
Retriever->>LLM: Invoke model
LLM-->>Retriever: LLM response
Retriever->>Metrics: update_llm_token_count_from_turn(turn, model_id, provider_id, system_prompt)
Retriever-->>QueryHandler: (summary, transcript)
QueryHandler-->>Client: 200 OK (response)
sequenceDiagram
autonumber
participant Client
participant StreamHandler as streaming_query_endpoint_handler
participant Stream as LLM Stream
participant Metrics as Metrics Utils
Client->>StreamHandler: POST /streaming_query
StreamHandler->>Stream: start streaming
loop token/turns
Stream-->>StreamHandler: partials
alt turn_complete
StreamHandler->>Metrics: update_llm_token_count_from_turn(turn, model_id, provider_id, system_prompt)
note right of Metrics: Guarded with try/except and logging
end
end
StreamHandler-->>Client: stream closed
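For illustration, a minimal sketch of the guarded update shown in the diagram, assuming the streaming handler already has the completed Turn, model_id, provider_id, and system_prompt in scope; the helper name record_turn_tokens is hypothetical, and the import paths mirror the first-party modules named in the lint output below.

from llama_stack_client.types.agents.turn import Turn

from log import get_logger
from metrics.utils import update_llm_token_count_from_turn

logger = get_logger(__name__)


def record_turn_tokens(
    turn: Turn, model_id: str, provider_id: str, system_prompt: str
) -> None:
    """Best-effort token accounting; a failure here must never break the stream."""
    try:
        update_llm_token_count_from_turn(turn, model_id, provider_id, system_prompt)
    except Exception:  # pylint: disable=broad-exception-caught
        logger.exception("Failed to update token usage metrics for the completed turn")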
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/query.py (2)
217-224: Pass raw model_id to retrieve_response; construct provider/model inside the callee.
Currently you pass llama_stack_model_id (provider/model) as model_id. This later becomes the model label in metrics, causing label inconsistency vs llm_calls_total (which uses raw model_id). Pass model_id instead and let retrieve_response build provider/model for agent calls. Apply:
- summary, conversation_id = await retrieve_response(
-     client,
-     llama_stack_model_id,
-     query_request,
-     token,
-     mcp_headers=mcp_headers,
-     provider_id=provider_id,
- )
+ summary, conversation_id = await retrieve_response(
+     client,
+     model_id,  # raw model id for metrics labels
+     query_request,
+     token,
+     mcp_headers=mcp_headers,
+     provider_id=provider_id,
+ )
Follow-up in retrieve_response below to build the provider/model identifier for get_agent.
453-461: Always send provider/model to get_agent by constructing it locally.
After passing raw model_id into retrieve_response, build llama_stack_model_id here. Apply:
- agent, conversation_id, session_id = await get_agent(
-     client,
-     model_id,
+ llama_stack_model_id = f"{provider_id}/{model_id}" if provider_id else model_id
+ agent, conversation_id, session_id = await get_agent(
+     client,
+     llama_stack_model_id,
      system_prompt,
      available_input_shields,
      available_output_shields,
      query_request.conversation_id,
      query_request.no_tools or False,
  )
🧹 Nitpick comments (9)
src/metrics/utils.py (1)
62-64: Minor: avoid repeatedly constructing ChatFormat. Cache a single formatter instance.
+from functools import lru_cache
+
+@lru_cache(maxsize=1)
+def _get_formatter() -> ChatFormat:
+    return ChatFormat(Tokenizer.get_instance())
@@
- tokenizer = Tokenizer.get_instance()
- formatter = ChatFormat(tokenizer)
+ formatter = _get_formatter()
src/metrics/__init__.py (1)
45-55: Clean up stale TODOs now that token metrics exist. Remove or update the LCORE-411 TODO comments to prevent confusion.
tests/unit/app/endpoints/test_query.py (1)
454-455: Deduplicate metrics patching with an autouse fixture. Replace repeated mock_metrics(...) calls with one autouse fixture to reduce noise.
Example (place at top of this file or in conftest.py):
import pytest


@pytest.fixture(autouse=True)
def _mock_llm_token_metrics(mocker):
    mocker.patch(
        "app.endpoints.query.metrics.update_llm_token_count_from_turn",
        return_value=None,
    )

Also applies to: 486-487, 519-520, 559-559, 609-609, 661-661, 773-773, 829-829, 884-884, 955-955, 1017-1017, 1113-1113, 1351-1351, 1402-1402
tests/unit/metrics/test_utis.py (4)
1-1: Typo in filename: rename to test_utils.py for consistency and discoverability.
Tests targeting metrics/utils should live under a correctly spelled filename to avoid future confusion. Apply this git mv:
- tests/unit/metrics/test_utis.py
+ tests/unit/metrics/test_utils.py
6-17: Simplify mocking of Async client construction.
You patch get_client twice and rebind mock_client. Create a single AsyncMock and patch once to reduce brittleness. Apply:
- mock_client = mocker.patch(
-     "client.AsyncLlamaStackClientHolder.get_client"
- ).return_value
- # Make sure the client is an AsyncMock for async methods
- mock_client = mocker.AsyncMock()
- mocker.patch(
-     "client.AsyncLlamaStackClientHolder.get_client", return_value=mock_client
- )
+ mock_client = mocker.AsyncMock()
+ mocker.patch("client.AsyncLlamaStackClientHolder.get_client", return_value=mock_client)
65-76: Make call assertions resilient by checking chained calls on child mocks explicitly.
While assert_has_calls on the parent may pass via mock_calls, asserting on children is clearer and avoids false positives if internal chaining changes. Example:
- mock_metric.assert_has_calls(
-     [
-         mocker.call.labels("test_provider-0", "test_model-0"),
-         mocker.call.labels().set(0),
-         mocker.call.labels("default_provider", "default_model"),
-         mocker.call.labels().set(1),
-         mocker.call.labels("test_provider-1", "test_model-1"),
-         mocker.call.labels().set(0),
-     ],
-     any_order=False,  # Order matters here
- )
+ mock_metric.labels.assert_has_calls(
+     [
+         mocker.call("test_provider-0", "test_model-0"),
+         mocker.call("default_provider", "default_model"),
+         mocker.call("test_provider-1", "test_model-1"),
+     ]
+ )
+ mock_metric.labels.return_value.set.assert_has_calls(
+     [mocker.call(0), mocker.call(1), mocker.call(0)]
+ )
79-124: Great targeted test for token metrics; add minimal assertions on formatter usage.
Validates both sent/received counters. Add light checks to ensure encode_dialog_prompt is called twice with expected shapes. Optional add:
+ assert mock_formatter.encode_dialog_prompt.call_count == 2
+ out_call, in_call = mock_formatter.encode_dialog_prompt.call_args_list
+ assert isinstance(out_call.args[0], list) and len(out_call.args[0]) == 1
+ assert isinstance(in_call.args[0], list) and len(in_call.args[0]) >= 1
src/app/endpoints/query.py (2)
331-349: Validation may mismatch on identifier shape; compare against both forms.
ModelListResponse.identifier might be either "model" or "provider/model" depending on upstream. Current check requires identifier == provider/model, which can falsely reject. Consider accepting either. Suggested change:
- llama_stack_model_id = f"{provider_id}/{model_id}"
+ llama_stack_model_id = f"{provider_id}/{model_id}"
  # Validate that the model_id and provider_id are in the available models
  logger.debug("Searching for model: %s, provider: %s", model_id, provider_id)
- if not any(
-     m.identifier == llama_stack_model_id and m.provider_id == provider_id
-     for m in models
- ):
+ if not any(
+     (m.identifier == llama_stack_model_id)
+     or (m.identifier == model_id and m.provider_id == provider_id)
+     for m in models
+ ):
Also, in the “first available LLM” branch, return a consistent first element:
- model_id = model.identifier
- provider_id = model.provider_id
- logger.info("Selected model: %s", model)
- return model_id, model_id, provider_id
+ model_id = model.identifier
+ provider_id = model.provider_id
+ llama_stack_model_id = f"{provider_id}/{model_id}"
+ logger.info("Selected model: %s", llama_stack_model_id)
+ return llama_stack_model_id, model_id, provider_id
225-236: Consider incrementing llm_calls_total only after a successful agent turn.
You count success before create_turn; failures will also increment failures_total, producing double counting. Increment success after a successful response. Proposed move:
- # Update metrics for the LLM call
- metrics.llm_calls_total.labels(provider_id, model_id).inc()
  ...
- response = await agent.create_turn(
+ response = await agent.create_turn(
      ...
  )
+ # Update metrics for the successful LLM call
+ metrics.llm_calls_total.labels(provider_id, model_id).inc()
Please confirm your intended semantics (attempt vs. success) and adjust dashboards accordingly.
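If it helps, here is a sketch of that success-only ordering as a standalone wrapper; the function name and parameters are hypothetical, and only the counter name comes from this PR's metrics module.

import metrics


async def create_turn_counted(agent, provider_id: str, model_id: str, **turn_kwargs):
    """Increment llm_calls_total only after create_turn returns without raising."""
    response = await agent.create_turn(**turn_kwargs)
    # Reached only on success, so the counter reflects completed turns, not attempts.
    metrics.llm_calls_total.labels(provider_id, model_id).inc()
    return response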
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- src/app/endpoints/query.py (4 hunks)
- src/app/endpoints/streaming_query.py (1 hunks)
- src/metrics/__init__.py (1 hunks)
- src/metrics/utils.py (2 hunks)
- tests/unit/app/endpoints/test_query.py (16 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (2 hunks)
- tests/unit/metrics/test_utis.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/app/endpoints/streaming_query.py (2)
src/utils/endpoints.py (1)
get_system_prompt(53-84)
src/metrics/utils.py (1)
update_llm_token_count_from_turn(58-75)
tests/unit/app/endpoints/test_streaming_query.py (1)
tests/unit/app/endpoints/test_query.py (1)
mock_metrics(49-54)
tests/unit/app/endpoints/test_query.py (1)
tests/unit/app/endpoints/test_streaming_query.py (1)
mock_metrics(61-66)
src/metrics/__init__.py (1)
src/metrics/utils.py (1)
update_llm_token_count_from_turn(58-75)
tests/unit/metrics/test_utis.py (1)
src/metrics/utils.py (2)
setup_model_metrics(18-55)
update_llm_token_count_from_turn(58-75)
src/app/endpoints/query.py (1)
src/metrics/utils.py (1)
update_llm_token_count_from_turn(58-75)
🪛 GitHub Actions: Ruff
src/metrics/__init__.py
[error] 9-9: Ruff: F401 'update_llm_token_count_from_turn' imported but unused. Remove the import or re-export the symbol.
🪛 GitHub Actions: Python linter
src/metrics/utils.py
[error] 9-9: Pylint: C0411 - wrong-import-order: third-party import llama_stack_client.types.agents.turn.Turn should be placed before first-party imports (configuration.configuration, client.AsyncLlamaStackClientHolder, log.get_logger, metrics, utils.common.run_once_async). [Command: uv run pylint src tests]
[error] 10-10: Pylint: C0411 - wrong-import-order: third-party import llama_stack.models.llama.llama3.tokenizer.Tokenizer should be placed before first-party imports (configuration.configuration, client.AsyncLlamaStackClientHolder, log.get_logger, metrics, utils.common.run_once_async). [Command: uv run pylint src tests]
[error] 11-11: Pylint: C0411 - wrong-import-order: third-party import llama_stack.models.llama.llama3.chat_format.ChatFormat should be placed before first-party imports (configuration.configuration, client.AsyncLlamaStackClientHolder, log.get_logger, metrics, utils.common.run_once_async). [Command: uv run pylint src tests]
[error] 12-12: Pylint: C0411 - wrong-import-order: third-party import llama_stack.models.llama.datatypes.RawMessage should be placed before first-party imports (configuration.configuration, client.AsyncLlamaStackClientHolder, log.get_logger, metrics, utils.common.run_once_async). [Command: uv run pylint src tests]
src/app/endpoints/query.py
[error] 391-391: Pylint: R0913: Too many arguments (6/5) (too-many-arguments). [Command: uv run pylint src tests]
[error] 391-391: Pylint: R0917: Too many positional arguments (6/5) (too-many-positional-arguments). [Command: uv run pylint src tests]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (3)
src/metrics/utils.py (1)
62-64: Plumb model-specific tokenizers or defer to provider/Llama Stack token-count API
Current use of Tokenizer.get_instance() likely defaults to a Llama3 tokenizer and may miscount tokens for other providers. Plumb a tokenizer per (provider, model) or defer to the provider’s token-count API to ensure accurate billing metrics.
tests/unit/app/endpoints/test_streaming_query.py (1)
61-67: LGTM: metrics side effects are properly patched for streaming tests. Keeps tests focused on endpoint behavior.
tests/unit/app/endpoints/test_query.py (1)
49-55: LGTM: central helper to patch metrics for query tests. Consistent with streaming tests.
src/metrics/__init__.py
Outdated
    Histogram,
)

from .utils import update_llm_token_count_from_turn
Fix Ruff F401: mark re-export via __all__.
Declare __all__ so the import is considered an intentional re-export.
-from .utils import update_llm_token_count_from_turn
+from .utils import update_llm_token_count_from_turn
+
+# Explicit re-exports
+__all__ = (
+ "update_llm_token_count_from_turn",
+ "rest_api_calls_total",
+ "response_duration_seconds",
+ "provider_model_configuration",
+ "llm_calls_total",
+ "llm_calls_failures_total",
+ "llm_calls_validation_errors_total",
+ "llm_token_sent_total",
+ "llm_token_received_total",
+)
🧰 Tools
🪛 GitHub Actions: Ruff
[error] 9-9: Ruff: F401 'update_llm_token_count_from_turn' imported but unused. Remove the import or re-export the symbol.
🤖 Prompt for AI Agents
In src/metrics/__init__.py around line 9, the module imports
update_llm_token_count_from_turn but Ruff flags F401 because it looks like an
unused import; declare an explicit __all__ list that includes
"update_llm_token_count_from_turn" so the import is considered an intentional
public re-export. Add or update the __all__ variable to contain that symbol (and
other public names if needed) so linters recognize the re-export.
def update_llm_token_count_from_turn(
    turn: Turn, model: str, provider: str, system_prompt: str = ""
) -> None:
    """Update the LLM calls metrics from a turn."""
    tokenizer = Tokenizer.get_instance()
    formatter = ChatFormat(tokenizer)

    raw_message = cast(RawMessage, turn.output_message)
    encoded_output = formatter.encode_dialog_prompt([raw_message])
    token_count = len(encoded_output.tokens) if encoded_output.tokens else 0
    metrics.llm_token_received_total.labels(provider, model).inc(token_count)

    input_messages = [RawMessage(role="user", content=system_prompt)] + cast(
        list[RawMessage], turn.input_messages
    )
    encoded_input = formatter.encode_dialog_prompt(input_messages)
    token_count = len(encoded_input.tokens) if encoded_input.tokens else 0
    metrics.llm_token_sent_total.labels(provider, model).inc(token_count)
🛠️ Refactor suggestion
Runtime type mismatch and incorrect system-prompt role/content in token counting.
- Casting client messages to RawMessage does not convert shapes; ChatFormat.encode_dialog_prompt will likely raise at runtime.
- The system prompt should use role="system" and content must be content items, not a bare string.
Apply this refactor to safely convert messages and avoid crashes.
-def update_llm_token_count_from_turn(
- turn: Turn, model: str, provider: str, system_prompt: str = ""
-) -> None:
- """Update the LLM calls metrics from a turn."""
- tokenizer = Tokenizer.get_instance()
- formatter = ChatFormat(tokenizer)
-
- raw_message = cast(RawMessage, turn.output_message)
- encoded_output = formatter.encode_dialog_prompt([raw_message])
- token_count = len(encoded_output.tokens) if encoded_output.tokens else 0
- metrics.llm_token_received_total.labels(provider, model).inc(token_count)
-
- input_messages = [RawMessage(role="user", content=system_prompt)] + cast(
- list[RawMessage], turn.input_messages
- )
- encoded_input = formatter.encode_dialog_prompt(input_messages)
- token_count = len(encoded_input.tokens) if encoded_input.tokens else 0
- metrics.llm_token_sent_total.labels(provider, model).inc(token_count)
+def update_llm_token_count_from_turn(
+ turn: Turn, model: str, provider: str, system_prompt: str = ""
+) -> None:
+ """Update token metrics for a completed turn."""
+ tokenizer = Tokenizer.get_instance()
+ formatter = ChatFormat(tokenizer)
+
+ # ----- Output tokens (assistant) -----
+ output_tokens = 0
+ if getattr(turn, "output_message", None) is not None:
+ om = turn.output_message # type: ignore[attr-defined]
+ if isinstance(om.content, str):
+ om_content = [TextContentItem(text=om.content, type="text")]
+ else:
+ om_content = om.content
+ encoded_output = formatter.encode_dialog_prompt(
+ [RawMessage(role=getattr(om, "role", "assistant"), content=om_content)]
+ )
+ output_tokens = len(encoded_output.tokens) if encoded_output.tokens else 0
+ metrics.llm_token_received_total.labels(provider, model).inc(output_tokens)
+
+ # ----- Input tokens (system + user/tool messages) -----
+ input_raw: list[RawMessage] = []
+ if system_prompt:
+ input_raw.append(
+ RawMessage(
+ role="system",
+ content=[TextContentItem(text=system_prompt, type="text")],
+ )
+ )
+ for m in cast(list, getattr(turn, "input_messages", [])):
+ msg_content = m.content
+ if isinstance(msg_content, str):
+ content_items = [TextContentItem(text=msg_content, type="text")]
+ else:
+ content_items = msg_content
+ input_raw.append(RawMessage(role=m.role, content=content_items))
+
+ input_tokens = 0
+ if input_raw:
+ encoded_input = formatter.encode_dialog_prompt(input_raw)
+ input_tokens = len(encoded_input.tokens) if encoded_input.tokens else 0
+ metrics.llm_token_sent_total.labels(provider, model).inc(input_tokens)
🤖 Prompt for AI Agents
In src/metrics/utils.py around lines 58-75, the current code unsafely casts
messages to RawMessage and passes a bare string as the system prompt, causing
runtime shape mismatches for ChatFormat.encode_dialog_prompt; instead,
explicitly build properly-shaped RawMessage objects for encoding: map
turn.input_messages and turn.output_message into RawMessage instances with the
correct role and a content list (do not use direct cast), and construct the
system prompt as a RawMessage with role="system" and content expressed as the
required content-item list (wrap the string into the proper content-item
structure expected by ChatFormat) before calling formatter.encode_dialog_prompt
so token counting won’t raise at runtime.
Signed-off-by: Haoyu Sun <hasun@redhat.com>
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/app/endpoints/query.py (1)
392-400: Resolved: trailing params now keyword-only.
This addresses the earlier R0913/R0917 concern.
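For readers who did not follow the earlier thread, a sketch of the keyword-only layout being referred to; the parameter names come from the call site shown in this review, while the annotations and defaults are assumptions.

async def retrieve_response(
    client,
    model_id: str,
    query_request,
    token: str,
    *,  # parameters after this marker can only be passed by keyword
    mcp_headers: dict | None = None,
    provider_id: str | None = None,
):
    ...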
🧹 Nitpick comments (2)
src/app/endpoints/query.py (2)
416-421: Docstring clarity: specify model_id format.
Model here is the combined provider/model used by Llama Stack; make that explicit to avoid confusion with the bare model label used in metrics.
@@
- Parameters:
-     model_id (str): The identifier of the LLM model to use.
-     provider_id (str): The identifier of the LLM provider to use.
+ Parameters:
+     model_id (str): Llama Stack model identifier in "provider/model" form (e.g., "openai/gpt-4.1").
+     provider_id (str): Provider identifier (e.g., "openai"). Keyword-only.
518-521: Keep label normalization; guard metrics to be non-fatal.
Nice job normalizing the model label. Wrap the metrics call so tokenizer/formatter issues never fail the endpoint.
- # Update token count metrics for the LLM call
- model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
- update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
+ # Update token count metrics for the LLM call (best-effort)
+ try:
+     model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
+     update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
+ except Exception as e:  # noqa: BLE001
+     logger.warning("Token metrics update failed: %s", e, exc_info=True)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- src/app/endpoints/query.py (5 hunks)
- src/app/endpoints/streaming_query.py (2 hunks)
- src/metrics/utils.py (2 hunks)
- tests/unit/app/endpoints/test_query.py (16 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (2 hunks)
- tests/unit/metrics/test_utis.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/metrics/test_utis.py
- src/app/endpoints/streaming_query.py
- tests/unit/app/endpoints/test_query.py
- src/metrics/utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/query.py (2)
src/metrics/utils.py (1)
update_llm_token_count_from_turn(60-77)
src/models/requests.py (1)
QueryRequest(72-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (2)
src/app/endpoints/query.py (2)
27-27: LGTM: metrics utility import is correct.
Direct import keeps call sites explicit and avoids circular deps.
216-223: Good: provider_id plumbed and passed keyword-only.
Call site matches the updated retrieve_response signature.
E2E test failure is due to a timeout (10s) on querying.
E2E test passed again.
tisnik
left a comment
LGTM on my side, thank you
umago
left a comment
That looks good! Thank u
Description
Record metrics for tokens sent to the LLM provider and for tokens received from it.
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Call the query or streaming_query endpoint and check the metrics llm_token_received_total and llm_token_sent_total (a hedged smoke-test sketch is included at the end of this page).
Summary by CodeRabbit
Chores
Tests
Note
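Referring back to the Testing section above, a hypothetical smoke-test sketch; the base URL, port, request payload, and the /v1/query and /metrics paths are assumptions to adjust for the actual deployment.

import requests

BASE = "http://localhost:8080"  # assumed local deployment

# Trigger at least one LLM turn so the token counters move.
requests.post(f"{BASE}/v1/query", json={"query": "Hello"}, timeout=60)

# Scrape the Prometheus endpoint and print the counters added in this PR.
for line in requests.get(f"{BASE}/metrics", timeout=10).text.splitlines():
    if line.startswith(("llm_token_sent_total", "llm_token_received_total")):
        print(line)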