Skip to content

Conversation

@dinmukhamedm
Copy link
Member

@dinmukhamedm dinmukhamedm commented Sep 29, 2025

Important

Enhances Gemini model response handling by adding support for reasoning tokens and content, updating version, and adding tests.

  • Behavior:
    • Adds handling for reasoning_tokens in _process_response_usage() in litellm/__init__.py and _set_response_attributes() in google_genai/__init__.py.
    • Updates _process_response_choices() in litellm/__init__.py to handle reasoning_content by appending it to the content list.
    • Adjusts token counting logic to include thoughts_token_count in google_genai/__init__.py.
  • Version:
    • Bumps version to 0.7.16 in pyproject.toml and version.py.
  • Tests:
    • Adds tests for reasoning tokens and content handling in test_google_genai.py and test_litellm_gemini.py.
    • Introduces new VCR cassettes for various test scenarios in tests/cassettes/.

This description was created by Ellipsis for 51360fe. You can customize this summary. It will automatically update as commits are pushed.


Note

Capture Gemini reasoning tokens and reasoning_content in spans, include Gemini thoughts tokens in output token count, add tests/cassettes, and bump version to 0.7.16.

  • Instrumentation:
    • LiteLLM (src/lmnr/opentelemetry_lib/litellm/__init__.py):
      • Set gen_ai.usage.reasoning_tokens from completion_tokens_details.reasoning_tokens.
      • Append reasoning_content to gen_ai.completion.*.content (as serialized list alongside text/content).
    • Google GenAI (src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py):
      • Include thoughts_token_count in gen_ai.usage.output_tokens (sum of candidates_token_count + thoughts_token_count).
      • Set llm.usage.reasoning_tokens from thoughts_token_count.
  • Tests:
    • Add reasoning token/content tests for Google GenAI and LiteLLM (sync/async/streaming) in tests/test_google_genai.py and new tests/test_litellm_gemini.py with corresponding VCR cassettes under tests/cassettes/.
    • Update tests/conftest.py to filter key query parameter.
  • Version:
    • Bump to 0.7.16 in pyproject.toml and src/lmnr/version.py.

Written by Cursor Bugbot for commit 51360fe. This will update automatically on new commits. Configure here.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to 51360fe in 2 minutes and 1 seconds. Click for details.
  • Reviewed 1344 lines of code in 15 files
  • Skipped 0 files when reviewing.
  • Skipped posting 13 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. pyproject.toml:9
  • Draft comment:
    Version bump to 0.7.16 is correct.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. src/lmnr/opentelemetry_lib/litellm/__init__.py:408
  • Draft comment:
    New logic extracts reasoning_tokens from completion_tokens_details. Verify that model_as_dict handles unexpected input.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to verify that a function handles unexpected input, which is against the rules. It doesn't provide a specific suggestion or point out a specific issue with the code.
3. src/lmnr/opentelemetry_lib/litellm/__init__.py:478
  • Draft comment:
    New processing of reasoning_content in _process_response_choices merges reasoning and main content. Confirm that the output order meets requirements.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to confirm that the output order meets requirements, which is against the rules. It does not provide a specific code suggestion or ask for a specific test to be written. Therefore, it should be removed.
4. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py:280
  • Draft comment:
    Output token count now sums candidates and thoughts tokens. Consider refactoring this calculation for clarity with explicit None checks.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. tests/cassettes/test_google_genai_reasoning_tokens.yaml:1
  • Draft comment:
    Cassette test added for reasoning tokens. Ensure cassettes remain updated with API changes.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. tests/test_litellm_gemini.py:37
  • Draft comment:
    Tests use sleep and Laminar.flush() to wait for spans. Consider if a more robust async synchronization can be applied in the future.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
7. tests/test_google_genai.py:61
  • Draft comment:
    Test assertions for prompt and completion content look comprehensive. Ensure JSON parsing remains consistent if API response format changes.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
8. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py:276
  • Draft comment:
    The comment refers to 'thinking tokens' while the variable is named 'thoughts_token_count'. Please confirm if this discrepancy is intentional or if the comment should be updated for consistency.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is purely about documentation/terminology consistency and doesn't suggest any actual code changes are needed. The terms are close enough to be clearly referring to the same concept. The comment is asking for confirmation rather than pointing out a clear issue. This falls under the rule about not asking authors to confirm their intentions. The terms are slightly different, so there could be some value in maintaining consistent terminology throughout the codebase. Maybe this inconsistency could cause confusion for future maintainers? While consistency is good, this difference is minor enough that it doesn't impact code understanding. The terms are clearly synonymous in this context. More importantly, the comment violates our rule about not asking authors to confirm things. Delete this comment. It's asking for confirmation rather than pointing out a clear issue, and the terminology difference is too minor to be worth addressing.
9. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_async.yaml:21
  • Draft comment:
    There appears to be an extraneous backslash before the closing square bracket in the JSON string (line 21). Please remove it if unintentional.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
10. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_async.yaml:25
  • Draft comment:
    There appears to be an extraneous backslash before the closing curly bracket in the JSON string (line 25). Please verify and remove if it was added by mistake.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
11. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts.yaml:3
  • Draft comment:
    There is an unusual usage of consecutive single quotes in the text string ("How many times does the letter ''r'' appear in the word strawberry?"). Consider using a single quote (i.e., "letter 'r'") if that was the intent.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a test fixture file that captures real API interactions. The double quotes are likely there because that's how the actual API request was formatted. Changing the formatting could make the test fixture not match real behavior. Test fixtures often contain exact copies of real requests/responses, so we shouldn't modify their format unless absolutely necessary. Maybe the double quotes are actually causing issues with the API? Maybe this is a standardization issue that should be fixed across all test fixtures? Since this is a recorded API interaction that presumably works (test passes), we should preserve the exact format. If there was an actual issue, the test would fail. The comment should be deleted. We shouldn't modify the format of recorded API interactions in test fixtures unless there's a clear problem.
12. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts_async.yaml:36
  • Draft comment:
    There appears to be an unexpected backslash before the closing curly brace. Verify if this is intentional or if it should be removed to avoid lexicographical issues.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
13. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts_async.yaml:40
  • Draft comment:
    An unexpected backslash is present before the "responseId" field. Please double-check if this extra character is needed.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 30% <= threshold 50% The comment is asking the PR author to double-check if an extra character is needed, which violates the rule against asking the author to confirm or double-check things. However, it does point out a potential issue with an unexpected character, which could be useful if rephrased to suggest a correction or ask if it was intentional.

Workflow ID: wflow_seC8cJmR4Qw8bfSL

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@skull8888888
Copy link
Member

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!


@dinmukhamedm dinmukhamedm merged commit c79dd67 into main Sep 29, 2025
7 checks passed
@dinmukhamedm dinmukhamedm deleted the gemini-thinking branch September 29, 2025 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants