0.7.16 Fixes around gemini reasoning response and tokens #196
Conversation
Important
Looks good to me! 👍
Reviewed everything up to 51360fe in 2 minutes and 1 second. Click for details.
- Reviewed 1344 lines of code in 15 files
- Skipped 0 files when reviewing
- Skipped posting 13 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. pyproject.toml:9
- Draft comment:
Version bump to 0.7.16 is correct. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%.
2. src/lmnr/opentelemetry_lib/litellm/__init__.py:408
- Draft comment:
New logic extracts reasoning_tokens from completion_tokens_details. Verify that model_as_dict handles unexpected input. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify that a function handles unexpected input, which is against the rules. It doesn't provide a specific suggestion or point out a specific issue with the code.
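For context on what the draft comment is gesturing at, a defensive extraction could look like the following minimal sketch. This is illustrative only, assuming a dict-or-object usage payload; the helper name and shape are not the instrumentation's actual code:

```python
# Hypothetical sketch (not the PR's actual code): defensively reading
# reasoning_tokens from a usage payload whose shape may vary.
def extract_reasoning_tokens(usage):
    """Return the reasoning token count, or None if absent or malformed."""
    if not isinstance(usage, dict):
        # Fall back to the object's attribute dict, akin to a
        # model_as_dict-style normalization step.
        usage = getattr(usage, "__dict__", None) or {}
    details = usage.get("completion_tokens_details")
    if not isinstance(details, dict):
        details = getattr(details, "__dict__", None) or {}
    tokens = details.get("reasoning_tokens")
    return tokens if isinstance(tokens, int) else None
```

Normalizing to a dict before each lookup is what lets the helper tolerate `None`, missing keys, or object-shaped payloads without raising.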
3. src/lmnr/opentelemetry_lib/litellm/__init__.py:478
- Draft comment:
New processing of reasoning_content in _process_response_choices merges reasoning and main content. Confirm that the output order meets requirements. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to confirm that the output order meets requirements, which is against the rules. It does not provide a specific code suggestion or ask for a specific test to be written. Therefore, it should be removed.
4. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py:280
- Draft comment:
Output token count now sums candidates and thoughts tokens. Consider refactoring this calculation for clarity with explicit None checks. - Reason this comment was not posted:
Comment looked like it was already resolved.
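The refactoring that draft comment 4 suggests could look like this minimal sketch; the standalone helper and its name are illustrative assumptions, not the instrumentation's actual code:

```python
# Illustrative sketch of the output-token sum with explicit None checks,
# mirroring the candidates_token_count / thoughts_token_count fields of
# the google-genai usage metadata.
def total_output_tokens(candidates_token_count, thoughts_token_count):
    """Sum candidate and thought tokens, treating None as zero."""
    total = 0
    if candidates_token_count is not None:
        total += candidates_token_count
    if thoughts_token_count is not None:
        total += thoughts_token_count
    return total
```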
5. tests/cassettes/test_google_genai_reasoning_tokens.yaml:1
- Draft comment:
Cassette test added for reasoning tokens. Ensure cassettes remain updated with API changes. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%.
6. tests/test_litellm_gemini.py:37
- Draft comment:
Tests use sleep and Laminar.flush() to wait for spans. Consider if a more robust async synchronization can be applied in the future. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%.
7. tests/test_google_genai.py:61
- Draft comment:
Test assertions for prompt and completion content look comprehensive. Ensure JSON parsing remains consistent if API response format changes. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%.
8. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py:276
- Draft comment:
The comment refers to 'thinking tokens' while the variable is named 'thoughts_token_count'. Please confirm if this discrepancy is intentional or if the comment should be updated for consistency. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This comment is purely about documentation/terminology consistency and doesn't suggest any actual code changes are needed. The terms are close enough to be clearly referring to the same concept. The comment is asking for confirmation rather than pointing out a clear issue. This falls under the rule about not asking authors to confirm their intentions. The terms are slightly different, so there could be some value in maintaining consistent terminology throughout the codebase. Maybe this inconsistency could cause confusion for future maintainers? While consistency is good, this difference is minor enough that it doesn't impact code understanding. The terms are clearly synonymous in this context. More importantly, the comment violates our rule about not asking authors to confirm things. Delete this comment. It's asking for confirmation rather than pointing out a clear issue, and the terminology difference is too minor to be worth addressing.
9. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_async.yaml:21
- Draft comment:
There appears to be an extraneous backslash before the closing square bracket in the JSON string (line 21). Please remove it if unintentional. - Reason this comment was not posted:
Comment looked like it was already resolved.
10. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_async.yaml:25
- Draft comment:
There appears to be an extraneous backslash before the closing curly bracket in the JSON string (line 25). Please verify and remove if it was added by mistake. - Reason this comment was not posted:
Comment looked like it was already resolved.
11. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts.yaml:3
- Draft comment:
There is an unusual usage of consecutive single quotes in the text string ("How many times does the letter ''r'' appear in the word strawberry?"). Consider using a single quote (i.e., "letter 'r'") if that was the intent. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This is a test fixture file that captures real API interactions. The double quotes are likely there because that's how the actual API request was formatted. Changing the formatting could make the test fixture not match real behavior. Test fixtures often contain exact copies of real requests/responses, so we shouldn't modify their format unless absolutely necessary. Maybe the double quotes are actually causing issues with the API? Maybe this is a standardization issue that should be fixed across all test fixtures? Since this is a recorded API interaction that presumably works (test passes), we should preserve the exact format. If there was an actual issue, the test would fail. The comment should be deleted. We shouldn't modify the format of recorded API interactions in test fixtures unless there's a clear problem.
12. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts_async.yaml:36
- Draft comment:
There appears to be an unexpected backslash before the closing curly brace. Verify if this is intentional or if it should be removed to avoid lexicographical issues. - Reason this comment was not posted:
Comment looked like it was already resolved.
13. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts_async.yaml:40
- Draft comment:
An unexpected backslash is present before the "responseId" field. Please double-check if this extra character is needed. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 30% <= threshold 50%. The comment is asking the PR author to double-check if an extra character is needed, which violates the rule against asking the author to confirm or double-check things. However, it does point out a potential issue with an unexpected character, which could be useful if rephrased to suggest a correction or ask if it was intentional.
Workflow ID: wflow_seC8cJmR4Qw8bfSL
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
bugbot run
Important
Enhances Gemini model response handling by adding support for reasoning tokens and content, updating version, and adding tests.
- Adds reasoning_tokens in _process_response_usage() in litellm/__init__.py and _set_response_attributes() in google_genai/__init__.py.
- Updates _process_response_choices() in litellm/__init__.py to handle reasoning_content by appending it to the content list.
- Adds thoughts_token_count handling in google_genai/__init__.py.
- Bumps the version to 0.7.16 in pyproject.toml and version.py.
- Adds tests in test_google_genai.py and test_litellm_gemini.py, with cassettes under tests/cassettes/.

This description was created automatically for 51360fe. You can customize this summary. It will automatically update as commits are pushed.
Note
Capture Gemini reasoning tokens and reasoning_content in spans, include Gemini thoughts tokens in output token count, add tests/cassettes, and bump version to 0.7.16.
- src/lmnr/opentelemetry_lib/litellm/__init__.py: sets gen_ai.usage.reasoning_tokens from completion_tokens_details.reasoning_tokens; appends reasoning_content to gen_ai.completion.*.content (as a serialized list alongside text/content).
- src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py: includes thoughts_token_count in gen_ai.usage.output_tokens (sum of candidates_token_count + thoughts_token_count); sets llm.usage.reasoning_tokens from thoughts_token_count.
- Tests: updates tests/test_google_genai.py and adds new tests/test_litellm_gemini.py, with corresponding VCR cassettes under tests/cassettes/; updates tests/conftest.py to filter the key query parameter.
- Version: 0.7.16 in pyproject.toml and src/lmnr/version.py.

Written by Cursor Bugbot for commit 51360fe. This will update automatically on new commits. Configure here.
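The reasoning_content handling described in the notes could be sketched roughly as follows. The message shape and helper name are assumptions for illustration, not the PR's actual code:

```python
import json

# Hypothetical sketch: when a choice's message carries reasoning_content,
# serialize reasoning and text together as a content list; otherwise
# return the plain text content unchanged.
def merge_reasoning_into_content(message):
    """Serialize reasoning_content alongside text content, if present."""
    reasoning = message.get("reasoning_content")
    text = message.get("content")
    if not reasoning:
        return text
    return json.dumps([
        {"type": "reasoning", "reasoning": reasoning},
        {"type": "text", "text": text},
    ])
```

Emitting the pair as one serialized list keeps the reasoning adjacent to the answer in a single span attribute rather than requiring a separate attribute per part.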