0.7.16 Fixes around gemini reasoning response and tokens #196
Conversation
Important
Looks good to me! 👍
Reviewed everything up to 51360fe in 2 minutes and 1 second. Click for details.
- Reviewed 1344 lines of code in 15 files
- Skipped 0 files when reviewing
- Skipped posting 13 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. pyproject.toml:9
- Draft comment:
Version bump to 0.7.16 is correct. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%.
2. src/lmnr/opentelemetry_lib/litellm/__init__.py:408
- Draft comment:
New logic extracts reasoning_tokens from completion_tokens_details. Verify that model_as_dict handles unexpected input. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify that a function handles unexpected input, which is against the rules. It doesn't provide a specific suggestion or point out a specific issue with the code.
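For context on what the draft comment is gesturing at, a defensive extraction could look like the following minimal sketch. This is illustrative only, assuming a dict-or-object usage payload; the helper name and shape are not the instrumentation's actual code:

```python
# Hypothetical sketch (not the PR's actual code): defensively reading
# reasoning_tokens from a usage payload whose shape may vary.
def extract_reasoning_tokens(usage):
    """Return the reasoning token count, or None if absent or malformed."""
    if not isinstance(usage, dict):
        # Fall back to the object's attribute dict, akin to a
        # model_as_dict-style normalization step.
        usage = getattr(usage, "__dict__", None) or {}
    details = usage.get("completion_tokens_details")
    if not isinstance(details, dict):
        details = getattr(details, "__dict__", None) or {}
    tokens = details.get("reasoning_tokens")
    return tokens if isinstance(tokens, int) else None
```

Normalizing to a dict before each lookup is what lets the helper tolerate `None`, missing keys, or object-shaped payloads without raising.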
3. src/lmnr/opentelemetry_lib/litellm/__init__.py:478
- Draft comment:
New processing of reasoning_content in _process_response_choices merges reasoning and main content. Confirm that the output order meets requirements. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to confirm that the output order meets requirements, which is against the rules. It does not provide a specific code suggestion or ask for a specific test to be written. Therefore, it should be removed.
4. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py:280
- Draft comment:
Output token count now sums candidates and thoughts tokens. Consider refactoring this calculation for clarity with explicit None checks. - Reason this comment was not posted:
Comment looked like it was already resolved.
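The refactoring that draft comment 4 suggests could look like this minimal sketch; the standalone helper and its name are illustrative assumptions, not the instrumentation's actual code:

```python
# Illustrative sketch of the output-token sum with explicit None checks,
# mirroring the candidates_token_count / thoughts_token_count fields of
# the google-genai usage metadata.
def total_output_tokens(candidates_token_count, thoughts_token_count):
    """Sum candidate and thought tokens, treating None as zero."""
    total = 0
    if candidates_token_count is not None:
        total += candidates_token_count
    if thoughts_token_count is not None:
        total += thoughts_token_count
    return total
```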
5. tests/cassettes/test_google_genai_reasoning_tokens.yaml:1
- Draft comment:
Cassette test added for reasoning tokens. Ensure cassettes remain updated with API changes. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%.
6. tests/test_litellm_gemini.py:37
- Draft comment:
Tests use sleep and Laminar.flush() to wait for spans. Consider if a more robust async synchronization can be applied in the future. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%.
7. tests/test_google_genai.py:61
- Draft comment:
Test assertions for prompt and completion content look comprehensive. Ensure JSON parsing remains consistent if API response format changes. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%.
8. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py:276
- Draft comment:
The comment refers to 'thinking tokens' while the variable is named 'thoughts_token_count'. Please confirm if this discrepancy is intentional or if the comment should be updated for consistency. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This comment is purely about documentation/terminology consistency and doesn't suggest any actual code changes are needed. The terms are close enough to be clearly referring to the same concept. The comment is asking for confirmation rather than pointing out a clear issue. This falls under the rule about not asking authors to confirm their intentions. The terms are slightly different, so there could be some value in maintaining consistent terminology throughout the codebase. Maybe this inconsistency could cause confusion for future maintainers? While consistency is good, this difference is minor enough that it doesn't impact code understanding. The terms are clearly synonymous in this context. More importantly, the comment violates our rule about not asking authors to confirm things. Delete this comment. It's asking for confirmation rather than pointing out a clear issue, and the terminology difference is too minor to be worth addressing.
9. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_async.yaml:21
- Draft comment:
There appears to be an extraneous backslash before the closing square bracket in the JSON string (line 21). Please remove it if unintentional. - Reason this comment was not posted:
Comment looked like it was already resolved.
10. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_async.yaml:25
- Draft comment:
There appears to be an extraneous backslash before the closing curly bracket in the JSON string (line 25). Please verify and remove if it was added by mistake. - Reason this comment was not posted:
Comment looked like it was already resolved.
11. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts.yaml:3
- Draft comment:
There is an unusual usage of consecutive single quotes in the text string ("How many times does the letter ''r'' appear in the word strawberry?"). Consider using a single quote (i.e., "letter 'r'") if that was the intent. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This is a test fixture file that captures real API interactions. The double quotes are likely there because that's how the actual API request was formatted. Changing the formatting could make the test fixture not match real behavior. Test fixtures often contain exact copies of real requests/responses, so we shouldn't modify their format unless absolutely necessary. Maybe the double quotes are actually causing issues with the API? Maybe this is a standardization issue that should be fixed across all test fixtures? Since this is a recorded API interaction that presumably works (test passes), we should preserve the exact format. If there was an actual issue, the test would fail. The comment should be deleted. We shouldn't modify the format of recorded API interactions in test fixtures unless there's a clear problem.
12. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts_async.yaml:36
- Draft comment:
There appears to be an unexpected backslash before the closing curly brace. Verify if this is intentional or if it should be removed to avoid lexicographical issues. - Reason this comment was not posted:
Comment looked like it was already resolved.
13. tests/cassettes/test_google_genai/test_google_genai_reasoning_tokens_with_include_thoughts_async.yaml:40
- Draft comment:
An unexpected backslash is present before the "responseId" field. Please double-check if this extra character is needed. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 30% <= threshold 50%. The comment is asking the PR author to double-check if an extra character is needed, which violates the rule against asking the author to confirm or double-check things. However, it does point out a potential issue with an unexpected character, which could be useful if rephrased to suggest a correction or ask if it was intentional.
Workflow ID: wflow_seC8cJmR4Qw8bfSL
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
bugbot run
Important
Enhances Gemini model response handling by adding support for reasoning tokens and content, updating version, and adding tests.
- Adds reasoning_tokens in _process_response_usage() in litellm/__init__.py and _set_response_attributes() in google_genai/__init__.py.
- Updates _process_response_choices() in litellm/__init__.py to handle reasoning_content by appending it to the content list.
- Adds thoughts_token_count handling in google_genai/__init__.py.
- Bumps the version to 0.7.16 in pyproject.toml and version.py.
- Adds tests in test_google_genai.py and test_litellm_gemini.py, with cassettes under tests/cassettes/.

This description was created automatically for 51360fe. You can customize this summary. It will automatically update as commits are pushed.
Note
Capture Gemini reasoning tokens and reasoning_content in spans, include Gemini thoughts tokens in output token count, add tests/cassettes, and bump version to 0.7.16.
- src/lmnr/opentelemetry_lib/litellm/__init__.py: sets gen_ai.usage.reasoning_tokens from completion_tokens_details.reasoning_tokens; appends reasoning_content to gen_ai.completion.*.content (as a serialized list alongside text/content).
- src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/google_genai/__init__.py: includes thoughts_token_count in gen_ai.usage.output_tokens (sum of candidates_token_count + thoughts_token_count); sets llm.usage.reasoning_tokens from thoughts_token_count.
- Tests: updates tests/test_google_genai.py and adds new tests/test_litellm_gemini.py, with corresponding VCR cassettes under tests/cassettes/; updates tests/conftest.py to filter the key query parameter.
- Version: 0.7.16 in pyproject.toml and src/lmnr/version.py.

Written by Cursor Bugbot for commit 51360fe. This will update automatically on new commits. Configure here.
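The reasoning_content handling described in the notes could be sketched roughly as follows. The message shape and helper name are assumptions for illustration, not the PR's actual code:

```python
import json

# Hypothetical sketch: when a choice's message carries reasoning_content,
# serialize reasoning and text together as a content list; otherwise
# return the plain text content unchanged.
def merge_reasoning_into_content(message):
    """Serialize reasoning_content alongside text content, if present."""
    reasoning = message.get("reasoning_content")
    text = message.get("content")
    if not reasoning:
        return text
    return json.dumps([
        {"type": "reasoning", "reasoning": reasoning},
        {"type": "text", "text": text},
    ])
```

Emitting the pair as one serialized list keeps the reasoning adjacent to the answer in a single span attribute rather than requiring a separate attribute per part.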