Feat/claude agent sdk #213

dinmukhamedm · 2025-11-16T23:22:46Z

Note

Adds Claude Agent SDK auto-instrumentation (with local proxy and span-context handoff), updates deps, and adds comprehensive tests; bumps version to 0.7.23a1.

Instrumentation:
- Introduces ClaudeAgentInstrumentor in opentelemetry/instrumentation/claude_agent/__init__.py with async, streaming, and module-level wrappers (ClaudeSDKClient.connect/query/receive_*, claude_agent_sdk.query, create_sdk_mcp_server).
- Wires initializer in tracing/_instrument_initializers.py and registers enum in tracing/instruments.py (Instruments.CLAUDE_AGENT).
Proxy integration:
- Adds opentelemetry/instrumentation/claude_agent/proxy.py to manage a local Claude proxy (start/stop, port selection, env var swap, span-context publication).
SDK:
- Fixes Laminar.use_span behavior when uninitialized (uses contextmanager correctly).
Dependencies:
- Adds extras claude-agent-sdk (via lmnr-claude-code-proxy) and dev dep claude-agent-sdk>=0.1.6 in pyproject.toml.
Versioning:
- Bumps version to 0.7.23a1 in pyproject.toml and src/lmnr/version.py.
Tests:
- Adds mock transport tests/.../mock_transport.py and proxy cleanup fixture.
- New tests for SDK client, module query, alias import, and tools (test_claude_sdk_client.py, test_query.py, test_query_with_alias.py, test_tool.py).
- Adjusts tests/test_initialize.py fixture to preserve/restore Laminar state.
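
The `Laminar.use_span` fix listed under SDK is not shown in this summary. As a rough illustration of the general pattern only (not the actual Laminar code), a function used in a `with` statement has to yield exactly once even when tracing is not initialized. The names `use_span_sketch`, `_initialized`, and `activated` below are hypothetical.

```python
from contextlib import contextmanager

_initialized = False  # hypothetical stand-in for the SDK's initialization flag
activated = []        # records which spans were "activated", for illustration only


@contextmanager
def use_span_sketch(span):
    """Yield `span`; only activate it when the SDK is initialized."""
    if not _initialized:
        # Uninitialized: act as a no-op context manager that still yields once.
        yield span
        return
    activated.append(span)
    try:
        yield span
    finally:
        activated.remove(span)


with use_span_sketch("my-span") as s:
    result = s
```

With the flag left at `False`, the `with` block runs normally and nothing is activated.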

Written by Cursor Bugbot for commit b7d751d. This will update automatically on new commits.

Important

Add Claude Agent SDK with auto-instrumentation, proxy management, and tests, updating dependencies and version.

Instrumentation:
- Add ClaudeAgentInstrumentor in __init__.py for claude_agent_sdk with span input/output and streaming support.
- Implement proxy management in proxy.py (start/stop server, env var routing, trace propagation).
- Register instrumentor via ClaudeAgentInstrumentorInitializer in _instrument_initializers.py and instruments.py.
Tests:
- Add tests for client flow, module query, and tools in test_claude_sdk_client.py, test_query.py, test_query_with_alias.py, and test_tool.py.
- Provide MockClaudeTransport for simulated I/O in mock_transport.py.
Dependencies & Version:
- Add extras claude-agent-sdk and lmnr-claude-code-proxy; bump version to 0.7.23a1 in pyproject.toml and version.py.
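
The summary mentions port selection for the local proxy but does not show how `proxy.py` does it. One common approach, sketched here under that assumption, is to bind to port 0 and let the OS pick a free port. `find_free_port` is a hypothetical name, not a function from this PR.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (host, port) for AF_INET sockets.
        return sock.getsockname()[1]


port = find_free_port()
```

The port is released when the socket closes, so another process could claim it before the proxy binds. A real implementation may need a retry loop around server startup.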

This description was generated for b7d751d. It will automatically update as commits are pushed.

…on (#209)
* wip: initial version to support ClaudeSDKClient.query() instrumentation
* add async generator wrapping for capturing output
* chore test
* chore is_streaming field
* wip: fixing query() test
* fix wrapping standalone query() function
* fix async gen function span staying active during the next function call
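
The commit message above mentions wrapping async generators to capture output. A minimal sketch of that idea follows, assuming the wrapper re-yields each item and finalizes the span in a `finally` block. `FakeSpan` and `wrap_async_gen` are hypothetical names, and the sketch does not cover context activation, which the last commit item refers to.

```python
import asyncio


class FakeSpan:
    """Hypothetical stand-in for a tracing span."""

    def __init__(self):
        self.output = None
        self.ended = False

    def end(self):
        self.ended = True


async def wrap_async_gen(agen, span):
    """Re-yield items from `agen`, record them on `span`, and end the span."""
    collected = []
    try:
        async for item in agen:
            collected.append(item)
            yield item
    finally:
        # Runs on normal exhaustion, on error, and when the generator is closed.
        span.output = collected
        span.end()


async def numbers():
    for i in range(3):
        yield i


async def main():
    span = FakeSpan()
    items = [item async for item in wrap_async_gen(numbers(), span)]
    return items, span


items, span = asyncio.run(main())
```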

src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to d35952a in 2 minutes and 2 seconds.

Reviewed 1110 lines of code in 11 files
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.

1. pyproject.toml:70

Draft comment:
Ensure version constraints for claude-agent dependencies are consistent between optional and dev sections (e.g. 'lmnr-claude-code-proxy>=0.1.0a2' vs 'claude-agent-sdk>=0.1.6').
Reason this comment was not posted:
Comment looked like it was already resolved.

2. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py:374

Draft comment:
Avoid 'await release_proxy()' since release_proxy() is a synchronous function. Either remove the await or convert release_proxy() to an async function.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment claims release_proxy() is synchronous and shouldn't be awaited. However, I cannot see the implementation of release_proxy in this diff - it's in a separate file (.proxy). The evidence shows inconsistent usage: in _wrap_async it's called without await (line 308), but in _wrap_async_gen it's awaited (line 375). Similarly, start_proxy() is called without await in _wrap_async (line 289) but with await in _wrap_async_gen (line 326). This could mean: (1) the functions have both sync and async versions, (2) there's a bug in one of the usages, or (3) the comment is incorrect. Without seeing the actual implementation of release_proxy, I cannot definitively say the comment is correct. The inconsistency within the same file is suspicious though. I'm making an assumption about whether release_proxy() is sync or async without seeing its implementation. The inconsistent usage patterns in the code could indicate either a bug or that there are multiple versions of these functions. I should not assume the comment is correct without strong evidence. Given the rule that I need STRONG EVIDENCE to keep a comment, and I cannot see the implementation of release_proxy() in this diff, I don't have strong evidence. The inconsistent usage could be intentional (different contexts require different approaches) or could be a bug in the existing code rather than the line being commented on. Without access to the proxy module implementation, I cannot verify the claim.
I should delete this comment because I cannot verify from the diff alone whether release_proxy() is sync or async. The inconsistent usage patterns in the code are not sufficient evidence to confirm the comment is correct, and understanding this requires seeing the implementation in another file.
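
The thread leaves open whether `release_proxy()` is synchronous or asynchronous. As an illustration only, a call site that must tolerate either form can check the return value with `inspect.isawaitable` before awaiting. `maybe_await`, `release_sync`, and `release_async` are hypothetical names and are not part of this PR.

```python
import asyncio
import inspect


async def maybe_await(value):
    """Await `value` if it is awaitable; otherwise return it unchanged."""
    if inspect.isawaitable(value):
        return await value
    return value


def release_sync():
    return "sync"


async def release_async():
    return "async"


async def main():
    first = await maybe_await(release_sync())
    second = await maybe_await(release_async())
    return first, second


results = asyncio.run(main())
```

Awaiting the result of a plain synchronous function that returns `None` raises `TypeError`, which is the failure the draft comment was concerned about.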

3. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py:194

Draft comment:
Consider logging exceptions in _record_input instead of silently passing to aid in debugging.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a new file being added, so technically everything in it is "new code" or a "change". However, the comment is suggesting a code quality improvement (logging exceptions instead of silently passing). The rules state that "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This comment is actionable and clear. However, I need to consider: 1) Is there strong evidence this is the right approach? The silent exception handling might be intentional for instrumentation code that shouldn't disrupt the main application flow. 2) Looking at similar patterns in the codebase, _record_output also silently passes exceptions, suggesting this is a deliberate pattern. 3) In instrumentation/observability code, it's often a best practice to fail silently to avoid breaking the instrumented application. The comment doesn't acknowledge this trade-off. The silent exception handling might be intentional design for instrumentation code - observability tooling typically should not break the application it's monitoring. The comment doesn't consider this context and may be suggesting a change that could make the instrumentation more intrusive. Without seeing other similar instrumentation files in this codebase, I can't be certain this is the wrong pattern. While the silent exception handling might be intentional, the comment suggests logging (not raising) exceptions, which would maintain the non-intrusive behavior while improving debuggability. However, there's a logger available in the file, and if the developers wanted logging here, they likely would have added it. The pattern is consistent across both _record_input and _record_output, suggesting it's deliberate. This appears to be a subjective code quality suggestion without strong evidence it's necessary. 
This comment suggests a code quality improvement but lacks strong evidence that it's necessary. The silent exception handling appears to be an intentional pattern for instrumentation code (used consistently in both _record_input and _record_output), and the comment doesn't acknowledge the trade-offs involved. Since I need strong evidence to keep a comment and this is more of a subjective suggestion, I should delete it.

4. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py:210

Draft comment:
Consider logging exceptions in _record_output rather than silently passing, to help diagnose issues with output serialization.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a code quality suggestion about error handling. The comment is about code that was added in this diff (it's a new file). However, looking at the pattern in the codebase, _record_input uses the exact same silent exception handling pattern. The comment is suggesting a refactor/improvement rather than pointing out a bug. According to the rules, "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This comment is actionable and clear. However, I need to consider: 1) Is this obviously important enough to warrant a comment? 2) Is there strong evidence this is correct? The fact that _record_input uses the same pattern suggests this might be an intentional design choice. The comment doesn't acknowledge this consistency. Also, the rules say "Do NOT comment unless there is clearly a code change required" - this is more of a "nice to have" suggestion rather than a clear requirement. The comment doesn't acknowledge that _record_input uses the exact same pattern, which suggests this might be an intentional design choice across the instrumentation code. If logging were truly important here, it would likely be needed in both places. The comment might be suggesting an inconsistency that doesn't actually exist, or suggesting a change that goes against the established pattern in this code. While the pattern is consistent between _record_input and _record_output, that doesn't necessarily mean it's the right pattern. However, the rules state that comments should point out clear code changes that are required, not suggest improvements. This is more of a "consider doing X" suggestion rather than identifying a clear problem. Without strong evidence that the silent exception handling is causing issues, this falls into the category of speculative improvements. 
This comment suggests a code quality improvement but doesn't identify a clear bug or required change. The silent exception handling is consistent with the pattern used in _record_input, suggesting it's intentional. Without strong evidence that this is causing problems, this is more of a speculative improvement suggestion rather than a required fix.
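
Draft comments 3 and 4 both weigh silent `except: pass` against logging. A minimal sketch of the middle ground they describe follows: catch broadly so the instrumented application is never disrupted, but log at debug level. `record_input_sketch` and the logger name are hypothetical, and `json.dumps` stands in for whatever serialization the real `_record_input` uses.

```python
import json
import logging

logger = logging.getLogger("claude_agent_sketch")


def record_input_sketch(attributes: dict, payload) -> None:
    """Serialize `payload` into `attributes`; log failures instead of raising."""
    try:
        attributes["input"] = json.dumps(payload)
    except Exception:  # broad on purpose: instrumentation must not break the caller
        logger.debug("Failed to record span input", exc_info=True)


attrs = {}
record_input_sketch(attrs, {"prompt": "hi"})
# object() is not JSON-serializable; the error is logged and the old value kept.
record_input_sketch(attrs, {"bad": object()})
```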

Workflow ID: wflow_wwEXhmESWW7Ugdxk


src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py

…e_agent/__init__.py Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py

ellipsis-dev

Caution

Changes requested ❌

Reviewed b7d751d in 1 minute and 53 seconds.

Reviewed 378 lines of code in 10 files
Skipped 0 files when reviewing.
Skipped posting 8 draft comments. View those below.

1. pyproject.toml:70

Draft comment:
Dependency version bumped to 'lmnr-claude-code-proxy>=0.1.0a3'. Please ensure that any breaking changes or new features are documented (e.g. in CHANGELOG).
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that breaking changes or new features are documented, which falls under the rule of not asking the author to ensure things. It also relates to dependency changes, which should not be commented on unless it's about confirming intentionality or specific issues.

2. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/proxy.py:70

Draft comment:
Using a background thread with a fixed GRACE_TO_FLUSH_SECONDS delay for stopping the proxy is acceptable, but ensure that this delay does not lead to race conditions in concurrent environments or tests.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that the delay does not lead to race conditions, which is against the rules. It does not provide a specific suggestion or ask for a specific test to be written. Therefore, it should be removed.
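
For context on draft comment 2, a delayed stop on a background thread can be sketched with `threading.Timer`. This is an assumption about the general shape, not the code in `proxy.py`. The value of `GRACE_TO_FLUSH_SECONDS` here is illustrative, and `stop_server` is a hypothetical stand-in for shutting down the proxy.

```python
import threading

GRACE_TO_FLUSH_SECONDS = 0.05  # illustrative value; the PR's actual value is not shown

stopped = threading.Event()


def stop_server():
    """Hypothetical stand-in for shutting down the proxy process."""
    stopped.set()


def stop_after_grace(grace: float = GRACE_TO_FLUSH_SECONDS) -> threading.Timer:
    """Schedule `stop_server` on a daemon timer thread after `grace` seconds."""
    timer = threading.Timer(grace, stop_server)
    timer.daemon = True
    timer.start()
    return timer


timer = stop_after_grace()
timer.join()  # wait for the timer thread to run stop_server
```

Returning the timer lets a caller invoke `timer.cancel()` if the proxy is needed again before the grace period ends, which is one way to reduce the race the draft comment raises.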

3. tests/test_initialize.py:12

Draft comment:
The fixture nicely saves and restores Laminar's internal state. Consider adding a short comment explaining why manual state management is critical in tests.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 50% None

4. tests/test_instrumentations/test_claude_agent/conftest.py:4

Draft comment:
The 'cleanup_claude_proxy' fixture efficiently cleans up the proxy server before and after tests. Ensure that suppressing errors in _cleanup_proxy does not hide underlying issues during debugging.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 50% None

5. tests/test_instrumentations/test_claude_agent/test_claude_sdk_client.py:9

Draft comment:
The test for ClaudeSDKClient verifies span creation and attributes correctly. Confirm that using 'asyncio.run' aligns with your event loop policy and that the order of spans is deterministic.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

6. tests/test_instrumentations/test_claude_agent/test_query.py:12

Draft comment:
The query test validates that the span attributes (name, path, output) are correctly set. Remove any commented-out prompt placeholders if they are no longer needed.
Reason this comment was not posted:
Confidence changes required: 40% <= threshold 50% None

7. tests/test_instrumentations/test_claude_agent/test_query_with_alias.py:9

Draft comment:
The alias test for 'query' confirms that both the original and aliased functions produce the same span. Ensure consistent naming conventions are maintained.
Reason this comment was not posted:
Confidence changes required: 40% <= threshold 50% None

8. tests/test_instrumentations/test_claude_agent/test_tool.py:14

Draft comment:
Tool tests correctly define decorated tools and validate the generated spans. Be cautious with hard-coded span counts (e.g. expecting 9 spans) which might change if instrumentation behavior is updated.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

Workflow ID: wflow_vv4fQqg60Q38x9gq


ellipsis-dev · 2025-11-17T00:26:59Z

src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py

+        try:
+            result = await wrapped(*args, **kwargs)
+        except Exception as e:  # pylint: disable=broad-except
+            if original_base_url is not None:


The async wrapper (_wrap_async) duplicates environment restoration logic for ANTHROPIC_BASE_URL. Consider extracting this into a helper function to avoid duplication and potential inconsistencies.
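
One way to factor out the restoration logic this comment refers to is a small context manager. The sketch below assumes the only state to restore is a single environment variable. `temporary_env` is a hypothetical helper and the URL is a placeholder.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    """Set `name` to `value` for the duration of the block, then restore it."""
    original = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if original is not None:
            os.environ[name] = original
        else:
            # The variable was unset before, so remove it again.
            os.environ.pop(name, None)


os.environ.pop("ANTHROPIC_BASE_URL", None)  # start from a known state for the demo
with temporary_env("ANTHROPIC_BASE_URL", "http://127.0.0.1:8080"):
    inside = os.environ["ANTHROPIC_BASE_URL"]
after = os.environ.get("ANTHROPIC_BASE_URL")
```

Because the restore runs in `finally`, the success and exception paths share one code path instead of duplicating the `if original_base_url is not None` branch.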

Rainhunter13 and others added 9 commits November 13, 2025 22:38

fix recording span error on iterator end

01c428f

support aliases for module functions

220f4a0

add create_sdk_mcp_server wrapping and test

1d85722

mock claude transport for unit tests

28325a5

wip: mvp for starting proxy process to save claude cli requests spans

67a5df7

magic: running proxy process for current spans context

aaa6541

package binary solution

8f7ff4e

bump version to 0.7.23a1

d35952a

cursor bot reviewed Nov 16, 2025

src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py (outdated, resolved)

ellipsis-dev bot reviewed Nov 16, 2025

src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py (outdated, resolved)

Update src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claud…

110af7c

…e_agent/__init__.py Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

cursor bot reviewed Nov 16, 2025

src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/claude_agent/__init__.py (outdated, resolved)

fix tests, add grace before stop server

b7d751d

dinmukhamedm merged commit 0ff547b into main Nov 17, 2025
8 checks passed

ellipsis-dev bot reviewed Nov 17, 2025

dinmukhamedm deleted the feat/claude-agent-sdk branch November 17, 2025 00:27
