release: 0.11.0 (#343)
Conversation
```python
event_type = items[0].event_type
assert all(i.event_type == event_type for i in items), (
    "_process_items requires all items to share the same event_type; "
    "callers must split START and END batches before dispatching."
)
```
**`assert` in production guard defeats data-corruption protection**

The code comment correctly identifies this as a potential "silent data-corruption bug," but using `assert` for the guard means it is silently stripped when Python runs with the `-O` (optimize) flag. If a caller ever passes a mixed-event-type list, START and END spans would be fed to the wrong batched method with no warning. Use an explicit `if/raise` instead.
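A minimal sketch of the suggested `if/raise` guard. The `Item` class and `check_uniform_event_type` helper are hypothetical stand-ins for the queued span events in `_process_items`, not the real types:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a queued span event."""
    event_type: str


def check_uniform_event_type(items):
    """Validate that all items share one event_type.

    An explicit raise survives `python -O`, unlike `assert`.
    """
    if not items:
        return None
    event_type = items[0].event_type
    if any(i.event_type != event_type for i in items):
        raise ValueError(
            "_process_items requires all items to share the same event_type; "
            "callers must split START and END batches before dispatching."
        )
    return event_type
```

The error message is kept identical to the assertion message above, so existing log-based debugging stays useful.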
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agentex/lib/core/tracing/span_queue.py
Line: 107-111
Comment:
**`assert` in production guard defeats data-corruption protection**
The code comment correctly identifies this as a potential "silent data-corruption bug," but using `assert` for the guard means it is silently stripped when Python runs with the `-O` (optimize) flag. If a caller ever passes a mixed-event-type list, START and END spans would be fed to the wrong batched method with no warning. Use an explicit `if/raise` instead.
How can I resolve this? If you propose a fix, please make it concise.| sgp_spans: list[SGPSpan] = [] | ||
| for span in spans: | ||
| self._add_source_to_span(span) | ||
| sgp_span = create_span( | ||
| name=span.name, | ||
| span_type=_get_span_type(span), | ||
| span_id=span.id, | ||
| parent_id=span.parent_id, | ||
| trace_id=span.trace_id, | ||
| input=span.input, | ||
| output=span.output, | ||
| metadata=span.data, | ||
| ) | ||
| sgp_span.start_time = span.start_time.isoformat() # type: ignore[union-attr] | ||
| self._spans[span.id] = sgp_span | ||
| sgp_spans.append(sgp_span) | ||
```python
if self.disabled:
    logger.warning("SGP is disabled, skipping span upsert")
    return
# TODO(AGX1-198): Batch multiple spans into a single upsert_batch call
# instead of one span per HTTP request.
# https://linear.app/scale-epd/issue/AGX1-198/actually-use-sgp-batching-for-spans
await self.sgp_async_client.spans.upsert_batch(  # type: ignore[union-attr]
    items=[s.to_request_params() for s in sgp_spans]
)
```
**`_spans` populated before upsert — stale entries on HTTP failure**

Spans are added to `self._spans` before the `upsert_batch` HTTP call (lines 155–156). If the batch upsert throws (network error, server 5xx), the exception is caught upstream by the queue's `_handle`, but `_spans` already holds entries for spans whose start event was never delivered to SGP. A subsequent `on_spans_end` will find those spans, update them, and send end-only upserts — orphaned end events with no matching start on the server.

The old single-span code registered the span in `_spans` only after a successful upsert, so failures were cleanly skipped on the end path. Consider populating `_spans` only after confirming the batch call succeeded, or rolling back entries on exception.
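A hedged sketch of the register-after-success option. `FakeClient`, `FakeSpan`, `start_spans`, and the `registry` dict (standing in for `self._spans`) are all illustrative names, not the real SGP client API:

```python
import asyncio


class FakeClient:
    """Illustrative stand-in for the SGP spans client; fails on demand."""
    def __init__(self, fail: bool = False):
        self.fail = fail
        self.received = []

    async def upsert_batch(self, items):
        if self.fail:
            raise RuntimeError("server 5xx")
        self.received.extend(items)


class FakeSpan:
    """Illustrative span with just enough shape for the sketch."""
    def __init__(self, span_id: str):
        self.id = span_id

    def to_request_params(self):
        return {"span_id": self.id}


async def start_spans(client, registry, sgp_spans):
    # Send the batch first; only register spans in `registry` (the analogue
    # of self._spans) after the upsert succeeds, so a failed start never
    # leaves a stale entry for on_spans_end to turn into an orphaned end.
    await client.upsert_batch(items=[s.to_request_params() for s in sgp_spans])
    for s in sgp_spans:
        registry[s.id] = s
```

The rollback variant would instead register eagerly and `pop` the ids back out in an `except` block before re-raising; register-after-success is simpler when nothing between registration and the HTTP call reads `_spans`.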
Path: src/agentex/lib/core/tracing/processors/sgp_tracing_processor.py
Line: 141-163
```python
    sgp_span.start_time = span.start_time.isoformat()  # type: ignore[union-attr]
    self._spans[span.id] = sgp_span
    sgp_spans.append(sgp_span)

if self.disabled:
    logger.warning("SGP is disabled, skipping span upsert")
    return
# TODO(AGX1-198): Batch multiple spans into a single upsert_batch call
# instead of one span per HTTP request.
# https://linear.app/scale-epd/issue/AGX1-198/actually-use-sgp-batching-for-spans
await self.sgp_async_client.spans.upsert_batch(  # type: ignore[union-attr]
    items=[s.to_request_params() for s in sgp_spans]
)
```
**`shutdown()` crashes with `AttributeError` when `disabled=True` and spans are in-flight**

`on_spans_start` now populates `self._spans` (line 155) **before** the `if self.disabled: return` guard (line 158). If any spans are started but not yet ended when `shutdown()` is called in disabled mode, it reaches `self.sgp_async_client.spans.upsert_batch(...)` where `self.sgp_async_client` is `None`, triggering an `AttributeError`. Before this PR the disabled path returned before populating `_spans`, so `_spans` was always empty at shutdown time and this was never triggered in practice. The fix is to either move the `self._spans[span.id] = sgp_span` assignment after the `if self.disabled` guard, or add an early `if self.disabled: return` check at the top of `shutdown()` (mirroring how `on_spans_end` handles it at line 184).
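A minimal sketch of the second option, the early-return guard in `shutdown()`. The `DisabledSafeProcessor` class is a stand-in; only the `disabled`, `sgp_async_client`, and `_spans` names mirror the code under review:

```python
import asyncio


class DisabledSafeProcessor:
    """Illustrative processor showing the disabled guard in shutdown()."""

    def __init__(self, disabled: bool):
        self.disabled = disabled
        # When disabled, the real processor never constructs a client.
        self.sgp_async_client = None if disabled else object()
        # Simulate a span that was started but never ended.
        self._spans = {"span-1": object()}

    async def shutdown(self):
        # Mirror on_spans_end: bail out before touching the client,
        # which is None when tracing is disabled.
        if self.disabled:
            return
        # ... would flush self._spans via self.sgp_async_client here ...
        self._spans.clear()
```

Without the guard, the disabled instance would dereference `None` and raise `AttributeError` exactly as the comment describes.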
Path: src/agentex/lib/core/tracing/processors/sgp_tracing_processor.py
Line: 154-163
Automated Release PR
0.11.0 (2026-05-05)
Full Changelog: v0.10.4...v0.11.0
Features
- `usage`, `response_id`, plumb `previous_response_id`, opt-in `prompt_cache_key` for stateful responses and prompt caching (#335) (ba5d64b)

Chores
This pull request is managed by Stainless's GitHub App.
The semver version number is based on included commit messages. Alternatively, you can manually set the version number in the title of this pull request.
For a better experience, it is recommended to use either rebase-merge or squash-merge when merging this pull request.
🔗 Stainless website
📚 Read the docs
🙋 Reach out for help or questions
Greptile Summary
- Batched span hooks (`on_spans_start`/`on_spans_end`) across the tracing stack: `AsyncTracingProcessor` gets default fan-out implementations, `SGPAsyncTracingProcessor` overrides them to coalesce all spans in a drain batch into a single `upsert_batch` HTTP call, and `AsyncSpanQueue._process_items` is reworked to group spans by processor before dispatching.
- `_spans` is populated before the `upsert_batch` call (orphaned end events on HTTP failure), `shutdown()` lacks a `disabled` guard (crashes with `AttributeError` when disabled spans are in-flight), and the `assert` type-guard in `_process_items` is silently elided under `python -O`.
- `on_spans_start` fans out to `self.on_span_start`, while `SGPAsyncTracingProcessor.on_span_start` delegates to `self.on_spans_start`; any future subclass that follows the same delegation pattern without also overriding `on_spans_start` will hit infinite recursion.

Confidence Score: 4/5
Safe to merge with caveats — the three issues from prior review threads remain open, but they are bounded in scope; the new mutual-recursion trap is P2 only.
Score capped at 4 due to the unresolved P1 `shutdown()` crash when `disabled=True` (flagged in a prior thread). No new P0 or P1 issues were introduced in this pass; the mutual-recursion concern and empty-batch HTTP call are P2.
src/agentex/lib/core/tracing/processors/sgp_tracing_processor.py — unresolved shutdown/disabled crash and stale-`_spans`-on-failure issues from prior review; src/agentex/lib/core/tracing/span_queue.py — `assert` guard elided under `-O`.
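The mutual-recursion trap called out above can be made concrete with a small sketch. `AsyncTracingProcessor` and `BatchedProcessor` here are simplified stand-ins, not the real classes; only the hook names match the summary:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class AsyncTracingProcessor:
    """Sketch of the default batched fan-out described in the summary."""

    async def on_span_start(self, span):
        raise NotImplementedError

    async def on_spans_start(self, spans):
        # Default: fan out to the per-span hook, catching and logging
        # per-span failures so one bad span does not sink the batch.
        results = await asyncio.gather(
            *(self.on_span_start(s) for s in spans), return_exceptions=True
        )
        for r in results:
            if isinstance(r, Exception):
                logger.error("span start failed: %r", r)


class BatchedProcessor(AsyncTracingProcessor):
    """Delegates on_span_start -> on_spans_start, so it MUST also override
    on_spans_start; otherwise the two defaults would call each other forever."""

    def __init__(self):
        self.batches = []

    async def on_span_start(self, span):
        await self.on_spans_start([span])

    async def on_spans_start(self, spans):
        self.batches.append(list(spans))
```

Removing the `on_spans_start` override in `BatchedProcessor` reproduces the infinite recursion: the default fans out to `on_span_start`, which delegates straight back.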
Important Files Changed
- Populates `_spans` before the HTTP call (stale entry on failure) and `shutdown()` crashes on `disabled=True` with in-flight spans — both flagged in previous review threads, plus a new empty-batch HTTP call issue.
- Adds `on_spans_start`/`on_spans_end` methods that fan out to per-span calls; the design creates a mutual-recursion trap for future subclasses that delegate `on_span_start` → `on_spans_start` without also overriding `on_spans_start`.
- Reworks `_process_items` to group spans by processor and dispatch via batched methods; the `assert` guard for the same-event-type precondition is stripped by `-O` (flagged in a prior review thread); logic otherwise sound.
- Single `upsert_batch` call and correct span tracking; coverage is thorough for the happy path.
- `TestProcessItemsPreconditions` and `TestAsyncSpanQueueBatchedDispatch` suites; mock helper updated to fan out batched calls to per-span mocks for backwards-compatible test assertions.

Sequence Diagram
```mermaid
sequenceDiagram
    participant Q as AsyncSpanQueue
    participant PI as _process_items
    participant AP as AsyncTracingProcessor (default)
    participant SGP as SGPAsyncTracingProcessor
    Q->>PI: _process_items(starts)
    PI->>PI: group spans by processor
    PI->>SGP: on_spans_start([span_a, span_b, ...])
    SGP->>SGP: populate _spans[id] for each
    SGP->>SGP: if disabled → return
    SGP-->>SGP: upsert_batch(all spans in one HTTP call)
    Note over PI,AP: Processor using default fallback
    PI->>AP: on_spans_start([span_a, span_b])
    AP->>AP: asyncio.gather(on_span_start(a), on_span_start(b))
    AP-->>AP: per-span exceptions caught and logged
    Q->>PI: _process_items(ends)
    PI->>SGP: on_spans_end([span_a, span_b, ...])
    SGP->>SGP: pop _spans, update fields
    SGP-->>SGP: upsert_batch(ended spans)
    Q->>SGP: shutdown()
    SGP-->>SGP: upsert_batch(_spans remaining)
    Note over SGP: crashes if disabled=True with spans in-flight
```
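The "group spans by processor" step in the diagram can be sketched as a plain order-preserving grouping. The `(processor, span)` item shape is an assumption for illustration; the real `AsyncSpanQueue` item may carry more fields:

```python
from collections import defaultdict


def group_by_processor(items):
    """Group queued (processor, span) pairs by processor, preserving
    the order in which each processor's spans were enqueued."""
    groups = defaultdict(list)
    for processor, span in items:
        groups[processor].append(span)
    return groups
```

Each resulting group can then be handed to that processor's `on_spans_start` or `on_spans_end` in a single call, which is what enables the one-HTTP-request batching shown above.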
Reviews (10). Last reviewed commit: "release: 0.11.0"