fix(streams): add sliding TTL to Redis stream keys#215
Merged
Conversation
e5d30ff to
11b9540
Compare
- Use raise_on_error=False on pipeline.execute() so an EXPIRE failure doesn't propagate to the caller after XADD already succeeded; log a warning instead. Prevents callers from retrying and duplicating messages on a benign TTL-refresh failure. - Loosen TTL test lower bound from 3590 to 3480 (120s window) to tolerate CI scheduling jitter without losing regression coverage.
declan-scale
approved these changes
May 1, 2026
x
added a commit
to scaleapi/scale-agentex-python
that referenced
this pull request
May 5, 2026
Mirror scaleapi/scale-agentex#215 (server-side adapter): pipeline XADD with EXPIRE so each task:* stream key gets a sliding TTL. Orphaned streams (no writes for the TTL window) self-delete in Redis without needing an explicit cleanup_stream call from the caller. This is the right shape of fix for the SDK's leak: an explicit DEL on terminal task transitions (an earlier draft of this PR) introduced a race where the server's task_updated event published to the same topic could be deleted before a connected frontend SSE consumer read it. EXPIRE sidesteps that — TTL only fires after inactivity, so an actively-streaming agent or actively-reading consumer keeps the key alive, and the key only ages out once everyone is done with it. Defaults match the server: REDIS_STREAM_TTL_SECONDS=3600 (1h), overridable via env var. Setting it to 0 short-circuits to plain XADD (no TTL refresh), matching the server's escape hatch. Implementation notes: - transaction=False on the pipeline: connection-level batching, no MULTI/EXEC overhead for what's already a fast op. - raise_on_error=False: an EXPIRE failure after a successful XADD must not surface to the caller. The message has been published; retrying would duplicate it. We log and move on. Next successful XADD will reset the TTL anyway.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
XADD+EXPIRE.REDIS_STREAM_TTL_SECONDS=3600(1h sliding window).REDIS_STREAM_TTL_SECONDS=0to disable (preserves pre-change behavior).Fixes orphan stream accumulation when the SSE-disconnect cleanup path doesn't run (e.g., consumer never connects, backend crashes mid-task). Active streams stay alive indefinitely — every write refreshes the TTL.
Test plan
test_send_data_sets_ttl_on_stream_key— TTL ~3600s set on key aftersend_datatest_send_data_skips_ttl_when_disabled— no TTL set whenREDIS_STREAM_TTL_SECONDS=0test_send_data_still_applies_maxlen— MAXLEN trimming still worksmake testsuite passes (389 passed locally)Greptile Summary
This PR adds a configurable sliding TTL to Redis stream keys by pipelining
XADD+EXPIREin a single round-trip, defaulting to 3600 s and disabling cleanly atREDIS_STREAM_TTL_SECONDS=0. The implementation correctly handles pipeline errors withraise_on_error=Falseand per-result type checks, and is covered by focused integration tests.Confidence Score: 5/5
Safe to merge — no blocking issues; logic is correct, error handling is sound, and tests provide good coverage.
The previous P1 concern about raise_on_error has been fully addressed. Pipeline results are inspected individually, XADD failures propagate correctly, and EXPIRE failures degrade gracefully. The TTL-disabled path preserves the original behaviour exactly. No new P0 or P1 findings.
No files require special attention.
Important Files Changed
Sequence Diagram
sequenceDiagram participant Caller participant RedisStreamRepository participant RedisPipeline participant Redis Caller->>RedisStreamRepository: send_data(topic, data) RedisStreamRepository->>RedisStreamRepository: check REDIS_STREAM_TTL_SECONDS alt TTL > 0 RedisStreamRepository->>RedisPipeline: pipeline(transaction=False) RedisPipeline->>RedisPipeline: queue XADD(topic, data, maxlen) RedisPipeline->>RedisPipeline: queue EXPIRE(topic, ttl_seconds) RedisPipeline->>Redis: execute(raise_on_error=False) Redis-->>RedisPipeline: [message_id | Exception, True/False | Exception] RedisPipeline-->>RedisStreamRepository: results[] alt results[0] is Exception RedisStreamRepository->>Caller: raise Exception else results[1] is Exception RedisStreamRepository->>RedisStreamRepository: logger.warning(TTL refresh failed) RedisStreamRepository->>Caller: return message_id else RedisStreamRepository->>Caller: return message_id end else TTL == 0 (disabled) RedisStreamRepository->>Redis: xadd(topic, data, maxlen) Redis-->>RedisStreamRepository: message_id RedisStreamRepository->>Caller: return message_id endReviews (3): Last reviewed commit: "fix(streams): address greptile review fi..." | Re-trigger Greptile