ref(spans): introduce spans buffer store abstraction by lvthanh03 · Pull Request #116382 · getsentry/sentry

lvthanh03 · 2026-05-28T14:34:18Z

Refs STREAM-1002

SpansBuffer had orchestration, Redis command details, Lua result mapping, key construction, and observability all mixed together. This PR keeps the high-level buffer flow readable while moving low-level Redis mechanics behind SpansBufferStore.

This PR introduces SpansBufferStore as the Redis store class for SpansBuffer.

This moves Redis-specific mechanics out of SpansBuffer and into the store, including:

- Redis key construction
- payload writes
- add-buffer.lua script loading and EVALSHA execution
- Redis result mapping back into InsertedSubsegment
- loading flush candidates from queue shards
- acquiring flush locks and mapping lock results back to FlushCandidate
- loading payload keys, payload bytes, and segment ingest metadata (span count, byte count)
- reading the current queue deadline for expiration/loss metrics

flowchart TB
    Relay["Relay span payloads"] --> Process["process_spans"]
    SpanFlusher["SpanFlusher"] --> Flush["flush_segments"]
    SpanFlusher --> Done["done_flush_segments"]

    subgraph Buffer["SpansBuffer"]
        Process --> BuildSubsegments["build Subsegment objects"]
        BuildSubsegments --> ProcessStore
        ProcessStore --> ProcessObs["process metrics and logs"]

        Flush --> FlushStore
        FlushStore --> RecordLoss["record loss metrics"]
        RecordLoss --> BuildFlushed["build FlushedSegment objects"]
        BuildFlushed --> FlushObs["flush metrics and logs"]

        Done --> CleanupStore
        CleanupStore --> DoneObs["done_flush_segments metrics"]

        subgraph Store["self.store: SpansBufferStore"]
            ProcessStore["store_payloads, insert_subsegments, update_queue"]
            FlushStore["load_flush_candidates, acquire_flush_locks, load_segments, queue deadline"]
            CleanupStore["cleanup_flushed_segments"]
        end
    end

    ProcessStore -. "Redis commands and Lua result mapping" .-> Redis[(Redis)]
    FlushStore -. "Redis commands and result mapping" .-> Redis
    CleanupStore -. "Redis cleanup" .-> Redis

    BuildFlushed -->|"FlushedSegment objects"| SpanFlusher
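
The delegation in the flowchart can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `InMemoryStore` is a hypothetical dict-backed stand-in for `SpansBufferStore`, and the method signatures, the `timeout` parameter, and the `(segment_key, payload)` span shape are all assumptions. The `add-buffer.lua` step behind `insert_subsegments`, flush locks, and metrics are left out entirely.

```python
from collections import defaultdict


class InMemoryStore:
    """Hypothetical stand-in for SpansBufferStore: dicts replace Redis."""

    def __init__(self) -> None:
        self.payloads: dict[str, list[bytes]] = {}
        self.deadlines: dict[str, float] = {}

    def store_payloads(self, segment_key: str, payloads: list[bytes]) -> None:
        self.payloads.setdefault(segment_key, []).extend(payloads)

    def update_queue(self, segment_key: str, deadline: float) -> None:
        self.deadlines[segment_key] = deadline

    def load_flush_candidates(self, now: float) -> list[str]:
        # Segments whose deadline has passed are ready to flush.
        return sorted(k for k, d in self.deadlines.items() if d <= now)

    def load_segments(self, segment_keys: list[str]) -> dict[str, list[bytes]]:
        return {k: list(self.payloads.get(k, [])) for k in segment_keys}

    def cleanup_flushed_segments(self, segment_keys: list[str]) -> None:
        for k in segment_keys:
            self.payloads.pop(k, None)
            self.deadlines.pop(k, None)


class Buffer:
    """Orchestration only; every storage operation goes through self.store."""

    def __init__(self, store: InMemoryStore, timeout: float = 10.0) -> None:
        self.store = store
        self.timeout = timeout

    def process_spans(self, spans: list[tuple[str, bytes]], now: float) -> None:
        # Group payloads per segment, then hand them to the store.
        grouped: dict[str, list[bytes]] = defaultdict(list)
        for segment_key, payload in spans:
            grouped[segment_key].append(payload)
        for segment_key, payloads in grouped.items():
            self.store.store_payloads(segment_key, payloads)
            self.store.update_queue(segment_key, now + self.timeout)

    def flush_segments(self, now: float) -> dict[str, list[bytes]]:
        candidates = self.store.load_flush_candidates(now)
        return self.store.load_segments(candidates)

    def done_flush_segments(self, segment_keys: list[str]) -> None:
        self.store.cleanup_flushed_segments(segment_keys)
```

The point of the sketch is only the shape: `Buffer` never builds a key or issues a storage command itself, so the three entry points read as short sequences of store calls.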

Introduce LoadedSegment to keep a flush candidate, loaded payloads, payload keys, and ingest metadata together through the flush pipeline. This replaces parallel maps keyed by segment key and makes the next Redis store abstraction smaller and safer.
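
As a rough sketch of what "keeping these together" could look like, the dataclasses below bundle the four pieces named above into one object. The field names, the `queue_key` field, and the `missing_bytes` helper are assumptions made for illustration; the actual `LoadedSegment` and `FlushCandidate` definitions may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlushCandidate:
    # Field names are assumptions for illustration.
    segment_key: bytes
    queue_key: bytes


@dataclass
class LoadedSegment:
    """One object per segment instead of several maps keyed by segment key."""

    candidate: FlushCandidate
    payload_keys: list[bytes] = field(default_factory=list)
    payloads: list[bytes] = field(default_factory=list)
    ingested_span_count: int = 0
    ingested_byte_count: int = 0

    @property
    def loaded_byte_count(self) -> int:
        return sum(len(p) for p in self.payloads)


def missing_bytes(segment: LoadedSegment) -> int:
    # Hypothetical derived value: bytes recorded at ingest but not loaded.
    # No cross-map lookup is needed because everything lives on one object.
    return max(segment.ingested_byte_count - segment.loaded_byte_count, 0)
```

With parallel maps, each of these values would need its own lookup by segment key, and a key missing from one map would only surface at that lookup.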

untitaker · 2026-05-28T14:57:13Z

I'm very suspicious about this layer of abstraction. I feel that SpanBuffer itself becomes a very shallow and useless abstraction this way. A lot of the "orchestration" is deep inside add-buffer.lua already, and in order to replicate this logic without redis, we would have to reimplement a lot of it.

Redis isn't slow, so I don't see a need to mock it out. It's about as fast as in-memory operations.

evanh · 2026-05-28T15:26:29Z

> Redis isn't slow, so I don't see a need to mock it out.

@untitaker I agree with this, but I do feel like this makes the code easier to follow.

linear-code · 2026-05-28T15:28:35Z

STREAM-1002

lvthanh03 · 2026-05-28T19:55:17Z

> I'm very suspicious about this layer of abstraction. I feel that SpanBuffer itself becomes a very shallow and useless abstraction this way.

SpansBuffer is still doing all the orchestration, plus other smaller operations that do not interact with Redis (e.g. grouping spans by parent).
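
To illustrate the kind of Redis-free step meant here, grouping spans by parent could look roughly like the sketch below. The `Span` fields and the rule that a span without a parent is grouped under its own ID are assumptions for illustration, not the logic in `SpansBuffer`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical minimal span shape.
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    payload: bytes


def group_by_parent(spans: list[Span]) -> dict[tuple[str, str], list[Span]]:
    """Group spans by (trace_id, parent), with no storage access at all."""
    groups: dict[tuple[str, str], list[Span]] = defaultdict(list)
    for span in spans:
        # Assumption: a span with no parent is grouped under its own ID.
        parent = span.parent_span_id or span.span_id
        groups[(span.trace_id, parent)].append(span)
    return dict(groups)
```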

> A lot of the "orchestration" is deep inside add-buffer.lua already, and in order to replicate this logic without redis, we would have to reimplement a lot of it.

I agree. Maybe we shouldn't word this as adding an "abstraction" over Redis; we're really just adding a class that owns every operation that talks to Redis. I would say this addition is mainly for readability.

untitaker

didn't mean to block this. refactoring for readability is always good. i'm just not sure about using this as a testing strategy (i.e. writing most of our tests without redis)

missed a point

lvthanh03 · 2026-05-28T22:45:26Z

> i'm just not sure about using this as a testing strategy (i.e. writing most of our tests without redis)

That makes sense now; I misunderstood the initial comment. I thought we were talking about how separating the spans buffer Redis operations out of the main orchestration class would impact speed.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 27826fc.

fpacifici

Mostly high-level suggestions for follow-up. But there is one blocker: why are you not using the real Redis in the test_buffer_store module?
Mocking Redis just removes test coverage here.


untitaker · 2026-05-29T17:11:21Z

+
+pytestmark = [pytest.mark.django_db]
+
+# Keep these tests in their own Redis keyspace. CI runs test files in parallel,


the units of concurrency for running tests have their own redis each. you can use flushdb and in fact it's running automatically after every test.

With flushdb, I ran into a problem locally: when I ran pytest tests/sentry/spans/test_buffer*, a single test (test_deep2) failed, but it ran fine in isolation. I will look into it more.

fpacifici

Please see the comment inline.

Re: deployment, while this code is reasonably well tested, it is a large change of the kind we generally ship behind feature flags. Please be extra careful to check s4s2 (both correctness and performance) when you ship it. You'd have to catch issues before it goes out to prod.

For the future:

- Please contain the size of your changes. That makes them not only easier to review but also easier to validate. The key is not necessarily the number of lines (copying a file to a different place produces a lot of lines but is trivial), but the actual logic change. Here the risk is, for example, passing the wrong parameter to a function.
- Consider moving some test coverage from test_buffer to test_buffer_store. That is now a smaller system, so we can test more corner cases. You do not necessarily have to remove tests from test_buffer.

fpacifici · 2026-05-29T23:58:45Z

+            if compression_level == -1:
+                self._zstd_compressor = None
+            else:
+                self._zstd_compressor = zstandard.ZstdCompressor(level=compression_level)


You were doing this once per call to process_spans before. Now you are doing this at every batch.
Is this an expensive operation that had to be done rarely? I'd consider moving it into store_payloads unless we are sure this is cheap.
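
One way to sidestep the cost question is to rebuild the compressor only when the configured level actually changes. The sketch below is an assumption about how that could be done, not what this PR implements: `CompressorCache` and its `builds` counter are hypothetical, and the default factory uses the standard library's `zlib` as a stand-in so the example runs without `zstandard` installed. As in the diff above, a level of -1 is treated as "compression disabled".

```python
import zlib
from typing import Callable, Optional

Compress = Callable[[bytes], bytes]


def zlib_factory(level: int) -> Compress:
    # Stand-in for zstandard.ZstdCompressor(level=...).compress; uses zlib.
    return lambda data: zlib.compress(data, level)


class CompressorCache:
    """Rebuild the compressor only when the requested level changes."""

    def __init__(self, factory: Callable[[int], Compress] = zlib_factory) -> None:
        self._factory = factory
        self._level: Optional[int] = None
        self._compress: Optional[Compress] = None
        self.builds = 0  # how many times the factory was invoked

    def get(self, level: int) -> Optional[Compress]:
        if level == -1:
            # -1 disables compression, mirroring the branch in the diff.
            self._level, self._compress = -1, None
            return None
        if level != self._level or self._compress is None:
            self._compress = self._factory(level)
            self._level = level
            self.builds += 1
        return self._compress
```

Calling `get` on every batch then costs one comparison in the common case, regardless of how expensive constructing the compressor turns out to be.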

lvthanh03 added 2 commits May 27, 2026 17:37

ref(spans): introduce spans buffer store abstraction

46beb4f

lvthanh03 requested review from a team as code owners May 28, 2026 14:34

github-actions Bot added the Scope: Backend label May 28, 2026

fix: typing

dc4d1da

Base automatically changed from tony/loaded-segments-datamodel to master May 28, 2026 16:55

untitaker approved these changes May 28, 2026


Merge branch 'master' into tony/spans-store-abstraction

5801619

vercel Bot deployed to Preview May 28, 2026 20:35


use actual redis for testing

27826fc

cursor Bot reviewed May 28, 2026


Comment thread src/sentry/spans/buffer.py Outdated

fpacifici requested changes May 28, 2026


lvthanh03 added 2 commits May 29, 2026 11:57

address feedback

9e12f2e

- remove SpansBuffer key-helper passthroughs
- move payload preparation into SpansBufferStore
- add docstrings for public store methods
- simplify insert_subsegments and flush candidate mapping

fix: tests using redis running in parallel cause failures

aad7efb

lvthanh03 requested a review from fpacifici May 29, 2026 16:31

untitaker reviewed May 29, 2026


fpacifici approved these changes May 30, 2026


