Skip to content

perf(objectstore): Pass through compressed bytes when client accepts encoding#109571

Merged
jan-auer merged 14 commits intomasterfrom
ref/objectstore-accept-encoding
Mar 2, 2026
Merged

perf(objectstore): Pass through compressed bytes when client accepts encoding#109571
jan-auer merged 14 commits intomasterfrom
ref/objectstore-accept-encoding

Conversation

@jan-auer
Copy link
Member

@jan-auer jan-auer commented Feb 27, 2026

Upgrades to objectstore-client 0.1.0, which adds an accept_encoding parameter to Session.get(). When the stored encoding matches a value the client accepts, the SDK skips decompression and reports the encoding in response metadata.

Both objectstore-backed endpoints now take advantage of this to avoid unnecessary decompress-then-recompress round-trips.

Proxy endpoint (organization.py)

  • Parses the Accept-Encoding header and passes it to Session.get()
  • When the response carries a Content-Encoding, checks whether the browser accepts it before forwarding; decompresses and strips the header if not
  • Strips Content-Length when decompressing, since the value refers to the compressed size

Attachment download endpoint (event_attachment_details.py / eventattachment.py)

  • Adds BlobStream dataclass and EventAttachment.get_blob_stream() to the model — for pure V2 blobs this negotiates encoding with the objectstore; all other blob types (inline, V1, V1+V2 double-write) delegate to getfile() with no encoding
  • download() is reduced to a single path: call get_blob_stream(), stream the payload, set Content-Encoding or Content-Length based on whether encoding was negotiated
  • Removes the double-write read verification scaffolding and the old two-path structure

Shared

  • Adds parse_accept_encoding() to the objectstore module — parses the Accept-Encoding header per RFC 7231, normalises to lowercase, strips q-values, and excludes q=0 encodings

…encoding

For the objectstore proxy endpoint, compute decode_content by comparing
the objectstore response's Content-Encoding against the browser's
Accept-Encoding. If the browser already supports the stored encoding,
forward raw compressed bytes instead of decompressing on the API side.

For the event attachment download endpoint, add a fast path for pure V2
blobs: when the request includes Accept-Encoding, call the objectstore
client with accept_encoding so it can skip decompression and return
metadata.compression. Set Content-Encoding on the response when the SDK
preserves compression, omitting Content-Length (which refers to
uncompressed size).

Bumps objectstore-client to >=0.1.0 which adds the accept_encoding
parameter to Session.get().

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Feb 27, 2026
RFC 7231 §5.3.4 allows clients to explicitly reject an encoding with
q=0 (e.g. Accept-Encoding: identity;q=0). The previous parser stripped
q-values but kept all encoding names regardless, so q=0 entries were
incorrectly treated as accepted.

Co-Authored-By: Claude <noreply@anthropic.com>
If the blob is missing or objectstore is unavailable, the get() call
would propagate a RequestError as a 500. Wrap it in try/except and fall
through to the existing stream_attachment() path instead.

Also drops the `type: ignore` on blob_path by storing it in a local
variable so the type narrowing works without the workaround.

Co-Authored-By: Claude <noreply@anthropic.com>
Move the Accept-Encoding header parser out of the endpoint and into
sentry.objectstore, where it belongs alongside the session utilities
it supports. The function is now public and exported via __all__.

The docstring clarifies that this is intended for objectstore GET
requests and explicitly documents that identity;q=0 is not supported
and will be silently ignored.

Co-Authored-By: Claude <noreply@anthropic.com>
…point

Replace the inline Accept-Encoding parsing with the shared utility and
simplify Content-Encoding handling to a single string — objectstore only
ever applies one encoding, so the set + issubset logic was unnecessary.

Add unit tests for parse_accept_encoding.

Co-Authored-By: Claude <noreply@anthropic.com>
When the proxy decompresses a stored object (because the client's
Accept-Encoding doesn't include the stored encoding), the upstream
Content-Length reflects the compressed size. Forwarding it alongside
a decompressed body causes a size mismatch that can truncate or corrupt
the response on the client side.

Strip Content-Length alongside Content-Encoding when decode_content=True.

Co-Authored-By: Claude <noreply@anthropic.com>
jan-auer and others added 2 commits March 2, 2026 10:52
…stream

Add BlobStream dataclass and get_blob_stream(accept_encoding) to
EventAttachment. For pure V2 blobs, this passes accept_encoding to the
objectstore for compressed transfer to the client. All other blob types
(inline, V1, V1+V2 double-write) delegate to getfile() with no encoding.

The download() endpoint now calls get_blob_stream() and streams the
result directly, removing the two-path structure, the double-write read
verification scaffolding, and a handful of imports.

Co-Authored-By: Claude <noreply@anthropic.com>
…ept-encoding

* origin/master: (63 commits)
  fix(api): Add missing cursor query parameter to paginated endpoint OpenAPI schemas (#109642)
  docs(sentry-apps): Add sentryAppId to sentry-app-installations API schema (#109628)
  feat(occurrences on eap): Implement double reads from EAP in organization events trace API endpoint (#109391)
  feat(occurrences on eap): Implement double reads from EAP for reprocessing2 flow (#109345)
  feat(ci): report backend test fails (#109543)
  feat(seer): Add signed viewer context header to Seer API requests (#109626)
  devenv: cleanup devenv-managed uv (#109617)
  feat(seer): Iterate on the instructions at the top of seer settings pages (#109586)
  ref(seer): Add typed wrappers for remaining Seer API callsites (#109607)
  feat(preprod): Make snapshots endpoint org scoped (#109575)
  chore: capture exception (#109620)
  fix(formatting): run ruff format (#109618)
  feat(preprod): Create admin gated recompare snapshots endpoint (#109546)
  feat(cells): expand locality/cell distinction (#109538)
  feat(cells): add db migration for synapse (#109615)
  feat(preprod): Add public install-details endpoint and shared utilities (#109583)
  fix(tests): Fix flaky test_cross_trace_query_with_spans_and_logs (#109572)
  fix(grouping): Resolve mypy possibly-undefined errors in grouphash caching (#109602)
  fix(dashboards): Default axisRange to auto for existing widgets in builder (#109598)
  fix(billing): Fix category display names in pending changes (#109612)
  ...
…string

Co-Authored-By: Claude <noreply@anthropic.com>
@jan-auer jan-auer marked this pull request as ready for review March 2, 2026 09:58
@jan-auer jan-auer requested review from a team as code owners March 2, 2026 09:58
Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

The project parameter was added to the download() signature but never
used in the method body. Remove it from the signature and call site.

Co-Authored-By: Claude <noreply@anthropic.com>
jan-auer and others added 2 commits March 2, 2026 13:18
….1.1

The function was upstreamed into objectstore-client 0.1.1. Remove the
local implementation and its unit tests, and import from the package.

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Member

@lcian lcian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
I tested this and the optimization doesn't work when going through the hybridcloud proxy (because that one always uses decode_content=True for the response part), however the header is always consistent with the actual returned data format, so correctness-wise it's fine (and we shouldn't use the control silo endpoint anyways generally).

@jan-auer jan-auer merged commit 646bd0f into master Mar 2, 2026
78 checks passed
@jan-auer jan-auer deleted the ref/objectstore-accept-encoding branch March 2, 2026 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants