Skip to content

feat(snapshots): Add all_image_file_names_as_regex (new regex dep)#116427

Merged
NicoHinderling merged 1 commit into
masterfrom
feat/snapshot-all-image-file-names-as-regex
May 29, 2026
Merged

feat(snapshots): Add all_image_file_names_as_regex (new regex dep)#116427
NicoHinderling merged 1 commit into
masterfrom
feat/snapshot-all-image-file-names-as-regex

Conversation

@NicoHinderling
Copy link
Copy Markdown
Contributor

@NicoHinderling NicoHinderling commented May 28, 2026

Adds an optional all_image_file_names_as_regex field to the snapshot upload endpoint. It works like the existing all_image_file_names, except each entry is a regex pattern (matched with fullmatch) rather than a literal file name. This lets callers declare the head build's complete image set with a handful of patterns instead of an exhaustive list — useful for selective uploads, where only changed images are sent but the full set is needed downstream to tell removed images apart from skipped ones.

The two fields are mutually exclusive and both require selective. Every uploaded image name must be covered by the declared set (appear in the list, or match at least one pattern), preserving the existing invariant. fullmatch (not search) is used because the literal field uses exact full-string equality, and full-match is its faithful regex analog.

Renames still work: rename detection runs after the removed/skipped partitioning by matching content hashes across added/removed/skipped, and that shared logic is unchanged.

New dependency: google-re2

Patterns are client-supplied, so they cannot be matched with a backtracking engine (stdlib re or the regex lib) without exposing a ReDoS vector — a pathological pattern can hang a worker. This PR matches them with RE2 (google-re2), a linear-time, finite-automaton engine:

  • Catastrophic backtracking is impossible by construction — no time budget or timeout needed.
  • Unsupported constructs (backreferences, lookaround) are rejected at compile time (re2.error), so the engine enforces the safe subset. These make no sense for filename patterns anyway.

Compiling a pattern is the validation (invalid/unsupported → 400), and the same compiled object does the matching. Pattern length (500 chars) and count (100) caps remain as sanity bounds. Reviewed with security; RE2's linear-time guarantee is the agreed approach over a wall-clock-timeout mitigation.

google-re2 was added to the internal PyPI mirror in getsentry/pypi#2192 (merged). It ships prebuilt wheels for all our targets and has no new transitive dependencies.

Refactor included

To avoid a third parallel code path, the literal and regex modes are unified behind a single SnapshotManifest.head_image_name_matcher() consumed by categorize_image_diff, and the endpoint's coverage checks are consolidated into one helper (make_image_name_matcher). Behavior-preserving. A root_validator on SnapshotManifest enforces the mutual-exclusion invariant at the model level.

@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 28, 2026
@NicoHinderling NicoHinderling marked this pull request as ready for review May 28, 2026 20:36
@NicoHinderling NicoHinderling requested a review from a team as a code owner May 28, 2026 20:36
Comment thread src/sentry/preprod/snapshots/manifest.py
@NicoHinderling NicoHinderling requested a review from a team as a code owner May 28, 2026 21:41
@NicoHinderling NicoHinderling changed the title feat(snapshots): Add all_image_file_names_as_regex to snapshot upload feat(snapshots): Add all_image_file_names_as_regex (new regex dep) May 28, 2026
Comment thread src/sentry/preprod/snapshots/manifest.py Outdated
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit b630a47. Configure here.

Comment thread src/sentry/preprod/snapshots/manifest.py Outdated
@NicoHinderling NicoHinderling force-pushed the feat/snapshot-all-image-file-names-as-regex branch from b630a47 to fb729ce Compare May 29, 2026 01:04
Comment thread src/sentry/preprod/snapshots/manifest.py Outdated
@NicoHinderling NicoHinderling force-pushed the feat/snapshot-all-image-file-names-as-regex branch 2 times, most recently from 2c2a28a to 6170488 Compare May 29, 2026 17:17
Comment thread src/sentry/preprod/snapshots/tasks.py
@NicoHinderling NicoHinderling force-pushed the feat/snapshot-all-image-file-names-as-regex branch from 6170488 to 5e396d4 Compare May 29, 2026 17:27
Add an optional all_image_file_names_as_regex field to the snapshot
upload endpoint. It mirrors all_image_file_names but treats each entry as
a regex pattern (matched with fullmatch), so callers can declare the head
build's complete image set with patterns instead of an exhaustive list.
The two fields are mutually exclusive and both require selective.

Unify the literal and regex modes behind a single
SnapshotManifest.head_image_name_matcher() consumed by
categorize_image_diff, and consolidate the endpoint coverage checks into
one helper.

Patterns are client-supplied, so match them with RE2 (google-re2), a
linear-time engine: catastrophic backtracking (ReDoS) is impossible by
construction, and unsupported constructs (backreferences, lookaround) are
rejected at compile time. Pattern length (500 chars) and count (100) caps
are sanity bounds on top.

Co-Authored-By: Claude <noreply@anthropic.com>
@NicoHinderling NicoHinderling force-pushed the feat/snapshot-all-image-file-names-as-regex branch from 5e396d4 to dc1a20c Compare May 29, 2026 18:35
@NicoHinderling NicoHinderling merged commit c5b68df into master May 29, 2026
86 checks passed
@NicoHinderling NicoHinderling deleted the feat/snapshot-all-image-file-names-as-regex branch May 29, 2026 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants