feat(snapshots): Add all_image_file_names_as_regex (new regex dep) #116427
Merged
NicoHinderling merged 1 commit on May 29, 2026.
mtopo27 approved these changes on May 28, 2026.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit b630a47.
The branch was force-pushed from b630a47 to fb729ce, then from 2c2a28a to 6170488, then from 6170488 to 5e396d4.
Add an optional all_image_file_names_as_regex field to the snapshot upload endpoint. It mirrors all_image_file_names but treats each entry as a regex pattern (matched with fullmatch), so callers can declare the head build's complete image set with patterns instead of an exhaustive list. The two fields are mutually exclusive and both require selective.

Unify the literal and regex modes behind a single SnapshotManifest.head_image_name_matcher() consumed by categorize_image_diff, and consolidate the endpoint coverage checks into one helper.

Patterns are client-supplied, so match them with RE2 (google-re2), a linear-time engine: catastrophic backtracking (ReDoS) is impossible by construction, and unsupported constructs (backreferences, lookaround) are rejected at compile time. Pattern length (500 chars) and count (100) caps are sanity bounds on top.

Co-Authored-By: Claude <noreply@anthropic.com>
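Why fullmatch rather than search: the literal field declares a name only on exact, full-string equality, and fullmatch keeps that property while search does not. The sketch below uses stdlib re with a fixed, trusted pattern purely to show the matching semantics; the pattern, the file names, and the helper names covered_fullmatch and covered_search are made up for illustration.

```python
import re

# A fixed, trusted pattern and made-up file names, for illustration only.
pattern = re.compile(r"login/.*\.png")


def covered_fullmatch(name: str) -> bool:
    # fullmatch: the entire name must match, like the literal field's equality.
    return pattern.fullmatch(name) is not None


def covered_search(name: str) -> bool:
    # search: a match anywhere inside the name counts, which is looser.
    return pattern.search(name) is not None


for name in ["login/button.png", "old/login/button.png", "login/button.png.bak"]:
    print(name, covered_fullmatch(name), covered_search(name))
```

With search, all three names would count as declared; with fullmatch, only login/button.png does, because the other two have extra text before or after the part the pattern describes.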
The branch was force-pushed from 5e396d4 to dc1a20c.

Adds an optional all_image_file_names_as_regex field to the snapshot upload endpoint. It works like the existing all_image_file_names, except each entry is a regex pattern (matched with fullmatch) rather than a literal file name. This lets callers declare the head build's complete image set with a handful of patterns instead of an exhaustive list. That is useful for selective uploads, where only changed images are sent but the full set is needed downstream to tell removed images apart from skipped ones.

The two fields are mutually exclusive and both require selective. Every uploaded image name must be covered by the declared set (appear in the list, or match at least one pattern), preserving the existing invariant. fullmatch (not search) is used because the literal field uses exact full-string equality, and full matching is its faithful regex analog.

Renames still work: rename detection runs after the removed/skipped partitioning by matching content hashes across added/removed/skipped, and that shared logic is unchanged.

New dependency: google-re2

Patterns are client-supplied, so they cannot be matched with a backtracking engine (stdlib re or the third-party regex library) without exposing a ReDoS vector: a pathological pattern can hang a worker. This PR matches them with RE2 (google-re2), a linear-time, finite-automaton engine:

- Catastrophic backtracking (ReDoS) is impossible by construction.
- Unsupported constructs (backreferences, lookaround) are rejected at compile time (re2.error), so the engine enforces the safe subset. These make no sense for filename patterns anyway.

Compiling a pattern is the validation (invalid/unsupported → 400), and the same compiled object does the matching. Pattern length (500 chars) and count (100) caps remain as sanity bounds. Reviewed with security; RE2's linear-time guarantee is the agreed approach over a wall-clock-timeout mitigation.
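The compile-as-validation flow can be sketched roughly as below. It is not the PR's code: PatternValidationError, compile_patterns, and is_covered are hypothetical names, and the mapping to HTTP 400 is left to the caller. It prefers google-re2 (imported as re2) and falls back to stdlib re only so the sketch runs without the dependency; that fallback backtracks and would not be safe for client-supplied patterns.

```python
# Prefer RE2 (google-re2, imported as `re2`). The stdlib `re` fallback exists
# only so this sketch runs without the dependency; it backtracks and is NOT
# safe for client-supplied patterns.
try:
    import re2 as engine
except ImportError:
    import re as engine

MAX_PATTERN_LENGTH = 500  # sanity cap stated in the PR description
MAX_PATTERN_COUNT = 100   # sanity cap stated in the PR description


class PatternValidationError(ValueError):
    """Hypothetical error type; an endpoint would translate it into a 400."""


def compile_patterns(patterns):
    """Validate patterns by compiling them; return the compiled objects."""
    if len(patterns) > MAX_PATTERN_COUNT:
        raise PatternValidationError(f"at most {MAX_PATTERN_COUNT} patterns allowed")
    compiled = []
    for pattern in patterns:
        if len(pattern) > MAX_PATTERN_LENGTH:
            raise PatternValidationError(
                f"pattern longer than {MAX_PATTERN_LENGTH} characters"
            )
        try:
            # Under RE2, backreferences and lookaround are also rejected here.
            compiled.append(engine.compile(pattern))
        except engine.error as exc:
            raise PatternValidationError(f"invalid pattern {pattern!r}: {exc}") from exc
    return compiled


def is_covered(name, compiled):
    """The same compiled objects do the matching: any full match covers the name."""
    return any(regex.fullmatch(name) is not None for regex in compiled)
```

For example, compile_patterns([r"login/.*\.png"]) returns one compiled pattern, while compile_patterns(["("]) raises PatternValidationError under either engine. Keeping the compiled objects means validation and matching can never disagree about what a pattern means.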
google-re2 was added to the internal PyPI mirror in getsentry/pypi#2192 (merged). It ships prebuilt wheels for all our targets and has no new transitive dependencies.

Refactor included
To avoid a third parallel code path, the literal and regex modes are unified behind a single SnapshotManifest.head_image_name_matcher() consumed by categorize_image_diff, and the endpoint's coverage checks are consolidated into one helper (make_image_name_matcher). The refactor is behavior-preserving. A root_validator on SnapshotManifest enforces the mutual-exclusion invariant at the model level.
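A rough sketch of the unified-matcher idea, under several assumptions. ManifestSketch is a hypothetical dataclass standing in for SnapshotManifest, and its __post_init__ plays the role of the root_validator. uncovered_images and categorize_sketch are hypothetical stand-ins for the coverage helper and categorize_image_diff. The removed/skipped rule is assumed: a base image missing from the upload counts as skipped if the declared head set still covers it, otherwise as removed. Stdlib re replaces RE2 only to keep the sketch dependency-free.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

Matcher = Callable[[str], bool]


@dataclass
class ManifestSketch:
    """Hypothetical stand-in for SnapshotManifest (not the real model)."""
    selective: bool = False
    all_image_file_names: Optional[List[str]] = None
    all_image_file_names_as_regex: Optional[List[str]] = None

    def __post_init__(self):
        # Plays the root_validator's role: the fields are mutually exclusive,
        # and either one requires selective.
        literal = self.all_image_file_names is not None
        regex = self.all_image_file_names_as_regex is not None
        if literal and regex:
            raise ValueError("all_image_file_names and _as_regex are mutually exclusive")
        if (literal or regex) and not self.selective:
            raise ValueError("a declared head image set requires selective")

    def head_image_name_matcher(self) -> Optional[Matcher]:
        """One predicate for both modes; None when no head set is declared."""
        if self.all_image_file_names is not None:
            names = frozenset(self.all_image_file_names)
            return lambda name: name in names
        if self.all_image_file_names_as_regex is not None:
            # stdlib re only to keep the sketch dependency-free (RE2 in the PR).
            compiled = [re.compile(p) for p in self.all_image_file_names_as_regex]
            return lambda name: any(r.fullmatch(name) for r in compiled)
        return None


def uncovered_images(uploaded: List[str], matcher: Matcher) -> List[str]:
    """Coverage check: uploaded names that the declared head set misses."""
    return sorted(name for name in uploaded if not matcher(name))


def categorize_sketch(base: List[str], uploaded: List[str], matcher: Matcher):
    """Return (added, removed, skipped) under the assumed semantics above."""
    added = sorted(set(uploaded) - set(base))
    missing = set(base) - set(uploaded)
    skipped = sorted(name for name in missing if matcher(name))
    removed = sorted(name for name in missing if not matcher(name))
    return added, removed, skipped
```

Because the literal and regex modes both collapse into a single Matcher, the coverage check and the removed/skipped split each need only one code path, which is the point of the refactor.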