-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
fix(preprod): Deduplicate sibling artifacts by app_id to prevent duplicate rows in status checks (EME-610) #103528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #103528 +/- ##
========================================
Coverage 80.58% 80.58%
========================================
Files 9260 9261 +1
Lines 395362 395382 +20
Branches 25207 25207
========================================
+ Hits 318606 318623 +17
- Misses 76326 76329 +3
Partials 430 430 |
a17ecbe to
96f898e
Compare
trevor-e
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this looks good other than the platform change!
src/sentry/preprod/models.py
Outdated
|
|
||
| artifacts_by_app_id = defaultdict(list) | ||
| for artifact in all_artifacts: | ||
| artifacts_by_app_id[artifact.app_id].append(artifact) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also needs to deduplicate by platform. It's valid for a user to upload an Android and iOS build that have the same app ID. And can you please add a test case for this scenario. 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch. Test added!
src/sentry/preprod/models.py
Outdated
| else: | ||
| selected_ids.append(artifacts[0].id) | ||
|
|
||
| return PreprodArtifact.objects.filter(id__in=selected_ids).order_by("app_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a second query needed afterwards, haven't we already fetched everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's a good point, this was done to avoid changing the method signature but changing it isn't that heavy. will adjust.
src/sentry/preprod/models.py
Outdated
| QuerySet of PreprodArtifact objects, ordered by app_id for stable results | ||
| QuerySet of PreprodArtifact objects, deduplicated by app_id, ordered by app_id | ||
| """ | ||
| from collections import defaultdict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: AI loves to do this for some reason but keep all imports at the top unless necessary to break import loops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
…icate rows in status checks (EME-610) This commit addresses Trevor's concern in EME-610: duplicate rows appearing in status checks after reprocessing/retriggering CI pipelines. Problem: When CI is retriggered, new artifacts are created with the same app_id as existing artifacts. The previous implementation returned ALL artifacts for a commit, causing status checks to show duplicate rows (e.g., 4 rows after reprocessing instead of 2). Trevor's quote: "The status check itself isn't duplicated...The issue is the duplicated content inside the status check, after re-running it went from 2x builds -> 4x builds." Solution: Modified `get_sibling_artifacts_for_commit()` to deduplicate by app_id, matching the behavior in the Emerge codebase (BaseBuildUtils.ts:findSiblingBuilds): - Groups all artifacts by app_id - For each app_id group: - If the calling artifact is in that group, use it - Otherwise, use the earliest (oldest) artifact for that app_id - Result: Max 1 artifact per app_id, preventing duplicate rows Example with fix: - Initial upload: iOS-old, Android-old → 2 rows ✓ - Reprocess creates: iOS-new, Android-new (4 total artifacts) - When iOS-new triggers: Returns [iOS-new, Android-old] → 2 rows ✓ - When Android-new triggers: Returns [iOS-old, Android-new] → 2 rows ✓ This ensures status checks consistently show one row per app, even after multiple reprocessing runs. Changes: - Modified PreprodArtifact.get_sibling_artifacts_for_commit() to deduplicate - Added comprehensive test test_sibling_deduplication_after_reprocessing All 17 existing tests pass ✅
96f898e to
e2d9d18
Compare
- Deduplicate by (app_id, artifact_type) instead of just app_id to handle Android and iOS apps with the same app_id - Move defaultdict import to top of file - Add comment explaining why second query is needed - Add test case for Android and iOS with same app_id - Update existing test to set artifact_type fields
e2d9d18 to
835623f
Compare
Summary
Fixes EME-610: Duplicate rows appearing in GitHub status checks after reprocessing/retriggering CI pipelines.
This PR addresses only the duplicate rows issue (not the duplicate check runs issue which is in PR #103521).
Problem: Duplicate Rows After Reprocessing
Issue
When CI is retriggered or artifacts are reprocessed, new artifacts are created with the same
app_idas existing artifacts. The status check would then show ALL artifacts, resulting in duplicate rows.Example:
Root Cause
get_sibling_artifacts_for_commit()returned ALL artifacts for a commit without deduplication:Solution
Deduplicate siblings by
app_id:app_idapp_idgroup:app_idExample with fix:
[iOS-new, Android-old]→ 2 rows ✓[iOS-old, Android-new]→ 2 rows ✓Testing
Added comprehensive test
test_sibling_deduplication_after_reprocessing:ios_new.get_sibling_artifacts_for_commit()→ Returns [iOS-new, Android-old]android_new.get_sibling_artifacts_for_commit()→ Returns [iOS-old, Android-new]