Skip to content

Conversation

@runningcode
Copy link
Contributor

@runningcode runningcode commented Nov 18, 2025

Summary

Fixes EME-610: Duplicate rows appearing in GitHub status checks after reprocessing/retriggering CI pipelines.

This PR addresses only the duplicate rows issue (not the duplicate check runs issue which is in PR #103521).

Problem: Duplicate Rows After Reprocessing

Issue

When CI is retriggered or artifacts are reprocessed, new artifacts are created with the same app_id as existing artifacts. The status check would then show ALL artifacts, resulting in duplicate rows.

Example:

  • Initial upload: iOS-old, Android-old → 2 rows in status check ✓
  • Reprocess: iOS-new, Android-new created (4 total artifacts for same commit)
  • Status check now shows: iOS-old, iOS-new, Android-old, Android-new → 4 rows

Root Cause

get_sibling_artifacts_for_commit() returned ALL artifacts for a commit without deduplication:

return PreprodArtifact.objects.filter(
    commit_comparison=self.commit_comparison,
    project__organization_id=self.project.organization_id,
).order_by("app_id")

Solution

Deduplicate siblings by app_id:

  1. Group all artifacts by app_id
  2. For each app_id group:
    • If the calling artifact is in that group → use it
    • Otherwise → use the earliest (oldest) artifact for that app_id
  3. Result: Max 1 artifact per app_id

Example with fix:

  • When iOS-new triggers: Returns [iOS-new, Android-old]2 rows
  • When Android-new triggers: Returns [iOS-old, Android-new]2 rows
  • Never shows all 4 artifacts

Testing

Added comprehensive test test_sibling_deduplication_after_reprocessing:

  • Creates initial artifacts: iOS-old, Android-old
  • Simulates reprocessing: iOS-new, Android-new (4 total)
  • Verifies deduplication:
    • ios_new.get_sibling_artifacts_for_commit() → Returns [iOS-new, Android-old]
    • android_new.get_sibling_artifacts_for_commit() → Returns [iOS-old, Android-new]
    • Never returns all 4 artifacts

@linear
Copy link

linear bot commented Nov 18, 2025

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 18, 2025
@codecov
Copy link

codecov bot commented Nov 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #103528   +/-   ##
========================================
  Coverage   80.58%    80.58%           
========================================
  Files        9260      9261    +1     
  Lines      395362    395382   +20     
  Branches    25207     25207           
========================================
+ Hits       318606    318623   +17     
- Misses      76326     76329    +3     
  Partials      430       430           

@runningcode runningcode force-pushed the no/fix-reprocess-duplicate-rows-eme-610 branch from a17ecbe to 96f898e Compare November 18, 2025 17:42
@runningcode runningcode marked this pull request as ready for review November 18, 2025 17:47
@runningcode runningcode requested a review from a team as a code owner November 18, 2025 17:47
Copy link
Member

@trevor-e trevor-e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this looks good other than the platform change!


artifacts_by_app_id = defaultdict(list)
for artifact in all_artifacts:
artifacts_by_app_id[artifact.app_id].append(artifact)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also needs to deduplicate by platform. It's valid for a user to upload an Android and iOS build that have the same app ID. And can you please add a test case for this scenario. 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch. Test added!

else:
selected_ids.append(artifacts[0].id)

return PreprodArtifact.objects.filter(id__in=selected_ids).order_by("app_id")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a second query needed afterwards, haven't we already fetched everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's a good point, this was done to avoid changing the method signature but changing it isn't that heavy. will adjust.

QuerySet of PreprodArtifact objects, ordered by app_id for stable results
QuerySet of PreprodArtifact objects, deduplicated by app_id, ordered by app_id
"""
from collections import defaultdict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: AI loves to do this for some reason but keep all imports at the top unless necessary to break import loops

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

…icate rows in status checks (EME-610)

This commit addresses Trevor's concern in EME-610: duplicate rows appearing in
status checks after reprocessing/retriggering CI pipelines.

Problem:
When CI is retriggered, new artifacts are created with the same app_id as
existing artifacts. The previous implementation returned ALL artifacts for a
commit, causing status checks to show duplicate rows (e.g., 4 rows after
reprocessing instead of 2).

Trevor's quote: "The status check itself isn't duplicated...The issue is the
duplicated content inside the status check, after re-running it went from 2x
builds -> 4x builds."

Solution:
Modified `get_sibling_artifacts_for_commit()` to deduplicate by app_id,
matching the behavior in the Emerge codebase (BaseBuildUtils.ts:findSiblingBuilds):
- Groups all artifacts by app_id
- For each app_id group:
  - If the calling artifact is in that group, use it
  - Otherwise, use the earliest (oldest) artifact for that app_id
- Result: Max 1 artifact per app_id, preventing duplicate rows

Example with fix:
- Initial upload: iOS-old, Android-old → 2 rows ✓
- Reprocess creates: iOS-new, Android-new (4 total artifacts)
- When iOS-new triggers: Returns [iOS-new, Android-old] → 2 rows ✓
- When Android-new triggers: Returns [iOS-old, Android-new] → 2 rows ✓

This ensures status checks consistently show one row per app, even after
multiple reprocessing runs.

Changes:
- Modified PreprodArtifact.get_sibling_artifacts_for_commit() to deduplicate
- Added comprehensive test test_sibling_deduplication_after_reprocessing

All 17 existing tests pass ✅
- Deduplicate by (app_id, artifact_type) instead of just app_id to handle
  Android and iOS apps with the same app_id
- Move defaultdict import to top of file
- Add comment explaining why second query is needed
- Add test case for Android and iOS with same app_id
- Update existing test to set artifact_type fields
@runningcode runningcode merged commit a8dd765 into master Nov 20, 2025
66 checks passed
@runningcode runningcode deleted the no/fix-reprocess-duplicate-rows-eme-610 branch November 20, 2025 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants