fix: cleanup maintainers duplicates and prevent them [CM-677]#3971
fix: cleanup maintainers duplicates and prevent them [CM-677]#3971
Conversation
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
|
|
|
|
||
| -- Step 3: Create the correct unique index on non-nullable columns. | ||
| CREATE UNIQUE INDEX maintainers_internal_repo_identity_role_unique_idx | ||
| ON "maintainersInternal" ("repoId", "identityId", role); |
There was a problem hiding this comment.
Role changes create orphaned rows never end-dated
Medium Severity
The new unique index on ("repoId", "identityId", role) relaxes uniqueness from the prior ("repoId", "identityId") constraint. When compare_and_update_maintainers detects a role change, calling upsert_maintainer with the new role no longer conflicts with the old-role row — it inserts a new row instead. The old-role row is never end-dated because the removal loop (keyed by github_username) still sees the person in the new list. This causes orphaned active-looking rows to accumulate indefinitely. On subsequent runs, get_maintainers_for_repo returns multiple rows per person, but current_maintainers_dict (keyed by username) silently drops all but one, leading to non-deterministic comparison behavior.
Additional Locations (1)
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes duplicate maintainersInternal rows by migrating uniqueness enforcement to non-nullable columns and updating the Git Integration service’s upsert logic to match the new constraint.
Changes:
- Adds a migration to de-duplicate
maintainersInternalrows and replace the existing unique index with a new unique index on("repoId", "identityId", role). - Updates
upsert_maintainerto useON CONFLICT ("repoId", "identityId", role)and to preserve an existingstartDatewhile clearingendDateon update. - Adds a corresponding undo migration file (currently empty).
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| services/apps/git_integration/src/crowdgit/database/crud.py | Aligns application upsert conflict target and update behavior with the new DB uniqueness rule. |
| backend/src/database/migrations/V1774628604__fixMaintainersInternalUniqueIndex.sql | Cleans up duplicates and replaces the unique index to enforce correct uniqueness semantics. |
| backend/src/database/migrations/U1774628604__fixMaintainersInternalUniqueIndex.sql | Undo migration placeholder (empty). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ON CONFLICT ("repoId", "identityId", role) DO UPDATE | ||
| SET "originalRole" = EXCLUDED."originalRole", | ||
| "repoUrl" = EXCLUDED."repoUrl", | ||
| "startDate" = COALESCE("maintainersInternal"."startDate", EXCLUDED."startDate"), | ||
| "endDate" = NULL, | ||
| "updatedAt" = NOW() |
There was a problem hiding this comment.
With the new conflict target ("repoId", "identityId", role), upserts for the same repo/identity but a different role will insert a new row instead of updating an existing row. The current maintainer sync logic (e.g., building current_maintainers_dict keyed only by github_username) appears to assume at most one row per username, so allowing multiple roles can lead to nondeterministic behavior and leave multiple active rows when a role changes. If the intent is still “one row per repo/identity”, keep the conflict target aligned to that (and update role in the DO UPDATE). If the intent is “one row per repo/identity/role”, update the sync logic to key by (username, role) and/or end-date the old role row when a role changes so only one active row remains per username/role.
| -- The unique index on ("repoId", "identityId", "startDate", "endDate") fails to | ||
| -- prevent duplicates because startDate and endDate are NULL on insert, and | ||
| -- PostgreSQL treats NULL != NULL in unique indexes — so the constraint never fires. | ||
| -- | ||
| -- Fix: | ||
| -- Unique index on ("repoId", "identityId", role). All three columns are NOT NULL, | ||
| -- so the uniqueness check always works. startDate/endDate become purely informational. |
There was a problem hiding this comment.
The migration header comment says the existing unique index is on ("repoId", "identityId", "startDate", "endDate"), but the index being dropped (maintainers_internal_repo_identity_unique_idx) is defined in this repo’s earlier migrations as ("repoId", "identityId") only. Please update the comment to match the actual index being replaced (and the real reason it allowed the problematic duplicates), so future readers don’t assume the prior schema included start/end dates in the uniqueness constraint.
| -- The unique index on ("repoId", "identityId", "startDate", "endDate") fails to | |
| -- prevent duplicates because startDate and endDate are NULL on insert, and | |
| -- PostgreSQL treats NULL != NULL in unique indexes — so the constraint never fires. | |
| -- | |
| -- Fix: | |
| -- Unique index on ("repoId", "identityId", role). All three columns are NOT NULL, | |
| -- so the uniqueness check always works. startDate/endDate become purely informational. | |
| -- The existing unique index on ("repoId", "identityId") does not match how we | |
| -- use this table in practice, and over time duplicate rows for the same | |
| -- ("repoId", "identityId", role) were inserted and persisted. | |
| -- | |
| -- Fix: | |
| -- Enforce uniqueness on ("repoId", "identityId", role). All three columns are | |
| -- NOT NULL, so the uniqueness check always works, and startDate/endDate remain | |
| -- purely informational. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
| "repoUrl" = EXCLUDED."repoUrl", | ||
| "startDate" = COALESCE("maintainersInternal"."startDate", EXCLUDED."startDate"), | ||
| "endDate" = NULL, | ||
| "updatedAt" = NOW() |
There was a problem hiding this comment.
Role changes create orphaned active maintainer records
High Severity
Changing ON CONFLICT from ("repoId", "identityId", ...) to ("repoId", "identityId", role) means that when a maintainer's role changes, the upsert no longer conflicts with the existing record — it silently inserts a new row instead of updating the old one. The previous unique index on ("repoId", "identityId") prevented two records for the same person in the same repo. Now the relaxed constraint allows it, but compare_and_update_maintainers never calls set_maintainer_end_date on the old-role record, leaving it active indefinitely. This produces orphaned duplicate maintainer records — one per old role — with no endDate.


This pull request addresses an issue with the uniqueness constraint on the
maintainersInternaltable, ensuring that duplicate maintainer records are properly prevented and cleaned up. The changes update both the database schema and the application logic to enforce uniqueness based on non-nullable columns, and remove existing duplicate entries.Database migration and data integrity:
V1774628604__fixMaintainersInternalUniqueIndex.sql) that:maintainersInternal, keeping only the most recently updated entry for each (repoId,identityId,role) group.("repoId", "identityId", role)to ensure proper uniqueness enforcement.Application logic update:
upsert_maintainerfunction incrud.pyto use the new unique constraint by changing theON CONFLICTclause to match on("repoId", "identityId", role)instead of the previous set including nullable dates. The update logic also now preservesstartDateif it is already set.Note
Medium Risk
Includes a data-cleanup migration that deletes duplicate rows and changes a uniqueness constraint, which can impact existing data and upsert behavior if assumptions about roles/dates differ in production.
Overview
Fixes duplicate
maintainersInternalrecords by replacing the ineffective unique constraint (previously relying on nullablestartDate/endDate) with a new unique index on non-nullable("repoId", "identityId", role).Adds a migration that deduplicates existing rows (keeping the most recently updated per repo/identity/role) and updates the git integration
upsert_maintainerto target the new conflict key while preserving an existingstartDate, refreshingrepoUrl/originalRole, and clearingendDateon update.Written by Cursor Bugbot for commit 6295c7e. This will update automatically on new commits. Configure here.