Skip to content

fix: cleanup maintainers duplicates and prevent them [CM-677]#3971

Merged
mbani01 merged 2 commits intomainfrom
fix/maintainers_duplicates
Mar 27, 2026
Merged

fix: cleanup maintainers duplicates and prevent them [CM-677]#3971
mbani01 merged 2 commits intomainfrom
fix/maintainers_duplicates

Conversation

@mbani01
Copy link
Copy Markdown
Contributor

@mbani01 mbani01 commented Mar 27, 2026

This pull request addresses an issue with the uniqueness constraint on the maintainersInternal table, ensuring that duplicate maintainer records are properly prevented and cleaned up. The changes update both the database schema and the application logic to enforce uniqueness based on non-nullable columns, and remove existing duplicate entries.

Database migration and data integrity:

  • Adds a migration (V1774628604__fixMaintainersInternalUniqueIndex.sql) that:
    • Removes duplicate rows from maintainersInternal, keeping only the most recently updated entry for each (repoId, identityId, role) group.
    • Drops the old unique index that relied on nullable columns.
    • Creates a new unique index on the non-nullable columns ("repoId", "identityId", role) to ensure proper uniqueness enforcement.

Application logic update:

  • Updates the upsert_maintainer function in crud.py to use the new unique constraint by changing the ON CONFLICT clause to match on ("repoId", "identityId", role) instead of the previous set including nullable dates. The update logic also now preserves startDate if it is already set.

Note

Medium Risk
Includes a data-cleanup migration that deletes duplicate rows and changes a uniqueness constraint, which can impact existing data and upsert behavior if assumptions about roles/dates differ in production.

Overview
Fixes duplicate maintainersInternal records by replacing the ineffective unique constraint (previously relying on nullable startDate/endDate) with a new unique index on non-nullable ("repoId", "identityId", role).

Adds a migration that deduplicates existing rows (keeping the most recently updated per repo/identity/role) and updates the git integration upsert_maintainer to target the new conflict key while preserving an existing startDate, refreshing repoUrl/originalRole, and clearing endDate on update.

Written by Cursor Bugbot for commit 6295c7e. This will update automatically on new commits. Configure here.

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
@mbani01 mbani01 self-assigned this Mar 27, 2026
@mbani01 mbani01 requested review from Copilot and joanagmaia and removed request for Copilot March 27, 2026 16:48
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

joanagmaia
joanagmaia previously approved these changes Mar 27, 2026
Copy link
Copy Markdown
Contributor

@joanagmaia joanagmaia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread services/apps/git_integration/src/crowdgit/database/crud.py

-- Step 3: Create the correct unique index on non-nullable columns.
CREATE UNIQUE INDEX maintainers_internal_repo_identity_role_unique_idx
ON "maintainersInternal" ("repoId", "identityId", role);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Role changes create orphaned rows never end-dated

Medium Severity

The new unique index on ("repoId", "identityId", role) relaxes uniqueness from the prior ("repoId", "identityId") constraint. When compare_and_update_maintainers detects a role change, calling upsert_maintainer with the new role no longer conflicts with the old-role row — it inserts a new row instead. The old-role row is never end-dated because the removal loop (keyed by github_username) still sees the person in the new list. This causes orphaned active-looking rows to accumulate indefinitely. On subsequent runs, get_maintainers_for_repo returns multiple rows per person, but current_maintainers_dict (keyed by username) silently drops all but one, leading to non-deterministic comparison behavior.

Additional Locations (1)
Fix in Cursor Fix in Web

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings March 27, 2026 17:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes duplicate maintainersInternal rows by migrating uniqueness enforcement to non-nullable columns and updating the Git Integration service’s upsert logic to match the new constraint.

Changes:

  • Adds a migration to de-duplicate maintainersInternal rows and replace the existing unique index with a new unique index on ("repoId", "identityId", role).
  • Updates upsert_maintainer to use ON CONFLICT ("repoId", "identityId", role) and to preserve an existing startDate while clearing endDate on update.
  • Adds a corresponding undo migration file (currently empty).

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

File Description
services/apps/git_integration/src/crowdgit/database/crud.py Aligns application upsert conflict target and update behavior with the new DB uniqueness rule.
backend/src/database/migrations/V1774628604__fixMaintainersInternalUniqueIndex.sql Cleans up duplicates and replaces the unique index to enforce correct uniqueness semantics.
backend/src/database/migrations/U1774628604__fixMaintainersInternalUniqueIndex.sql Undo migration placeholder (empty).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +342 to +347
ON CONFLICT ("repoId", "identityId", role) DO UPDATE
SET "originalRole" = EXCLUDED."originalRole",
"repoUrl" = EXCLUDED."repoUrl",
"startDate" = COALESCE("maintainersInternal"."startDate", EXCLUDED."startDate"),
"endDate" = NULL,
"updatedAt" = NOW()
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new conflict target ("repoId", "identityId", role), upserts for the same repo/identity but a different role will insert a new row instead of updating an existing row. The current maintainer sync logic (e.g., building current_maintainers_dict keyed only by github_username) appears to assume at most one row per username, so allowing multiple roles can lead to nondeterministic behavior and leave multiple active rows when a role changes. If the intent is still “one row per repo/identity”, keep the conflict target aligned to that (and update role in the DO UPDATE). If the intent is “one row per repo/identity/role”, update the sync logic to key by (username, role) and/or end-date the old role row when a role changes so only one active row remains per username/role.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +10
-- The unique index on ("repoId", "identityId", "startDate", "endDate") fails to
-- prevent duplicates because startDate and endDate are NULL on insert, and
-- PostgreSQL treats NULL != NULL in unique indexes — so the constraint never fires.
--
-- Fix:
-- Unique index on ("repoId", "identityId", role). All three columns are NOT NULL,
-- so the uniqueness check always works. startDate/endDate become purely informational.
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration header comment says the existing unique index is on ("repoId", "identityId", "startDate", "endDate"), but the index being dropped (maintainers_internal_repo_identity_unique_idx) is defined in this repo’s earlier migrations as ("repoId", "identityId") only. Please update the comment to match the actual index being replaced (and the real reason it allowed the problematic duplicates), so future readers don’t assume the prior schema included start/end dates in the uniqueness constraint.

Suggested change
-- The unique index on ("repoId", "identityId", "startDate", "endDate") fails to
-- prevent duplicates because startDate and endDate are NULL on insert, and
-- PostgreSQL treats NULL != NULL in unique indexes — so the constraint never fires.
--
-- Fix:
-- Unique index on ("repoId", "identityId", role). All three columns are NOT NULL,
-- so the uniqueness check always works. startDate/endDate become purely informational.
-- The existing unique index on ("repoId", "identityId") does not match how we
-- use this table in practice, and over time duplicate rows for the same
-- ("repoId", "identityId", role) were inserted and persisted.
--
-- Fix:
-- Enforce uniqueness on ("repoId", "identityId", role). All three columns are
-- NOT NULL, so the uniqueness check always works, and startDate/endDate remain
-- purely informational.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

"repoUrl" = EXCLUDED."repoUrl",
"startDate" = COALESCE("maintainersInternal"."startDate", EXCLUDED."startDate"),
"endDate" = NULL,
"updatedAt" = NOW()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Role changes create orphaned active maintainer records

High Severity

Changing ON CONFLICT from ("repoId", "identityId", ...) to ("repoId", "identityId", role) means that when a maintainer's role changes, the upsert no longer conflicts with the existing record — it silently inserts a new row instead of updating the old one. The previous unique index on ("repoId", "identityId") prevented two records for the same person in the same repo. Now the relaxed constraint allows it, but compare_and_update_maintainers never calls set_maintainer_end_date on the old-role record, leaving it active indefinitely. This produces orphaned duplicate maintainer records — one per old role — with no endDate.

Additional Locations (1)
Fix in Cursor Fix in Web

@mbani01 mbani01 merged commit 0c10504 into main Mar 27, 2026
19 checks passed
@mbani01 mbani01 deleted the fix/maintainers_duplicates branch March 27, 2026 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants