fix(groups): preserve authoritative names and stable gid aliases by lwittwer · Pull Request #35 · lrhodin/imessage

lwittwer · 2026-03-27T00:06:31Z

Title

fix(groups): preserve authoritative names and stable gid aliases

Summary

This fixes two related group-portal problems: custom group names being clobbered on restart, and gid alias / participant matching drifting onto the wrong portal.

Changes

treat CloudKit / cached group names as authoritative and avoid participant-fallback renames for existing rooms
tighten participant-set matching so diff=1 only passes for self-only differences
harden gid alias eviction and stale-cache self-healing
fold the participant-list helper refactor into the final behavior change for a smaller PR surface

Notes

Scope is the shared group portal logic; no chat.db-only fixes are included here.

mackid1993

Risk Analysis

1. participantSetsMatch signature change is a breaking semantic tightening (medium risk)

The old participantSetsMatch(a, b) allowed any single-member difference (diff <= 1). The new participantSetsMatch(a, b, selfHandle) only allows diff == 1 when the differing member is normalizedSelf. This is the correct fix for the reported bug (non-self differences wrongly passing), but it is a behavioral change that flows through 6+ call sites simultaneously:

resolveExistingGroupPortalID (2 paths)
resolveExistingGroupByGid steps 2, 3, and 4
findPortalIDsByParticipants (cloud_chat DB)
guidCacheMatchIsStale (new helper, wraps participantSetsMatch)

If a call site passes an empty selfHandle, participantSetsMatch now returns false for any diff >= 1 (because of the normalizedSelf != "" guard). This is currently safe. The only path that could supply an empty selfHandle is guidCacheMatchIsStale, which returns false (not stale) when portalIDStr contains no comma, so participantSetsMatch is never reached with an empty handle. However, a future caller that passes "" will silently get strict-equality semantics, which could cause hard-to-diagnose portal creation failures. Consider adding a brief doc comment on participantSetsMatch noting this behavior.
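Below is a minimal sketch of the helper as this item describes it: the selfHandle version under review here, which is not necessarily the merged code. It includes the suggested doc comment and the diffMember note from the suggested changes. `normalize` is a hypothetical simplification of normalizeIdentifierForPortalID.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize is a hypothetical stand-in for normalizeIdentifierForPortalID;
// it only lowercases and trims, whereas the real helper also handles
// tel:/mailto: prefixes.
func normalize(id string) string {
	return strings.ToLower(strings.TrimSpace(id))
}

// participantSetsMatch reports whether a and b contain the same members.
// Exactly one differing member is tolerated, but only if that member is
// the user's own handle (selfHandle).
//
// If selfHandle is empty, no member can count as self, so any difference
// causes a mismatch: callers passing "" get strict set equality.
func participantSetsMatch(a, b []string, selfHandle string) bool {
	normalizedSelf := normalize(selfHandle)
	setA := make(map[string]bool, len(a))
	for _, m := range a {
		setA[normalize(m)] = true
	}
	setB := make(map[string]bool, len(b))
	for _, m := range b {
		setB[normalize(m)] = true
	}

	diff := 0
	diffMember := "" // only meaningful when diff == 1
	for m := range setA {
		if !setB[m] {
			diff++
			diffMember = m
		}
	}
	for m := range setB {
		if !setA[m] {
			diff++
			diffMember = m
		}
	}

	if diff == 0 {
		return true
	}
	return diff == 1 && normalizedSelf != "" && diffMember == normalizedSelf
}

func main() {
	group := []string{"tel:+15550001", "tel:+15550002"}
	withSelf := []string{"tel:+15559999", "tel:+15550001", "tel:+15550002"}
	fmt.Println(participantSetsMatch(group, withSelf, "tel:+15559999")) // true
	fmt.Println(participantSetsMatch(group, withSelf, ""))              // false
}
```

The second call shows the non-obvious case: the same pair of sets stops matching once selfHandle is empty.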

2. resolveExistingGroupByGid stale-guid self-heal acquires write lock inside read-lock iteration (medium risk)

In step 1 of resolveExistingGroupByGid, the code iterates imGroupGuids under RLock. On a stale match it releases the RLock, acquires the write Lock to delete the entry, releases the write Lock, does DB work, then re-acquires the RLock and continues. This pattern is already used elsewhere in the file (pre-existing), so it is consistent. However, the map can be mutated while the loop is paused, so Go's map iteration rules apply: an entry removed before the loop reaches it is not produced, and an entry inserted during the unlocked window may or may not be produced. In practice this is benign. In the worst case, a stale entry that another goroutine re-inserts is visited again, or a newly inserted one is skipped and caught on the next message. It is still worth noting.
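The unlock/relock pattern can be sketched as follows. `guidCache`, `healStale`, and the `persist` callback are hypothetical names used only for illustration. The comments restate the Go spec's rules for maps modified during a range loop. The sketch also re-checks the value under the write lock before deleting, which is an added assumption rather than something the review confirms.

```go
package main

import (
	"fmt"
	"sync"
)

// guidCache is a hypothetical stand-in for imGroupGuids and imGroupGuidsMu.
type guidCache struct {
	mu    sync.RWMutex
	guids map[string]string // portal ID -> sender guid
}

// healStale ranges over the cache under the read lock. For each stale entry it
// drops the read lock, deletes the entry under the write lock (only if the
// value is unchanged), runs persist with no lock held (standing in for the DB
// metadata fix-up), then re-takes the read lock and continues.
// Per the Go spec, deleting map entries during range is allowed: entries
// removed before being reached are not produced, and entries added while the
// lock was released may or may not be produced.
// isStale runs under the read lock, so it must not lock c.mu itself.
func (c *guidCache) healStale(isStale func(portalID string) bool, persist func(portalID string)) int {
	healed := 0
	c.mu.RLock()
	for portalID, guid := range c.guids {
		if !isStale(portalID) {
			continue
		}
		c.mu.RUnlock()

		c.mu.Lock()
		removed := false
		if cur, ok := c.guids[portalID]; ok && cur == guid {
			delete(c.guids, portalID)
			removed = true
		}
		c.mu.Unlock()

		if removed {
			persist(portalID)
			healed++
		}

		c.mu.RLock()
	}
	c.mu.RUnlock()
	return healed
}

func main() {
	c := &guidCache{guids: map[string]string{
		"gid:a":         "guid-1",
		"tel:+1,tel:+2": "guid-2",
		"gid:b":         "guid-3",
	}}
	n := c.healStale(func(id string) bool { return id == "gid:b" }, func(string) {})
	fmt.Println(n, len(c.guids)) // 1 2
}
```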

3. handleReadReceipt / handleTyping staleness check calls guidCacheMatchIsStale while holding imGroupGuidsMu.RLock (low risk)

The diff shows guidCacheMatchIsStale is called inside the for portalIDStr, guid := range c.imGroupGuids loop. guidCacheMatchIsStale itself does NOT acquire imGroupGuidsMu (it only reads the portal ID string and calls normalizeIdentifierForPortalID + participantSetsMatch), so there is no deadlock. This is safe.

4. makePortalKey gid-alias staleness eviction (low risk)

The new code in makePortalKey canonicalizes participants before the staleness check. The guard len(participants) > 0 && len(canonical) > 0 prevents eviction when no participant info is available (e.g., some APNs messages). The compare-before-delete pattern (if c.gidAliases[gidID] == aliasedID) is correct for preventing races. Good defensive coding.
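The following sketch shows the compare-before-delete eviction and its guard. `aliasTable`, `evictIfUnchanged`, and `shouldCheckStaleness` are hypothetical names, and the sketch assumes a plain mutex guards the alias map.

```go
package main

import (
	"fmt"
	"sync"
)

// aliasTable is a hypothetical stand-in for c.gidAliases and its lock.
type aliasTable struct {
	mu      sync.Mutex
	aliases map[string]string // gid portal ID -> aliased portal ID
}

// evictIfUnchanged deletes the alias for gidID only if it still points at
// aliasedID, the value that was judged stale. If another goroutine has
// re-pointed the alias since then, the newer value is kept.
func (t *aliasTable) evictIfUnchanged(gidID, aliasedID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if cur, ok := t.aliases[gidID]; ok && cur == aliasedID {
		delete(t.aliases, gidID)
		return true
	}
	return false
}

// shouldCheckStaleness mirrors the guard described above: without participant
// info (e.g. some APNs messages) staleness cannot be judged, so nothing is evicted.
func shouldCheckStaleness(participants, canonical []string) bool {
	return len(participants) > 0 && len(canonical) > 0
}

func main() {
	t := &aliasTable{aliases: map[string]string{"gid:x": "gid:old"}}
	judgedStale := "gid:old"       // value that failed the staleness check
	t.aliases["gid:x"] = "gid:new" // simulated concurrent re-point (single goroutine here)
	fmt.Println(t.evictIfUnchanged("gid:x", judgedStale)) // false
	fmt.Println(t.aliases["gid:x"])                       // gid:new
}
```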

Complexity Review

1. buildCanonicalParticipantList extraction — good simplification

Extracting the repeated normalize/dedup/sort logic into a shared helper reduces four copy-pasted blocks to one. Clean and correct.

2. resolveGroupName returning (string, bool) — appropriate

The authoritative flag is well-motivated: it prevents refreshGroupPortalNamesFromContacts from clobbering user-set names with contact-derived fallbacks. The change is minimal and targeted.
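Here is a sketch of the rename decision this flag enables, as this review describes it. `shouldApplyName` is a hypothetical helper, not code from the PR. Later commits in this thread revisit the rule.

```go
package main

import "fmt"

// shouldApplyName decides whether a refresh pass should rename a portal.
// candidate comes from a resolveGroupName-style lookup. authoritative is true
// when it came from the in-memory cache or a CloudKit display name, and false
// when it is a contact-derived fallback built from participant names.
func shouldApplyName(current, candidate string, authoritative bool) bool {
	if candidate == "" || candidate == current {
		return false // nothing to do
	}
	if !authoritative && current != "" {
		return false // never let a fallback clobber an existing name
	}
	return true
}

func main() {
	fmt.Println(shouldApplyName("Book Club", "Alice, Bob", false))    // false
	fmt.Println(shouldApplyName("", "Alice, Bob", false))             // true
	fmt.Println(shouldApplyName("Book Club", "Reading Group", true)) // true
}
```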

3. resolveExistingGroupByGid step 3 now prefers exact over fuzzy matches — mild added complexity, justified

The old code returned the first match (arbitrary map iteration order). The new code collects exactMatch/fuzzyMatch and prefers exact. This is a behavioral improvement that avoids non-deterministic portal selection. The two extra variables and break/continue logic are straightforward.
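The exact-over-fuzzy preference can be sketched like this. `pickPortal` and `sameMembers` are hypothetical. The fuzzy predicate is passed in; in the PR, that role is assumed to be played by the participantSetsMatch-based tolerance check.

```go
package main

import "fmt"

// sameMembers reports whether a and b hold the same set of identifiers.
func sameMembers(a, b []string) bool {
	set := make(map[string]bool, len(a))
	for _, m := range a {
		set[m] = true
	}
	seen := make(map[string]bool, len(b))
	for _, m := range b {
		if !set[m] {
			return false
		}
		seen[m] = true
	}
	return len(seen) == len(set)
}

// pickPortal scans an index of portal ID -> members. It returns an exact
// member-set match if one exists, otherwise the first fuzzy match seen.
// Go randomizes map iteration order, so returning the first hit of either
// kind is non-deterministic when both kinds are present.
func pickPortal(index map[string][]string, members []string, fuzzy func(a, b []string) bool) (string, bool) {
	var exactMatch, fuzzyMatch string
	for portalID, candidate := range index {
		if sameMembers(candidate, members) {
			exactMatch = portalID
			break // an exact match always wins
		}
		if fuzzyMatch == "" && fuzzy(candidate, members) {
			fuzzyMatch = portalID
		}
	}
	if exactMatch != "" {
		return exactMatch, true
	}
	if fuzzyMatch != "" {
		return fuzzyMatch, true
	}
	return "", false
}

func main() {
	index := map[string][]string{
		"tel:+1,tel:+2,tel:+3": {"tel:+1", "tel:+2", "tel:+3"},
		"tel:+1,tel:+2":        {"tel:+1", "tel:+2"},
	}
	// Toy fuzzy predicate for illustration only: sizes differ by one.
	oneOff := func(a, b []string) bool { return len(a)-len(b) == 1 || len(b)-len(a) == 1 }
	id, _ := pickPortal(index, []string{"tel:+1", "tel:+2"}, oneOff)
	fmt.Println(id) // tel:+1,tel:+2
}
```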

4. guidCacheMatchIsStale helper — clean abstraction

Used in 4 call sites (handleReadReceipt, handleTyping, resolveExistingGroupByGid, makePortalKey/makeDeletePortalKey). Logic is simple and well-documented.

5. Step 2 in resolveExistingGroupByGid now excludes comma-based portals — intentional narrowing

The comment explains that step 3 (groupPortalIndex) handles comma-based portals more reliably. This is a meaningful semantic change but the comment makes the rationale clear. No concern.

Regression Potential

1. False-negative portal matching (medium regression potential)

The tighter participantSetsMatch could cause a portal not to be found when it previously would have been, if the single differing member is NOT self. This is the intended fix (two different groups with one non-self member difference should NOT match), but if there are edge cases where Apple reports self with an unexpected identifier that doesn't match selfHandle after normalization, a valid alias could be rejected. The normalizeIdentifierForPortalID(selfHandle) call in participantSetsMatch handles tel:/mailto: normalization, but watch for:

Edge case: user has multiple Apple IDs or handles that aren't in isMyHandle — buildCanonicalParticipantList filters those out, but participantSetsMatch's self check only knows the single selfHandle.

2. handleReadReceipt / handleTyping case-insensitive guid comparison + staleness check

The diff changes guid == *msg.SenderGuid to strings.EqualFold(guid, *msg.SenderGuid). This is a good hardening (guids should be case-insensitive UUIDs), but it widens the match set, which is then narrowed by the new staleness check. Net effect is more correct, but if a message arrives with no participants (len(rawParticipants) == 0), guidCacheMatchIsStale returns false (not stale), so the match proceeds as before. No regression.

3. refreshGroupPortalNamesFromContacts now skips non-authoritative renames for portals with existing names

This is the core name-clobbering fix. Regression scenario: a portal has a CloudKit/cached name that was actually stale (e.g., from a previous incorrect sync), and a contact-based name would have been more correct. Now the stale authoritative name persists. This seems like the right trade-off since real renames come through APNs, but monitor for stale name reports.

4. handleParticipantChange guid cache guard

The new guard prevents caching sender_guid for portals where the gid doesn't match. This could cause a cache miss later in handleReadReceipt/handleTyping for messages that would previously have had a cached guid. In practice, the guard is narrow (only blocks non-comma, non-matching-gid portals), so impact should be minimal.

Suggested Changes

Minor: participantSetsMatch tracks only the LAST diffMember when diff > 1. When both A-not-in-B and B-not-in-A contribute a diff member, diffMember gets overwritten. This is fine because diff > 1 returns false regardless, but the variable name diffMember (singular) is slightly misleading. No code change needed, but a comment like // diffMember only meaningful when diff == 1 would clarify.
Consider: add a unit test for participantSetsMatch with empty selfHandle. The empty-selfHandle-means-strict-equality semantics is non-obvious and could surprise future callers.
Nit: makeDeletePortalKey stale-alias eviction (lines around the handleDeleteChat path) only evicts the alias but doesn't do the DB metadata self-heal that resolveExistingGroupByGid does. This is probably fine for delete operations (the portal is being deleted anyway), but worth documenting the asymmetry.

Overall Assessment

This is a well-targeted fix for two real problems (name clobbering and gid alias drift). The buildCanonicalParticipantList extraction and guidCacheMatchIsStale helper reduce duplication. The tighter participantSetsMatch semantics are correct but represent a meaningful behavioral change that should be monitored after deploy. The lock handling follows existing patterns in the codebase. Good PR.

mackid1993 · 2026-03-27T00:36:02Z

Triage Summary

PR	Title	Risk	Recommendation
#32	vCard group prefix strip	Low	Approve — clean 4-line fix
#34	Canonicalize DM senders in backfill	Low	Approve — minor follow-up for tapback sender
#37	HEIC decode when conversion disabled	Low	Approve — small, well-scoped
#35	Preserve group names & stable gid aliases	Medium	Approve with monitoring
#36	Target replies/tapbacks to correct part	Medium	Approve with caveat — verify transition behavior
#33	Stabilize SMS/RCS portal handling	Medium-High	Request changes — has actionable bugs

This PR (#35): Good refactor that reduces complexity. Monitor post-deploy for: tighter participantSetsMatch semantics (could false-negative if Apple reports self with unexpected identifier), and stale authoritative names persisting if APNs rename events are missed.

mackid1993

Second Risk Review

The first review by mackid1993 was thorough. I focused on areas that may have been under-examined.

1. Missing test coverage for `participantSetsMatch` semantic change (notable gap)

There are no unit tests for participantSetsMatch anywhere in the repo (pkg/connector/*_test.go has no coverage for it). The signature and behavior change from diff <= 1 (any member) to diff == 1 && diffMember == normalizedSelf is the highest-impact semantic change in this PR. It affects 5+ call sites across portal resolution, guid caching, read receipts, and typing indicators. Two behaviors are non-obvious edge cases that really should have test coverage before merge: the empty-selfHandle strict-equality fallback (first review, risk item 1) and the diffMember overwrite-on-multi-diff behavior (first review, suggested changes).

2. `resolveExistingGroupByGid` step 1: stale-heal does synchronous DB write on the message-handling hot path (performance concern)

The new stale-guid self-heal in step 1 does GetExistingPortalByKey + metadata mutation + stalePortal.Save(ctx) synchronously during makePortalKey resolution. That is on the critical path for every incoming group message routed through resolveExistingGroupByGid, so a slow DB or a large portal table adds latency to message delivery. The comment says "MUST be synchronous" because portalToConversation would otherwise re-poison the cache, which is valid. The consequence is that the message that discovers a stale entry pays for a DB round-trip before it is delivered. That pass deletes the cache entry and clears the stale metadata, so it fires only once per stale entry. The amortized cost is therefore fine, but worth noting.

3. `guidCacheMatchIsStale` passes raw `c.handle` to `participantSetsMatch` — normalization happens inside, but the portal ID parts are pre-normalized (subtle correctness note)

In guidCacheMatchIsStale, portalParts comes from splitting a comma-based portal ID, which is already in canonical/normalized form. incomingParts are normalized from raw participants. c.handle, however, is passed directly to participantSetsMatch, which calls normalizeIdentifierForPortalID(selfHandle) internally. This is correct because the normalization is idempotent: if c.handle is ever set to an already-normalized value (e.g., tel:+1234567890), normalizing it a second time yields the same string. No bug here, just confirming.
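This argument rests on idempotence. The hypothetical normalizer below, `normalizeHandle`, illustrates why it holds; the real normalizeIdentifierForPortalID rules are not shown in this thread. Inputs that already carry a tel: or mailto: prefix pass through unchanged, so a second pass is a no-op.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeHandle is a hypothetical normalizer in the spirit of
// normalizeIdentifierForPortalID: it lowercases, then adds a mailto: prefix to
// bare email addresses and a tel: prefix to bare +E.164 numbers. Inputs that
// already carry a prefix pass through unchanged, which makes it idempotent.
func normalizeHandle(id string) string {
	id = strings.ToLower(strings.TrimSpace(id))
	switch {
	case strings.HasPrefix(id, "tel:"), strings.HasPrefix(id, "mailto:"):
		return id
	case strings.Contains(id, "@"):
		return "mailto:" + id
	case strings.HasPrefix(id, "+"):
		return "tel:" + id
	default:
		return id
	}
}

func main() {
	for _, h := range []string{"+15551234567", "tel:+15551234567", "Me@Example.com"} {
		once := normalizeHandle(h)
		fmt.Println(once, normalizeHandle(once) == once)
	}
}
```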

4. `handleParticipantChange` guid cache guard could silently drop valid cache entries (low risk)

The new guard:

isOwnGidPortal := portalIDStr == "gid:"+strings.ToLower(*msg.SenderGuid)
if strings.Contains(portalIDStr, ",") || isOwnGidPortal {

This means if finalPortalKey resolved to a different gid: portal (e.g., via alias resolution), the sender_guid is NOT cached for it. This prevents cache poisoning (good), but it also means that after an alias resolves a gid portal to a different gid portal, the guid cache won't be populated, and future handleReadReceipt/handleTyping cache lookups for that guid will miss. They'll fall through to other resolution paths, so correctness is maintained, but with an extra DB lookup each time.

5. `resolveExistingGroupByGid` step 2 narrowing — correct but asymmetric with `resolveExistingGroupPortalID`

Step 2 now only checks gid: portals in imGroupParticipants, deferring comma-based portals to step 3 (groupPortalIndex). However, resolveExistingGroupPortalID (the non-gid path) still uses groupPortalIndex with the old diff-tolerance logic augmented by the new participantSetsMatch guard. The two resolution paths now have subtly different matching behavior for the same participant set. This is intentional per the comments, but makes the resolution logic harder to reason about as a whole.

6. No issues found with lock ordering or deadlocks

All guidCacheMatchIsStale calls from within imGroupGuidsMu.RLock sections are safe — the helper acquires no locks. The RLock-release-WriteLock-release-RLock pattern in resolveExistingGroupByGid step 1 follows existing conventions in the codebase.

Summary

The main actionable item is adding unit tests for participantSetsMatch before merge — the semantic change is significant and the function is pure (no dependencies), making it trivial to test. The rest of the findings are low-risk observations. The overall approach is sound.

mackid1993 · 2026-03-27T00:47:30Z

Proposed Fixes for PR #35

Fix 1: Unit tests for `participantSetsMatch` (must-have)

Create pkg/connector/cloud_backfill_store_test.go with table-driven tests covering the regression boundary between old (diff <= 1) and new (diff == 1 && member is self) behavior.

Fix 2: Async DB write in `resolveExistingGroupByGid` step 1 (nice-to-have)

The synchronous GetExistingPortalByKey + portal.Save sits on the hot message-delivery path. The in-memory cache is already cleared, so the DB write is only for durability and can be async.

Fix 3: Guid cache guard in `handleParticipantChange` misses alias-resolved portals (should-fix)

When finalPortalKey.ID is a gid: portal resolved via gidAliases, isOwnGidPortal is false. Widen from isOwnGidPortal to isGidPortal (strings.HasPrefix(portalIDStr, "gid:")) to cover alias-resolved portals.

See commit(s) for implementation.
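The difference between the original guard and the widened one in Fix 3 can be sketched as a small predicate. `shouldCacheSenderGuid` and its `ownOnly` switch are hypothetical. The two string checks mirror the snippets quoted in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldCacheSenderGuid sketches the guard in handleParticipantChange that
// decides whether to remember sender_guid -> portal ID.
// ownOnly=true reproduces the original guard: comma portals, or the gid:
// portal derived from this very guid. ownOnly=false is the widened
// isGidPortal form, which also accepts alias-resolved gid: portals.
func shouldCacheSenderGuid(portalIDStr, senderGuid string, ownOnly bool) bool {
	if strings.Contains(portalIDStr, ",") {
		return true
	}
	if ownOnly {
		return portalIDStr == "gid:"+strings.ToLower(senderGuid)
	}
	return strings.HasPrefix(portalIDStr, "gid:")
}

func main() {
	guid := "ABCD-1234"
	aliased := "gid:ffff-0000" // resolved via a gid alias, not derived from guid
	fmt.Println(shouldCacheSenderGuid(aliased, guid, true))  // false
	fmt.Println(shouldCacheSenderGuid(aliased, guid, false)) // true
}
```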

- Add table-driven unit tests for participantSetsMatch covering the regression boundary between old (diff<=1) and new (diff==1 && self) behavior.
- Widen guid cache guard in handleParticipantChange from isOwnGidPortal to isGidPortal (strings.HasPrefix) to cover alias-resolved gid portals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mackid1993 · 2026-03-27T01:10:48Z

Final Review — Opus Risk Analysis

Risk: Low-Medium | Recommendation: MERGE with monitoring ✅

This PR fixes two real user-facing bugs (custom group names being clobbered by contact-derived fallbacks, and gid alias drift to incorrect portals) with well-targeted changes. The buildCanonicalParticipantList extraction is a net complexity reduction (removes 4 copy-pasted blocks).

Highest-risk change: `participantSetsMatch` semantic tightening

Old: diff <= 1 for any member. New: diff == 1 only if the differing member is normalized self. This is the correct fix — the old behavior caused false-positive matches between genuinely different groups that differed by one non-self member. The new unit tests cover the key boundary conditions (empty selfHandle, self-only diff, non-self diff, duplicates).

Edge case to watch: If Apple reports self with an identifier that doesn't match c.handle after normalization AND isn't in isMyHandle, the new code will false-negative. Low probability but would manifest as duplicate portals.

Other changes are low-risk:

resolveExistingGroupByGid stale-guid self-heal is synchronous but amortized (fires once per stale entry)
RLock/WriteLock/RLock pattern is pre-existing; worst case is idempotent re-processing
refreshGroupPortalNamesFromContacts skip logic correctly preserves authoritative names
strings.EqualFold for guid comparison is strictly more correct

Post-merge monitoring:

Watch for duplicate portals (would indicate false-negative matching)
Watch for stale group name reports (would indicate missed APNs rename events)
Watch for "Skipping stale guid cache entry" log warnings to validate self-heal

No blocking issues found. The reviewer feedback (tests, widened guid cache guard) has been addressed.

mackid1993 · 2026-03-27T01:38:54Z

Review findings

`makeDeletePortalKey` multi-participant fallback produces inconsistent portal IDs (Medium)

pkg/connector/client.go ~line 1937-1947

The "last resort" multi-participant path in makeDeletePortalKey manually constructs a portal ID:

members := make([]string, 0, len(msg.DeleteChatParticipants)+1)
members = append(members, c.handle)
for _, p := range msg.DeleteChatParticipants {
    members = append(members, addIdentifierPrefix(p))
}
sort.Strings(members)

This differs from buildCanonicalParticipantList in three ways:

It does not deduplicate participants (if c.handle already appears in DeleteChatParticipants, it will be included twice)
It uses addIdentifierPrefix instead of normalizeIdentifierForPortalID — these produce different results for the same input
It does not filter self handles from the input list before appending c.handle

This means the portal ID produced here will often not match the portal ID that makePortalKey produces for the same group, causing the delete to target a nonexistent portal and silently fail. The fix would be to use buildCanonicalParticipantList here, consistent with how handleParticipantChange was refactored.

participantSetsMatch previously compared the differing member against a single selfHandle string. Users with multiple Apple ID handles (phone + email) would fail the check when CloudKit stored a different handle variant than c.handle, causing duplicate portal creation. Replace the selfHandle string parameter with an isSelf func(string) bool predicate backed by c.isMyHandle, which checks against all known user handles. Update all 5 call sites and findPortalIDsByParticipants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mackid1993 · 2026-03-27T02:10:05Z

Bug fix pushed (commit `15fd99e`)

`participantSetsMatch` fails for multi-handle users (Medium)

Bug: participantSetsMatch compared the differing member against a single selfHandle string (typically c.handle, the user's primary phone number). Users with multiple Apple ID handles (phone + email) would fail the diff-of-1 check when CloudKit stored the email variant. This caused findPortalIDsByParticipants and the participant-cache lookup in resolveExistingGroupByGid to miss valid matches, potentially creating duplicate portals.

Fix: Changed participantSetsMatch to accept an isSelf func(string) bool predicate instead of a bare selfHandle string. All 5 call sites in client.go and findPortalIDsByParticipants in cloud_backfill_store.go now pass c.isMyHandle, which checks against all known user handles. Updated tests to use the predicate and added a new test case for the multi-handle scenario.

…cheMatchIsStale inputs

participantSetsMatch now tracks whether ALL differing members are self handles, not just the last one, fixing false negatives when set A has self_phone and set B has self_email. guidCacheMatchIsStale now canonicalizes participants internally via buildCanonicalParticipantList so all callsites (makeDeletePortalKey, handleReadReceipt, handleTyping) get consistent behavior without needing to pre-canonicalize. Also adds compare-before-delete for gidAliases in makeDeletePortalKey to prevent race conditions, consistent with makePortalKey.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
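The predicate-based comparison from commit 15fd99e, as refined in the follow-up commit, can be sketched as follows. The sketch assumes there is no separate cap on how many members may differ, as long as every differing member is a self handle. The `isMyHandle` closure stands in for c.isMyHandle, and lowercasing is a simplification of the real normalizer.

```go
package main

import (
	"fmt"
	"strings"
)

// toSet lowercases members into a set (a simplification of real normalization).
func toSet(members []string) map[string]bool {
	s := make(map[string]bool, len(members))
	for _, m := range members {
		s[strings.ToLower(m)] = true
	}
	return s
}

// participantSetsMatch reports whether a and b match when every member that
// appears in only one of the sets satisfies isSelf. A nil isSelf means no
// member counts as self, which reduces to strict set equality.
func participantSetsMatch(a, b []string, isSelf func(string) bool) bool {
	setA, setB := toSet(a), toSet(b)
	onlySelfDiffers := func(from, in map[string]bool) bool {
		for m := range from {
			if !in[m] && (isSelf == nil || !isSelf(m)) {
				return false // a non-self member differs
			}
		}
		return true
	}
	return onlySelfDiffers(setA, setB) && onlySelfDiffers(setB, setA)
}

func main() {
	myHandles := map[string]bool{"tel:+15550000": true, "mailto:me@example.com": true}
	isMyHandle := func(id string) bool { return myHandles[id] }
	fromCloudKit := []string{"mailto:me@example.com", "tel:+15550001", "tel:+15550002"}
	fromAPNs := []string{"tel:+15550000", "tel:+15550001", "tel:+15550002"}
	fmt.Println(participantSetsMatch(fromCloudKit, fromAPNs, isMyHandle)) // true
	fmt.Println(participantSetsMatch(fromCloudKit, fromAPNs, nil))        // false
}
```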

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refreshGroupPortalNamesFromContacts was blocking all non-authoritative renames when portal.Name was non-empty. This meant portals created before contacts loaded stayed stuck with raw phone number names. Authoritative names (user-set via iMessage or CloudKit) are already protected by resolveGroupName returning them with priority, so the equality check `newName == portal.Name` is sufficient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…yloads

Typing and read-receipt payloads may only include [sender, target] without the full group roster. The previous symmetric participantSetsMatch comparison rejected valid cached comma-portal matches when the partial payload was shorter than the cached member list. Switch to a subset check: only flag a cache entry as stale when incoming participants contain members NOT present in the portal's set. Missing portal members in the payload are expected and no longer trigger rejection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

guidCacheMatchIsStale returned false for all non-comma portal IDs, which meant gid->gid aliases cached by resolveExistingGroupByGid could never be detected as stale. Once such an alias went stale, messages would route to the wrong room indefinitely. Look up the target portal's participant list from imGroupParticipants when the portal ID has no comma, so gid: aliases get the same staleness validation as comma-based portals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
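The two staleness changes above can be combined into one sketch: the subset check for partial payloads, and the participant lookup for gid: portals. `cacheMatchIsStale` is hypothetical, and `participants` stands in for imGroupParticipants. The sketch assumes `incoming` is already canonicalized; the real helper does that internally via buildCanonicalParticipantList.

```go
package main

import (
	"fmt"
	"strings"
)

// cacheMatchIsStale resolves a portal ID to its known members. Comma portal
// IDs carry their members inline; gid: portal IDs are looked up in the
// participants map. An entry is stale only when the incoming payload names
// someone the portal does not have. A partial payload (e.g. just sender and
// target) or an empty one is not stale.
func cacheMatchIsStale(portalID string, incoming []string, participants map[string][]string) bool {
	var members []string
	if strings.Contains(portalID, ",") {
		members = strings.Split(portalID, ",")
	} else if known, ok := participants[portalID]; ok {
		members = known
	} else {
		return false // nothing to compare against; treat as not stale
	}
	set := make(map[string]bool, len(members))
	for _, m := range members {
		set[m] = true
	}
	for _, p := range incoming {
		if !set[p] {
			return true
		}
	}
	return false
}

func main() {
	participants := map[string][]string{"gid:abc": {"tel:+1", "tel:+2", "tel:+3"}}
	fmt.Println(cacheMatchIsStale("tel:+1,tel:+2,tel:+3", []string{"tel:+1", "tel:+2"}, participants)) // false
	fmt.Println(cacheMatchIsStale("gid:abc", []string{"tel:+1", "tel:+9"}, participants))              // true
}
```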

The second return value was never consumed by any caller — simplify the signature to return only the name string. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…thoritative renames

resolveGroupName now returns (string, bool) where the bool indicates whether the name came from an authoritative source (in-memory cache or CloudKit display_name). refreshGroupPortalNamesFromContacts uses this to skip contact-derived fallback names when the portal already has a name, preventing user-set group names from being clobbered on restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(groups): preserve authoritative names and stable gid aliases

17b062b

mackid1993 reviewed Mar 27, 2026


This was referenced Mar 27, 2026

fix(contacts): strip vCard group prefix from property names #32

Merged

fix(backfill): canonicalize DM senders consistently #34

Merged

fix(heic): decode for dimensions/thumbnail when conversion is disabled #37

Merged


fix(connector): target replies and tapbacks to the correct part #36

Merged

fix(connector): stabilize SMS and RCS portal handling #33

Merged

mackid1993 reviewed Mar 27, 2026


David and others added 7 commits March 26, 2026 22:45

docs(connector): fix inaccurate comment about sender_guid caching

97d29ca

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(groups): remove unused authoritative bool from resolveGroupName

3969178


mackid1993 merged commit f802693 into lrhodin:master Mar 27, 2026

mackid1993 merged 10 commits into lrhodin:master from lwittwer:pr_group_portal_name_stability
