fix(linear): replace visibility-poll with in-process cache (MNG-741) #1357
The visibility-poll added in commit fad4dda throws "Linear description visibility timed out for issue X after 1000ms" whenever Linear's read replica is slower than 1s, which is routine under load. Three planning runs failed back-to-back on prod (2026-05-12) on this signal:

- MNG-741, run 44c1bc3f, 11m: "no PM write recorded" (visibility throw propagated up; CLI exited; sidecar never written; gate failed)
- MNG-736, run 98ae7010, 13m: same
- MNG-739, run 1534ab74, 16m: same

The 1s wait was always advisory: it only guarded against in-process consecutive `updateDescription` calls reading a stale GET between PUTs.

The new contract removes the wait entirely and adds an in-process recent-description cache (60s TTL, keyed on issueId). After each successful PUT, the cache stores the new description. The next `updateDescription` call consults the cache before mutating: if GET returned a stale pre-PUT value (Linear's eventual-consistency window), the cached fresh value wins.

This gives the same in-process read-after-write guarantee the wait was supposed to provide, without throwing on slow Linear days. The wait never solved cross-process consistency (the existing `withDescriptionMutationLock` is process-local too); the cache doesn't either, but that scope was never claimed.

TDD test pins the worst case: Linear GET continues to return the stale pre-PUT description forever; `createChecklist` + `addChecklistItem` must still produce a correct PUT containing the appended item via the cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
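The cache contract described above (single entry per issue, 60s TTL, consulted before each mutation) can be sketched roughly as follows. This is an illustration under assumptions, not the adapter's actual code: the function bodies, the injectable `now` parameter, and the `chooseBaseDescription` helper are hypothetical; only the names `rememberRecentDescription` and `recallRecentDescription` come from this PR.

```typescript
// Hypothetical sketch of the single-entry cache; bodies are assumptions.
const RECENT_DESCRIPTION_TTL_MS = 60_000;

interface RecentDescription {
  description: string;
  storedAt: number; // epoch milliseconds
}

const recentDescriptions = new Map<string, RecentDescription>();

// Called after a successful PUT with the description that was just written.
function rememberRecentDescription(
  issueId: string,
  description: string,
  now: number = Date.now(),
): void {
  recentDescriptions.set(issueId, { description, storedAt: now });
}

// Returns the cached description, or undefined if absent or older than the TTL.
function recallRecentDescription(issueId: string, now: number = Date.now()): string | undefined {
  const entry = recentDescriptions.get(issueId);
  if (entry === undefined) return undefined;
  if (now - entry.storedAt > RECENT_DESCRIPTION_TTL_MS) {
    recentDescriptions.delete(issueId); // drop expired entries on read (assumed policy)
    return undefined;
  }
  return entry.description;
}

// Hypothetical helper: pick the base for the next read-modify-write.
function chooseBaseDescription(
  issueId: string,
  fetched: string | null | undefined,
  now: number = Date.now(),
): string {
  const cached = recallRecentDescription(issueId, now);
  return cached !== undefined ? cached : (fetched ?? '');
}
```

In this form the cached value wins over any GET result for the whole TTL window, whether or not the GET is actually stale.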
nhopeatall
left a comment
Summary
Requesting changes because the replacement cache can drop newer Linear descriptions, and this does not satisfy the linked MNG-741 idempotency requirements.
Code Issues
Blocking
- src/pm/linear/adapter.ts:401: The recent-description cache unconditionally overrides the provider read for 60s. If another worker or a human edits the description after this process's PUT, Linear can return that newer description on the next GET, but `baseDescription` still uses the older cache and the next `updateIssue` overwrites the external edit. This is a data-loss regression introduced by the cache. The cache needs a stale-read check, for example by comparing `updatedAt` to the cache timestamp or by storing enough pre/post state to decide whether the GET is stale, before it wins over `issue.description`.
Should Fix
- src/pm/linear/adapter.ts:305 / src/pm/linear/adapter.ts:341: MNG-741 asks for idempotent inline checklist mutations after ambiguous success/retry, but the PR still uses `appendChecklistSection()` and `addItemToChecklist()`. Those helpers append unconditionally, so a retried `cascade-tools pm add-checklist` or repeated create with the same heading and items still creates duplicate `### ...` sections or duplicate checkbox rows. The linked work item explicitly called for shared upsert/dedupe helpers and Linear adapter regressions for repeated create/bulk-create/add calls; those paths are not implemented here.
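One possible shape for the upsert/dedupe behaviour requested above is sketched below. It is an assumption-laden illustration, not the repository's helpers: `upsertChecklist`, `checkboxText`, and the exact markdown layout (`### heading` followed by `- [ ]` rows) are hypothetical. The idea is that a retried call with the same heading and items returns the description unchanged.

```typescript
// Hypothetical helper names and markdown layout; not the adapter's real code.
const CHECKBOX_ROW = /^\s*- \[[ xX]\] (.*)$/;

// Extracts the item text from a checkbox row, or undefined for other lines.
function checkboxText(line: string): string | undefined {
  return CHECKBOX_ROW.exec(line)?.[1]?.trim();
}

// Ensures a "### heading" section exists and contains each item exactly once.
function upsertChecklist(description: string, heading: string, items: string[]): string {
  const headingLine = `### ${heading}`;
  const lines = description.split('\n');
  const start = lines.findIndex((line) => line.trim() === headingLine);

  // Collect items already present in the section and find where to insert.
  const seen = new Set<string>();
  let insertAt = start + 1;
  if (start !== -1) {
    for (let i = start + 1; i < lines.length; i++) {
      const line = lines[i] ?? '';
      if (line.startsWith('#')) break; // next heading ends the section
      const text = checkboxText(line);
      if (text !== undefined) {
        seen.add(text);
        insertAt = i + 1;
      }
    }
  }

  // Build rows only for items that are not already in the section.
  const rows: string[] = [];
  for (const item of items) {
    const text = item.trim();
    if (text === '' || seen.has(text)) continue;
    seen.add(text);
    rows.push(`- [ ] ${text}`);
  }

  if (start === -1) {
    const prefix = description.trim() === '' ? '' : `${description.trimEnd()}\n\n`;
    return prefix + [headingLine, ...rows].join('\n');
  }
  if (rows.length === 0) return description; // retry with same items is a no-op
  lines.splice(insertAt, 0, ...rows);
  return lines.join('\n');
}
```

Checked rows (`- [x]`) count as existing items in this sketch, so a retry does not re-add an item that someone has already ticked off.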
Verification
- Targeted `npx vitest run --project unit-core tests/unit/pm/linear/adapter.test.ts` passes locally.
- PR checks are currently 7/7 green.
🕵️ codex · gpt-5.5 · run details
// in-process updates can read-modify-write over each other.
const cachedDescription = recallRecentDescription(issueId);
const baseDescription =
  cachedDescription !== undefined ? cachedDescription : (issue.description ?? '');
This cache wins even when the GET above returns a newer provider-visible description. Scenario: this process writes checklist A and caches it; within 60s a human or another worker updates the Linear description; the next checklist mutation receives that newer description in issue.description, but the cached value is still non-undefined, so this line mutates the old cached text and updateIssue drops the external edit. Please only let the cache override reads that are demonstrably stale, or merge with the fetched description before writing.
Fixed. The cache is now a before/after chain (array of {before, after, timestamp} entries per issue ID) instead of a single entry.
`recallRecentDescription` takes the fetched description and only returns the cached `latestAfter` when:
- The fetched description matches a `before` in the chain → demonstrably stale (Linear read replica is behind our PUT), or
- The fetched description matches a non-latest `after` → intermediate state (an earlier PUT propagated but the latest hasn't yet)
If the fetched description matches none of the above — e.g. a human or another worker updated the description after our PUT — the function returns undefined and the mutate path uses the live GET value as its base, so the external edit is preserved.
The test "cache is bypassed when GET returns a description different from the pre-PUT value (external edit)" pins this behaviour: it simulates an external actor editing the description after our createChecklist PUT, then verifies that the subsequent addChecklistItem PUT body includes the external edit text.
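The recall rule described in this reply can be sketched as below. This is a minimal illustration under assumptions, not the merged implementation: the `{before, after, timestamp}` entry shape and the function names follow the reply, while the `now` parameter, the `liveChain` helper, and the TTL pruning policy are hypothetical.

```typescript
// Hypothetical sketch of the before/after chain; details are assumptions.
interface DescriptionWrite {
  before: string; // description we read before our PUT
  after: string; // description we wrote
  timestamp: number; // epoch milliseconds of the PUT
}

const CHAIN_TTL_MS = 60_000;
const writeChains = new Map<string, DescriptionWrite[]>();

// Returns the chain for an issue with expired entries pruned.
function liveChain(issueId: string, now: number): DescriptionWrite[] {
  const chain = (writeChains.get(issueId) ?? []).filter(
    (w) => now - w.timestamp <= CHAIN_TTL_MS,
  );
  writeChains.set(issueId, chain);
  return chain;
}

function rememberRecentDescription(
  issueId: string,
  before: string,
  after: string,
  now: number = Date.now(),
): void {
  liveChain(issueId, now).push({ before, after, timestamp: now });
}

// Returns the latest written description only when `fetched` is provably stale.
function recallRecentDescription(
  issueId: string,
  fetched: string,
  now: number = Date.now(),
): string | undefined {
  const chain = liveChain(issueId, now);
  const latest = chain[chain.length - 1];
  if (latest === undefined) return undefined;

  // Stale: GET still shows a state that existed before one of our PUTs.
  const matchesBefore = chain.some((w) => w.before === fetched);
  // Intermediate: an earlier PUT is visible, the latest one is not yet.
  const matchesEarlierAfter = chain.slice(0, -1).some((w) => w.after === fetched);

  // Anything else (for example an external edit) falls through to the live GET.
  return matchesBefore || matchesEarlierAfter ? latest.after : undefined;
}
```

A caller would then use `recallRecentDescription(issueId, fetched) ?? fetched` as the mutation base, so an unrecognised fetched value is always preserved.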
Review Feedback Addressed

Changes Made

Blocking: Cache safety (prevents external edit data loss)
Replaced the single-entry
The chain also handles the intermediate-state case: when a GET returns an earlier

Should Fix: Idempotent checklist mutations (no duplicate sections or items on retry)
Added two helpers to
The Linear adapter's

Verification
Summary
The Linear adapter's `waitForDescriptionVisibility` poll, added in commit `fad4dda1` to ensure read-after-write consistency, has a 1-second deadline and throws on timeout. Linear's read replica is routinely slower than 1s under load. Result: three planning runs failed back-to-back on prod (2026-05-12) on the same gate:

- MNG-741, run `44c1bc3f` (11m): "Agent completed but no PM write (checklist creation) was recorded"
- MNG-736, run `98ae7010` (13m): same
- MNG-739, run `1534ab74` (16m): same

The retry of MNG-741 on the new build (with my env-filter fix from #1355 applied) still failed identically (`1ce6ed4a`, 12m 39s), and the running agent itself caught it mid-flight: `cascade-tools pm add-checklist` was reached as a subprocess (env var plumbed), the gadget executed, and the Linear `updateIssue` PUT succeeded, but the 1-second visibility poll threw, the CLI exited non-zero, the sidecar was never written, and the `requiresPMWrite` gate failed.

Root cause
`src/pm/linear/adapter.ts:waitForDescriptionVisibility` blocks for up to 1000ms, polling Linear's GET every 25ms, and throws if the API doesn't propagate the PUT within the deadline. The wait was advisory: it only guarded against in-process consecutive `updateDescription` calls reading a stale GET between PUTs. The throw was load-bearing precisely zero of the time, but blocked the run every time Linear was slow.

Fix
Remove the visibility wait entirely. Replace with a small module-level cache keyed on `issueId`:

- `updateIssue` PUT → `rememberRecentDescription(issueId, newDesc)` (60 s TTL).
- `updateDescription` mutate → consult the cache; if a fresher value exists, use it as the mutation base instead of the (possibly stale) GET result.

Same in-process read-after-write guarantee the wait was supposed to provide, without ever throwing on slow Linear days. The wait never solved cross-process consistency (the existing `withDescriptionMutationLock` is process-local too); the cache doesn't either, but that scope was never claimed.

Test plan
- `unit-core` (334 files / all green).
- `cascade-tools pm add-checklist` call should no longer throw on visibility timeout. The checklist should appear in Linear and the `requiresPMWrite` gate should pass.

Out of scope

- `updateDescriptionWithProviderRetry` already retries once on `updateIssue` failures; sufficient for the observed failure modes.

Notes for reviewer

- `setInterval`.
- `__resetRecentDescriptionsForTests` is the test-only escape hatch, kept underscore-prefixed and clearly named so it doesn't get accidentally imported in production code.

🤖 Generated with Claude Code