Skip to content

fix(linear): replace visibility-poll with in-process cache (MNG-741)#1357

Merged
zbigniewsobiecki merged 1 commit into
devfrom
fix/linear-description-visibility-advisory
May 12, 2026
Merged

fix(linear): replace visibility-poll with in-process cache (MNG-741)#1357
zbigniewsobiecki merged 1 commit into
devfrom
fix/linear-description-visibility-advisory

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Summary

The Linear adapter's waitForDescriptionVisibility poll — added in commit fad4dda1 to ensure read-after-write consistency — has a 1-second deadline and throws on timeout. Linear's read replica is routinely slower than 1s under load. Result: three planning runs failed back-to-back on prod (2026-05-12) on the same gate:

Work item Run Duration Visible error
MNG-741 44c1bc3f 11m 19s "Agent completed but no PM write (checklist creation) was recorded"
MNG-736 98ae7010 12m 49s same
MNG-739 1534ab74 16m 05s same

The retry of MNG-741 on the new build (with my env-filter fix from #1355 applied) still failed identically (1ce6ed4a, 12m 39s), and the running agent itself caught it mid-flight:

"The checklist command hit the exact Linear visibility timeout this card is about. I'm checking the work item state before retrying so I don't blindly append a duplicate section."

cascade-tools pm add-checklist was reached as a subprocess (env var plumbed), the gadget executed, the Linear updateIssue PUT succeeded — but the 1-second visibility poll threw, the CLI exited non-zero, the sidecar was never written, and the requiresPMWrite gate failed.

Root cause

src/pm/linear/adapter.ts:waitForDescriptionVisibility blocks for up to 1000ms polling Linear's GET every 25ms and throws if the API doesn't propagate the PUT within the deadline. The wait was advisory — it only guarded against in-process consecutive updateDescription calls reading a stale GET between PUTs. The throw was load-bearing precisely zero of the time, but blocked the run every time Linear was slow.

Fix

Remove the visibility wait entirely. Replace with a small module-level cache keyed on issueId:

  • After each successful updateIssue PUT → rememberRecentDescription(issueId, newDesc) (60 s TTL).
  • Before each updateDescription mutate → consult the cache; if a fresher value exists, use it as the mutation base instead of the (possibly stale) GET result.

Same in-process read-after-write guarantee the wait was supposed to provide, without ever throwing on slow Linear days. The wait never solved cross-process consistency (the existing withDescriptionMutationLock is process-local too); the cache doesn't either, but that scope was never claimed.

Test plan

  • TDD-first: new test "proceeds without throwing when Linear NEVER returns the updated description (MNG-741)" — the worst case where Linear's read replica stays permanently stale. The old code would throw at 1s; the new code completes successfully via the cache. Updated the existing "waits for Linear read-after-write visibility..." test to reflect the new mechanism (cache, not poll).
  • Full unit-core (334 files / all green).
  • Typecheck + lint clean (no new warnings).
  • Pre-push hooks all green.
  • Post-deploy verification: re-trigger MNG-741 / MNG-736 / MNG-739 once this ships to prod. The planning agent's cascade-tools pm add-checklist call should no longer throw on visibility timeout. The checklist should appear in Linear and the requiresPMWrite gate should pass.

Out of scope

  • Cross-process / multi-worker race against Linear's eventual consistency. Never solved by the old visibility wait either; would require Redis-backed coordination similar to other dedup paths.
  • A formal Linear retry/backoff layer. The existing updateDescriptionWithProviderRetry already retries once on updateIssue failures — sufficient for the observed failure modes.

Notes for reviewer

  • The cache TTL is 60 s — comfortably longer than Linear's typical eventual-consistency window (sub-second to a few seconds), short enough that external updates aren't suppressed indefinitely.
  • A lazy-cleanup pass triggers when the map grows past 200 entries, so we don't need a setInterval.
  • __resetRecentDescriptionsForTests is the test-only escape hatch — kept underscore-prefixed and clearly named so it doesn't get accidentally imported in production code.

🤖 Generated with Claude Code

The visibility-poll added in commit fad4dda throws "Linear description
visibility timed out for issue X after 1000ms" whenever Linear's read
replica is slower than 1s — which is routine under load. Three planning
runs failed back-to-back on prod (2026-05-12) on this signal:

  MNG-741 run 44c1bc3f  11m  "no PM write recorded" (visibility throw
                              propagated up; CLI exited; sidecar never
                              written; gate failed)
  MNG-736 run 98ae7010  13m  same
  MNG-739 run 1534ab74  16m  same

The 1s wait was always advisory — it only guarded against in-process
consecutive `updateDescription` calls reading a stale GET between PUTs.
The new contract removes the wait entirely and adds an in-process
recent-description cache (60s TTL, keyed on issueId). After each
successful PUT, the cache stores the new description. The next
`updateDescription` call consults the cache before mutating — if GET
returned a stale pre-PUT value (Linear's eventual-consistency window),
the cached fresh value wins.

This gives the same in-process read-after-write guarantee the wait was
supposed to provide, without throwing on slow Linear days. The wait
never solved cross-process consistency (the existing
withDescriptionMutationLock is process-local too); the cache doesn't
either, but that scope was never claimed.

TDD test pins the worst case: Linear GET continues to return the stale
pre-PUT description forever; createChecklist + addChecklistItem must
still produce a correct PUT containing the appended item via the cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Requesting changes because the replacement cache can drop newer Linear descriptions, and this does not satisfy the linked MNG-741 idempotency requirements.

Code Issues

Blocking

  • src/pm/linear/adapter.ts:401 — The recent-description cache unconditionally overrides the provider read for 60s. If another worker or a human edits the description after this process's PUT, Linear can return that newer description on the next GET, but baseDescription still uses the older cache and the next updateIssue overwrites the external edit. This is a data-loss regression introduced by the cache. The cache needs a stale-read check, for example by comparing updatedAt to the cache timestamp or by storing enough pre/post state to decide whether the GET is stale, before it wins over issue.description.

Should Fix

  • src/pm/linear/adapter.ts:305 / src/pm/linear/adapter.ts:341 — MNG-741 asks for idempotent inline checklist mutations after ambiguous success/retry, but the PR still uses appendChecklistSection() and addItemToChecklist(). Those helpers append unconditionally, so a retried cascade-tools pm add-checklist or repeated create with the same heading and items still creates duplicate ### ... sections or duplicate checkbox rows. The linked work item explicitly called for shared upsert/dedupe helpers and Linear adapter regressions for repeated create/bulk-create/add calls; those paths are not implemented here.

Verification

  • Targeted npx vitest run --project unit-core tests/unit/pm/linear/adapter.test.ts passes locally.
  • PR checks are currently 7/7 green.

🕵️ codex · gpt-5.5 · run details

Comment thread src/pm/linear/adapter.ts
// in-process updates can read-modify-write over each other.
const cachedDescription = recallRecentDescription(issueId);
const baseDescription =
cachedDescription !== undefined ? cachedDescription : (issue.description ?? '');
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cache wins even when the GET above returns a newer provider-visible description. Scenario: this process writes checklist A and caches it; within 60s a human or another worker updates the Linear description; the next checklist mutation receives that newer description in issue.description, but the cached value is still non-undefined, so this line mutates the old cached text and updateIssue drops the external edit. Please only let the cache override reads that are demonstrably stale, or merge with the fetched description before writing.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The cache is now a before/after chain (array of {before, after, timestamp} entries per issue ID) instead of a single entry.

recallRecentDescription takes the fetched description and only returns the cached latestAfter when:

  • The fetched description matches a before in the chain → demonstrably stale (Linear read replica is behind our PUT), or
  • The fetched description matches a non-latest after → intermediate state (an earlier PUT propagated but the latest hasn't yet)

If the fetched description matches none of the above — e.g. a human or another worker updated the description after our PUT — the function returns undefined and the mutate path uses the live GET value as its base, so the external edit is preserved.

The test "cache is bypassed when GET returns a description different from the pre-PUT value (external edit)" pins this behaviour: it simulates an external actor editing the description after our createChecklist PUT, then verifies that the subsequent addChecklistItem PUT body includes the external edit text.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 64.28571% with 10 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/pm/linear/adapter.ts 64.28% 9 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@zbigniewsobiecki zbigniewsobiecki merged commit 9d04b8f into dev May 12, 2026
8 of 9 checks passed
@aaight
Copy link
Copy Markdown
Collaborator

aaight commented May 12, 2026

Review Feedback Addressed

Changes Made

Blocking: Cache safety — prevents external edit data loss

Replaced the single-entry {after, timestamp} cache with a before/after chain per issue ID. The cache now only overrides a Linear GET result when the fetched description is demonstrably stale — i.e. it matches a known pre-PUT value (before) or an intermediate state in the chain. If Linear's GET returns content that doesn't appear anywhere in the chain (external edit by a human or another worker), the cache is bypassed and the live GET value is used as the mutation base. This ensures external edits are preserved rather than overwritten.

The chain also handles the intermediate-state case: when a GET returns an earlier after from the same chain (not the latest), the cache still returns the latest after, so in-process concurrent mutations converge correctly.

Should Fix: Idempotent checklist mutations — no duplicate sections or items on retry

Added two helpers to src/pm/_shared/inline-checklist.ts:

  • upsertChecklistSection: skips appendChecklistSection if ### {checklistName} already exists in the description
  • upsertItemToChecklist: skips addItemToChecklist if an item with the same name already exists in the section

The Linear adapter's createChecklist, createChecklistWithItems, and addChecklistItem now use the upsert variants, so retried cascade-tools pm add-checklist calls (e.g. after a CLI timeout with a successful server-side PUT) cannot produce duplicate sections or duplicate checkbox rows.

Verification

  • All 3278 unit tests passing (163 test files)
  • 4 new targeted tests added (chain-based external-edit test + 3 idempotency tests)
  • 8 new helper tests added in inline-checklist.test.ts
  • Typecheck: clean
  • Lint: clean (0 errors)
  • Pre-push hooks: all green

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants