Skip to content

fix: bound debounce() recursion in autoupdate flow#1337

Open
BadassBison wants to merge 1 commit into
oclif:mainfrom
BadassBison:bug2-debounce-update
Open

fix: bound debounce() recursion in autoupdate flow#1337
BadassBison wants to merge 1 commit into
oclif:mainfrom
BadassBison:bug2-debounce-update

Conversation

@BadassBison
Copy link
Copy Markdown
Contributor

@BadassBison BadassBison commented May 28, 2026

Summary

Closes #1277

The debounce() function in src/update.ts:353-372 was a recursive sleep loop with no termination bound. While the CLI user is active, the recursion never exits — each spawned autoupdate child stays pinned as a full node process and writes one log line per minute, forever. This is the root cause of the unbounded autoupdate.log growth in #1277.

Bug

Before:

// src/update.ts (before)
const debounce = async (cacheDir: string): Promise<void> => {
  let output = false
  const lastrunfile = join(cacheDir, 'lastrun')
  const m = await mtime(lastrunfile)
  m.setHours(m.getHours() + 1)
  if (m > new Date()) {
    const msg = `waiting until ${m.toISOString()} to update`
    if (output) {
      debug(msg)
    } else {
      ux.stdout(msg)
      output = true
    }

    await wait(60 * 1000)
    return debounce(cacheDir)
  }
  ux.stdout('time to update')
}

The recursion exits only when lastrun mtime is more than 1 hour stale. But every CLI invocation touches lastrun before debounce checks it (src/hooks/init.ts:62), so while the user is active, lastrun + 1hr keeps shifting forward and the recursion never terminates.

(Note: the let output = false declaration on the first line is actually unused for cross-iteration deduplication — it's local to each recursive call, so it resets to false on every recursion. That's a separate small bug; the new implementation actually achieves the deduplication it was trying to.)

After:

// src/update.ts (after)
const MAX_DEBOUNCE_WAIT_MS = 6 * 60 * 60 * 1000 // 6 hours
const DEBOUNCE_POLL_INTERVAL_MS = 60 * 1000

const debounce = async (cacheDir: string): Promise<void> => {
  const lastrunfile = join(cacheDir, 'lastrun')
  const startedAt = Date.now()
  let announced = false

  while (true) {
    const m = await mtime(lastrunfile)
    m.setHours(m.getHours() + 1)
    if (m <= new Date()) { ux.stdout('time to update'); return }
    if (Date.now() - startedAt >= MAX_DEBOUNCE_WAIT_MS) {
      ux.stdout('autoupdate: debounce wait exceeded; abandoning this run')
      return
    }
    const msg = `waiting until ${m.toISOString()} to update`
    if (announced) debug(msg)
    else { ux.stdout(msg); announced = true }
    await wait(DEBOUNCE_POLL_INTERVAL_MS)
  }
}

Three changes from the original:

  1. Bounded by wall-clock time. After MAX_DEBOUNCE_WAIT_MS (6 hours), the run is abandoned. The next CLI invocation will fire a fresh autoupdate when it's again needed.
  2. Iterative instead of recursive. Same logical behavior, but doesn't grow a call stack while sleeping and is easier to reason about.
  3. announced is moved to an outer scope so the if (announced) debug(msg) else { ux.stdout(msg); announced = true } dedupe actually fires — previously each recursive call had its own output = false, so every iteration printed to stdout.

Direct evidence the mechanism fires in the wild

  • This issue (autoupdate.log grows indefinitely #1277) — autoupdate.log grew to 11 GB on @martin-helmich's machine.
  • My own machine: ~/Library/Caches/<cli>/autoupdate.log is 7.1 MB of repeated waiting until 2026-05-26T00:32:00.650Z to update lines, ending in SIGTERM messages where pinned children were eventually reaped externally.

How to verify

Hard to demonstrate quickly (the leak accumulates over weeks of normal use), but on any long-lived CLI install:

wc -l ~/Library/Caches/<cli>/autoupdate.log    # multi-thousand lines, identical content
ps auxww | grep '<cli> update --autoupdate' | grep -v grep

You should see at least one pinned child process. With this PR, after at most 6 hours, that child exits on its own and the next CLI invocation can spawn a fresh autoupdate without competing with a stuck predecessor.

Test plan

  • All 9 existing tests pass (yarn test).
  • yarn lint clean.
  • Add a unit test for the wall-clock cap (will follow if the approach is accepted).

Open questions for reviewers

  • Is 6 hours the right MAX_DEBOUNCE_WAIT_MS? Long enough to wait through typical work sessions; short enough that a stuck child doesn't pin memory for days. Tunable.
  • Should this PR also rotate or truncate existing autoupdate.log files? The bounded debounce() alone vastly slows the growth (at most ~360 lines per autoupdate run instead of unbounded), but doesn't address existing multi-GB files. I'd be inclined to leave log rotation as a separate follow-up — happy to take that on too if desired.

Companion bug

A related-but-independent bug in src/hooks/init.ts (race window between autoupdateNeeded() and writeFile) can spawn multiple pinned children per debounce-window event, amplifying this leak. Filed and fixed separately as #1335 (issue) and #1336 (PR).

The previous debounce() in update.ts was a recursive sleep loop:

    const debounce = async (cacheDir) => {
      const m = await mtime(lastrunfile)
      m.setHours(m.getHours() + 1)
      if (m > new Date()) {
        await wait(60 * 1000)
        return debounce(cacheDir)   // recurse
      }
      ux.stdout('time to update')
    }

The recursion exits only when "lastrun" mtime is more than 1 hour
stale. But every CLI invocation touches "lastrun" before debounce()
checks it (hooks/init.ts:62), so while the user is actively running
CLI commands, "lastrun + 1hr" keeps shifting forward and the
recursion never terminates. Each spawned autoupdate child stays
pinned as a full node process for the entire active session, writing
"waiting until ..." to autoupdate.log every minute.

This is the root cause of oclif#1277 (the autoupdate.log growth symptom —
reported as 11 GB on another user's Mac). On my own machine the same
log accumulated 7.1 MB before the pinned children were SIGTERMed
externally.

Resolution: convert the recursion to a bounded iterative loop capped
at 6 hours of wall-clock wait. If the user is still active after
that, abandon this autoupdate run; the next CLI invocation will fire
a fresh one when it is again needed. Also dedupes the "waiting
until ..." stdout to a single line per child (the rest go to debug
output), which slows log growth further even within a single run.

All 9 existing tests continue to pass.
@salesforce-cla
Copy link
Copy Markdown

Thanks for the contribution! Before we can merge this, we need @BadassBison to sign the Salesforce Inc. Contributor License Agreement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

autoupdate.log grows indefinitely

1 participant