Skip to content

Workers: cover logging gaps so each one is visible at default log level#8

Merged
sysread merged 1 commit intomainfrom
claude/cover-logging-gaps
May 10, 2026
Merged

Workers: cover logging gaps so each one is visible at default log level#8
sysread merged 1 commit intomainfrom
claude/cover-logging-gaps

Conversation

@sysread
Copy link
Copy Markdown
Owner

@sysread sysread commented May 10, 2026

=head1 SYNOPSIS

Cover the three logging gaps an audit surfaced. Each background
worker now lands at least one .info line per cycle outcome that
matters, so all eight workers are visible at the drawer's default
log level.

=head1 PURPOSE

After shipping the wiki librarian, the user observed they
"weren't seeing the librarian fire" in the Logs drawer and asked
whether all the workers had consistent logging. They didn't.
Three real gaps:

  1. attachment-expiry had no createLogger at all. The only
    log channel was the legacy post({type:'log'}) inside the
    worker entry, which only fires on setup-session failure and
    lease loss. So an attachment-expiry problem in production
    would leave no breadcrumb in the drawer.
  2. samskara logged at .debug what the other agent workers
    log at .info. With the drawer at the default level, you saw
    every other worker fire but not this one.
  3. wiki-librarian had logs on too-small / reviewed /
    error but not on acquired-lease / polling / too-soon.
    Since too-soon is the most common state for the first 12
    hours after a deploy (the cross-device claim won't return
    true until the configured interval has elapsed), the user
    couldn't tell whether the librarian was firing at all - it
    was silent rather than off.

=head1 DESCRIPTION

=head2 Layer 1: how logging looked across the 8 workers

Source-tag convention is consistent (every manager declares
<feature>-worker as loggerSource, except embed-worker which
keeps its abbreviated form intentionally). Five of the eight
workers (embed, reflection, journal, wiki, summary)
follow the same shape: .info on lifecycle headlines (picked up X /
finished X / errored on X), .debug on per-row details. The other
three diverge:

  • attachment-expiry-worker: no createLogger. Loop is silent.
  • samskara-worker: lifecycle events on .debug (was unique
    among the eight in this).
  • wiki-librarian-worker: lifecycle covered for some outcomes
    (the work-happened path) but silent on the more common
    cooldown path.

=head2 Layer 2: what this PR changes, parallel to layer 1

attachment-expiry-worker gets a createLogger and four
lifecycle lines:

  • .info on lease acquired ("lease acquired - starting expiry
    sweep").
  • .info on each non-zero expire batch ("expired N attachment
    row(s) older than D days") - now you can see the worker
    actually doing work.
  • .info on RPC failures.
  • .debug on polling and empty-queue ticks so an idle worker
    isn't noisy.

samskara-worker has four lifecycle lines bumped from .debug
to .info:

  • assimilate: claimed substrate <id> (thread <thread>)
  • assimilate: saved substrate <id>
  • pair-relate: selected pair <a> <> <b> (cosine X.XXX)
  • pair-relate: associated <a> <> <b> (<kind>: <label>)
  • compound-regen: synthesizing summary from N sample row(s) (cap M)

Detail / error / null-result lines stay on .debug so the drawer
isn't flooded with samskara's higher per-cycle volume.

wiki-librarian-worker gets logs on every cycle outcome:

  • .info on lease acquired.
  • .debug on polling (60s loop, would otherwise fire every
    minute).
  • .info on too-soon, with the configured minimum interval
    named explicitly: "not yet eligible for a librarian run (min
    interval 12h since last successful run)". ~24 lines/day during
    cooldown gives the user the "yes, the librarian is alive and
    waiting" signal that was missing.
  • .info on claim acquired (the transition from waiting to
    actually running).
  • Existing .info lines on too-small / reviewed retained.
  • Error paths that were silently .debug (claim RPC, list,
    agent threw) are bumped to .info so a real failure isn't
    silent.

Embed-worker is left as-is (the abbreviated embed- source tag
is intentional per the user's preference; existing log shape is
fine).

=head2 Layer 3: how that resolves PURPOSE

After this PR, opening the Logs drawer at the default level shows
every background worker firing on its own cadence:

  • attachment-expiry on each expire batch.
  • samskara on each phase pickup and write.
  • wiki-librarian whenever it's awake, including the explicit
    "waiting on min-interval" line during the 12h cooldown.

So the user's "I'm not seeing the librarian fire" question is now
answered structurally: if the librarian truly is firing, you'll
see the .info too-soon or reviewed line. If you don't see
either, the worker isn't actually running - the lease isn't being
acquired, or the worker is disabled in Settings, or it crashed
without reporting.

=head1 Notes for AI reviewers

  • The librarian's too-soon line lands every idle interval (1h)
    during cooldown. ~24 lines/day. This is intentional - the user
    asked for visibility specifically because they couldn't tell
    whether the worker was alive. Throttling it to "first too-soon
    after acquisition only" would be silent again on an unlocked
    laptop that runs continuously through a cooldown window.
  • Samskara's bumps are bounded to lifecycle headlines (claimed
    / selected / associated / saved). Per-row details and per-cycle
    null-returns stay on .debug. The volume is bounded by the per-
    hour samskara claim cadence, not per-message.
  • attachment-expiry's empty-queue line is .debug deliberately.
    Once the backlog drains, that branch fires hourly forever, and
    a heartbeat at .info would dominate the drawer for the long
    steady state.

https://claude.ai/code/session_015XcR7xzLdij66ZbYERUdLH


Generated by Claude Code

Three workers were silent at the default Logs-drawer level:

**attachment-expiry-worker** had no `createLogger` at all - the only
log path was the legacy `post({type:'log'})` channel inside the
worker entry, which only fires on setup-session failure and lease
loss. The cycle itself had zero logging, so an attachment-expiry
problem in production left no breadcrumb. Now the loop logs at
.info on lease acquisition, on each non-zero expire batch ("expired
N attachment row(s) older than D days"), and on errors; .debug on
polling and empty-queue ticks so an idle worker isn't noisy.

**samskara-worker** logged at .debug for events the other agent
workers log at .info ("picked up thread X" / "finished thread X"),
and used .trace for some routine state. With the drawer at the
default level, samskara was invisible - you saw every other worker
fire but not this one. Bumped four lifecycle lines to .info:

  - assimilate: claimed substrate <id> (thread <thread>)
  - assimilate: saved substrate <id>
  - pair-relate: selected pair <a> <> <b> (cosine X.XXX)
  - pair-relate: associated <a> <> <b> (<kind>: <label>)
  - compound-regen: synthesizing summary from N sample row(s)

Detail / error / null-result lines stay on .debug so the drawer
isn't flooded with samskara's higher per-cycle volume.

**wiki-librarian-worker** had logs on `too-small` / `reviewed` /
`error` but none on `acquired-lease` / `polling` / `too-soon` -
and `too-soon` is the most common state for the first 12 hours
after deploy, so the user couldn't tell whether the librarian was
firing at all. Now:

  - lease acquired: .info ("lease acquired - checking eligibility
    on the next cycle")
  - polling for lease (another device holds it): .debug, since
    polling fires every 60s while waiting
  - too-soon: .info, with the configured min-interval named
    explicitly so the user knows the next eligible run timing
    without doing the math. ~24 lines/day during cooldown - not
    noisy, and gives the "yes, alive and waiting" signal that was
    missing.
  - claim acquired - starting librarian run: .info
  - claim RPC / list / agent throws were on .debug, bumped to
    .info so a real failure isn't silent.

Embed-worker logging left as-is (the abbreviated `embed-` source
tag and existing log shape; no functional issues).

https://claude.ai/code/session_015XcR7xzLdij66ZbYERUdLH
@sysread sysread merged commit e779771 into main May 10, 2026
1 check passed
@sysread sysread deleted the claude/cover-logging-gaps branch May 10, 2026 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants