Workers: cover logging gaps so each one is visible at default log level#8
Merged
Workers: cover logging gaps so each one is visible at default log level#8
Conversation
Three workers were silent at the default Logs-drawer level:
**attachment-expiry-worker** had no `createLogger` at all - the only
log path was the legacy `post({type:'log'})` channel inside the
worker entry, which only fires on setup-session failure and lease
loss. The cycle itself had zero logging, so an attachment-expiry
problem in production left no breadcrumb. Now the loop logs at
.info on lease acquisition, on each non-zero expire batch ("expired
N attachment row(s) older than D days"), and on errors; .debug on
polling and empty-queue ticks so an idle worker isn't noisy.
**samskara-worker** logged at .debug for events the other agent
workers log at .info ("picked up thread X" / "finished thread X"),
and used .trace for some routine state. With the drawer at the
default level, samskara was invisible - you saw every other worker
fire but not this one. Bumped four lifecycle lines to .info:
- assimilate: claimed substrate <id> (thread <thread>)
- assimilate: saved substrate <id>
- pair-relate: selected pair <a> <> <b> (cosine X.XXX)
- pair-relate: associated <a> <> <b> (<kind>: <label>)
- compound-regen: synthesizing summary from N sample row(s)
Detail / error / null-result lines stay on .debug so the drawer
isn't flooded with samskara's higher per-cycle volume.
**wiki-librarian-worker** had logs on `too-small` / `reviewed` /
`error` but none on `acquired-lease` / `polling` / `too-soon` -
and `too-soon` is the most common state for the first 12 hours
after deploy, so the user couldn't tell whether the librarian was
firing at all. Now:
- lease acquired: .info ("lease acquired - checking eligibility
on the next cycle")
- polling for lease (another device holds it): .debug, since
polling fires every 60s while waiting
- too-soon: .info, with the configured min-interval named
explicitly so the user knows the next eligible run timing
without doing the math. ~24 lines/day during cooldown - not
noisy, and gives the "yes, alive and waiting" signal that was
missing.
- claim acquired - starting librarian run: .info
- claim RPC / list / agent throws were on .debug, bumped to
.info so a real failure isn't silent.
Embed-worker logging left as-is (the abbreviated `embed-` source
tag and existing log shape; no functional issues).
https://claude.ai/code/session_015XcR7xzLdij66ZbYERUdLH
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
=head1 SYNOPSIS
Cover the three logging gaps an audit surfaced. Each background
worker now lands at least one .info line per cycle outcome that
matters, so all eight workers are visible at the drawer's default
log level.
=head1 PURPOSE
After shipping the wiki librarian, the user observed they
"weren't seeing the librarian fire" in the Logs drawer and asked
whether all the workers had consistent logging. They didn't.
Three real gaps:
createLoggerat all. The onlylog channel was the legacy
post({type:'log'})inside theworker entry, which only fires on setup-session failure and
lease loss. So an attachment-expiry problem in production
would leave no breadcrumb in the drawer.
log at .info. With the drawer at the default level, you saw
every other worker fire but not this one.
too-small/reviewed/errorbut not onacquired-lease/polling/too-soon.Since
too-soonis the most common state for the first 12hours after a deploy (the cross-device claim won't return
true until the configured interval has elapsed), the user
couldn't tell whether the librarian was firing at all - it
was silent rather than off.
=head1 DESCRIPTION
=head2 Layer 1: how logging looked across the 8 workers
Source-tag convention is consistent (every manager declares
<feature>-workerasloggerSource, exceptembed-workerwhichkeeps its abbreviated form intentionally). Five of the eight
workers (
embed,reflection,journal,wiki,summary)follow the same shape: .info on lifecycle headlines (picked up X /
finished X / errored on X), .debug on per-row details. The other
three diverge:
attachment-expiry-worker: nocreateLogger. Loop is silent.samskara-worker: lifecycle events on .debug (was uniqueamong the eight in this).
wiki-librarian-worker: lifecycle covered for some outcomes(the work-happened path) but silent on the more common
cooldown path.
=head2 Layer 2: what this PR changes, parallel to layer 1
attachment-expiry-worker gets a
createLoggerand fourlifecycle lines:
.infoon lease acquired ("lease acquired - starting expirysweep").
.infoon each non-zero expire batch ("expired N attachmentrow(s) older than D days") - now you can see the worker
actually doing work.
.infoon RPC failures..debugon polling and empty-queue ticks so an idle workerisn't noisy.
samskara-worker has four lifecycle lines bumped from .debug
to .info:
assimilate: claimed substrate <id> (thread <thread>)assimilate: saved substrate <id>pair-relate: selected pair <a> <> <b> (cosine X.XXX)pair-relate: associated <a> <> <b> (<kind>: <label>)compound-regen: synthesizing summary from N sample row(s) (cap M)Detail / error / null-result lines stay on .debug so the drawer
isn't flooded with samskara's higher per-cycle volume.
wiki-librarian-worker gets logs on every cycle outcome:
.infoon lease acquired..debugon polling (60s loop, would otherwise fire everyminute).
.infoon too-soon, with the configured minimum intervalnamed explicitly: "not yet eligible for a librarian run (min
interval 12h since last successful run)". ~24 lines/day during
cooldown gives the user the "yes, the librarian is alive and
waiting" signal that was missing.
.infoon claim acquired (the transition from waiting toactually running).
.infolines ontoo-small/reviewedretained..debug(claim RPC, list,agent threw) are bumped to
.infoso a real failure isn'tsilent.
Embed-worker is left as-is (the abbreviated
embed-source tagis intentional per the user's preference; existing log shape is
fine).
=head2 Layer 3: how that resolves PURPOSE
After this PR, opening the Logs drawer at the default level shows
every background worker firing on its own cadence:
"waiting on min-interval" line during the 12h cooldown.
So the user's "I'm not seeing the librarian fire" question is now
answered structurally: if the librarian truly is firing, you'll
see the .info
too-soonorreviewedline. If you don't seeeither, the worker isn't actually running - the lease isn't being
acquired, or the worker is disabled in Settings, or it crashed
without reporting.
=head1 Notes for AI reviewers
too-soonline lands every idle interval (1h)during cooldown. ~24 lines/day. This is intentional - the user
asked for visibility specifically because they couldn't tell
whether the worker was alive. Throttling it to "first too-soon
after acquisition only" would be silent again on an unlocked
laptop that runs continuously through a cooldown window.
/ selected / associated / saved). Per-row details and per-cycle
null-returns stay on .debug. The volume is bounded by the per-
hour samskara claim cadence, not per-message.
Once the backlog drains, that branch fires hourly forever, and
a heartbeat at .info would dominate the drawer for the long
steady state.
https://claude.ai/code/session_015XcR7xzLdij66ZbYERUdLH
Generated by Claude Code