Release v0.6.1 #10

Merged
moreih29 merged 6 commits into main from develop on Jun 6, 2026

@moreih29 moreih29 commented Jun 6, 2026

Release v0.6.1

Batch fix for regressions caused by older handlers misreading the lifecycle events introduced in 0.6.0 (degraded, etc.) as "channel death".

Fixed

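  • Local workspace watch/PTY/LSP regression (v0.6.0): a single delayed heartbeat (degraded) tore down the local channel provider. As a result, fs.changed/git.changed pushes went permanently silent (git status and file tree auto-refresh stopped; only the 1-minute autofetch cycle kept working), existing terminals froze, every LSP server was killed, and orphan agent processes were left behind. Teardown is now limited to genuine termination events (exit/failure).
  • Heartbeat boundary straddling removed: the send interval (4s) and the judgment interval (5s) are now separate, defined in one place (proto.go). This removes the cause of the chronic false degraded events.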

Added (internal)

  • Agent channel lifecycle logging (agent-channel source, main.log): records spawn/ready/close/respawn together with PID, exit code, and stderr tail.
  • Automatic fs/git watch replay on reconnect: watches recover even when the agent is genuinely replaced.

Protocol & Remote impact

  • No agent protocol version change (wire compatible).
  • The agent is re-uploaded on the first SSH boot because the Go binary changed (heartbeat send interval).

Tests: 22 new regression tests (confirmed to fail on the pre-fix code); the full Go suite and the TS unit tests (2,961) pass.

🤖 Generated with Claude Code

moreih29 and others added 6 commits June 6, 2026 23:11
An agent child could die and be transparently respawned with zero trace.
That exact failure mode, behind the v0.6.0 "push events stop while RPC keeps
working" report, took two days to localize because nothing recorded when or
why a channel replaced its process.

Every transition in reconnecting-process-channel now leaves a main.log line
under source "agent-channel": spawn (phase, pid), ready (epoch), child close
(exit code, signal, wasReady, stderr tail), handshake/pipe/spawn failures,
respawn scheduling with backoff delay, epoch-mismatch daemon replacement,
fatal-failure escalation, and dispose. Channels are labeled local:<root> /
ssh:<host> via a new logLabel option since every workspace owns one.

Both the local and SSH channels route through this file, so one logging
site covers both transports. The logger is created lazily (same pattern as
pipe.ts) so test imports do not initialize electron-log.
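
A minimal sketch of the lazy-logger pattern described above, not the actual code. `createLogger`, `getLogger`, `logSpawn`, and the line format are hypothetical stand-ins; the real logger is electron-log and the real message layout is not shown in this commit.

```typescript
interface Logger {
  info(message: string): void;
}

let createCount = 0; // how many times the (expensive) logger was built
const lines: string[] = []; // captured output, in place of main.log

// Hypothetical factory standing in for electron-log initialization.
function createLogger(source: string): Logger {
  createCount += 1;
  return {
    info: (message: string) => {
      lines.push(`[${source}] ${message}`);
    },
  };
}

let cached: Logger | undefined;

// Built on first use, so merely importing this module creates nothing.
function getLogger(): Logger {
  if (cached === undefined) {
    cached = createLogger("agent-channel");
  }
  return cached;
}

// Assumed line format: channel label, transition name, then key=value fields.
function logSpawn(label: string, phase: string, pid: number): void {
  getLogger().info(`${label} spawn phase=${phase} pid=${pid}`);
}

const createdAtImport = createCount; // still 0: nothing has logged yet
logSpawn("local:/repo", "initial", 1234);
logSpawn("ssh:devbox", "respawn", 5678);
```

Both calls share one logger instance, and a test that imports the module without logging never triggers the factory.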

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…0 watch/PTY/LSP regression

56e411a added the degraded / degraded-recovered / ready lifecycle events,
but two consumers written before that commit still assumed "anything except
reconnecting/disposed = the channel died":

- handleLocalChannelLifecycle tore down the local workspace provider on a
  single late heartbeat: it dropped the channel reference WITHOUT disposing
  it (orphan agent process keeping every fs/git watch) and the next fs
  access lazily booted a fresh watch-less agent. Result: fs.changed /
  git.changed went permanently silent while RPC kept working — git status
  only refreshed via the 1-minute autofetch tick — and existing PTY
  sessions were stranded on the abandoned channel.
- The LSP host disposed every language server on the channel for the same
  spurious event.

The trigger was chronic: heartbeats are sent every 5s and judged at 5s, so
arrivals land at interval+epsilon and the 1-miss degraded check rides the
boundary — confirmed live (channel B ready 22:15:08.7, degraded teardown +
duplicate agent spawn at +5.1s, matching the two orphan agent processes
observed on the same workspace).

Both handlers now act only on genuine terminal events. The SSH manager
handler and PTY agent-host already handled all eight event types explicitly
(audited) and are unchanged.

Regression tests: degraded/degraded-recovered/ready must not tear down the
local provider or LSP server records; exit/failure (and held-then-expired
for LSP) still must. All eight fail on the pre-fix code.
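
A sketch of the difference between the two handler shapes, under stated assumptions. The event names come from this message, but the union below is partial (it omits held-then-expired, and the real type has eight members), the function names are hypothetical, and treating `disposed` as non-terminal here is an assumption.

```typescript
// Partial union: names taken from the commit text only.
type LifecycleEvent =
  | "ready"
  | "degraded"
  | "degraded-recovered"
  | "reconnecting"
  | "disposed"
  | "exit"
  | "failure";

// Pre-fix assumption: anything except reconnecting/disposed means the channel died.
function legacyShouldTearDown(event: LifecycleEvent): boolean {
  return event !== "reconnecting" && event !== "disposed";
}

// Post-fix shape: every event is listed and only terminal ones tear down.
function shouldTearDown(event: LifecycleEvent): boolean {
  switch (event) {
    case "exit":
    case "failure":
      return true;
    case "ready":
    case "degraded":
    case "degraded-recovered":
    case "reconnecting":
    case "disposed":
      return false;
    default: {
      // Compile-time guard: adding a new event forces a decision here.
      const unhandled: never = event;
      return unhandled;
    }
  }
}

const spurious: LifecycleEvent[] = ["ready", "degraded", "degraded-recovered"];
const legacyTeardowns = spurious.filter(legacyShouldTearDown).length;
const fixedTeardowns = spurious.filter(shouldTearDown).length;
```

The deny-list form silently misclassifies any event added later, which is how 56e411a turned into a regression; the explicit switch fails to compile instead.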

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Watch registrations (fsnotify) live in the agent process, so a respawned
agent starts with zero watches — and nothing re-issued them. Root and .git
watches were registered exactly once per workspace open (ensureRoot /
repo-info detection), so any agent replacement silently killed push events
until the workspace was reopened. Observed live: an agent with 62
expanded-dir watches (re-added by user navigation) but no root and no .git
watch — the two registrations with no natural re-issue point.

Three pieces:
- The channel now emits the `ready` lifecycle event on a successful
  no-epoch reconnect too (local agents / legacy remotes). Previously only
  the epoch-match reattach path emitted it, so a local reconnect completed
  silently. Existing consumers are safe by audit: the SSH manager
  broadcasts "connected", PTY restore is a no-op with no held sessions,
  and the local manager / LSP host ignore `ready` since the previous
  commit.
- AgentBackedProvider gains onAgentLifecycle (channel.onLifecycle
  passthrough on AgentFsProvider; the type guard now requires it).
- AgentFsWatcher replays every tracked relPath and AgentGitWatcher
  replays its gitDir on `ready`. Re-registering an existing watch is a
  no-op agent-side, so the replay is safe when the agent actually
  survived.
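
The replay mechanism can be sketched as below. `FakeAgent`, `FakeChannel`, and `ReplayingFsWatcher` are hypothetical stand-ins written for illustration; they are not AgentFsWatcher or the real channel, and only the behavior described above (watches live in the agent, `ready` triggers replay, re-registration is a no-op) is modeled.

```typescript
type LifecycleEvent = "ready" | "degraded" | "exit";
type LifecycleListener = (event: LifecycleEvent) => void;

// Stand-in for the agent process: watches live here and vanish on respawn.
class FakeAgent {
  private watches: Record<string, true> = {};

  // Registering an already-watched path changes nothing (idempotent).
  watch(relPath: string): void {
    this.watches[relPath] = true;
  }

  watched(): string[] {
    return Object.keys(this.watches).sort();
  }
}

// Stand-in for the reconnecting channel.
class FakeChannel {
  agent = new FakeAgent();
  private listeners: LifecycleListener[] = [];

  onLifecycle(listener: LifecycleListener): void {
    this.listeners.push(listener);
  }

  // Replace the agent, then announce `ready` as the channel now does on reconnect.
  respawn(): void {
    this.agent = new FakeAgent();
    for (const listener of this.listeners) listener("ready");
  }
}

class ReplayingFsWatcher {
  private tracked: string[] = [];

  constructor(private channel: FakeChannel) {
    channel.onLifecycle((event) => {
      if (event === "ready") this.replay();
    });
  }

  add(relPath: string): void {
    if (this.tracked.indexOf(relPath) === -1) this.tracked.push(relPath);
    this.channel.agent.watch(relPath);
  }

  // Re-issue every tracked path against whatever agent is current.
  private replay(): void {
    for (const relPath of this.tracked) this.channel.agent.watch(relPath);
  }
}

const channel = new FakeChannel();
const watcher = new ReplayingFsWatcher(channel);
watcher.add(".");
watcher.add("src");
channel.respawn();
const afterRespawn = channel.agent.watched();
```

Because the client keeps the list of tracked paths, the replay does not depend on any state surviving in the agent.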

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Heartbeats were sent every 5s and judged at 5s (degraded = 1 missed
interval), so arrivals landed at interval+epsilon and the client's
degraded check chronically rode the boundary — the heartbeat histogram
showed nearly all arrivals in the [1-2x] bucket, and a phase-aligned
check fired spurious degraded lifecycle events during normal operation
(the trigger for the v0.6.0 local-channel teardown regression).

The send cadence and the advertised judgment basis are now two constants
in proto.go shared by stdio and daemon modes: HeartbeatSendMs (4s) <
HeartbeatAdvertiseMs (5s). The ~1s wire-jitter margin removes the
boundary condition while keeping real-outage detection latency at 5s.
The TS client is unchanged — it already derives both thresholds from the
advertised value.

Verified against the built binary: ready frame advertises 5000ms, actual
inter-arrival measured at 3999-4000ms.
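
The boundary condition can be illustrated with a small sketch. The 4s and 5s values come from this commit; `isDegraded`, `worstCaseGapMs`, the strict `>` comparison, and the 100ms jitter figure are assumptions for illustration, not the TS client's actual check.

```typescript
// Values from proto.go (HeartbeatSendMs / HeartbeatAdvertiseMs), mirrored here.
const HEARTBEAT_SEND_MS = 4000;
const HEARTBEAT_ADVERTISE_MS = 5000;

// Assumed judgment: degraded once more than one advertised interval has
// passed since the last heartbeat arrived.
function isDegraded(lastArrivalMs: number, nowMs: number, advertisedMs: number): boolean {
  return nowMs - lastArrivalMs > advertisedMs;
}

// Longest gap a healthy link produces: one send interval plus wire jitter.
function worstCaseGapMs(sendMs: number, jitterMs: number): number {
  return sendMs + jitterMs;
}

const jitterMs = 100; // assumed figure, for illustration only

// Before: send 5s, judge 5s. A healthy arrival at 5.1s already looks late.
const oldGapMs = worstCaseGapMs(5000, jitterMs);
const oldFalseAlarm = isDegraded(0, oldGapMs, 5000);

// After: send 4s, judge 5s. The same jitter leaves about 0.9s of margin.
const newGapMs = worstCaseGapMs(HEARTBEAT_SEND_MS, jitterMs);
const newFalseAlarm = isDegraded(0, newGapMs, HEARTBEAT_ADVERTISE_MS);
```

Under these assumptions any positive jitter trips the old configuration, while the new one tolerates up to roughly 1s of jitter and still flags a real outage after 5s.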

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The AgentBackedProvider guard now requires onAgentLifecycle (watch replay,
0ff4f7c); the integration fixture predated it and failed the guard. Unit
fixtures were updated in that commit — this integration one was missed
because only tests/unit ran locally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@moreih29 moreih29 merged commit 0b2275d into main Jun 6, 2026
1 check passed