Symptom on Windows: imcodes daemon refuses to start. Watchdog log fills with
'"C:\Users\admin\AppData\Roaming\npm\imcodes.cmd"' is not recognized as
an internal or external command, operable program or batch file.
repeated forever. No daemon process exists.
Three independent root causes had to be fixed together:
1. UTF-8 BOM in watchdog.cmd
`writeWatchdogCmd` wrote the file as `EF BB BF` + UTF-8 content.
cmd.exe does NOT understand BOMs in batch files — instead it concatenates
the BOM bytes with the next token, producing the unknown command
"[BOM]@echo". The error is printed, the loop sleeps 5 seconds, and
repeats forever. No daemon ever starts.
Fix: write plain UTF-8, no BOM.
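A minimal sketch of the no-BOM write, assuming Node's `fs` API. The helper names `hasUtf8Bom` and `writeBatchFileWithoutBom` are hypothetical and are not taken from the real `writeWatchdogCmd`:

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

/** Returns true when the buffer begins with the UTF-8 BOM bytes EF BB BF. */
export function hasUtf8Bom(bytes: Buffer): boolean {
  return bytes.length >= 3 && bytes.subarray(0, 3).equals(UTF8_BOM);
}

/** Hypothetical helper: write batch content as plain UTF-8, never with a BOM. */
export function writeBatchFileWithoutBom(path: string, content: string): void {
  // Strip a leading U+FEFF in case the caller's string already carries one;
  // encoding it as UTF-8 would otherwise emit EF BB BF at the start of the file.
  const clean = content.replace(/^\uFEFF/, "");
  writeFileSync(path, Buffer.from(clean, "utf8"));
}

// Usage: write to a scratch directory and confirm the BOM is absent.
const scratchDir = mkdtempSync(join(tmpdir(), "watchdog-bom-"));
const scratchFile = join(scratchDir, "daemon-watchdog.cmd");
writeBatchFileWithoutBom(scratchFile, "\uFEFF@echo off\r\n");
console.log(hasUtf8Bom(readFileSync(scratchFile))); // prints false
```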
2. Hard-coded user paths instead of env-var expansion
The previous file referenced `C:\Users\<name>\AppData\Roaming\npm\
imcodes.cmd` directly. When the username contains non-ASCII characters
(e.g. Chinese or Japanese), no JS-side encoding can produce bytes that
cmd.exe interprets correctly across all locales.
Fix: route every path through `%APPDATA%` and `%USERPROFILE%`. cmd.exe
resolves these via the OS native wide-character API at runtime, so the
actual codepage of the user folder name is irrelevant.
3. Missing `call` prefix on the launch line
`"%APPDATA%\npm\imcodes.cmd" start --foreground` (no `call`) hands off
control to the .cmd shim and never returns to the loop after the daemon
exits. The watchdog dies silently after the first run.
Fix: prefix with `call ` so cmd.exe returns to the loop body.
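Fixes 2 and 3 together might look roughly like the sketch below. `buildWatchdogContent` is a hypothetical name, and the `timeout` line is only an assumed way to get the 5-second sleep; the real watchdog file may be laid out differently:

```typescript
/**
 * Hypothetical builder for the watchdog batch text.
 * Only the launch line is taken from the description above; the loop label
 * and the sleep command are assumptions made for illustration.
 */
export function buildWatchdogContent(retrySeconds = 5): string {
  const lines = [
    "@echo off",
    ":loop",
    // %APPDATA% is expanded by cmd.exe at runtime, so no user name is baked in.
    // `call` makes cmd.exe come back to this script after the .cmd shim exits.
    'call "%APPDATA%\\npm\\imcodes.cmd" start --foreground',
    // Assumed sleep between restarts.
    `timeout /t ${retrySeconds} /nobreak >nul`,
    "goto loop",
  ];
  // Batch files conventionally use CRLF line endings.
  return lines.join("\r\n") + "\r\n";
}

// Usage: the generated text contains no absolute user path and no BOM.
console.log(buildWatchdogContent());
```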
PLUS: when an OLD broken watchdog is already in a crash-loop, just fixing
the file isn't enough. The orphaned cmd.exe is still running the old loop
and keeps writing the same error. We need to kill it before regenerating.
Added `killAllStaleWatchdogs()` in `windows-daemon.ts` and called it from:
- `restartWindowsDaemon()` (used by `imcodes restart`)
- `regenerateAllArtifacts()` (used by `imcodes repair-watchdog`)
- The `imcodes upgrade` batch (kills by command-line pattern via wmic
BEFORE running npm install, so the crash-loop is broken before any
file write. Works without a PIDFILE because we enumerate by cmd-line
pattern, not by parent of daemon PID.)
The wmic query and taskkill commands are language-independent so this
works on en-US, zh-CN, ja-JP and any other Windows locale.
── Tests ──
Two new Windows-only end-to-end test files (skipped on non-Windows hosts;
both Windows CI jobs include them):
- test/util/windows-launch-artifacts.cmd-parse.test.ts (3 tests)
Spawns `writeWatchdogCmd` in a fresh node child process (so vitest mocks
can't pollute it), then writes the bytes to disk and runs them through
REAL cmd.exe with /c. Asserts:
- first byte is not 0xEF (no UTF-8 BOM)
- cmd.exe parses every line ('is not recognized' must NOT appear)
- %USERPROFILE% / %APPDATA% expansion is used (no hard-coded paths)
- test/util/windows-stale-watchdog-cleanup.test.ts (3 tests)
Spawns a fake `daemon-watchdog.cmd` process that loops forever, then
invokes `killAllStaleWatchdogs()` from the dist module via a child node
process. Asserts:
- the fake watchdog dies within 5 seconds
- regenerateAllArtifacts() also kills it
- unrelated cmd.exe processes are NOT killed (precise targeting)
Existing unit tests updated to assert the new structural invariants:
- watchdog content has no BOM bytes
- watchdog uses %APPDATA%\npm\imcodes.cmd
- watchdog has 'call' prefix
- upgrade batch enumerates watchdogs via wmic command-line pattern
- upgrade batch tree-kills before npm install
Verified end-to-end on Windows: 105/105 Windows tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI failure on the new windows-stale-watchdog-cleanup tests:
AssertionError: fake watchdog PID 3512 did not appear
The test's `pidAlive()` helper used `wmic process where ProcessId=N`, which
returns nothing on the GitHub Actions Windows runner image (Server 2025,
where the deprecated wmic is no longer available). The test thought the
freshly-spawned cmd.exe was already dead and bailed out.
Fixes both the test and the production cleanup code:
- test pidAlive() — switched to `tasklist /fi "PID eq N" /fo csv /nh`
which ships with every supported Windows version.
- src/util/windows-daemon.ts (killAllStaleWatchdogs) — try PowerShell first
(`Get-CimInstance Win32_Process | Where-Object CommandLine -like
'*daemon-watchdog*'`), fall back to wmic for legacy Windows.
- src/util/windows-launch-artifacts.ts (killAllStaleWatchdogsBeforeRegen)
— same PowerShell-then-wmic strategy.
- src/util/windows-upgrade-script.ts (upgrade batch) — runs the same
PowerShell query via `for /f` then falls through to the wmic version.
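A sketch of a tasklist-based liveness check, assuming the CSV row shape that `tasklist /fo csv /nh` prints (image name first, PID second). The function names are hypothetical and the real test helper may differ:

```typescript
import { execFileSync } from "node:child_process";

/**
 * Pure parser: true when some CSV row has the given PID in its second field.
 * The "no tasks match" message is localized, so it is never matched directly;
 * any line that is not a CSV row for this PID simply fails the check.
 */
export function tasklistOutputHasPid(output: string, pid: number): boolean {
  return output.split(/\r?\n/).some((line) => {
    // Rows look like: "cmd.exe","3512","Console","1","4,000 K"
    const fields = line.trim().split('","');
    return fields.length >= 2 && fields[1] === String(pid);
  });
}

/** Hypothetical pidAlive(): runs tasklist and parses its CSV output. */
export function pidAliveSketch(pid: number): boolean {
  try {
    const out = execFileSync(
      "tasklist",
      ["/fi", `PID eq ${pid}`, "/fo", "csv", "/nh"],
      { encoding: "utf8", windowsHide: true },
    );
    return tasklistOutputHasPid(out, pid);
  } catch {
    // tasklist missing (non-Windows host) or failed: treat as not alive.
    return false;
  }
}
```

Keeping the parser separate from the process spawn lets it be unit-tested on any host.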
This makes the cleanup logic work on:
- Windows 10 / 11 (PowerShell + wmic)
- Server 2016 / 2019 / 2022 (PowerShell + wmic)
- Server 2025 / newer Windows 11 (PowerShell only — wmic gone)
Verified locally: 54/54 windows tests pass on Windows 10 with the new
PowerShell path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI failure on the windows-stale-watchdog-cleanup test (the PowerShell
switch itself was correct):
AssertionError: fake watchdog PID 1676 should be killed
The kill chain (PowerShell startup + Get-CimInstance + taskkill) takes
longer than 5 seconds on the GitHub Actions Windows runner — PowerShell
cold-start alone is 1-3 seconds. The test's `waitFor(() => !pidAlive,
5000)` timed out before the kill landed.
Bumped the kill waitFor to 15s and the per-test timeout to 60s.
Also made the driver script print before/after kill markers so any
future failure produces actionable diagnostics in the assertion message.
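For illustration, a polling helper in the spirit of the test's `waitFor` could look like this. The signature is an assumption; the real helper in the test file is not shown in this conversation:

```typescript
/**
 * Polls `predicate` until it returns true or `timeoutMs` elapses.
 * Resolves to true on success and false on timeout, so the caller can
 * build its own assertion message (for example with driver diagnostics).
 */
export async function waitFor(
  predicate: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await predicate()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With a 15-second budget, a call might read `await waitFor(() => !pidAlive(fakePid), 15_000)`, where `pidAlive` and `fakePid` stand in for the test's own helper and spawned PID.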
Verified locally: 3/3 stale-watchdog cleanup tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…test
CI failure from previous commit (after the timeout bump):
AssertionError: fake watchdog PID 8620 should be killed.
Driver stdout:
Driver stderr: [driver] before kill, fakePid=8620
[driver] after kill
The driver's `killAllStaleWatchdogs()` ran to completion but the fake
watchdog process is still alive. Either:
- PowerShell's `Get-CimInstance Win32_Process | Where-Object CommandLine -like
'*daemon-watchdog*'` returned nothing on the GitHub Actions runner image
- or it returned PIDs but taskkill couldn't act on them
To diagnose: the driver now dumps every cmd.exe process and its full
CommandLine value via `Get-CimInstance Win32_Process | Format-List` BEFORE
calling killAllStaleWatchdogs. The output lands in the test's assertion
message so the next CI failure will show exactly what PowerShell sees on
the runner image — and we can then fix the filter accordingly.
Local tests still pass (3/3).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s broken)
Found via the diagnostic dump added in the previous commit. CI's PowerShell
diagnostic on the failing test showed the fake watchdog clearly:
ProcessId : 6956
CommandLine : cmd.exe /c C:\Users\RUNNER~1\AppData\Local\Temp\imcodes-stale-1-Q2RXWS\daemon-watchdog.cmd
So the process IS visible to PowerShell. Yet `killAllStaleWatchdogs()`
didn't kill it. Root cause:
The production code passed the PowerShell command via `-Command "..."`:
powershell -NoProfile -Command "Get-CimInstance Win32_Process -Filter
"Name='cmd.exe'" | Where-Object ..."
cmd.exe→powershell parses this as:
- Outer quote opens at `"Get-CimInstance ...`
- Outer quote CLOSES at the second `"` (just before `Name=`)
- PowerShell receives only `Get-CimInstance Win32_Process -Filter`
- Everything after becomes positional cmd.exe args (errors silently)
So PowerShell ran with no filter expression: the script effectively did
`Get-CimInstance Win32_Process -Filter`, and the broken syntax produced no
output. killAllStaleWatchdogs() saw zero PIDs and bailed.
Fix: write the PowerShell script to a temp .ps1 file in mkdtempSync()
and invoke `powershell -NoProfile -NonInteractive -ExecutionPolicy Bypass
-File "<path>"`. No quote escaping involved, works on every Windows
version with PowerShell installed.
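The temp-file approach can be sketched as follows. The PowerShell script body, the helper names, and the one-PID-per-line output format are assumptions made for illustration; only the `powershell` flags and the `mkdtempSync()` idea come from the description above:

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed script: print one PID per line for every matching cmd.exe.
const FIND_WATCHDOGS_PS1 = [
  "Get-CimInstance Win32_Process -Filter \"Name='cmd.exe'\" |",
  "  Where-Object { $_.CommandLine -like '*daemon-watchdog*' } |",
  "  ForEach-Object { $_.ProcessId }",
].join("\r\n");

/** Arguments for running a script file; no nested quoting is needed. */
export function powershellFileArgs(scriptPath: string): string[] {
  return ["-NoProfile", "-NonInteractive", "-ExecutionPolicy", "Bypass", "-File", scriptPath];
}

/** Keeps only lines that consist of a decimal PID. */
export function parsePidLines(stdout: string): number[] {
  return stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /^\d+$/.test(line))
    .map(Number);
}

/** Hypothetical stand-in for findStaleWatchdogPids(). */
export function findStaleWatchdogPidsSketch(): number[] {
  const dir = mkdtempSync(join(tmpdir(), "imcodes-ps-"));
  const scriptPath = join(dir, "find-stale-watchdog.ps1");
  try {
    writeFileSync(scriptPath, FIND_WATCHDOGS_PS1, "utf8");
    const out = execFileSync("powershell", powershellFileArgs(scriptPath), {
      encoding: "utf8",
      windowsHide: true,
    });
    return parsePidLines(out);
  } catch {
    // PowerShell unavailable or the query failed: report no PIDs.
    return [];
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

Because the script text never passes through a cmd.exe command line, the inner double quotes around `Name='cmd.exe'` reach PowerShell intact.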
Three call sites updated identically:
- src/util/windows-daemon.ts (findStaleWatchdogPids)
- src/util/windows-launch-artifacts.ts (killAllStaleWatchdogsBeforeRegen)
- src/util/windows-upgrade-script.ts (upgrade batch — writes ps1 to
%SCRIPT_DIR%\find-stale-watchdog.ps1)
Verified locally: 54/54 windows tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`imcodes r` is now equivalent to `imcodes repair-watchdog`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…an filter
Ensures bind/restart/repair-watchdog/upgrade ALL handle every edge case:
1. Stale daemon.pid pointing at a dead process
2. Crash-loop watchdog cmd.exe (the BOM bug case)
3. Orphan daemon node.exe holding the named-pipe lock
4. Multiple watchdog races
5. Missing watchdog files
New ensureDaemonRunning() in src/util/windows-daemon.ts is the SINGLE
reusable entry point. Wired into bindFlow, restart, and repair-watchdog.
Internally calls:
1. killOrphanDaemonProcesses — finds node.exe orphans by command-line
pattern, kills via taskkill -> wmic delete -> PowerShell Stop-Process.
2. killAllStaleWatchdogs — finds cmd.exe whose command line references
daemon-watchdog and tree-kills them.
3. restartWindowsDaemon — spawns a fresh hidden watchdog via VBS.
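The call order above can be sketched with the three steps injected as functions. The interface and the `Sketch` suffix are hypothetical; the real `ensureDaemonRunning()` calls the module's own functions directly and may handle errors differently:

```typescript
/** The three steps named above, injected so the order can be tested off-Windows. */
export interface DaemonRecoverySteps {
  killOrphanDaemonProcesses(): void;
  killAllStaleWatchdogs(): void;
  restartWindowsDaemon(): void;
}

/** Hypothetical outline of ensureDaemonRunning(): clean up first, then restart. */
export function ensureDaemonRunningSketch(steps: DaemonRecoverySteps): void {
  // 1. Remove orphan node.exe daemons that may still hold the named-pipe lock.
  steps.killOrphanDaemonProcesses();
  // 2. Remove crash-looping watchdog cmd.exe processes.
  steps.killAllStaleWatchdogs();
  // 3. Only then spawn a fresh watchdog.
  steps.restartWindowsDaemon();
}
```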
CRITICAL: tightened the orphan-daemon filter from '*imcodes*' to
'*node_modules\imcodes\dist*'. The loose filter was killing the test
runner itself because the repo working directory is C:\Users\X\imcodes-src
which contains 'imcodes'. The new filter matches only the npm-installed
daemon entry point.
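To see why the loose pattern over-matched, here is a small sketch that approximates PowerShell's `-like` for patterns using only `*` (case-insensitive). The helper name and both sample command lines are hypothetical; only the two patterns and the `imcodes-src` directory name come from the description above:

```typescript
/**
 * Approximates PowerShell `-like` for patterns that use only `*`.
 * `?` and `[...]` wildcards are NOT handled in this sketch.
 */
export function likeMatch(value: string, pattern: string): boolean {
  const body = pattern
    .split("*")
    // Escape regex metacharacters (including backslashes in Windows paths).
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("[\\s\\S]*");
  return new RegExp(`^${body}$`, "i").test(value);
}

// Hypothetical command lines for illustration only.
const testRunner = "node C:\\Users\\X\\imcodes-src\\node_modules\\vitest\\vitest.mjs run";
const daemon = "node C:\\Users\\X\\AppData\\Roaming\\npm\\node_modules\\imcodes\\dist\\index.js start";

const LOOSE = "*imcodes*";
const TIGHT = "*node_modules\\imcodes\\dist*";

console.log(likeMatch(testRunner, LOOSE)); // true: the repo directory name matches
console.log(likeMatch(testRunner, TIGHT)); // false: the test runner is left alone
console.log(likeMatch(daemon, TIGHT)); // true: the installed daemon still matches
```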
All execSync calls now pass windowsHide:true so nothing flashes a console.
REAL CI tests (not mocks) added to windows-stale-watchdog-cleanup.test.ts:
- killOrphanDaemonProcesses kills node.exe at production path
- killOrphanDaemonProcesses does NOT kill unrelated node.exe
Both tests spawn REAL processes via spawn() and call the production
module from a fresh node child process. 5/5 pass locally on Windows in 25s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>