Observed behavior
LoopState.save() (in src/util/loop-state.ts) writes the state file with a plain writeFile call:
async save(): Promise<void> {
await mkdir(dirname(this.#path), { recursive: true });
await writeFile(
this.#path,
`${JSON.stringify(
{
completed: this.#completed,
failed: this.#failed,
inProgress: this.#inProgress,
},
null,
2,
)}\n`,
);
}
fs.writeFile opens the file with the w flag (truncate to zero), then writes the contents. The combined operation is not atomic on any common filesystem: if the process is killed (SIGKILL, OOM, power loss) between the truncate and the final byte, the on-disk file is left empty or partially written.
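The truncation window can be observed directly. The sketch below is illustrative only: `observeTruncationWindow` and the temp-directory layout are hypothetical names, not part of the project. It opens an existing file with the `w` flag and reads it back before writing anything, which is what a crash at that instant would leave on disk.

```typescript
import { mkdtemp, open, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: returns the file contents as seen between the
// truncate (done by open) and the write that would normally follow.
async function observeTruncationWindow(): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), 'loop-state-'));
  const path = join(dir, 'state.json');

  // A previously saved, valid state file.
  await writeFile(path, '{"completed":["a"]}\n');

  // Opening with 'w' truncates to zero length before any new bytes exist.
  const handle = await open(path, 'w');
  try {
    return await readFile(path, 'utf8');
  } finally {
    await handle.close();
  }
}
```

If the process dies while the file is in this state, the previous contents are already gone.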
On the next run, LoopState.create() only handles ENOENT:
} catch (error) {
if (
error instanceof Error &&
'code' in error &&
error.code === 'ENOENT'
) {
return new LoopState(path);
}
throw error;
}
A truncated or partial JSON file produces a SyntaxError from JSON.parse, which is re-thrown. The loop refuses to start. There is no recovery path other than manually deleting (or hand-editing) the state file, at which point all completed and failed history is gone.
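To make the failure mode concrete, here is a small sketch of why the existing check does not catch corruption. `isMissingFile` is a hypothetical stand-in that mirrors the ENOENT condition quoted above, and `parseError` is a hypothetical helper that captures whatever `JSON.parse` throws.

```typescript
// Hypothetical mirror of the ENOENT check in LoopState.create().
function isMissingFile(error: unknown): boolean {
  return error instanceof Error && 'code' in error && error.code === 'ENOENT';
}

// Hypothetical helper: returns the error thrown by JSON.parse, if any.
function parseError(text: string): unknown {
  try {
    JSON.parse(text);
    return undefined;
  } catch (error) {
    return error;
  }
}

// A partially written file and a fully truncated (empty) file.
const partialError = parseError('{\n  "completed": [\n    "a"');
const emptyError = parseError('');

// Both are SyntaxErrors, and neither passes the ENOENT check,
// so both would be re-thrown by the catch block above.
const bothRethrown =
  partialError instanceof SyntaxError &&
  emptyError instanceof SyntaxError &&
  !isMissingFile(partialError) &&
  !isMissingFile(emptyError);
```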
Expected behavior
The class docstring states:
Persisted state for a running or interrupted loop. Saved before and after every prompt execution so that any interruption loses at most one item's work.
For that guarantee to hold, an interruption during save() must leave the previous good state on disk. With the current implementation, an interruption during save() can leave a corrupted file and lose every previously-completed item.
Minimal reproduction
import { LoopState } from 'loop-the-loop/util/loop-state';
import { writeFile } from 'node:fs/promises';
const path = '/tmp/loop-state.json';
// First run: complete two items so there is history to lose.
const state = await LoopState.create(path);
await state.begin('a');
await state.end('a', { status: 'success', output: 'ok' });
await state.begin('b');
await state.end('b', { status: 'success', output: 'ok' });
// Crash mid-save: file ends up partially written.
await writeFile(path, '{\n  "completed": [\n    "a"');
// Next run: cannot recover.
await LoopState.create(path); // throws SyntaxError, all progress lost.
A long-running job that processes thousands of items and is killed at the wrong instant loses the entire run, not just one item.
Suggested fix
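Write atomically: serialise the JSON to a sibling temp file, fsync it, then rename it over the target. POSIX rename(2) replaces the target atomically when both paths are on the same filesystem (which a sibling file guarantees), so a crash leaves either the previous file or the complete new one, never a partial write. The fsync before the rename matters: without it, the rename can reach disk before the data does. Roughly: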
const tmp = `${this.#path}.${process.pid}.tmp`;
const handle = await open(tmp, 'w');
try {
await handle.writeFile(`${JSON.stringify(...)}\n`);
await handle.sync();
} finally {
await handle.close();
}
await rename(tmp, this.#path);
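A fuller, self-contained version of the same idea might look like the sketch below. It is not the project's implementation: `writeFileAtomic` is a hypothetical helper name, and the temp-file cleanup and the best-effort fsync of the parent directory are additions assumed here for robustness, not something the snippet above requires.

```typescript
import { mkdir, open, rename, rm } from 'node:fs/promises';
import { dirname } from 'node:path';

// Hypothetical helper, not part of the existing LoopState API.
async function writeFileAtomic(path: string, contents: string): Promise<void> {
  const dir = dirname(path);
  await mkdir(dir, { recursive: true });

  // Sibling temp file, so the rename stays on one filesystem.
  const tmp = `${path}.${process.pid}.tmp`;
  try {
    const handle = await open(tmp, 'w');
    try {
      await handle.writeFile(contents);
      // Flush the data to disk before the rename makes it visible.
      await handle.sync();
    } finally {
      await handle.close();
    }
    await rename(tmp, path);
  } catch (error) {
    // Assumption: a leftover temp file is never useful, so remove it.
    await rm(tmp, { force: true });
    throw error;
  }

  // Best effort: fsync the directory so the rename itself is durable.
  // Some platforms (for example Windows) do not support this; ignore failures.
  try {
    const dirHandle = await open(dir, 'r');
    try {
      await dirHandle.sync();
    } finally {
      await dirHandle.close();
    }
  } catch {
    // Directory fsync unavailable; the rename is still atomic for readers.
  }
}
```

With a helper like this, save() would reduce to serialising the state and passing the string to `writeFileAtomic(this.#path, ...)`.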
Optionally also handle the SyntaxError path in LoopState.create() more gracefully (e.g. fall back to a .bak copy taken before each save) so users can recover from any file the OS leaves in a half-written state.
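One possible shape for that fallback is sketched below, under stated assumptions: the backup lives at `${path}.bak`, `readStateWithFallback` is a hypothetical helper, and the policy of re-throwing when the primary is corrupt and no usable backup exists is a design choice made here, not something the current code defines.

```typescript
import { readFile } from 'node:fs/promises';

// Hypothetical helper: tries the state file, then an assumed `.bak` sibling.
// Returns the parsed JSON, or undefined when neither file exists.
async function readStateWithFallback(path: string): Promise<unknown> {
  let corruption: SyntaxError | undefined;

  for (const candidate of [path, `${path}.bak`]) {
    try {
      return JSON.parse(await readFile(candidate, 'utf8'));
    } catch (error) {
      if (error instanceof SyntaxError) {
        // Corrupt file: remember the first failure and try the next candidate.
        if (corruption === undefined) {
          corruption = error;
        }
        continue;
      }
      const missing =
        error instanceof Error && 'code' in error && error.code === 'ENOENT';
      if (!missing) {
        throw error; // Permissions, I/O errors, etc. are not recoverable here.
      }
    }
  }

  // Nothing readable: surface corruption rather than silently starting fresh.
  if (corruption !== undefined) {
    throw corruption;
  }
  return undefined;
}
```

Returning undefined only when both files are absent keeps the existing "no state file means a fresh run" behaviour, while a corrupt file with no backup still fails loudly.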