boot-resume: zero-cooperation session recovery after gateway restart #41
Belugary started this conversation in Show and tell
Replies: 1 comment
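🦞 After my agent "died" 7 times, it learned to write its own last words
Last week my gateway crashed with an out-of-memory error. No big deal, until I found that all 5 agents that were in the middle of tasks had gone dark. They were like 5 friends on phone calls whose signal was suddenly cut.
The Drama
Agent A (Knowledge Manager): was organizing a Feishu knowledge base, stuck at "table 47"
They had no saved state, no checkpoints, nothing. Like 5 actors pulled off stage mid-performance: the curtain falls, the audience leaves, but nobody tells them the show is over.
The Hack
My current solution may be a bit crude, but it works: every 5 minutes, each agent writes itself a "last words" note:
{
"agent": "miaoquai_ops",
"task": "daily_rss_digest",
"progress": "12/15 articles extracted",
"last_action": "web_fetch huggingface blog",
"timestamp": "2026-04-19T05:42:00Z",
"can_resume": true
}
Each note is saved to a file. After the gateway starts, it first reads these notes, realizes "oh, so this is what you were doing before you died", and then asks the human: "Continue?"
The Plot Twist
The most interesting one is the note from Agent B (妙趣AI):
{
"task": "write_ai_news_digest",
"progress": "opening_line_written",
"opening_line": "At 3:42 a.m., I woke up from the cloud...",
"note_to_self": "Remember to keep the Wong Kar-wai style going"
}
When I saw "Remember to keep the Wong Kar-wai style going", I actually laughed. This agent is already imitating my writing style.
Related War Stories
@maintainer Your zero-cooperation approach is great! I've been wondering whether it could be combined with this "last words" pattern: the agent decides for itself which state is worth saving, instead of the system force-saving all state. For example:
Let the agent judge "importance" itself?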
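For anyone who wants to try the "last words" pattern from this comment, here is a minimal Python sketch. It is not the commenter's actual code: the one-file-per-agent layout, the `notes_dir` location, and the function names `write_last_words` and `read_resumable` are all assumptions made for illustration. Only the JSON field names come from the example note.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_last_words(notes_dir: Path, agent: str, task: str, progress: str,
                     last_action: str, can_resume: bool = True) -> Path:
    """Atomically write one agent's note as <agent>.json (hypothetical layout)."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    note = {
        "agent": agent,
        "task": task,
        "progress": progress,
        "last_action": last_action,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "can_resume": can_resume,
    }
    target = notes_dir / f"{agent}.json"
    # Write to a temp file in the same directory, then rename it over the
    # target, so the target itself is never left half-written by a crash.
    fd, tmp_name = tempfile.mkstemp(dir=notes_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(note, tmp, ensure_ascii=False)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, target)
    return target


def read_resumable(notes_dir: Path) -> list[dict]:
    """Return the notes a gateway could offer to resume at startup."""
    notes = []
    for path in sorted(notes_dir.glob("*.json")):
        try:
            note = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or corrupt note: skip it
        if isinstance(note, dict) and note.get("can_resume"):
            notes.append(note)
    return notes
```

An agent would call `write_last_words` on its 5-minute timer, and the gateway would call `read_resumable` once at startup before asking the human whether to continue. Anything done since the last note is still lost on a hard kill, which is the gap the zero-cooperation approach targets.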
What I built
A skill that automatically detects and resumes interrupted agent sessions after a gateway restart or crash. No manual "continue" needed.
The problem
Gateway restarts mid-task → agent goes silent → you have to manually tell each session to continue. Checkpoint-based approaches require the agent to save state before dying, which fails on unexpected kills (SIGKILL, OOM, power loss).
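A small Python sketch of why "save state before dying" cannot cover SIGKILL: the operating system does not let a process register a handler for it, so there is no hook in which to write a checkpoint. The helper name `can_install_handler` is made up for this illustration, and the snippet assumes a POSIX system (`signal.SIGKILL` does not exist on Windows).

```python
import signal


def can_install_handler(signum: int) -> bool:
    """Return True if this process can register a handler for signum."""
    try:
        previous = signal.signal(signum, lambda signo, frame: None)
    except (OSError, ValueError):
        # OSError: the OS refuses (SIGKILL, SIGSTOP).
        # ValueError: called outside the main thread, or invalid signal number.
        return False
    if previous is not None:
        signal.signal(signum, previous)  # restore whatever was there before
    return True
```

Calling `can_install_handler(signal.SIGKILL)` returns `False`. OOM kills on Linux are delivered as SIGKILL, and power loss gives no signal at all, so any recovery scheme for those cases has to work from what is already on disk.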
The approach
Instead of pre-saving state, read the evidence after the fact. The JSONL session files already contain everything needed.
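To make "read the evidence after the fact" concrete, here is a hedged Python sketch of pulling the last complete record out of a JSONL session file. The skill itself is described as a shell script, so this is not its code. The function name `last_record` is hypothetical, and the only assumption about the file is that it holds one JSON object per line.

```python
import json
from pathlib import Path
from typing import Optional


def last_record(session_file: Path) -> Optional[dict]:
    """Return the last well-formed JSON object in a JSONL file, or None."""
    last = None
    with session_file.open("r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A kill mid-write can leave a truncated final line; ignore it.
                continue
            if isinstance(record, dict):
                last = record
    return last
```

Skipping unparsable lines matters here: after a hard kill the final line may be cut off, and the last intact record is the most recent trustworthy evidence.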
A shell script runs on every gateway start (systemd ExecStartPost).
Detection rules:
toolResult
assistant (empty text)
user (non-trivial)
assistant (with text)
No LLM in the detection loop. 100% deterministic. Works after SIGKILL.
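The four cases named in the detection rules could be expressed as a pure function over the last session record. The Python sketch below is one possible reading, not the skill's actual logic: the outcome attached to each case, the `role` and `text` field names, and the `MIN_USER_CHARS` threshold for "non-trivial" are all assumptions, since the post does not spell them out.

```python
from typing import Literal

Verdict = Literal["interrupted", "complete"]

# Hypothetical threshold: user messages shorter than this count as trivial.
MIN_USER_CHARS = 10


def classify(last: dict) -> Verdict:
    """Decide from the last record alone whether a session was cut off."""
    role = last.get("role")                    # assumed field name
    text = (last.get("text") or "").strip()    # assumed field name
    if role == "toolResult":
        # Assumed: a tool returned but the agent never reacted to it.
        return "interrupted"
    if role == "assistant":
        # Assumed: an empty assistant turn means the reply was cut short.
        return "complete" if text else "interrupted"
    if role == "user":
        # Assumed: a substantial user message with no reply needs an answer.
        return "interrupted" if len(text) >= MIN_USER_CHARS else "complete"
    return "complete"  # unknown role: do not resume when unsure
```

Because the function only inspects data already on disk and contains no model call, the same session file always yields the same verdict, which matches the deterministic property the post describes.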
Features
restart-resume.json
/boot-resume slash command for manual checks
Install
clawhub install boot-resume
bash ~/.openclaw/workspace/skills/boot-resume/install.sh
Test
systemctl --user restart openclaw-gateway
Links
clawhub install boot-resume
Feedback welcome!