fix(whatsapp): downgrade recovered watchdog disconnects #77026
mcaxtr merged 1 commit into openclaw:main from …
Codex review: needs maintainer review before merge.
Summary
Reproducibility: yes. Source inspection gives a high-confidence reproduction path: current main emits a watchdog status 499 close with …
Real behavior proof
Next step before merge
Security
Review details
Best possible solution: Land this plugin-side fix after maintainer review and green required checks, keeping watchdog-specific recovery handling in the WhatsApp plugin and preserving generic disconnect behavior for real failures.
Do we have a high-confidence way to reproduce the issue? Yes. Source inspection gives a high-confidence reproduction path: current main emits a watchdog status 499 close with …
Is this the best way to solve the issue? Yes. The PR keys off the exact internal watchdog marker, downgrades only that recovery log, and clears only watchdog recovery history after a healthy reconnect while preserving ordinary retry, logged-out, conflict, and exhausted-retry behavior.
Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against d54bab4b887f. Re-review progress:
03830f4 to 648dd28
648dd28 to 19aab62
19aab62 to fa403de
fa403de to e5bf4b2
Merged via squash.
Thanks @rubencu!
Summary
Describe the problem and fix in 2–5 bullets:
If this PR fixes a plugin beta-release blocker, title it `fix(<plugin-id>): beta blocker - <summary>` and link the matching `Beta blocker: <plugin-name> - <summary>` issue labeled `beta-blocker`. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.
- … `499` to recover, but the reconnect path surfaced that watchdog recovery as a runtime error and left recent-reconnect status behind after the next healthy connect.
- … `WHATSAPP_WATCHDOG_TIMEOUT_ERROR` marker, the retry log is warning-style for that watchdog recovery path, and watchdog recovery status is cleared only after the socket becomes healthy again.
- … `CHANGELOG.md` is intentionally untouched for this contributor PR.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof (required for external PRs)
External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count. Be mindful of private information like IP addresses, API keys, phone numbers, non-public endpoints, or other private details when providing evidence.
- … `fa403de6755e446f60544e5216152381fad1b4bd`, real configured WhatsApp account with phone/account details redacted.
- … `health` plus `channels.status` over gateway RPC.
- … `monitorWebChannel` gateway monitor from this checkout against the same real default WhatsApp auth state with shortened internal watchdog timing only (`messageTimeoutMs=3000`, `watchdogCheckMs=250`, `transportTimeoutMs=60000`) so the watchdog path can be observed without waiting for the production-length window.
Gateway RPC health/status on the same branch also reported the real account linked/connected/healthy after the patch:
{
  "health": {
    "ok": true,
    "channels": {
      "whatsapp": {
        "running": true,
        "configured": true,
        "healthState": "healthy",
        "reconnectAttempts": 0,
        "lastError": null
      }
    },
    "eventLoop": { "degraded": false, "reasons": [] }
  },
  "channelsStatus": {
    "whatsappAccounts": [
      {
        "accountId": "default",
        "enabled": true,
        "configured": true,
        "linked": true,
        "running": true,
        "connected": true,
        "healthState": "healthy",
        "statusState": "linked",
        "reconnectAttempts": 0,
        "lastDisconnect": null,
        "lastError": null
      }
    ]
  }
}
- … `status 499` reconnect logs through `runtime.log` as watchdog recovery from a stale transport, does not call `runtime.error`, emits the normal public reconnecting status without an `expected` field, and clears `lastDisconnect`/`reconnectAttempts` after the next healthy connection.
- … `WhatsApp watchdog timeout (app-silent) - restarting connection` followed by user-facing `WhatsApp Web connection closed (status 499)` reconnect errors for the watchdog's own close reason.
Root Cause (if applicable)
For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write `N/A`. If the cause is unclear, write `Unknown`.
- … `lastDisconnect` and `reconnectAttempts`, so watchdog recoveries need to be cleared once the socket is healthy.
Regression Test Plan (if applicable)
For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write `N/A`.
- … `extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts`, `extensions/whatsapp/src/auto-reply/monitor-state.test.ts`, `extensions/whatsapp/src/status-issues.test.ts`
- … `status 499` reconnect logs as a watchdog recovery warning, never calls `runtime.error`, emits only the normal reconnecting public status shape, and clears `lastDisconnect`/`reconnectAttempts` after the next healthy connection; ordinary recent disconnects still report status issues.
User-visible / Behavior Changes
Watchdog recovery from stale WhatsApp Web transport now shows as a warning-style reconnect log instead of a runtime error. Real disconnects, logged-out states, session conflicts, and exhausted retry attempts still surface as errors.
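The behavior change above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `Disconnect` and `MonitorState` shapes, the function names, and the marker string value are all assumptions made for the example. Only the `499` status, the `WHATSAPP_WATCHDOG_TIMEOUT_ERROR` marker name, and the `lastDisconnect`/`reconnectAttempts` fields come from the PR text.

```typescript
// Hypothetical marker value; stands in for the plugin's WHATSAPP_WATCHDOG_TIMEOUT_ERROR.
const WATCHDOG_MARKER = "whatsapp-watchdog-timeout";

// Assumed shape of a disconnect record, for illustration only.
type Disconnect = {
  status?: number;
  error?: string;
  loggedOut?: boolean;
  conflict?: boolean;
};

// Assumed shape of the monitor's reconnect bookkeeping.
type MonitorState = {
  lastDisconnect: Disconnect | null;
  reconnectAttempts: number;
};

type LogLevel = "warn" | "error";

// A disconnect counts as watchdog recovery only when it carries the exact
// marker, so a bare 499 from some other source is still treated as a failure.
function isWatchdogRecovery(d: Disconnect): boolean {
  return d.status === 499 && d.error === WATCHDOG_MARKER;
}

// Terminal states and exhausted retries always stay errors; only a
// recoverable watchdog close is downgraded to a warning.
function reconnectLogLevel(d: Disconnect, attempt: number, maxAttempts: number): LogLevel {
  if (d.loggedOut || d.conflict) return "error";
  if (attempt >= maxAttempts) return "error";
  return isWatchdogRecovery(d) ? "warn" : "error";
}

// After a healthy connect, clear only watchdog recovery history so that
// ordinary recent disconnects still show up as status issues.
function onHealthyConnect(state: MonitorState): MonitorState {
  if (state.lastDisconnect !== null && isWatchdogRecovery(state.lastDisconnect)) {
    return { lastDisconnect: null, reconnectAttempts: 0 };
  }
  return state;
}
```

Matching on the marker rather than the status code alone is what keeps real disconnects, logged-out states, session conflicts, and exhausted retries on the error path.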
Diagram (if applicable)
Security Impact (required)
- … (Yes/No): No
- … (Yes/No): No
- … (Yes/No): No
- … (Yes/No): No
- … (Yes/No): No
- … Yes, explain risk + mitigation: N/A
Repro + Verification
Environment
- … auth none
Steps
- … `health` and `channels.status` through gateway RPC.
Expected
- … `runtime.error`.
Actual
- `health`: `ok=true`, WhatsApp `running=true`, `configured=true`, `healthState=healthy`, `reconnectAttempts=0`, `lastError=null`.
- `channels.status`: WhatsApp account `linked=true`, `running=true`, `connected=true`, `healthState=healthy`, `statusState=linked`, `reconnectAttempts=0`, `lastDisconnect=null`, `lastError=null`.
- … `ok=true`, `watchdogLog=true`, `runtime499Error=false`, `reconnectSnapshot=true`, `healthyAfterReconnect=true`.
- `pnpm check:changed` passed.
Evidence
Attach at least one:
Validation run:
Human Verification (required)
What you personally verified (not just CI), and how:
- … `origin/main` at the time of the rebase; no `CHANGELOG.md` diff remains; live foreground gateway starts from this branch; WhatsApp is linked/connected/healthy through gateway RPC; live watchdog reconnect path is forced against a real linked WhatsApp auth state; watchdog recovery branch has e2e assertions for log level, status cleanup, and no leaked `expected` field in the public status payload; changed gate passes.
- … `runtime.error`; terminal logged-out/conflict branches are untouched; ordinary recent disconnect status issue coverage still passes via `status-issues.test.ts`.
Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
- … (Yes/No): Yes
- … (Yes/No): No
- … (Yes/No): No
Risks and Mitigations