Fix agent recovery races and add demo flow #475
justinmoon wants to merge 1 commit into sledtools:master from
Caution: Review failed. Pull request was closed or merged during review.
📝 Walkthrough
Across Rust backend, iOS, and Android platforms, these changes introduce
Changes
Sequence Diagram
sequenceDiagram
participant Client as iOS/Android Client
participant Core as AppCore
participant Allowlist as Agent Allowlist<br/>Service
participant KeyPkg as Key Package<br/>Publisher
participant Server as pika-server<br/>agent_api
participant VM as Agent VM
Client->>Core: ensure_agent(npub)
Core->>Core: validate_local_keys()
Core->>Allowlist: refresh_agent_allowlist()<br/>(token: 1)
Allowlist->>Server: check_allowlist(npub)
Server->>Allowlist: allowlisted=true
Allowlist->>Core: InternalEvent::AgentAllowlistResolved<br/>{token:1, allowlisted:true}
Core->>Core: update_agent_button_state(allowed)
Client->>Core: create_direct_chat(peer_npub)
Core->>KeyPkg: publish_key_package()<br/>(token: 1)
Note over KeyPkg: Store pending_direct_chat_creation<br/>if not yet published
KeyPkg->>Server: publish_key_package()
Server->>KeyPkg: success
KeyPkg->>Core: InternalEvent::KeyPackagePublished<br/>{token:1, ok:true}
Core->>Core: continue_pending_chat_creation()
Core->>Server: ensure_agent()
Server->>VM: provision_agent_for_owner()
VM->>Server: AgentInstance created
Server->>Core: AgentInstance
Core->>Client: update_app_state(agent_button)
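The `token: 1` fields in the diagram suggest that each asynchronous request carries a token so that a late response from an older request can be ignored. The PR's actual implementation is not shown here, so the following is only a minimal sketch of that general pattern; `AllowlistResolved`, `AllowlistTracker`, `begin_refresh`, and `on_resolved` are hypothetical names chosen for illustration.

```rust
/// Hypothetical event payload, modeled on the `{token, allowlisted}` fields
/// shown in the diagram. Not taken from the PR's code.
#[derive(Debug, Clone, Copy)]
struct AllowlistResolved {
    token: u64,
    allowlisted: bool,
}

/// Hypothetical holder for the most recent refresh token and its result.
#[derive(Debug, Default)]
struct AllowlistTracker {
    current_token: u64,
    allowlisted: Option<bool>,
}

impl AllowlistTracker {
    /// Starts a new refresh and returns the token to attach to the request.
    fn begin_refresh(&mut self) -> u64 {
        self.current_token += 1;
        self.current_token
    }

    /// Applies a result only if it belongs to the latest refresh.
    /// Returns true when the event was applied, false when it was stale.
    fn on_resolved(&mut self, event: AllowlistResolved) -> bool {
        if event.token != self.current_token {
            return false; // response to a superseded request; drop it
        }
        self.allowlisted = Some(event.allowlisted);
        true
    }
}
```

With this shape, a response to an earlier refresh that arrives after a newer refresh has started is dropped instead of overwriting newer state.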
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Superseded by #476 so CI can run from an in-repo branch.
AgentAppState::Error => {
    recover_my_agent(&client, &keys, &base_url).await?;
    if attempt < AGENT_POLL_MAX_ATTEMPTS {
        tokio::time::sleep(AGENT_POLL_DELAY).await;
    }
}
🔴 Unbounded recover calls in Error state can spawn up to 45 orphaned VMs
In run_agent_flow, the AgentAppState::Error branch calls recover_my_agent on every poll iteration without any rate-limiting guard. Unlike the Creating state, which uses recovered_stalled_creating to limit recovery to one attempt, the Error state will call recover on every 2-second cycle. If the server-side recovery provisions a new VM that immediately fails (e.g., spawner network error), the new agent enters Error state, and the next poll triggers another recovery — creating yet another VM. Over 45 iterations this can produce up to 45 orphaned VMs.
Comparison with the guarded Creating handler
The Creating handler at rust/src/core/agent.rs:289-296 correctly guards with recovered_stalled_creating, ensuring recovery is attempted at most once. The Error handler at line 298 has no equivalent guard.
On the server side (crates/pika-server/src/agent_api.rs:436-448), each recover call to an errored agent provisions a brand new VM via provision_agent_for_owner. If the spawner is down, each call creates a new DB row that transitions to Error, and the cycle repeats.
Prompt for agents
In rust/src/core/agent.rs, function run_agent_flow, around lines 283-314: Add a guard (similar to recovered_stalled_creating) to prevent repeated recover_my_agent calls in the AgentAppState::Error branch. For example, add a `recovered_error` boolean flag that is set to true after the first recover attempt in the Error state, and skip subsequent recover calls. Alternatively, reuse the existing recovered_stalled_creating flag or introduce a single recovered flag for both Error and stalled-Creating states. The key change is: in the Error match arm (line 298), only call recover_my_agent if a guard flag is false, then set the flag to true after the call.
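To make the suggested guard concrete, here is a small synchronous simulation of the poll loop. It is a sketch under stated assumptions, not the code in rust/src/core/agent.rs: the `AgentAppState` enum is a reduced stand-in, `count_recover_calls` is a hypothetical helper, the increment of `recover_calls` stands in for the real `recover_my_agent` call, and the cap of 45 is taken from the iteration count mentioned in the review.

```rust
/// Reduced stand-in for the agent states named in the review.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AgentAppState {
    Creating,
    Error,
    Ready,
}

/// Assumed cap, matching the 45 iterations mentioned in the review.
const AGENT_POLL_MAX_ATTEMPTS: usize = 45;

/// Walks a scripted sequence of polled states and returns how many times
/// recovery would be requested. `guarded` toggles the proposed flag.
fn count_recover_calls(polled: &[AgentAppState], guarded: bool) -> usize {
    let mut recover_calls = 0;
    let mut recovered_error = false; // proposed guard, hypothetical name

    for state in polled.iter().take(AGENT_POLL_MAX_ATTEMPTS) {
        match state {
            AgentAppState::Ready => break,
            AgentAppState::Error => {
                // Without the guard, every Error poll triggers a recovery.
                if !guarded || !recovered_error {
                    recover_calls += 1; // stands in for recover_my_agent(...)
                    recovered_error = true;
                }
            }
            AgentAppState::Creating => {} // keep polling
        }
    }
    recover_calls
}
```

If every poll reports `Error`, the unguarded loop requests recovery on all 45 iterations, while the guarded loop requests it once. Whether one recovery attempt is the right budget, or whether a small bounded retry count would serve better, is a design choice the review leaves open.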
Summary
`just agent-demo` flow for live reset/recover/chat verification
Validation
`just agent-demo`
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes