bug(discord): rollout restart causes silent Gateway session conflict — bot stops receiving events

### Description

When `kubectl rollout restart` is used to restart an OpenAB deployment, the new pod connects to the Discord Gateway while the old pod is still in graceful termination and still holds its Gateway session. Discord sees two concurrent Gateway sessions for the same bot token and silently drops the new one: no error, no disconnect, just zero events delivered.

The bot logs show `discord bot connected user=AgentBroker`, but the bot never receives any message events. This is extremely hard to diagnose because nothing in the logs indicates an error or warning.

```
  ┌─ rollout restart (BROKEN) ──────────────────────────────────────────┐
  │                                                                     │
  │  time ──────────────────────────────────────────────────────►       │
  │                                                                     │
  │  Old Pod  ████████████████████░░░░░░░░░░                            │
  │           │ Gateway A (active) │ terminating │                      │
  │           │ receives events ✅  │ still open  │                      │
  │                                                                     │
  │  New Pod            ░░░░████████████████████████████████            │
  │                     │init│ Gateway B connected                      │
  │                          │ "bot connected" in logs ✅                │
  │                          │ receives events ❌ (silently dropped)     │
  │                                                                     │
  │  Discord   ──────────────┤                                          │
  │  Gateway:  2 sessions    │ same token = drop newer session          │
  │            for same      │ no error sent to client                  │
  │            bot token     │                                          │
  └─────────────────────────────────────────────────────────────────────┘

  ┌─ scale 0 → 1 (WORKS) ──────────────────────────────────────────────┐
  │                                                                     │
  │  time ──────────────────────────────────────────────────────►       │
  │                                                                     │
  │  Old Pod  ████████████░░░                                           │
  │           │ Gateway A  │ terminated                                 │
  │           │            │ session closed ✅                           │
  │                                                                     │
  │                    ← 5s gap →                                       │
  │                                                                     │
  │  New Pod                    ░░░░████████████████████████            │
  │                             │init│ Gateway B connected              │
  │                                  │ only session ✅                   │
  │                                  │ receives events ✅                │
  │                                                                     │
  │  Discord   ──────────────────────┤                                  │
  │  Gateway:  1 session at a time   │ events delivered normally        │
  └─────────────────────────────────────────────────────────────────────┘
```

**Workaround:**

```bash
kubectl scale deployment/openab-kiro --replicas=0 && sleep 5 && kubectl scale deployment/openab-kiro --replicas=1
```
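
A fixed `sleep 5` can be shorter than the time the old pod actually takes to terminate. A possible variant, offered only as a sketch, waits for the old pod to be deleted before scaling back up. The label selector `app=openab-kiro` and the 120s timeouts are assumptions and are not taken from the actual deployment; the function name is hypothetical.

```bash
set -eu

# Assumed values: adjust to the real deployment name and pod labels.
DEPLOY="${DEPLOY:-deployment/openab-kiro}"
SELECTOR="${SELECTOR:-app=openab-kiro}"   # hypothetical label selector

restart_without_overlap() {
  # Stop the old pod so its Gateway session goes away.
  kubectl scale "$DEPLOY" --replicas=0
  # Wait until the old pod object is deleted instead of guessing with sleep.
  kubectl wait --for=delete pod -l "$SELECTOR" --timeout=120s
  # Start a fresh pod; it should now hold the only Gateway session.
  kubectl scale "$DEPLOY" --replicas=1
  kubectl rollout status "$DEPLOY" --timeout=120s
}
```

After sourcing the snippet, run `restart_without_overlap`. Depending on the kubectl version, `kubectl wait --for=delete` may report an error if no pod matches the selector, so check the selector first with `kubectl get pod -l app=openab-kiro`.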

### Steps to Reproduce

1. Deploy OpenAB with Discord adapter on Kubernetes
2. Run `kubectl rollout restart deployment/openab-kiro`
3. Wait for the new pod to show `1/1 Running` and its logs to show `discord bot connected`
4. Send `@Bot hello` in the allowed Discord channel
5. Observe: no response, no log entries for the message event
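
Steps 2 and 3 above could be scripted roughly as follows. This is a sketch: the function name `reproduce_restart` and the 120s timeout are assumptions, and steps 4 and 5 (sending the message and checking for the missing event) remain manual.

```bash
set -eu

DEPLOY="${DEPLOY:-deployment/openab-kiro}"

reproduce_restart() {
  # Step 2: trigger the restart.
  kubectl rollout restart "$DEPLOY"
  # Step 3: wait for the new pod to become ready.
  kubectl rollout status "$DEPLOY" --timeout=120s
  # The connect line is logged even when no events are delivered afterwards.
  if kubectl logs "$DEPLOY" | grep -q 'discord bot connected'; then
    echo "connected: now send '@Bot hello' and watch the logs for a message event"
  else
    echo "connect line not found in logs" >&2
    return 1
  fi
}
```

If the function reports `connected` but the logs stay silent after the test message, the symptom described in this report has been reproduced.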

### Expected Behavior

After `rollout restart`, the new pod should receive Discord Gateway events normally. Suggested fixes:

- **Option A (recommended):** Add a `preStop` hook that explicitly closes the Discord Gateway connection (send close frame / shutdown shard) before the pod terminates, so the old session is gone before the new pod connects
- **Option B:** Add a startup probe or health check that detects "connected but no events received within N seconds" and forces a Gateway reconnect
- **Option C:** Document the `scale 0 → 1` workaround in the troubleshooting guide

### Environment

- OpenAB v0.7.8-beta.5 (`ghcr.io/openabdev/openab:0.7.8-beta.5`)
- Kubernetes: OrbStack (local k3s)
- Deployment strategy: Recreate (PVC-backed)
- Discord library: serenity 0.12.x
- Observed with AgentBroker (kiro-cli agent) — AgentDealer on the same cluster with a different bot token was unaffected
