
Realtime: iOS Safari reconnect fails on second+ backgrounding — pusher-js lives: 2 kill-switch #4597

@ekumanov

Description

Background

#4588 + #4590 address iOS Safari backgrounding by forcing a WebSocket reconnect via visibilitychange/pageshow. That fix handles the scenarios in #4588 (one backgrounding cycle) and has been merged for rc.2.

While verifying the fix on a staging forum, I hit a second, related failure mode that #4590 doesn't cover: after the second (and every subsequent) iOS backgrounding, forceReconnect no longer produces a working socket. Events stop arriving; the connection looks dead and never emits 'connected' again.

The #4590 thread suggested splitting this into its own issue; filing it here with repro + root cause.

Reproduction

On an iPhone (tested on iOS 18.x Safari) against a Flarum 2.0 forum with flarum-realtime enabled and visibilitychange/pageshow handlers from #4590 deployed:

  1. Open the forum home in Safari; confirm push events arrive (post from another device → item appears).
  2. Switch to another app for >5 s, return. Reconnect fires. Push events arrive again. ✅ cycle 1 OK.
  3. Switch away again for >5 s, return. Reconnect fires. No push events arrive. App looks alive but realtime is silently dead.
  4. Hard-reload the page → realtime works again for exactly one more cycle.

Each hard reload buys exactly one more working cycle. This is not a race condition; the failure is fully deterministic on the second cycle.

Root cause

pusher-js 7.6.0's default_strategy.ts wraps the WebSocket transport with a lives: 2 budget:

new BestConnectedEverStrategy([
  new CachedStrategy(ws_loop, transports, { ttl: 1800*1000, lives: 2, ... }),
  ...
])

The budget is enforced via TransportManager.reportDeath() / AssistantToTheTransportManager (see src/core/transports/transport_manager.ts and assistant_to_the_transport_manager.ts). Each time a WebSocket closes "uncleanly" (code 1006, which is what iOS issues when it silently closes a backgrounded socket) the manager decrements livesLeft. When it hits 0, isAlive() flips to false and the strategy reports itself unsupported.

Once isSupported() returns false, every subsequent connect() transitions straight to 'failed' — the state machine no longer attempts a WebSocket connection at all. From the app's perspective the connection looks broken, and there's no recovery path short of constructing a new Pusher instance (which has its own fresh strategy tree with full livesLeft).

In a desktop environment the lives: 2 budget is reasonable: if WebSockets are genuinely unsupported on the network, giving up after two attempts and failing over is the right behaviour. On iOS, though, every backgrounding longer than ~30 s produces one 1006 close, and that close is not a symptom of a broken transport; it is just iOS reclaiming the socket. After two cycles the kill-switch trips on a perfectly healthy network.
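The mechanism can be sketched with a minimal model (my own reduction for illustration, not pusher-js source; the real TransportManager in src/core/transports/transport_manager.ts carries more state):

```typescript
// Minimal model of the lives budget: each unclean close (code 1006) burns
// one life; at zero the transport reports itself dead and the strategy
// stops offering WebSockets entirely.
class MiniTransportManager {
  private livesLeft: number;

  constructor(lives: number) {
    this.livesLeft = lives;
  }

  // Called on every unclean WebSocket close (e.g. iOS's silent 1006).
  reportDeath(): void {
    this.livesLeft -= 1;
  }

  // Once this returns false, the wrapping strategy's isSupported() returns
  // false and every subsequent connect() transitions straight to 'failed'.
  isAlive(): boolean {
    return this.livesLeft > 0;
  }
}

const manager = new MiniTransportManager(2); // pusher-js default: lives: 2

manager.reportDeath(); // backgrounding cycle 1 -> 1006
manager.isAlive();     // true: one life left, the reconnect succeeds

manager.reportDeath(); // backgrounding cycle 2 -> 1006
manager.isAlive();     // false: kill-switch tripped, connecting -> failed
```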

Evidence

pusher-js debug logs from staging, trimmed to show the state transitions. The forum runs beta.8 with the #4590 patch; the logs are from an actual iPhone Safari session:

Cycle 1 (works):

Pusher: State changed : connecting -> connected with socket ID ...
# app-switch
Pusher: Connection closed: code=1006 reason=""
Pusher: State changed : connected -> disconnected
Pusher: State changed : disconnected -> connecting
Pusher: State changed : connecting -> connected with socket ID ...
# push events flow

Cycle 2 (breaks):

# app-switch
Pusher: Connection closed: code=1006 reason=""
Pusher: State changed : connected -> disconnected
Pusher: State changed : disconnected -> connecting
Pusher: State changed : connecting -> failed
# no further state changes; no push events ever arrive

Inspecting app.websocket.connection.strategy at this point shows the cached WSTransport's TransportManager with livesLeft at 0 and isAlive() returning false.

Proposed approaches

Two options discussed in the #4590 review:

Option A — construct a new Pusher in forceReconnect

const forceReconnect = () => {
  app.websocket?.disconnect();
  app.websocket = new Pusher(key, options);  // fresh strategy tree
  // re-subscribe channels; re-bind handlers
};

This is the cleanest option: no private-API reach, no runtime introspection. The cost is that the existing channel subscriptions and event bindings (see extensions/realtime/js/src/forum/extend/Application.ts and DiscussionList/NewActivity.ts) have to be re-established against the new Pusher instance. RealtimeState's one-shot notifyUserChannelReady / notifyPublicChannelReady callback model assumes channels are established exactly once; it either needs to become re-firable, or every consumer needs to re-bind explicitly.

Option B — walk pusher-js's strategy tree and neutralise the kill-switch

Runtime-patch reportDeath / isAlive / livesLeft on every TransportManager instance inside app.websocket. This is what my staging patch does. Pros: zero refactor, about 20 lines. Cons: it reaches into pusher-js private API (reportDeath, isAlive and livesLeft are not public); it is duck-typing-based, so a pusher-js upgrade that renames or restructures these will silently regress; and it disables the budget for all failure modes, not just iOS backgrounding: a legitimately broken transport (blocked by a network middlebox, TLS failure) will now retry forever.
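For concreteness, a hedged sketch of what such a patch looks like. The function name is mine; reportDeath / livesLeft are private pusher-js internals observed on 7.6.0, so this duck-typed walk may silently miss them on other versions:

```typescript
// Recursively walk an object graph (e.g. a Pusher instance) and neutralise
// anything that quacks like a TransportManager: pin its life budget to
// Infinity and no-op its death reporting. Returns how many objects matched.
function neutraliseLifeBudget(root: unknown, seen = new Set<object>()): number {
  if (root === null || typeof root !== 'object' || seen.has(root)) return 0;
  seen.add(root); // guard against cycles in the strategy tree

  let patched = 0;
  const node = root as Record<string, unknown>;

  // Duck-typed match on the private TransportManager surface.
  if (typeof node.reportDeath === 'function' && typeof node.livesLeft === 'number') {
    node.livesLeft = Infinity;   // the budget can never reach zero
    node.reportDeath = () => {}; // and unclean closes are never counted
    patched += 1;
  }

  for (const key of Object.keys(node)) {
    patched += neutraliseLifeBudget(node[key], seen);
  }
  return patched;
}

// Usage against the live connection (app.websocket is the Pusher instance):
// neutraliseLifeBudget(app.websocket);
```

Note this is exactly the "disables the budget for all failure modes" trade-off described above: once applied, a genuinely broken transport will also retry forever.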

My take

Option A is the right long-term answer. The refactor is non-trivial but contained to extensions/realtime/js/src/forum/:

  - RealtimeState flips to a re-firable model;
  - Application.ts's channel subscription and notification-binding block gets extracted into a function that forceReconnect can call;
  - DiscussionList/NewActivity.ts's oncreate/onremove bindings are re-established on reconnect (currently they're bound against the specific channel objects that existed at IndexPage.oncreate time).

Happy to send a PR once this has triage — would also appreciate guidance on whether the refactor should land in the same PR or split between a RealtimeState refactor and the reconnect-aware rebuild.

Workaround until this is fixed

A forum admin can drop Option B as a runtime patch in their own extension or theme (walk app.websocket, no-op reportDeath, pin livesLeft = Infinity). Not recommended as a public default for the reasons in the review of #4590, but works for production installs that need the fix today. I'm running it on my own forum pending this follow-up.

Environment
