
Realtime: iOS Safari reconnect fails on second+ backgrounding — pusher-js lives: 2 kill-switch #4597

@ekumanov

Description

Background

#4588 + #4590 address iOS Safari backgrounding by forcing a WebSocket reconnect via visibilitychange/pageshow. That fix handles the scenarios in #4588 (one backgrounding cycle) and has been merged for rc.2.

While verifying the fix on a staging forum, I hit a second, related failure mode that #4590 doesn't cover: after the second (and every subsequent) iOS backgrounding, forceReconnect no longer produces a working socket. Events stop arriving; the connection looks dead and never emits 'connected' again.

The #4590 thread suggested splitting this into its own issue; filing it here with repro + root cause.

Reproduction

On an iPhone (tested on iOS 18.x Safari) against a Flarum 2.0 forum with flarum-realtime enabled and visibilitychange/pageshow handlers from #4590 deployed:

  1. Open the forum home in Safari; confirm push events arrive (post from another device → item appears).
  2. Switch to another app for >5 s, return. Reconnect fires. Push events arrive again. ✅ cycle 1 OK.
  3. Switch away again for >5 s, return. Reconnect fires. No push events arrive. App looks alive but realtime is silently dead.
  4. Hard-reload the page → realtime works again for exactly one more cycle.

Each hard reload buys exactly one more working cycle. This is not a race condition; the failure is fully deterministic on the second cycle.

Root cause

pusher-js 7.6.0's default_strategy.ts wraps the WebSocket transport with a lives: 2 budget:

new BestConnectedEverStrategy([
  new CachedStrategy(ws_loop, transports, { ttl: 1800*1000, lives: 2, ... }),
  ...
])

The budget is enforced via TransportManager.reportDeath() / AssistantToTheTransportManager (see src/core/transports/transport_manager.ts and assistant_to_the_transport_manager.ts). Each time a WebSocket closes "uncleanly" (code 1006, which is what iOS issues when it silently closes a backgrounded socket) the manager decrements livesLeft. When it hits 0, isAlive() flips to false and the strategy reports itself unsupported.

Once isSupported() returns false, every subsequent connect() transitions straight to 'failed' — the state machine no longer attempts a WebSocket connection at all. From the app's perspective the connection looks broken, and there's no recovery path short of constructing a new Pusher instance (which has its own fresh strategy tree with full livesLeft).

In a desktop environment the lives: 2 budget is reasonable: if WebSockets are genuinely unsupported on the network, giving up after two attempts and failing over is the right behaviour. On iOS, though, every backgrounding longer than ~30 s produces one 1006 close, and that close is not a symptom of a broken transport; it is just iOS reclaiming the socket. After two cycles the kill-switch trips on a perfectly healthy network.
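The mechanism can be sketched with a minimal model (my own reduction for illustration, not pusher-js source; the real TransportManager in src/core/transports/transport_manager.ts carries more state):

```typescript
// Minimal model of the lives budget: each unclean close (code 1006) burns
// one life; at zero the transport reports itself dead and the strategy
// stops offering WebSockets entirely.
class MiniTransportManager {
  private livesLeft: number;

  constructor(lives: number) {
    this.livesLeft = lives;
  }

  // Called on every unclean WebSocket close (e.g. iOS's silent 1006).
  reportDeath(): void {
    this.livesLeft -= 1;
  }

  // Once this returns false, the wrapping strategy's isSupported() returns
  // false and every subsequent connect() transitions straight to 'failed'.
  isAlive(): boolean {
    return this.livesLeft > 0;
  }
}

const manager = new MiniTransportManager(2); // pusher-js default: lives: 2

manager.reportDeath(); // backgrounding cycle 1 -> 1006
manager.isAlive();     // true: one life left, the reconnect succeeds

manager.reportDeath(); // backgrounding cycle 2 -> 1006
manager.isAlive();     // false: kill-switch tripped, connecting -> failed
```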

Evidence

pusher-js debug logs from staging, trimmed to show the state transitions. The forum runs beta.8 with the #4590 patch; the logs are from an actual iPhone Safari session:

Cycle 1 (works):

Pusher: State changed : connecting -> connected with socket ID ...
# app-switch
Pusher: Connection closed: code=1006 reason=""
Pusher: State changed : connected -> disconnected
Pusher: State changed : disconnected -> connecting
Pusher: State changed : connecting -> connected with socket ID ...
# push events flow

Cycle 2 (breaks):

# app-switch
Pusher: Connection closed: code=1006 reason=""
Pusher: State changed : connected -> disconnected
Pusher: State changed : disconnected -> connecting
Pusher: State changed : connecting -> failed
# no further state changes; no push events ever arrive

Inspecting app.websocket.connection.strategy at this point shows the cached WSTransport's TransportManager with livesLeft at 0 and isAlive() returning false.

Proposed approaches

Two options discussed in the #4590 review:

Option A — construct a new Pusher in forceReconnect

const forceReconnect = () => {
  app.websocket?.disconnect();
  app.websocket = new Pusher(key, options);  // fresh strategy tree
  // re-subscribe channels; re-bind handlers
};

This is the cleanest option: no private-API reach, no runtime introspection. The cost is that the existing channel subscriptions and event bindings (see extensions/realtime/js/src/forum/extend/Application.ts and DiscussionList/NewActivity.ts) have to be re-established against the new Pusher instance. RealtimeState's one-shot notifyUserChannelReady / notifyPublicChannelReady callback model assumes channels are established exactly once; it either needs to become re-firable, or every consumer needs to re-bind explicitly.

Option B — walk pusher-js's strategy tree and neutralise the kill-switch

Runtime-patch reportDeath / isAlive / livesLeft on every TransportManager instance inside app.websocket. This is what my staging patch does. Pros: zero refactor, about 20 lines. Cons: it reaches into pusher-js private API (reportDeath, isAlive and livesLeft are not public); it is duck-typing-based, so a pusher-js upgrade that renames or restructures these will silently regress; and it disables the budget for all failure modes, not just iOS backgrounding: a legitimately broken transport (blocked by a network middlebox, TLS failure) will now retry forever.
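For concreteness, a hedged sketch of what such a patch looks like. The function name is mine; reportDeath / livesLeft are private pusher-js internals observed on 7.6.0, so this duck-typed walk may silently miss them on other versions:

```typescript
// Recursively walk an object graph (e.g. a Pusher instance) and neutralise
// anything that quacks like a TransportManager: pin its life budget to
// Infinity and no-op its death reporting. Returns how many objects matched.
function neutraliseLifeBudget(root: unknown, seen = new Set<object>()): number {
  if (root === null || typeof root !== 'object' || seen.has(root)) return 0;
  seen.add(root); // guard against cycles in the strategy tree

  let patched = 0;
  const node = root as Record<string, unknown>;

  // Duck-typed match on the private TransportManager surface.
  if (typeof node.reportDeath === 'function' && typeof node.livesLeft === 'number') {
    node.livesLeft = Infinity;   // the budget can never reach zero
    node.reportDeath = () => {}; // and unclean closes are never counted
    patched += 1;
  }

  for (const key of Object.keys(node)) {
    patched += neutraliseLifeBudget(node[key], seen);
  }
  return patched;
}

// Usage against the live connection (app.websocket is the Pusher instance):
// neutraliseLifeBudget(app.websocket);
```

Note this is exactly the "disables the budget for all failure modes" trade-off described above: once applied, a genuinely broken transport will also retry forever.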

My take

Option A is the right long-term answer. The refactor is non-trivial but contained to extensions/realtime/js/src/forum/:

  - RealtimeState flips to a re-firable model;
  - Application.ts's channel subscription and notification-binding block gets extracted into a function that forceReconnect can call;
  - DiscussionList/NewActivity.ts's oncreate/onremove bindings are re-established on reconnect (currently they're bound against the specific channel objects that existed at IndexPage.oncreate time).

Happy to send a PR once this has triage — would also appreciate guidance on whether the refactor should land in the same PR or split between a RealtimeState refactor and the reconnect-aware rebuild.

Workaround until this is fixed

A forum admin can drop Option B as a runtime patch in their own extension or theme (walk app.websocket, no-op reportDeath, pin livesLeft = Infinity). Not recommended as a public default for the reasons in the review of #4590, but works for production installs that need the fix today. I'm running it on my own forum pending this follow-up.

Environment
