
fix: handle Presence list call timeout without crashing channel #360

Merged: 1 commit into rc-dev from fix/channel-presence-exit on Dec 1, 2022

Conversation

@w3b6x9 (Member) commented Nov 30, 2022

What kind of change does this PR introduce?

Bug fix

What is the current behavior?

The Phoenix Presence shard can be overwhelmed if many channels on the same topic call Presence.list at or around the same time. This leads to shard timeout errors after 5 seconds, which cause the channel to crash.

What is the new behavior?

The channel will check whether the topic is currently tracking any presences by reading the ETS table directly. If there are presences, it reverts back to the Presence.list call because only the shard state has access to down replicas.
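A minimal sketch of that flow (not the PR's exact diff; `table` and `MyApp.Presence` are hypothetical stand-ins, and the assumption that tracker entries are keyed by `{topic, pid, key}` as in Phoenix.Tracker.State is mine):

```elixir
# Hedged sketch, not the PR diff. Assumes the shard's ETS table stores
# entries keyed by {topic, pid, key}; `table` and MyApp.Presence are
# hypothetical stand-ins for the real handles in the app.
def list_presences(table, topic) do
  # Cheap read-only check: is anything tracked for this topic at all?
  case :ets.select_count(table, [{{{topic, :_, :_}, :_, :_}, [], [true]}]) do
    # Nothing tracked: skip the GenServer.call into the shard entirely.
    0 -> %{}
    # Presences exist: go through Presence.list/1, since only the shard
    # state knows about down replicas.
    _ -> MyApp.Presence.list(topic)
  end
end
```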

@abc3 marked this pull request as ready for review on November 30, 2022 17:46
@chasers (Contributor) commented Nov 30, 2022

idk do we even know for sure this is an issue? Presence has Shards for a reason. I'd rather see some instrumentation around this call before we up and bypass a bunch of shit in the lib.

Also, what if they change the underlying ETS structure and this breaks on a version update?

@w3b6x9 force-pushed the fix/channel-presence-exit branch 2 times, most recently from cd37848 to 887f1ad on November 30, 2022 23:58
@w3b6x9 (Member, Author) commented Dec 1, 2022

> idk do we even know for sure this is an issue?
> I'd rather see some instrumentation around this call before we up and bypass a bunch of shit in the lib.

This is 100% an issue. See logs from this month. The same developers keep reaching out about how they are having repeated Realtime issues and this has been the culprit for them.

> Presence has Shards for a reason.

Right, but if channels are all trying to sync/track Presence over a single topic, then it all goes through a single shard process. Additional shards won't help in that scenario: all channels listening to a given topic will be routed to the same shard process on that node.
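A rough illustration of that routing (assuming the phash2-style topic-to-shard mapping used by Phoenix.Tracker; not an excerpt from the library):

```elixir
# Illustration only: with hash-based routing, every channel on the same topic
# lands on the same shard, so extra shards don't spread a single hot topic.
pool_size = 4
Enum.map(1..3, fn _ -> :erlang.phash2("realtime:some_topic", pool_size) end)
#=> e.g. [2, 2, 2] — the same shard number every time
```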

> Also, what if they change the underlying ETS structure and this breaks on a version update?

Well, the Tracker state ETS table hasn't changed in 6 years, but I've updated this PR to account for that case so it reverts back to calling Presence.list. See https://github.com/supabase/realtime/pull/360/files#diff-2a3a48674821fef237a3451af5d33ca977ee5c7d2f6bc8ca18bd7f18b44a7a26R639 and https://github.com/supabase/realtime/pull/360/files#diff-2a3a48674821fef237a3451af5d33ca977ee5c7d2f6bc8ca18bd7f18b44a7a26R241-R242.
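As a hypothetical sketch of that safety net (not the PR's code), the direct ETS read can be guarded so that a failed read simply degrades to the regular Presence.list path:

```elixir
# Hypothetical sketch: if the raw ETS read raises (e.g. the table is gone or
# renamed in a future phoenix_pubsub release), treat the state as unknown and
# fall back to the regular shard call.
defp tracked?(table, topic) do
  :ets.select_count(table, [{{{topic, :_, :_}, :_, :_}, [], [true]}]) > 0
rescue
  ArgumentError -> true # unknown: be conservative and let Presence.list run
end
```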

@w3b6x9 requested review from abc3 and chasers on December 1, 2022 21:29
@chasers (Contributor) commented Dec 1, 2022

Hmm, yeah, that happened when SIN had connection issues.

Did you see the dirty_list?

https://github.com/phoenixframework/phoenix_pubsub/blob/ca2b47c8cf31324b0bf96cea862058f783a3e7bd/lib/phoenix/tracker/shard.ex#L78
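For reference, a hedged sketch of what calling the linked function might look like; the shard-name lookup is a hypothetical helper, and `dirty_list/2`'s signature and behavior are read off the linked source rather than a documented public API:

```elixir
# Assumed from the linked shard.ex source (internal, undocumented API):
# dirty_list/2 presumably reads presence state straight from ETS instead of
# going through a GenServer.call, sidestepping the 5s call timeout.
shard = shard_name_for(MyApp.Presence, topic) # hypothetical lookup helper
Phoenix.Tracker.Shard.dirty_list(shard, topic)
```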

@chasers (Contributor) commented Dec 1, 2022

This is all the same stuff I did when trying to use Tracker to track rate limits across the cluster. Could never get it working.

@w3b6x9 merged commit 02a8266 into rc-dev on Dec 1, 2022
@w3b6x9 deleted the fix/channel-presence-exit branch on December 1, 2022 22:58
w3b6x9 pushed a commit that referenced this pull request on Dec 22, 2022: "Pass JWT errors up so we can log them"