fix: handle Presence list call timeout without crashing channel #360
Conversation
idk do we even know for sure this is an issue? Presence has Shards for a reason. I'd rather see some instrumentation around this call before we up and bypass a bunch of stuff in the lib. Also, what if they change the underlying ETS structure and this breaks on a version update?
This is 100% an issue. See the logs from this month. The same developers keep reaching out about repeated Realtime issues, and this has been the culprit for them.
Right but if channels are all trying to sync/track Presence over a single topic then it's all over a single shard process. Additional shards won't help in that scenario. All channels listening to a topic will be routed to the same shard process on that node.
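To make the "all channels on one topic hit one shard" point concrete, here is a hedged sketch of the routing behavior being described. The pool size and the use of `:erlang.phash2/2` for shard selection are assumptions about Phoenix.Tracker's internals, used only to illustrate why adding shards does not spread load for a single hot topic:

```elixir
# Illustration (assumed shard-selection scheme, not the library's exact code):
# the topic is hashed once, so every channel on "room:lobby" maps to the
# same shard index no matter how large the pool is.
pool_size = 8
shard_for = fn topic -> :erlang.phash2(topic, pool_size) end

# Simulate 10_000 channels all subscribed to one topic.
shards = for _ <- 1..10_000, do: shard_for.("room:lobby")

# Every channel landed on a single shard; the other shards stay idle.
IO.inspect(Enum.uniq(shards))
```

Under this assumption, more shards only help when load is spread across many distinct topics.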
Well, the Tracker state `ets` table hasn't changed in 6 years, but I've updated this PR to account for that, so it reverts back to calling `Presence.list`.
Hmm, yeah, this happened when SIN had connection issues. Did you see the
No, but I just swapped it in: https://github.com/supabase/realtime/pull/360/files#diff-2a3a48674821fef237a3451af5d33ca977ee5c7d2f6bc8ca18bd7f18b44a7a26R637.
This is all the same stuff I did when trying to use Tracker to track rate limits across the cluster. Could never get it working.
What kind of change does this PR introduce?

Bug fix

What is the current behavior?

The Phoenix Presence shard can be overwhelmed when many channels on the same topic call `Presence.list` at or around the same time. This leads to shard timeout errors after 5 seconds, which cause the channel to crash.

What is the new behavior?

The channel first checks whether the topic is currently tracking any presences by reading the `ets` table directly. If there are presences, it reverts back to the `Presence.list` call, because only the shard state has access to down replicas.
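The check-ETS-first-then-fall-back flow described above can be sketched as follows. This is a hedged illustration, not the PR's diff: the table handle, the `{{topic, pid, key}, meta, tag}` record shape, and the helper names are assumptions about Phoenix.Tracker's state layout:

```elixir
# Hypothetical helper illustrating the approach in this PR:
# do a cheap dirty ETS read to see if the topic has any presences at all,
# and only pay for the (potentially slow) Presence.list GenServer call
# when it does, since only shard state knows about down replicas.
defmodule RealtimeWeb.PresenceFastPath do
  @doc """
  Returns the presence map for `topic`, avoiding the shard call
  when the topic is tracking nothing.
  """
  def list(presence_mod, ets_table, topic) do
    if has_presences?(ets_table, topic) do
      # Presences exist: fall back to the real call for a correct view.
      presence_mod.list(topic)
    else
      %{}
    end
  end

  # Match at most one record for the topic; we only need existence,
  # not the full result set. Record shape is an assumption.
  defp has_presences?(ets_table, topic) do
    case :ets.match_object(ets_table, {{topic, :_, :_}, :_, :_}, 1) do
      {[_ | _], _continuation} -> true
      _ -> false
    end
  end
end
```

The trade-off the reviewers raise still applies: this couples the code to an internal ETS layout, so a guard like this should fail safe (fall back to `Presence.list`) if the match ever stops returning results.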