ActionCable: Sporadically not receiving confirmation for (re)subscription to same channel #38668

stevschmid · 2020-03-06T16:44:37Z

I've been playing around with ActionCable and ran into a strange phenomenon. From time to time, the js triggers depending on action cable would suddenly stop. In my setup, I create a subscription ( consumer.subscriptions.create) in a Stimulus controller during Controller::connect() and unsubscribe (subscription.unsubscribe) in Controller::disconnect(). One of the websocket messages reloads the page (via Turbolinks). Turbolinks replaces parts of the DOM, thus recreating the Stimulus controller, which in turn leads to the resubscription to the same channel.

After a long debugging session, I finally noticed that sometimes the subscription would not get confirmed. Turns out that resubscribing (i.e. unsubscribe quickly followed by subscribe) to the same channel sometimes results in the server not confirming the subscription. Screenshots from Chrome WS Messages in Network tab.

Normal behavior in (confirmation received):

Unexpected behavior (no confirmation received):

I digged deeper into the ActionCable source, added some debug messages and it seems that in those "subscription losing" cases the two messages get handled by different threads at the same time, and the subscribe message gets handled before the unsubscribe message. This is problematic since

rails/actioncable/lib/action_cable/connection/subscriptions.rb

Line 31 in b2eb1d1

return if subscriptions.key?(id_key)

is preventing the creation of a new subscription, so the subscribe message is practically ignored. Meanwhile, a tiny bit later, the unsubscribe message will remove the only existing subscription.

So this race condition leads to an unexpected "loss of subscription". And it seems that even the ~40ms delay between unsubscribe and subscribe (see previous screenshot) can cause such a loss in unfortunate circumstances. I encountered this on Rails 5.2.4 with the aforementioned Stimulus+Turbolinks setup, and recreated this issue in a fresh new Rails app with 6.0.2.1, using this piece of code to automatically resubscribe until connected() would not get called anymore (i.e. in case of lack of confirmation). Setting the DELAY_RESUBSCRIBE in the Gist example to 0 causes the problem to appear nearly instantly for me. But even with a value of 10, such a loss occurs after a couple of minutes.

Culprit seems to be this return, which according to blame has its origin in another race condition addressing PR #26547. Removing this line solves the issue for me, since subscribe simply creates a second subscription for the same channel, whereas the subsequent unsubscribe pops on of those, leaving me with one subscription. So in that case the order of the handling does not matter. I didn't have time yet to dig into PR #26547 and find out why that return is necessary.

Steps to reproduce

unsubscribe and quickly (re)subscribe to the same ActionCable channel. See Gist

Expected behavior

Resubscription is always confirmed

Actual behavior

Resubscription is sporadically not getting confirmed

System configuration

Rails version: Tested 5.2.4 and 6.0.2.1

Ruby version: ruby 2.6.1p33

The text was updated successfully, but these errors were encountered:

rails-bot · 2020-06-05T16:26:34Z

This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 6-0-stable branch or on master, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

xuanchien · 2021-01-27T09:35:40Z

I got the same issue in one project today (also with Stimulus). I had to work around this by delaying the "create subscription" process a bit using setTimeout (500ms). I am not sure if there is any other better way to do it.

jonsgreen · 2021-02-05T21:36:11Z

I am finding this same problem in a new rails 6.1.1 app which is including turbo via hotwire. @stevschmid I am curious if you ever got a satisfying response to this issue?

stevschmid · 2021-02-08T08:40:08Z

@jonsgreen No response, unfortunately. In the end we decided to omit websocket/actioncable for our fairly small use case.

nicastelo · 2021-04-13T14:57:10Z

Seeing the same on 5.2.4

diegodelvalle · 2021-04-13T15:00:26Z

We're having the same problem on rails 5.2.4.4, we're not using hotwire though.

sergey-koba-mobidev · 2021-04-22T15:25:47Z

same problem Hotwire rails 6.1

sergey-koba-mobidev · 2021-04-26T07:56:35Z

Here is the dirty hack solution for Turbo-rails if someone faces this problem with Hotwire
hotwired/turbo-rails#170

glennfu · 2021-05-28T19:39:48Z

I can confirm this still happens on Rails 6.0.3. I hope this can get reopened.

zzak · 2021-05-28T23:40:42Z

@glennfu Could you give #41581 a try and see if it fixes your issue?

glennfu · 2021-05-29T20:26:23Z

I'll definitely give this a try on Monday. On the one hand I like that the solution makes the client more resilient to any connection issue, but I still feel like waiting for half a second and trying again is just a slow bandaid. My real world use-case is experiencing exactly the symptoms this Issue describes, but the behavior is basically just a reload of the page. It sends a request to the server, and the response is a Turbolinks redirect to the same url. I feel like there's got to be a way to more properly reload a page without Actioncable tripping over itself. Like can I synchronously tell Actioncable to tear down, and then do my request, to ensure that the unsubscribe gets properly handled before the page gets reloaded with minimal delay?

zzak · 2021-05-30T00:57:46Z

@glennfu It might help if you could share your use-case in that PR with code examples. 🙏

glennfu · 2021-06-01T18:37:22Z

I have an app up for you: glennfu/cable_test@ebb2e23

Here's the approach:

Load page - connection triggers a job, observe that the job sends the socket a message
Press button to reload page - 5% of the time, subscription will not be re-created and Job will not be enqueued

The PR definitely fixes it. I tested it by grabbing the action_cable.js file from the PR and dropping it directly into my node_modules folder

I'm unsatisfied with this as a best solution because it's just reacting to the problem and retrying rather than preventing the problem. My 3rd commit on that repo you can see an alternate solution: At the time of the redirect, I send the controller an event to force an unsubscribe in an attempt to synchronously unsubscribe before the Turbolinks.visit() to avoid any chance of a race condition. I feel like there should be a better way say "hey ActionCable, it's time to tear down everything, we're about to leave the page" but I can't find the appropriate way to do that. I tried reading through the source but I can't figure out what's needed here. If that can be done, it would absolutely be my preferred solution to my current problem.

Despite me wanting a server-side solution to prevent the problem, this PR is still very worthwhile as there can be other ways a subscription is missed, and retrying and recovering is great. Either a network glitch, or fast tab switching (I've confirmed both Chrome and Safari can break subscriptions if you switch tabs as a page is loading)

Also in the original Issue here it was mentioned "Culprit seems to be this return" - I also tested removing that, however I often end up with 2 subscriptions instead of 0, so while it's definitely related to the problem here, removing it isn't a solution.

Lastly (I know I have a lot here, sorry), you might notice in my Channel I'm calling stop_all_streams. If you comment this out, the message sends grow out of control and never clean up. I've never seen this suggested in any ActionCable guide which makes me think I'm doing something wrong. Am I? Or is this a related bug?

zzak · 2021-06-02T10:43:02Z

@glennfu Thank you for this additional context, I need some time to sort things out but please feel free to ping me if you haven't heard back. I will also be keeping a close eye on #41581 if anything changes I will let you know 🙇

glennfu · 2021-06-08T14:06:11Z

Just wanted to follow up here and say I've applied the PR to my production environment where I was logging hundreds of failures to connect per day and now they're pretty much all connecting now. Big win.

rails-bot bot added the stale label Jun 5, 2020

rails-bot bot closed this as completed Jun 12, 2020

leastbad mentioned this issue Feb 6, 2021

stream_from stimulusreflex/cable_ready#104

Merged

jonsgreen mentioned this issue Feb 6, 2021

Snapshot caching issue after upgrading from turbolinks to turbo hotwired/turbo#117

Closed

spinosa mentioned this issue Mar 1, 2021

ActionCable client ensures subscribe command is confirmed. #41581

Merged

dhh assigned jeremy Aug 25, 2021

sj26 mentioned this issue Mar 10, 2022

ActionCable: Repeated subscription attempts #44652

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ActionCable: Sporadically not receiving confirmation for (re)subscription to same channel #38668

ActionCable: Sporadically not receiving confirmation for (re)subscription to same channel #38668

stevschmid commented Mar 6, 2020 •

edited

rails-bot bot commented Jun 5, 2020

xuanchien commented Jan 27, 2021

jonsgreen commented Feb 5, 2021 •

edited

stevschmid commented Feb 8, 2021

nicastelo commented Apr 13, 2021

diegodelvalle commented Apr 13, 2021

sergey-koba-mobidev commented Apr 22, 2021

sergey-koba-mobidev commented Apr 26, 2021

glennfu commented May 28, 2021

zzak commented May 28, 2021

glennfu commented May 29, 2021

zzak commented May 30, 2021

glennfu commented Jun 1, 2021 •

edited

zzak commented Jun 2, 2021

glennfu commented Jun 8, 2021

ActionCable: Sporadically not receiving confirmation for (re)subscription to same channel #38668

ActionCable: Sporadically not receiving confirmation for (re)subscription to same channel #38668

Comments

stevschmid commented Mar 6, 2020 • edited

Steps to reproduce

Expected behavior

Actual behavior

System configuration

rails-bot bot commented Jun 5, 2020

xuanchien commented Jan 27, 2021

jonsgreen commented Feb 5, 2021 • edited

stevschmid commented Feb 8, 2021

nicastelo commented Apr 13, 2021

diegodelvalle commented Apr 13, 2021

sergey-koba-mobidev commented Apr 22, 2021

sergey-koba-mobidev commented Apr 26, 2021

glennfu commented May 28, 2021

zzak commented May 28, 2021

glennfu commented May 29, 2021

zzak commented May 30, 2021

glennfu commented Jun 1, 2021 • edited

zzak commented Jun 2, 2021

glennfu commented Jun 8, 2021

stevschmid commented Mar 6, 2020 •

edited

jonsgreen commented Feb 5, 2021 •

edited

glennfu commented Jun 1, 2021 •

edited