Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort long running websockets on shutdown with nginx standalone does not behave as expected #2211

Closed
EerdeBruining opened this issue Jun 25, 2019 · 8 comments

Comments

@EerdeBruining
Copy link

EerdeBruining commented Jun 25, 2019

Issue report

Question 1: What is the problem?

  • What is the expected behavior?

On shutdown of passenger in combination with nginx standalone with open websockets, these websocket connections should be closed. So that nginx shuts down.

  • What is the actual behavior?

Nginx does not shut down when websocket connections are still open. The default configuration (https://www.phusionpassenger.com/docs/references/config_reference/nginx/#passenger_abort_websockets_on_process_shutdown) says it should abort the websocket connections. This does not happen.

Since nginx gets a graceful kill command Nginx will wait until web sockets are finished.
https://github.com/phusion/passenger/blob/master/src/ruby_supportlib/phusion_passenger/standalone/start_command/nginx_engine.rb#L164

  • How can we reproduce it?

Using Rails 5.2 with actioncable on passenger standalone. Open a websocket connection and stop passenger. Keep track of the open processes.

Be as detailed as possible in your descriptions, include any logs and stack traces (don't just cut/paste the error, provide some logging before that too).

When we stop the websocket client ourselves (close the browser window)

==> passenger.stdout.log <==
[2019-06-20 17:43:03 +0200] Stopping web server...

## The process waits here until the last websocket is closed from the *client* side
## After almost a minute, the last client disconnects and the process shutdown continues cleanly.

==> passenger.log <==
2019/06/20 17:43:57 [alert] 97973#0: *49 open socket #3 left in connection 3
2019/06/20 17:43:57 [alert] 97973#0: *50 open socket #6 left in connection 6
2019/06/20 17:43:57 [alert] 97973#0: aborting

==> staging.log <==
I, [2019-06-20T17:43:57.861087 #98467]  INFO -- : Finished "/cable/" [WebSocket] for *ip* at 2019-06-20 17:43:57 +0200

==> passenger.log <==

[ N 2019-06-20 17:43:57.8645 97964/T5 age/Cor/CoreMain.cpp:636 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ N 2019-06-20 17:43:57.8646 97964/T1 age/Cor/CoreMain.cpp:1262 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected...
[ N 2019-06-20 17:43:57.8647 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98076, application ***/current (staging)
[ N 2019-06-20 17:43:57.8649 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98120, application ***/current (staging)
[ N 2019-06-20 17:43:57.8649 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98163, application ***/current (staging)
[ N 2019-06-20 17:43:57.8650 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98206, application ***/current (staging)
[ N 2019-06-20 17:43:57.8650 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98249, application ***/current (staging)
[ N 2019-06-20 17:43:57.8651 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98467, application ***/current#unlimited_concurrency
[ N 2019-06-20 17:43:57.8651 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98516, application ***/current#unlimited_concurrency
[ N 2019-06-20 17:43:57.8652 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98559, application ***/current#unlimited_concurrency
[ N 2019-06-20 17:43:57.8653 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98612, application ***/current#unlimited_concurrency
[ N 2019-06-20 17:43:57.8654 97964/Ta Ser/Server.h:887 ]: [ServerThr.2] Freed 0 spare client objects
[ N 2019-06-20 17:43:57.8654 97964/T5 Ser/Server.h:887 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2019-06-20 17:43:57.8654 97964/Ta Ser/Server.h:543 ]: [ServerThr.2] Shutdown finished
[ N 2019-06-20 17:43:57.8655 97964/T5 Ser/Server.h:543 ]: [ServerThr.1] Shutdown finished
[ N 2019-06-20 17:43:57.8655 97964/Tc Ser/Server.h:887 ]: [ServerThr.3] Freed 0 spare client objects
[ N 2019-06-20 17:43:57.8655 97964/Tc Ser/Server.h:543 ]: [ServerThr.3] Shutdown finished
[ N 2019-06-20 17:43:57.8655 97964/Te Ser/Server.h:887 ]: [ServerThr.4] Freed 0 spare client objects
[ N 2019-06-20 17:43:57.8656 97964/Te Ser/Server.h:543 ]: [ServerThr.4] Shutdown finished
[ N 2019-06-20 17:43:57.8656 97964/Tf Ser/Server.h:887 ]: [ApiServer] Freed 0 spare client objects
[ N 2019-06-20 17:43:57.8657 97964/Tf Ser/Server.h:543 ]: [ApiServer] Shutdown finished
[ N 2019-06-20 17:43:57.8658 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98467, application ***/current#unlimited_concurrency
[ N 2019-06-20 17:43:57.8659 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98516, application ***/current#unlimited_concurrency
[ N 2019-06-20 17:43:57.8659 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98559, application ***/current#unlimited_concurrency
[ N 2019-06-20 17:43:57.8660 97964/T1 age/Cor/CoreMain.cpp:1163 ]: Checking whether to disconnect long-running connections for process 98612, application ***/current#unlimited_concurrency

==> passenger.stdout.log <==
 done
 
 ==> passenger.log <==
[ N 2019-06-20 17:43:59.3542 97964/T1 age/Cor/CoreMain.cpp:1341 ]: Passenger core shutdown finished
[ W 2019-06-20 17:43:59.3728 99010/T1 age/Wat/WatchdogMain.cpp:421 ]: Cannot open cleanup PID file temp_dir_toucher.pid (/tmp/passenger-standalone.115iijp/temp_dir_toucher.pid)

(if you are requesting a feature instead of reporting an issue, describe here what you have in mind and how it would help you)

Your answer:
We think there are at least a few options:

  1. Change: https://github.com/phusion/passenger/blob/master/src/ruby_supportlib/phusion_passenger/standalone/start_command/nginx_engine.rb#L164 to either term or remove it like it was by default before: https://github.com/phusion/passenger/commit/80b64fd9d46a60f44995cfc24a5627fbc80aed07#diff-7525566659db4eaa36a7f6b8ac369595
  2. Add worker_shutdown_timeout to nginx.conf
  3. Implement in the stop command in the nginx_module the shutdown of web sockets using abort_websockets_on_process_shutdown. Possible in:
/**
 * Called when Nginx exits. Not called when Nginx is restarted.
 */
static void
exit_master(ngx_cycle_t *cycle) {
    shutdown_watchdog();
}

https://github.com/phusion/passenger/blob/master/src/nginx_module/ngx_http_passenger_module.c#L584

Question 2: Passenger version and integration mode:

nginx/1.14.0 Phusion_Passenger/5.3.6 Standalone

Question 3: OS or Linux distro, platform (including version):

Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64

Question 4: Passenger installation method:

Your answer:
[ ] RubyGems + Gemfile
[x] RubyGems, no Gemfile
[ ] Phusion APT repo
[ ] Phusion YUM repo
[ ] OS X Homebrew
[ ] source tarball
[ ] Other, please specify:

Question 5: Your app's programming language (including any version managers) and framework (including versions):

Ruby 2.5.5, Rails 5.2.3

Question 6: Are you using a PaaS and/or containerization? If so which one?
Your answer:

No

@EerdeBruining
Copy link
Author

We are an enterprise customer by the way.

@CamJN
Copy link
Member

CamJN commented Jun 25, 2019

Have you tried this config option? The default is different in standalone than nginx: https://www.phusionpassenger.com/docs/references/config_reference/standalone/#--no-abort-websockets-on-process-shutdown-abort_websockets_on_process_shutdown

@EerdeBruining
Copy link
Author

Setting this you mean:

--no-abort-websockets-on-process-shutdown

We use a startup script with passenger command, so I think the default is what we want.

@CamJN
Copy link
Member

CamJN commented Jun 25, 2019

Hmm yeah, I can reproduce this. Thanks for the report. I'll get on this.

@EerdeBruining
Copy link
Author

Could you update the status of this issue please?

@CamJN
Copy link
Member

CamJN commented Dec 23, 2019

Sorry, there's not much to update currently, I'm still working on it.

@jasperroel
Copy link

Could you update the status of this issue please?

@FooBarWidget
Copy link
Member

@CamJN I can reproduce it too. The problem is as follows:

  1. Passenger Standalone, when commanded to stop, sends SIGQUIT to Nginx (telling Nginx to gracefully shutdown), then waits until Nginx is actually shut down.
  2. When Nginx receives the graceful shutdown signal, it waits until all WebSocket connections are gone, before it proceeds with commanding the Passenger Watchdog (and agents lower down the stack) to shutdown too.
  3. abort_websockets_on_process_shutdown has no effect. Aborting is done by the Core Controller during its shutdown. But because of the behavior described in point 2, the Core Controller never begins to shut down.

W.r.t. solutions, we have a few alternatives:

  1. Make Passenger Standalone, when commanded to stop, to send a SIGINT to Nginx, telling Nginx to non-gracefully shut down. This is a big hammer and I'd rather not use it.
  2. Set worker_shutdown_timeout in Passenger Standalone's Nginx config template. But only if abort_websockets_on_process_shutdown is true (so don't set this if it's false). I've verified that setting this option works, though with a delay. I think a reasonable value is 10s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants