
Passenger hangs on spawning (Ubuntu 18.04 Rails 5.2.2 Passenger 6.0.1) #2181

Closed
ianrandmckenzie opened this issue Feb 19, 2019 · 17 comments

Comments

@ianrandmckenzie

Issue report

Are you sure this is a bug in Passenger?
I have scoured Google/Stack Overflow for people with the same issue and have found nothing.

Please try with the newest version of Passenger to avoid issues that have already been fixed
Using 6.0.1

Question 1: What is the problem?
Passenger hangs on process spawning, then presents the error page because it exceeds the 90-second limit. I eventually lowered the limit to 60 seconds to speed up my debugging. I tried disabling smart spawning, to no effect. I also tried upgrading the server to 8 GB of RAM and 4 CPUs, also to no effect. I should also note that the only traffic the server gets is me: SSH access and a single browser request at a time.

  • What is the expected behavior?
    The process spawns within a few seconds and the Rails application serves the requested page.

  • What is the actual behavior?
    Passenger times out on the process spawning.

  • How can we reproduce it? Please try to provide a sample application (or Virtual Machine) demonstrating the issue. Otherwise, if we can't reproduce it, we might have to ask you a number of followup questions or run certain commands to try and figure out the problem.
    I am using Rails 5.2.2, Ruby 2.5.1, Passenger 6.0.1, nginx 1.14.0, Ubuntu 18.04.2

There are no stack traces. Passenger seems to think it's a server load problem, but again, nothing is tying up the resources.
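For reference, the timeout change mentioned above is made with Passenger's start-timeout directive in the nginx configuration; a minimal sketch, assuming the default 90-second limit is being lowered to 60:

```nginx
# Sketch: lower Passenger's spawn timeout (default is 90 seconds).
# Goes in the http, server, or location block where Passenger is enabled.
passenger_start_timeout 60;
```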

Question 2: Passenger version and integration mode:

  • open source 6.0.1/nginx

Question 3: OS or Linux distro, platform (including version):

  • Ubuntu 18.04.2, x86_64

Question 4: Passenger installation method:

Your answer:
[ ] RubyGems + Gemfile
[ ] RubyGems, no Gemfile
[*] Phusion APT repo
[ ] Phusion YUM repo
[ ] OS X Homebrew
[ ] source tarball
[ ] Other, please specify:

Question 5: Your app's programming language (including any version managers) and framework (including versions):

  • Ruby 2.5.1, Rails 5.2.2

Question 6: Are you using a PaaS and/or containerization? If so which one?

  • No

Question 7: Anything else about your setup that we should know?
No, not as far as I know. It's a fairly vanilla setup for a freshly built application.


We strive for quality and appreciate you taking the time to submit a report! Please note that if you want guaranteed response times and priority issue support we encourage you to join our enterprise customer base. They also provide us with the means to continue our high level of open source support!

@sokkalf

sokkalf commented Feb 19, 2019

Have the same issue. Passenger just got updated on a production box, and the whole thing went down.

Ubuntu 16.04, nginx/passenger for passenger repo in my case.
Ruby 2.5.3, Rails 5.1

Downgraded to 6.0.0 for the time being, which works.

@CamJN
Contributor

CamJN commented Feb 19, 2019

So, I'm not seeing this on my end. Can either (or both) of you try sending a SIGQUIT to the ruby process to see what it's doing while it's taking so long? That will cause ruby to print a backtrace which should be captured by the passenger log (which is written to the nginx error log by default unless you set passenger_log_file)

@sokkalf

sokkalf commented Feb 20, 2019

Hi,

Sending it SIGQUIT doesn't get it to output anything meaningful (or anything at all, really). It just terminates without logging.

If I let it time out, it will stop at the "Initialize language runtime" step after 90 seconds.

I use Linode, which ships its own kernel (ahead of stock Ubuntu), and the problem seems to be related to the kernel version.

It doesn't work with kernel versions:
4.18.16
4.17.17
4.16.11

But it works fine with version
4.15.18 (and below)

Passenger 6.0.0 still works with the newer versions.

@sokkalf

sokkalf commented Feb 20, 2019

Some more info,

passenger-status also hangs (indefinitely, it seems). According to strace, both the Ruby prespawn process and passenger-status hang at getrandom(2).

Edit:
It seems to be related to RNG changes in 4.16 and above, affecting other software as well.
Ref. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897572

It would probably proceed after gaining enough entropy if the timeout were increased sufficiently, but more than 90 seconds of waiting for Passenger to start would be unacceptable anyway.
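A quick way to check whether entropy starvation is the issue on a given box (a sketch for Linux; getrandom(2) blocks until the kernel's random pool is initialized):

```shell
# How much entropy does the kernel report? Very low values on a freshly
# booted VM are consistent with getrandom(2) blocking.
cat /proc/sys/kernel/random/entropy_avail

# If strace is installed, you can confirm where a process blocks, e.g.:
#   strace -e trace=getrandom passenger-status
```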

@CamJN
Contributor

CamJN commented Feb 20, 2019

Ah, that does sound like a good lead. We also wouldn't experience that in our testing as our systems are long-running and have had time to generate entropy.

However, the fact that you don't have trouble with 6.0.0 is odd: there are only two non-trivial changes in 6.0.1. I fixed a problem with generic-language app spawning (which shouldn't be related), and I made the container detection more thorough. I don't call getrandom directly, but perhaps the implementation of the virtual filesystem at /proc uses getrandom... I saw in that thread that systemd is affected, and I adapted my container detection from systemd, so perhaps it's related.

@sokkalf since you are able to use strace, would it be too much to ask for a line number for the hanging getrandom?

@ianrandmckenzie can you provide the output of uname -a so that we can check your kernel version?

@sokkalf

sokkalf commented Feb 20, 2019

Can't seem to get a line number for that address. I had a go with strace, ltrace, and gdb, but I'm no expert in any of them. It does, however, seem like Passenger 6.0.0 is also affected, contrary to what I stated above.

I guess that by the time I got around to reverting the version, the system had gained enough entropy, and voilà.
It gets weirder, though.
Something as simple as ruby -e 'puts "Hello, world?"' also hangs for ages on the getrandom call. I tried this with the Ubuntu system Ruby (2.3), with the passenger_system_ruby binary (which I assume ships with Passenger), and with the 2.5.3 we use for our app (provided by rbenv), all with the same result.

So I guess this really is a Ruby bug, which rules out Passenger as the culprit.

I'll downgrade the kernel to 4.15 for now, and look out for a patch.
I found this https://bugs.ruby-lang.org/issues/14837 on the ruby bugtracker, so it seems they're aware of it.

@CamJN
Contributor

CamJN commented Feb 21, 2019

Ok so that covers that case.

@ianrandmckenzie have you had time to check your kernel version?

@ianrandmckenzie
Author

Sorry about the delay @CamJN, my availability for this particular project is very limited. Here's the uname output:

Linux server-name 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@CamJN
Contributor

CamJN commented Feb 25, 2019

Thanks, so it looks like a different issue from sokkalf. Can you, when you have a moment, set passenger_log_level 7; and provide the log of passenger trying to spawn the app?
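For anyone following along, that directive goes into the nginx configuration; a sketch, assuming the nginx integration mode used in this thread:

```nginx
# Sketch: raise Passenger's log verbosity to debug level 7, then reload
# nginx. Output goes to the nginx error log unless passenger_log_file
# is set.
passenger_log_level 7;
```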

I understand if you don't have much time, I can wait.

@ianrandmckenzie
Author

ianrandmckenzie commented Jul 30, 2019

Hey @CamJN I'm finally back on this project.

Edit: Here's my updated uname -a:

Linux server-name 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I got the following showing up:

The Phusion Passenger application server tried to start the web application, but this took too much time, so Passenger put a stop to that.

The stdout/stderr output of the subprocess so far is:

[ D 2019-07-30 23:14:16.9690 1778/T1 age/Spa/SpawnEnvSetupperMain.cpp:743 ]: shellName = 'sh' detected as supporting '-l': true
+ [ $  ]
+ [  ]
+ id -u
+ [ 1000 -eq 0 ]
+ PS1=$ 
+ [ -d /etc/profile.d ]
+ [ -r /etc/profile.d/01-locale-fix.sh ]
+ . /etc/profile.d/01-locale-fix.sh
+ /usr/bin/locale-check C.UTF-8
+ eval
+ [ -r /etc/profile.d/Z97-byobu.sh ]
+ . /etc/profile.d/Z97-byobu.sh
+ [ -r /usr/bin/byobu-launch ]
+ [  = 0 ]
+ [  = 1 ]
+ [ -e /etc/byobu/autolaunch ]
+ [  = byobu ]
+ [  = byobu-screen ]
+ [  = byobu-tmux ]
+ [ -r /etc/profile.d/Z99-cloud-locale-test.sh ]
+ . /etc/profile.d/Z99-cloud-locale-test.sh
+ [ -f /home/devops/.cloud-locale-test.skip -o -f /var/lib/cloud/instance/locale-check.skip ]
+ unset locale_warn
+ [ -r /etc/profile.d/Z99-cloudinit-warnings.sh ]
+ . /etc/profile.d/Z99-cloudinit-warnings.sh
+ cloud_init_warnings
+ command -v local
+ local _local=local
+ local warning= idir=/var/lib/cloud/instance n=0
+ local warndir=/var/lib/cloud/instance/warnings
+ local ufile=/home/devops/.cloud-warnings.skip sfile=/var/lib/cloud/instance/warnings/.skip
+ [ -d /var/lib/cloud/instance/warnings ]
+ return 0
+ unset cloud_init_warnings
+ [ -r /etc/profile.d/apps-bin-path.sh ]
+ . /etc/profile.d/apps-bin-path.sh
+ snap_bin_path=/snap/bin
+ [ -n /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin -a -n /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ]
+ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
+ [ -z  ]
+ export XDG_DATA_DIRS=/usr/local/share:/usr/share
+ snap_xdg_path=/var/lib/snapd/desktop
+ [ -n /usr/local/share:/usr/share -a -n /usr/local/share:/usr/share ]
+ export XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
+ [ -r /etc/profile.d/bash_completion.sh ]
+ . /etc/profile.d/bash_completion.sh
+ [ -n  -a -n $  -a -z  ]
+ [ -r /etc/profile.d/cedilla-portuguese.sh ]
+ . /etc/profile.d/cedilla-portuguese.sh
+ [  = pt -a  != pt ]
+ [ -r /etc/profile.d/postgres.sh ]
+ . /etc/profile.d/postgres.sh
+ export PGDATA=/var/lib/postgresql/10/main
+ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/usr/lib/postgresql/10/bin
+ [ -r /etc/profile.d/rbenv.sh ]
+ . /etc/profile.d/rbenv.sh
+ export PATH=/home/devops/.rbenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/usr/lib/postgresql/10/bin
+ export RBENV_ROOT=/home/devops/.rbenv
+ rbenv init -
+ eval export PATH="/home/devops/.rbenv/shims:${PATH}"
export RBENV_SHELL=sh
command rbenv rehash 2>/dev/null
rbenv() {
  local command
  command="$1"
  if [ "$#" -gt 0 ]; then
    shift
  fi

  case "$command" in
  rehash|shell|update|use)
    eval "$(rbenv "sh-$command" "$@")";;
  *)
    command rbenv "$command" "$@";;
  esac
}
+ export PATH=/home/devops/.rbenv/shims:/home/devops/.rbenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/usr/lib/postgresql/10/bin
+ export RBENV_SHELL=sh
+ command rbenv rehash
+ [ -r /etc/profile.d/sysstat.sh ]
+ . /etc/profile.d/sysstat.sh
+ [ -z  ]
+ S_COLORS=auto
+ export S_COLORS
+ unset i
+ exec /bin/bash

@CamJN
Contributor

CamJN commented Jul 31, 2019

Interesting, it looks like the bash config is taking more than the startup time limit on its own. You might want to find out what's so slow there before we continue.

@ianrandmckenzie
Author

Thanks for the reply, @CamJN

When you say bash config, are you talking about files like .bash_profile?

@CamJN
Contributor

CamJN commented Jul 31, 2019

Yes, though I said bash when I should have said sh. The shell likely starts with /etc/profile, and then you can see in the output the scripts that get called after that (i.e. the contents of /etc/profile.d get sourced).
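You can reproduce roughly what Passenger's spawn shell prints by running a login shell with tracing enabled yourself (a sketch; exactly which profile scripts get sourced depends on the system):

```shell
# Sketch: run sh as a login shell (-l) with command tracing (-x),
# mirroring Passenger's debug output. Each command from the sourced
# profile scripts appears on stderr prefixed with '+'.
sh -x -l -c 'echo shell ready' 2>&1 | tail -n 5
```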

@ianrandmckenzie
Author

@CamJN I simply removed the off-the-shelf .bashrc files from my root and home folders and everything works fine now.

I don't know if these are default .bashrc files with Digital Ocean or Ubuntu, but getting rid of them fixed the issue. Would you like me to copy/paste the culprit file?

This stuff definitely seems above my paygrade, but I'd certainly like to figure out why this problem happened. Shell scripts are not something I'm terribly experienced with.

@ianrandmckenzie
Author

ianrandmckenzie commented Jul 31, 2019

Another question: How did you know it was a problem with the bash config based on looking at the logs? The logfiles are hard for me to understand. Is it because the log ended on exec /bin/bash?

Either way, thank you for helping me figure this out.

@CamJN
Contributor

CamJN commented Jul 31, 2019

The combination of Passenger saying that app startup takes too long, and the output from the app not getting past the shell script stage are what told me it was the shell config.

When Passenger starts your app it actually launches a shell first and then starts the app; this is meant to simplify users' lives, as they often set environment variables in their shells and expect them to be present in their apps. When the log level is set to 7, the shell is instructed to print every command it runs as it runs it (the lines starting with + in the log you provided), and we can see that the actual Ruby app is never started. Therefore we can conclude that the script(s) took too long.

As for debugging why a script was too slow, that's outside the scope of this project, but you can probably find help on stack overflow or a similar site.
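As a starting point, one rough way to find the slow script is to time each /etc/profile.d entry in isolation (a sketch; assumes GNU date for nanosecond resolution, and sources each script in a subshell so the current environment isn't changed):

```shell
# Sketch: time each profile.d script individually to find the slow one.
# Assumes GNU date (%N); output from the scripts themselves is discarded.
for f in /etc/profile.d/*.sh; do
  [ -r "$f" ] || continue
  start=$(date +%s%N)
  ( . "$f" ) >/dev/null 2>&1
  end=$(date +%s%N)
  printf '%s: %d ms\n' "$f" $(( (end - start) / 1000000 ))
done
```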

@ianrandmckenzie
Author

Sounds good @CamJN , thanks again for your help.
