App takes longer to boot on Ubuntu 18 #1969

collimarco · 2019-09-15T11:26:07Z

Steps to reproduce

Send a simple request to Puma: everything works properly.
Enable cluster mode (e.g. by passing -w 2).
The request now hangs forever and no response is received (eventually the client has to terminate the connection).

This is the only information that appears in the Puma logs:

=== puma startup: 2019-09-15 11:01:53 +0000 ===
[892] ! Terminating timed out worker: 1145
[892] ! Terminating timed out worker: 1146
[892] ! Terminating timed out worker: 1147
[892] ! Terminating timed out worker: 1148
[892] ! Terminating timed out worker: 1397
[892] ! Terminating timed out worker: 1399
...

Expected behavior

Puma should work fine in cluster mode.

Actual behavior

Requests hang forever and no response is received. The logs don't give enough information to diagnose the problem.

System configuration

Ruby version: 2.6.4
Rails version: 5.2.3
Puma version: 4.1.1
OS/version: Ubuntu 18.04 LTS
Server: DigitalOcean CPU-Optimized 4vCPU


collimarco · 2019-09-15T11:49:04Z

I could partially solve the issue by setting this:

worker_timeout 300

The workers start responding properly after about 2 minutes. However, that seems like a strangely long startup time.

We have been running this Rails app in production since 2015, previously on DigitalOcean Standard 2vCPU servers. We have always used cluster mode and had never needed to increase worker_timeout: everything worked fine. Now we use more powerful servers with dedicated hyperthreads and more CPUs (DigitalOcean CPU-Optimized 4vCPU), and startup is extremely slow (it triggers "Terminating timed out worker" unless we increase worker_timeout).

collimarco · 2019-09-15T12:06:55Z

I have run more tests: after a restart, Puma takes ~90 seconds before responding to the first request. During that time there is no CPU usage (top reports 99.5% idle) and there is plenty of free RAM (top reports 8GB free). So why is the worker startup so slow?

nateberkopec · 2019-09-15T13:21:17Z

So there's not really a bug in Puma; it's just that your app is suddenly taking a long time to start up?

Does it take so long to start in another application server, like Unicorn?

nateberkopec · 2019-09-15T13:22:12Z

Also does the application boot quickly in single mode? That is, after it boots and you immediately send it a request, how long does it take for that request to return?

collimarco · 2019-09-15T13:47:02Z

The app is currently (today) running on the old servers (Ubuntu 16.04) without any change to worker_timeout: everything works fine and startup is fast (under 60s in any case).
The exact same app, with the same configuration, also runs on some new servers (with the newer Ubuntu 18.04 and much more powerful hardware): on these servers startup is slow, and the timeout occurs unless you increase it.

Also does the application boot quickly in single mode?

In single mode it takes about the same time as in cluster mode (the only difference is that it returns Connection refused in the meantime, instead of holding the connection open).

collimarco · 2019-09-15T14:17:20Z

I have tried Ruby 2.5 and the problem persists on the new server.
On the old server, startup (with the same number of threads and workers) takes only 10s.
So it's really weird:

DigitalOcean 4vCPU dedicated ($80): 90s
DigitalOcean 2vCPU shared ($20): 10s

The problem is the same on all the new servers. Apart from the hardware, the only difference is Ubuntu (16.04 LTS on the old servers, 18.04 LTS on the new ones).

collimarco · 2019-09-15T14:24:21Z

Another note: the startup time doesn't change if I increase or decrease the number of workers.

nateberkopec · 2019-09-15T14:27:38Z

🤷‍♂ There's not really any evidence of a Puma issue here, though. It could be any one of a million things that are different with the new boxes. The new OS version especially: there are literally thousands of different new packages.

If you try Unicorn and it boots in 10s, please let us know.

The SIGINFO/log_thread_status changes on master may eventually help you debug this, once we actually get them working on Linux (#1964). Until then, you're stuck using gdb to figure out where it might be hanging.
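As a pure-Ruby alternative to gdb for a plain Ruby process such as rails runner, a watchdog thread can print the backtrace of the thread that is stuck. This is a minimal sketch: with_hang_watchdog is a hypothetical helper, not Puma's log_thread_status, and it only helps when the blocking call releases Ruby's global VM lock (Ruby's own socket I/O generally does; some C extensions may not).

```ruby
# Hypothetical helper (not part of Puma or Rails): run a block and, if it is
# still running after `after` seconds, print the backtrace of the thread
# executing it. The block keeps running; the watchdog only reports.
def with_hang_watchdog(after:, io: $stderr)
  target = Thread.current
  watchdog = Thread.new do
    sleep after
    io.puts "Still running after #{after}s; backtrace of the waiting thread:"
    # Thread#backtrace can return nil if the thread has already finished.
    (target.backtrace || []).each { |frame| io.puts "  #{frame}" }
  end
  yield
ensure
  # Stop the watchdog whether the block finished, raised, or was interrupted.
  watchdog&.kill
end

# Example usage (assumes the current directory is the Rails app root):
#   with_hang_watchdog(after: 30) do
#     require File.expand_path("config/environment", Dir.pwd)
#   end
```

The block's return value is passed through unchanged, so the helper can wrap an existing boot step without altering its behavior.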

collimarco · 2019-09-15T14:55:35Z

Yes, it might not be strictly related to Puma, and it is also not related to our app (it works fine on the previous servers).

Now I have also realized that a similar issue happens on the new database server (Ubuntu 18.04): rake db:migrate takes about 90 seconds (the same as the Puma boot) even when there are no migrations to run (on the previous server it took only a few seconds).

In general, the performance of the new servers (tested with benchmarks such as redis benchmark) is very high. So this must be a bug specific to Rails / Ruby / Ubuntu 18. Any other suggestions on how to debug this would be appreciated.

nateberkopec · 2019-09-15T15:01:28Z

rake db:migrate takes about ~90 seconds

That task + booting the app both have to set up the Rails "environment". My guess is that $ rails runner "puts 'hello'" also takes 90 seconds.

You might try rbspy to see where the time is going. I think it would work well here.
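The earlier observation (slow startup while top shows the CPU almost idle) already hints that the process is waiting rather than computing. A minimal sketch for checking that from Ruby compares wall-clock time with CPU time around a block; measure_wall_and_cpu is a hypothetical helper, and rbspy, as suggested above, gives a far more detailed picture.

```ruby
# Hypothetical helper: run a block and report wall-clock time next to the CPU
# time this Ruby process consumed meanwhile. Much more wall time than CPU time
# means the process spent most of the time waiting (network, disk, locks).
def measure_wall_and_cpu
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  result = yield
  {
    result: result,
    wall: Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start,
    cpu: Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start
  }
end

# Example usage (assumes it runs in-process from the Rails app root):
#   stats = measure_wall_and_cpu { require File.expand_path("config/environment", Dir.pwd) }
#   puts format("wall %.1fs, cpu %.1fs", stats[:wall], stats[:cpu])
```

A boot showing roughly 90s of wall time but only a few seconds of CPU time would point at a blocking call such as a network connection rather than slow code.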

collimarco · 2019-09-15T15:54:29Z

@nateberkopec Thank you! That command saved my day... The runner indeed takes 90 seconds, exactly the same.

So I pressed CTRL-C while it was hanging, and it was clear that the process was stuck in the Redis gem. After some investigation I found this line in config/environments/production.rb:

config.cache_store = :redis_cache_store, { host: '10.129.123.123' }

That cache store was never used, but it was configured. The problem is that the host pointed to a machine in the old datacenter: when connecting from the old servers (located in the old datacenter), the connection was simply rejected immediately. In the new datacenter, however, probably due to different firewall configurations, that line caused a long wait for a response.
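The difference described here (an immediate rejection versus a long wait) can be checked independently of Rails. A minimal sketch using Ruby's standard Socket.tcp with a connect timeout; probe_tcp is a hypothetical helper, and port 6379 in the usage comment is assumed only because it is Redis's default port (the original config sets just the host).

```ruby
require "socket"

# Hypothetical helper: try to open a TCP connection and classify the outcome.
# Returns [status, elapsed_seconds].
def probe_tcp(host, port, timeout: 2.0)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  status =
    begin
      Socket.tcp(host, port, connect_timeout: timeout).close
      :open
    rescue Errno::ECONNREFUSED
      :refused      # host or a rejecting firewall answered at once: fails fast
    rescue Errno::ETIMEDOUT
      :timed_out    # no answer at all, e.g. packets silently dropped
    rescue Errno::EHOSTUNREACH, Errno::ENETUNREACH
      :unreachable  # no route to the host
    end
  [status, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Example usage from one of the new servers:
#   status, seconds = probe_tcp("10.129.123.123", 6379)
#   puts "#{status} after #{seconds.round(2)}s"
```

A :refused result returns almost instantly, while :timed_out costs the full timeout, which matches the old-datacenter versus new-datacenter behavior described above.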

nateberkopec added the needs-repro label Sep 15, 2019

nateberkopec closed this as completed Sep 15, 2019

nateberkopec changed the title from "Puma not accepting requests in cluster mode" to "App takes longer to boot on Ubuntu 18" Sep 15, 2019
