after_fork hook for daemonize without clustering? #335

Closed
grk opened this Issue Jul 18, 2013 · 22 comments

Comments

Projects
None yet
@grk

grk commented Jul 18, 2013

Hi,

One of our customers ran into an issue with running newrelic instrumentation on puma servers with the following config:

environment 'production'
pidfile '/tmp/puma.0.pid'
state_path '/tmp/puma.0.state'
stdout_redirect '/tmp/log/puma.stdout', '/tmp/log/puma.stderr'
threads 4, 16
bind 'tcp://0.0.0.0:2500'
daemonize true

It looks like the newrelic agent loads in the main process, but doesn't send any data after forking. We did a workaround for this by running puma in clustered mode with 1 worker and the following addition to the config:

on_worker_boot do
  require 'newrelic_rpm'
  NewRelic::Agent.manual_start
end

and adding require: false to newrelic_rpm in the Gemfile.

Do you have any suggestions for how to make it work in non-clustered mode with daemonize? I think a after_fork hook similar to on_worker_boot would be a solution.

@grk grk referenced this issue Jul 18, 2013

Closed

New Relic support #128

@jasonrclark

This comment has been minimized.

Show comment
Hide comment
@jasonrclark

jasonrclark Jul 30, 2013

Hey @grk. I'd mentioned in another thread that I was hopeful this might be addressed by our cluster mode fix. Turns out we're only partway there.

The newly released 3.6.6.147 version of the Ruby agent has cluster mode support. From my testing, you shouldn't need to :require => false or add the on_worker_boot workarounds anymore, but you do continue to need to set cluster mode with 1 worker to get things working right.

If you're interested, I'd love to hear whether that works for you. I've also entered the Puma daemonize problem onto our issues list, so at some point we can hopefully get it fixed and comment back here.

You're absolutely right that an after_fork hook in Puma itself would make this a lot cleaner for us in the agent, but we might be able to find a fix without requiring Puma to change.

Hey @grk. I'd mentioned in another thread that I was hopeful this might be addressed by our cluster mode fix. Turns out we're only partway there.

The newly released 3.6.6.147 version of the Ruby agent has cluster mode support. From my testing, you shouldn't need to :require => false or add the on_worker_boot workarounds anymore, but you do continue to need to set cluster mode with 1 worker to get things working right.

If you're interested, I'd love to hear whether that works for you. I've also entered the Puma daemonize problem onto our issues list, so at some point we can hopefully get it fixed and comment back here.

You're absolutely right that an after_fork hook in Puma itself would make this a lot cleaner for us in the agent, but we might be able to find a fix without requiring Puma to change.

@grk

This comment has been minimized.

Show comment
Hide comment
@grk

grk Jul 30, 2013

@jasonrclark that's great to hear! This fixes the issue for our customer, since we can run puma in clustered mode for him. However, clustered mode doesn't work on jruby, so a fix for daemonize would be nice. Luckily so far noone used a combination of jruby + puma + newrelic yet.

grk commented Jul 30, 2013

@jasonrclark that's great to hear! This fixes the issue for our customer, since we can run puma in clustered mode for him. However, clustered mode doesn't work on jruby, so a fix for daemonize would be nice. Luckily so far noone used a combination of jruby + puma + newrelic yet.

@renatosnrg

This comment has been minimized.

Show comment
Hide comment
@renatosnrg

renatosnrg Jul 31, 2013

I'm having this same issue: NewRelic is not reporting data when running Puma in non-clustered mode with daemonize. If I enable the clustered mode it works fine.

I'm using the latest versions of both newrelic_rpm (3.6.6.147) and puma (2.4.0) gems. I also have an open ticket with NewRelic support. This issue thread finally explain what is happening :)

Thanks!

I'm having this same issue: NewRelic is not reporting data when running Puma in non-clustered mode with daemonize. If I enable the clustered mode it works fine.

I'm using the latest versions of both newrelic_rpm (3.6.6.147) and puma (2.4.0) gems. I also have an open ticket with NewRelic support. This issue thread finally explain what is happening :)

Thanks!

@ondrejbartas

This comment has been minimized.

Show comment
Hide comment
@ondrejbartas

ondrejbartas Sep 8, 2013

Hello, I had same issue. Daemonize and 0 workers.

When I set workers to 3 (didn't try 1) it just started to work.

Thank you guys. This really helped me.

Hello, I had same issue. Daemonize and 0 workers.

When I set workers to 3 (didn't try 1) it just started to work.

Thank you guys. This really helped me.

@HDLP9

This comment has been minimized.

Show comment
Hide comment
@HDLP9

HDLP9 Sep 18, 2013

Hi,

Still no fix for non-clustered mode? With clustered mode it works like a charm but it would be great to have a solution for this.

Thanks!

HDLP9 commented Sep 18, 2013

Hi,

Still no fix for non-clustered mode? With clustered mode it works like a charm but it would be great to have a solution for this.

Thanks!

@jasonrclark

This comment has been minimized.

Show comment
Hide comment
@jasonrclark

jasonrclark Sep 18, 2013

Hey @helio-pereira

We don't have a fix to this yet, but it is definitely on our list to address.

That said, spent a few minutes researching this morning and there's a hook since 2.3 for on_booted after the Process.daemon call that could be what we need. However, I'm not seeing where we can get to it from the agent (https://github.com/puma/puma/blob/master/lib/puma/events.rb#L95-L97) since I haven't found a way to access the current Runner or CLI objects Puma's running with.

Anyone more familiar with Puma have any ideas if there's a good way to do that I'm missing?

Hey @helio-pereira

We don't have a fix to this yet, but it is definitely on our list to address.

That said, spent a few minutes researching this morning and there's a hook since 2.3 for on_booted after the Process.daemon call that could be what we need. However, I'm not seeing where we can get to it from the agent (https://github.com/puma/puma/blob/master/lib/puma/events.rb#L95-L97) since I haven't found a way to access the current Runner or CLI objects Puma's running with.

Anyone more familiar with Puma have any ideas if there's a good way to do that I'm missing?

@evanphx

This comment has been minimized.

Show comment
Hide comment
@evanphx

evanphx Sep 19, 2013

Member

I guess I need to dig into what changes newrelic made...

Member

evanphx commented Sep 19, 2013

I guess I need to dig into what changes newrelic made...

@jasonrclark

This comment has been minimized.

Show comment
Hide comment
@jasonrclark

jasonrclark Sep 19, 2013

@evanphx Happy to help out in any way I can.

What we did with cluster-mode was to have our instrumentation automatically add an on_worker_boot handler. https://github.com/newrelic/rpm/blob/master/lib/new_relic/agent/instrumentation/puma.rb

I think we could do something similar for daemonize if we had a way to tap into the on_booted event, but haven't figured how to get to it from the agent.

@evanphx Happy to help out in any way I can.

What we did with cluster-mode was to have our instrumentation automatically add an on_worker_boot handler. https://github.com/newrelic/rpm/blob/master/lib/new_relic/agent/instrumentation/puma.rb

I think we could do something similar for daemonize if we had a way to tap into the on_booted event, but haven't figured how to get to it from the agent.

@jasonrclark

This comment has been minimized.

Show comment
Hide comment
@jasonrclark

jasonrclark Dec 20, 2013

Just released newrelic_rpm 3.7.1.180 which contains a (currently optional) mode that I believe should get the New Relic thread restarted cleanly even after daemonizing. If this works out for folks, we may enable it by default and be able to get away from having to register hooks in Puma itself, which would be awesome!

To enable the feature, update your agent to 3.7.1.180 and add this to your newrelic.yml

  restart_thread_in_children: true

Keen for any feedback on whether that helps. Thanks!

Note: original comment had the wrong key name (restart_threads_in_child) which would not work. I've updated the comment here so anyone finding it won't have to follow the comment trail to find my typo.

Just released newrelic_rpm 3.7.1.180 which contains a (currently optional) mode that I believe should get the New Relic thread restarted cleanly even after daemonizing. If this works out for folks, we may enable it by default and be able to get away from having to register hooks in Puma itself, which would be awesome!

To enable the feature, update your agent to 3.7.1.180 and add this to your newrelic.yml

  restart_thread_in_children: true

Keen for any feedback on whether that helps. Thanks!

Note: original comment had the wrong key name (restart_threads_in_child) which would not work. I've updated the comment here so anyone finding it won't have to follow the comment trail to find my typo.

@HDLP9

This comment has been minimized.

Show comment
Hide comment
@HDLP9

HDLP9 Dec 26, 2013

Hi @jasonrclark!

Thanks for all the work. I've just updated to 3.7.1.180 version, added restart_thread_in_child: true to newrelic.yml and deployed the app. (Gemfile with gem 'newrelic_rpm')

In newrelic the app is now active (green) but no data is being sent. When checking the HTML I see that newrelic js tags are missing.

Do I need to make any more configuration?

This is really anoying beacause for a small, but important app, I need to have 2 ruby processes (more memory usage) to be able to send data to newrelic.

Many thanks!

HDLP9 commented Dec 26, 2013

Hi @jasonrclark!

Thanks for all the work. I've just updated to 3.7.1.180 version, added restart_thread_in_child: true to newrelic.yml and deployed the app. (Gemfile with gem 'newrelic_rpm')

In newrelic the app is now active (green) but no data is being sent. When checking the HTML I see that newrelic js tags are missing.

Do I need to make any more configuration?

This is really anoying beacause for a small, but important app, I need to have 2 ruby processes (more memory usage) to be able to send data to newrelic.

Many thanks!

@seuros

This comment has been minimized.

Show comment
Hide comment
@seuros

seuros Dec 26, 2013

same issue here!

seuros commented Dec 26, 2013

same issue here!

@benweint

This comment has been minimized.

Show comment
Hide comment
@benweint

benweint Dec 26, 2013

Hi @helio-pereira - thanks for trying this out! There shouldn't be any additional configuration that's needed.

It's possible that you're just hitting a bug in our implementation (as Jason alluded to, the restart_thread_in_children feature is still considered experimental), or there might be an unrelated reason that your app isn't sending data in this scenario.

It's difficult to say what the issue is remotely without more information, specifically:

  1. The contents of your newrelic_agent.log after a restart of Puma (extending for a few minutes after the restart, preferably). This will be most useful if you're able to set log_level: debug in your newrelic.yml file in order to increase the verbosity of the log.
  2. The exact command you're using to start puma
  3. The version of Puma you're using, and a brief description of your app (is it a Rails app, Sinatra, something else?)

If you don't want to post the log contents publicly, feel free to follow up via a support ticket (mention my name and it'll get routed to the right place) or with me directly.

Edit: original comment had an incorrect name for the configuration key. The correct name is restart_thread_in_children, not restart_thread_in_child.

Hi @helio-pereira - thanks for trying this out! There shouldn't be any additional configuration that's needed.

It's possible that you're just hitting a bug in our implementation (as Jason alluded to, the restart_thread_in_children feature is still considered experimental), or there might be an unrelated reason that your app isn't sending data in this scenario.

It's difficult to say what the issue is remotely without more information, specifically:

  1. The contents of your newrelic_agent.log after a restart of Puma (extending for a few minutes after the restart, preferably). This will be most useful if you're able to set log_level: debug in your newrelic.yml file in order to increase the verbosity of the log.
  2. The exact command you're using to start puma
  3. The version of Puma you're using, and a brief description of your app (is it a Rails app, Sinatra, something else?)

If you don't want to post the log contents publicly, feel free to follow up via a support ticket (mention my name and it'll get routed to the right place) or with me directly.

Edit: original comment had an incorrect name for the configuration key. The correct name is restart_thread_in_children, not restart_thread_in_child.

@benweint

This comment has been minimized.

Show comment
Hide comment
@benweint

benweint Dec 30, 2013

@helio-pereira we just uncovered an issue that may be causing the restart_thread_in_child functionality to not work, depending on what version of Ruby you're on.

If you're on MRI < 2.0.0-p353, you're likely hitting https://bugs.ruby-lang.org/issues/8616
If you're on Rubinius 2.x, you're likely hitting rubinius/rubinius#2861

In short: we rely on Thread#alive? to return an accurate value in order for our automatic restarting to work, and those two bugs cause it not to. We can probably figure out how to work around them, since a large number of folks are likely on versions of MRI / Rubinius that are affected.

@helio-pereira we just uncovered an issue that may be causing the restart_thread_in_child functionality to not work, depending on what version of Ruby you're on.

If you're on MRI < 2.0.0-p353, you're likely hitting https://bugs.ruby-lang.org/issues/8616
If you're on Rubinius 2.x, you're likely hitting rubinius/rubinius#2861

In short: we rely on Thread#alive? to return an accurate value in order for our automatic restarting to work, and those two bugs cause it not to. We can probably figure out how to work around them, since a large number of folks are likely on versions of MRI / Rubinius that are affected.

@HDLP9

This comment has been minimized.

Show comment
Hide comment
@HDLP9

HDLP9 Dec 31, 2013

Hi @benweint,

sorry for not responding but you know...holidays!!!!

I'm using:

  • Puma 2.7.1
  • Rails 4.0.2 with turbolink enabled
  • ruby-2.0.0-p247

So the problem must be the ruby version. For me I don´t think that you need to work around this issue, not worth the trouble.

I´ll try to update to the new stable version 2.1 and let you know.

HDLP9 commented Dec 31, 2013

Hi @benweint,

sorry for not responding but you know...holidays!!!!

I'm using:

  • Puma 2.7.1
  • Rails 4.0.2 with turbolink enabled
  • ruby-2.0.0-p247

So the problem must be the ruby version. For me I don´t think that you need to work around this issue, not worth the trouble.

I´ll try to update to the new stable version 2.1 and let you know.

@jeroenj

This comment has been minimized.

Show comment
Hide comment
@jeroenj

jeroenj Jan 2, 2014

Hi @benweint,

I've also tried this config but it doesn't seem to work either:

[01/02/14 09:57:08 +0100 server (5772)] INFO : Starting the New Relic agent in "production" environment.
[01/02/14 09:57:08 +0100 server (5772)] INFO : To prevent agent startup add a NEWRELIC_ENABLE=false environment variable or modify the "production" section of your newrelic.yml.
[01/02/14 09:57:08 +0100 server (5772)] INFO : Reading configuration from config/newrelic.yml
[01/02/14 09:57:08 +0100 server (5772)] INFO : Enabling the Request Sampler.
[01/02/14 09:57:08 +0100 server (5772)] INFO : Environment: production
[01/02/14 09:57:08 +0100 server (5772)] INFO : Dispatcher: puma
[01/02/14 09:57:08 +0100 server (5772)] INFO : Application: trouw
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing ActiveRecord instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Net instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing deferred Rack instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Puma cluster mode support
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Rails 3 Controller instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Rails 3.1/3.2 view instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Rails3 Error instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Finished instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Doing deferred dependency-detection before Rack startup
[01/02/14 09:57:09 +0100 server (5772)] INFO : Starting Agent shutdown
[01/02/14 09:57:09 +0100 server (5790)] INFO : Starting Agent shutdown

I don't have any special configuration set for puma and restart_thread_in_child: true is set. I'm using rails 3.2.15 on ruby 2.1.0. If you need anything else, please let me know. :)

jeroenj commented Jan 2, 2014

Hi @benweint,

I've also tried this config but it doesn't seem to work either:

[01/02/14 09:57:08 +0100 server (5772)] INFO : Starting the New Relic agent in "production" environment.
[01/02/14 09:57:08 +0100 server (5772)] INFO : To prevent agent startup add a NEWRELIC_ENABLE=false environment variable or modify the "production" section of your newrelic.yml.
[01/02/14 09:57:08 +0100 server (5772)] INFO : Reading configuration from config/newrelic.yml
[01/02/14 09:57:08 +0100 server (5772)] INFO : Enabling the Request Sampler.
[01/02/14 09:57:08 +0100 server (5772)] INFO : Environment: production
[01/02/14 09:57:08 +0100 server (5772)] INFO : Dispatcher: puma
[01/02/14 09:57:08 +0100 server (5772)] INFO : Application: trouw
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing ActiveRecord instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Net instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing deferred Rack instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Puma cluster mode support
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Rails 3 Controller instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Rails 3.1/3.2 view instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Installing Rails3 Error instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Finished instrumentation
[01/02/14 09:57:09 +0100 server (5772)] INFO : Doing deferred dependency-detection before Rack startup
[01/02/14 09:57:09 +0100 server (5772)] INFO : Starting Agent shutdown
[01/02/14 09:57:09 +0100 server (5790)] INFO : Starting Agent shutdown

I don't have any special configuration set for puma and restart_thread_in_child: true is set. I'm using rails 3.2.15 on ruby 2.1.0. If you need anything else, please let me know. :)

@benweint

This comment has been minimized.

Show comment
Hide comment
@benweint

benweint Jan 2, 2014

@jeroenj well this is embarrassing - the setting is actually called restart_thread_in_children, not restart_thread_in_child. The CHANGELOG and Jason and I's comments above are incorrect (the setting is declared in the source here). Mind giving it a try with the correctly-named setting?

Sorry for the mix-up! I'll get the CHANGELOG corrected.

benweint commented Jan 2, 2014

@jeroenj well this is embarrassing - the setting is actually called restart_thread_in_children, not restart_thread_in_child. The CHANGELOG and Jason and I's comments above are incorrect (the setting is declared in the source here). Mind giving it a try with the correctly-named setting?

Sorry for the mix-up! I'll get the CHANGELOG corrected.

@jeroenj

This comment has been minimized.

Show comment
Hide comment
@jeroenj

jeroenj Jan 2, 2014

We all make mistakes, right?! :)

Anyway, that seems to fix it. My application is now reporting to New Relic. 👍

jeroenj commented Jan 2, 2014

We all make mistakes, right?! :)

Anyway, that seems to fix it. My application is now reporting to New Relic. 👍

@benweint

This comment has been minimized.

Show comment
Hide comment
@benweint

benweint Jan 2, 2014

@jeroenj Glad to hear it, thanks for following up!

benweint commented Jan 2, 2014

@jeroenj Glad to hear it, thanks for following up!

@jasonrclark

This comment has been minimized.

Show comment
Hide comment
@jasonrclark

jasonrclark Jan 3, 2014

😞 ack! I've revised my comment above with a note so anyone stumbling on it will hopefully get the right key name first.

😞 ack! I've revised my comment above with a note so anyone stumbling on it will hopefully get the right key name first.

@washu

This comment has been minimized.

Show comment
Hide comment
@washu

washu Jan 15, 2014

I just created a new gem call daemon_signals, require it after puma in your gemfile and you will get a nice SIGCONT after daemonization is called.

washu commented Jan 15, 2014

I just created a new gem call daemon_signals, require it after puma in your gemfile and you will get a nice SIGCONT after daemonization is called.

@schneems

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Feb 28, 2014

Contributor

Is this still an issue? The original issue reads somewhat like a feature request. Can we close this issue?

Contributor

schneems commented Feb 28, 2014

Is this still an issue? The original issue reads somewhat like a feature request. Can we close this issue?

@evanphx evanphx closed this Mar 1, 2014

@benweint

This comment has been minimized.

Show comment
Hide comment
@benweint

benweint Apr 23, 2014

Hey folks - I know this issue's closed, but I wanted to give you a brief update. We've just released a new version of the newrelic_rpm gem (3.8.0) that should fix this issue.

In the latest version of the gem, we will automatically restart our background thread when we notice that a web transaction is processed and the PID we last started the thread in differs from the current PID.

This is effectively the same as the restart_thread_in_children setting mentioned above switching from default off to default on.

If you're still having trouble getting Puma (in any mode) working with newrelic_rpm, please let us know via a support ticket or in our forum and we'll be happy to help you out.

Hey folks - I know this issue's closed, but I wanted to give you a brief update. We've just released a new version of the newrelic_rpm gem (3.8.0) that should fix this issue.

In the latest version of the gem, we will automatically restart our background thread when we notice that a web transaction is processed and the PID we last started the thread in differs from the current PID.

This is effectively the same as the restart_thread_in_children setting mentioned above switching from default off to default on.

If you're still having trouble getting Puma (in any mode) working with newrelic_rpm, please let us know via a support ticket or in our forum and we'll be happy to help you out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment