Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider removing the SIGTERM handler #1521

Closed
dyladan opened this issue Sep 11, 2020 · 5 comments · Fixed by #1522
Closed

Consider removing the SIGTERM handler #1521

dyladan opened this issue Sep 11, 2020 · 5 comments · Fixed by #1522
Labels
Discussion Issue or PR that needs/is extended discussion.

Comments

@dyladan
Copy link
Member

dyladan commented Sep 11, 2020

While working on #1501, I have become more and more convinced that we should remove the SIGTERM listener from our library. Regarding unload in the browser, I am less sure because I have less expericne, but I am leaning towards removing that too. My recommended solution would be to document how the exposed shutdown methods can be used within signal handlers of end-user applications.

Reasoning (NodeJS)

As I said above, I am less familiar with the browser behavior so this reasoning is specific to NodeJS. Maybe someone with more browser knowledge can fill in the gaps here like @obecny.

Signal handling cannot be done in a general way across all platforms.

The signal handling spec in node is complex and has many pitfalls. ctrl+c sends SIGINT. Elastic beanstalk sends SIGTERM. SIGTERM is not supported by windows but can be listened to and mocked in node. Closing your terminal sends SIGHUP on windows, but SIGHUP happens in "other similar situations on other platforms." SIGKILL and SIGSTOP cannot be listened to.

Adding a signal listener modifies global state, and potentially program behavior

The default signal handler for SIGTERM and SIGINT closes the program in node. Adding a listener for one of these signals removes the original handler. If we add a listener, we need to exit the program to maintain the default behavior. Otherwise we stop the program from being terminated. If a user application installs a listener in order to prevent SIGTERM or SIGINT from closing the application, our listener would exit the application. There is no way for us to know if a user has added or will in the future add a signal handler, so we cannot know if we should exit the program or not. We also cannot be sure that our signal handler will be called first, last, or in any other order.

Additionally, the number of currently registered signal handlers is limited in node. It is easily possible that the user has registered the maximum number of listeners and is unaware that our listener pushes them over the limit. This can be easily seen in the tests currently where we go over the maximum number of signal listeners and a warning is logged by the node runtime.

Solution

Implement a shutdown method in the node SDK and document that it may be used to flush data not yet exported and stop all timers used by the opentelemetry SDK. The end user application can use this method in their own signal handlers to safely shut down the OpenTelemetry SDK as part of their normal process shutdown.

@dyladan dyladan added the Discussion Issue or PR that needs/is extended discussion. label Sep 11, 2020
@obecny
Copy link
Member

obecny commented Sep 11, 2020

There is a config param for that so you have to enable it explicitly to be able to use this whole mechanism. Browser just mimics this behaviour to be compatible on that too. I might imagine some people might use it but the majority of people might not. We are supporting now both cases so if you enable this you should know what you doing. I think after your changes you are now always registering the event - previously it wasn't like this so unless you don't enable that the problem with limited number of sigterms is not the case and to prevent that maybe the first and major listening should only be done in first call only so then you don't have problem with it.

@dyladan
Copy link
Member Author

dyladan commented Sep 11, 2020

We still can't know if we should call process.exit or not in our handler. We could add a config for that, but then we're not simplifying the process for them at all. Is it really that much easier to have users pass gracefulShutdown = true than have them register a signal handler like process.on("SIGTERM", () => { sdk.shutdown(); /* other shutdown tasks */ process.exit(0); });?

@dyladan
Copy link
Member Author

dyladan commented Sep 11, 2020

If we decide not to call process.exit, it will be impossible for the user to register their own handler which calls exit without the possibility of exiting while we are still shutting down the SDK.

If we decide to call process.exit, we may exit the process while the user has other signal handlers running and interrupt their work.

@obecny
Copy link
Member

obecny commented Sep 11, 2020

We still can't know if we should call process.exit or not in our handler. We could add a config for that, but then we're not simplifying the process for them at all. Is it really that much easier to have users pass gracefulShutdown = true than have them register a signal handler like process.on("SIGTERM", () => { sdk.shutdown(); /* other shutdown tasks */ process.exit(0); });?

I think that's a perfect example and argument for not having the graceful shutdown. My understanding is / was that the graceful shutdown is something that was required from a spec etc. and that's why it was implemented. But it looks like it is not. If that is the case then I can just regret it hasn't been caught earlier especially that is causing some pain now :/. Personally if we don't need that I'm happy to drop it too.

@Flarna
Copy link
Member

Flarna commented Sep 11, 2020

SIGTERM is only one of the reasons why a process ends. There are also

  • no handles left in libuv (e.g. all servers closed,...)
  • process.exit() is called
  • uncaught exception (and no uncaughtException listener is set)
  • and unhandled rejection restults in an uncaught exception if command line argument --unhandled-rejections=strict is used
  • Fatal errors
  • SIGBREAK in windows

With quite some effort (and risk) it's possible to install signal handlers and mimic original behavior (e.g. patch process.on to ensure that you know about all listeners, call them delayed if needed).

But the main question is why. There is currently no magic during initializing the SDK so why adding magic for shutdown? And for all the other exit causes the problem is not yet solved.
Most production applications have already handling for above exit scenarios as they need it for their logic. Adding an (async) shutdown call for OTel should be not that hard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Issue or PR that needs/is extended discussion.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants