Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown: Add support for Ctrl-Close event on Windows platforms to grace... #8993

Merged
merged 1 commit into from Dec 22, 2014

Conversation

@tlrx
Copy link
Member

commented Dec 17, 2014

...fully shutdown node

This commit adds the support for the Ctrl-Close event on Windows using native system calls. This way, it is possible to catch the Ctrl-Close event sent by a 'taskill /pid' command (or when the user closes the console window where elasticsearch.bat was started) and gracefully close the node. Before this commit, the node was simply killed on taskkill/window closing.

@clintongormley

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

@spinscale @Mpdreamz @costin please could you review this?

@costin

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

Too bad the JVM still doesn't seem to properly intercept these calls (maybe JDK8 has improved in this area?). It looks like the JNA hook only works on 32-bit JVMs - what about 64-bit ones?

Also I'm not sure about System.exit(0) - let the JVM shutdown by itself or the OS to kill it instead of killing it yourself.

@Mpdreamz

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

I'm on a windows 10 64bit VM and this PR looks good to me.

I added some logging to closeAndExit() and am seeing the following:

[2014-12-19 14:55:36,014][INFO ][bootstrap ] running graceful exit on windows

When I close the bat file directly or by sending taskkill /pid <PID>

Some nitpicks:

  • rename closeAndExit() to gracefulExitOnWindows() to fully indicate the intend of the method.
  • Include a log line to signal we are attempting a graceful shutdown (on windows). Could be useful to diferentiate between hard and soft kills.
  • You should not need System.exit(0), like @costin suggested, for CTRL_CLOSE_EVENT since at that point you can not cancel the close either. From some quick testing removing that line has no impact on behaviour.

Some random notes:

kernel32.dll still exists on 64bit windows and is actually named the same even though it actually is a 64bit dll.

One thing to note that the cleanup generally has to finish within 5 seconds from CTRL_CLOSE_EVENT or windows will prompt the user with a force quit dialog which will look like the application is misbehaving.

As indicated by: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016%28v=vs.85%29.aspx

The system generates CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT, and CTRL_SHUTDOWN_EVENT signals when the user closes the console, logs off, or shuts down the system so that the process has an opportunity to clean up before termination.

Not sure how we currently behave on the latter two events but if we do not behave well we could include them to gracefully exit under those conditions as well. These two events have a timeout of 20 seconds.

However we're we to handle the other these two events it'd be good to note they behave slightly differently when the process runs as service: http://msdn.microsoft.com/en-us/library/ms683242(VS.85).aspx

When a console application is run as a service, it receives a modified default console control handler. This modified handler does not call ExitProcess when processing the CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT signals.

@tlrx

This comment has been minimized.

Copy link
Member Author

commented Dec 19, 2014

@costin thanks for the feedback.

Too bad the JVM still doesn't seem to properly intercept these calls (maybe JDK8 has improved in this area?).

AFAIK there's nothing new concerning signal handling in JDK8.

It looks like the JNA hook only works on 32-bit JVMs - what about 64-bit ones?

Can you please point me where and why the hook only works on 32-bit JVM? I may have miss something... I tested the hook on the following configurations:

Windows 8.1 
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
Windows 8.1 
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
Java HotSpot(TM) Client VM (build 25.25-b02, mixed mode, sharing) 
Windows 7
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

I did not test it on other JVM.

Also I'm not sure about System.exit(0) - let the JVM shutdown by itself or the OS to kill it instead of killing it yourself.

That's what I did first, but I noticed that in this case the JVM is killed and the ShutdownHooks are not executed (for example, the PID file is not removed). There's a similar call in the TransportNodesShutdownAction class. I can remove this System.exit() call which is not mandatory to me.

Here is how I test the hook on Windows:

  • execute bin\elasticsearch.bat -Des.pidfile=es.pid -Des.logger.level=DEBUG
  • check for the following lines in log
[common.jna               ] windows/Kernel32 library loaded
[common.jna               ] console ctrl handler correctly set
  • check if PID file exist
  • close the window (X button)
  • check that the node close itself (quick message)
  • check that PID file is deleted (indicates that shutdown hooks are executed)
@costin

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

Can you please point me where and why the hook only works on 32-bit JVM?

Looking at the various names: Kernel32, com.sun.jna.win32 - that indicates there's a Kernel64 and jna.win64 out there. If that's not the case, then what's the meaning of 32 in the name?

@costin

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

Shutting down an app is always tricky when the semantics are ambiguous. The JNA hook intercepts an event that the JVM does not. So on one hand calling System.exit allows the JVM to shutdown gracefully (which by the way, means the JNA hook can simply call System.exit which will trigger ES to shut down properly. I would actually favour this since it means going forward, there's only one code path that gets executed and needs debugging).
On the other hand, by calling System.exit now there are cases where the JVM is shutdown properly instead of being killed and some env might depend; in other words, there are going to be (positive) side-effects which need to be explanined - I expect the majority of folks won't care for it but for those corner-cases where they do, they can easily find out about the flag to disable the hook.

P.S. Any idea how JNA works on the other JVMs out there - in particular Zulu/Azure?

@tlrx

This comment has been minimized.

Copy link
Member Author

commented Dec 19, 2014

@costin You're right, and that's why I added the flag to disable the hook. I updated the code and now the hook directly calls System.exit(0).

I did not found any info on how JNA behaves on other JVMs. I just test it on Zulu/Windows 8.1 and it works. I'm afraid we will have to test all supported platforms ourselves :)

@costin

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

Between Oracle's JDK and Zulu I think we are covered. LGTM.
P.S. You're a brave man to touch JNA :)
P.P.S. It looks like Kernel32 refers to the DLL used by JNA internally. @Mpdreamz @gmarz do you know if this is enough or whether there should be another DLL used for the 64 bit Windows API?

@Mpdreamz

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

internal = (Kernel32)Native.synchronizedLibrary((Kernel32)Native.loadLibrary("kernel32", Kernel32.class));

As i pointed out earlier: kernel32 can either be a 32 or 64bit dll depending on which windows version you have, the 32 bit in the name is kept for backwards compatibility.

Much of the Windows API was initially called win32 but those API's are actually 64bits on 64bits operation systems.

To quote from http://technet.microsoft.com/en-us/library/bb496995.aspx

Note The Windows API was formerly called the Win32 API. The name Windows API more accurately reflects its roots in 16-bit Windows and its support on 64-bit Windows. The name Windows API is used in this volume except when comparing 32-bit Windows programming with 64-bit Windows programming.

I would like to stress again the timeout constraint:

 logger.info("running graceful exit on windows");
try {
    Thread.sleep(30 * 1000);
} catch (InterruptedException e) {
    e.printStackTrace();
}
logger.info("still made it here on windows ");
bootstrap.destroy();
keepAliveLatch.countDown();

The second log line will never get written in this case (the actual force kill timeout is 5s).

@tlrx

This comment has been minimized.

Copy link
Member Author

commented Dec 19, 2014

@Mpdreamz Agreed for the timeout, I also tested it. The node has 5 sec to shutdown otherwise it will be killed (potentially in the middle of the shutdown process). For what I've seen, we can't change or disable this timeout constraint, right?

@Mpdreamz

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

Thats correct this timeout is not documented anywhere as something that can be changed.

In fact the only location i found that sort of documents the timeout is:

http://msdn.microsoft.com/en-us/library/ms683242(VS.85).aspx

A process can use the SetProcessShutdownParameters function to prevent the system from displaying a dialog box to the user during logoff or shutdown. In this case, the system terminates the process when HandlerRoutine returns TRUE or when the time-out period elapses.

SetProcessShutdownParameters only controls wheter a "Application X is still running" should pop up running when you shutdown windows.

@costin

This comment has been minimized.

Copy link
Member

commented Dec 19, 2014

@Mpdreamz Thanks for the links and explanation - missed your previous sentence on kernel32 (facepalm)

…acefully shutdown node

This commit adds the support for the Ctrl-Close event on Windows using native system calls. This way, it is possible to catch the Ctrl-Close event sent by a 'taskill /pid' command (or when the user closes the console window where elasticsearch.bat was started) and gracefully close the node. Before this commit, the node was simply killed on taskkill/window closing.
@tlrx tlrx force-pushed the tlrx:windows-on-close-event branch to a4133ec Dec 22, 2014
@tlrx tlrx merged commit a4133ec into elastic:master Dec 22, 2014
1 check passed
1 check passed
CLA Commit author is a member of Elasticsearch
Details
@tlrx tlrx deleted the tlrx:windows-on-close-event branch Dec 22, 2014
tlrx added a commit that referenced this pull request Dec 22, 2014
…acefully shutdown node

This commit adds the support for the Ctrl-Close event on Windows using native system calls. This way, it is possible to catch the Ctrl-Close event sent by a 'taskill /pid' command (or when the user closes the console window where elasticsearch.bat was started) and gracefully close the node. Before this commit, the node was simply killed on taskkill/window closing.

See #8993
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Dec 28, 2014
@clintongormley clintongormley added >enhancement and removed review labels Mar 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.