-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify KubernetesClient shutdown behaviour [HZ-1921] #24613
Modify KubernetesClient shutdown behaviour [HZ-1921] #24613
Conversation
The overall goal of this change is to change the shutdown behaviour of KubernetesClient so our Stateful Set monitor thread shuts down before our `ClusterTopologyIntentTracker`, to allow the intent tracker to fully process all completed messages before Node shutdown. **The Current Problem** In its current state, the Stateful Set monitor thread is intended to shutdown after `Thread#interrupt` is called, triggering the `Thread#interrupted` check within the main `while(running)` loop of the Runnable. However, this check is not reached as the call to `WatchResponse#readLine` from within the main `run()` method is a blocking call that waits until data is available to read before proceeding. Since this call waits for non-null data before completing, the result is always non-null, and therefore this code block never exits under normal conditions: ```java while ((message = watchResponse.nextLine()) != null) { onMessage(message); } ``` Since this `while` loop cannot exit, and the `#readLine` method (which passes to `BufferedReader#readLine` under the hood) is a blocking I/O operation which cannot be interrupted, this operation does not end when `Thread#interrupt` is called. This leads to the Stateful Set monitor thread out-living the `ClusterTopologyIntentTracker`, even if the StsMonitor is interrupted first. As a result, during shutdown, it is possible for the StsMonitor to send data to the intent tracker after it has been destroyed and its executor is not longer accepting tasks. **The Root of the Problem** To reach our goal of ensuring that the Stateful Set monitor thread can no longer send requests to the `ClusterTopologyIntentTracker`, we need to add synchronization between the two objects that guarantees the intent tracker shuts down after the StsMonitor thread has completed. This can be achieved using a simple `CountDownLatch` which is counted down after the thread has completed, and awaited before shutting down the tracker. The main obstacle to achieving this is, as mentioned above, that the StsMonitor thread cannot be interrupted when waiting for `WatchResponse#readLine` to complete, and so the thread never completes. The only way this thread can complete is to either force its termination, or alter the message reading approach to allow interruption as intended. **Identifying Resolution Paths** We don't want to force termination of our Stateful Set monitor thread as this could result in message handling being terminated after it has been received, but not before it has finished being processed. Therefore the only way we can allow this thread to be interrupted as intended is to alter the message reading mechanics in a way that allows it to be interrupted as well. There is no way for us to know if more messages are pending from the k8s watch besides waiting for data to be received, so the best we can do is allow the StsMonitor to finish processing any messages it has already received (preventing process corruption), but terminate the stream of new messages it is waiting for before we shutdown the intent tracker. **Potential Solutions** So we've identified the root of the problem as our `#readLine` function blocking through interrupts, so how do we make it interruptible? Sadly one of the shortcomings of I/O operations in Java is that they usually cannot be interrupted in the traditional manner, so we have a few approaches to consider: 1) We could modify the message reading code to use `BufferedReader#ready` and `Thread#sleep` to periodically check if there is data to be read before calling any read functions. The problem with this approach is that A) `#ready` returns true if any data is available, not just if there is a full line of data to be read; and B) utilizing a sleep loop can result in delayed message handling at the least, or busy-waiting overhead at worst. 2) We could use "hacky" solutions to interrupt the `BufferedReader#readLine` such as closing underlying sockets or connections, propagating an exception to the reader. The problem with this solution is that everything related to our reading operation is handled in `syncrhonized` blocks, and since our shutdown process starts outside the StsMonitor thread, our calling thread is unable to obtain these locks (being held by the StsMonitor)! 3) It's possible that we could rewrite the `WatchResponse` mechanics to use Java NIO magic such as `Selector` for interruptible I/O operations. The issue with this approach is that it would require fairly significant refactoring of the related codebase, and may not end up providing the exact functionality we are looking for in our use case. 4) We can introduce an `Executor` to handle our I/O operations within the StsMonitor thread, allowing us to wait on a `Future#get` call instead of our `BufferedReader#readLine` call, where a `Future#get` operation can be interrupted by the waiting thread being interrupted. The downside to this solution is we have to introduce an additional thread on top of the existing StsMonitor thread itself. **Selecting a Solution** Considering the above information, I concluded the most sensible approach was to use (4) and introduce an `Executor` thread for the I/O operations. By using a separate thread for this call we can be rougher with it, as we know that worse case scenario we interrupt a message being read that has not started being processed yet (but we're shutting down anyway). This solution also allows for the least amount of underlying code changes, as our `Future#get` can be interrupted without issue, maintaining the current approach used for handling the StsMonitor shutdown. The only downside for this approach is the additional of another thread alongside the StsMonitor thread, but the actual impact of this should be minimal as both threads will still be waiting most of the time, so the only negative impact is being 1 tiny step closer to possible thread starvation. Generally I think this is the best solution at hand which allows quick shutdown of the StsMonitor thread while minimising potential for data loss or corruption. Combined with the `CountDownLatch` used, this allows for consistent service shutdown order between the Stateful Set monitor thread and the `ClusterTopologyIntentTracker`.
run-lab-run |
The job Click to expand the log file---------ERRORS----------- -------------------------- [ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-compiler_2/hazelcast/src/main/java/com/hazelcast/kubernetes/KubernetesClient.java:49:8: Unused import - java.util.concurrent.CountDownLatch. [UnusedImports] -------------------------- |
The job Click to expand the log file---------ERRORS----------- -------------------------- [ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-builder_2/hazelcast/src/main/java/com/hazelcast/kubernetes/KubernetesClient.java:49:8: Unused import - java.util.concurrent.CountDownLatch. [UnusedImports] -------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on slack, the separate single-threaded read executor is probably the least bad fix that is also backportable to 5.2.z & 5.3.z. Looking forward to using the new JDK 11 http client in a subsequent PR to master
.
Thanks for the excellent write-up and alternative solutions discussion.
Backport of: #24613 The overall goal of this change is to change the shutdown behaviour of KubernetesClient so our Stateful Set monitor thread shuts down before our `ClusterTopologyIntentTracker`, to allow the intent tracker to fully process all completed messages before Node shutdown. **The Current Problem** In its current state, the Stateful Set monitor thread is intended to shutdown after `Thread#interrupt` is called, triggering the `Thread#interrupted` check within the main `while(running)` loop of the Runnable. However, this check is not reached as the call to `WatchResponse#readLine` from within the main `run()` method is a blocking call that waits until data is available to read before proceeding. Since this call waits for non-null data before completing, the result is always non-null, and therefore this code block never exits under normal conditions: ```java while ((message = watchResponse.nextLine()) != null) { onMessage(message); } ``` Since this `while` loop cannot exit, and the `#readLine` method (which passes to `BufferedReader#readLine` under the hood) is a blocking I/O operation which cannot be interrupted, this operation does not end when `Thread#interrupt` is called. This leads to the Stateful Set monitor thread out-living the `ClusterTopologyIntentTracker`, even if the StsMonitor is interrupted first. As a result, during shutdown, it is possible for the StsMonitor to send data to the intent tracker after it has been destroyed and its executor is no longer accepting tasks. **The Root of the Problem** To reach our goal of ensuring that the Stateful Set monitor thread can no longer send requests to the `ClusterTopologyIntentTracker`, we need to add synchronization between the two objects that guarantees the intent tracker shuts down after the StsMonitor thread has completed. This can be achieved using a simple `CountDownLatch` which is counted down after the thread has completed, and awaited before shutting down the tracker. The main obstacle to achieving this is, as mentioned above, that the StsMonitor thread cannot be interrupted when waiting for `WatchResponse#readLine` to complete, and so the thread never completes. The only way this thread can complete is to either force its termination, or alter the message reading approach to allow interruption as intended. **Identifying Resolution Paths** We don't want to force termination of our Stateful Set monitor thread as this could result in message handling being terminated after it has been received, but not before it has finished being processed. Therefore the only way we can allow this thread to be interrupted as intended is to alter the message reading mechanics in a way that allows it to be interrupted as well. There is no way for us to know if more messages are pending from the k8s watch besides waiting for data to be received, so the best we can do is allow the StsMonitor to finish processing any messages it has already received (preventing process corruption), but terminate the stream of new messages it is waiting for before we shutdown the intent tracker. **Potential Solutions** So we've identified the root of the problem as our `#readLine` function blocking through interrupts, so how do we make it interruptible? Sadly one of the shortcomings of I/O operations in Java is that they usually cannot be interrupted in the traditional manner, so we have a few approaches to consider: 1) We could modify the message reading code to use `BufferedReader#ready` and `Thread#sleep` to periodically check if there is data to be read before calling any read functions. The problem with this approach is that A) `#ready` returns true if any data is available, not just if there is a full line of data to be read; and B) utilizing a sleep loop can result in delayed message handling at the least, or busy-waiting overhead at worst. 2) We could use "hacky" solutions to interrupt the `BufferedReader#readLine` such as closing underlying sockets or connections, propagating an exception to the reader. The problem with this solution is that everything related to our reading operation is handled in `syncrhonized` blocks, and since our shutdown process starts outside the StsMonitor thread, our calling thread is unable to obtain these locks (being held by the StsMonitor)! 3) It's possible that we could rewrite the `WatchResponse` mechanics to use Java NIO magic such as `Selector` for interruptible I/O operations. The issue with this approach is that it would require fairly significant refactoring of the related codebase, and may not end up providing the exact functionality we are looking for in our use case. 4) We can introduce an `Executor` to handle our I/O operations within the StsMonitor thread, allowing us to wait on a `Future#get` call instead of our `BufferedReader#readLine` call, where a `Future#get` operation can be interrupted by the waiting thread being interrupted. The downside to this solution is we have to introduce an additional thread on top of the existing StsMonitor thread itself. **Selecting a Solution** Considering the above information, I concluded the most sensible approach was to use (4) and introduce an `Executor` thread for the I/O operations. By using a separate thread for this call we can be rougher with it, as we know that worse case scenario we interrupt a message being read that has not started being processed yet (but we're shutting down anyway). This solution also allows for the least amount of underlying code changes, as our `Future#get` can be interrupted without issue, maintaining the current approach used for handling the StsMonitor shutdown. The only downside for this approach is the addition of another thread alongside the StsMonitor thread, but the actual impact of this should be minimal as both threads will still be waiting most of the time, so the only negative impact is being 1 tiny step closer to possible thread starvation. Generally I think this is the best solution at hand which allows quick shutdown of the StsMonitor thread while minimising potential for data loss or corruption. Combined with the `CountDownLatch` used, this allows for consistent service shutdown order between the `StsMonitor` thread and the `ClusterTopologyIntentTracker`.
Backport of: #24613 The overall goal of this change is to change the shutdown behaviour of KubernetesClient so our Stateful Set monitor thread shuts down before our `ClusterTopologyIntentTracker`, to allow the intent tracker to fully process all completed messages before Node shutdown. **The Current Problem** In its current state, the Stateful Set monitor thread is intended to shutdown after `Thread#interrupt` is called, triggering the `Thread#interrupted` check within the main `while(running)` loop of the Runnable. However, this check is not reached as the call to `WatchResponse#readLine` from within the main `run()` method is a blocking call that waits until data is available to read before proceeding. Since this call waits for non-null data before completing, the result is always non-null, and therefore this code block never exits under normal conditions: ```java while ((message = watchResponse.nextLine()) != null) { onMessage(message); } ``` Since this `while` loop cannot exit, and the `#readLine` method (which passes to `BufferedReader#readLine` under the hood) is a blocking I/O operation which cannot be interrupted, this operation does not end when `Thread#interrupt` is called. This leads to the Stateful Set monitor thread out-living the `ClusterTopologyIntentTracker`, even if the StsMonitor is interrupted first. As a result, during shutdown, it is possible for the StsMonitor to send data to the intent tracker after it has been destroyed and its executor is no longer accepting tasks. **The Root of the Problem** To reach our goal of ensuring that the Stateful Set monitor thread can no longer send requests to the `ClusterTopologyIntentTracker`, we need to add synchronization between the two objects that guarantees the intent tracker shuts down after the StsMonitor thread has completed. This can be achieved using a simple `CountDownLatch` which is counted down after the thread has completed, and awaited before shutting down the tracker. The main obstacle to achieving this is, as mentioned above, that the StsMonitor thread cannot be interrupted when waiting for `WatchResponse#readLine` to complete, and so the thread never completes. The only way this thread can complete is to either force its termination, or alter the message reading approach to allow interruption as intended. **Identifying Resolution Paths** We don't want to force termination of our Stateful Set monitor thread as this could result in message handling being terminated after it has been received, but not before it has finished being processed. Therefore the only way we can allow this thread to be interrupted as intended is to alter the message reading mechanics in a way that allows it to be interrupted as well. There is no way for us to know if more messages are pending from the k8s watch besides waiting for data to be received, so the best we can do is allow the StsMonitor to finish processing any messages it has already received (preventing process corruption), but terminate the stream of new messages it is waiting for before we shutdown the intent tracker. **Potential Solutions** So we've identified the root of the problem as our `#readLine` function blocking through interrupts, so how do we make it interruptible? Sadly one of the shortcomings of I/O operations in Java is that they usually cannot be interrupted in the traditional manner, so we have a few approaches to consider: 1) We could modify the message reading code to use `BufferedReader#ready` and `Thread#sleep` to periodically check if there is data to be read before calling any read functions. The problem with this approach is that A) `#ready` returns true if any data is available, not just if there is a full line of data to be read; and B) utilizing a sleep loop can result in delayed message handling at the least, or busy-waiting overhead at worst. 2) We could use "hacky" solutions to interrupt the `BufferedReader#readLine` such as closing underlying sockets or connections, propagating an exception to the reader. The problem with this solution is that everything related to our reading operation is handled in `syncrhonized` blocks, and since our shutdown process starts outside the StsMonitor thread, our calling thread is unable to obtain these locks (being held by the StsMonitor)! 3) It's possible that we could rewrite the `WatchResponse` mechanics to use Java NIO magic such as `Selector` for interruptible I/O operations. The issue with this approach is that it would require fairly significant refactoring of the related codebase, and may not end up providing the exact functionality we are looking for in our use case. 4) We can introduce an `Executor` to handle our I/O operations within the StsMonitor thread, allowing us to wait on a `Future#get` call instead of our `BufferedReader#readLine` call, where a `Future#get` operation can be interrupted by the waiting thread being interrupted. The downside to this solution is we have to introduce an additional thread on top of the existing StsMonitor thread itself. **Selecting a Solution** Considering the above information, I concluded the most sensible approach was to use (4) and introduce an `Executor` thread for the I/O operations. By using a separate thread for this call we can be rougher with it, as we know that worse case scenario we interrupt a message being read that has not started being processed yet (but we're shutting down anyway). This solution also allows for the least amount of underlying code changes, as our `Future#get` can be interrupted without issue, maintaining the current approach used for handling the StsMonitor shutdown. The only downside for this approach is the addition of another thread alongside the StsMonitor thread, but the actual impact of this should be minimal as both threads will still be waiting most of the time, so the only negative impact is being 1 tiny step closer to possible thread starvation. Generally I think this is the best solution at hand which allows quick shutdown of the StsMonitor thread while minimising potential for data loss or corruption. Combined with the `CountDownLatch` used, this allows for consistent service shutdown order between the `StsMonitor` thread and the `ClusterTopologyIntentTracker`.
During the investigation of - and solution generation for - #24613, it was noted that Java 11 provides the new `HttpClient` that features interruptible connections - the original solution was kept in place for 5.3, but with 5.4 we have now dropped JDK 8 support, so we can introduce this JDK 11 feature 🎉 This PR does exactly that, introducing `HttpClient` within the `RestClient` implementation we maintain as a Proof of Concept - this allows connections to be interrupted, as demonstrated in the `KubernetesClient` implementation, which is where the majority of focus for this PoC has been. Changes are generally fairly minimal outside of the `RestClient` itself and `KubernetesClient` implementation, with the main areas requiring changes being those that used `#withCaCertificates()` (now replaced by a separate constructor for clients using SSL), and timeout changes (connection/read timeouts are now unified). This PR now also includes a regression test for the original issue that was encountered, ensuring the `HttpClient` solution still resolves the issue.
…#25654) During the investigation of - and solution generation for - hazelcast#24613, it was noted that Java 11 provides the new `HttpClient` that features interruptible connections - the original solution was kept in place for 5.3, but with 5.4 we have now dropped JDK 8 support, so we can introduce this JDK 11 feature 🎉 This PR does exactly that, introducing `HttpClient` within the `RestClient` implementation we maintain as a Proof of Concept - this allows connections to be interrupted, as demonstrated in the `KubernetesClient` implementation, which is where the majority of focus for this PoC has been. Changes are generally fairly minimal outside of the `RestClient` itself and `KubernetesClient` implementation, with the main areas requiring changes being those that used `#withCaCertificates()` (now replaced by a separate constructor for clients using SSL), and timeout changes (connection/read timeouts are now unified). This PR now also includes a regression test for the original issue that was encountered, ensuring the `HttpClient` solution still resolves the issue.
The overall goal of this change is to change the shutdown behaviour of KubernetesClient so our Stateful Set monitor thread shuts down before our
ClusterTopologyIntentTracker
, to allow the intent tracker to fully process all completed messages before Node shutdown.The Current Problem
In its current state, the Stateful Set monitor thread is intended to shutdown after
Thread#interrupt
is called, triggering theThread#interrupted
check within the mainwhile(running)
loop of the Runnable. However, this check is not reached as the call toWatchResponse#readLine
from within the mainrun()
method is a blocking call that waits until data is available to read before proceeding. Since this call waits for non-null data before completing, the result is always non-null, and therefore this code block never exits under normal conditions:Since this
while
loop cannot exit, and the#readLine
method (which passes toBufferedReader#readLine
under the hood) is a blocking I/O operation which cannot be interrupted, this operation does not end whenThread#interrupt
is called. This leads to the Stateful Set monitor thread out-living theClusterTopologyIntentTracker
, even if the StsMonitor is interrupted first. As a result, during shutdown, it is possible for the StsMonitor to send data to the intent tracker after it has been destroyed and its executor is no longer accepting tasks.The Root of the Problem
To reach our goal of ensuring that the Stateful Set monitor thread can no longer send requests to the
ClusterTopologyIntentTracker
, we need to add synchronization between the two objects that guarantees the intent tracker shuts down after the StsMonitor thread has completed. This can be achieved using a simpleCountDownLatch
which is counted down after the thread has completed, and awaited before shutting down the tracker.The main obstacle to achieving this is, as mentioned above, that the StsMonitor thread cannot be interrupted when waiting for
WatchResponse#readLine
to complete, and so the thread never completes. The only way this thread can complete is to either force its termination, or alter the message reading approach to allow interruption as intended.Identifying Resolution Paths
We don't want to force termination of our Stateful Set monitor thread as this could result in message handling being terminated after it has been received, but not before it has finished being processed. Therefore the only way we can allow this thread to be interrupted as intended is to alter the message reading mechanics in a way that allows it to be interrupted as well.
There is no way for us to know if more messages are pending from the k8s watch besides waiting for data to be received, so the best we can do is allow the StsMonitor to finish processing any messages it has already received (preventing process corruption), but terminate the stream of new messages it is waiting for before we shutdown the intent tracker.
Potential Solutions
So we've identified the root of the problem as our
#readLine
function blocking through interrupts, so how do we make it interruptible? Sadly one of the shortcomings of I/O operations in Java is that they usually cannot be interrupted in the traditional manner, so we have a few approaches to consider:We could modify the message reading code to use
BufferedReader#ready
andThread#sleep
to periodically check if there is data to be read before calling any read functions. The problem with this approach is that A)#ready
returns true if any data is available, not just if there is a full line of data to be read; and B) utilizing a sleep loop can result in delayed message handling at the least, or busy-waiting overhead at worst.We could use "hacky" solutions to interrupt the
BufferedReader#readLine
such as closing underlying sockets or connections, propagating an exception to the reader. The problem with this solution is that everything related to our reading operation is handled insyncrhonized
blocks, and since our shutdown process starts outside the StsMonitor thread, our calling thread is unable to obtain these locks (being held by the StsMonitor)!It's possible that we could rewrite the
WatchResponse
mechanics to use Java NIO magic such asSelector
for interruptible I/O operations. The issue with this approach is that it would require fairly significant refactoring of the related codebase, and may not end up providing the exact functionality we are looking for in our use case.We can introduce an
Executor
to handle our I/O operations within the StsMonitor thread, allowing us to wait on aFuture#get
call instead of ourBufferedReader#readLine
call, where aFuture#get
operation can be interrupted by the waiting thread being interrupted. The downside to this solution is we have to introduce an additional thread on top of the existing StsMonitor thread itself.Selecting a Solution
Considering the above information, I concluded the most sensible approach was to use (4) and introduce an
Executor
thread for the I/O operations. By using a separate thread for this call we can be rougher with it, as we know that worse case scenario we interrupt a message being read that has not started being processed yet (but we're shutting down anyway).This solution also allows for the least amount of underlying code changes, as our
Future#get
can be interrupted without issue, maintaining the current approach used for handling the StsMonitor shutdown. The only downside for this approach is the addition of another thread alongside the StsMonitor thread, but the actual impact of this should be minimal as both threads will still be waiting most of the time, so the only negative impact is being 1 tiny step closer to possible thread starvation.Generally I think this is the best solution at hand which allows quick shutdown of the StsMonitor thread while minimising potential for data loss or corruption. Combined with the
CountDownLatch
used, this allows for consistent service shutdown order between theStsMonitor
thread and theClusterTopologyIntentTracker
.