Graceful shutdown improvements #7989

Merged
merged 5 commits into hazelcast:master from mdogan:graceful-shutdown-pr Apr 26, 2016


6 participants

@mdogan
Member
mdogan commented Apr 20, 2016

The goal is to ensure data safety while shutting down multiple nodes concurrently. Data safety is guaranteed when nodes are shut down gracefully, even when no backups are configured.
During the shutdown process, all data (partition replicas) owned by the shutting-down member is migrated to other alive nodes in the cluster. A member is only allowed to shut down when it has no remaining ownership assignments in the partition table.

For more info see: https://hazelcast.atlassian.net/wiki/pages/viewpage.action?spaceKey=EN&title=Graceful+Shutdown+Improvements+Design
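
For context, here is a minimal user-level sketch of the behavior this guarantees (my illustration, not part of the PR; it assumes the standard Hazelcast 3.x public API and a map with zero backups):

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class GracefulShutdownExample {
    public static void main(String[] args) throws InterruptedException {
        // No backups at all, so the only copy of each entry lives on its partition owner.
        Config config = new Config();
        config.getMapConfig("default").setBackupCount(0).setAsyncBackupCount(0);

        HazelcastInstance hz1 = Hazelcast.newHazelcastInstance(config);
        HazelcastInstance hz2 = Hazelcast.newHazelcastInstance(config);
        HazelcastInstance hz3 = Hazelcast.newHazelcastInstance(config);

        hz1.getMap("default").put("key", "value");

        // shutdown() is the graceful path: before leaving, each member migrates the partition
        // replicas it owns to the members that stay. terminate() skips this and could lose
        // the only copy of the data in a zero-backup setup.
        Thread t1 = new Thread(hz1.getLifecycleService()::shutdown);
        Thread t2 = new Thread(hz2.getLifecycleService()::shutdown);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // The remaining member still holds the entry.
        System.out.println(hz3.getMap("default").get("key"));
        hz3.getLifecycleService().shutdown();
    }
}
```

Even when two members shut down concurrently, each one is held back until its replicas have been handed over, which is what the rest of this PR implements.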

@mdogan mdogan added this to the 3.7 milestone Apr 20, 2016
metanet and others added some commits Apr 7, 2016
@metanet @mdogan metanet Start graceful shutdown impl
- Simplified PartitionStateGenerator interface
	* Merged `initialize` and `reArrange` methods into a single `arrange`.
- Introduce ShutdownRequest operation and shutdown requested node state in MigrationManager
- Rollback successfully completed migration if destination is requested to shutdown
- Exclude shutdown requested members during repartitioning
- Check shutdown requested nodes after migrations are completed
- Apply AllowedDuringPassiveState interface only for replica sync and partition table operations
	* Migrations are not allowed while the node is in PASSIVE state; only replications and partition table operations are allowed
070dd37
@mdogan mdogan Implement safe shutdown await and response mechanisms
- Allow shutdown response operation when node is passive
- Shutdown request & response should not return response
- Lite member should skip safe shutdown procedure
- Allow migration operations when node is passive, except migration-commit
- Master should directly call onShutdownRequest() instead of sending op
- Apply graceful shutdown fixes
	* Send shutdown operation immediately if partition table is not initialized
	* Fix CheckShutdownRequestsTask scheduling in RepartitioningTask
- MigrationCommitOperation should check node state explicitly
- Migration planner logging improvements
- PartitionStateGenerator should fill null gaps in partition table by
swapping cold replicas with hotter ones
- Add graceful shutdown tests
5bf9f55
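
A rough sketch of the request-and-await flow these commits describe, with hypothetical names throughout (this is not the actual Hazelcast internal API): the leaving member asks the master to migrate its replicas away and completes shutdown only once the partition table assigns it nothing.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical names throughout; not the real Hazelcast internals.
interface PartitionTableView {
    int ownedReplicaCount(String memberUuid);
}

interface MasterClient {
    void sendShutdownRequest(String memberUuid);   // asks the master to repartition without this member
    PartitionTableView latestPartitionTable();
}

final class SafeShutdownAwaitSketch {
    // The leaving member requests shutdown and then waits until the partition table
    // no longer assigns it any ownership, which is the safety condition described above.
    static void awaitSafeToShutdown(MasterClient master, String memberUuid, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        master.sendShutdownRequest(memberUuid);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (master.latestPartitionTable().ownedReplicaCount(memberUuid) == 0) {
                return;   // no remaining ownership assignments -> safe to complete shutdown
            }
            TimeUnit.MILLISECONDS.sleep(100);
        }
        throw new TimeoutException("replicas could not be migrated away before the timeout");
    }
}
```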
@metanet @mdogan metanet Improve & simplify SHIFT DOWN & UP migrations
- Improve SHIFT UP migration decisions when source is not null
- A SHIFT DOWN migration is not needed if the source's new replica index is already owned by another node. In that case we can perform 2 MOVE migrations instead of a single SHIFT DOWN without breaking safety; SHIFT DOWN is only needed if the source's new index was not owned by any node before (see the sketch after this commit).
- Remove unused `MigrationAwareService.clearPartitionReplica()`
83779c2
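
The SHIFT DOWN versus two-MOVE decision above can be illustrated with a small sketch (hypothetical types and method, not the actual MigrationPlanner):

```java
import java.util.Arrays;

// Hypothetical sketch of the migration-type decision described above; not the actual planner.
// replicas[i] is the member owning replica index i of a partition, or null if the slot is empty.
final class MigrationDecisionSketch {

    enum MigrationType { SHIFT_DOWN, TWO_MOVES }

    static MigrationType decide(String[] replicas, int sourceNewIndex) {
        if (replicas[sourceNewIndex] == null) {
            // Nobody owns the source's new index yet: a single SHIFT DOWN keeps that replica alive.
            return MigrationType.SHIFT_DOWN;
        }
        // The new index already has an owner, so no data is lost if we simply plan
        // two independent MOVE migrations instead of one SHIFT DOWN.
        return MigrationType.TWO_MOVES;
    }

    public static void main(String[] args) {
        String[] replicas = {"A", "B", null};
        System.out.println(Arrays.toString(replicas) + " -> " + decide(replicas, 2)); // SHIFT_DOWN
        replicas[2] = "C";
        System.out.println(Arrays.toString(replicas) + " -> " + decide(replicas, 2)); // TWO_MOVES
    }
}
```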
@metanet @mdogan metanet Node running and state check improvements
- Check if node has shut down before sending operation failure response
	* This check should be done with the node state instead of the isRunning() method, since nodes now have more work to do (migrations) during shutdown.
- Invocation.engineActive() and MockConnectionManager should check node-state explicitly
- Check if master is running during migration commit if master is the destination
- Master should allow direct shutdown when cluster is FROZEN or PASSIVE
- Remove locking in partition state publishing methods
- Improved & added graceful shutdown tests
ea788b6
@mdogan mdogan Implement PartitionService safety checking and enforcing
- `prepareToSafeShutdown()` should return immediately if node is not joined
- FetchMostRecentPartitionTableTask shouldn't increment version when partition table is not initialized
- `hasMissingReplicaOwners()` should check for unknown members in ptable
- MockConnectionManager.getOrConnect() should not create new connection after it's stopped/shutdown.
- Change migration result assertion in MigrationCommitServiceTest
- Add more graceful shutdown tests, including cluster states
e8a32f0
@eminn
Collaborator
eminn commented Apr 22, 2016

The changes comply with the design document and explanatory commit messages are in place; this PR looks good to be merged 👍

@enesakar enesakar commented on the diff Apr 25, 2016
...cast/partition/AbstractPartitionLostListenerTest.java
@@ -50,10 +57,33 @@ public void tearDown() {
hazelcastInstanceFactory.terminateAll();
}
- final protected void terminateInstances(List<HazelcastInstance> terminatingInstances) {
- for (HazelcastInstance instance : terminatingInstances) {
- instance.getLifecycleService().terminate();
+ final protected void stopInstances(List<HazelcastInstance> terminatingInstances, final NodeLeaveType nodeLeaveType) {
@enesakar
enesakar Apr 25, 2016 Member

minor one: I would prefer to rename terminatingInstances to stoppingInstances.

@jerrinot
jerrinot Apr 26, 2016 Contributor

👍

@gurbuzali
Member

👍

@jerrinot jerrinot commented on the diff Apr 26, 2016
...ternal/partition/GracefulShutdownCorrectnessTest.java
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+
+import static org.hamcrest.Matchers.greaterThanOrEqualTo;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertTrue;
+
+@RunWith(Parameterized.class)
+@Parameterized.UseParametersRunnerFactory(HazelcastParametersRunnerFactory.class)
+@Category({QuickTest.class, ParallelTest.class})
+public class GracefulShutdownCorrectnessTest extends PartitionCorrectnessTestSupport {
@jerrinot
jerrinot Apr 26, 2016 Contributor

this is running 6 minutes on my box. do we want to run it with each merge build?

@mdogan
mdogan Apr 26, 2016 Member

We can cut its time down later, either by reducing parameters or partition count or both. For now we need to ensure it's passing on each build to be able to detect any possible hidden issue.
Also, if we convert this to a slow test now, it will run much longer because the nightly build doesn't run tests in parallel.

@jerrinot
jerrinot Apr 26, 2016 Contributor

does it matter if a nightly build is longer?
cc @hasancelik

@mdogan
mdogan Apr 26, 2016 Member

Doesn't it?

@jerrinot
jerrinot Apr 26, 2016 Contributor

If I have to choose between e.g. 1 extra minute in a merge build or 10 minutes in a nightly then I'll pick the 2nd option. It's not on the critical path during development.

@mdogan
mdogan Apr 26, 2016 Member

Then we would have to convert nearly all new migration tests to nightly too. Most of them take more than a minute, maybe more than a few. Are you ok with this?

@jerrinot
jerrinot Apr 26, 2016 Contributor

A good balance is needed. Obviously the definition of "good balance" is open to interpretation :)

Also, it's a question what the impact of a 5-minute parallel test is on merge build duration. Chances are it won't be that bad, thanks to JVM-level parallelization.

@jerrinot
jerrinot Apr 26, 2016 Contributor

For the record: I don't consider this a blocker for this PR. We can move the tests to a nightly category later.

@mdogan
mdogan Apr 26, 2016 edited Member

Before merge: https://hazelcast-l337.ci.cloudbees.com/job/new-lab-fast-pr/2743/consoleFull

11:10:02 [INFO] ------------------------------------------------------------------------
11:10:02 [INFO] Reactor Summary:
11:10:02 [INFO] 
11:10:02 [INFO] Hazelcast Root .................................... SUCCESS [0.510s]
11:10:02 [INFO] hazelcast ......................................... SUCCESS [16:58.865s]
11:10:02 [INFO] hazelcast-client .................................. SUCCESS [5:45.603s]
11:10:02 [INFO] hazelcast-spring .................................. SUCCESS [59.595s]
11:10:02 [INFO] hazelcast-build-utils ............................. SUCCESS [0.681s]
11:10:02 [INFO] ------------------------------------------------------------------------

After merge: https://hazelcast-l337.ci.cloudbees.com/job/new-lab-fast-pr/2749/consoleFull

12:41:22 [INFO] ------------------------------------------------------------------------
12:41:22 [INFO] Reactor Summary:
12:41:22 [INFO] 
12:41:22 [INFO] Hazelcast Root .................................... SUCCESS [0.488s]
12:41:22 [INFO] hazelcast ......................................... SUCCESS [18:49.985s]
12:41:22 [INFO] hazelcast-client .................................. SUCCESS [7:56.258s]
12:41:22 [INFO] hazelcast-spring .................................. SUCCESS [1:04.291s]
12:41:22 [INFO] hazelcast-build-utils ............................. SUCCESS [0.681s]
12:41:22 [INFO] ------------------------------------------------------------------------

The effect on the core module is about 2 minutes. Client module build time seems to have increased by about 2 minutes too; I guess the reason is that client tests use hazelcast.shutdown().

@jerrinot
Contributor

run-lab-run

@enesakar enesakar commented on the diff Apr 26, 2016
...rnal/partition/impl/InternalPartitionServiceImpl.java
@@ -691,7 +689,62 @@ public InternalPartition getPartition(int partitionId, boolean triggerOwnerAssig
@Override
public boolean prepareToSafeShutdown(long timeout, TimeUnit unit) {
- return partitionReplicaStateChecker.prepareToSafeShutdown(timeout, unit);
+ if (!node.joined()) {
+ return true;
+ }
+
+ if (node.isLiteMember()) {
+ return true;
@enesakar
enesakar Apr 26, 2016 Member

How do we handle the ongoing tasks run by the executor service? Is that in the scope of the "fault tolerant executor service"?

@mdogan
mdogan Apr 26, 2016 Member

We only guarantee the safety of partitions (resources related to partitions) during shutdown. Services should handle the safety of other, non-partition resources by themselves.
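
To illustrate that division of responsibility, a hypothetical service-side sketch (not Hazelcast code): a component that owns non-partition resources, such as its own executor, drains them itself before the instance is shut down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: graceful shutdown only waits for partition replicas to be migrated,
// so a component owning non-partition resources (here, a plain executor) drains them itself.
final class ServiceSideDrainSketch {
    private final ExecutorService tasks = Executors.newFixedThreadPool(4);

    void submitWork(Runnable work) {
        tasks.execute(work);
    }

    void drainBeforeShutdown() throws InterruptedException {
        tasks.shutdown();                                    // stop accepting new work
        if (!tasks.awaitTermination(30, TimeUnit.SECONDS)) { // wait for in-flight tasks
            tasks.shutdownNow();                             // give up and interrupt stragglers
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ServiceSideDrainSketch service = new ServiceSideDrainSketch();
        service.submitWork(() -> System.out.println("task done"));
        service.drainBeforeShutdown();   // drain own resources first...
        // ...then trigger the partition-safe shutdown of the Hazelcast instance.
    }
}
```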

@metanet
Contributor
metanet commented Apr 26, 2016

Thanks for the party guys. Merging this one

@metanet metanet merged commit c6bc1c4 into hazelcast:master Apr 26, 2016

1 check passed

default 9285 tests run, 35 skipped, 0 failed.
@mdogan mdogan deleted the mdogan:graceful-shutdown-pr branch Apr 26, 2016
@enesakar enesakar commented on the diff Apr 26, 2016
...nal/partition/operation/MigrationCommitOperation.java
@@ -48,6 +50,11 @@ public MigrationCommitOperation(PartitionRuntimeState partitionState) {
@Override
public void run() {
+ NodeEngine nodeEngine = getNodeEngine();
+ if (!nodeEngine.isRunning()) {
@enesakar
enesakar Apr 26, 2016 Member

I see usages with nodeEngine.getNode().getState() == NodeState.SHUT_DOWN. Are they equivalent, nodeEngine.isRunning() and the node state?

@metanet
metanet Apr 26, 2016 Contributor

When a node starts the shutdown/termination process, its node state becomes PASSIVE. From that moment on, nodeEngine.isRunning() always returns false. The node state becomes SHUT_DOWN after the shutdown process is completed.
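
Restating that relationship as a simplified sketch (illustrative names, not the real Node/NodeEngine classes):

```java
// Illustrative sketch of the relationship described above; simplified names only.
enum NodeState { ACTIVE, PASSIVE, SHUT_DOWN }

final class NodeStateSketch {
    private volatile NodeState state = NodeState.ACTIVE;

    // isRunning() goes false as soon as shutdown starts (state becomes PASSIVE), which is why
    // operations that must still run during shutdown (migrations, replica sync) check the
    // concrete node state instead of isRunning().
    boolean isRunning() {
        return state == NodeState.ACTIVE;
    }

    void startShutdown() {
        state = NodeState.PASSIVE;     // shutdown/termination in progress
    }

    void completeShutdown() {
        state = NodeState.SHUT_DOWN;   // shutdown finished
    }

    public static void main(String[] args) {
        NodeStateSketch node = new NodeStateSketch();
        node.startShutdown();
        System.out.println(node.isRunning());                  // false, but not SHUT_DOWN yet
        System.out.println(node.state == NodeState.SHUT_DOWN); // false
        node.completeShutdown();
        System.out.println(node.state == NodeState.SHUT_DOWN); // true
    }
}
```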
