Allow operations while node is shutting down #17028

Merged
mdogan merged 1 commit into hazelcast:master from node-shutdown-allow-mutations on Jun 4, 2020

@mdogan mdogan commented Jun 1, 2020

Currently, we reject all operations, except the ones marked as
`AllowedDuringPassiveState`, while a node is shutting down (`nodeState == PASSIVE`).
But this constraint does not provide any additional safety guarantees.

Four major tasks are executed during node shutdown:

  • Partition replicas owned by the shutting-down member are migrated to other members.
  • The latest un-replicated CRDT state is replicated to other members.
  • If the member is a CP member and persistence is not enabled, it removes itself from its CP groups.
  • CP sessions are closed.

All of these have their own mechanisms to provide safety.

With this change, we allow submitting and executing operations
while a node is shutting down. This is especially important when the node holds
a large amount of data (hundreds of GBs): graceful shutdown can then take more
than a few minutes, and rejecting or blocking operations on and from that node
during that period reduces the availability of the cluster.
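
As a rough illustration of the policy change described above, the sketch below models the admission check before and after this PR. It is not the Hazelcast implementation: the class `ShutdownGateSketch`, the `NodeState` enum, the `shuttingDown` flag, and both method names are hypothetical, and it assumes the node can tell a shutdown in progress apart from other reasons for being in the passive state.

```java
// Hypothetical, simplified model of the operation admission check.
// None of these names are taken from the Hazelcast code base.
final class ShutdownGateSketch {

    enum NodeState { ACTIVE, PASSIVE, SHUT_DOWN }

    // Old policy: in the passive state only operations marked as
    // allowed during passive state are accepted.
    static boolean allowedBefore(NodeState state, boolean allowedDuringPassiveState) {
        switch (state) {
            case ACTIVE:
                return true;
            case PASSIVE:
                return allowedDuringPassiveState;
            default:
                return false;
        }
    }

    // New policy: a passive node that is passive because it is shutting
    // down (assumed to be visible as a flag) accepts every operation.
    static boolean allowedAfter(NodeState state, boolean shuttingDown,
                                boolean allowedDuringPassiveState) {
        switch (state) {
            case ACTIVE:
                return true;
            case PASSIVE:
                return shuttingDown || allowedDuringPassiveState;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // An ordinary (unmarked) operation arriving at a node that is shutting down.
        System.out.println("before: " + allowedBefore(NodeState.PASSIVE, false));
        System.out.println("after:  " + allowedAfter(NodeState.PASSIVE, true, false));
    }
}
```

In this model a fully shut-down node still rejects everything; only the window during graceful shutdown changes.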

Fixes #16932

@mdogan mdogan added this to the 4.1 milestone Jun 1, 2020
@mdogan mdogan requested a review from a team as a code owner June 1, 2020 12:30
@mmedenjak mmedenjak added the Source: Internal PR or issue was opened by an employee label Jun 2, 2020
@mdogan mdogan merged commit 26253c4 into hazelcast:master Jun 4, 2020
@mdogan mdogan deleted the node-shutdown-allow-mutations branch June 4, 2020 13:57
Development

Successfully merging this pull request may close these issues.

HazelcastInstanceNotActiveException on grid shutdown after using CPSubsystem