Improvements for running Hazelcast persistence on kubernetes #21844
hazelcast/src/main/java/com/hazelcast/spi/properties/ClusterProperty.java
hazelcast/src/main/java/com/hazelcast/kubernetes/KubernetesClient.java
 * Since a watch implies a stream of updates from the server will be consumed, unlike other methods
 * in this class, it is the responsibility of the consumer to disconnect the connection
 * (by invoking {@link WatchResponse#disconnect()}) once the watch is no longer required.
 */
The following question is not related to this PR 🙂
What if we wanted to convert the existing discovery mechanism into a more dynamic mechanism like this one? In the current version, the newest member discovers the existing ones by running the related REST call-based methods.
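To make the consumer-side responsibility described in the Javadoc above concrete, here is a minimal sketch. `FakeWatchResponse`, `nextUpdate()` and `consume()` are hypothetical stand-ins invented for illustration, not the classes or methods from this PR; only the idea of calling `disconnect()` once the watch is no longer required comes from the Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class WatchSketch {

    /** Hypothetical stand-in for a watch response; not the PR's WatchResponse class. */
    static final class FakeWatchResponse {
        private final Iterator<String> updates;
        private boolean disconnected;

        FakeWatchResponse(List<String> updates) {
            this.updates = updates.iterator();
        }

        /** Returns the next update, or null once the stream has ended. */
        String nextUpdate() {
            return updates.hasNext() ? updates.next() : null;
        }

        /** Would release the underlying connection; here it only flips a flag. */
        void disconnect() {
            disconnected = true;
        }

        boolean isDisconnected() {
            return disconnected;
        }
    }

    /** Consumes updates until the stream ends and always disconnects afterwards. */
    static List<String> consume(FakeWatchResponse watch) {
        List<String> seen = new ArrayList<>();
        try {
            String update;
            while ((update = watch.nextUpdate()) != null) {
                seen.add(update);
            }
        } finally {
            // The consumer, not the client, owns the connection lifecycle.
            watch.disconnect();
        }
        return seen;
    }

    public static void main(String[] args) {
        FakeWatchResponse watch =
                new FakeWatchResponse(Arrays.asList("ADDED member-0", "MODIFIED member-0"));
        System.out.println(consume(watch));
        System.out.println(watch.isDisconnected());
    }
}
```

Putting the disconnect in a `finally` block means the connection is released even if processing an update throws.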
hazelcast/src/main/java/com/hazelcast/internal/ascii/rest/HttpGetCommandProcessor.java
hazelcast/src/main/java/com/hazelcast/instance/impl/ClusterTopologyIntentTracker.java
hazelcast/src/main/java/com/hazelcast/internal/cluster/impl/ClusterStateManager.java
if (getNodeEngine().isStartCompleted()) {
    initializeIndexes();
} else {
    initializeLocalIndexes();
Why does this fix HZ-1192? Or, to be more exact, why does the cluster-wide add-index fail during recovery while the local add-index does not?
The cluster-wide index addition clashes with operation execution restrictions during recovery.
I think it is wrong to perform index initialization cluster-wide in MapProxySupport#initialize anyway, and I would remove the initializeIndexes call altogether. We should only concern ourselves with locally owned partitions in proxy initialization.
@ahmetmircik wdyt?
Not sure what the reason was for creating indexes before start is completed.
Isn't it an option to throw an exception if start is not completed yet?
Had a look at this again; making this local-only seems like a behavior change. With this change, instead of relying on operation system guarantees, proxies on remote nodes will be created relying on eventing system guarantees. This can introduce unexpected changes in effective behavior when the eventing system is busy and drops events.
As discussed with Ahmet, I will prepare a separate PR for the HZ-1192 fix, so it is easier to track and revert this commit before we merge this PR. For now, I am leaving the commit in to facilitate testing.
Extracted in #22485. I am still keeping the commit as part of this PR, as there is still some testing ongoing with those branches. Will revert it before merge.
hazelcast/src/main/java/com/hazelcast/spi/impl/operationservice/Operation.java
hazelcast/src/main/java/com/hazelcast/internal/services/PreJoinAwareService.java
I have a few minor comments left (basically what is still unresolved), but they are not blockers for merge. Two important-ish TODOs before merge could be:
- Update ClusterTopologyIntentTracker Javadoc.
- Separate 1192 changes from this PR.
@vbekiaris thank you for your efforts on this huge endeavour.
hazelcast/src/main/java/com/hazelcast/instance/impl/ClusterTopologyIntentTracker.java
Job log:
--------------------------
---------SUMMARY----------
--------------------------
[ERROR] COMPILATION ERROR :
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler_2/hazelcast-enterprise/hazelcast-enterprise/src/main/java/com/hazelcast/internal/hotrestart/HotRestartIntegrationService.java:[100,7] error: HotRestartIntegrationService is not abstract and does not override abstract method setClusterTopologyIntentOnMaster(ClusterTopologyIntent) in InternalHotRestartService
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hazelcast-enterprise: Compilation failure
--------------------------
---------ERRORS-----------
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler_2/hazelcast-enterprise/hazelcast-enterprise/src/main/java/com/hazelcast/internal/hotrestart/HotRestartIntegrationService.java:[100,7] error: HotRestartIntegrationService is not abstract and does not override abstract method setClusterTopologyIntentOnMaster(ClusterTopologyIntent) in InternalHotRestartService
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler_2/hazelcast-enterprise/hazelcast-enterprise/src/main/java/com/hazelcast/internal/hotrestart/HotRestartIntegrationService.java:[100,7] error: HotRestartIntegrationService is not abstract and does not override abstract method setClusterTopologyIntentOnMaster(ClusterTopologyIntent) in InternalHotRestartService
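For readers unfamiliar with this class of compiler error, the sketch below reproduces the general situation with hypothetical names: an interface gains a new abstract method, and every concrete implementer must override it before it compiles again. `TopologyIntent`, `RestartService` and `RecordingRestartService` are invented for illustration and do not mirror the real `InternalHotRestartService` or `HotRestartIntegrationService` code.

```java
public class AbstractMethodSketch {

    /** Hypothetical stand-in for the intent type passed to the new method. */
    enum TopologyIntent { STABLE, SCALING }

    /** Hypothetical service interface that has just gained a new abstract method. */
    interface RestartService {
        void setTopologyIntentOnMaster(TopologyIntent intent);
    }

    /**
     * Concrete implementer. Without the override below, javac would reject this
     * class as "not abstract and does not override abstract method".
     */
    static final class RecordingRestartService implements RestartService {
        private TopologyIntent lastIntent;

        @Override
        public void setTopologyIntentOnMaster(TopologyIntent intent) {
            // Assumption for the sketch: simply remember the latest intent.
            this.lastIntent = intent;
        }

        TopologyIntent lastIntent() {
            return lastIntent;
        }
    }

    public static void main(String[] args) {
        RecordingRestartService service = new RecordingRestartService();
        service.setTopologyIntentOnMaster(TopologyIntent.SCALING);
        System.out.println(service.lastIntent());
    }
}
```

When the interface and its implementer live in different repositories, as with the EE counterpart referenced in this PR, the implementer's build fails until the matching change is present there as well.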
Thanks for your comments & reviews, they made this PR much better.
…st#21844)
- Adds automated cluster state management for persistence on kubernetes
- Supports cluster-wide shutdown, rolling restart and partial member recovery from failure on kubernetes [HZ-1190] [HZ-1191] [HZ-1193]
- Fixes behaviour of readiness probe with persistence enabled [HZ-1349]
- Allows tuning either for speedy crash recovery with FROZEN state or availability of in-memory data structures with NO_MIGRATION state for missing members [HZ-1311]
- Fixes backup sync after single member crash recovery [HZ-1349]
Design document in EE side: https://github.com/vbekiaris/hazelcast-enterprise/blob/enhancements/5.2/k8s-persistence/docs/design/persistence/04-persistence-kubernetes-improvements.md
(cherry picked from commit 1ddc16e)
…22501)
- Adds automated cluster state management for persistence on kubernetes
- Supports cluster-wide shutdown, rolling restart and partial member recovery from failure on kubernetes [HZ-1190] [HZ-1191] [HZ-1193]
- Fixes behaviour of readiness probe with persistence enabled [HZ-1349]
- Allows tuning either for speedy crash recovery with FROZEN state or availability of in-memory data structures with NO_MIGRATION state for missing members [HZ-1311]
- Fixes backup sync after single member crash recovery [HZ-1349]
Design document in EE side: https://github.com/vbekiaris/hazelcast-enterprise/blob/enhancements/5.2/k8s-persistence/docs/design/persistence/04-persistence-kubernetes-improvements.md
(cherry picked from commit 1ddc16e)
1:1 clean backport of #21844 to 5.2.0 release branch
Also includes backport of #22512
Co-authored-by: Łukasz Dziedziul <lukasz.dziedziul@hazelcast.com>
…22502)
- Adds automated cluster state management for persistence on kubernetes
- Supports cluster-wide shutdown, rolling restart and partial member recovery from failure on kubernetes [HZ-1190] [HZ-1191] [HZ-1193]
- Fixes behaviour of readiness probe with persistence enabled [HZ-1349]
- Allows tuning either for speedy crash recovery with FROZEN state or availability of in-memory data structures with NO_MIGRATION state for missing members [HZ-1311]
- Fixes backup sync after single member crash recovery [HZ-1349]
Design document in EE side: https://github.com/vbekiaris/hazelcast-enterprise/blob/enhancements/5.2/k8s-persistence/docs/design/persistence/04-persistence-kubernetes-improvements.md
(cherry picked from commit 1ddc16e)
1:1 clean backport from #21844
Also includes backport of #22512
Co-authored-by: Łukasz Dziedziul <lukasz.dziedziul@hazelcast.com>
Design document in EE side PR: https://github.com/vbekiaris/hazelcast-enterprise/blob/enhancements/5.2/k8s-persistence/docs/design/persistence/04-persistence-kubernetes-improvements.md
EE counterpart: https://github.com/hazelcast/hazelcast-enterprise/pull/5140
Best reviewed commit-by-commit
OperationRunnerImpl, MapProxySupport and making pre-join ops AllowedDuringPassiveState to 5.0.z and 5.1.z