ISPN-14782 Unable to reenable rebalance after cluster scale up #10834

Merged
merged 2 commits into from Apr 25, 2023

jabolina
Member

@jabolina jabolina commented Apr 18, 2023

https://issues.redhat.com/browse/ISPN-14782
https://issues.redhat.com/browse/ISPN-14793

We do not broadcast the topology as stable if it has not yet been restored.
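
As a rough illustration of the guard described above (not the code from this PR), the following sketch assumes a per-cache flag that records whether the topology has been restored, and models a topology only by its id. The names `StableTopologyGuard`, `markRestored`, and `tryBroadcastStable` are hypothetical and do not come from Infinispan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Infinispan.
public class StableTopologyGuard {
    // Assumed per-cache flag: set once the topology has been restored.
    private boolean restored;
    // Records the topology ids that were "broadcast" as stable.
    private final List<Integer> broadcasts = new ArrayList<>();

    public void markRestored() {
        restored = true;
    }

    // Broadcasts the topology id as stable only if the topology was restored.
    // Returns true when the broadcast happened, false when it was skipped.
    public boolean tryBroadcastStable(int topologyId) {
        if (!restored) {
            return false; // not yet restored: do not announce it as stable
        }
        broadcasts.add(topologyId);
        return true;
    }

    public List<Integer> broadcasts() {
        return broadcasts;
    }
}
```

In this sketch, a call to `tryBroadcastStable` before `markRestored` is a no-op, so other nodes never see a stable topology that was not restored.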

@jabolina jabolina added the Image Required Set this label in order for a server image to be built with the PR changes and pushed to quay.io label Apr 18, 2023
@infinispanrelease

Image pushed for Jenkins build #1:

quay.io/infinispan-test/server:PR-10834

@jabolina
Member Author

Replayed CI.

@infinispanrelease

Image pushed for Jenkins build #2:

quay.io/infinispan-test/server:PR-10834

@infinispanrelease

Image pushed for Jenkins build #3:

quay.io/infinispan-test/server:PR-10834

@infinispanrelease

Image pushed for Jenkins build #4:

quay.io/infinispan-test/server:PR-10834

@jabolina
Member Author

Added a commit for https://issues.redhat.com/browse/ISPN-14793

This slightly changes how the nodes handle the CacheStatusRequest, which the coordinator sends after a view MERGE. If a node has a cache that needs to be recreated from the persistent state, it sends a join request to the coordinator. The coordinator handles this request only after it has received the states from all the nodes. If we are partitioned again in the meantime, the current join mechanism should kick in.
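
The flow described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the Infinispan implementation: `StatusRequestFlow`, `Coordinator`, `onJoinRequest`, `onStatusResponse`, and `onStatusRequest` are hypothetical names, nodes and caches are plain strings, and the repartition fallback is not modeled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the flow; not the Infinispan implementation.
public class StatusRequestFlow {

    // Coordinator side: join requests are deferred until every expected
    // member has reported its state.
    public static class Coordinator {
        private final Set<String> pendingMembers;
        private final List<String> deferredJoins = new ArrayList<>();
        private final List<String> handledJoins = new ArrayList<>();

        public Coordinator(Set<String> expectedMembers) {
            this.pendingMembers = new HashSet<>(expectedMembers);
        }

        public void onJoinRequest(String node, String cache) {
            String join = node + ":" + cache;
            if (pendingMembers.isEmpty()) {
                handledJoins.add(join);  // all states are in: handle right away
            } else {
                deferredJoins.add(join); // still waiting for states: defer
            }
        }

        public void onStatusResponse(String node) {
            pendingMembers.remove(node);
            if (pendingMembers.isEmpty()) {
                // Every state was received: the deferred joins can be handled.
                handledJoins.addAll(deferredJoins);
                deferredJoins.clear();
            }
        }

        public List<String> handledJoins() {
            return handledJoins;
        }
    }

    // Node side: on a status request, send a join request for each cache that
    // must be recreated from the persistent state, then report the state.
    public static void onStatusRequest(String node, List<String> cachesNeedingRecovery,
                                       Coordinator coordinator) {
        for (String cache : cachesNeedingRecovery) {
            coordinator.onJoinRequest(node, cache);
        }
        coordinator.onStatusResponse(node);
    }
}
```

With two expected members, a join sent by the first node stays deferred until the second node has also answered the status request.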

Contributor

@ryanemerson ryanemerson left a comment

LGTM, just two minor points.

I've backported to speed things up while you're on PTO: #10849

I removed the System.out comment, but I haven't changed the AggregateCompletionStage usage.

@@ -295,6 +309,7 @@ public CompletionStage<ManagerStatusResponse> handleStatusRequest(int viewId) {
      // As long as we have an older view, we can still process topologies from the old coordinator
      return withView(viewId, getGlobalTimeout(), MILLISECONDS).thenApply(ignored -> {
         Map<String, CacheStatusResponse> caches = new HashMap<>();
         AggregateCompletionStage<Void> joins = CompletionStages.aggregateCompletionStage();
Contributor

Why do we need an Aggregate here? AFAICT we only depend on a single CompletionStage.

Member Author

Sorry, this was left over from the first solution. Updating to remove it.
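
To illustrate the point raised in this thread, here is a minimal sketch that uses only the JDK's `CompletableFuture`, not Infinispan's `AggregateCompletionStage`. The class and method names (`SingleVersusAggregate`, `afterSingle`, `afterAll`) are hypothetical. It contrasts chaining on a single stage with first combining several stages.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// JDK-only illustration; this does not use Infinispan's AggregateCompletionStage.
public class SingleVersusAggregate {

    // One dependency: chain on it directly, no aggregation step is needed.
    public static CompletionStage<String> afterSingle(CompletionStage<Void> join) {
        return join.thenApply(ignored -> "done");
    }

    // Several dependencies: combine them first, then continue once all complete.
    public static CompletionStage<String> afterAll(List<CompletableFuture<Void>> joins) {
        return CompletableFuture
              .allOf(joins.toArray(new CompletableFuture[0]))
              .thenApply(ignored -> "done");
    }
}
```

When only one stage is involved, the aggregating variant adds an extra object and an extra hop without changing the result, which is why the direct chain is usually preferable.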

@@ -535,6 +564,7 @@ private CompletionStage<Void> doHandleStableTopologyUpdate(String cacheName, Cac
         CacheTopology stableTopology = cacheStatus.getStableTopology();
         if (stableTopology == null || stableTopology.getTopologyId() < newStableTopology.getTopologyId()) {
            log.tracef("Updating stable topology for cache %s: %s", cacheName, newStableTopology);
            //System.out.printf("[%s] Updating stable topology for cache %s: %s\n", transport.getAddress(), cacheName, newStableTopology);
Contributor

Can be removed

* After receiving a CacheStatusRequest from the coordinator, the nodes
will send a join request for the caches which need to be recovered from
the persistent state.
@jabolina
Member Author

Updated with suggestions.

@infinispanrelease

Image pushed for Jenkins build #5:

quay.io/infinispan-test/server:PR-10834

@ryanemerson ryanemerson merged commit 9f35812 into infinispan:main Apr 25, 2023
2 of 4 checks passed
@jabolina jabolina deleted the ISPN-14782 branch April 25, 2023 11:45
Labels
Image Required Set this label in order for a server image to be built with the PR changes and pushed to quay.io
3 participants