[BACKPORT] Do not try to connect to the old member list after the cluster changes [API-2086] #24944

ihsandemir
Contributor

Backports #24745

…s [API-2038] (hazelcast#24745)

During failover, when the client is about to try connecting to the next
cluster, it resets some services, including the cluster service.

In its reset logic, the cluster service clears the member list version but
leaves the members as they are, because some services/proxies rely on
the member list never being empty.

This creates a problem for the failover case: after the cluster change,
the client still tries to connect to the last known member list, which
contains the members of the previous cluster.

To solve the problem, I have introduced a new method in the cluster
service that returns the "effective" member list. That is, it returns
an empty list after the service is reset, and the member list as
expected otherwise. With that, the connection logic simply skips the old
member list once the client decides to switch to another cluster.
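
The idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual Hazelcast client code: the class `ClusterServiceSketch`, its methods, the `INITIAL_VERSION` sentinel, and the `connectionCandidates` helper (including the notion of falling back to configured addresses) are all hypothetical names and assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names are illustrative and not the Hazelcast client API.
class ClusterServiceSketch {
    // Assumed sentinel: no member list has been received from the current cluster yet.
    private static final int INITIAL_VERSION = -1;

    private int memberListVersion = INITIAL_VERSION;
    private List<String> members = new ArrayList<>();

    // Called when the connected cluster publishes a member list.
    synchronized void handleMembersView(int version, List<String> newMembers) {
        memberListVersion = version;
        members = new ArrayList<>(newMembers);
    }

    // Called before the client tries the next cluster.
    // Only the version is cleared; the members stay, because other
    // callers rely on the member list never being empty.
    synchronized void reset() {
        memberListVersion = INITIAL_VERSION;
    }

    // Last known members, possibly from the previous cluster.
    synchronized List<String> getMemberList() {
        return Collections.unmodifiableList(new ArrayList<>(members));
    }

    // Empty after a reset, the regular member list otherwise.
    synchronized List<String> getEffectiveMemberList() {
        if (memberListVersion == INITIAL_VERSION) {
            return Collections.emptyList();
        }
        return getMemberList();
    }

    // Assumed connection logic: try effective members first, then configured addresses.
    static List<String> connectionCandidates(ClusterServiceSketch service,
                                             List<String> configuredAddresses) {
        List<String> candidates = new ArrayList<>(service.getEffectiveMemberList());
        for (String address : configuredAddresses) {
            if (!candidates.contains(address)) {
                candidates.add(address);
            }
        }
        return candidates;
    }
}
```

In this sketch, callers that need a non-empty list keep using `getMemberList()`, while the connection path uses `getEffectiveMemberList()` and therefore stops seeing the previous cluster's members as soon as `reset()` is called.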
@ihsandemir ihsandemir added this to the 5.2.z milestone Jul 4, 2023
@ihsandemir ihsandemir self-assigned this Jul 4, 2023
@ihsandemir ihsandemir requested a review from a team as a code owner July 4, 2023 11:21
@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   SessionAwareSemaphoreReleaseAcquiredSessionsOnFailureTest.testDrainPermits_shouldReleaseSessionsOnRuntimeError:123 DrainPermits request should have been completed with InterruptedException
[INFO] 
[ERROR] Tests run: 50949, Failures: 1, Errors: 0, Skipped: 238
[INFO] 

[ERROR] There are test failures.

@ihsandemir
Contributor Author

run-lab-run

@ihsandemir ihsandemir marked this pull request as ready for review July 5, 2023 10:46
@ihsandemir ihsandemir changed the title from "[BACKPORT] Do not try to connect to the old member list after the cluster changes" to "[BACKPORT] Do not try to connect to the old member list after the cluster changes [API-2086]" Jul 5, 2023
@ihsandemir ihsandemir merged commit d814d97 into hazelcast:5.2.z Jul 6, 2023
9 checks passed
@ihsandemir ihsandemir deleted the backports/5.2.z/failover-old-memberlist branch July 6, 2023 10:53
devOpsHazelcast pushed a commit that referenced this pull request Feb 26, 2024
…uster changes [API-2086] (#24944) (#790)

[BACKPORT] Do not try to connect to the old member list after the
cluster changes [API-2086] (#24944)

Backports #24745

[API-2086]:
https://hazelcast.atlassian.net/browse/API-2086?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Co-authored-by: Metin Dumandag <29387993+mdumandag@users.noreply.github.com>
GitOrigin-RevId: 457f91f42fe0fcc87f28fddd99c996b29391b6f5

5 participants