voting ping-pong part II #599

Closed
Excpt0r opened this issue May 31, 2021 · 2 comments

Excpt0r commented May 31, 2021

Hi,

this is a follow up of #583.
Big thanks for your quick response and the new release!

I have now tested with the new version 1.3.7, which includes fix #586 from @fengjiachun.
The logs of the test are attached; the election took 22s:
van-device-sub-0_jraft1.3.7.log
van-device-sub-2_jraft1.3.7.log

After that I tried a different timeout, as suggested by @killme2008.
I reduced the rpcTimeout from 1000ms to 500ms and re-ran the tests. Logs are attached; the election took 25s:
van-device-sub-0_jraft1.3.7_rpcTimeout500ms.log
van-device-sub-2_jraft1.3.7_rpcTimeout500ms.log

As the problem does not occur every time, it is not easy to debug. Any further hints would be appreciated.
As a next step I will try adjusting the electionTimeout parameter (does that need a different heartbeat factor too?).
Could it help to use gRPC instead of Bolt, and how would I configure that?
Or maybe changing RaftOptions#maxElectionDelayMs or setting stepDownWhenVoteTimedout=false?

Did you see a similar behaviour on your side too, or is it possible there is a general network problem on my side?

Thanks for your help

@fengjiachun
Contributor

Hi, @Excpt0r
I think I have found the reason: a prevote or vote process first tries to send a small packet to the other end to make sure the connection is available, and during the election this probe is blocked by a dead node.

The first solution:
Bolt has a small problem: it blocks for at least 1 second while creating a connection, and this parameter cannot be modified. I have discussed this with the PMC of Bolt, and he will fix the problem by the end of June. Until then, I recommend increasing election_timeout to 2 seconds (decreasing rpc_timeout will not help, because it does not shorten this block).

The second solution:
You can use gRPC as the network framework by simply adding jraft-extension/rpc-grpc-impl to your POM, and then decrease the rpc_timeout.
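
To make the reasoning behind the first solution concrete, here is a rough, self-contained sketch of the timing. It is not jraft code: the class `ElectionTimingSketch`, the method `voteRoundFits`, and the 50 ms round-trip figure are all hypothetical, and the model ignores the randomized election delay that jraft adds. It only assumes what is stated above, namely that creating a connection to a dead node blocks for at least 1 second.

```java
public class ElectionTimingSketch {

    /**
     * Simplified model (an assumption, not jraft's actual logic): a vote round
     * can only complete if the blocked connection attempt plus the RPC round
     * trip finishes before the election timer fires again.
     */
    static boolean voteRoundFits(long connectBlockMs, long rpcRoundTripMs, long electionTimeoutMs) {
        return connectBlockMs + rpcRoundTripMs < electionTimeoutMs;
    }

    public static void main(String[] args) {
        long connectBlockMs = 1000; // from the comment above: Bolt blocks for at least 1 second
        long rpcRoundTripMs = 50;   // assumed value, for illustration only

        // Compare a 1-second election timeout with the suggested 2 seconds.
        long[] candidates = {1000, 2000};
        for (long electionTimeoutMs : candidates) {
            boolean fits = voteRoundFits(connectBlockMs, rpcRoundTripMs, electionTimeoutMs);
            System.out.println("electionTimeoutMs=" + electionTimeoutMs + " fits=" + fits);
        }
    }
}
```

In this model the rpc_timeout does not appear at all, which matches the point above: lowering it cannot shorten the fixed connection block, while raising the election timeout past the block (or switching transports, as in the second solution) can.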


Excpt0r commented Jun 1, 2021

Hi @fengjiachun
thanks a lot!! That fixed the problem.
The election duration is now down to 0-2s (with grpc and rpcTimeout 500ms) 😃
