cluster: safeguard consensus not set when calling ID #238
SwarmConnect on the ipfs connector calls the RPC method Peers(), which requests the ID of every peer member. If a peer member is still booting, it might receive the request after RPC is set up but before consensus is initialized, in which case it panics. The probability of this happening is small, but it is still worth guarding against.

Also increase the connect swarms delay to 30 seconds, which should be a bit longer than the default wait_for_leader timeout. Otherwise we might connect swarms while there is not even a leader.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
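The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Cluster`, `Consensus`, and `ID` types, the `SetConsensus` method, and the `ErrNotReady` error are all hypothetical stand-ins, and the real project may report the not-ready state differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotReady is a hypothetical sentinel error used only in this sketch.
var ErrNotReady = errors.New("consensus is not initialized yet")

// Consensus is a stand-in for the real consensus component.
type Consensus struct {
	peers []string
}

// Peers returns the peers known to the consensus layer.
func (c *Consensus) Peers() []string { return c.peers }

// ID is a simplified identity record returned to other peers.
type ID struct {
	Peer         string
	ClusterPeers []string
	Error        string
}

// Cluster is a hypothetical peer whose RPC endpoint may become reachable
// before its consensus component has been assigned.
type Cluster struct {
	mu        sync.RWMutex
	peer      string
	consensus *Consensus // nil while the peer is still booting
}

// SetConsensus is called once boot has progressed far enough.
func (c *Cluster) SetConsensus(cons *Consensus) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.consensus = cons
}

// ID answers an ID request. Instead of dereferencing a nil consensus
// (which would panic), it returns a record with an error message.
func (c *Cluster) ID() ID {
	c.mu.RLock()
	cons := c.consensus
	c.mu.RUnlock()

	id := ID{Peer: c.peer}
	if cons == nil {
		id.Error = ErrNotReady.Error()
		return id
	}
	id.ClusterPeers = cons.Peers()
	return id
}

func main() {
	c := &Cluster{peer: "peer-1"}
	fmt.Println(c.ID().Error) // request arrives during boot

	c.SetConsensus(&Consensus{peers: []string{"peer-1", "peer-2"}})
	fmt.Println(c.ID().ClusterPeers) // request arrives after boot
}
```

With a guard like this, a caller that receives the error can skip or retry the booting peer rather than taking it down.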
d7a72cf to a656e45
LGTM
Sorry @ZenGround0, I just added something to fix the test that failed (it is unrelated and fails randomly, but still), so you need to re-approve.
bd9e74a to a6d72b6
Make sure we save a new config if the new peerset is different from the one in the configuration at boot. Hopefully this fixes a race condition in the PeerAdd test.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
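As a rough illustration of the "save only if the peerset changed" idea, the sketch below compares two peer lists while ignoring order. The `Config` type and the `samePeerset` and `updatePeers` helpers are hypothetical names for this example and are not taken from the project.

```go
package main

import (
	"fmt"
	"sort"
)

// samePeerset reports whether a and b contain the same peers,
// ignoring order. It sorts copies so the inputs are not modified.
func samePeerset(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	ac := append([]string(nil), a...)
	bc := append([]string(nil), b...)
	sort.Strings(ac)
	sort.Strings(bc)
	for i := range ac {
		if ac[i] != bc[i] {
			return false
		}
	}
	return true
}

// Config is a hypothetical configuration holding the peers known at boot.
type Config struct {
	Peers []string
}

// updatePeers stores the current peerset and reports whether the
// configuration changed and therefore needs to be saved.
func (cfg *Config) updatePeers(current []string) bool {
	if samePeerset(cfg.Peers, current) {
		return false
	}
	cfg.Peers = append([]string(nil), current...)
	return true
}

func main() {
	cfg := &Config{Peers: []string{"peer-1", "peer-2"}}

	// Same peers in a different order: nothing to save.
	fmt.Println(cfg.updatePeers([]string{"peer-2", "peer-1"}))

	// A new peer joined: the config should be saved.
	fmt.Println(cfg.updatePeers([]string{"peer-1", "peer-2", "peer-3"}))
}
```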
a6d72b6 to 1f93662
No worries
Use it to find out the number of peers in the config and prevent peerAdd test failures.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
@ZenGround0 look at all that greenness :) thanks again!
LGTM