Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elder promotion and demotion #1929

Merged
merged 29 commits into from Dec 17, 2019
Merged

Conversation

@jeanphilippeD
Copy link
Contributor

jeanphilippeD commented Dec 2, 2019

Taking over from #1856
Implement part of #1835

This implements the logic for promoting and demoting Elders as needed and updates the splitting logic correspondingly.

@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 2, 2019

Added a few patch that address the seed failure seen that Connected event in Relocation should be ignored and not fail the test

@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 2, 2019

Additional commits to handle sig accumulation and hop message forwarding as adults

@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 2, 2019

Fix seed that was showing us disconnecting from our neighbour elder

@madadam madadam changed the title WIP: Elder promotion and demotion WIP: Elder promotion and demotion (take 2) Dec 3, 2019
@jeanphilippeD jeanphilippeD force-pushed the jeanphilippeD:bart_promotion branch from 064b11b to 6809474 Dec 3, 2019
@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 3, 2019

Rebased on fleming

@madadam madadam changed the title WIP: Elder promotion and demotion (take 2) WIP: Elder promotion and demotion Dec 3, 2019
tests/mock_network/churn.rs Outdated Show resolved Hide resolved
src/mock/parsec/state.rs Outdated Show resolved Hide resolved
@@ -86,7 +95,7 @@ pub enum Event {
/// Our own section has been split, resulting in the included `Prefix` for our new section.
SectionSplit(Prefix<XorName>),
/// The client has successfully connected to a proxy node on the network.
Connected,
Connected(ConnectEvent),

This comment has been minimized.

Copy link
@madadam

madadam Dec 3, 2019

Contributor

Minor: I might be cleaner to have a separate Event::Relocated event.

This comment has been minimized.

Copy link
@jeanphilippeD

jeanphilippeD Dec 3, 2019

Author Contributor

Could be, Can probably be a separate PR if that make sense.
The thing is that these events represent the same state of node (reaching adulthood), so I'm not 100% clear that would be better.

This comment has been minimized.

Copy link
@madadam

madadam Dec 3, 2019

Contributor

When relocating, the node does not really reach adulthood, as it was already adult (or elder). So we might even chose to not raise any event at all. But maybe it is useful for vault?

I don't mind postponing this to a separate PR though.

This comment has been minimized.

Copy link
@jeanphilippeD

jeanphilippeD Dec 3, 2019

Author Contributor

I would think that having been an Adult in a previous section does not help very much being adult in a new section, if there is stuff you need to do, probably need to be done again in new section. However, yes, lets leave it for later as it does not really impact the rest of this PR.

@jeanphilippeD jeanphilippeD force-pushed the jeanphilippeD:bart_promotion branch from 6809474 to 35c478c Dec 4, 2019
@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 4, 2019

Rebase on latest fleming

@jeanphilippeD jeanphilippeD force-pushed the jeanphilippeD:bart_promotion branch from 35c478c to 077dffa Dec 4, 2019
@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 4, 2019

Rebase on latest fleming

src/states/elder/mod.rs Outdated Show resolved Hide resolved
@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 4, 2019

Rebase on latest fleming

@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 5, 2019

Add fix for joining node into a splitting section (Messy)

@jeanphilippeD jeanphilippeD force-pushed the jeanphilippeD:bart_promotion branch from e1731a9 to e43a300 Dec 5, 2019
@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 5, 2019

Make sure the new commit have new date.

src/states/elder/mod.rs Outdated Show resolved Hide resolved
src/states/elder/mod.rs Outdated Show resolved Hide resolved
@@ -71,17 +70,29 @@ pub enum BootstrapResponse {
Error(BootstrapResponseError),
}

/// Request to join a section
#[cfg_attr(feature = "mock_serialise", derive(Clone))]

This comment has been minimized.

Copy link
@madadam

madadam Dec 5, 2019

Contributor

Why not derive Clone unconditionally?

This comment has been minimized.

Copy link
@jeanphilippeD

jeanphilippeD Dec 5, 2019

Author Contributor

Followed BootstrapResponse pattern. I think the main thing is DirectMessage should not be clone in production.
But probably these could be cleaned up as they could be clone maybe unconditionaly, Probably do that another time though.

src/states/joining_peer.rs Outdated Show resolved Hide resolved
src/states/elder/mod.rs Outdated Show resolved Hide resolved
madadam added 5 commits Dec 16, 2019
All backloged message will become stale quickly.
Remove the backlog when we see a new elder set we are not part of.
fix messages_during_churn test to take relocations fully into account
When a message destination includes our neighbours, send to all of the
matching nodes, not just the delivery group. This is intended to fix
issues with messages received by lagging nodes that have not processed
a split of their own section.
@jeanphilippeD jeanphilippeD force-pushed the jeanphilippeD:bart_promotion branch from d16d9d8 to 9e9c17f Dec 16, 2019
@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 16, 2019

Rebase on fleming, clean up history.

@jeanphilippeD jeanphilippeD changed the title WIP: Elder promotion and demotion Elder promotion and demotion Dec 16, 2019
@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 16, 2019

@octol @madadam Pushed a small commit to address outstanding review comments.
This PR is no longer WIP and we are looking to merge it. The commits have been rebased multiple times so we cannot ignore part of the history. Any comments on patches not by me will be addressed with new commits. (I think it is probably a reasonably sized PR, but we can extract some more parts if needed).

Copy link
Contributor Author

jeanphilippeD left a comment

ok, went through the change as a whole approved for me (cannot really approve it).
Some parts can probably do with improvments but these can happen later, and I see nothing blocking.

Adults were disconnecting from relocated node, elder did not.
The problem in this seed is that the Adult will do that as it catches
up with Parsec, as it is about to be promoted. So we don't want to
disconnect from a potential neighbour.

Fixes:
SEED="[3467112354, 1436145184, 989989653, 3612989132]" RUST_LOG=trace cargo test --release --features=mock -- aggressive_churn

Test:
clippy + soak test
@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 17, 2019

Fix soak failure by removing incorrect disconnect

@octol
octol approved these changes Dec 17, 2019
src/chain/member_info.rs Show resolved Hide resolved
src/chain/network_event.rs Show resolved Hide resolved
src/states/adult.rs Show resolved Hide resolved
self, p2p_node
);
let response = BootstrapResponse::Rebootstrap(conn_infos);
self.respond_to_bootstrap_request(&p2p_node, &destination);

This comment has been minimized.

Copy link
@madadam

madadam Dec 17, 2019

Contributor

All this function is doing now is calling another function. I suggest we move the body of respond_to_bootstrap_request back here and avoid the unnecessary indirection.

This comment has been minimized.

Copy link
@jeanphilippeD

jeanphilippeD Dec 17, 2019

Author Contributor

This was copied from elders/mod.rs respond_to_bootstrap_request.
I think we probably would refactor it so it is in approved.

This comment has been minimized.

Copy link
@jeanphilippeD

jeanphilippeD Dec 18, 2019

Author Contributor

Fixed with #1959

src/states/adult.rs Show resolved Hide resolved
src/states/adult.rs Show resolved Hide resolved
@@ -61,6 +61,7 @@ use crate::messages::Message;
/// Time after which a `Ticked` event is sent.
const TICK_TIMEOUT: Duration = Duration::from_secs(15);
const GOSSIP_TIMEOUT: Duration = Duration::from_secs(2);
const INITIAL_RELOCATE_COOL_DOWN_COUNT_DOWN: i32 = 10;

This comment has been minimized.

Copy link
@madadam

madadam Dec 17, 2019

Contributor

Better to add a comment explaining this const?

let our_prefix = match node.inner.our_prefix() {
Some(pfx) => pfx,
None => {
return;

This comment has been minimized.

Copy link
@madadam

madadam Dec 17, 2019

Contributor

Why is this OK now? After poll_and_resend, every node should be approved (infant, adult or elder) so it should have prefix. Or am I wrong?

This comment has been minimized.

Copy link
@jeanphilippeD

jeanphilippeD Dec 18, 2019

Author Contributor

Fixed with #1959

let our_prefix = match node.inner.our_prefix() {
Some(pfx) => pfx,
None => continue,
};
Comment on lines +767 to +770

This comment has been minimized.

Copy link
@madadam

madadam Dec 17, 2019

Contributor

I think this can be replaced with just node.our_prefix() (which unwraps internally), because we are iterating only elders and so are guaranteed to have prefix.

This comment has been minimized.

Copy link
@jeanphilippeD

jeanphilippeD Dec 18, 2019

Author Contributor

Fixed with #1959

@jeanphilippeD

This comment has been minimized.

Copy link
Contributor Author

jeanphilippeD commented Dec 17, 2019

As discussed, will fix this last batch of review comment as a follow up PR after this is merged @madadam

@jeanphilippeD jeanphilippeD merged commit a0358d4 into maidsafe:fleming Dec 17, 2019
7 of 16 checks passed
7 of 16 checks passed
Test (ubuntu-latest) Test (ubuntu-latest)
Details
Check (ubuntu-latest)
Details
Test (windows-latest) Test (windows-latest)
Details
Check (windows-latest)
Details
Test (macOS-latest) Test (macOS-latest)
Details
Check (macOS-latest)
Details
Test (ubuntu-latest, --release --features mock) Test (ubuntu-latest, --release --features mock)
Details
Test (ubuntu-latest, --release) Test (ubuntu-latest, --release)
Details
Test (windows-latest, --release --features mock) Test (windows-latest, --release --features mock)
Details
Test (windows-latest, --release) Test (windows-latest, --release)
Details
Test (macOS-latest, --release --features mock) Test (macOS-latest, --release --features mock)
Details
Test (macOS-latest, --release) Test (macOS-latest, --release)
Details
Rustfmt
Details
Clippy
Details
Travis CI - Pull Request Build Passed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@jeanphilippeD jeanphilippeD deleted the jeanphilippeD:bart_promotion branch Dec 17, 2019
jeanphilippeD added a commit to jeanphilippeD/routing that referenced this pull request Dec 18, 2019
For maidsafe#1929, we relaxed what was needed for our_prefix while tracking
failures. This is no longer necessary.

Also use the simpler unwraping TestNode function ensuring that it
provides enough context in panic message.

Test:
clippy + soak test.
jeanphilippeD added a commit to jeanphilippeD/routing that referenced this pull request Dec 18, 2019
During review of maidsafe#1929, this was raised as an issue.
Fix this issue and correct comment as well as clarify why this could be
a problem in the long run.
jeanphilippeD added a commit to jeanphilippeD/routing that referenced this pull request Dec 18, 2019
For maidsafe#1929, we relaxed what was needed for our_prefix while tracking
failures. This is no longer necessary.

Also use the simpler unwraping TestNode function ensuring that it
provides enough context in panic message.

Test:
clippy + soak test.
jeanphilippeD added a commit to jeanphilippeD/routing that referenced this pull request Dec 18, 2019
During review of maidsafe#1929, this was raised as an issue.
Fix this issue and correct comment as well as clarify why this could be
a problem in the long run.
jeanphilippeD added a commit to jeanphilippeD/routing that referenced this pull request Dec 18, 2019
During review of maidsafe#1929, this was raised as an issue.
Fix this issue and correct comment as well as clarify why this could be
a problem in the long run.
jeanphilippeD added a commit that referenced this pull request Dec 18, 2019
Clean up review comments from #1929
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.