Skip to content
This repository has been archived by the owner on Jun 25, 2021. It is now read-only.

Refactor Node state: extract EstablishingNode state #1616

Merged
merged 15 commits into from May 9, 2019

Conversation

madadam
Copy link
Contributor

@madadam madadam commented Apr 24, 2019

This is a refactoring PR with almost no functional changes. Exceptions are some minor changes for the sake of simplicity, clarity or performance. They are listed below:

  1. When EstablishingNode receives NodeApproval, it backlogs it and once it becomes Node it logs and ignores it. Previously it would immediately log and ignore it.
  2. In EstablishingNode, handle_hop_message and send_routing_message_via_route now work the same way as in ProvingNode as opposed to the way they work in Node. This makes it more explicit what the responsibilities of each of the states are.

@madadam madadam force-pushed the node-state-refactor-part-two branch 2 times, most recently from e92e9ff to ca95759 Compare April 25, 2019 10:54
}
}

fn handle_hop_message(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at node.rs handle_hop_message it is not clear to me how we have that implementation:

    fn handle_hop_message(
        &mut self,
        hop_msg: HopMessage,
        pub_id: PublicId,
        _: &mut EventBox,
    ) -> Result<Transition, RoutingError> {
        hop_msg.verify(pub_id.signing_public_key())?;
        let mut client_ip = None;
        let mut hop_name_result = match self.peer_mgr.get_peer(&pub_id).map(Peer::state) {
            Some(&PeerState::Bootstrapper { .. }) => {
                warn!(
                    "{} Hop message received from bootstrapper {:?}, disconnecting.",
                    self, pub_id
                );
                Err(RoutingError::InvalidStateForOperation)
            }
            Some(&PeerState::Client { ip, .. }) => {
                client_ip = Some(ip);
                Ok(*self.name())
            }
            Some(&PeerState::JoiningNode) => Ok(*self.name()),
            Some(&PeerState::Candidate) | Some(&PeerState::Proxy) | Some(&PeerState::Routing) => {
                Ok(*pub_id.name())
            }
            Some(&PeerState::ConnectionInfoPreparing { .. })
            | Some(&PeerState::ConnectionInfoReady(_))
            | Some(&PeerState::CrustConnecting)
            | Some(&PeerState::Connected)
            | None => {
                if self.dropped_clients.contains_key(&pub_id) {
                    debug!(
                        "{} Ignoring {:?} from recently-disconnected client {:?}.",
                        self, hop_msg, pub_id
                    );
                    return Ok(Transition::Stay);
                } else {
                    Ok(*self.name())
                    // FIXME - confirm we can return with an error here by running soak tests
                    // debug!("{} Invalid sender {} of {:?}", self, pub_id, hop_msg);
                    // return Err(RoutingError::InvalidSource);
                }
            }
        };

        if let Some(ip) = client_ip {
            match self.check_valid_client_message(&ip, hop_msg.content.routing_message()) {
                Ok(added_bytes) => {
                    self.proxy_load_amount += added_bytes;
                    self.peer_mgr.add_client_traffic(&pub_id, added_bytes);
                }
                Err(e) => hop_name_result = Err(e),
            }
        }

        match hop_name_result {
            Ok(hop_name) => {
                let HopMessage {
                    content,
                    route,
                    sent_to,
                    ..
                } = hop_msg;
                self.handle_signed_message(content, route, hop_name, &sent_to)
                    .map(|()| Transition::Stay)
            }
            Err(RoutingError::ExceedsRateLimit(hash)) => {
                trace!(
                    "{} Temporarily can't proxy messages from client {:?} (rate-limit hit).",
                    self,
                    pub_id
                );
                self.send_direct_message(
                    pub_id,
                    DirectMessage::ProxyRateLimitExceeded {
                        ack: Ack::compute(hop_msg.content.routing_message())?,
                    },
                );
                Err(RoutingError::ExceedsRateLimit(hash))
            }
            Err(error) => {
                self.ban_and_disconnect_peer(&pub_id);
                Err(error)
            }
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I will have to confirm this with @Viv-Rajkumar or someone, but I assumed that while in the establishing state, the node doesn't need to handle any of those things yet. That assumption might have been wrong though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't resolved yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After spending some time on this I'm now about 90% convinced this is actually correct. Although it's a logic change I believe the way this was handled before was actually somehow buggy (although in a harmless way) because until a node is established, it isn't supposed to handle these things. That's why I'd prefer to keep things as they are now. Worst case, this will pop up in the soak testing at which point I'll change it back. I'm going to at least mention this in the PR description.

&mut self.timer
}

fn send_routing_message_via_route(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, looking at node.rs, not clear how we come to this implmenation.

    // Constructs a signed message, finds the node responsible for accumulation, and either sends
    // this node a signature or tries to accumulate signatures for this message (on success, the
    // accumulator handles or forwards the message).
    fn send_routing_message_via_route(
        &mut self,
        routing_msg: RoutingMessage,
        route: u8,
        expires_at: Option<Instant>,
    ) -> Result<(), RoutingError> {
        if !self.in_authority(&routing_msg.src) {
            if route == 0 {
                log_or_panic!(
                    LogLevel::Error,
                    "{} Not part of the source authority. Not sending message {:?}.",
                    self,
                    routing_msg
                );
            }
            return Ok(());
        }
        if !self.add_to_pending_acks(&routing_msg, route, expires_at) {
            debug!(
                "{} already received an ack for {:?} - so not resending it.",
                self, routing_msg
            );
            return Ok(());
        }
        use crate::routing_table::Authority::*;
        let sending_sec = match routing_msg.src {
            ClientManager(_) | NaeManager(_) | NodeManager(_) | ManagedNode(_) | Section(_)
                if self.chain.is_member() =>
            {
                Some(self.chain.our_info().clone())
            }
            PrefixSection(ref pfx) if self.chain.is_member() => {
                let src_section = match self.chain.our_info_for_prefix(pfx) {
                    Some(a) => a.clone(),
                    None => {
                        // Can no longer represent sending Pfx.
                        return Ok(());
                    }
                };
                Some(src_section)
            }
            Client { .. } => None,
            _ => {
                // Cannot send routing msgs as a Node until established.
                return Ok(());
            }
        };

        if route > 0 {
            trace!(
                "{} Resending Msg: {:?} via route: {} and src_section: {:?}",
                self,
                routing_msg,
                route,
                sending_sec
            );
        }

        let signed_msg = SignedMessage::new(routing_msg, &self.full_id, sending_sec)?;

        match self.get_signature_target(&signed_msg.routing_message().src, route) {
            None => Ok(()),
            Some(our_name) if our_name == *self.name() => {
                let min_section_size = self.min_section_size();
                if let Some((mut msg, route)) =
                    self.sig_accumulator
                        .add_message(signed_msg, min_section_size, route)
                {
                    if self.in_authority(&msg.routing_message().dst) {
                        self.handle_signed_message(msg, route, our_name, &BTreeSet::new())?;
                    } else {
                        self.send_signed_message(&mut msg, route, &our_name, &BTreeSet::new())?;
                    }
                }
                Ok(())
            }
            Some(target_name) => {
                if let Some(&pub_id) = self.peer_mgr.get_pub_id(&target_name) {
                    let direct_msg = signed_msg
                        .routing_message()
                        .to_signature(self.full_id.signing_private_key())?;
                    self.send_direct_message(pub_id, direct_msg);
                    Ok(())
                } else {
                    Err(RoutingError::RoutingTable(RoutingTableError::NoSuchPeer))
                }
            }
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as #1616 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't resolved yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here.

Copy link
Contributor

@jeanphilippeD jeanphilippeD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks really great.
A couple of things that are a bit difficult to understand without help, but nothing major.

@madadam madadam force-pushed the node-state-refactor-part-two branch from ca95759 to 2a8a0a8 Compare April 26, 2019 09:20
src/states/node.rs Outdated Show resolved Hide resolved
@madadam madadam closed this Apr 29, 2019
@madadam madadam deleted the node-state-refactor-part-two branch April 29, 2019 13:09
@madadam madadam restored the node-state-refactor-part-two branch April 29, 2019 13:09
@madadam madadam reopened this Apr 29, 2019
@madadam madadam force-pushed the node-state-refactor-part-two branch from 2a8a0a8 to 0e97a30 Compare April 29, 2019 13:14
@madadam madadam changed the title (WIP) Refactor Node state: extract EstablishingNode state Refactor Node state: extract EstablishingNode state May 2, 2019
@madadam madadam force-pushed the node-state-refactor-part-two branch from 0e97a30 to f5253b4 Compare May 2, 2019 12:47
Copy link
Contributor

@jeanphilippeD jeanphilippeD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at last commit, it looks good, do you want to squash these last 2 changes together?

@madadam madadam force-pushed the node-state-refactor-part-two branch from 098e416 to 6317c3c Compare May 3, 2019 07:12
@pierrechevalier83 pierrechevalier83 self-requested a review May 3, 2019 09:40
@madadam madadam force-pushed the node-state-refactor-part-two branch 2 times, most recently from df85895 to ea68e1e Compare May 3, 2019 12:18
Copy link
Contributor

@pierrechevalier83 pierrechevalier83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done reviewing: I was a bit fast on the largest commit about extracting EstablishingNodeState, so will now focus on it.

Here are my comments for the other commits in the meantime, so we can parallelize our work 🙂

src/state_machine.rs Show resolved Hide resolved
src/state_machine.rs Show resolved Hide resolved
src/state_machine.rs Show resolved Hide resolved
// │ │ │ │ │ │
// Base * * * * * *
// Bootstrapped * * * * *
// NotEstablished * * * *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the name wasn't very clear. Now seeing this, I think BootstrappedNotEstablished is a better name: consistent with RelocatedNotEstablished and more obviously excluding the Bootstraping state.

// ▼ ▼
// ┌────────┐ ┌──────┐
// │ Client │ │ Node │
// └────────┘ └──────┘
//
//
// # Common traits
// Bootstrapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an inconsistency between Bootstrapping and RelocatingNode
Should we remove the Node suffix from Relocating, Proving and Establishing? The context of the state_machine is probably sufficient for every one to know these are node states.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bootstrapping can become Client or Node, but the other states are all going to become only Node. I think it makes some sense to have this distinction. On the other hand I don't feel too strongly about it and having shorter names would be nice too. Your call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see the distinction now. Cheers :) I probably don't feel too strongly about it then. Mildly in favour of removing it still, but I really won't insist on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, so I wonder: I can remove the Node suffix, but that would still make the Client and Node names inconsistent with the rest, as the rest are verbs in the present continuous tense (Bootstrapping, Relocating, Proving, ...), but Client and Node are nouns. I can maybe go the opposite way and rename Node and Client to EstablishedNode and EstablishedClient, but that would still leave Bootstrapping out as it can be both node and client. Not sure. Naming stuff is hard. Maybe best to leave this as is for now and maybe return to it in the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could switch Bootstrapping -> BootstrappingPeer. we normally go with Peer when common to Node & Client but not like it was enforced.

pub(super) full_id: FullId,
pub(super) min_section_size: usize,
pub(super) timer: Timer,
action_sender: RoutingActionSender,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to just use composition to avoid repeating all these fields?

pub struct ProvingNode {
pub(super) ack_mgr: AckManager,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to just use composition to avoid repeating all these fields?

pub proxy_pub_id: PublicId,
pub timer: Timer,
}

pub struct RelocatingNode {
action_sender: RoutingActionSender,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to just use composition to avoid repeating all these fields?

pub routing_msg_filter: RoutingMessageFilter,
pub timer: Timer,
}

pub struct Node {
ack_mgr: AckManager,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to just use composition to avoid repeating all these fields?

pub proxy_pub_id: PublicId,
pub timer: Timer,
}

/// A node connecting a user to the network, as opposed to a routing / data storage node.
///
/// Each client has a _proxy_: a node through which all requests are routed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to just use composition to avoid repeating all these fields?

Copy link
Contributor

@pierrechevalier83 pierrechevalier83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once soak the tests pass and the existing comments are addressed, I'm good with this PR in general 😄

// ▼ ▼
// ┌────────┐ ┌──────┐
// │ Client │ │ Node │
// └────────┘ └──────┘
//
//
// # Common traits
// Bootstrapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see the distinction now. Cheers :) I probably don't feel too strongly about it then. Mildly in favour of removing it still, but I really won't insist on it.

@madadam madadam force-pushed the node-state-refactor-part-two branch 2 times, most recently from 4518368 to f5ed47a Compare May 7, 2019 11:22
jeanphilippeD
jeanphilippeD previously approved these changes May 7, 2019
Copy link
Contributor

@jeanphilippeD jeanphilippeD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Very big so we will have to trust soak test to find any issue.

Copy link
Contributor

@pierrechevalier83 pierrechevalier83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me so far. Not approving yet as Viv had a comment about another improvement to do with fixing the tests. Let's hear him out first 😄

@@ -184,10 +184,21 @@ where
unimplemented!()
}

// TODO: rename this to `has_unpolled_observations`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't done yet because you would want to update Parsec to one that exposes the new API in the same PR?
I would add this to the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd like to make the change in Parsec first, to avoid the situation where routing becomes uncompilable with --features=mock_base (that is, with real parsec).

madadam added 10 commits May 8, 2019 12:43
- Try to reconnect to a dropped peer immediatelly, not delayed
- Change `has_unconsensused_observations` to `has_unpolled_observations` to prevent terminating the tests prematurely
- If node loses all connections, emit `RestartRequired` even if we are the first node (unless during network startup).
- Randomize the drop order in the node_restart test to make it more likely to catch bugs
Also move send_signed_message_to_peer from Relocated back to Node, as it is only used there.
So all state names are nouns now, for consistency.
Change the return type of the various `into_` function to `Result<State>` so in case of error we return `Err` instead of silently ignoring it and returning `State::Terminated`.
@madadam madadam force-pushed the node-state-refactor-part-two branch from 2453690 to 2dd16fb Compare May 8, 2019 10:43
jeanphilippeD
jeanphilippeD previously approved these changes May 8, 2019
Copy link
Contributor

@jeanphilippeD jeanphilippeD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the push forces and the Fix failures in node_drops, node_restart tests commit

@Viv-Rajkumar Viv-Rajkumar merged commit bb1e692 into maidsafe:fleming May 9, 2019
@madadam madadam deleted the node-state-refactor-part-two branch May 9, 2019 08:30
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants