
Refactor Node state: extract EstablishingNode state #1616

Merged: 15 commits merged into maidsafe:fleming from madadam:node-state-refactor-part-two, May 9, 2019

@madadam (Contributor) commented Apr 24, 2019:

This is a refactoring PR with almost no functional changes. The exceptions are some minor changes made for the sake of simplicity, clarity, or performance, listed below:

  1. When EstablishingNode receives NodeApproval, it backlogs it and, once it becomes Node, logs and ignores it. Previously it would log and ignore it immediately.
  2. In EstablishingNode, handle_hop_message and send_routing_message_via_route now work the same way as in ProvingNode, as opposed to the way they work in Node. This makes the responsibilities of each of the states more explicit.
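For orientation, the node state pipeline discussed throughout this review can be sketched roughly as follows. This is an illustrative sketch only, not the routing crate's actual API; the transition logic is simplified (for instance, Bootstrapping can in reality also become Client, as noted in the review below).

```rust
// A minimal sketch of the node state progression this PR works with.
// Names mirror the states discussed in the review; illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    BootstrappingPeer,
    RelocatingNode,
    ProvingNode,
    EstablishingNode,
    Node,
}

impl State {
    // Each state can only advance to the next stage in the pipeline;
    // `Node` is terminal in this simplified model.
    fn next(self) -> Option<State> {
        match self {
            State::BootstrappingPeer => Some(State::RelocatingNode),
            State::RelocatingNode => Some(State::ProvingNode),
            State::ProvingNode => Some(State::EstablishingNode),
            State::EstablishingNode => Some(State::Node),
            State::Node => None,
        }
    }
}
```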

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch 2 times, most recently from e92e9ff to ca95759 Apr 25, 2019

}
}

fn handle_hop_message(

@jeanphilippeD commented Apr 25, 2019:

Looking at handle_hop_message in node.rs, it is not clear to me how we arrived at this implementation:

    fn handle_hop_message(
        &mut self,
        hop_msg: HopMessage,
        pub_id: PublicId,
        _: &mut EventBox,
    ) -> Result<Transition, RoutingError> {
        hop_msg.verify(pub_id.signing_public_key())?;
        let mut client_ip = None;
        let mut hop_name_result = match self.peer_mgr.get_peer(&pub_id).map(Peer::state) {
            Some(&PeerState::Bootstrapper { .. }) => {
                warn!(
                    "{} Hop message received from bootstrapper {:?}, disconnecting.",
                    self, pub_id
                );
                Err(RoutingError::InvalidStateForOperation)
            }
            Some(&PeerState::Client { ip, .. }) => {
                client_ip = Some(ip);
                Ok(*self.name())
            }
            Some(&PeerState::JoiningNode) => Ok(*self.name()),
            Some(&PeerState::Candidate) | Some(&PeerState::Proxy) | Some(&PeerState::Routing) => {
                Ok(*pub_id.name())
            }
            Some(&PeerState::ConnectionInfoPreparing { .. })
            | Some(&PeerState::ConnectionInfoReady(_))
            | Some(&PeerState::CrustConnecting)
            | Some(&PeerState::Connected)
            | None => {
                if self.dropped_clients.contains_key(&pub_id) {
                    debug!(
                        "{} Ignoring {:?} from recently-disconnected client {:?}.",
                        self, hop_msg, pub_id
                    );
                    return Ok(Transition::Stay);
                } else {
                    Ok(*self.name())
                    // FIXME - confirm we can return with an error here by running soak tests
                    // debug!("{} Invalid sender {} of {:?}", self, pub_id, hop_msg);
                    // return Err(RoutingError::InvalidSource);
                }
            }
        };

        if let Some(ip) = client_ip {
            match self.check_valid_client_message(&ip, hop_msg.content.routing_message()) {
                Ok(added_bytes) => {
                    self.proxy_load_amount += added_bytes;
                    self.peer_mgr.add_client_traffic(&pub_id, added_bytes);
                }
                Err(e) => hop_name_result = Err(e),
            }
        }

        match hop_name_result {
            Ok(hop_name) => {
                let HopMessage {
                    content,
                    route,
                    sent_to,
                    ..
                } = hop_msg;
                self.handle_signed_message(content, route, hop_name, &sent_to)
                    .map(|()| Transition::Stay)
            }
            Err(RoutingError::ExceedsRateLimit(hash)) => {
                trace!(
                    "{} Temporarily can't proxy messages from client {:?} (rate-limit hit).",
                    self,
                    pub_id
                );
                self.send_direct_message(
                    pub_id,
                    DirectMessage::ProxyRateLimitExceeded {
                        ack: Ack::compute(hop_msg.content.routing_message())?,
                    },
                );
                Err(RoutingError::ExceedsRateLimit(hash))
            }
            Err(error) => {
                self.ban_and_disconnect_peer(&pub_id);
                Err(error)
            }
        }
    }

@madadam (Author) commented Apr 25, 2019:

This is a good point. I will have to confirm this with @Viv-Rajkumar or someone, but I assumed that while in the establishing state, the node doesn't need to handle any of those things yet. That assumption might have been wrong though...

@madadam (Author) commented May 2, 2019:

This isn't resolved yet.

@madadam (Author) commented May 7, 2019:

After spending some time on this, I'm now about 90% convinced this is actually correct. Although it's a logic change, I believe the way this was handled before was subtly buggy (in a harmless way), because until a node is established it isn't supposed to handle these things. That's why I'd prefer to keep things as they are now. Worst case, this will show up in soak testing, at which point I'll change it back. I'll at least mention this in the PR description.
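The backlog-until-established behaviour described above can be sketched as follows. The types here (`Message`, the handling logic) are hypothetical stand-ins, not the routing crate's real structures:

```rust
// Sketch: while establishing, messages are queued rather than handled;
// on transition to the established state the backlog is drained and
// each queued message is handled (here we simply count them).
#[derive(Debug, Clone, PartialEq)]
struct Message(String);

struct EstablishingNode {
    backlog: Vec<Message>,
}

impl EstablishingNode {
    fn new() -> Self {
        EstablishingNode { backlog: Vec::new() }
    }

    // While establishing, don't handle the message yet; queue it.
    fn handle(&mut self, msg: Message) {
        self.backlog.push(msg);
    }

    // On transition, hand the backlog to the established node.
    fn into_established(self) -> EstablishedNode {
        let mut node = EstablishedNode { handled: 0 };
        for msg in self.backlog {
            node.handle(msg);
        }
        node
    }
}

struct EstablishedNode {
    handled: usize,
}

impl EstablishedNode {
    fn handle(&mut self, _msg: Message) {
        self.handled += 1;
    }
}
```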

&mut self.timer
}

fn send_routing_message_via_route(

@jeanphilippeD commented Apr 25, 2019:

Similarly, looking at node.rs, it is not clear how we arrived at this implementation:

    // Constructs a signed message, finds the node responsible for accumulation, and either sends
    // this node a signature or tries to accumulate signatures for this message (on success, the
    // accumulator handles or forwards the message).
    fn send_routing_message_via_route(
        &mut self,
        routing_msg: RoutingMessage,
        route: u8,
        expires_at: Option<Instant>,
    ) -> Result<(), RoutingError> {
        if !self.in_authority(&routing_msg.src) {
            if route == 0 {
                log_or_panic!(
                    LogLevel::Error,
                    "{} Not part of the source authority. Not sending message {:?}.",
                    self,
                    routing_msg
                );
            }
            return Ok(());
        }
        if !self.add_to_pending_acks(&routing_msg, route, expires_at) {
            debug!(
                "{} already received an ack for {:?} - so not resending it.",
                self, routing_msg
            );
            return Ok(());
        }
        use crate::routing_table::Authority::*;
        let sending_sec = match routing_msg.src {
            ClientManager(_) | NaeManager(_) | NodeManager(_) | ManagedNode(_) | Section(_)
                if self.chain.is_member() =>
            {
                Some(self.chain.our_info().clone())
            }
            PrefixSection(ref pfx) if self.chain.is_member() => {
                let src_section = match self.chain.our_info_for_prefix(pfx) {
                    Some(a) => a.clone(),
                    None => {
                        // Can no longer represent sending Pfx.
                        return Ok(());
                    }
                };
                Some(src_section)
            }
            Client { .. } => None,
            _ => {
                // Cannot send routing msgs as a Node until established.
                return Ok(());
            }
        };

        if route > 0 {
            trace!(
                "{} Resending Msg: {:?} via route: {} and src_section: {:?}",
                self,
                routing_msg,
                route,
                sending_sec
            );
        }

        let signed_msg = SignedMessage::new(routing_msg, &self.full_id, sending_sec)?;

        match self.get_signature_target(&signed_msg.routing_message().src, route) {
            None => Ok(()),
            Some(our_name) if our_name == *self.name() => {
                let min_section_size = self.min_section_size();
                if let Some((mut msg, route)) =
                    self.sig_accumulator
                        .add_message(signed_msg, min_section_size, route)
                {
                    if self.in_authority(&msg.routing_message().dst) {
                        self.handle_signed_message(msg, route, our_name, &BTreeSet::new())?;
                    } else {
                        self.send_signed_message(&mut msg, route, &our_name, &BTreeSet::new())?;
                    }
                }
                Ok(())
            }
            Some(target_name) => {
                if let Some(&pub_id) = self.peer_mgr.get_pub_id(&target_name) {
                    let direct_msg = signed_msg
                        .routing_message()
                        .to_signature(self.full_id.signing_private_key())?;
                    self.send_direct_message(pub_id, direct_msg);
                    Ok(())
                } else {
                    Err(RoutingError::RoutingTable(RoutingTableError::NoSuchPeer))
                }
            }
        }
    }

@madadam (Author) commented Apr 25, 2019:

Same issue as #1616 (comment)

@madadam (Author) commented May 2, 2019:

This isn't resolved yet.

@madadam (Author) commented May 7, 2019:

See here.

@jeanphilippeD left a review comment:

All looks really great. A couple of things are a bit difficult to understand without help, but nothing major.

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch from ca95759 to 2a8a0a8 Apr 26, 2019

Resolved review thread: src/states/node.rs (outdated)

@madadam madadam closed this Apr 29, 2019

@madadam madadam deleted the madadam:node-state-refactor-part-two branch Apr 29, 2019

@madadam madadam restored the madadam:node-state-refactor-part-two branch Apr 29, 2019

@madadam madadam reopened this Apr 29, 2019

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch from 2a8a0a8 to 0e97a30 Apr 29, 2019

@madadam madadam changed the title (WIP) Refactor Node state: extract EstablishingNode state Refactor Node state: extract EstablishingNode state May 2, 2019

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch from 0e97a30 to f5253b4 May 2, 2019

@jeanphilippeD left a review comment:

Looked at the last commit; it looks good. Do you want to squash these last two changes together?

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch from 098e416 to 6317c3c May 3, 2019

@pierrechevalier83 pierrechevalier83 self-requested a review May 3, 2019

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch 2 times, most recently from df85895 to ea68e1e May 3, 2019

@pierrechevalier83 (Contributor) left a review comment:

I'm not done reviewing: I went a bit fast over the largest commit, about extracting EstablishingNodeState, so I will now focus on it.

Here are my comments for the other commits in the meantime, so we can parallelize our work 🙂

Resolved review threads: src/state_machine.rs (3)
// │ │ │ │ │ │
// Base * * * * * *
// Bootstrapped * * * * *
// NotEstablished * * * *

@pierrechevalier83 (Contributor) commented May 3, 2019:

I was thinking the name wasn't very clear. Now, seeing this, I think BootstrappedNotEstablished is a better name: it is consistent with RelocatedNotEstablished and more obviously excludes the Bootstrapping state.

// ▼ ▼
// ┌────────┐ ┌──────┐
// │ Client │ │ Node │
// └────────┘ └──────┘
//
//
// # Common traits
// Bootstrapping

@pierrechevalier83 (Contributor) commented May 3, 2019:

There is an inconsistency between Bootstrapping and RelocatingNode. Should we remove the Node suffix from Relocating, Proving and Establishing? The context of the state machine is probably sufficient for everyone to know these are node states.

@madadam (Author) commented May 3, 2019:

Bootstrapping can become Client or Node, but the other states are all going to become only Node. I think it makes some sense to have this distinction. On the other hand I don't feel too strongly about it and having shorter names would be nice too. Your call.

@pierrechevalier83 (Contributor) commented May 3, 2019:

Ah. I see the distinction now. Cheers :) I probably don't feel too strongly about it then. Mildly in favour of removing it still, but I really won't insist on it.

@madadam (Author) commented May 7, 2019:

Hm, so I wonder: I can remove the Node suffix, but that would still leave the Client and Node names inconsistent with the rest, as the rest are verbs in the present continuous tense (Bootstrapping, Relocating, Proving, ...) while Client and Node are nouns. I could go the opposite way and rename Node and Client to EstablishedNode and EstablishedClient, but that would still leave Bootstrapping out, as it can become both node and client. Not sure; naming things is hard. Maybe it's best to leave this as is for now and return to it in the future?

@Viv-Rajkumar (Member) commented May 7, 2019:

You could switch Bootstrapping -> BootstrappingPeer. We normally go with Peer when a state is common to Node and Client, though it's not like that has been enforced.

pub(super) full_id: FullId,
pub(super) min_section_size: usize,
pub(super) timer: Timer,
action_sender: RoutingActionSender,

@pierrechevalier83 (Contributor) commented May 3, 2019:

Wouldn't it make sense to just use composition to avoid repeating all these fields?

pub struct ProvingNode {
pub(super) ack_mgr: AckManager,

@pierrechevalier83 (Contributor) commented May 3, 2019:

Wouldn't it make sense to just use composition to avoid repeating all these fields?

pub proxy_pub_id: PublicId,
pub timer: Timer,
}

pub struct RelocatingNode {
action_sender: RoutingActionSender,

@pierrechevalier83 (Contributor) commented May 3, 2019:

Wouldn't it make sense to just use composition to avoid repeating all these fields?

pub routing_msg_filter: RoutingMessageFilter,
pub timer: Timer,
}

pub struct Node {
ack_mgr: AckManager,

@pierrechevalier83 (Contributor) commented May 3, 2019:

Wouldn't it make sense to just use composition to avoid repeating all these fields?

pub proxy_pub_id: PublicId,
pub timer: Timer,
}

/// A node connecting a user to the network, as opposed to a routing / data storage node.
///
/// Each client has a _proxy_: a node through which all requests are routed.

@pierrechevalier83 (Contributor) commented May 3, 2019:

Wouldn't it make sense to just use composition to avoid repeating all these fields?
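The composition suggested in these repeated comments could look roughly like this. It is a sketch only, with illustrative stand-in types and fields, not the crate's actual structs:

```rust
// Sketch: pull the fields shared by every state struct into one `Core`
// struct, and embed it by composition instead of repeating the fields.
struct Timer;
struct FullId;

struct Core {
    full_id: FullId,
    min_section_size: usize,
    timer: Timer,
}

// Each state then holds a `Core` plus its own state-specific fields.
// The extra fields below are illustrative, not the real ones.
struct ProvingNode {
    core: Core,
    resource_proof_parts: usize, // hypothetical state-specific field
}

struct EstablishingNode {
    core: Core,
    poke_timer_token: u64, // hypothetical state-specific field
}

impl ProvingNode {
    // Accessors delegate to the shared core.
    fn min_section_size(&self) -> usize {
        self.core.min_section_size
    }
}
```

This trades a flat field list for one level of indirection (`self.core.timer` instead of `self.timer`), which is the usual cost of composition in Rust, where there is no struct inheritance.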

@pierrechevalier83 (Contributor) left a review comment:

Once the soak tests pass and the existing comments are addressed, I'm good with this PR in general 😄

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch 2 times, most recently from 4518368 to f5ed47a May 7, 2019

@jeanphilippeD left a review comment:

Looks good. Very big, so we will have to trust the soak tests to find any issues.

@pierrechevalier83 (Contributor) left a review comment:

Looks good to me so far. Not approving yet, as Viv had a comment about another improvement to do with fixing the tests. Let's hear him out first 😄

@@ -184,10 +184,21 @@ where
unimplemented!()
}

// TODO: rename this to `has_unpolled_observations`

@pierrechevalier83 (Contributor) commented May 7, 2019:

This isn't done yet because you would want to update Parsec to a version that exposes the new API in the same PR? I would add that to the comment.

@madadam (Author) commented May 8, 2019:

Yes, I'd like to make the change in Parsec first, to avoid a situation where routing becomes uncompilable with --features=mock_base (that is, with real Parsec).

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch from 4746613 to 2453690 May 8, 2019

madadam added some commits Apr 25, 2019

Minor fix in sending RoutingMessage as unestablished
Also move send_signed_message_to_peer from Relocated back to Node, as it is only used there.
Fix failures in node_drops, node_restart tests
- Try to reconnect to a dropped peer immediately, rather than after a delay
- Change `has_unconsensused_observations` to `has_unpolled_observations` to prevent terminating the tests prematurely
- If a node loses all its connections, emit `RestartRequired` even if we are the first node (unless during network startup).
- Randomize the drop order in the node_restart test to make it more likely to catch bugs
Rename Bootstrapping to BootstrappingPeer
So all state names are nouns now, for consistency.
Make state conversions more explicit
Change the return type of the various `into_` functions to `Result<State>`, so that in case of error we return `Err` instead of silently ignoring it and returning `State::Terminated`.
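The `Result<State>` conversion described in that last commit can be sketched like this. The `State`, error type, and `ready` condition here are hypothetical simplifications; the real `into_` functions carry much more state:

```rust
// Sketch: a state conversion that surfaces failures to the caller
// instead of silently mapping them to a terminated state.
#[derive(Debug, PartialEq)]
enum State {
    Node,
    Terminated,
}

#[derive(Debug, PartialEq)]
struct RoutingError(String);

struct EstablishingNode {
    ready: bool, // hypothetical stand-in for the real transition preconditions
}

impl EstablishingNode {
    // Before: failures were swallowed and mapped to `State::Terminated`.
    // After: conversion returns `Result<State>`, so the caller sees the error.
    fn into_node(self) -> Result<State, RoutingError> {
        if self.ready {
            Ok(State::Node)
        } else {
            Err(RoutingError("not ready to become Node".to_string()))
        }
    }
}
```

With this shape, the state machine can decide at the call site whether an error should terminate the node or be handled some other way, rather than the conversion deciding unilaterally.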

@madadam madadam force-pushed the madadam:node-state-refactor-part-two branch from 2453690 to 2dd16fb May 8, 2019

@jeanphilippeD left a review comment:

Reviewed the force-pushes and the "Fix failures in node_drops, node_restart tests" commit.

@Viv-Rajkumar Viv-Rajkumar merged commit bb1e692 into maidsafe:fleming May 9, 2019

2 checks passed:
- Travis CI - Pull Request: Build Passed
- continuous-integration/appveyor/pr: AppVeyor build succeeded

@madadam madadam deleted the madadam:node-state-refactor-part-two branch May 9, 2019
