Refactor Node state: extract EstablishingNode state #1616
Refactor Node state: extract EstablishingNode state #1616
Conversation
e92e9ff
to
ca95759
Compare
} | ||
} | ||
|
||
fn handle_hop_message( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at node.rs handle_hop_message it is not clear to me how we have that implementation:
fn handle_hop_message(
&mut self,
hop_msg: HopMessage,
pub_id: PublicId,
_: &mut EventBox,
) -> Result<Transition, RoutingError> {
hop_msg.verify(pub_id.signing_public_key())?;
let mut client_ip = None;
let mut hop_name_result = match self.peer_mgr.get_peer(&pub_id).map(Peer::state) {
Some(&PeerState::Bootstrapper { .. }) => {
warn!(
"{} Hop message received from bootstrapper {:?}, disconnecting.",
self, pub_id
);
Err(RoutingError::InvalidStateForOperation)
}
Some(&PeerState::Client { ip, .. }) => {
client_ip = Some(ip);
Ok(*self.name())
}
Some(&PeerState::JoiningNode) => Ok(*self.name()),
Some(&PeerState::Candidate) | Some(&PeerState::Proxy) | Some(&PeerState::Routing) => {
Ok(*pub_id.name())
}
Some(&PeerState::ConnectionInfoPreparing { .. })
| Some(&PeerState::ConnectionInfoReady(_))
| Some(&PeerState::CrustConnecting)
| Some(&PeerState::Connected)
| None => {
if self.dropped_clients.contains_key(&pub_id) {
debug!(
"{} Ignoring {:?} from recently-disconnected client {:?}.",
self, hop_msg, pub_id
);
return Ok(Transition::Stay);
} else {
Ok(*self.name())
// FIXME - confirm we can return with an error here by running soak tests
// debug!("{} Invalid sender {} of {:?}", self, pub_id, hop_msg);
// return Err(RoutingError::InvalidSource);
}
}
};
if let Some(ip) = client_ip {
match self.check_valid_client_message(&ip, hop_msg.content.routing_message()) {
Ok(added_bytes) => {
self.proxy_load_amount += added_bytes;
self.peer_mgr.add_client_traffic(&pub_id, added_bytes);
}
Err(e) => hop_name_result = Err(e),
}
}
match hop_name_result {
Ok(hop_name) => {
let HopMessage {
content,
route,
sent_to,
..
} = hop_msg;
self.handle_signed_message(content, route, hop_name, &sent_to)
.map(|()| Transition::Stay)
}
Err(RoutingError::ExceedsRateLimit(hash)) => {
trace!(
"{} Temporarily can't proxy messages from client {:?} (rate-limit hit).",
self,
pub_id
);
self.send_direct_message(
pub_id,
DirectMessage::ProxyRateLimitExceeded {
ack: Ack::compute(hop_msg.content.routing_message())?,
},
);
Err(RoutingError::ExceedsRateLimit(hash))
}
Err(error) => {
self.ban_and_disconnect_peer(&pub_id);
Err(error)
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. I will have to confirm this with @Viv-Rajkumar or someone, but I assumed that while in the establishing state, the node doesn't need to handle any of those things yet. That assumption might have been wrong though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't resolved yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After spending some time on this I'm now about 90% convinced this is actually correct. Although it's a logic change I believe the way this was handled before was actually somehow buggy (although in a harmless way) because until a node is established, it isn't supposed to handle these things. That's why I'd prefer to keep things as they are now. Worst case, this will pop up in the soak testing at which point I'll change it back. I'm going to at least mention this in the PR description.
&mut self.timer | ||
} | ||
|
||
fn send_routing_message_via_route( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, looking at node.rs, not clear how we come to this implmenation.
// Constructs a signed message, finds the node responsible for accumulation, and either sends
// this node a signature or tries to accumulate signatures for this message (on success, the
// accumulator handles or forwards the message).
fn send_routing_message_via_route(
&mut self,
routing_msg: RoutingMessage,
route: u8,
expires_at: Option<Instant>,
) -> Result<(), RoutingError> {
if !self.in_authority(&routing_msg.src) {
if route == 0 {
log_or_panic!(
LogLevel::Error,
"{} Not part of the source authority. Not sending message {:?}.",
self,
routing_msg
);
}
return Ok(());
}
if !self.add_to_pending_acks(&routing_msg, route, expires_at) {
debug!(
"{} already received an ack for {:?} - so not resending it.",
self, routing_msg
);
return Ok(());
}
use crate::routing_table::Authority::*;
let sending_sec = match routing_msg.src {
ClientManager(_) | NaeManager(_) | NodeManager(_) | ManagedNode(_) | Section(_)
if self.chain.is_member() =>
{
Some(self.chain.our_info().clone())
}
PrefixSection(ref pfx) if self.chain.is_member() => {
let src_section = match self.chain.our_info_for_prefix(pfx) {
Some(a) => a.clone(),
None => {
// Can no longer represent sending Pfx.
return Ok(());
}
};
Some(src_section)
}
Client { .. } => None,
_ => {
// Cannot send routing msgs as a Node until established.
return Ok(());
}
};
if route > 0 {
trace!(
"{} Resending Msg: {:?} via route: {} and src_section: {:?}",
self,
routing_msg,
route,
sending_sec
);
}
let signed_msg = SignedMessage::new(routing_msg, &self.full_id, sending_sec)?;
match self.get_signature_target(&signed_msg.routing_message().src, route) {
None => Ok(()),
Some(our_name) if our_name == *self.name() => {
let min_section_size = self.min_section_size();
if let Some((mut msg, route)) =
self.sig_accumulator
.add_message(signed_msg, min_section_size, route)
{
if self.in_authority(&msg.routing_message().dst) {
self.handle_signed_message(msg, route, our_name, &BTreeSet::new())?;
} else {
self.send_signed_message(&mut msg, route, &our_name, &BTreeSet::new())?;
}
}
Ok(())
}
Some(target_name) => {
if let Some(&pub_id) = self.peer_mgr.get_pub_id(&target_name) {
let direct_msg = signed_msg
.routing_message()
.to_signature(self.full_id.signing_private_key())?;
self.send_direct_message(pub_id, direct_msg);
Ok(())
} else {
Err(RoutingError::RoutingTable(RoutingTableError::NoSuchPeer))
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as #1616 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't resolved yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks really great.
A couple of things that are a bit difficult to understand without help, but nothing major.
ca95759
to
2a8a0a8
Compare
2a8a0a8
to
0e97a30
Compare
0e97a30
to
f5253b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked at last commit, it looks good, do you want to squash these last 2 changes together?
098e416
to
6317c3c
Compare
df85895
to
ea68e1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not done reviewing: I was a bit fast on the largest commit about extracting EstablishingNodeState
, so will now focus on it.
Here are my comments for the other commits in the meantime, so we can parallelize our work 🙂
src/states/mod.rs
Outdated
// │ │ │ │ │ │ | ||
// Base * * * * * * | ||
// Bootstrapped * * * * * | ||
// NotEstablished * * * * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the name wasn't very clear. Now seeing this, I think BootstrappedNotEstablished
is a better name: consistent with RelocatedNotEstablished and more obviously excluding the Bootstraping
state.
src/states/mod.rs
Outdated
// ▼ ▼ | ||
// ┌────────┐ ┌──────┐ | ||
// │ Client │ │ Node │ | ||
// └────────┘ └──────┘ | ||
// | ||
// | ||
// # Common traits | ||
// Bootstrapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an inconsistency between Bootstrapping
and RelocatingNode
Should we remove the Node
suffix from Relocating
, Proving
and Establishing
? The context of the state_machine is probably sufficient for every one to know these are node states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bootstrapping
can become Client
or Node
, but the other states are all going to become only Node
. I think it makes some sense to have this distinction. On the other hand I don't feel too strongly about it and having shorter names would be nice too. Your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I see the distinction now. Cheers :) I probably don't feel too strongly about it then. Mildly in favour of removing it still, but I really won't insist on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so I wonder: I can remove the Node
suffix, but that would still make the Client
and Node
names inconsistent with the rest, as the rest are verbs in the present continuous tense (Bootstrapping
, Relocating
, Proving
, ...), but Client
and Node
are nouns. I can maybe go the opposite way and rename Node
and Client
to EstablishedNode
and EstablishedClient
, but that would still leave Bootstrapping
out as it can be both node and client. Not sure. Naming stuff is hard. Maybe best to leave this as is for now and maybe return to it in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could switch Bootstrapping
-> BootstrappingPeer
. we normally go with Peer
when common to Node & Client but not like it was enforced.
pub(super) full_id: FullId, | ||
pub(super) min_section_size: usize, | ||
pub(super) timer: Timer, | ||
action_sender: RoutingActionSender, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to just use composition to avoid repeating all these fields?
src/states/proving_node.rs
Outdated
pub struct ProvingNode { | ||
pub(super) ack_mgr: AckManager, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to just use composition to avoid repeating all these fields?
pub proxy_pub_id: PublicId, | ||
pub timer: Timer, | ||
} | ||
|
||
pub struct RelocatingNode { | ||
action_sender: RoutingActionSender, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to just use composition to avoid repeating all these fields?
pub routing_msg_filter: RoutingMessageFilter, | ||
pub timer: Timer, | ||
} | ||
|
||
pub struct Node { | ||
ack_mgr: AckManager, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to just use composition to avoid repeating all these fields?
pub proxy_pub_id: PublicId, | ||
pub timer: Timer, | ||
} | ||
|
||
/// A node connecting a user to the network, as opposed to a routing / data storage node. | ||
/// | ||
/// Each client has a _proxy_: a node through which all requests are routed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to just use composition to avoid repeating all these fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once soak the tests pass and the existing comments are addressed, I'm good with this PR in general 😄
src/states/mod.rs
Outdated
// ▼ ▼ | ||
// ┌────────┐ ┌──────┐ | ||
// │ Client │ │ Node │ | ||
// └────────┘ └──────┘ | ||
// | ||
// | ||
// # Common traits | ||
// Bootstrapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I see the distinction now. Cheers :) I probably don't feel too strongly about it then. Mildly in favour of removing it still, but I really won't insist on it.
4518368
to
f5ed47a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Very big so we will have to trust soak test to find any issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me so far. Not approving yet as Viv had a comment about another improvement to do with fixing the tests. Let's hear him out first 😄
@@ -184,10 +184,21 @@ where | |||
unimplemented!() | |||
} | |||
|
|||
// TODO: rename this to `has_unpolled_observations` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't done yet because you would want to update Parsec to one that exposes the new API in the same PR?
I would add this to the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd like to make the change in Parsec first, to avoid the situation where routing becomes uncompilable with --features=mock_base
(that is, with real parsec).
4746613
to
2453690
Compare
- Try to reconnect to a dropped peer immediatelly, not delayed - Change `has_unconsensused_observations` to `has_unpolled_observations` to prevent terminating the tests prematurely - If node loses all connections, emit `RestartRequired` even if we are the first node (unless during network startup). - Randomize the drop order in the node_restart test to make it more likely to catch bugs
Also move send_signed_message_to_peer from Relocated back to Node, as it is only used there.
So all state names are nouns now, for consistency.
Change the return type of the various `into_` function to `Result<State>` so in case of error we return `Err` instead of silently ignoring it and returning `State::Terminated`.
2453690
to
2dd16fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the push forces and the Fix failures in node_drops, node_restart tests
commit
This is a refactoring PR with almost no functional changes. Exceptions are some minor changes for the sake of simplicity, clarity or performance. They are listed below:
EstablishingNode
receivesNodeApproval
, it backlogs it and once it becomesNode
it logs and ignores it. Previously it would immediately log and ignore it.EstablishingNode
,handle_hop_message
andsend_routing_message_via_route
now work the same way as inProvingNode
as opposed to the way they work inNode
. This makes it more explicit what the responsibilities of each of the states are.