p2p-governor: fix various P2P governor bugs (with tests) #2526
Conversation
As discussed in relation to other similar bugs, I'd like us to adjust the tests so that they can catch these bugs, and then fix them.
@dcoutts by the end of the day I will put here all the changes that are now in
Force-pushed fd3769d to dadfabd
Force-pushed b9ac698 to b8a8f19
Force-pushed b8a8f19 to 8a1cc19
2658: Network version: NodeToNodeV_4 r=coot a=coot

`NodeToNodeV_4` extends `NodeToNodeVersionData` with a `DiffusionMode` flag. This flag decides whether a node runs a responder or not, and is negotiated along with the other connection options using the `Handshake` protocol. It is also passed through `DiffusionArguments`: if `InitiatorOnlyDiffusionMode` is used, the diffusion layer will not run the responder. `NodeToNodeV_4` is not used by the node yet. It can be taken up by `p2p-node` (in PR #2526) or by consensus, whoever needs it first.

Handshake is extended to return 'agreedOptions'. For 'NodeToNode' we then abuse it by allowing the existential 'vData' to escape. This is acceptable because in the near future we will refactor the handshake so that it no longer needs its dependent-type machinery. Instead we use a version-aware codec for 'NodeToNodeVersionData', which this PR introduces (together with tests). Note that `InitiatorAndResponderDiffusionMode` will be enforced by the codec if the version is strictly lower than 'NodeToNodeV_4'.

Co-authored-by: Marcin Szamotulski <profunctor@pm.me>
Force-pushed 179331e to 163162d
Force-pushed 163162d to 76b2207
This is generally really good, and a very cleanly constructed series of patches, which made it much easier to review than many PRs I've seen.
I think most of my comments are about the EstablishedPeers stuff. We could probably merge the earlier patches pretty quickly if that's easier.
let (failCount, knownPeers') = KnownPeers.incrementFailCount
                                 peeraddr
                                 (knownPeers st)

-- exponential backoff: 5s, 10s, 20s, 40s, 80s, 160s.
delay :: DiffTime
delay = fromIntegral $
          2 ^ (pred failCount `min` maxColdPeerRetryBackoff) * baseColdPeerRetryDiffTime
Could we structure this so it's more similar to the existing backoff code in the RootPeers job handler? It'd make it easier to see that we're doing the same thing, or what the differences in the parameters are.
Why do we want to pick 2^5 * 5 = 160 as the max time limit? For the root peers backoff we backed off for a max of around an hour. I'm not saying this is wrong, just that we should probably say something about why we chose these numbers.
Could we structure this so it's more similar to the existing backoff code in the RootPeers job handler? It'd make it easier to see that we're doing the same thing, or what the differences in the parameters are.
You'd like to count the number of consecutive failures or successes? This could be a useful metric. So instead of incrementFailCount / resetFailCount we'll use decrementBackoff / incrementBackoff.
Why do we want to pick 2^5 * 5 = 160 as the max time limit? For the root peers backoff we backed off for a max of around an hour. I'm not saying this is wrong, just that we should probably say something about why we chose these numbers.
Let's see: if we have 100 cold peers and 30% are offline, then we will try to reconnect to one of them every 160/30 = 5.33s (assuming that the policy keeps asking for the 30% offline peers, and that the governor is constantly looking for more warm peers). Do you think we should have a slower asymptotic behaviour?
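For illustration, the delay series implied by the snippet under discussion can be sketched as below. The parameter values (`baseColdPeerRetryDiffTime = 5`, `maxColdPeerRetryBackoff = 5`) are assumptions inferred from the "5s, 10s, 20s, 40s, 80s, 160s" comment in the diff, not taken from the actual code.

```haskell
-- Sketch of the cold-peer retry backoff discussed above.
-- Parameter values are ASSUMED from the "5s, 10s, 20s, 40s, 80s, 160s" comment.

baseColdPeerRetryDiffTime :: Int
baseColdPeerRetryDiffTime = 5    -- seconds (assumed)

maxColdPeerRetryBackoff :: Int
maxColdPeerRetryBackoff = 5      -- caps the exponent: max delay = 2^5 * 5 = 160s

-- Delay in seconds after the n-th consecutive failure (n >= 1).
coldPeerRetryDelay :: Int -> Int
coldPeerRetryDelay failCount =
  2 ^ (pred failCount `min` maxColdPeerRetryBackoff) * baseColdPeerRetryDiffTime
```

With 100 cold peers of which 30 are offline and all at the 160s cap, a reconnect attempt to one of them happens roughly every 160/30 ≈ 5.3s, matching the estimate above.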
I really just meant the structure of the code should be similar, since as far as I understand it they're using the same way of counting failure and same backoff strategy.
Let's include that explanation in the code next to where we pick the backoff parameters.
I pushed the refactoring in 1f90e16. I think it makes sense to use decrementBackoff and incrementBackoff.
prop_governor_overlapping_local_and_root_peers :: Property
prop_governor_overlapping_local_and_root_peers = prop_governor_nolivelock env
So perhaps this was a bug then?
The unpatched assertion indeed fails on this test case, so the commit message is right: the local and root peers might overlap. I checked that in this test case the local and public peers of env :: GovernorMockEnvironment intersect. So it's actually the test generator that should be improved.
I'm still not sure I follow.
Which bug did this test case check for? Which patch fixed it?
To be honest I am not totally sure why I originally included this test case. The environment in this test case contains a non-trivial intersection of local and public root peers, and the test succeeds. This means that the governor keeps the local and root peer sets non-overlapping.
| Map.null (EstablishedPeers.establishedReady establishedPeers)
= GuardedSkip (Min <$> EstablishedPeers.minActivateTime establishedPeers)
I guess so. This whole thing with a time delay on the re-promotion from warm to hot is rather crude.
It's just one fixed delay, when in the end what we'd want is to look at the tip sample protocol result and only consider promoting if that tells us that the node might be a good hot candidate again.
I mean it's not terrible, and it's probably ok for now, but we should think about the longer term too. Perhaps what we want is to allow the policy for picking warm peers to promote to be allowed to make no progress, if none of the current warm peers are suitable. Then we could have a policy that relies on looking at the tip sample results.
If we went that direction we'd need a governor change to let it deal with warm->hot picking policies that do not always make progress. It'd need some back-off and retry there, so it can try again once we have some more info or some new candidates.
This whole thing with a time delay on the re-promotion from warm to hot is rather crude.
What is progress? A few possible choices:
- good progress: adding a peer from whom we have a chance of getting a new header / block fastest with some reasonable chance of success
- fake progress: adding a peer from whom we have little chance of getting a new header before getting it from any other peer.
- governor progress: progress from the governor perspective (I think this is the meaning you're referring to)
Is fake progress a bad thing? Maybe not: having more hot peers allows transactions to flow faster between peers, and since SPOs are prepared for all their hot peers to be good (able to deliver both blocks and txs), they will be prepared for fake ones too. But on the other hand, if we organise the network based on the dissemination of headers / blocks, then transactions should have similarly good propagation characteristics (they flow in the opposite direction, but along the same paths as headers). So we don't need to worry about them.
As for governor progress, I think we should allow no progress, for the above reasons.
Technically it shouldn't be difficult to add additional metrics to KnownPeerInfo
available in policies that can be read from an external store, it just needs an STM
interface, but indeed we'd need to extend the governor side. When no progress is made for a longer period, maybe it should do more aggressive churn of warm peers. But we should avoid making it too complex.
It'd need some back-off and retry there, so it can try again once we have some more info or some new candidates.
The current implementation will retry as long as it's under its targets and as soon as there are available warm peers. Are you thinking about changing the rate of governor adaptation? The information about peers will change slowly, and we want the governor to adapt on a similar timescale, which could be slower than the changes in the environment (connection quality fluctuations, peers becoming busy at times, etc.).
Force-pushed 1f90e16 to 15a429b
It's good to keep the peer state change callbacks in a separate record; they will likely not change much as we go towards decentralisation.
This is only enabled in the re-enabled `prop_governor_nolivelock`. Running the test throws an exception (failed assertion); this is a bug fixed in the next commit. If during an asynchronous demotion a peer is demoted to 'ColdPeer', we interpret this as an error and throw an exception. This corresponds to a real scenario. This is important because otherwise the governor would return and finish the transition without running its error handlers, which bring the governor to the right state. The above corresponds to the property of 'PeerStateActions': on failures the peer state changes to 'PeerCold'.
'setConnectTime' is supposed to update 'availableToConnect' and 'nextConnectTimes' rather than 'availableForGossip' and 'nextGossipTimes'.
When checking if we need to update the state, we need to take 'nextConnectTimes' into account.
Force-pushed 182ae16 to a37383f
When a connection errors, the governor will see it as a demotion to a cold peer. In such a case we update the re-connect time as well as incrementing the fail count.
The connection monitoring needs to update inProgressPromoteWarm, by removing peers that fail to be promoted.
It's more useful to get assertion failures where the state changes. This is only done for some of the state changes.
Use Debug.Trace to annotate the invariant. This gives better debugging information.
EstablishedPeers keeps track of established peer connections and their status. It uses the same logic as KnownPeers to track peers which are not ready to be promoted to the hot state. This patch just replaces the original logic; it is preparation for the following patch.
When `chain-sync` returns, we use `activateDelay :: DiffTime` to delay re-promotion of that peer to `hot` again.
TracePromoteWarmFailed - show target & number of active peers
TracePromoteWarmDone   - show target & number of active peers
TraceDemoteHotFailed   - show target & number of active peers
TraceDemoteHotDone     - show target & number of active peers
TracePromoteColdFailed - show target & number of established peers
TracePromoteColdDone   - show target & number of established peers
TraceDemoteWarmFailed  - show target & number of established peers
TraceDemoteWarmDone    - show target & number of established peers
We do not want to check that the available sets are non-empty since these conditions are checked at the call sites, and the condition at the call sites are more subtle, since it's other subsets that have to be considered.
To match names used in other abstractions like KnownPeers.
The EstablishedPeers contained a Map that kept track of the cold/warm/hot status of each peer. This is redundant information since it can be derived from the known, established and active peer sets. We can simplify things a bit by eliminating the tracking of this redundant information. The info was only being used in one place in the implementation (to determine the async demotions) and in one test (to compare the environment and governor's views of the peers status). So those places have been changed to use the established and active peer sets.
We do not use it in any existing policy. If we need it, we can derive the info by other means, and do so on demand, as policies need it. We do not need to maintain it incrementally.
Turns out we don't currently use any of the KnownPeerInfo in our policies.
We do not use it in any existing policy. If we need it, we can derive the info by other means, and do so on demand, as policies need it. We do not need to maintain it incrementally. This also fixes a bug in the way we were incrementally maintaining the knownPeerAdvertise info. Not tracking it at all is the simple solution.
So also remove KnownPeers.toMap and switch the last remaining uses of it over to toSet.
Force-pushed a37383f to 3d3068b
I've made a few changes on top of Marcin's original:
- I've dropped the patch that relaxed the invariant to knownPeerSource. I wasn't happy with relaxing the invariant, since my view is that it was simply accepting that the results were wrong. So that also dropped the huge unit test (a saved test that QC found)
- I've removed most of the info in the KnownPeerInfo since it was redundant. It was incrementally maintaining info that we could calculate from other canonical state. For the peer source it was being maintained incorrectly (the above point). So that removed the peer source and the advertising state.
- I've stopped passing the KnownPeerInfo to the pick policy functions. We can add any of this info back if we need it later, but in an on-demand style as an attribute lookup function.
- This meant adjusting the "available peers" calculations to use sets rather than maps.
@@ -92,6 +92,7 @@ belowTarget actions
-- If we could connect except that there are no peers currently available
-- then we return the next wakeup time (if any)
| numEstablishedPeers + numConnectInProgress < targetNumberOfEstablishedPeers
, Set.null (availableToConnect Set.\\ inProgressOrEstablishedPeers)
I think we don't need this because it is redundant.
The guard above was
| numEstablishedPeers + numConnectInProgress < targetNumberOfEstablishedPeers
, Set.size availableToConnect - numEstablishedPeers - numConnectInProgress > 0
and our guard here repeats the first clause. So we know already that the second clause of the guard is false. Thus we know that
Set.size availableToConnect - numEstablishedPeers - numConnectInProgress = 0
(it cannot be negative since the established and in-progress ones are a subset of the available ones).
But what is this guard?
Set.null (availableToConnect Set.\\ inProgressOrEstablishedPeers)
Isn't it exactly saying that the number available to connect, minus the ones that are established or in progress is zero?
Again, we know this because the established and in-progress are a subset of the available, and they don't overlap with each other. So in this case counting becomes equivalent to the set operations.
And so this is exactly the negation of the previous guard, which we already knew!
QED.
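The counting argument above can be double-checked with a small sketch (hypothetical names, using `Data.Set` from the containers package): whenever the established and in-progress peers are disjoint subsets of the available peers, the set-difference guard and the counting guard necessarily agree.

```haskell
import qualified Data.Set as Set

-- Hypothetical check of the equivalence argued above: given the preconditions
-- (established and in-progress peers are disjoint subsets of the available
-- peers), the set guard
--   Set.null (available Set.\\ (established `Set.union` inProgress))
-- agrees with the counting guard
--   Set.size available - Set.size established - Set.size inProgress == 0
guardsAgree :: Ord a => Set.Set a -> Set.Set a -> Set.Set a -> Bool
guardsAgree available established inProgress =
    setGuard == countGuard
  where
    setGuard   = Set.null (available Set.\\ (established `Set.union` inProgress))
    countGuard = Set.size available
               - Set.size established
               - Set.size inProgress == 0
```

For example, with available = {1..5}, established = {1,2}, inProgress = {3}, both guards are False (two peers remain available); with inProgress = {3,4,5}, both are True.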
-- * the script returns 'Noop'
-- * peer demoted to 'PeerCold'
-- + the script returns 'Noop'
-- + peer demoted to 'PeerCold'
I heard there are plans to fix this in haddock :), it will just generate a warning.
btw, the policies used in
LGTM from me (I cannot approve as I am the author of the PR).
bors merge
Build succeeded.