Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull-mode flooding #3479

Merged
merged 11 commits into from
Sep 13, 2022
Merged

Conversation

hidenori-shinohara
Copy link
Contributor

Description

Resolves #3384.

Based on #2432. Some differences in the design are:

  • Implementation of flow control for adverts and demands
  • Use of SHA-256 to hash txns.
  • Round-robin when creating demands

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@marta-lokhova marta-lokhova self-requested a review July 7, 2022 22:04
Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a preliminary pass, thanks for putting this together! Couple of questions, and I think we need to move the code around to appropriate modules.

src/overlay/Peer.h Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Floodgate.cpp Outdated Show resolved Hide resolved
src/overlay/Floodgate.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Show resolved Hide resolved
src/overlay/Floodgate.cpp Outdated Show resolved Hide resolved
src/overlay/Floodgate.h Outdated Show resolved Hide resolved
src/overlay/Floodgate.cpp Outdated Show resolved Hide resolved
src/overlay/Floodgate.h Outdated Show resolved Hide resolved
src/overlay/Floodgate.h Outdated Show resolved Hide resolved
Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the updates! I did another round, and I think there are a few things that can be causing instabilities you're seeing

src/overlay/Peer.h Show resolved Hide resolved
src/overlay/Floodgate.cpp Outdated Show resolved Hide resolved
src/overlay/Floodgate.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 4 times, most recently from 98adf39 to c238aff Compare July 25, 2022 14:35
@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 3 times, most recently from fd66a5b to d8965a6 Compare July 26, 2022 21:18
src/overlay/OverlayMetrics.cpp Outdated Show resolved Hide resolved
src/overlay/OverlayMetrics.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/test/OverlayTests.cpp Show resolved Hide resolved
src/overlay/Floodgate.cpp Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.h Outdated Show resolved Hide resolved
src/overlay/Floodgate.h Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
@MonsieurNicolas MonsieurNicolas added this to In progress in v19.4.0 Jul 28, 2022
@hidenori-shinohara hidenori-shinohara marked this pull request as ready for review August 1, 2022 18:24
@hidenori-shinohara hidenori-shinohara changed the title Draft: Pull-mode flooding Pull-mode flooding Aug 1, 2022
Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good to me, thank you! Just a few minor things. At this point I think @MonsieurNicolas should take a look.

I have one remaining question regarding tx queue flooding that I'd like to get input on. Right now because we have a timer for adverts and a timer for tx flooding, we may end up in an awkward situation where advert flooding is delayed by FLOOD_TX_PERIOD_MS + FLOOD_ADVERT_PERIOD_MS in the worst case scenario, which doesn't seem ideal, but I'm not sure how big of a problem it is. The reason is that we wanted tx advertising to be self-contained in overlay. Otherwise, tx queue needed to trigger advert flushing manually, which would violate abstraction barriers. Touching the tx queue timer itself didn't seem like an option as well, since there's non-trivial functionality to order txs there.

src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.h Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/main/Config.cpp Outdated Show resolved Hide resolved
@marta-lokhova
Copy link
Contributor

a few more things:

  • we're missing the outbound queue drop metrics in this PR. I made the change for testing yesterday, feel free to just grab this commit: marta-lokhova@7bca3c3
  • based on what we discussed, I think doing pure round-robin is not ideal for the demand logic. Specifically, I'm thinking about the current network conditions: we have a limit of 1000 operations per ledger, with 40 ops broadcasted every 200 ms. This means that nodes with many connections (which is very common on the network) may end up sending a lot of small demands to its peers. This defeats the purpose of batching, and can get inefficient. To avoid this, I think it's better to iterate over all shuffled peers, and for each peer either fill the demand completely or until it has no more hashes. This way we have proper batching regardless of the number of connections, and load balancing due to peer randomization.

@hidenori-shinohara
Copy link
Contributor Author

hidenori-shinohara commented Aug 3, 2022

I updated the demand function so it'll take advantage of both shuffling and batching with a simpler implementation. @marta-lokhova let me know what you think

@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 2 times, most recently from bc35de7 to 4b2c8b1 Compare August 3, 2022 17:52
Copy link
Contributor

@MonsieurNicolas MonsieurNicolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed most of the code. Looks pretty good. Comments are mostly around cleanup.

// Pull mode
case FLOOD_ADVERT:
FloodAdvert floodAdvert;
case FLOOD_DEMAND:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am getting an error trying to pull your branch because of the commit referenced by the protocol-next xdr.

fatal: Fetched in submodule path 'src/protocol-next/xdr', but it did not contain c59b1a13a587aa5aaf3bed97c01a5de559ff6650. Direct fetching of that commit failed.

I think we should just merge the xdr changes in master in xdr next when we're ready to merge this PR, that way we can point to a stable commit hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to open a PR in xdr vNext repo and bump version submodule in this PR after the other one is merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a PR stellar/stellar-xdr#24 (not ready to be merged yet)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can merge that other PR (ahead of merging this change) -> 1 action item in that PR for you @hidenori-shinohara

docs/stellar-core_example.cfg Outdated Show resolved Hide resolved
docs/stellar-core_example.cfg Outdated Show resolved Hide resolved
src/herder/TransactionQueue.cpp Outdated Show resolved Hide resolved
src/herder/Herder.h Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Floodgate.cpp Outdated Show resolved Hide resolved
src/overlay/test/FloodTests.cpp Show resolved Hide resolved
src/overlay/OverlayMetrics.cpp Show resolved Hide resolved
src/herder/Herder.h Show resolved Hide resolved
src/herder/TxQueueLimiter.h Outdated Show resolved Hide resolved
src/overlay/OverlayManager.h Show resolved Hide resolved
src/overlay/OverlayManagerImpl.h Show resolved Hide resolved
src/overlay/OverlayManagerImpl.h Outdated Show resolved Hide resolved
src/main/Config.cpp Outdated Show resolved Hide resolved
@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 3 times, most recently from a7848ba to bdded33 Compare August 31, 2022 20:22
src/overlay/OverlayManagerImpl.h Outdated Show resolved Hide resolved
src/protocol-curr/xdr/Stellar-overlay.x Outdated Show resolved Hide resolved
docs/stellar-core_example.cfg Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/test/TxAdvertQueueTests.cpp Show resolved Hide resolved
while (mAdvertQueueTxHashCount > limit)
{
dropped++;
mAdvertQueueTxHashCount -=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you made it signed instead of just following the pattern above: you still need to add asserts in the code base, and now you also introduce a bunch of warnings because int and size_t don't mix well

src/overlay/Peer.cpp Show resolved Hide resolved
OverlayManagerImpl::getMaxDemandSize() const
{
auto const& cfg = mApp.getConfig();
double ledgerCloseTime =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting warning: int64->double conversion (need explicit cast to avoid it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you used long here, but now the warning is int64 -> long; as you're now using bigDivide you can just use int64 here.
(same problem in getMaxAdvertSize)

src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Show resolved Hide resolved
src/overlay/OverlayManagerImpl.cpp Outdated Show resolved Hide resolved
src/overlay/test/TxDemandQueueTests.cpp Outdated Show resolved Hide resolved
src/util/test/TimerTests.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@MonsieurNicolas MonsieurNicolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks very close!

{
msgQInd = 2;
size_t s = msg->floodDemand().txHashes.size();
releaseAssert(s <= INT32_MAX - mDemandQueueTxHashCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

condition is wrong (wrong boundary), and I'm still getting warnings here because you made mDemandQueueTxHashCount an uint32 and the compiler does not like += size_t. We can remove this assert and use size_t instead (pushed a commit for you with those changes as we've been bouncing over warnings that you cannot see)

while (mAdvertQueueTxHashCount > limit)
{
dropped++;
mAdvertQueueTxHashCount -=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still getting warnings because you're using uint32 and doing things like u32 -= u64. I am pushing an update to your branch. there is no need to use u32 when u64 is perfectly fine here (and we don't need to worry about overflow)

@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 3 times, most recently from 80c9c47 to 01ed73e Compare September 2, 2022 21:03
}
},
"sendTxDemand");
getOverlayMetrics().mMessagesDemanded.Mark(demands.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

demans was moved from, so this metric always reports 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed it for CLOG_TRACE above when I was looking at logs, but I completely missed this. Thank you! Fixed it.

@marta-lokhova
Copy link
Contributor

r+ 88da68c

@latobarita latobarita merged commit 0d0cdc8 into stellar:master Sep 13, 2022
v19.4.0 automation moved this from In progress to Done Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

transaction queues: pull mode
5 participants