Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consensus Refactor #1941

Closed
wants to merge 43 commits into from

Conversation

Projects
None yet
7 participants
@bachase
Copy link
Contributor

bachase commented Dec 21, 2016

This PR encompasses the next phase in the consensus refactoring effort (RIPD-1011/RIPD-1342). As a reminder, the goal is to create a generic version of the consensus algorithm for use in a testing and simulation framework. This phase achieves that goal with a modest amount of refactoring and introduces some proof of concept unit tests. The current design can certainly be improved, but I am deferring on more substantial refactoring until more robust testing is in place. As such, please focus primarily on the correctness of the changes below. Suggestions on improving the design are appreciated too, but depending on the scope, I may handle them in a subsequent refactoring.

I've tried to break each changeset into a palatable chunk, either migrating one piece of functionality to the new design or refactoring specific behavior. The unit tests and documentation are added in the final two changesets.

At a high-level,

  • Generic consensus code is in src/ripple/consensus
    • Consensus<Derived,Traits> is the generic implementation and uses CRTP to define the interface
      for specific
    • The Traits class defines types representing transactions, transactions sets, ledgers and proposals that are operated on in the generic code.
  • Ripple consensus implementation is in src/ripple/app/consensus
    • RCLConsensus : public Consensus<RCLConsensus, RCLCxTraits> implements the CRTP interface to hook into rippled application
    • RCLCxTx, RCLCxTxSet, RCLCxLedger and LedgerProposal are the types defined in RCLCxTraits and conform to type requirements set by generic Consensus.
  • The current functionality in LedgerConsensusImp and ConsensusImp is migrated to the new framework, with generic functionality in Consensus and application-specific functionality in RCLConsensus.

Doc preview: http://bachase.github.io/ripd1011/index.html

@bachase bachase requested a review from JoelKatz Dec 21, 2016

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Dec 21, 2016

Codecov Report

Merging #1941 into develop will increase coverage by 0.75%.
The diff coverage is 72.59%.

@@             Coverage Diff             @@
##           develop    #1941      +/-   ##
===========================================
+ Coverage    67.25%   68.01%   +0.75%     
===========================================
  Files          683      681       -2     
  Lines        49231    49224       -7     
===========================================
+ Hits         33112    33479     +367     
+ Misses       16119    15745     -374
Impacted Files Coverage Δ
src/ripple/app/main/Application.cpp 60.26% <ø> (-0.11%)
src/ripple/app/misc/Validations.cpp 68.3% <ø> (ø)
src/ripple/app/ledger/Ledger.cpp 81.36% <ø> (ø)
src/ripple/app/misc/NetworkOPs.h 100% <ø> (ø)
src/ripple/overlay/impl/PeerImp.h 0% <ø> (ø)
src/ripple/protocol/STValidation.h 52.38% <ø> (ø)
src/ripple/app/consensus/RCLCxPeerPos.h 0% <0%> (ø)
src/ripple/app/consensus/RCLCxPeerPos.cpp 3.44% <0%> (ø)
src/ripple/overlay/impl/PeerImp.cpp 0% <0%> (ø)
src/ripple/app/ledger/impl/LedgerMaster.cpp 43.86% <0%> (-0.78%)
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 460dd8f...88766d5. Read the comment docs.

@bachase bachase force-pushed the bachase:consensus_prep branch from 628d3b6 to b4a0c83 Dec 21, 2016

@nbougalis
Copy link
Member

nbougalis left a comment

I reviewed each commit independently, so it's possible some of my comments were addressed in subsequent commits.

No show-stoppers that I could see.

auto peerCutoff = app_.timeKeeper().closeTime();
auto ourCutoff = peerCutoff - PROPOSE_INTERVAL;
peerCutoff -= PROPOSE_FRESHNESS;
auto peerCutoff = now_ - PROPOSE_FRESHNESS;

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

This should be const

@@ -938,7 +943,6 @@ void LedgerConsensusImp<Traits>::accept (TxSet_t const& set)
// suppress it if we receive it - FIXME: wrong suppression
app_.getHashRouter ().addSuppression (signingHash);
app_.getValidations ().addValidation (v, "local");
consensus_.setLastValidation (v);
Blob validation = v->getSigned ();

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

General comment: getSigned is poorly named. It gets a serialized version of whatever is in v. It's got nothing to do with anything "signed".

This comment has been minimized.

Copy link
@bachase

bachase Dec 27, 2016

Author Contributor

This is the only usage of that member; switching to asBlob for now.

This comment has been minimized.

Copy link
@bachase

bachase Dec 27, 2016

Author Contributor

Actually, getSerialized makes more sense for now.


MutableRCLTxSet (RCLTxSet const&);
/** Constructr

This comment has been minimized.

Copy link
@nbougalis
//==============================================================================
#include <BeastConfig.h>

#include <ripple/consensus/LedgerTiming.cpp>

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

Missing newline at EOF

return mCurrentHash;
}
uint256 const& getPrevLedger () const
Blob const& getSignature () const

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

Please consider using Buffer for the member variable, and Slice for the return type.

private:
Derived &
impl()

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

Indentation

if (auto buildLCL = impl().acquireLedger(prevLedgerHash_))
{
JLOG (j_.info()) <<
"Have the consensus ledger " << prevLedgerHash_;

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

Indentation

@param proposal Proposed peer position
@param nodeID ID of peer
*/

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

Indentation

void Consensus<Derived, Traits>::updateOurPositions ()
{
// Compute a cutoff time
auto peerCutoff = now_ - PROPOSE_FRESHNESS;

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

Should be const


// Update votes on disputed transactions
{
boost::optional <TxSet_t> changedSet;

This comment has been minimized.

Copy link
@nbougalis

nbougalis Dec 26, 2016

Member

I don't understand the need for changedSet at all. What's the point and why can't we use ourNewSet directly?

This comment has been minimized.

Copy link
@bachase

bachase Dec 27, 2016

Author Contributor

You are correct. This was due to incomplete refactoring on my part. Previously, there was a distinct MutableTxSet_t type that complemented TxSet_t and provided a type-restricted interface for adding/removing transactions. So changedSet was of type boost::optional<MutableTxSet_t> and was need to do the type transformation.

I ended up simplifying and adding the mutating capability to TxSet_t instead but forgot to simplify this spot.

@bachase bachase force-pushed the bachase:consensus_prep branch from b4a0c83 to 8561529 Dec 27, 2016

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Dec 27, 2016

Rebased on 0.50.0-b1 and addressed PR comments. Separate update forthcoming to address the dual use of ConsensusPosition and LedgerProposal

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Dec 28, 2016

Added 6d635e4 to refactor LedgerProposal and ConsensusPosition as suggested by @nbougalis. The idea now is that generic Consensus only cares about ConsensusPosition, which is suitable for either our own or our peer's position. For RCL version of consensus, a new RCLCxPeerPos represents a signed ConsensusPosition from a peer.

There is only one aspect of the change I don't like, and that was having to add a conversion operator back to ConsensusPosition. This is because Consensus only cares about the ConsensusPosition, but can still request the specific implementation to relay the original peer position.

Updated docs http://bachase.github.io/ripd1011/index.html

{
if (!ourNewSet)
ourNewSet = *ourSet_;

This comment has been minimized.

Copy link
@JoelKatz

JoelKatz Dec 29, 2016

Member

Somehow, this isn't taking a mutable snapshot, so when we call insert or erase, we're trying to modify an immutable, shared map.

@JoelKatz

This comment has been minimized.

Copy link
Member

JoelKatz commented Dec 29, 2016

Pushed a change to ensure that consensusBuilt gets called. With this change, my server consistently reports that it built a ledger a split second before confirming it fully validated it. This is perfect.

@bachase bachase force-pushed the bachase:consensus_prep branch from 4698214 to 38944bc Jan 3, 2017

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Jan 3, 2017

Rebased on 0.50.0-b2

@bachase bachase force-pushed the bachase:consensus_prep branch from 38944bc to dcf3148 Jan 3, 2017

@bachase bachase force-pushed the bachase:consensus_prep branch from a147d78 to a7b0fc1 Jan 17, 2017

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Jan 17, 2017

Rebased on 0.50.0-b5

@bachase bachase force-pushed the bachase:consensus_prep branch from a7b0fc1 to afc9861 Jan 19, 2017

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Jan 19, 2017

Rebased on 0.50.0-rc1

@param h The hash of the ledger of interest
@return the number of proposers that validated a ledger
*/
int

This comment has been minimized.

Copy link
@nbougalis

nbougalis Jan 27, 2017

Member

I'm curious why we return int from here? Negative values carry no special meaning, so it seems that std::size_t is a more fitting type.

Also, just to play the devil's advocate, does the num prefix add anything here? I think the function is just as clear without it. I'm not saying we should change it, but maybe take another look with fresh eyes?

This comment has been minimized.

Copy link
@bachase

bachase Jan 27, 2017

Author Contributor

Both good points; I will fix.


if (validating_)
validating_ = ledgerMaster_.isCompatible(*sharedLCL.ledger_,
app_.journal("LedgerConsensus").warn(),

This comment has been minimized.

Copy link
@nbougalis

nbougalis Jan 27, 2017

Member

Indentation in this and the next line.

else
{
// We agreed on a close time

This comment has been minimized.

Copy link
@nbougalis

nbougalis Jan 27, 2017

Member

Sorry to nitpick, but too much whitespace here.

CanonicalTXSet retriableTxs{ set.id() };

auto sharedLCL
= buildLCL(previousLedger_, set,

This comment has been minimized.

Copy link
@nbougalis

nbougalis Jan 27, 2017

Member

Nitpicking: unindent this line and the next two please.

{
// FIXME: Use median, not average
JLOG (j_.info())
<< beast::lexicalCastThrow <std::string> (p.second)

This comment has been minimized.

Copy link
@nbougalis

nbougalis Jan 27, 2017

Member

These two calls to beast::lexicalCastThrow can just be replaced with std::to_string.

This comment has been minimized.

Copy link
@bachase

bachase Jan 27, 2017

Author Contributor

This was just maintaining what was there before, but I agree that throwing doesn't make sense anyway.

However, @vinniefalco has advocated for boost::lexical_cast instead of std::to_string. Should I go with that instead?

This comment has been minimized.

Copy link
@HowardHinnant

HowardHinnant Jan 27, 2017

Contributor

I'm not going to loose any sleep over this, but my general recommendation is that when there is a std facility and boost facility that do the same thing, prefer the std facility. It is likely to be better specified in the corner cases, and less likely to have a bug in it.

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Jan 27, 2017

Contributor

forgot when I advocated for lexical_cast but it wasn't for converting integers to strings (use std::to_string for that)

else
{
// Don't advertise ledgers we're not willing to serve
std::uint32_t early = ledgerMaster_.getEarliestFetch ();

This comment has been minimized.

Copy link
@nbougalis

nbougalis Jan 27, 2017

Member

uMin = std::max(uMin, ledgerMaster_.getEarliestFetch ());

@return Whether we should continue validating
*/
bool
accept(

This comment has been minimized.

Copy link
@nbougalis

nbougalis Jan 27, 2017

Member

Just a general comment: functions which accept 15 arguments, 4 of which are bool are almost impossible to use right or, really audit. Perhaps it can't be helped in this instance, but we should try to simplify this.

This comment has been minimized.

Copy link
@bachase

bachase Jan 27, 2017

Author Contributor

I agree (see the comment at the top of the function). This was one of the major remaining warts. I have some ideas about aggregating some of these into more useful types. After working more on the unit test framework, I also think some arguments are not really needed by the generic implementation and could be managed inside RCLConsensus. I'll work on solidifying those and see if it is reasonable to make the changes now or to what until more tests are in place.

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Jan 27, 2017

Contributor

The question, is how likely is it that someone will, in the future, add one or more new call sites for the function with a large parameter list? If the answer is "almost never" then its not worth investing time in.

This comment has been minimized.

Copy link
@bachase

bachase Jan 27, 2017

Author Contributor

For now, its "almost never" since the only caller is the generic consensus code itself and only in one spot

@@ -56,16 +56,16 @@ class LedgerConsensus : public Traits

virtual LgrID_t getLCL () = 0;

virtual void gotMap (TxSet_t const& map) = 0;
virtual void gotMap (Time_t const& now, TxSet_t const& map) = 0;

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Jan 27, 2017

Contributor

Why Time_t instead of NetClock::time_opint?

This comment has been minimized.

Copy link
@bachase

bachase Jan 27, 2017

Author Contributor

This is now here, but still uses a time point specified in the traits class so that the unit test version is free to use whatever clock it wants for network time.

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Jan 27, 2017

Contributor

I'm not sure we need that customization point, as long as the implementation gets its time from abstract_clock<NetClock> then unit tests can provide their own clock.

This comment has been minimized.

Copy link
@bachase

bachase Jan 27, 2017

Author Contributor

Consensus actually deals with network times (NetClock::time_point) and internal times (std::chrono::steady_clock). The latter is used to measure the progress of consensus locally and independent of the network time.

@JoelKatz, is it important for the internal time to be independent and have higher resolution?

This comment has been minimized.

Copy link
@bachase

bachase Jan 27, 2017

Author Contributor

On further thought, all the instances of the NetClock times are really the next close time and are provided as an argument to public methods of Consensus. So I can do as you suggest and drop the customization point. The unit test can deal with network times as it wants to by converting values manual_clock<std::chrono::steady_clock> to NetClock::time_point. In the future, it could model the network clock drift explicitly if helpful.

@bachase bachase force-pushed the bachase:consensus_prep branch from 5aa6bda to 0ed400d Jan 27, 2017

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Jan 27, 2017

Rebased on 0.50.0, dropped the Time_t trait from consensus and added the latest incarnation of the testing and simulation framework.

@bachase bachase force-pushed the bachase:consensus_prep branch 2 times, most recently from b2420f0 to e98c93d Jan 27, 2017

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Jan 30, 2017

Rebased on 0.50.1

@ximinez

This comment has been minimized.

Copy link
Contributor

ximinez commented Jan 30, 2017

I was looking through the doc preview, and noticed some problematic formatting. For example, on https://bachase.github.io/ripd1011/rippled/ref/ripple__Consensus.html, in the Description section, the lists are coming out as one paragraph. I think you need /li instead of *. I also notice the Template Parameters section didn't render the links correctly. Not sure how to fix that.

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Jan 30, 2017

@ximinez I had created vinniefalco/docca#15 to handle the list issue but I totally forgot about it. I'll take a closer look at the template parameter issue.

@vinniefalco

This comment has been minimized.

Copy link
Contributor

vinniefalco commented Jan 30, 2017

Lists have to be specified like this:

/** ...

    @li List item 1
    @li List item 2
*/
@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Jan 30, 2017

My PR for docca was actually just for nested list support. My issue here was missing whitespace. I updated the docs and https://bachase.github.io/ripd1011/rippled/ref/ripple__Consensus.html looks much better.

Doxygen actually supports markdown style lists and the existing xsl handles it, so no need for @li unless you prefer it.

@ximinez to handle the poor rendering in the template section, I manually changed https://github.com/vinniefalco/docca/blob/master/include/docca/doxygen.xsl#L754 to instead render just like the mode=markup section above https://github.com/vinniefalco/docca/blob/master/include/docca/doxygen.xsl#L710. I'm not sure what else that impacts though.

@bachase bachase force-pushed the bachase:consensus_prep branch from 1de38bf to 280e37b Jan 31, 2017

bachase and others added some commits Dec 20, 2016

[FOLD] Migrate LedgerConsensusImp::accept calls
* RCLConsensus::dispatchAccept for dispatch accept job to queue
* Transfered (large) LedgerConsensusImp::accept to RCLConsensus
  * Broke out validate and buildLCL stages as member functions.
  * Replay logic entirely within buildLCL
[FOLD] Refactor LedgerPosition to RCLCxPeerPos:
RCLCxPeerPos corresponds to a signed ConsensusProposal
from a peer.  This replaces the LedgerProposal type that
appeared to also represent our own unsigned ConsensusProposal.

Additionally, updates variable/type names for consistency:

   *  `proposal` is an instance of ConsensusProposal
      `proposal.position()` is the proposed position
   *  `peerPos` is an instance of RCLCxPeerPos
       peerPos.proposal() is the peer proposal
[FOLD] Add MutableTxSet:
Fixes the bug when trying to insert/erase RCLTxSet
entries in `updateOurPositions` when the underlying
SHAMap is immutable.
[FOLD] Improve testing and simulation framework:
* Break out consensus simulation framework (csf) components.
* Add unit tests for 2-cliques
* Add simulations for clock-skew and scale-free network.
[FOLD] Separate open and converge time:
* Adds openTime_ in Consensus to measure the duration the ledger is open.
* Changes roundTime_ to only measure progress of consensus,i.e. post close
* Renames closePercent_t to convergePercent_ to make clear it measures
  how far along consensus is in converging (excluding open ledger interval)
[FOLD] Drop NetTime_t trait:
Always use NetClock::time_point for network times in
generic consensus.
[FOLD] Remove getJSON call
Remove getJSON call in generic Consensus::accept in favor of RCLConsensus
calling. This avoids imposing an expensive getJSON call on all consensus
implementations.
[FOLD] Fix consensus edge cases:
Prevent consensus stopping if NetworkOPs switches to disconnect
mode while consensus accepts a ledger.  If accept completes while
disconnected, the next round was not started, so returning to a connected
state would not drive consensus forward.

Also prevent a corner case in which Consensus::gotTxSet or
Consensus::peerProposal has the potential to update internal state
while an dispatched accept job is running.  Although the former acquire
locks, accept waits to grab the lock until it is almost done to limit
blocking, but this introduced a minor race condition.
[FOLD] Only reset proposing_ on new round
Distinguish external and internal calls to startNewRound.  Only
external calls can reset the proposing_ state of consensus.
@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Mar 1, 2017

Rebased on 0.60.0-b5

@bachase bachase force-pushed the bachase:consensus_prep branch from 8688e6a to 88766d5 Mar 1, 2017

@bachase

This comment has been minimized.

Copy link
Contributor Author

bachase commented Mar 7, 2017

Replaced with #2040

@bachase bachase closed this Mar 7, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.