Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamize trusted validator list and quorum (RIPD-1220): #1842

Closed
wants to merge 8 commits into from

Conversation

@wilsonianb
Copy link

commented Sep 12, 2016

Instead of specifying a static list of trusted validators in the config
or validators file, the configuration can now include trusted validator
list publisher keys and URIs where validator lists are served.
Publisher public keys should use the account public Base58 encoding to
distinguish them from node public keys. The sites are regularly queried
for the latest recommended list of validators from the trusted
publishers.

The trusted validator list and quorum are now reset each consensus
round using the fetched validator lists and the list of recent
validations seen. The validation quorum is no longer configurable.

Validator lists fetched from remote sites are expected to be served as JSON and include the
following fields:

  • "manifest": Base64-encoded serialization of a manifest containing the
    validator publisher's master and signing public keys. This has the
    same format as the [validation_manifest] config section.
  • "blob": Base64-encoded JSON string containing a "sequence" and
    "validators" field. "validators" contains an array of objects with a
    "validation_public_key" field.
    "validation_public_key" must be the master public key.
  • "signature": Hex-encoded signature of the blob using the publisher's
    signing key.
  • "version": 1
  • "refreshInterval" (optional)
@wilsonianb

This comment has been minimized.

Copy link
Author

commented Sep 12, 2016

Here are the docs generated from Manifest.h and ValidatorList.h
https://wilsonianb.github.io/rippled/

@codecov-io

This comment has been minimized.

Copy link

commented Sep 12, 2016

Codecov Report

Merging #1842 into develop will increase coverage by 0.18%.
The diff coverage is 77.38%.

@@             Coverage Diff             @@
##           develop    #1842      +/-   ##
===========================================
+ Coverage    67.25%   67.44%   +0.18%     
===========================================
  Files          683      687       +4     
  Lines        49231    49656     +425     
===========================================
+ Hits         33112    33490     +378     
- Misses       16119    16166      +47
Impacted Files Coverage Δ
src/ripple/rpc/impl/Handler.cpp 97.67% <ø> (ø)
src/ripple/overlay/impl/OverlayImpl.h 47.05% <ø> (+4.95%)
src/ripple/core/JobTypes.h 98.3% <ø> (-0.03%)
src/ripple/net/impl/RPCCall.cpp 36.16% <ø> (+0.83%)
src/ripple/app/misc/Validations.h 100% <ø> (ø)
src/ripple/app/ledger/LedgerConsensus.h 100% <ø> (ø)
src/ripple/overlay/Overlay.h 61.11% <ø> (ø)
src/ripple/app/ledger/LedgerMaster.h 83.33% <ø> (ø)
src/ripple/net/InfoSub.h 75% <ø> (ø)
src/ripple/app/ledger/impl/LedgerConsensusImp.h 50% <ø> (ø)
... and 48 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 460dd8f...d759cb4. Read the comment docs.

@par Thread Safety
May be called concurrently

This comment has been minimized.

Copy link
@mellery451

mellery451 Sep 12, 2016

Contributor

in order to ensure this, wouldn't quorum_ need to be atomic? OR we could somehow guarantee that this read is only called in the same thread as the write in onConsensusStart ?

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 14, 2016

Author

You're right. One option could be to add the mutex lock to quorum() which would allow it to return the latest quorum if one is already in the works in onConsensusStart.

This comment has been minimized.

Copy link
@mellery451

mellery451 Sep 14, 2016

Contributor

...or declare the read/write values as std::atomic<>

This comment has been minimized.

Copy link
@wilsonianb
std::vector<std::string> const& publisherKeys,
std::vector<std::string> const& configManifest)
{
static boost::regex const re (

This comment has been minimized.

Copy link
@mellery451

mellery451 Sep 12, 2016

Contributor

should we just use std::regex ?

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Sep 12, 2016

Contributor

That's a question for @HowardHinnant . Support for std::regex was unfortunately, inconsistent across the range of standard libraries that we use. boost::regex is always consistent.

This comment has been minimized.

Copy link
@HowardHinnant

HowardHinnant Sep 12, 2016

Contributor

I have a memory of us trying to use std::regex a year or two ago and having to revert back to boost because of inconsistencies. I don't recall if that was this use or not though. Generally I like to prefer std over boost when given a choice. But it would take some careful testing to convert regex due to its known difficulty of implementation and known inconsistency among std::implementations. We probably need better motivation for such an effort than "prefer std to boost.

I would have no objection if somebody wants to take that task on. But I don't see it as a high priority or big pay off.

This comment has been minimized.

Copy link
@mellery451

mellery451 Sep 13, 2016

Contributor

makes sense - thanks for the clarification

if (iVal->second <= 1)
keyListings_.erase (iVal);
else
--iVal->second;

This comment has been minimized.

Copy link
@mellery451

mellery451 Sep 13, 2016

Contributor

I find this particular statement hard to read - something like iVal->second -= 1 makes more sense to me, but I don't feel strongly about it.

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Sep 13, 2016

Contributor

Consider retraining your eyes, prefix decrement is a staple of the language!

bool preverified,
boost::asio::ssl::verify_context& ctx)
{
if (boost::asio::ssl::rfc2818_verification (domain) (preverified, ctx))

This comment has been minimized.

Copy link
@mellery451

mellery451 Sep 13, 2016

Contributor

I think you could just return then rfc2818_verification result directly (the if serves no purpose)

This comment has been minimized.

Copy link
@wilsonianb
auto const sequence = list["sequence"].asInt ();
if (sequence <= iter->second.sequence)
{
JLOG (j_.debug()) <<

This comment has been minimized.

Copy link
@mellery451

mellery451 Sep 13, 2016

Contributor

I wonder if these messages deserve to be warn level instead of debug (since it seems to be an error case)

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 14, 2016

Author

sequence == iter->second.sequence should be normal. There has been no change since the last time we fetched. I could split this into a check for == and a check for < and log debug and warn respectively.
The bad signature message below this should definitely be changed to warn.

This comment has been minimized.

Copy link
@wilsonianb
@li @c "blob": Base64-encoded JSON string containing a @c "sequence" and
@c "validators" field. @c "validators" contains an array of objects with
@c "validation_public_key" and @c "manifest" fields.
@c "validation_public_key" must be an Ed25519 master public key.

This comment has been minimized.

Copy link
@JoelKatz

JoelKatz Sep 14, 2016

Member

Is there any particular reason we require an Ed25519 public key? It would be much nicer if we did everything with a generic key/signature type. That way if we want to add another algorithm, we only need to add it in one place rather than everywhere we have public keys and signatures. I thought we already went through the effort of having generic handlers for keys/signatures.

This comment has been minimized.

Copy link
@wilsonianb
@JoelKatz

This comment has been minimized.

Copy link
Member

commented Sep 14, 2016

Losing the ability to manually lock the quorum at a chosen value will make it much more difficult to cold start a network after a catastrophic failure. The ability to completely control the UNL and the quorum and prevent either from being changed is essential to reliably cold starting a crashed network.

Restoring the command line switch to lock the quorum to a particular value will be enough to preserve cold start capability.

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Sep 15, 2016

I've re-added the command line --quorum option:
475cbf8
The quorum is normally set to the greater of:

  • the hard minimum (32% (80% * 40%) of the total number of known/listed validators)
  • a subset of the number of trusted validators

Running with --quorum specified now overrides the hard minimum. This will allow you to cold start after catastrophic failure. There will be no trusted validators since we'll have received no validations, and the specified hard minimum from --quorum will be used.
I think this is preferable to locking the quorum because it allows the quorum to increase as other validators come online. (You'd still want to restart without --quorum once everything has recovered.)

@nbougalis
Copy link
Member

left a comment

Left one comment in Validations.cpp that I want to see addressed.

inline bool
operator==(Manifest const& rhs) const
{
return serialized == rhs.serialized && masterKey == rhs.masterKey &&

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 15, 2016

Member

Reorder the tests so that sequence == rhs.sequence is first: comparisons of integers are faster than comparisons of strings and binary blobs.

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Sep 15, 2016

Contributor

oooh good idea

This comment has been minimized.

Copy link
@wilsonianb
m_overlay->saveValidatorKeyManifests (getWalletDB ());
validators_->stop ();

manifestCache_->save (getWalletDB ());

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 15, 2016

Member

Insert comment about the use of wallet.db here.

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 15, 2016

Author

Something like
//TODO Store manifests in manifests.sqlite instead of wallet.db?

This comment has been minimized.

Copy link
@wilsonianb
{
ScopedLockType sl (mLock);

if (!findCreateSet (hash)->insert (std::make_pair (node, val)).second)
if (!findCreateSet (hash)->insert (std::make_pair (signer, val)).second)

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 15, 2016

Member

Consider this scenario:

I trust V which is using manifests; the current manifest seq. is 7, and I see a validation with that sequence. I happily add it.

Concurrently, as chance would have it, I process a new manifest for V with seq. 8.

Oops! What happens now?

There's a validation with manifest sequence number 7 in there, marked as trusted; but should it still be? After all, we have a new manifest for V which revokes manifest 7. To make matters worse, V can issue a validation with the new manifest, and that, when received, will be added again as a new and distinct validation.

Am I missing something? How do we protect against this?

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 17, 2016

Author

I think this change accounts for that scenario:
1fcd583
Validations are now stored in ValidationSet by their master public key (when there is one).
A validation with the latest signing key replaces a validation for the same ledger sequence with a revoked signing key.


// "Iterate" the listed keys in random order so that the rank of multiple
// keys with the same number of listings is not deterministic
std::vector<std::size_t> indexes (

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 15, 2016

Member

Alternative:

std::vector<std::size_t> indexes (keyListings_.size());
std::iota (indexes.begin(), indexes.end(), 0);

This comment has been minimized.

Copy link
@wilsonianb
app_.getOPs().pubManifest (*make_Manifest(serialized));
{
app_.getOPs().pubManifest (
*Manifest::make_Manifest(serialized));

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 15, 2016

Member

If we avoid the std::move(*mo) on line 673, we can avoid the copy of mo->serialized and this make_Manifest and just use *mo directly.

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 16, 2016

Author

error: use of deleted function 'ripple::Manifest::Manifest(const ripple::Manifest&)'
It looks like we would need to replace the std::move(*mo) with
Manifest (mo->serialized, mo->masterKey, mo->signingKey, mo->sequence)
or finally add a copy constructor to Manifest.
Or we can leave this as is.

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Sep 16, 2016

Contributor

I'd like to understand why its insisting on making a copy. We should avoid unnecessary copies if possible.

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 17, 2016

Author

I think the Manifest isn't passed in to applyManifest as a const reference so that it can be stored on the cache via std::move.
https://github.com/wilsonianb/rippled/blob/dynamic-unl/src/ripple/app/misc/impl/Manifest.cpp#L250
We've been passing Manifests to applyManifest with std::move, but there are some cases (like here) that it'd be nice to still have the Manifest around.

@@ -714,21 +662,23 @@ OverlayImpl::onManifests (
{
auto& s = m->list ().Get (i).stobject ();

if (auto mo = make_Manifest (s))
if (auto mo = Manifest::make_Manifest (s))
{
uint256 const hash = mo->hash ();

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 15, 2016

Member

hash isn't needed; use mo->hash() directly in the addSuppressionPeer call

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 16, 2016

Author

I might leave it as is since hash can also be passed to shouldRelay below.

std::to_string(*sites_[siteIdx].pUrl.port),
io_service_,
std::bind(
&ValidatorList::onSiteFetch,

This comment has been minimized.

Copy link
@bachase

bachase Sep 16, 2016

Contributor

Is there a reason to use std::bind Instead of a lambda?

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Sep 16, 2016

Contributor

lambda lambda lambda and omega mu

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 19, 2016

Author

I don't have a good reason.
I can change it if no one objects.

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Sep 21, 2016

I just pushed 596fb1a which ensures that no two rippleds using the same published validator lists can experience a ledger fork.
Unfortunately, this guarantee doesn't hold when a change is made to one or more lists. Instead of hoping we don't get unlucky during the fetch interval (currently five minutes), I'm wondering if updated trusted published lists should be forwarded to peers.

@scottschurr

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2016

@wilsonianb: Not my area of expertise, but I'll hazard on opinion. Doesn't pushing the list only reduce the size of the window from 5 minutes to the duration of the uncertainty of arrival of the list? To my unseasoned eye it seems like pushing may reduce the risk of a fork, but not remove the potential. Reducing the likelihood may be sufficient; dunno.

siteIdx));
[this, siteIdx](error_code const& err, detail::response_type&& resp)
{
onSiteFetch (err, std::move(resp), siteIdx);

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Sep 22, 2016

Contributor

hella better

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Sep 22, 2016

@scottschurr That's correct. Forwarding the updated list would likely reduce the window of time during which one node has seen the changes and another hasn't.
To me, it seems comparable to validator manifests, with which you are dependent on peer forwarding to learn about updated validation signing keys.

@JoelKatz

This comment has been minimized.

Copy link
Member

commented Sep 28, 2016

It has to be the responsibility of those who publish lists that they don't change them so quickly and drastically that they break the network.

I like the idea of forwarding updated lists. It's not essential, but it would be nice to have. All it would take is one protocol message. If the list is new for you, and you trust it, forward it. Otherwise don't. That would also be nice for servers that don't want to send queries because they reveal their net location. They can just rely on servers they're connected to.

#
# List of URIs serving lists of recommended validators. Validator lists are
# fetched using https if a scheme is not specified.

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 28, 2016

Member

The scheme is not optional in the URI specification. We shouldn't treat it as such. People should have to spell out the correct scheme - whether it be http or https.

This comment has been minimized.

Copy link
@vinniefalco

vinniefalco Sep 28, 2016

Contributor

I agree

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 28, 2016

Author

Scheme is now required
b33b698

std::string comment;
boost::optional<Manifest> m;
};
using MapType = hash_map<PublicKey, Manifest>;

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 28, 2016

Member

I'd ditch this using and change line 156 to:

hash_map<PublicKey, Manifest> map_;

This comment has been minimized.

Copy link
@wilsonianb
{
ScopedLockType sl (mLock);

if (!findCreateSet (hash)->insert (std::make_pair (node, val)).second)
auto const masterKey = app_.manifestCache ().getMasterKey (signer);

This comment has been minimized.

Copy link
@nbougalis

nbougalis Sep 28, 2016

Member

I started by observing that we use this idiom in a number of places: call getMasterKey with some input, then depending on the result itself, we use either the result or the original input. So I was going to suggest, why not simply have getMasterKey return signer and avoid this conditional?

But then it occurred to me, what if the signer here is a just-revoked key? Notice that we call listed without any locks.

Can you explain the logic here and why there isn't a race?

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 28, 2016

Author

You're right. There is a race.
I'm working on solution. One thing that definitely needs to change is storing master public keys on the "trusted" list instead of ephemeral signing keys. Otherwise, we'll continue trusting a signing key until the list gets reset, even if the key gets revoked.

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Sep 28, 2016

Author

This should address the race condition:
1132685

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Sep 28, 2016

You can try this out on the altnet by adding the following to your config:

[ips]
54.94.245.104 51235
52.11.28.194 51235

[validator_list_sites]
https://wilsonianb.github.io/validators/altnet.json

[validator_list_keys]
aKEKiic24cH5kmyhmHLhT6FNGPSLpbtkYJU4f42LHdfC4NxQc3pX
@wilsonianb

This comment has been minimized.

Copy link
Author

commented Oct 3, 2016

You can see the latest docs for ValidatorList and ManifestCache here:
https://wilsonianb.github.io/rippled/

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Oct 18, 2016

Squashed and rebased on 0.40.0-b8

@JoelKatz

This comment has been minimized.

Copy link
Member

commented Oct 20, 2016

I think something is badly wrong with the quorum logic. The quorum_ field holds the number of keys in trustedKeys that must sign something for us to accept it.

So, for example, LedgerMaster::checkAccept accepts a ledger as fully-validated if the number of trusted validations it has exceeds the quorum. That makes sense if and only if a validation is trusted if and only if it was signed by a key in trustedKeys.

But that's not what we're doing. If you look in addValidation we set a validation to trusted if it passes the getTrustedKey test and it doesn't have to pass getListedKey if it's already set to trusted. But getTrustedKey doesn't care if a key is in trustedKeys or not! It only cares if there's a manifest, not whether it was chosen and included in the quorum.

Or do I misunderstand something?

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Oct 20, 2016

@JoelKatz getTrustedKey does check that trustedKeys_ contains the key:
08906f9#diff-7f8c7a926e17debbef064bc5c2f572dbR307
It first calls getMasterKey to use the master public key (if there is one) to do the find in trustedKeys_.

@JoelKatz

This comment has been minimized.

Copy link
Member

commented Oct 21, 2016

👍

@wilsonianb wilsonianb force-pushed the wilsonianb:dynamic-unl branch from 08906f9 to e0b1e46 Oct 28, 2016

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Oct 28, 2016

rebased on 0.40.0-b9
I've started work on forwarding fetched validator lists to peers.
https://github.com/wilsonianb/rippled/commits/forward-val-list
I'm going to also take the opportunity to split out the list fetching functionality from the ValidatorList class.

I don't see any harm (other than cluttering commit history) in saving these changes for a separate pull request.

JLOG(j_.info())
<< "Manifest in db is no longer trusted";
}
applyTrustedManifest (std::move(*mo));

This comment has been minimized.

Copy link
@bachase

bachase Jan 23, 2017

Contributor

Is it possible for the manifest to become untrusted after the prior shutdown? (Not sure if that question is well-posed..)

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Jan 23, 2017

Author

Yeah, you could run while trusting a validator in your config ([validators]) and receive its manifest from the network. Then you could remove it from your config and restart.
There's minimal downside to having an untrusted manifest in the cache. You'd do the signature verification work and forward it to peers at start up. You'd finally drop it the next time you shutdown.
What we don't want to do is verify the signature and store untrusted manifests received from peers.

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Jan 26, 2017

Latest fold commit adds a new ManifestCache for publisher manifests. Let me know if I should take this opportunity to move manifests to manifests.sqlite instead of adding a second manifests table to wallet.db.

@vinniefalco

This comment has been minimized.

Copy link
Contributor

commented Jan 26, 2017

Surely you mean attestor manifests ^_^

@wilsonianb wilsonianb force-pushed the wilsonianb:dynamic-unl branch 2 times, most recently from 2f672c4 to dfc4836 Jan 26, 2017

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Jan 30, 2017

Rebased on 0.50.1 and improved fork safety in latest fold commit.

@wilsonianb wilsonianb force-pushed the wilsonianb:dynamic-unl branch from dfc4836 to f0940f1 Feb 1, 2017

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Feb 1, 2017

Squashed and rebased on 0.50.2
Added fold commit to address Nik's suggestion #1975 (comment)

@wilsonianb wilsonianb force-pushed the wilsonianb:dynamic-unl branch from f0940f1 to 96d52cd Feb 1, 2017

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Feb 1, 2017

Rebased on 0.60.0-b2 and added approved changes from #1975

@wilsonianb wilsonianb force-pushed the wilsonianb:dynamic-unl branch from 4245567 to 1126b6c Feb 8, 2017

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Feb 8, 2017

Rebased on 0.60.0-b4 and fixed signed/unsigned quorum comparisons.

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Feb 21, 2017

If that last fix looks good, I'll squash things down for the next beta candidate.
1126b6c

@nbougalis

This comment has been minimized.

Copy link
Member

commented Feb 23, 2017

👍

@wilsonianb wilsonianb force-pushed the wilsonianb:dynamic-unl branch from 0e9523e to b98913c Feb 23, 2017

wilsonianb added 7 commits Jul 11, 2016
wilsonianb
Dynamize trusted validator list and quorum (RIPD-1220):
Instead of specifying a static list of trusted validators in the config
or validators file, the configuration can now include trusted validator
list publisher keys.

The trusted validator list and quorum are now reset each consensus
round using the latest validator lists and the list of recent
validations seen. The minimum validation quorum is now only
configurable via the command line.
wilsonianb
Fetch validator lists from remote sites:
Validator lists from configured remote sites are fetched at a regular
interval. Fetched lists are expected to be in JSON format and contain the
following fields:

* "manifest": Base64-encoded serialization of a manifest containing the
  validator publisher's master and signing public keys.

* "blob": Base64-encoded JSON string containing a "sequence",
  "expiration" and "validators" field. "expiration" contains the Ripple
   timestamp (seconds since January 1st, 2000 (00:00 UTC)) for when the
  list expires. "validators" contains an array of objects with a
  "validation_public_key" field.

* "signature": Hex-encoded signature of the blob using the publisher's
  signing key.

* "version": 1

* "refreshInterval" (optional)
wilsonianb

@wilsonianb wilsonianb force-pushed the wilsonianb:dynamic-unl branch from b98913c to 6bb0764 Feb 28, 2017

@wilsonianb

This comment has been minimized.

Copy link
Author

commented Feb 28, 2017

rebased on 0.60.0-b5

wilsonianb
Add validator key revocations:
Allow manifest revoking validator keys to be stored in a separate
[validator_key_revocation] config field, so the validator can run
again with new keys and token.
@wilsonianb

This comment has been minimized.

Copy link
Author

commented Mar 1, 2017

Added approved changes from #2019

@seelabs

This comment has been minimized.

Copy link
Contributor

commented Mar 1, 2017

In 0.60.0-b7

@seelabs seelabs closed this Mar 1, 2017

@wilsonianb wilsonianb deleted the wilsonianb:dynamic-unl branch May 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.