
Clique PoA protocol & Rinkeby PoA testnet #225

Open
karalabe opened this issue Mar 6, 2017 · 116 comments

Comments

Member

@karalabe karalabe commented Mar 6, 2017

Changelog:

  • Apr 4, 2017:
    • Mention the cascading proposal-execution corner case and its avoidance.
  • Mar 14, 2017:
    • Expanded the Clique block authorization section, added a strategy proposal.
    • Expanded the Clique signer voting section, added a strategy proposal.
  • Mar 13, 2017:
    • Polished up the constants in the Clique consensus protocol spec.
    • Added the two difficulty values and described in-turn/out-of-turn signing.
  • Mar 11, 2017:
    • Added initial technical specs for the Clique PoA consensus protocol.
    • Added checkpointing to reset votes and embed the list of signers into epoch headers.
    • Reintroduced authorized signer vanity extra-data as a fixed 32 byte allowance.
  • Mar 6, 2017
    • First proposal of the Rinkeby testnet and its PoA implementation ideas.

Clique proof-of-authority consensus protocol

Note, for the background and rationale behind the proposed proof-of-authority consensus protocol, please read the sections after this technical specification. I've placed the spec on top as an easy-to-find reference for implementers, so they don't have to dig through the discussions.

We define the following constants:

  • EPOCH_LENGTH: Number of blocks after which to checkpoint and reset the pending votes.
    • Suggested 30000 for the testnet to remain analogous to the mainnet ethash epoch.
  • BLOCK_PERIOD: Minimum difference between the timestamps of two consecutive blocks.
    • Suggested 15s for the testnet to remain analogous to the mainnet ethash target.
  • EXTRA_VANITY: Fixed number of extra-data prefix bytes reserved for signer vanity.
    • Suggested 32 bytes to retain the current extra-data allowance and/or use.
  • EXTRA_SEAL: Fixed number of extra-data suffix bytes reserved for signer seal.
    • 65 bytes fixed as signatures are based on the standard secp256k1 curve.
  • NONCE_AUTH: Magic nonce number 0xffffffffffffffff to vote on adding a new signer.
  • NONCE_DROP: Magic nonce number 0x0000000000000000 to vote on removing a signer.
  • UNCLE_HASH: Always Keccak256(RLP([])) as uncles are meaningless outside of PoW.
  • DIFF_NOTURN: Block score (difficulty) for blocks containing out-of-turn signatures.
    • Suggested 1 since it just needs to be an arbitrary baseline constant.
  • DIFF_INTURN: Block score (difficulty) for blocks containing in-turn signatures.
    • Suggested 2 to show a slight preference over out-of-turn signatures.

We also define the following per-block constants:

  • BLOCK_NUMBER: Block height in the chain, where the height of the genesis is block 0.
  • SIGNER_COUNT: Number of authorized signers valid at a particular instance in the chain.
  • SIGNER_INDEX: Index of the block signer in the sorted list of current authorized signers.
  • SIGNER_LIMIT: Number of consecutive blocks out of which a signer may only sign one.
    • Must be floor(SIGNER_COUNT / 2) + 1 to enforce majority consensus on a chain.
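
As a minimal sketch (a hypothetical Python helper, not part of any client), the majority threshold works out as:

```python
def signer_limit(signer_count: int) -> int:
    """Number of consecutive blocks out of which a signer may sign only one.

    floor(SIGNER_COUNT / 2) + 1 ensures that within any window of this
    length, each block carries a distinct signature, so the window covers
    a strict majority of the authorized signers.
    """
    return signer_count // 2 + 1

# A 7-signer network: any 4 consecutive blocks carry 4 distinct
# signatures, i.e. a strict majority of the 7 signers.
assert signer_limit(7) == 4
```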

We repurpose the ethash header fields as follows:

  • beneficiary: Address to propose modifying the list of authorized signers with.
    • Should be filled with zeroes normally, modified only while voting.
    • Arbitrary values are permitted nonetheless (even meaningless ones such as voting out non signers) to avoid extra complexity in implementations around voting mechanics.
    • Must be filled with zeroes on checkpoint (i.e. epoch transition) blocks.
  • nonce: Signer proposal regarding the account defined by the beneficiary field.
    • Should be NONCE_DROP to propose deauthorizing beneficiary as an existing signer.
    • Should be NONCE_AUTH to propose authorizing beneficiary as a new signer.
    • Must be filled with zeroes on checkpoint (i.e. epoch transition) blocks.
    • Must not take up any other value apart from the two above (for now).
  • extraData: Combined field for signer vanity, checkpointing and signer signatures.
    • First EXTRA_VANITY bytes (fixed) may contain arbitrary signer vanity data.
    • Last EXTRA_SEAL bytes (fixed) is the signer's signature sealing the header.
    • Checkpoint blocks must contain a list of signers (N*20 bytes) in between, omitted otherwise.
    • The list of signers in checkpoint block extra-data sections must be sorted in ascending order.
  • mixHash: Reserved for fork protection logic, similar to the extra-data during the DAO.
    • Must be filled with zeroes during normal operation.
  • ommersHash: Must be UNCLE_HASH as uncles are meaningless outside of PoW.
  • timestamp: Must be at least the parent timestamp + BLOCK_PERIOD.
  • difficulty: Contains the standalone score of the block to derive the quality of a chain.
    • Must be DIFF_NOTURN if BLOCK_NUMBER % SIGNER_COUNT != SIGNER_INDEX
    • Must be DIFF_INTURN if BLOCK_NUMBER % SIGNER_COUNT == SIGNER_INDEX
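
The in-turn/out-of-turn rule above can be made concrete with a small Python sketch (function name is mine, not from any implementation):

```python
DIFF_NOTURN = 1  # block score for an out-of-turn signature
DIFF_INTURN = 2  # block score for an in-turn signature

def expected_difficulty(block_number: int, signer_index: int,
                        signer_count: int) -> int:
    """Difficulty a header must carry under the rules above.

    signer_index is the signer's position in the sorted list of
    currently authorized signers.
    """
    if block_number % signer_count == signer_index:
        return DIFF_INTURN
    return DIFF_NOTURN
```

For example, with 3 signers, block 5 is "in turn" for the signer at sorted index 2, and out of turn for everyone else.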

Authorizing a block

To authorize a block for the network, the signer needs to sign the block's hash containing everything except the signature itself. This means that the hash covers every field of the header (nonce and mixDigest included), as well as the extraData with the exception of the 65 byte signature suffix. The fields are hashed in the order of their definition in the yellow paper.

This hash is signed using the standard secp256k1 curve, and the resulting 65 byte signature (R, S, V, where V is 0 or 1) is embedded into the extraData as the trailing 65 byte suffix.
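
The extra-data layout this implies can be sketched in Python (an illustration of the byte layout only; actual signature recovery needs a secp256k1 library):

```python
EXTRA_VANITY = 32  # bytes of signer vanity prefix
EXTRA_SEAL = 65    # bytes of secp256k1 signature suffix

def split_extra_data(extra: bytes):
    """Split a Clique extra-data blob into (vanity, signers, seal).

    On checkpoint blocks the middle section holds the N*20-byte sorted
    signer addresses; on ordinary blocks it is empty.
    """
    if len(extra) < EXTRA_VANITY + EXTRA_SEAL:
        raise ValueError("extra-data too short")
    vanity = extra[:EXTRA_VANITY]
    seal = extra[-EXTRA_SEAL:]
    middle = extra[EXTRA_VANITY:-EXTRA_SEAL]
    if len(middle) % 20 != 0:
        raise ValueError("signer list must be a multiple of 20 bytes")
    signers = [middle[i:i + 20] for i in range(0, len(middle), 20)]
    return vanity, signers, seal
```

The hash that gets signed is then the header with everything except that trailing 65-byte seal.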

To ensure malicious signers (e.g. via a lost signing key) cannot wreak havoc in the network, each signer is allowed to sign at most one out of SIGNER_LIMIT consecutive blocks. The order is not fixed, but an in-turn signature weighs more (DIFF_INTURN) than an out-of-turn one (DIFF_NOTURN).

Authorization strategies

As long as signers conform to the above specs, they can authorize and distribute blocks as they see fit. The following strategy is only a suggestion, but it reduces network traffic and small forks:

  • If a signer is allowed to sign a block (is on the authorized list and didn't sign recently).
    • Calculate the optimal signing time of the next block (parent + BLOCK_PERIOD).
    • If the signer is in-turn, wait for the exact time to arrive, sign and broadcast immediately.
    • If the signer is out-of-turn, delay signing by rand(SIGNER_COUNT * 500ms).
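
The steps above can be sketched as follows (a toy Python helper; the 500 ms per-signer scaling is the suggested value, not a protocol requirement):

```python
import random

BLOCK_PERIOD = 15  # seconds, suggested testnet value

def signing_time(parent_time: float, in_turn: bool, signer_count: int,
                 rng: random.Random = random.Random()) -> float:
    """Wall-clock time at which a permitted signer should seal the next block.

    The in-turn signer aims for the exact optimal time; out-of-turn
    signers back off by a random amount that scales with the signer
    count, giving the heavier in-turn block a head start.
    """
    optimal = parent_time + BLOCK_PERIOD
    if in_turn:
        return optimal
    return optimal + rng.uniform(0, signer_count * 0.5)  # 500 ms per signer
```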

This small strategy ensures that the in-turn signer (whose block weighs more) has a slight advantage to sign and propagate versus the out-of-turn signers. The scheme also scales somewhat as the number of signers increases.

Voting on signers

Every epoch transition (genesis block included) acts as a stateless checkpoint, from which capable clients should be able to sync without requiring any previous state. This means epoch headers must not contain votes, all non-settled votes are discarded, and tallying starts from scratch.

For all non-epoch transition blocks:

  • Signers may cast one vote per own block to propose a change to the authorization list.
  • Only the latest proposal per target beneficiary is kept from a single signer.
  • Votes are tallied live as the chain progresses (concurrent proposals allowed).
  • Proposals reaching majority consensus SIGNER_LIMIT come into effect immediately.
  • Invalid proposals are not to be penalized for client implementation simplicity.

A proposal coming into effect entails discarding all pending votes for that proposal (both for and against) and starting with a clean slate.
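
The tallying rules above can be illustrated with a toy Python tally (illustration only, not client code; booleans stand in for the two nonce magic values, and the full Clique rules around dropped signers' pending votes are omitted):

```python
NONCE_AUTH = True   # stands in for nonce 0xff..f (propose authorizing)
NONCE_DROP = False  # stands in for nonce 0x00..0 (propose dropping)

class Tally:
    """Toy live vote tally for the rules above."""

    def __init__(self, signers):
        self.signers = set(signers)
        # (voter, beneficiary) -> authorize?; only the latest vote is kept
        self.votes = {}

    def cast(self, voter, beneficiary, authorize):
        if voter not in self.signers:
            return  # only currently authorized signers may vote
        # keeping only the latest proposal per (signer, beneficiary)
        self.votes[(voter, beneficiary)] = authorize
        standing = sum(1 for (v, b), a in self.votes.items()
                       if b == beneficiary and a == authorize)
        if standing >= len(self.signers) // 2 + 1:  # SIGNER_LIMIT majority
            if authorize:
                self.signers.add(beneficiary)
            else:
                self.signers.discard(beneficiary)
            # passed proposal: drop all votes concerning this beneficiary
            self.votes = {k: a for k, a in self.votes.items()
                          if k[1] != beneficiary}
```

With three signers the majority is two, so a second concurring vote immediately enacts the change and wipes the related votes.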

Cascading votes

A complex corner case may arise during signer deauthorization. When a previously authorized signer is dropped, the number of signers required to approve a proposal might decrease by one. This might cause one or more pending proposals to reach majority consensus, the execution of which might further cascade into new proposals passing.

Handling this scenario is non-obvious when multiple conflicting proposals pass simultaneously (e.g. adding a new signer vs. dropping an existing one), where the evaluation order might drastically change the final authorization list. Since signers may invert their own votes in every block they mint, it's not obvious which proposal would be "first".

To avoid the pitfalls cascading executions would entail, the Clique proposal explicitly forbids cascading effects. In other words: Only the beneficiary of the current header/vote may be added to/dropped from the authorization list. If that causes other proposals to reach consensus, those will be executed when their respective beneficiaries are "touched" again (given that majority consensus still holds at that point).

Voting strategies

Since the blockchain can have small reorgs, a naive voting mechanism of "cast-and-forget" may not be optimal, since a block containing a singleton vote may not end up on the final chain.

A simplistic but working strategy is to allow users to configure "proposals" on the signers (e.g. "add 0x...", "drop 0x..."). The signing code can then pick a random proposal for every block it signs and inject it. This ensures that multiple concurrent proposals as well as reorgs get eventually noted on the chain.

This list may be expired after a certain number of blocks / epochs, but it's important to realize that "seeing" a proposal pass doesn't mean it won't get reorged, so it should not be immediately dropped when the proposal passes.

Background

Ethereum's first official testnet was Morden. It ran from July 2015 to about November 2016, when due to the accumulated junk and some testnet consensus issues between Geth and Parity, it was finally laid to rest in favor of a testnet reboot.

Ropsten was thus born, clearing out all the junk and starting with a clean slate. It ran well until the end of February 2017, when malicious actors decided to abuse the low PoW and gradually inflate the block gas limit to 9 billion (from the normal 4.7 million), at which point they sent in gigantic transactions, crippling the entire network. Even before that, attackers mounted multiple extremely long reorgs, causing network splits between different clients, and even different versions.

The root cause of these attacks is that a PoW network is only as secure as the computing capacity placed behind it. Restarting a new testnet from zero wouldn't solve anything, since the attacker can mount the same attack over and over again. The Parity team decided to go with an emergency solution of rolling back a significant number of blocks, and enacting a soft-fork rule that disallows gas limits above a certain threshold.

While this solution may work in the short term:

  • It's not elegant: Ethereum is supposed to have dynamic block limits
  • It's not portable: other clients need to implement new fork logic themselves
  • It's not compatible with sync modes: fast and light clients are both out of luck
  • It's just prolonging the attacks: junk can still be steadily pushed in ad infinitum

Parity's solution, although not perfect, is workable. I'd like to propose a longer-term alternative solution, which is more involved, yet should be simple enough to roll out in a reasonable amount of time.

Standardized proof-of-authority

As reasoned above, proof-of-work cannot work securely in a network with no value. Ethereum has its long term goal of proof-of-stake based on Casper, but that is heavy research so we cannot rely on that any time soon to fix today's problems. One solution however is easy enough to implement, yet effective enough to fix the testnet properly, namely a proof-of-authority scheme.

Note, Parity does have an implementation of PoA, though it seems more complex than needed and without much documentation on the protocol, it's hard to see how it could play along with other clients. I welcome feedback from them on this proposal from their experience.

The main design goals of the PoA protocol described here are that it should be very simple to implement and embed into any existing Ethereum client, while at the same time allowing existing sync technologies (fast, light, warp) to work without client developers needing to add custom logic to critical software.

Proof-of-authority 101

For those not aware of how PoA works, it's a very simplistic protocol, where instead of miners racing to find a solution to a difficult problem, authorized signers can at any time at their own discretion create new blocks.

The challenges revolve around how to control minting frequency, how to distribute minting load (and opportunity) between the various signers and how to dynamically adapt the list of signers. The next section defines a proposed protocol to handle all these scenarios.

Rinkeby proof-of-authority

There are two approaches to syncing a blockchain in general:

  • The classical approach is to take the genesis block and crunch through all the transactions one by one. This is tried and proven, but as Ethereum networks grow in complexity, it quickly turns out to be very costly computationally.
  • The other is to only download the chain of block headers and verify their validity, after which point an arbitrary recent state may be downloaded from the network and checked against recent headers.

A PoA scheme is based on the idea that blocks may only be minted by trusted signers. As such, every block (or header) that a client sees can be matched against the list of trusted signers. The challenge is how to maintain a list of authorized signers that can change over time. The obvious answer (store it in an Ethereum contract) is also the wrong answer: fast, light and warp sync don't have access to the state during syncing.

The protocol of maintaining the list of authorized signers must be fully contained in the block headers.

The next obvious idea would be to change the structure of the block headers so it drops the notions of PoW, and introduces new fields to cater for voting mechanisms. This is also the wrong answer: changing such a core data structure in multiple implementations would be a nightmare development, maintenance and security wise.

The protocol of maintaining the list of authorized signers must fit fully into the current data models.

So, according to the above, we can't use the EVM for voting, rather have to resort to headers. And we can't change header fields, rather have to resort to the currently available ones. Not much wiggle room.

Repurposing header fields for signing and voting

The most obvious field that is currently used solely as fun metadata is the 32 byte extra-data section in block headers. Miners usually place their client and version in there, but some fill it with alternative "messages". The protocol would extend this field with 65 extra bytes for a secp256k1 miner signature. This would allow anyone obtaining a block to verify it against the list of authorized signers. It also makes the miner section in block headers obsolete (since the address can be derived from the signature).

Note, changing the length of a header field is a non-invasive operation as all code (such as RLP encoding and hashing) is agnostic to it, so clients wouldn't need custom logic.

The above is enough to validate a chain, but how can we update a dynamic list of signers? The answer is that we can repurpose the newly obsoleted miner field and the PoA-obsoleted nonce field to create a voting protocol:

  • During regular blocks, both of these fields would be set to zero.
  • If a signer wishes to enact a change to the list of authorized signers, it will:
    • Set the miner to the signer it wishes to vote about
    • Set the nonce to 0 or 0xff...f to vote in favor of adding or kicking out

Any clients syncing the chain can "tally" up the votes during block processing, and maintain a dynamically changing list of authorized signers by popular vote. The initial set of signers can be given as genesis chain parameters (to avoid the complexity of deploying an "initial voters list" contract in the genesis state).

To avoid having an infinite window to tally up votes in, and also to allow periodically flushing stale proposals, we can reuse the concept of an epoch from ethash, where every epoch transition flushes all pending votes. Furthermore, these epoch transitions can also act as stateless checkpoints containing the list of current authorized signers within the header extra-data. This permits clients to sync up based only on a checkpoint hash without having to replay all the voting that was done on the chain up to that point. It also allows the genesis header to fully define the chain, containing the list of initial signers.
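
Putting the checkpoint layout together, a stateless epoch header's extra-data can be assembled as follows (an illustrative Python sketch reusing the EXTRA_VANITY/EXTRA_SEAL constants from the spec; the helper name is hypothetical):

```python
EXTRA_VANITY = 32  # signer vanity prefix
EXTRA_SEAL = 65    # secp256k1 signature suffix

def checkpoint_extra_data(vanity: bytes, signers, seal: bytes) -> bytes:
    """Build epoch-transition extra-data: vanity, sorted signer list, seal.

    The 20-byte signer addresses must appear in ascending order, letting
    a syncing client rebuild the authorized set from the header alone.
    """
    if len(vanity) != EXTRA_VANITY or len(seal) != EXTRA_SEAL:
        raise ValueError("bad vanity or seal length")
    if any(len(s) != 20 for s in signers):
        raise ValueError("signer addresses must be 20 bytes")
    return vanity + b"".join(sorted(signers)) + seal
```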

Attack vector: Malicious signer

It may happen that a malicious user gets added to the list of signers, or that a signer key/machine is compromised. In such a scenario the protocol needs to be able to defend itself against reorganizations and spamming. The proposed solution is that given a list of N authorized signers, any signer may only mint 1 block out of every K. This ensures that damage is limited, and the remaining signers can vote out the malicious user.

Attack vector: Censoring signer

Another interesting attack vector is a signer (or group of signers) attempting to censor blocks that vote on removing them from the authorization list. To work around this, we restrict the allowed minting frequency of signers to 1 out of N/2. This ensures that malicious signers need to control at least 51% of the signing accounts, at which point it's game over anyway.

Attack vector: Spamming signer

A final small attack vector is that of malicious signers injecting new vote proposals into every block they mint. Since nodes need to tally up all votes to create the actual list of authorized signers, they need to track all votes through time. Without placing a limit on the vote window, this could grow slowly, yet unboundedly. The solution is to place a moving window of W blocks after which votes are considered stale; a sane window might be 1-2 epochs.

Attack vector: Concurrent blocks

If the number of authorized signers is N, and we allow each signer to mint 1 block out of every K, then at any point in time N-K+1 signers are allowed to mint. To avoid these racing for blocks, every signer adds a small random "offset" to the time it releases a new block. This ensures that small forks are rare, but occasionally still happen (as on the main net). If a signer is caught abusing its authority and causing chaos, it can be voted out.

Notes

Does this suggest we use a censored testnet?

Sort of. The proposal suggests that given the malicious nature of certain actors, and given the weakness of the PoW scheme in a "monopoly money" network, it is better to have a network with a bit of spam filtering enabled that developers can rely on to test their programs, than to have a wild-wild-west chain that dies of its own uselessness.

Why standardize proof-of-authority?

Different clients are better at different scenarios. Go may be awesome in capable server side environments, but CPP may be better suited to run on an RPI Zero. Having a possibility to mix clients in private environments too would be a net win for the ecosystem, as well as being able to participate in a single spamless testnet would be a win for everyone at large.

Doesn't manual voting get messy?

This is an implementation detail, but signers may implement contract based voting strategy leveraging the full capabilities of the EVM, only pushing the results into the headers for average nodes to verify.

Clarifications and feedback

  • This proposal does not rule out clients running a PoW based testnet side by side, whether Ropsten or a new one based on it. The ideal scenario would be that clients provide a way to attach to both PoW as well as PoA based test networks (#225 (comment)).
  • Although the protocol parameters can be made configurable at client implementers' discretion, the Rinkeby network should be as close to the main network as possible. That includes dynamic gas limits, variable block times around 15 seconds, gas prices and such (#225 (comment)).
  • The scheme requires that at least K signers are online at any time, since that is the minimum number required to ensure "minting" diversity. This means that if more than K drop off, the network stalls. This should be solved by ensuring the signers are high-uptime machines and failing ones should be voted out in a timely fashion before too many failures occur (#225 (comment)).
  • The proposal does not address "legitimate" spam, as in an attacker validly spending testnet ether to create junk, however without PoW mining, an attacker may not be able to obtain infinite ether to mount the attack in the first place. One possibility would be to have a faucet giving out ether based on GitHub (or whatever else) accounts in a limited fashion (e.g. 10 / day) (#225 (comment)).
  • A suggestion was made to create checkpoint blocks for every epoch that contains a list of authorized signers at that point in time. This would allow light clients at a later point to say "sync from here" without needing to start from the genesis. This could be added to the extradata field as a prefix before the signature (#225 (comment)).

@miohtama miohtama commented Mar 6, 2017

As the word censored is in italics, I'd like to point out that while this proposal proposes a new public testnet with less decentralized characteristics, it's possible for anyone to run their own PoW testnet. You then bear the infrastructure cost of doing so, and the proposal does not limit your ability to do this in any way. This has been true from Ethereum day zero, as Ethereum clients have been very user friendly for running your own private testnet.

Member Author

@karalabe karalabe commented Mar 6, 2017

Just to add to that, the proposal also does not restrict clients to running this exclusively. The proposal can run side-by-side with the current testnet, so users would be free to choose between the PoW Ropsten or the PoA Rinkeby.

Contributor

@christoph2806 christoph2806 commented Mar 6, 2017

We greatly support this approach! As DApp developers, we urgently need a safe and reliable public testnet, which obviously cannot be secured by PoW. DApps are beginning to interact heavily - to mention only status.im, metamask, uport, and other wallets - and only on a broadly accepted public testnet will all projects be present and able to test their dependencies on others. For similar reasons, the new testnet should be as similar as possible to the mainnet - only then can it serve as a valid reference for development. I'd prefer:

  • similar gas limit
  • similar block time
  • similar gas price
  • and for each parameter, a similar statistical distribution

Only then can you consider an application that runs on the testnet as "tested". I appreciate the Parity solution with Kovan, because it gives some short-term relief, but I would like to encourage all involved parties to work together on a shared solution.
Member Author

@karalabe karalabe commented Mar 6, 2017

@christoph2806 Definitely, added to the proposal's clarification section.

Member

@Nashatyrev Nashatyrev commented Mar 6, 2017

With time, some signers can go offline. Couldn't it be the case that at some block all of the (N-K) signers who can mint the next block are stale, and the network gets stuck?

Member Author

@karalabe karalabe commented Mar 6, 2017

In my proposal, the network operators should ensure that stale signers are removed/replaced in a timely fashion. For testnet purposes this would probably be only a handful of signers for which we can guarantee uptime.


@hrishikeshio hrishikeshio commented Mar 6, 2017

How will the ether be distributed? This is important since a spammer can try to get as much ether as possible from various sources and then use it to spam the network.

Member Author

@karalabe karalabe commented Mar 6, 2017

@hrishikeshio The issue with Ropsten was that the attacker minted tens of thousands of blocks, producing huge reorgs and pushing the gas limit up to 9B. These two scenarios could be avoided since only signers can mint blocks, so they could also retain some sanity limits.

The proposal does not specify any means for spam filtering for individual transactions as that is a new can of worms. I'll have to think a bit how best to solve that issue (around miner strategies), but limiting ether availability on a testnet is imho a bad idea. We want to be as inclusive as possible.

Member Author

@karalabe karalabe commented Mar 6, 2017

One possible solution would be to have a faucet that grants X ether / Y time (e.g. 10 / day) but is bound to some OAuth protocol that has proper protection against mass account creation (e.g. GitHub account, email address, etc).

Contributor

@3esmit 3esmit commented Mar 6, 2017

Snippet to claim a github user ownership to an ethereum address

contract GitHubOracle is usingOraclize {
    //constant for oraclize commits callbacks
    uint8 constant CLAIM_USER = 0;
    //temporary storage enumerating oraclize calls
    mapping (bytes32 => uint8) claimType;
    //temporary storage for oraclize user register queries
    mapping (bytes32 => UserClaim) userClaim;
    //permanent storage of sha3(login) of github users
    mapping (bytes32 => address) users;
    //events
    event UserSet(string githubLogin, address account);
    //stores temporary data for oraclize user register request
    struct UserClaim {
        address sender;
        bytes32 githubid;
        string login;
    }

    //register or change a github user ethereum address
    function register(string _github_user, string _gistid)
     payable {
        bytes32 ocid = oraclize_query("URL", strConcat("https://gist.githubusercontent.com/",_github_user,"/",_gistid,"/raw/"));
        claimType[ocid] = CLAIM_USER;
        userClaim[ocid] = UserClaim({sender: msg.sender, githubid: sha3(_github_user), login: _github_user});
    }
  //oraclize response callback
    function __callback(bytes32 _ocid, string _result) {
        if (msg.sender != oraclize_cbAddress()) throw;
        uint8 callback_type = claimType[_ocid];
        if(callback_type==CLAIM_USER){
            if(strCompare(_result,"404: Not Found") != 0){    
                address githubowner = parseAddr(_result);
                if(userClaim[_ocid].sender == githubowner){
                    _register(userClaim[_ocid].githubid,userClaim[_ocid].login,githubowner);
                }
            }
            delete userClaim[_ocid]; //should always be deleted
        }
        delete claimType[_ocid]; //should always be deleted
    }
    function _register(bytes32 githubid, string login, address githubowner) 
     internal {
        users[githubid] = githubowner;
        UserSet(login, githubowner);
    }
}

The user creates a gist with their public address and calls register, passing _github_user + _gistid

From https://github.com/ethereans/github-token/blob/master/contracts/GitHubToken.sol


@mightypenguin mightypenguin commented Mar 6, 2017

There could be a lightweight proof-of-stake system where (like the GitHub oraclize above) people need 5 ETH locked to a live-net contract address, which then allows them onto the testnet. Misbehave, and the Ethereum Foundation (or whoever runs it) confiscates your eth.

Member Author

@karalabe karalabe commented Mar 6, 2017

Yeah, side chains are an interesting idea but those are a whole new can of worms :)


@maurycyp maurycyp commented Mar 6, 2017

Two thoughts:

Last week, INFURA launched a (private but publicly available) chain called INFURAnet (with INFURA running all the authorities) to provide a usable test network in the face of the Ropsten issues. It was obviously based on Parity, but we would feel better if PoA were a standard and compatible feature across all clients. Therefore, we support this EIP.

Additionally, if Ropsten is replaced with a PoA network, we would be happy to run one of the authorities.

Contributor

@AlexeyAkhunov AlexeyAkhunov commented Mar 6, 2017

What about still using PoW on the testnet, but with slightly modified parameters:

  1. Block Reward = 0
  2. Gas price is fixed to certain value
  3. There is a hard cap on the gas limit in a block
  4. Faucet gives testnet Ether only to accounts that have Ether in the same account on the main net, and that Ether is at least 24 hours old. Each account only receives test Ether once. Or some other limitation of this sort, which will allow faucet to be automatic, but will limit sybil attacks.

Hopefully, implementation could be much easier than Proof Of Authority

EDIT: Another idea - can the block reward be negative? Meaning that mining actually costs Test Ether. That allows implementing a sort of "Proof Of Authority" trivially, by simply distributing large amounts of Test Ether. It also means that if Test Ether is dished out periodically, the maintainers of the testnet can disallow abusive miners by not giving them the next tranche of Test Ether.

Member Author

@karalabe karalabe commented Mar 6, 2017

The issue with your modified PoW scheme is that it still permits creating huge reorgs by mining lots of blocks, even if without reward.

The second proposal doesn't solve this issue either, as a malicious user might accumulate a lot of ether first, then create many parallel chains. All will be valid since he does have the funds, and there's no way to take them away. It's arguably more stable than the first proposal, but negative rewards might break clients unexpectedly, as I don't think most codebases catered for this possibility.

Btw, the zero block reward is a nice idea for PoA too, as it prevents a rogue signer / leaked key from ruining the chain with accumulated funds.

Contributor

@AlexeyAkhunov AlexeyAkhunov commented Mar 6, 2017

@karalabe Thanks! What I meant with the negative rewards: the maintainer of the network gives out enough Test Eth to the current miner authorities to mine for, let's say, a week. After the week, the maintainer looks at who needs a top-up, and only gives one to the miners who behaved well. For those who did not behave well, the payouts simply stop.

Contributor

@AlexeyAkhunov AlexeyAkhunov commented Mar 7, 2017

@karalabe Ah, I got your point about the parallel chains now. In that case, there needs to be some kind of regular expiration of Test Eth :)


@jaekwon jaekwon commented Mar 7, 2017

Here's GoEthereum on Tendermint.

https://github.com/tendermint/ethermint

The goal is to keep as much of GoEthereum compatible as possible.

Come to #ethermint on the Tendermint slack for discussions.

We have some upstream patches that would make Ethermint much cleaner. See the bottom of https://github.com/tendermint/ethermint/pull/42/files


@jaekwon jaekwon commented Mar 7, 2017

We're pushing GoEthereum to high tx limits and uncovering some issues.

Member Author

@karalabe karalabe commented Mar 7, 2017

Just to mention a proposal by @frozeman and @fjl of adding the set of signers to the extra-data field of every Xth block to act as a checkpoint. This wouldn't be useful now, but it would permit anyone to trivially add logic to "sync from H(X)", where H(X) is the hash of a checkpoint block.

The added benefit is that this would allow the genesis block to store the initial set of signers and we wouldn't need extra chain configuration parameters.

@holiman
Copy link
Contributor

@holiman holiman commented Mar 8, 2017

Here's a suggested protocol change: https://gist.github.com/holiman/5e021b24a7bfec95c8cc84b97e44e45a

It was a bit too long for fitting in a comment.

@karalabe
Copy link
Member Author

@karalabe karalabe commented Mar 9, 2017

@holiman To react a bit to the proposal here too, I see one problem that's easy-ish to solve, another that's hard:

Your scheme must also ensure that blocks cannot be minted like crazy, otherwise the difficulty becomes irrelevant. This can be done with the same "min 15 seconds apart" guarantee that the original proposal had.

The harder part is that with no guarantee on signer ordering/frequency (only relying on the difficulty for chain quality/validation), malicious signers can mine very long chains that aren't difficult enough to beat the canonical, however the nodes cannot know this before processing them. And since creating these chains is mostly free in a PoA world, malicious signers can keep spamming with little effort.

The original proposal had a guarantee that the majority of the signers agreed at some point that a chain is valid (even if it was reorged afterwards), so minority malicious miners can only feed made up chains of N/2 blocks.

The difficulty idea is elegant btw, just not sure how yet to make use of it :)

@keorn
Copy link

@keorn keorn commented Mar 11, 2017

If you do not mind somewhat relying on UNIX time and longer block times when validators are down, then Aura (in Parity) uses something like that:

  • time is divided into steps, the current step is t / step_duration
  • the primary for step is step % length(validators)
  • the header seal is a list of two values: step and signature (step is redundant and can be removed in a future version)
  • the total difficulty or as we refer to it "chain score" is set to be (using appropriate differencing to obtain block difficulty): U128_max * height - step

Validation: block at a given step can be only signed by the primary, only first block for a given step is accepted (if a second is received, a vote to remove the authority should be issued), block can arrive at most 1 step ahead.

Validator set can be altered in the way @karalabe proposed.

Either way we will attempt to implement whichever solution is elected.
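
The Aura scheduling rules described above can be sketched as follows (an illustrative sketch only, with an assumed `STEP_DURATION`; this is not Parity's actual code):

```python
# Sketch of the Aura rules from the comment above. STEP_DURATION is an
# assumed example value; U128_MAX matches the "U128_max" in the formula.
U128_MAX = 2**128 - 1
STEP_DURATION = 5  # seconds (assumed)

def current_step(unix_time):
    # time is divided into steps; the current step is t / step_duration
    return unix_time // STEP_DURATION

def primary_for_step(step, validators):
    # the primary for a step is validators[step % length(validators)]
    return validators[step % len(validators)]

def chain_score(height, step):
    # total difficulty / "chain score": U128_max * height - step
    return U128_MAX * height - step

# At equal height, a chain sealed at an earlier step scores higher;
# any extra block outweighs any step difference.
assert chain_score(2, 10) > chain_score(2, 11)
assert chain_score(2, 10) - chain_score(1, 10) == U128_MAX
```

The score formula makes height dominate (each block adds `U128_MAX`), while the step only breaks ties between equal-length chains.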

@karalabe
Copy link
Member Author

@karalabe karalabe commented Mar 11, 2017

I'm not too fond of relying on time. Using @holiman's proposal of calculating "your turn" based only on block height seems a bit better in that respect, as nodes don't have to have synchronized clocks.

Any particular reason for having the chain difficulty calculated like that instead of just the height of the chain for example? What does this more complex formula gain you?

The issue I see with Aura's turn-based scheme is that if a few signers drop off (which is only natural in an internet-scale system), then the chain dynamics become quite irregular, with "gaps" in the minting time; versus my proposal where multiple signers can fill in for those that dropped.

@karalabe
Copy link
Member Author

@karalabe karalabe commented Mar 11, 2017

If I understand correctly, the idea in the difficulty algorithm is to score those chains higher that have the most signers signing at the correct turn. So chains that skip blocks are scored less vs. those that include all signers.

What happens in scenarios where blocks are minted in step, but propagated later after the step ends? Or if some signers receive the next block in time, while some signers receive it a bit later after the step ended?

@karalabe karalabe changed the title Rinkeby - Cross client Proof-of-Authority testnet Clique PoA protocol & Rinkeby PoA testnet Mar 11, 2017
@karalabe
Copy link
Member Author

@karalabe karalabe commented Mar 11, 2017

I've updated the proposal with a tech spec section describing the proposed PoA protocol itself. It's still missing a few details around signing (notably the 1-out-of-K block constraint), and I've yet to figure out the difficulty calculation.

Also I split off the PoA protocol from the testnet itself naming wise as I'd like to keep the two concepts separated to avoid confusion. Using metro station names for the testnets is fine, but for a reusable PoA scheme I wanted something a bit more "mundane" and/or obvious.

The names are still up for finalization. The Clique name for the PoA scheme (best until now) was suggested by @holiman .

@VoR0220
Copy link

@VoR0220 VoR0220 commented Mar 11, 2017

I'd recommend using the Ethermint or Eris DB permissioning native contract, or both. They've both been tested extensively and neither would require reinventing the wheel. Furthermore, we're all friends here and have done the heavy legwork already, so... why not?

@karalabe
Copy link
Member Author

@karalabe karalabe commented Mar 11, 2017

It's hard to evaluate such a proposal without any details. I personally am not familiar with how either of them works, so I cannot comment on their feasibility.

My main design goals here are to be easy to add to any client and support current techs (fast, light, warp sync) without invasive changes.

Can those consensus engines be plugged into all clients? Can they run on mobile and embedded devices? Are they fully self-contained, without external dependencies? Can they achieve consensus from headers only? Are they compatible licensing-wise with all clients? These are all essential requirements I've tried to meet.

I'm happy to consider them, but you need to provide a lot more detail to evaluate based upon.

@VoR0220
Copy link

@VoR0220 VoR0220 commented Mar 11, 2017

Absolutely.

So both use Tendermint's Proof-of-Stake consensus, which is detailed here:

https://github.com/tendermint/tendermint/wiki/Byzantine-Consensus-Algorithm

As for the pluggability of the algorithm, it's been proven to be quite doable, in fact, Parity has already done it:

https://github.com/ethcore/parity/blob/ade5a13f5bad745b4200ececde42aa219ad768ae/json/src/spec/engine.rs

And ethermint already implements this through geth in a way (I wouldn't be the one to give the details, that would be something for @jaekwon or @ebuchman to explain)

https://github.com/tendermint/ethermint

As for Eris-DB and your attempt at permissioning by way of Proof of Authority, we simply utilize the above BFT consensus algorithm, and on top of that a native contract (not dissimilar to the current precompiled contracts such as SHA256, RIPEMD-160, etc.) to implement a permissioning scheme amongst the validators.

While we have our own version of the EVM that is much more stripped down than Geth's, I don't think it would be difficult to make a modular Go package for ease of implementation (CC @silasdavis):

https://github.com/eris-ltd/eris-db/blob/master/manager/eris-mint/evm/snative.go#L73

The above ^^^ could be implemented in a way through geth via some tinkering with this function in geth:

https://github.com/ethereum/go-ethereum/blob/master/core/vm/contracts.go#L33

Both solutions are written in Golang, so there is surely a way to make them somewhat compatible. Again, trying to find a way to work together so y'all can keep your focus ;)

@aneequesafdar
Copy link

@aneequesafdar aneequesafdar commented Sep 25, 2018

@ivica7

  1. There is a small chance of soft forks in Clique; thus, the same heuristics as PoW can be used, i.e. 5-6 block confirmations or more.

  2. Use the JSON RPC to connect to the client and you can use the following clique functions:

clique
{
  proposals: {},
  discard: function(),
  getProposals: function(callback),
  getSigners: function(),
  getSignersAtHash: function(),
  getSnapshot: function(),
  getSnapshotAtHash: function(),
  propose: function()
}
@nahuseyoum
Copy link

@nahuseyoum nahuseyoum commented Oct 2, 2018

@karalabe can we set a period longer than 15s for clique?

@REPTILEHAUS
Copy link

@REPTILEHAUS REPTILEHAUS commented Oct 4, 2018

@karalabe can we set a period longer than 15s for clique?

@nahuseyoum Yeah, just set it when you're making the genesis. Use puppeth to generate it.

@SivaGabbi
Copy link

@SivaGabbi SivaGabbi commented Oct 10, 2018

@karalabe : can you explain the part about 'Attack Vector: malicious signer'? If I am an authorised sealer and I enter some malicious transactions into the block and mine it, how will my block be invalidated? I know that in Aura there is back-and-forth voting on the validity of every block, but as far as I understand, in Clique the sealer just mines the block... correct me if I am wrong.

@karalabe
Copy link
Member Author

@karalabe karalabe commented Oct 10, 2018

The malicious signer scenario is more for the case where a signer is mining side forks all over the place, or including junk into the chain. In that case the network can try to vote it out.

Minting an invalid block is not possible, because all other nodes (and signers) in the network still validate and execute each block, so if you create a bad one, it will just get discarded.

@epm-bt
Copy link

@epm-bt epm-bt commented Oct 11, 2018

Hi all, sorry to bring up a whole new topic, but I got myself wondering today what would happen when the Ether in all the accounts eventually runs out...

I mean, because no Ether is given for minting blocks (actually no Ether is EVER given out), the accounts on a PoA network will only spend Ether. Even though I allocated a lot of Ether to the accounts upon creation and I transfer Ether to new joiners, one day (even if after 1,000 years) the Ether will run out for all the accounts, and when that happens, how will new transactions be issued?

Is there a way to continuously give away Ether to all accounts, or something like that??

I run a demo private network where two nodes are authorities and one is only information poster...

Thanks in advance,

@karalabe
Copy link
Member Author

@karalabe karalabe commented Oct 11, 2018

Ether doesn't vanish from the system. It just gets redistributed. Any fees paid by users are going fully to the signers, so they themselves can use that Ether to do whatever. It's a closed system.

@epm-bt
Copy link

@epm-bt epm-bt commented Oct 15, 2018

Hi there

Regarding the scalability limits of PoA consensus networks: when using puppeth we're asked to set the minting frequency of blocks on the private network, i.e. how often new blocks are created. The question is, how do we set this optimally? I'd reckon the more blocks you produce in a certain time window, the more transactions you can actually process, so why not set that to only a couple of seconds? I realise more energy will be used on the authority nodes, but on the other hand we can process more transactions per time unit this way, which can be useful for high-throughput applications.

Thanks!

@maxrobot
Copy link

@maxrobot maxrobot commented Oct 19, 2018

Hi Peter,

How was Clique tested to verify that it actually achieves consensus as required?

I see the snapshot tests but nothing else.

Indeed is something like this possible?

Thanks

@KaloNK
Copy link

@KaloNK KaloNK commented Oct 23, 2018

Ether doesn't vanish from the system. It just gets redistributed. Any fees paid by users are going fully to the signers, so they themselves can use that Ether to do whatever. It's a closed system.

It is still possible to lose funds over time, for example by transferring them to a wrong address, by a contract destroying them accidentally (if used for more than just gas), or simply by slowly losing signers to hardware failures (all their ether is lost with them, and over time...).

What if the same method as for voting were used to generate or subtract some ether from a specific address?
Say there is a faucet contract that delivers ether by some logic (enough gas for a month, for example); then the signers can vote to fund the contract once a month.
Another use case: if a malicious user collects enough ether to perform a large attack, the attack can be quickly stopped by voting to destroy the funds in that account and later to 'mine' them again.

@yujia21
Copy link

@yujia21 yujia21 commented Dec 4, 2018

Minor typo if/when there's a next edit:
"It's not elegant: Ethereum is supposed to have dynamic block limits"

@5chdn
Copy link
Contributor

@5chdn 5chdn commented Dec 5, 2018

Please direct review comments to the actual PR: #1570

@karalabe
Copy link
Member Author

@karalabe karalabe commented Dec 5, 2018

@maxrobot The crux of the consensus engine is the same as ethash, just block validation was modified a bit. There wasn't really a danger of consensus not being reached, it's the same GHOST protocol. But if you're wondering how we tested it, it's used by Rinkeby. I tested it while creating Rinkeby, running multiple nodes/signers and seeing how they behave. All in all the spec didn't need to be changed much, just implementation bugs ironed out.

@karalabe
Copy link
Member Author

@karalabe karalabe commented Dec 5, 2018

@KaloNK Censorship doesn't work. If we were to introduce a mechanism to "burn" ether on demand, an attacker would just split their ether into millions of accounts. Since it's monopoly money, it doesn't matter that they pay some fees to distribute the funds. That would evade the censorship while also putting extra load on the network. TL;DR: censorship just makes the network worse, but doesn't solve the issue.

As for bringing new Ether into the system, that's what the prefunding is for. Make an insane amount, and put it behind a multisig if you want. Then you can control the release if you want to do so.

@tkstanczak
Copy link

@tkstanczak tkstanczak commented Feb 19, 2019

Suggesting the InTurn and OutOfTurn difficulties be set at 7 and 3 respectively instead of the current 2 and 1. This is to avoid the scenario we experienced yesterday where the network (Goerli), operating with 5 out of 9 validators, was repeatedly stuck when validators were following two paths - one with a single InTurn seal and the other with two consecutive OutOfTurn seals. Since the branch selection algorithm uses the TotalDifficulty as the primary discriminator (and the Ethereum tests use total tx count as the secondary one), we could either choose the numbers 7 and 3, which have a lowest common multiple of 21, or change the branch selection algorithm to prefer shorter chains over longer ones if they follow the InTurn blocks (much harder to implement, and it would require separate branch selection algorithms for Clique and other consensus algorithms).

7 and 3 are selected so that 1 InTurn block is better than 2 OutOfTurn blocks and 3 OutOfTurn blocks are better than 1 InTurn block.

Current (total difficulty draw):
A 22222
B 222211


Proposed (A wins):
A 77777
B 777733

Proposed (B wins):
A 77777
B 7777333
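
The draw and the two proposed outcomes can be reproduced with a tiny sketch (illustrative only, not client code):

```python
def total_difficulty(seals, diff_inturn=2, diff_noturn=1):
    # seals: True for an in-turn seal, False for out-of-turn
    return sum(diff_inturn if s else diff_noturn for s in seals)

chain_a = [True] * 5                  # 22222
chain_b = [True] * 4 + [False] * 2    # 222211

# Current 2/1 weights: total-difficulty draw, so neither fork wins
assert total_difficulty(chain_a) == total_difficulty(chain_b) == 10

# Proposed 7/3 weights: A wins (35 vs 34) ...
assert total_difficulty(chain_a, 7, 3) == 35
assert total_difficulty(chain_b, 7, 3) == 34

# ... and one more out-of-turn block tips it to B (37 > 35)
chain_b2 = [True] * 4 + [False] * 3   # 7777333
assert total_difficulty(chain_b2, 7, 3) == 37
```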

@timelinefunds
Copy link

@timelinefunds timelinefunds commented Feb 19, 2019

@tkstanczak
Copy link

@tkstanczak tkstanczak commented Feb 20, 2019

Chris Mckay @Errorific Feb 19 23:27
I'm still not convinced there is any good reason to change the difficulty. If the 2 difficulty block takes longer than the block period to reach other nodes then I don't see value in making it possibly a stronger chain for any longer than that 1 block step. The stalling issues we've had are more about not having enough share of the validator pool available rather than the chain weight being wrong
ignoring my reluctance to change the difficulty though, is there a solution in making it inturn = 3, outturn = 2 so that longer chains don't get reorged away from as readily?

Tomasz Kajetan Stańczak @tkstanczak 12:04
7 - 3, or 5 - 3 depending on whether you want to prefer 1 InTurn or 2 OutOfTurn
3, 2 has the lowest common multiple at 6 which is not desired
7 - 3 at 21, 5 -3 at 15

Chris Mckay @Errorific 12:06
why is that a desirable/undesirable feature?

Tomasz Kajetan Stańczak @tkstanczak 12:06
also, 7 -3 only introduces reorgs more readily when in other cases you have a stalemate 2 vs 1 + 1
so it is not more eager - just more decisive
undesired is lower lowest common multiple, desired is a higher lowest common multiple
at common multiple you have stalemates
1 -2 has the lowest common multiple at 2

Chris Mckay @Errorific 12:08
you have stalemates when there is half or less of the validators committed to both forks

Tomasz Kajetan Stańczak @tkstanczak 12:09
they were aware of other forks just not reorganizing
if the other fork had different difficulty everyone would agree on the same fork
when we were resyncing some nodes from 0 it was because this was the only way from them to hopefully pick the right fork without reorganizing

Chris Mckay @Errorific 12:09
sure, but increasing the difficulty on either side would just push the point at which they stalled out further
if we had all 9 nodes online at the time and making blocks then we would not have reached a position where 1 of the forks was unable to produce more blocks and become a clear winner

Tomasz Kajetan Stańczak @tkstanczak 12:10
yes, making it statistically much less likely
the protocol is designed to operate consistently at 51% of the nodes active

Chris Mckay @Errorific 12:10
but increasing the potential length of the reorg

Tomasz Kajetan Stańczak @tkstanczak 12:11
if it does not operate well at this level then the protocol is imperfect
so it is a minor tweak

Chris Mckay @Errorific 12:11
right, and we dropped below 51% of the nodes being operational because we lost comms between some of them
I think it is widely agreed that clique is an imperfect protocol and was put together for the sake of test networks

Tomasz Kajetan Stańczak @tkstanczak 12:14
I would say that introducing the risk of 3 x 7 vs 5 x 3 is undesired because the potential reorg is longer - but this would be impossible in the 9 nodes network
because 5 x 3 would have enough validators to continue processing the network
as it succeeds to seal 5 blocks in a row
so it can go forever

Chris Mckay @Errorific 12:14
yes, and the chain that is regularly producing blocks that has more than half of the participants is the preferred chain

Tomasz Kajetan Stańczak @tkstanczak 12:15
yes

Chris Mckay @Errorific 12:15
so it doesn't matter that the in turn validator with higher difficulty is on the other fork, the chain that's working that more people are watching is the correct one. That chain shouldn't be unduly re-org'd away from

Tomasz Kajetan Stańczak @tkstanczak 12:16
so 7 - 3 would only get stuck at 10+ validators and with potentially longer reorgs - but this would require more nodes disconnected into two entirely separate networks
with 2 - 1 only one rogue / disconnected node can make the network stuck very quickly

Chris Mckay @Errorific 12:16
how with 2-1 does 1 rogue/disconnected node cause it to get stuck?

Tomasz Kajetan Stańczak @tkstanczak 12:16
@Errorific - yes this chain would not be reorged in 7 - 3 - ever
because it would quickly outpace the chain of 7s
with 5 out of 9 if any node mines on top of 1 at the time when 2 is available
then the network is stuck immediately
because they will not reorg to 2 difficulty even if they learn about it

Chris Mckay @Errorific 12:18
right, but in such a case we have 5 rogue/disconnected nodes, the 4 that are offline and the 1 that forked

Tomasz Kajetan Stańczak @tkstanczak 12:19
in 7 vs 3 if one node mines on top of 3 then it will reorg immediately to 7 on learning about it

Chris Mckay @Errorific 12:19
unless 2 nodes mine on top of the 3, or a 7 mines on top of the 3 because it becomes in turn

Tomasz Kajetan Stańczak @tkstanczak 12:19
if two nodes mine on top of 3 then the rest will switch to them because 9 > 7

Chris Mckay @Errorific 12:20
what if a 7 mines on top of the 3?

Tomasz Kajetan Stańczak @tkstanczak 12:20
then 10 > 9 > 7

Chris Mckay @Errorific 12:20
so why is that better?

Tomasz Kajetan Stańczak @tkstanczak 12:20
this is the beauty of 7 and 3 being prime
because you cannot have draws easily
7 + 3 + 3 != 7 + 7 != 3 + 3 + 3 + 3
the network always knows what chain to reorg to
with 2 - 1 you have 1 + 1 + 1 = 2 + 1, 1 + 1 = 2
so the network has lots of potential scenarios when chains do not reorg and stay with their chains
whenever a new node connects it will learn about the best block and advertise the best block - and it chooses by difficulty only
so if the new node connects to the network with 2 - 1 all the new nodes randomly allocate themselves to one of the forks

Chris Mckay @Errorific 12:23
ok, and this can only become a problem when the network drops below 50% of the validators working together and becomes splitbrain without a clear majority of the total validator pool

Tomasz Kajetan Stańczak @tkstanczak 12:23
with 7 - 3 they will always know who to choose
network will always get stuck below 50%
this is desired

Chris Mckay @Errorific 12:24
yes, which is what happened

Tomasz Kajetan Stańczak @tkstanczak 12:24
no
what happened was the network got stuck at above 51%
I will copy this to the github issue, if this is ok for you?

Chris Mckay @Errorific 12:25
I think our definition of healthy nodes is different. If there's network faults that are preventing 2 nodes from communicating effectively then one of the 2 needs to be considered unhealthy.
yup, copy away

Tomasz Kajetan Stańczak @tkstanczak 12:27
if any node has 0 peers then it is equivalent to an offline node
if two nodes in the network are NOT connected this is fine
as long as the validator has one peer it should be considered connected to the network
and the network will progress
what we had on Goerli was that the validator that had 1 peer would cause the others to get stuck
and there was no way of that ever getting resolved even with this validator learning about the new peers in the future
in 7 -3 the moment this validator starts getting connected again then the network would move forward
so if you, for example, added (in 7 - 3) a static node to this validator from any other node that knows about the winning chain then we would progress
in (2 - 1) adding a static node would not resolve the situation
because the total difficulty would be the same and both chains would be equivalent
so the validator would not reorg
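
The "lowest common multiple" argument in the chat above can be checked by brute force (a throwaway sketch, not part of any client):

```python
from itertools import product

def has_cross_draw(w_in, w_out, max_len=6):
    # Can two chains with *different* in-turn/out-of-turn compositions
    # reach the same total difficulty? Brute force over short chains.
    totals = {}
    for n in range(1, max_len + 1):
        for seals in product((w_in, w_out), repeat=n):
            comp = (seals.count(w_in), seals.count(w_out))
            totals.setdefault(sum(seals), set()).add(comp)
    return any(len(comps) > 1 for comps in totals.values())

# With 2/1 weights, draws appear immediately (1 + 1 == 2).
assert has_cross_draw(2, 1)

# With 7/3 weights, no mixed tie exists within 6 blocks; the first
# possible tie is at total 21 (three 7s vs seven 3s).
assert not has_cross_draw(7, 3)
assert has_cross_draw(7, 3, max_len=7)
```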

@thefallentree
Copy link

@thefallentree thefallentree commented Mar 14, 2019

I'm releasing my implementation of EIP-225 at https://github.com/thefallentree/parity-clique from now on. This software is being used to connect to Rinkeby. It is also being used as the only Parity-based Görli testnet authority node. I hope it will be useful to someone.

Thanks all.

@nicksavers
Copy link
Collaborator

@nicksavers nicksavers commented Mar 14, 2019

@karalabe Maybe it's a good idea to clean up the first comment to take out the redundant information that's now in the EIP and put a link to it at the top. And now that it's up and running with multiple clients on Görli testnet, is it perhaps time to start moving to Last Call in order to Finalize it?

@realcodywburns
Copy link
Contributor

@realcodywburns realcodywburns commented Jun 8, 2019

To improve CFT on a network larger than three signers:

For any sealer:

(BLOCK_NUMBER % SIGNER_COUNT == SEALER_INDEX)  ?  DIFF_INTURN  :  DIFF_NOTURN

and:

SIGNER_LIMIT == floor(SIGNER_COUNT / 2) + 1

Defaults:

DIFF_INTURN:  2
DIFF_NOTURN:  1

When a sealer misses an in-turn slot, (N-1)/2 sealers can each sign a block with the same difficulty (DIFF_NOTURN) and submit it after rand(SIGNER_COUNT * 500 ms).

PROPOSED CHANGE:

DIFF_INTURN:  1 + ((SIGNER_COUNT-1)/2)
DIFF_NOTURN:  RAND((SIGNER_COUNT-1)/2)

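The proposed change above could be sketched like this (function and variable names are assumed for illustration; the in-turn check follows Clique's `number % signer_count == signer_index` rule, and this is not any client's implementation):

```python
import random

def block_difficulty(block_number, sealer_index, signer_count, rng=random):
    # In-turn if this sealer's index matches the block number modulo
    # the signer count (Clique's round-robin rule).
    in_turn = block_number % signer_count == sealer_index
    half = (signer_count - 1) // 2
    if in_turn:
        return 1 + half                      # DIFF_INTURN
    return rng.randint(1, max(half, 1))      # DIFF_NOTURN: RAND(...)

# With 9 signers, an in-turn seal always weighs 5, an out-of-turn seal
# 1..4, so a single in-turn block outweighs any single out-of-turn one.
assert block_difficulty(18, 0, 9) == 5
assert 1 <= block_difficulty(18, 1, 9) <= 4
```

The randomized out-of-turn difficulty also makes exact total-difficulty draws between competing forks less likely than with the fixed 2/1 constants.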
@JackieCode
Copy link

@JackieCode JackieCode commented Nov 6, 2019

I'm new to Node.js. How can I run github.com/Magicking/Clixplorer? I want a PoA explorer.

@axic
Copy link
Member

@axic axic commented Nov 8, 2019

As a note, Clique was made into a proper EIP (and marked Final recently): https://eips.ethereum.org/EIPS/eip-225

Please read that link for the final specification and keep this issue as the discussion medium.
