Istanbul Byzantine fault tolerant consensus protocol
Note: this work is deeply inspired by Clique PoA. We've tried to design as similar a mechanism as possible at the protocol layer, such as the validator voting. We've also followed its EIP style of laying out the background and rationale behind the proposed consensus protocol, to help developers easily find technical references. This work is also inspired by Hyperledger's SBFT, Tendermint, HydraChain, and NCCU BFT.
Istanbul BFT is inspired by the Castro-Liskov 99 paper. However, the original PBFT needed quite a bit of tweaking to work with a blockchain. First, there is no specific "client" which sends out requests and waits for the results; instead, all of the validators can be seen as clients. Furthermore, to keep the blockchain progressing, a proposer is continuously selected in each round to create a block proposal for consensus. Also, each consensus result is expected to produce a verifiable new block rather than a bunch of read/write operations to the file system.
Istanbul BFT inherits the original PBFT's 3-phase consensus: PRE-PREPARE, PREPARE, and COMMIT.
Blocks in the Istanbul BFT protocol are final, which means that there are no forks and any valid block must be somewhere in the main chain. To prevent a faulty node from generating a totally different chain from the main chain, each validator appends the received COMMIT signatures (the committed seals) to the block header before inserting the block into the chain.
Istanbul BFT is a state machine replication algorithm. Each validator maintains a state machine replica in order to reach block consensus.
Round change flow
Currently we support two policies: round robin and sticky proposer.
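To illustrate the difference between the two policies, here is a minimal sketch; this is hypothetical code, not the reference implementation, and the function names, the `validators` ordering, and the way round numbers enter the rotation are all assumptions:

```python
# Hypothetical sketch of the two proposer-selection policies. `validators`
# is an ordered list; `last_proposer` produced the previous block.

def round_robin_proposer(validators, last_proposer, round_number):
    # Rotate to the next validator on every new block/round.
    base = validators.index(last_proposer)
    return validators[(base + 1 + round_number) % len(validators)]

def sticky_proposer(validators, last_proposer, round_number):
    # Keep the same proposer; only a round change moves it along.
    if round_number == 0:
        return last_proposer
    base = validators.index(last_proposer)
    return validators[(base + round_number) % len(validators)]
```

The key behavioral difference: under round robin the proposer changes on every block, while a sticky proposer only changes when a round change forces it to.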
Validator list voting
We use a validator voting mechanism similar to Clique's and copy most of the content from the Clique EIP. Every epoch transition resets the validator voting, meaning that if an authorization or de-authorization vote is still in progress, that voting process is terminated.
For all transaction blocks:
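A minimal sketch of the Clique-style voting and epoch reset described above, under assumed details: the `EPOCH_LENGTH` default, the strict-majority threshold, and the class/method names are illustrative stand-ins, not the actual implementation:

```python
# Illustrative sketch of Clique-style validator voting with epoch reset.
EPOCH_LENGTH = 30000  # Clique's default epoch length; an assumption here

class ValidatorVoting:
    def __init__(self, validators):
        self.validators = set(validators)
        self.votes = {}  # (candidate, authorize) -> set of voters

    def apply_block(self, number, voter=None, candidate=None, authorize=None):
        if number % EPOCH_LENGTH == 0:
            self.votes.clear()  # epoch block: discard all in-progress votes
            return
        if voter is None or voter not in self.validators:
            return  # only current validators may vote
        tally = self.votes.setdefault((candidate, authorize), set())
        tally.add(voter)
        if len(tally) > len(self.validators) // 2:  # strict majority passes
            if authorize:
                self.validators.add(candidate)
            else:
                self.validators.discard(candidate)
            # votes concerning a decided candidate are discarded
            self.votes = {k: v for k, v in self.votes.items()
                          if k[0] != candidate}
```

The important property mirrored here is the epoch reset: any tally still in progress at an epoch block is simply thrown away.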
Future message and backlog
In an asynchronous network environment, one may receive future messages which cannot be processed in the current state. For example, a validator can receive messages for a future round, or for a future sequence (block height), that it cannot yet process; such messages are stored in a backlog.
To speed up the consensus process, a validator that has received 2F + 1 COMMIT messages for a proposal can commit the block directly, even if it has not yet gone through the earlier phases for that proposal itself.
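The backlog idea can be sketched as a priority queue keyed by (sequence, round); this is an assumption-laden illustration (class and method names are hypothetical), not the actual data structure:

```python
import heapq

# Hypothetical per-validator backlog: messages for a future (sequence, round)
# are stored and replayed once the validator's state catches up.
class Backlog:
    def __init__(self):
        self._heap = []  # min-heap ordered by (sequence, round)

    def add(self, sequence, round_, msg):
        heapq.heappush(self._heap, (sequence, round_, msg))

    def pop_ready(self, cur_sequence, cur_round):
        # Return every stored message that is now processable.
        ready = []
        while self._heap and self._heap[0][:2] <= (cur_sequence, cur_round):
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

On every state or round transition the validator would drain `pop_ready` and process the returned messages as if they had just arrived.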
We define the following constants:
We also define the following per-block constants:
We didn't invent a new block header for Istanbul BFT. Instead, we follow Clique in repurposing the extraData field.
Block hash, proposer seal, and committed seals
The Istanbul block hash calculation is different from the standard block hash calculation.
The calculation is still similar to the standard one, with the differences described in the following sections.
Proposer seal calculation
By the time of proposer seal calculation, the committed seals are still unknown, so we calculate the seal with those unknowns empty. The calculation is as follows:
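The sealing flow above can be sketched as follows. This is a simplification under stated assumptions: `sign` is a hypothetical ECDSA helper, a dict stands in for the RLP-encoded header, and SHA3-256 stands in for Keccak-256:

```python
from hashlib import sha3_256  # stand-in; real clients use Keccak-256

# Sketch of proposer sealing: the seal is a signature over the header hash
# computed with the proposer seal and committed seals left empty, since the
# committed seals are still unknown at sealing time.

def seal_hash(header: dict) -> bytes:
    h = dict(header)
    h["proposer_seal"] = b""       # excluded from the sealed hash
    h["committed_seals"] = []      # still unknown at sealing time
    encoded = repr(sorted(h.items())).encode()  # stand-in for RLP encoding
    return sha3_256(encoded).digest()

def seal_block(header: dict, sign) -> dict:
    header = dict(header)
    header["proposer_seal"] = sign(seal_hash(header))
    return header
```

Because both seal fields are blanked before hashing, sealing a header does not change its seal hash, which is what makes the signature verifiable later.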
Block hash calculation
While calculating the block hash, we need to exclude the committed seals, since that data is dynamic between different validators. Therefore, we leave the committed seals empty when computing the hash.
Before inserting a block into the blockchain, each validator needs to collect 2F + 1 committed seals to compose a valid block.
Committed seal calculation:
The committed seal is calculated by each validator signing the block hash along with the COMMIT message code.
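A hedged sketch tying the collection and calculation steps together; the exact COMMIT message code (0x02 here) and the 2F + 1 threshold should be checked against the implementation, and `sign` is a hypothetical signing helper:

```python
# Sketch: a committed seal is a signature over the block hash concatenated
# with the COMMIT message code, and a block is only insertable once 2F + 1
# such seals have been collected.

COMMIT_MSG_CODE = bytes([2])  # assumed value for illustration

def committed_seal(block_hash: bytes, sign) -> bytes:
    return sign(block_hash + COMMIT_MSG_CODE)

def can_insert(num_seals: int, f: int) -> bool:
    # F is the maximum number of tolerated faulty validators.
    return num_seals >= 2 * f + 1
```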
Block locking mechanism
A locking mechanism is introduced to resolve safety issues. In general, when a proposer is locked at a certain height and round, it can only propose the proposal it is locked on, and validators likewise only accept that proposal, until the lock is released.
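One way to read the locking rules above is the following simplified state machine; the real lock/unlock triggers (their interaction with round changes and failed insertions) are richer than this hypothetical sketch:

```python
# Simplified lock state for one validator at one chain height.
class LockState:
    def __init__(self):
        self.locked_block = None
        self.locked_height = None

    def lock(self, block_hash, height):
        # Lock once prepared on a proposal at this height.
        self.locked_block = block_hash
        self.locked_height = height

    def acceptable(self, block_hash, height):
        # While locked at a height, only the locked proposal is acceptable.
        if self.locked_height != height:
            return True
        return self.locked_block == block_hash

    def unlock(self):
        # Released on a successful insert or an explicit round-change rule.
        self.locked_block = None
        self.locked_height = None
```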
Lock and unlock
Can you explain when block insertion might fail? I'm struggling to see why block insertion would ever fail for a valid proposal.
Why not just accept zero-gasprice transactions?
Have you tried running the network with >=1/3 faulty nodes? If so, what does the result look like; what kinds of failures do you see in practice?
Before actually inserting the block into the chain, the consensus layer only validates the block header. Insertion performs more checks, so it can fail for other reasons.
You're right. We've updated the EIP accordingly.
Theoretically it's also possible to finalize two conflicting blocks, if the proposer is one of the Byzantine nodes and makes two proposals, each of which gets 2/3 prepares+commits. Though I guess that's fairly unlikely to happen in practice and so won't appear in many random tests.
I know the meaning of block validity, but outside of PoW this is a little bit ambiguous.
Yes, I think you are right. Suppose there are f+1 faulty nodes and 2f good nodes, and the proposer is among the faulty nodes. The proposer can send block A to the first f good nodes and block B to the second f good nodes. Then both groups can collect 2f+1 prepares+commits, for block A and block B respectively. Thus two conflicting blocks can be finalized.
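The counting in this scenario can be checked mechanically:

```python
# Sanity-check of the counting argument above: with f+1 faulty nodes and
# 2f honest nodes, a faulty proposer can split the honest nodes into two
# halves and give each half a different block; each half, together with the
# faulty nodes (who equivocate and vote in both groups), still reaches the
# 2f+1 quorum.

def split_attack_succeeds(f: int) -> bool:
    faulty = f + 1
    honest_per_group = f              # the 2f honest nodes split in two
    quorum = 2 * f + 1
    votes_per_group = honest_per_group + faulty
    return votes_per_group >= quorum  # f + (f+1) = 2f+1, so always true
```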
Each validator puts
Great! I was a little confused about "valid block" versus "consensus proof"; your response is also helpful for understanding the meaning of validation in Clique. Thank you.
Can you clarify when this timer starts? Is there one timer for the whole round, like in PBFT (well, in PBFT the timer starts once the client request is received), or is there a new timer at each phase (pre-prepared, prepared, etc.) as the figure seems to suggest?
Unless there is additional mechanism not described above (or perhaps I am just missing something), I think this protocol may have safety issues across round changes, as there does not seem to be anything stopping validators from committing a new block in a new round after others have committed in the previous round. This is what the "locking" mechanism in Tendermint addresses. In PBFT it's handled by broadcasting much more information during the round change. When you "blockchainify" PBFT, you can do away with this extra information if you're careful to introduce something like Tendermint's locking mechanism. I suspect that if you address these issues, you will end up with a protocol that is roughly identical (if not exactly identical) to Tendermint. Happy to discuss further and collaborate on this - great initiative!
Yes, there is only one timer, which is reset/triggered at the beginning of every new round.
Yes, in some extreme cases there might be safety issues. For example, say there is only one validator which receives 2F + 1 COMMIT messages and inserts the block, while the other validators time out and move on to a new round; the new round could then commit a different block.
Yes, the sticky proposer policy can lead to this issue. We've listed "faulty proposer detection" in the remaining-tasks section, aiming to resolve it. One possible way is to switch to the round robin policy whenever a validator sees an empty block. However, a sticky proposer can still game this by generating a very small block every round.
Detecting a faulty node deterministically is hard, which makes penalizing faulty nodes even harder. For simplicity, this PR doesn't dive into that topic. It might be worth looking into in a follow-up EIP and further research.
In our preliminary testing with a 4-validator setup, consensus took around 10ms to 100ms, depending on how many transactions were in each block. In our testing, we allowed each block to contain up to 2000 transactions.
Great work on developing Istanbul!
One comment on "Does it still make sense to use gas?"
I've developed a testnet (using Ethermint) and modified the client to not charge gas. I wanted to bounce this idea off others to see whether it is valid...
To avoid the infinite-loop problem, the validators ensure that smart contracts being published to the blockchain are sent from a small set of white-listed accounts.
These accounts are trusted by the consortium to only publish smart contracts that have gone through a strict review process.
I suppose in the extreme edge case that a computationally expensive contract slipped through and was published by mistake, the validators would stop and roll back to before the event.
Does this sound reasonable?
Appreciate any feedback on the faults with such an implementation.
The current implementation (as found in Quorum) breaks the concept of the "pending" block, which is used in several RPC calls, most notably in eth_getTransactionCount.
In Ethereum, the pending block means the latest confirmed block plus all pending transactions the node is aware of. This means that directly after a transaction is sent to the node (through RPC), the transaction count (aka nonce) in the "pending" block is increased. A lot of tools, like abigen in this repo or any other tool where tx signing occurs at the application level instead of in geth, rely on this for making multiple transactions at once: after the first transaction, the pending transaction count yields the correct nonce for the next one.
With the current implementation of Istanbul, the definition of the "pending" block seems to be different: when submitting a transaction, the pending transaction count does not increase as described above.
So this seems to mean that the "pending block" definition changed from "latest block + pending txs" to "the block that is currently being voted on". I consider this a bug; if this was done on purpose, it breaks a lot of existing applications (all users of abigen, for example) and should be reconsidered.
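For reference, the semantics that these tools rely on can be expressed as a tiny sketch; the helper below is hypothetical, not an actual geth API:

```python
# Hypothetical sketch of the "pending" nonce semantics described above:
# the pending transaction count equals the nonce in the latest block plus
# the transactions the node has queued locally. A tool signing several
# transactions back-to-back reads this count once per transaction.

def next_pending_nonces(latest_nonce: int, num_txs: int) -> list:
    nonces = []
    pending = latest_nonce
    for _ in range(num_txs):
        nonces.append(pending)  # nonce used for this transaction
        pending += 1            # the node immediately counts the new tx
    return nonces
```

If the node instead reports the count of "the block currently being voted on", each read returns the same value and every transaction after the first gets a duplicate nonce.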
I originally reported about this issue in the Quorum repo, but there doesn't seem to be a good place to report bugs in Istanbul other than here.
I'm sorry to disrupt the technical discussion here with a non-technical question: What is the intention for including this in the EIP repository? In particular I was wondering:
(1) Is this proposal seeking public protocol adoption? (It seems private-chain focused, really aimed at extending the client.)
One more question: In Clique, with
Hi, I have a question about IBFT's potential consensus failure when the number of locked nodes is less than n/3:
Imagine we have n=7 nodes and f=2. The nodes are A, B, C, D, E, F, G.
At first round:
At this stage, F and G stop voting.
We still have 5 voting nodes; however, D and E are locked and cannot unlock to switch between p1 and p2. A, B, and C cannot reach consensus by themselves, since at most 4 nodes end up voting for any one proposal, while we need at least 5 nodes.
As far as I can see, the current implementation of locking does not suffice to handle this case.
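Under one reading of the scenario (D locked on p1, E locked on p2, F and G silent), the deadlock can be verified numerically:

```python
# Numeric check of the scenario above: n = 7, f = 2, quorum = 2f+1 = 5.
n, f = 7, 2
quorum = 2 * f + 1          # 5 votes needed to finalize a proposal
unlocked = 3                # A, B, C can still vote for either proposal
# The two locked validators can never support the same proposal, so any
# single proposal gets at most the unlocked nodes plus one locked node.
best_case = unlocked + 1
assert best_case < quorum   # 4 < 5: neither p1 nor p2 can be finalized
```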
This still has not made it to accepted EIP status, @axic? Eeek.
With the EEA/EF Mainnet initiative, we really do need to be starting to consider EEA standards within the same EIP process, even if they do not apply to the ETH mainnet.
The EIP standards process needs to look at Ethereum-as-a-protocol, not purely the needs of $ETH.
When I raised that to @Souptacular in 2017, his response was that there was likely little appetite in the Core Devs group for taking on that extra load, considering that such proposals were not of direct benefit to ETH. Maybe the appetite is different now, especially with PegaSys people spanning both sides, @timbeiko and @shemnon being deeply involved with Core Devs, etc?
I am a bit confused, but I don't think anyone would have rejected this submitted as an EIP. As it stands today, this is only a discussion. When it gets submitted as a pull request, it can be merged as a draft and likely turned final, given it was implemented in multiple clients (and superseded already?).
Note that the Quorum implementation has recently changed the calculation for a quorum of validators to fix an issue. There are a bunch of details I'm not familiar with but this spec likely needs an update before it becomes final. From my memory of trying to implement IBFT1 I seem to recall some parts of this were misleading or wrong (or possibly the Quorum implementation was wrong but that's essentially become the standard for IBFT1 since it's what's in production). I should have raised them at the time (sorry) and would have to review the spec again now, though there are likely better people.
There is also ongoing work in the EEA to adopt a standard BFT consensus algorithm. I'm not sure what the status of that is. It does mean that we don't necessarily need this and other non-mainnet stuff as EIPs, the EEA spec may (or may not) be a better place for them.
@ajsutton My gut says that everything which can be an EIP should be an EIP, to avoid siloing between Public Ethereum and Enterprise Ethereum (which is exactly what happened with the EEA; intentionally at first, but with the intention of converging them back together in happier days, i.e. now).
There is nothing to say that all EIPs have to be implemented by ALL clients to be useful. There is nothing to say that all EIPs have to apply to the ETH mainnet to be accepted.
The fact that EIPs were NOT originally written for functionality like: JSON-RPCs, Swarm, Warp-Sync, Aura, Clique and more was a real problem. You were stuck with trying to be bug-for-bug compatible with Geth or with Parity.
Now we have more clients I would argue that pretty much EVERY useful feature from ETH1 clients, including EEA features, should have EIPs written for them - unless they are very experimental and new. The spec is what lets other clients adopt.
We modified the implementation to better handle dynamic validators, based on a reported issue with scaling a network from 1 validator to 4. We'll continue to enhance the protocol as IBFT. We are currently working on a TLA+ spec, which has so far produced a few updates to the described protocol; we'll make it available once it's completed, and we'd be more than happy to see it as an EIP. I thought this originally was an EIP.
It is an EIP actually: https://eips.ethereum.org/EIPS/eip-225
It has an EIP too: https://eips.ethereum.org/EIPS/eip-1474
The Clique EIP was written by @karalabe in an unsuccessful attempt to "unfork" the different POA approaches after Parity "went first" with Aura and then a group of companies launched the Kovan testnet without even informing the Geth team:
Parity did not "play ball" and implement Clique in Parity, and also did not author an EIP of their own for Aura, or propose any alternative standard which both teams could implement.
That was finally resolved by the Gorli project (co-funded by the EF and ETC Coop) which added Clique support to Parity. Thank you @soc1c, @aidanih and @YazzyYaz. ETC Coop paid $130K on our side for that to happen, and I believe that the EF matched that funding.
The JSON-RPC EIP also happened a lot later than the original Wiki spec. Does Parity even comply with the EIP? I honestly do not know. The lack of alignment between Geth and Parity on that score has been an issue since 2016.
A Warp-Sync EIP would have been very useful. Aleth was leveraging that functionality at one stage, right, @axic? Is that still the case?
Swarm is "graduated" from EF funding now, and they have their own process, making an EIP moot at this stage:
IBFT was created by AMIS, a Taiwanese banking consortium, in 2017 and it is completely unrelated to the Istanbul hard fork.
They called it Istanbul as a riff on Byzantine fault tolerance.
Byzantium, Constantinople, and Istanbul were the names assigned to the phases of what was originally planned as a single hard fork called Metropolis, the phase of the original ETH roadmap prior to Serenity.
Those are all different names which the real-world city of Istanbul has had in its history (and the city is itself a metropolis).