GRANDPA invariant violated (target > best) #13254

davxy · 2023-01-27T16:55:34Z

May happen that THIS GRANDPA invariant constraining a vote target to be less than the current best is violated.

The bug triggers quite often so is worth fixing it ASAP

Analysis:

GRANDPA protocol needs to select what is the target block to vote on here
That block is selected to be above a given block (aka "base") passed as parameter
target is selected using select_chain.finality_target(base, None) here. Starting from the leaves this will pick the block that is farther from the base (optionally less than a max number, None is passed)
best is selected using select_chain.best_chain() that , according to the LongestChain implementation here then here it returns the best descendant of the best block (sorry for the recursion... 😃) according to the META info stored by the backend. In practice this contains a descendant of the best according to BABE primary/secondary block rule (not always corresponding to the longest chain).

I have a locked up DB (i.e. every time I restart the validator this terminates immediately) like the following:

    7---8---9---10---11
    ^       |         ^selected as target by select chain
    base    |
            +---10'
                 ^best acording to backend meta info

When I start the node the following error is then returned and grandpa terminates:

XXX DEBUG LOG target: 11, best: 10, base: 7
2023-01-27 15:15:54 GRANDPA voter error: safety invariant has been violated: SelectChain returned a finality target higher than its best block
2023-01-27 15:15:54 Essential task `grandpa-voter` failed. Shutting down service.

Potential solution

Obviously we have to prevent GRANDPA to vote for a chain that is not the best according to our node.

The finality_target should choose directly a chain that includes our best, thus should pick the longest chain that:
contains the base AND the best

The text was updated successfully, but these errors were encountered:

rphmeier · 2023-01-28T02:30:04Z

Polkadot uses a custom finality voting rule and select_chain implementation which is quite crucial not to break: https://github.com/paritytech/polkadot/blob/master/node/service/src/relay_chain_selection.rs and https://github.com/paritytech/polkadot/blob/master/node/core/chain-selection/src/lib.rs

substrate/client/finality-grandpa/src/environment.rs

Line 1230 in 248fdf0

if target_header.number() > best_header.number() {

I don't fully understand why this check is required. If there is a maximum allowed block number then the SelectChain trait should accept this block number as a parameter to finality_target.

rphmeier · 2023-01-28T02:31:13Z

Obviously we have to prevent GRANDPA to vote for a chain that is not the best according to our node.

Not exactly sure what you mean by this. The SelectChain implementation is also part of the node. Which source of information is actually correct within the node and how do you know that for certain?

rphmeier · 2023-01-28T02:44:50Z

The finality_target should choose directly a chain that includes our best, thus should pick the longest chain that:
contains the base AND the best

I think this in particular would break Polkadot. Forcing a vote on anything >= than the minimum possible block is just dead wrong and a bad API for the SelectChain. In particular, it gives us no option to actually select a chain...

In Polkadot we are happy to build on top of forks which contain un-approved parachain candidates but we definitely 100% don't want to finalize them. This is just one example. But the API needs to support such use-cases.

SelectChain is a very consensus critical interface.

It is critical that GRANDPA follows the outputs it gets from SelectChain
It is critical that the interface does not constrain behavior any more than the absolute minimum required by GRANDPA

davxy · 2023-01-28T09:17:28Z

Maybe I was not super clear describing the issue.

If I have to synthesize the issue in one sentence I would say: "the current way to choose the target (SelectChain::finality_target()) doesn't take in consideration what is the chain chosen by SelectChain::best_chain()"

The two functions may return blocks in two separate forks, we compare their heights and bail out if best > target.

IMO the hash returned by finality_target should be in the same chain of best_chain (or can we vote for a chain that is not the best according to best_chain()?) As far as I understood this is not correct.

davxy · 2023-01-28T09:20:04Z

Obviously we have to prevent GRANDPA to vote for a chain that is not the best according to our node.

Not exactly sure what you mean by this. The SelectChain implementation is also part of the node. Which source of information is actually correct within the node and how do you know that for certain?

With "best according to our node" I was referring to the value returned by SelectChain::best_chain(),

andresilva · 2023-01-30T11:56:45Z

The reason we call SelectChain::best_chain in GRANDPA is just to pass it along to any VotingRule that might want to further restrict the votes (e.g. we have a voting rule to constrain votes by N blocks from best). I think we should change the API such that in a single call we can get both the finality target and the best block (this also helps with a possible race condition of making two asynchronous calls separately). The semantics should stay the same and the voter should follow the decision to finalize the block that was yielded by SelectChain, but I think it's fair for the API to expect that both blocks are congruent, i.e. the finality target should be an ancestor of the best block (or the same block).

davxy · 2023-01-31T14:23:44Z

@andresilva I elaborated our discussion and if here our requirements are:

that target.number <= best.number
(this is the new requirement) target is in the same chain of best (according to the SelectChain::best_chain)

Can't we just start our finality target search from SelectChain::best_chain and eventually get one of its ancestors if there is a limit to respect? Obviously the lower bound is the passed block (aka the base) as before...

In short, requisite 2 above limits our choices to this chain only: base -> ..... -> best
if I've not overlooked some detail, probably makes no sense to iterate over all the leaves.

rphmeier · 2023-02-02T18:06:35Z

target is in the same chain of best (according to the SelectChain::best_chain)

Is this actually achievable?

AFAIK target is set either as the last finalized block or is an overestimate of what could be finalized. This by definition can be conflicting with BABE or other chain selection rules.

We may require that target is in the same chain as best, but it should be done by changing the opinion of best (i.e. "breaking" BABE) rather than changing the definition of target (i.e. "breaking" GRANDPA)

davxy · 2023-02-02T18:46:03Z

It is achievable when the user decides to use LongestChain implementation of SelectChain (i.e. the "default" one that ships with Substrate).

Current authoring algorithms are using the same SelectChain that GRANDPA is using here.

They write what is their best in the backend (META column), this info is then fetched using self.backend.blockchain().info() in the LongestChain methods implementation.

This strategy is not something written in stone, but depends on the SelectChain implementation the user decides to use (e.g. Polkadot, as you know doesn't use LongestChain as is).

PR #13289 is actually using this information to get a finalization target in the same chain of the best

bkchr · 2023-02-11T06:11:51Z

Can't we just start our finality target search from SelectChain::best_chain and eventually get one of its ancestors if there is a limit to respect? Obviously the lower bound is the passed block (aka the base) as before...

The linked pr is still missing the removal of best_chain then. Otherwise we can run into issues when there happens a reorg between calling finality_target and best_chain as already explained by rob.

davxy self-assigned this Jan 27, 2023

davxy added the I3-bug The node fails to follow expected behavior. label Jan 27, 2023

davxy mentioned this issue Feb 1, 2023

Fix longest chain finalization target lookup #13289

Merged

bkchr mentioned this issue Feb 6, 2023

Subsystem (av-store) stalled while major syncing paritytech/polkadot#6656

Closed

andresilva mentioned this issue Feb 11, 2023

grandpa: don't error if best block and finality target are inconsistent #13364

Merged

davxy closed this as completed in #13289 Feb 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GRANDPA invariant violated (target > best) #13254

GRANDPA invariant violated (target > best) #13254

davxy commented Jan 27, 2023 •

edited

Loading

rphmeier commented Jan 28, 2023 •

edited

Loading

rphmeier commented Jan 28, 2023 •

edited

Loading

rphmeier commented Jan 28, 2023 •

edited

Loading

davxy commented Jan 28, 2023 •

edited

Loading

davxy commented Jan 28, 2023

andresilva commented Jan 30, 2023 •

edited

Loading

davxy commented Jan 31, 2023 •

edited

Loading

rphmeier commented Feb 2, 2023 •

edited

Loading

davxy commented Feb 2, 2023 •

edited

Loading

bkchr commented Feb 11, 2023

GRANDPA invariant violated (target > best) #13254

GRANDPA invariant violated (target > best) #13254

Comments

davxy commented Jan 27, 2023 • edited Loading

Potential solution

rphmeier commented Jan 28, 2023 • edited Loading

rphmeier commented Jan 28, 2023 • edited Loading

rphmeier commented Jan 28, 2023 • edited Loading

davxy commented Jan 28, 2023 • edited Loading

davxy commented Jan 28, 2023

andresilva commented Jan 30, 2023 • edited Loading

davxy commented Jan 31, 2023 • edited Loading

rphmeier commented Feb 2, 2023 • edited Loading

davxy commented Feb 2, 2023 • edited Loading

bkchr commented Feb 11, 2023

davxy commented Jan 27, 2023 •

edited

Loading

rphmeier commented Jan 28, 2023 •

edited

Loading

rphmeier commented Jan 28, 2023 •

edited

Loading

rphmeier commented Jan 28, 2023 •

edited

Loading

davxy commented Jan 28, 2023 •

edited

Loading

andresilva commented Jan 30, 2023 •

edited

Loading

davxy commented Jan 31, 2023 •

edited

Loading

rphmeier commented Feb 2, 2023 •

edited

Loading

davxy commented Feb 2, 2023 •

edited

Loading