Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Integration] Fixes finalized block height with block state bug #936

Merged
merged 8 commits into from Jul 13, 2021

Conversation

yhassanzadeh13
Copy link
Contributor

@yhassanzadeh13 yhassanzadeh13 commented Jul 8, 2021

Abstract

This PR fixes a bug with the way block state tracks the finalized block, which causes a nil dereference panic on integration tests that use the finalized blocks.

Note: block state tracker is a test helper (not part of protocol code).

Problem Definition

While developing integration tests for sealing and verification, a flakey behavior observed that from time to time would cause a nil segmentation panic:

suite.go:63: test panicked: runtime error: invalid memory address or nil pointer dereference
goroutine 9 [running]:
runtime/debug.Stack(0xc0008b93b8, 0x1202c80, 0x1db1a80)
 /opt/hostedtoolcache/go/1.15.13/x64/src/runtime/debug/stack.go:24 +0x9f
github.com/stretchr/testify/suite.failOnPanic(0xc000107380)
 /home/runner/go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:63 +0x57
panic(0x1202c80, 0x1db1a80)
 /opt/hostedtoolcache/go/1.15.13/x64/src/runtime/panic.go:969 +0x1b9
github.com/onflow/flow-go/integration/tests/verification.(*ResultApprovalTestSuite).TestVerificationNodeHappyPath(0xc000001b00)
 /home/runner/work/flow-go/flow-go/integration/tests/verification/result_approval_test.go:22 +0x5d
reflect.Value.call(0xc000442960, 0xc000123680, 0x13, 0x13a6c32, 0x4, 0xc0008b9e30, 0x1, 0x1, 0xc0008b9cf8, 0x42406a, ...)

This panic happens when s.BlockState.WaitForFirstFinalized(s.T()) returns a nil pointer instead of the first finalized block. Looking at its method, we see there is most probably a discrepancy between the highestFinalized variable and finalizedByHeight map in BlockState: there can be cases that although the highestFinalized is updated to height x blocks of heights < x are still missing in finalizedByHeight map, which leads to a nil dereference panic if any of those missing blocks are requested.

func (bs *BlockState) WaitForFirstFinalized(t *testing.T) *messages.BlockProposal {
	require.Eventually(t, func() bool {
		return bs.highestFinalized > 0
	}, blockStateTimeout, 100*time.Millisecond,
		fmt.Sprintf("did not receive first finalized block within %v seconds", blockStateTimeout))

	return bs.finalizedByHeight[1]
}

The lowest-hanging fruit fails

As the lowest hanging fruit, one may suggest ignoring highestFinalized and relying solely on the finalizedByHeight map:

func (bs *BlockState) WaitForFirstFinalized(t *testing.T) *messages.BlockProposal {
	require.Eventually(t, func() bool {
		return bs.finalizedByHeight[1] != nil
	}, blockStateTimeout, 100*time.Millisecond,
		fmt.Sprintf("did not receive first finalized block within %v seconds", blockStateTimeout))

	return bs.finalizedByHeight[1]
}

This solution is though failing since there can be cases that finalizedByHeight[1] permanently remains nil although the consensus progresses fine. Based on the way that finalizedByHeight map is updated there can be inputs that make some finalized blocks _never make their way to the finalizedByHeight map. This is basically since the code assumes finalized blocks are coming in order, which is a wrong assumption. As a realistic execution trace of the code, the finalized blocks are coming in this order:

arriving block height      | 8 4 2 7 3 6 5 1 
highest proposed height    | 8 8 8 8 8 8 8 8 
highest finalized height   | 0 0 0 0 0 0 2 2 

So by the time block height 5 arrives, it progresses finalized height to 2 (while block 1 is missing), and even worse than that, when block height 1 arrives it does not make its way to the finalizedByHeight map, basically, since it is below the highest finalized block (i.e., 2).

Ultimate solution

In this PR we refactor the model block state updates the finalized block imitating the similar way that consensus nodes make a block finalized:

  • highestFinalized reflects the highest height that is both finalized, and has its block available in finalizedByHeight map, with all previous blocks already available in the finalizedByHeight map.
  • A block height h is added to finalizedByHeight map only if height h-1 already both finalized and exists in finalizedByHeight.
    if h == bs.highestFinalized+1 {
        bs.highestFinalized = h
    }
    

This solves both of the problems we encounter with the block state tracker:

  • No finalized height is ever leaks out of the finalizedByHeight map.
  • If highestFinalized = h then all heights [1, h] are already available on the map.

Comment on lines 43 to 50
if b.Header.Height < 3 {
return
}

confirmsHeight := b.Header.Height - uint64(3)
if confirmsHeight < bs.highestFinalized {
return
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same logic as before just simplified the indentations (addressed todo).

@yhassanzadeh13 yhassanzadeh13 marked this pull request as ready for review July 8, 2021 19:05
@codecov-commenter
Copy link

codecov-commenter commented Jul 8, 2021

Codecov Report

Merging #936 (33446f1) into master (0272226) will decrease coverage by 1.53%.
The diff coverage is 71.99%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #936      +/-   ##
==========================================
- Coverage   56.44%   54.91%   -1.54%     
==========================================
  Files         426      277     -149     
  Lines       25049    18339    -6710     
==========================================
- Hits        14140    10071    -4069     
+ Misses       8995     6904    -2091     
+ Partials     1914     1364     -550     
Flag Coverage Δ
unittests 54.91% <71.99%> (-1.54%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
engine/execution/state/state.go 21.79% <0.00%> (-0.19%) ⬇️
fvm/handler/event.go 57.37% <ø> (ø)
ledger/complete/ledger.go 61.53% <ø> (ø)
ledger/complete/mtrie/trie/trie.go 48.73% <0.00%> (-0.50%) ⬇️
module/builder/consensus/builder.go 71.59% <0.00%> (ø)
...ule/mempool/consensus/incorporated_result_seals.go 0.00% <0.00%> (ø)
module/mempool/queue/queue.go 96.00% <ø> (-0.67%) ⬇️
network/p2p/connManager.go 100.00% <ø> (ø)
network/p2p/libp2pConnector.go 0.00% <ø> (ø)
network/p2p/libp2pNode.go 66.40% <ø> (+0.14%) ⬆️
... and 190 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f7c951f...33446f1. Read the comment docs.

Copy link
Member

@zhangchiqing zhangchiqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explaination

Comment on lines 67 to 70
// highestFinalized is only updated when not only the block is added to finalizedByHeight, but also
// it is "exactly" the next height compared to the current value.
// doing so prevents the illusion of highestFinalized indicating the highest finalized height "with an available block".
if h == bs.highestFinalized+1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that this logic can cause the highestFinalized to lag permanently behind because we are looping from high to low heights. Suppose there is a gap in the blocks (block 10 is missing), then we receive blocks 11-20. The highestFinalized block height must be less than 10 while the latest unfinalized height is 20. When we receive block 10, and each subsequent block, we will increase the highestFinalized value by at most 1 because of this condition and the iteration order. So the gap between latest unfinalized and latest finalized will remain >=10.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this comment by separating the logic of processing ancestors (high-to-low) and updating the highest finalized height (low-to-heigh) in 7af850e.

Copy link
Member

@jordanschalm jordanschalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎸

@yhassanzadeh13 yhassanzadeh13 merged commit 6036cdc into master Jul 13, 2021
@yhassanzadeh13 yhassanzadeh13 deleted the yahya/fix-block-state-bug branch July 13, 2021 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants