failure indicating commitments #3391

Merged
merged 3 commits into from
Oct 15, 2020


ptrus
Member

@ptrus ptrus commented Oct 8, 2020

The E2E tests with increased epoch times (#3322) discovered a failure edge case that can happen if the storage committee becomes unavailable after the proposer has already proposed a batch, but before any of the nodes submit a commitment. In that case the nodes currently abort processing and wait for a round failure. Since no commitments have been submitted, the round failure only happens at the next epoch transition.

This PR adds support for failure-indicating commitments (first proposed as part of ADR 0005), which solve the issue above: compute nodes can now submit a failure-indicating commitment and thereby trigger a round failure.

NOTE: only a subset of the mentioned ADR (support for failure-indicating commitments) is implemented here.
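To make the mechanism concrete, here is a minimal Go sketch of how a commitment pool might treat failure-indicating commitments. All names (`ExecutorFailure`, `FailureStorageUnavailable`, `Pool.Process`, `ErrRoundFailed`, etc.) are hypothetical and are not the oasis-core API. The decision rule is also an assumption for illustration: the round fails once enough nodes report failure that a majority of agreeing successful commitments can no longer be reached. Discrepancy resolution and timeouts are ignored.

```go
package main

import "errors"

// ExecutorFailure is a hypothetical enum describing why an executor node
// could not produce results for a batch.
type ExecutorFailure uint8

const (
	FailureNone ExecutorFailure = iota
	FailureUnknown
	FailureStorageUnavailable
)

// ExecutorCommitment is a simplified, hypothetical commitment. A
// failure-indicating commitment carries a non-None Failure and no results.
type ExecutorCommitment struct {
	NodeID      string
	ResultsHash string // empty for failure-indicating commitments
	Failure     ExecutorFailure
}

// IsIndicatingFailure reports whether the commitment signals a failure.
func (c *ExecutorCommitment) IsIndicatingFailure() bool {
	return c.Failure != FailureNone
}

var (
	ErrStillWaiting = errors.New("pool: still waiting for commitments")
	ErrRoundFailed  = errors.New("pool: round failed")
)

// Pool collects commitments from a committee of Size executor nodes.
type Pool struct {
	Size    int
	Commits map[string]*ExecutorCommitment
}

// Process returns the agreed results hash, ErrRoundFailed if a majority of
// successful commitments has become unreachable, or ErrStillWaiting.
func (p *Pool) Process() (string, error) {
	needed := p.Size/2 + 1
	failures := 0
	votes := make(map[string]int)
	for _, c := range p.Commits {
		if c.IsIndicatingFailure() {
			failures++
			continue
		}
		votes[c.ResultsHash]++
	}
	for hash, n := range votes {
		if n >= needed {
			return hash, nil
		}
	}
	// Even if every remaining node succeeds, the majority is out of reach.
	if p.Size-failures < needed {
		return "", ErrRoundFailed
	}
	return "", ErrStillWaiting
}
```

With a committee of three, a single failure-indicating commitment leaves the pool waiting, while a second one makes a two-vote majority impossible, so under this rule the round fails right away instead of at the next epoch transition.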

TODO:

  • probably some more tests
  • check for other scenarios where an executor node should submit a failure-indicating commitment (currently only the failure scenario above is fixed)

go/roothash/api/commitment/executor.go (outdated, resolved)
go/roothash/api/commitment/pool.go (outdated, resolved)
@kostko kostko added c:breaking/runtime Category: breaking runtime changes c:breaking/consensus Category: breaking consensus changes c:roothash Category: root hash service c:runtime/compute Category: runtime compute worker labels Oct 8, 2020
@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch 3 times, most recently from d5d902f to c2df984 Compare October 9, 2020 12:43
go/roothash/api/commitment/pool.go (outdated, resolved)
go/worker/compute/executor/committee/node.go (outdated, resolved)
go/worker/compute/executor/committee/node.go (outdated, resolved)
@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch 9 times, most recently from 3cee767 to 5442a19 Compare October 9, 2020 18:15
@codecov

codecov bot commented Oct 9, 2020

Codecov Report

Merging #3391 into master will decrease coverage by 0.09%.
The diff coverage is 63.27%.


@@            Coverage Diff             @@
##           master    #3391      +/-   ##
==========================================
- Coverage   66.03%   65.93%   -0.10%     
==========================================
  Files         371      371              
  Lines       33285    33367      +82     
==========================================
+ Hits        21979    22002      +23     
- Misses       8075     8127      +52     
- Partials     3231     3238       +7     
Impacted Files Coverage Δ
go/common/version/version.go 80.00% <ø> (ø)
go/worker/compute/executor/committee/node.go 61.95% <44.15%> (-1.23%) ⬇️
go/roothash/api/commitment/pool.go 76.09% <67.21%> (+3.60%) ⬆️
go/roothash/api/commitment/executor.go 77.64% <93.93%> (+9.12%) ⬆️
go/consensus/tendermint/apps/roothash/roothash.go 74.05% <100.00%> (ø)
go/roothash/tests/tester.go 88.43% <100.00%> (ø)
go/runtime/host/mock/mock.go 84.90% <100.00%> (ø)
go/runtime/host/sandbox/sandbox.go 65.74% <0.00%> (-9.85%) ⬇️
go/worker/keymanager/handler.go 61.22% <0.00%> (-6.13%) ⬇️
go/consensus/tendermint/full/services.go 77.11% <0.00%> (-5.94%) ⬇️
... and 37 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7ec215...66063cc.

@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch 7 times, most recently from 60c624f to cdf79b1 Compare October 13, 2020 08:40
Member

@kostko kostko left a comment


We should also probably submit a failure in case the runtime fails to process a batch, e.g. here:

n.abortBatchLocked(errRuntimeAborted)
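One way to act on this suggestion is to pair every abort with a failure-indicating commitment, so the round can fail promptly instead of waiting for the next epoch transition. The minimal Go sketch below assumes that design; apart from the identifiers `abortBatchLocked` and `errRuntimeAborted` quoted above, everything (`Node`, `Commitment`, `failureFor`, `errStorageUnavailable`, `abortBatchWithFailureLocked`, and the error messages) is hypothetical and does not reflect the real executor node code.

```go
package main

import "errors"

// Failure is a hypothetical reason code carried by a commitment.
type Failure uint8

const (
	FailureNone Failure = iota
	FailureUnknown
	FailureStorageUnavailable
)

// Error values; the messages are assumptions for illustration.
var (
	errRuntimeAborted     = errors.New("executor: runtime aborted batch processing")
	errStorageUnavailable = errors.New("executor: storage committee unavailable")
)

// Commitment is a simplified stand-in for a signed executor commitment.
type Commitment struct{ Failure Failure }

// Node is a hypothetical stand-in for the executor committee node; it
// records what it submitted and why it aborted so the flow is observable.
type Node struct {
	submitted []Commitment
	aborted   []error
}

func (n *Node) submitLocked(c Commitment)   { n.submitted = append(n.submitted, c) }
func (n *Node) abortBatchLocked(err error)  { n.aborted = append(n.aborted, err) }

// failureFor maps an abort reason to a failure code (assumed mapping).
func failureFor(reason error) Failure {
	if errors.Is(reason, errStorageUnavailable) {
		return FailureStorageUnavailable
	}
	return FailureUnknown
}

// abortBatchWithFailureLocked submits a failure-indicating commitment and
// then aborts the batch, so other nodes need not wait for an epoch change.
func (n *Node) abortBatchWithFailureLocked(reason error) {
	n.submitLocked(Commitment{Failure: failureFor(reason)})
	n.abortBatchLocked(reason)
}
```

Submitting before aborting is a design choice in this sketch: the commitment reaches the pool even if the local abort path later fails.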

@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch 3 times, most recently from 87b571b to e79973c Compare October 14, 2020 07:37
@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch from e79973c to 36fba86 Compare October 14, 2020 08:09
@ptrus ptrus marked this pull request as ready for review October 14, 2020 12:20
@ptrus
Member Author

ptrus commented Oct 14, 2020

We should also probably submit a failure in case the runtime fails to process a batch, e.g. here:

Added. I'm not sure how best to test these cases, maybe with byzantine-storage tests. I need to think about it a bit, but it can probably be done in a separate PR.

@kostko
Member

kostko commented Oct 14, 2020

Yeah we should add Byzantine tests for all these cases, can you open an issue for it?

go/roothash/api/commitment/executor.go (outdated, resolved)
go/roothash/api/commitment/pool.go (outdated, resolved)
go/worker/compute/executor/committee/node.go (outdated, resolved)
go/worker/compute/executor/committee/node.go (outdated, resolved)
go/worker/compute/executor/committee/node.go (outdated, resolved)
go/worker/compute/executor/committee/node.go (outdated, resolved)
go/worker/compute/executor/committee/node.go (outdated, resolved)
@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch from 36fba86 to 8f71c9e Compare October 15, 2020 11:16
@ptrus
Member Author

ptrus commented Oct 15, 2020

Yeah we should add Byzantine tests for all these cases, can you open an issue for it?

#3414

@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch from 8f71c9e to f2b8be8 Compare October 15, 2020 11:34
n.abortBatchLocked(err)
return
}

// Sign the commitment and submit.
commit, err := commitment.SignExecutorCommitment(n.commonNode.Identity.NodeSigner, proposedResults)
// TODO: Add crash point.
Member


I don't think we need one, as there is one below and nothing interesting happens in between.

return
}

if batch == nil || batch.computed == nil {
Member


Since this body is longer, maybe invert the check so less stuff is indented?
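A minimal Go sketch of the suggested refactor, assuming (since the full function is not shown here) that the longer body is the branch taken when `batch` or `batch.computed` is nil. `Batch`, `Node`, and `handleBatchLocked` are hypothetical stand-ins. Inverting the condition handles the short case first with an early return, which keeps the longer body at the top indentation level.

```go
package main

// Batch is a hypothetical stand-in for the executor's batch state.
type Batch struct{ computed *string }

// Node records the steps it takes so the control flow is observable.
type Node struct{ log []string }

// handleBatchLocked uses the inverted check: the short path returns early,
// so the longer failure path below is not nested inside an if block.
func (n *Node) handleBatchLocked(batch *Batch) {
	if batch != nil && batch.computed != nil {
		n.log = append(n.log, "submit results: "+*batch.computed)
		return
	}

	// Longer failure path, now at the top indentation level.
	n.log = append(n.log, "build failure-indicating commitment")
	n.log = append(n.log, "sign commitment")
	n.log = append(n.log, "submit commitment")
}
```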

@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch from f2b8be8 to 82f876d Compare October 15, 2020 13:13
@ptrus ptrus force-pushed the ptrus/feature/failure-indicating-commitments branch from 82f876d to 66063cc Compare October 15, 2020 13:24
@ptrus ptrus merged commit 5a5d2b8 into master Oct 15, 2020
@ptrus ptrus deleted the ptrus/feature/failure-indicating-commitments branch October 15, 2020 13:41
2 participants