Add sanity check for registered entities' stake #2748

Merged: tjanez merged 2 commits into master from tjanez/check-entities-stake on May 6, 2020

Conversation

Member

@tjanez tjanez commented Mar 2, 2020

Extract and generalize the registry's staking sanity checks from go/consensus/tendermint/apps/supplementarysanity.checkStakeClaims() and move them to the go/registry/api.SanityCheckStake() function.

Augment the checks to verify that each entity has enough stake for all of its stake claims in the Genesis document. This prevents panics at oasis-node start-up caused by entities not having enough stake in escrow to satisfy all their stake claims.

Example panic when an entity doesn't have enough stake to be registered:

panic: mux: InitChain: fatal error in application: '200_registry': registry: genesis entity registration failure: staking: insufficient stake

goroutine 1 [running]:
github.com/oasislabs/oasis-core/go/consensus/tendermint/abci.(*abciMux).InitChain(0xc0001f64b0, 0x0, 0xed5d4d490, 0x0, 0xc000fd3580, 0x32, 0xc00005ae40, 0xc000128fc0, 0x1, 0x1, ...)
	github.com/oasislabs/oasis-core/go@/consensus/tendermint/abci/mux.go:442 +0xedd
github.com/tendermint/tendermint/abci/client.(*localClient).InitChainSync(0xc00117ade0, 0x0, 0xed5d4d490, 0x0, 0xc000fd3580, 0x32, 0xc00005ae40, 0xc000128fc0, 0x1, 0x1, ...)
	github.com/tendermint/tendermint@v0.32.8/abci/client/local_client.go:223 +0x101
github.com/tendermint/tendermint/proxy.(*appConnConsensus).InitChainSync(0xc007dbca90, 0x0, 0xed5d4d490, 0x0, 0xc000fd3580, 0x32, 0xc00005ae40, 0xc000128fc0, 0x1, 0x1, ...)
	github.com/tendermint/tendermint@v0.32.8/proxy/app_conn.go:65 +0x6b
github.com/tendermint/tendermint/consensus.(*Handshaker).ReplayBlocks(0xc000fa3238, 0xa, 0x170000, 0x173efdc, 0x6, 0xc000fd3580, 0x32, 0x0, 0x0, 0x0, ...)
	github.com/tendermint/tendermint@v0.32.8/consensus/replay.go:318 +0x666
github.com/tendermint/tendermint/consensus.(*Handshaker).Handshake(0xc000fa3238, 0x1af0dc0, 0xc0001f8230, 0x203001, 0x203001)
	github.com/tendermint/tendermint@v0.32.8/consensus/replay.go:269 +0x485
github.com/tendermint/tendermint/node.doHandshake(0x1aef940, 0xc007d51da0, 0xa, 0x0, 0x173efdc, 0x6, 0xc000fd3580, 0x32, 0x0, 0x0, ...)
	github.com/tendermint/tendermint@v0.32.8/node/node.go:281 +0x19a
github.com/tendermint/tendermint/node.NewNode(0xc0000f2500, 0x1acc940, 0xc0000db360, 0xc00db685f0, 0x1aaad60, 0xc00d95ff20, 0xc00d838520, 0x18e8718, 0xc00db60700, 0x1ad6a40, ...)
	github.com/tendermint/tendermint@v0.32.8/node/node.go:596 +0x343
github.com/oasislabs/oasis-core/go/consensus/tendermint.(*tendermintService).lazyInit.func2(0xc001020500, 0x0)
	github.com/oasislabs/oasis-core/go@/consensus/tendermint/tendermint.go:1015 +0x2bf
github.com/oasislabs/oasis-core/go/consensus/tendermint.(*tendermintService).Start(0xc00056de00, 0x2586e60, 0x152f980)
	github.com/oasislabs/oasis-core/go@/consensus/tendermint/tendermint.go:233 +0x89
github.com/oasislabs/oasis-core/go/oasis-node/cmd/node.newNode(0x1741200, 0x0, 0x0, 0x0)
	github.com/oasislabs/oasis-core/go@/oasis-node/cmd/node/node.go:740 +0x238a
github.com/oasislabs/oasis-core/go/oasis-node/cmd/node.NewNode(...)
	github.com/oasislabs/oasis-core/go@/oasis-node/cmd/node/node.go:454
github.com/oasislabs/oasis-core/go/oasis-node/cmd/node.Run(0x258a700, 0xc000f754a0, 0x0, 0x2)
	github.com/oasislabs/oasis-core/go@/oasis-node/cmd/node/node.go:82 +0x32
github.com/spf13/cobra.(*Command).execute(0x258a700, 0xc00003c190, 0x2, 0x2, 0x258a700, 0xc00003c190)
	github.com/spf13/cobra@v0.0.5/command.go:830 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0x258a700, 0x3f, 0x0, 0x0)
	github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v0.0.5/command.go:864
github.com/oasislabs/oasis-core/go/oasis-node/cmd.Execute()
	github.com/oasislabs/oasis-core/go@/oasis-node/cmd/root.go:46 +0x4b
main.main()
	github.com/oasislabs/oasis-core/go@/oasis-node/main.go:9 +0x20

Example panic when an entity doesn't have enough stake to have a node registered:

panic: mux: InitChain: fatal error in application: '200_registry': registry: genesis node registration failure: staking: insufficient stake

goroutine 1 [running]:
github.com/oasislabs/oasis-core/go/consensus/tendermint/abci.(*abciMux).InitChain(0xc000fca960, 0x0, 0xed5d4d490, 0x0, 0xc0001e80c0, 0x32, 0xc000530400, 0xc0012262a0, 0x1, 0x1, ...)
        github.com/oasislabs/oasis-core/go@/consensus/tendermint/abci/mux.go:442 +0xedd
github.com/tendermint/tendermint/abci/client.(*localClient).InitChainSync(0xc00f31c480, 0x0, 0xed5d4d490, 0x0, 0xc0001e80c0, 0x32, 0xc000530400, 0xc0012262a0, 0x1, 0x1, ...)
        github.com/tendermint/tendermint@v0.32.8/abci/client/local_client.go:223 +0x101
github.com/tendermint/tendermint/proxy.(*appConnConsensus).InitChainSync(0xc0004ea000, 0x0, 0xed5d4d490, 0x0, 0xc0001e80c0, 0x32, 0xc000530400, 0xc0012262a0, 0x1, 0x1, ...)
        github.com/tendermint/tendermint@v0.32.8/proxy/app_conn.go:65 +0x6b
github.com/tendermint/tendermint/consensus.(*Handshaker).ReplayBlocks(0xc000ff1238, 0xa, 0x170000, 0x173b41c, 0x6, 0xc0001e80c0, 0x32, 0x0, 0x0, 0x0, ...)
        github.com/tendermint/tendermint@v0.32.8/consensus/replay.go:318 +0x666
github.com/tendermint/tendermint/consensus.(*Handshaker).Handshake(0xc000ff1238, 0x1aed900, 0xc00052e1c0, 0x203000, 0x203000)
        github.com/tendermint/tendermint@v0.32.8/consensus/replay.go:269 +0x485
github.com/tendermint/tendermint/node.doHandshake(0x1aec380, 0xc0011c4780, 0xa, 0x0, 0x173b41c, 0x6, 0xc0001e80c0, 0x32, 0x0, 0x0, ...)
        github.com/tendermint/tendermint@v0.32.8/node/node.go:281 +0x19a
github.com/tendermint/tendermint/node.NewNode(0xc0001eadc0, 0x1ac9420, 0xc000535400, 0xc00dabb4d0, 0x1aa7840, 0xc00da76540, 0xc00d7a7400, 0x18e4d18, 0xc00daa95e0, 0x1ad3560, ...)
        github.com/tendermint/tendermint@v0.32.8/node/node.go:596 +0x343
github.com/oasislabs/oasis-core/go/consensus/tendermint.(*tendermintService).lazyInit.func2(0xc00000f3e0, 0x0)
        github.com/oasislabs/oasis-core/go@/consensus/tendermint/tendermint.go:1015 +0x2bf
github.com/oasislabs/oasis-core/go/consensus/tendermint.(*tendermintService).Start(0xc0004e0600, 0x2783e80, 0x152b880)
        github.com/oasislabs/oasis-core/go@/consensus/tendermint/tendermint.go:233 +0x89
github.com/oasislabs/oasis-core/go/oasis-node/cmd/node.newNode(0x173d700, 0x0, 0x0, 0x0)
        github.com/oasislabs/oasis-core/go@/oasis-node/cmd/node/node.go:740 +0x238a
github.com/oasislabs/oasis-core/go/oasis-node/cmd/node.NewNode(...)
        github.com/oasislabs/oasis-core/go@/oasis-node/cmd/node/node.go:454
github.com/oasislabs/oasis-core/go/oasis-node/cmd/node.Run(0x2787720, 0xc001047f60, 0x0, 0x2)
        github.com/oasislabs/oasis-core/go@/oasis-node/cmd/node/node.go:82 +0x32
github.com/spf13/cobra.(*Command).execute(0x2787720, 0xc0000ce160, 0x2, 0x2, 0x2787720, 0xc0000ce160)
        github.com/spf13/cobra@v0.0.5/command.go:830 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0x2787720, 0x3f, 0x0, 0x0)
        github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v0.0.5/command.go:864
github.com/oasislabs/oasis-core/go/oasis-node/cmd.Execute()
        github.com/oasislabs/oasis-core/go@/oasis-node/cmd/root.go:46 +0x4b
main.main()
        github.com/oasislabs/oasis-core/go@/oasis-node/main.go:9 +0x20

@tjanez tjanez added the c:registry, c:consensus/tendermint, c:bug, and c:staking labels Mar 2, 2020
@tjanez tjanez self-assigned this Mar 2, 2020
@Yawning
Contributor

Yawning commented Mar 2, 2020

I guess? What's the planned resolution for this when the sanity check fails?

And what's up with the (likely unrelated) CI failure?

Contributor

@pro-wh pro-wh left a comment

is this really an invariant? do we, for example, remove a registration if an entity gets slashed to below the threshold?

go/staking/api/sanity_check.go (outdated, resolved)
go/registry/api/sanity_check.go (outdated, resolved)
go/registry/api/sanity_check.go (outdated, resolved)
@pro-wh
Contributor

pro-wh commented Mar 3, 2020

ah yeah we are supposed to deregister them if they get slashed below the threshold

@tjanez tjanez force-pushed the tjanez/check-entities-stake branch 3 times, most recently from bb6fed5 to de55cb2, on March 9, 2020 16:58
@tjanez
Member Author

tjanez commented Mar 9, 2020

This is ready for another review.

Updating various tests made this PR a bit larger, but I tried to split the changes into separate commits, so I suggest reviewing commit by commit.

@codecov

codecov bot commented Mar 9, 2020

Codecov Report

Merging #2748 into master will decrease coverage by 0.04%.
The diff coverage is 60.14%.

@@            Coverage Diff             @@
##           master    #2748      +/-   ##
==========================================
- Coverage   67.64%   67.59%   -0.05%     
==========================================
  Files         350      350              
  Lines       33970    34036      +66     
==========================================
+ Hits        22980    23008      +28     
- Misses       8037     8080      +43     
+ Partials     2953     2948       -5     
| Impacted Files | Coverage Δ |
|---|---|
| go/genesis/api/sanity_check.go | 11.11% <0.00%> (ø) |
| go/registry/api/api.go | 37.57% <ø> (ø) |
| go/registry/api/sanity_check.go | 60.18% <59.29%> (-2.32%) ⬇️ |
| ...nsus/tendermint/apps/supplementarysanity/checks.go | 48.00% <66.66%> (+1.88%) ⬆️ |
| go/consensus/tendermint/api/api.go | 73.58% <0.00%> (-15.10%) ⬇️ |
| go/consensus/tendermint/epochtime/epochtime.go | 80.55% <0.00%> (-6.95%) ⬇️ |
| go/common/cbor/codec.go | 77.14% <0.00%> (-5.72%) ⬇️ |
| go/consensus/api/grpc.go | 60.46% <0.00%> (-4.66%) ⬇️ |
| go/worker/common/p2p/p2p.go | 65.31% <0.00%> (-3.61%) ⬇️ |
| go/storage/api/root_cache.go | 70.11% <0.00%> (-2.30%) ⬇️ |

... and 19 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62a1949...d8d03ab. Read the comment docs.

Contributor

@pro-wh pro-wh left a comment

ah now we're not sure after all, if we want to delete an entity if it gets slashed below the threshold. cc @peterjgilbert @ravenac95

code-wise though: this doesn't do a full accumulation. could this check miss cases where the amount escrowed is enough, but there isn't enough overall for the entity plus, for example, some nodes?

go/staking/tests/tester.go (outdated, resolved)
go/staking/tests/tester.go (outdated, resolved)
go/genesis/genesis_test.go (outdated, resolved)
go/genesis/genesis_test.go (outdated, resolved)
go/genesis/genesis_test.go (outdated, resolved)
go/registry/api/sanity_check.go (outdated, resolved)
@tjanez
Member Author

tjanez commented Mar 10, 2020

ah now we're not sure after all, if we want to delete an entity if it gets slashed below the threshold. cc @peterjgilbert @ravenac95

Hmm... are you worried about a potentially invalid state dump (i.e. genesis file)?

Do we currently de-register entities that fall below the staking thresholds when they are slashed?
If not, this could be a problem.

code-wise though: this doesn't do a full accumulation. could this check miss cases where the amount escrowed is enough, but there isn't enough overall for the entity plus, for example, some nodes?

Good catch! Indeed, the current check misses, for example, the case where an entity has enough stake to be registered itself, but not enough stake to also register its nodes.

We could augment the sanity checks to create a map of StakeAccumulators for each entity and then add StakeClaims as appropriate (i.e. one for the entity, one for each node's role, ... do we also want to add stake claims for compute and key manager runtimes?).
Finally, the sanity checks would check if an entity's escrow active balance is >= StakeAccumulator.TotalClaims().
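
As a rough illustration of the accumulation idea proposed above, here is a minimal Go sketch. The types and function names (entityID, accumulator, checkStake) are simplified stand-ins invented for this example and are not the oasis-core StakeAccumulator API; amounts are plain uint64 values and overflow is ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// All types below are simplified stand-ins for illustration only; they are
// not the oasis-core staking types.

// entityID is a hypothetical entity identifier.
type entityID string

// accumulator records the stake required by every claim an entity holds.
type accumulator struct {
	claims map[string]uint64 // claim name -> required stake
}

// addClaim records a claim and the stake it requires.
func (a *accumulator) addClaim(name string, required uint64) {
	if a.claims == nil {
		a.claims = make(map[string]uint64)
	}
	a.claims[name] = required
}

// totalClaims returns the stake needed to satisfy all recorded claims.
func (a *accumulator) totalClaims() uint64 {
	var total uint64
	for _, required := range a.claims {
		total += required
	}
	return total
}

// checkStake returns an error for the first entity (in sorted ID order)
// whose escrow balance does not cover the sum of all its claims.
func checkStake(escrow map[entityID]uint64, accs map[entityID]*accumulator) error {
	ids := make([]string, 0, len(accs))
	for id := range accs {
		ids = append(ids, string(id))
	}
	sort.Strings(ids)
	for _, id := range ids {
		need := accs[entityID(id)].totalClaims()
		have := escrow[entityID(id)]
		if have < need {
			return fmt.Errorf("insufficient stake for entity %s (expected: %d got: %d)", id, need, have)
		}
	}
	return nil
}

func main() {
	// One claim for the entity itself and one for its node.
	accs := map[entityID]*accumulator{"entity-a": {}}
	accs["entity-a"].addClaim("entity", 100)
	accs["entity-a"].addClaim("node-1", 100)

	// 150 covers either claim alone, but not both together.
	escrow := map[entityID]uint64{"entity-a": 150}
	fmt.Println(checkStake(escrow, accs))
}
```

Checking the balance against the sum of claims, rather than against each threshold separately, is what catches the case raised in the review: an escrow that covers the entity claim alone but not the entity plus its nodes.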

pro-wh previously requested changes Mar 17, 2020

for _, node := range nodes {
	// Add node stake claims.
	generatedEscrows[node.EntityID].StakeAccumulator.AddClaimUnchecked(StakeClaimForNode(node.ID), StakeThresholdsForNode(node))
Contributor

it's already checked that node.EntityID is a valid entity ID, right?

Member Author

Yes, it is already checked by SanityCheckEntities() which in turn calls VerifyRegisterEntityArgs() for each signed entity which finally checks the ID in:
https://github.com/oasislabs/oasis-core/blob/b9fe8b7e4ea570b9aa69ec549ec1f3da5ffdcd70/go/registry/api/api.go#L363-L370
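
For context on why the question matters: in Go, indexing a map of pointers with an unknown key yields nil, and a method that dereferences its receiver then panics. The sketch below shows a defensive alternative using the comma-ok idiom. It is not what the PR does (the PR relies on the earlier validation described above), and escrowAccount, addClaim, and addNodeClaim are hypothetical names used only for this example.

```go
package main

import "fmt"

// escrowAccount is a hypothetical stand-in for a per-entity escrow record.
type escrowAccount struct {
	claims []string
}

// addClaim appends a claim name. It dereferences the receiver, so calling
// it on a nil *escrowAccount would panic.
func (e *escrowAccount) addClaim(name string) {
	e.claims = append(e.claims, name)
}

// addNodeClaim looks up the owning entity with the comma-ok idiom and
// returns an error instead of panicking when the entity is unknown.
func addNodeClaim(escrows map[string]*escrowAccount, entityID, nodeID string) error {
	acct, ok := escrows[entityID]
	if !ok || acct == nil {
		return fmt.Errorf("node %s references unknown entity %s", nodeID, entityID)
	}
	acct.addClaim("node:" + nodeID)
	return nil
}

func main() {
	escrows := map[string]*escrowAccount{"entity-a": {}}
	fmt.Println(addNodeClaim(escrows, "entity-a", "node-1")) // known entity
	fmt.Println(addNodeClaim(escrows, "entity-b", "node-2")) // unknown entity
}
```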

if err == nil {
	expected = expectedQty.String()
}
return fmt.Errorf("insufficient stake for account %s (expected: %s got: %s)", entity.ID, expected, generatedEscrow.Active.Balance)
Contributor

could we %w the error from CheckStakeClaims?

or explain why we'd rather throw away that info

Member Author

Good catch. I've added %w to the fmt.Errorf() call.
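
A small sketch of what wrapping with %w buys: the added context stays in the message while callers can still match the underlying cause with errors.Is. The sentinel error and the checkClaims, sanityCheck, and isInsufficientStake helpers are hypothetical stand-ins, not the actual oasis-core functions.

```go
package main

import (
	"errors"
	"fmt"
)

// errInsufficientStake is a hypothetical sentinel standing in for the error
// returned by the underlying stake check.
var errInsufficientStake = errors.New("staking: insufficient stake")

// checkClaims is a hypothetical stand-in for the underlying check.
func checkClaims(have, need uint64) error {
	if have < need {
		return errInsufficientStake
	}
	return nil
}

// sanityCheck adds account context and wraps the cause with %w so it
// remains in the error chain.
func sanityCheck(account string, have, need uint64) error {
	if err := checkClaims(have, need); err != nil {
		return fmt.Errorf("insufficient stake for account %s (expected: %d got: %d): %w", account, need, have, err)
	}
	return nil
}

// isInsufficientStake reports whether err wraps the sentinel error.
func isInsufficientStake(err error) bool {
	return errors.Is(err, errInsufficientStake)
}

func main() {
	err := sanityCheck("entity-a", 150, 200)
	fmt.Println(err)
	fmt.Println(isInsufficientStake(err))
}
```

Had the cause been formatted with %v or dropped, the message could look similar but errors.Is would no longer find the underlying error.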

@tjanez tjanez force-pushed the tjanez/check-entities-stake branch 3 times, most recently from 9ec669c to 7d6507a, on April 30, 2020 11:02
@tjanez
Member Author

tjanez commented Apr 30, 2020

Good catch! Indeed, the current check misses, for example, the case where an entity has enough stake to be registered itself, but not enough stake to also register its nodes.

We could augment the sanity checks to create a map of StakeAccumulators for each entity and then add StakeClaims as appropriate (i.e. one for the entity, one for each node's role, ... do we also want to add stake claims for compute and key manager runtimes?).
Finally, the sanity checks would check if an entity's escrow active balance is >= StakeAccumulator.TotalClaims().

I've implemented that, please take a look.

@tjanez tjanez force-pushed the tjanez/check-entities-stake branch from 7d6507a to 0c50b3c on April 30, 2020 12:49
@tjanez tjanez force-pushed the tjanez/check-entities-stake branch 2 times, most recently from 4ea95db to 5187ebf, on April 30, 2020 16:29
go/registry/api/sanity_check.go (outdated, resolved)
// SanityCheckStake ensures entities' stake accumulator claims are consistent
// with general state and entities have enough stake for themselves and all
// their registered nodes and runtimes.
func SanityCheckStake(
Member

We may need to reconsider the location of this sanity check if there are any other places that can have stake claims, but this is ok for now.

Member Author

Agreed 👍.

go/registry/api/sanity_check.go (outdated, resolved)
@tjanez
Member Author

tjanez commented May 5, 2020

Hmm... are you worried about a potentially invalid state dump (i.e. genesis file)?

Do we currently de-register entities that fall below the staking thresholds when they are slashed?
If not, this could be a problem.

I've filed a follow-up issue that will address this: #2886.

@tjanez tjanez force-pushed the tjanez/check-entities-stake branch from 5187ebf to 7698eee on May 5, 2020 12:27
@tjanez
Member Author

tjanez commented May 5, 2020

@pro-wh, could you take another look?

tjanez added 2 commits May 6, 2020 09:48
Formally define Nodes() and AllRuntimes() methods that were already
implemented by
go/consensus/tendermint/apps/registry/state.ImmutableState struct.

Implement Nodes() and AllRuntimes() methods for the
go/registry/api.sanityCheckRuntimeLookup struct.

Extract and generalize registry's staking sanity checks in
go/consensus/tendermint/apps/supplementarysanity.checkStakeClaims() and
move them to go/registry/api.SanityCheckStake() function.

Augment the checks to check if an entity has enough stake for all stake claims
in the Genesis document to prevent panics at oasis-node start-up due to
entities not having enough stake in the escrow to satisfy all their stake
claims.
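
The first commit message above describes declaring Nodes() and AllRuntimes() on a lookup interface so that more than one implementation can back the same check. A minimal sketch of that pattern follows. The interface, the record types, and the method signatures here are hypothetical and heavily simplified; the real oasis-core signatures may differ.

```go
package main

import "fmt"

// node and runtime are hypothetical, heavily simplified registry records.
type node struct{ ID, EntityID string }
type runtime struct{ ID, EntityID string }

// lookup is a hypothetical interface that borrows only the two method names
// from the commit message; the signatures are assumptions.
type lookup interface {
	Nodes() ([]*node, error)
	AllRuntimes() ([]*runtime, error)
}

// genesisLookup answers from in-memory slices, the way a check that runs
// over a genesis document could.
type genesisLookup struct {
	nodes    []*node
	runtimes []*runtime
}

func (g *genesisLookup) Nodes() ([]*node, error)          { return g.nodes, nil }
func (g *genesisLookup) AllRuntimes() ([]*runtime, error) { return g.runtimes, nil }

// claimsPerEntity counts one claim per node and per runtime for each owning
// entity. It depends only on the interface, not on a concrete backend.
func claimsPerEntity(l lookup) (map[string]int, error) {
	counts := make(map[string]int)
	nodes, err := l.Nodes()
	if err != nil {
		return nil, fmt.Errorf("listing nodes: %w", err)
	}
	for _, n := range nodes {
		counts[n.EntityID]++
	}
	runtimes, err := l.AllRuntimes()
	if err != nil {
		return nil, fmt.Errorf("listing runtimes: %w", err)
	}
	for _, rt := range runtimes {
		counts[rt.EntityID]++
	}
	return counts, nil
}

func main() {
	l := &genesisLookup{
		nodes:    []*node{{ID: "node-1", EntityID: "entity-a"}},
		runtimes: []*runtime{{ID: "rt-1", EntityID: "entity-a"}},
	}
	counts, err := claimsPerEntity(l)
	fmt.Println(counts, err)
}
```

Because the check is written against the interface, the same code path can in principle run against live consensus state or against a genesis document.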
@tjanez tjanez force-pushed the tjanez/check-entities-stake branch from 7698eee to d8d03ab on May 6, 2020 07:48
@tjanez tjanez dismissed pro-wh’s stale review May 6, 2020 08:20

@pro-wh, I think your comments were addressed. If there's anything else, please follow up.

@tjanez tjanez merged commit 892cc0b into master May 6, 2020
@tjanez tjanez deleted the tjanez/check-entities-stake branch May 6, 2020 08:21
Labels
c:bug (bug), c:consensus/tendermint (Tendermint-based consensus), c:registry (entity/node/runtime registry service), c:staking (staking)

4 participants