Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CheckTx in transaction scheduler #2502

Merged
merged 2 commits into from
Jan 14, 2020
Merged

Conversation

abukosek
Copy link
Contributor

@abukosek abukosek commented Dec 23, 2019

Closes #1963.

TODO:

  • Make txn sched workers run runtimes.
  • Run CheckTx on transactions before queuing them.
  • Add a cmdline flag to toggle this check.
  • Figure out how to interpret CheckTx response.
  • Figure out why it hangs in e2e tests that use encrypted storage.
  • Figure out why it hangs in SGX e2e tests.
  • Write changelog fragment.

@abukosek abukosek force-pushed the andrej/feature/txsched-checktx branch 6 times, most recently from de8651c to 26f1d72 Compare December 30, 2019 12:02
@abukosek abukosek force-pushed the andrej/feature/txsched-checktx branch 16 times, most recently from b866af2 to 8db92a6 Compare January 10, 2020 09:36
)
continue
}
if rt.Capabilities.TEE, grr = workerHost.WaitForCapabilityTEE(w.ctx); grr != nil {
Copy link
Member

@kostko kostko Jan 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the reason for why SGX tests are not working in this PR. So all batches are being rejected due to the following error:

roothash/commitment: batch RAK signature invalid

This is because here you actually overwrite the RAK that is registered for the compute runtime (by the compute worker's RegisterRole handler). And since each enclave has its own RAK this makes all signatures for the compute enclave invalid.

The solution is either to:

  • Support multiple enclaves to be registered for the same runtime in the node descriptor (e.g., multiple RAKs per runtime per node), for different purposes (e.g., one for compute and another one for txnscheduler).
  • Remove this for loop so that only the enclave for compute is registered. There is no need to register the txnscheduler's RAK in the registry as this runtime is only used for local checks and it doesn't actually sign any batches.

I would be in favor of option two as it is much simpler and the txnscheduler runtime is only used for CheckTx, which currently doesn't sign the results using RAK. If we later desired to verify that the scheduler actually performed CheckTx, we could do add this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense! I'll fix it. Thanks for the help!

@abukosek abukosek force-pushed the andrej/feature/txsched-checktx branch from 8db92a6 to 2521985 Compare January 13, 2020 08:53
@codecov
Copy link

codecov bot commented Jan 13, 2020

Codecov Report

Merging #2502 into master will increase coverage by 0.01%.
The diff coverage is 54.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2502      +/-   ##
==========================================
+ Coverage   67.24%   67.25%   +0.01%     
==========================================
  Files         328      328              
  Lines       30184    30262      +78     
==========================================
+ Hits        20296    20352      +56     
- Misses       7386     7398      +12     
- Partials     2502     2512      +10
Impacted Files Coverage Δ
go/registry/api/api.go 41.15% <0%> (ø) ⬆️
go/worker/txnscheduler/init.go 100% <100%> (ø) ⬆️
go/worker/txnscheduler/committee/node.go 65.5% <46.55%> (-4.81%) ⬇️
go/worker/txnscheduler/worker.go 83.67% <72.41%> (-3.99%) ⬇️
go/worker/compute/committee/state.go 74.07% <0%> (-11.12%) ⬇️
go/worker/common/host/protocol/protocol.go 68.65% <0%> (-5.23%) ⬇️
go/worker/compute/committee/node.go 63.23% <0%> (-4.63%) ⬇️
go/worker/keymanager/worker.go 58.85% <0%> (-4.21%) ⬇️
go/worker/merge/committee/node.go 75.42% <0%> (-1.21%) ⬇️
go/worker/common/committee/group.go 81.78% <0%> (-0.75%) ⬇️
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51ed3fa...25e1f50. Read the comment docs.

@abukosek abukosek force-pushed the andrej/feature/txsched-checktx branch from 2521985 to 09df97e Compare January 13, 2020 09:11
@abukosek abukosek changed the title [WIP] Use CheckTx in transaction scheduler Use CheckTx in transaction scheduler Jan 13, 2020
@abukosek abukosek marked this pull request as ready for review January 13, 2020 09:42
@abukosek abukosek force-pushed the andrej/feature/txsched-checktx branch from 09df97e to c5b3d61 Compare January 13, 2020 10:07
go/worker/txnscheduler/service.go Outdated Show resolved Hide resolved
go/worker/txnscheduler/service.go Outdated Show resolved Hide resolved
go/worker/common/runtime_host.go Outdated Show resolved Hide resolved
n.commonNode.CrossNode.Lock()
defer n.commonNode.CrossNode.Unlock()

return *n.commonNode.CurrentBlock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems dangerous, what if the block is nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it was used previously, this couldn't happen, I think, but I've now removed this code, since it's no longer needed, as CheckTx is done in the node code.

go/worker/txnscheduler/worker.go Outdated Show resolved Hide resolved
@@ -128,6 +136,59 @@ func (n *Node) HandlePeerMessage(ctx context.Context, message *p2p.Message) (boo
return false, nil
}

// CheckTx checks the given call in the node's runtime.
func (n *Node) CheckTx(ctx context.Context, call []byte) error {
if n.commonNode.CurrentBlock == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing the CurrentBlock requires holding the CrossNode lock if called outside of methods which assume the caller does so (e.g., the ones ending in Locked).

// Interpret CheckTx result.
resultRaw := resp.WorkerCheckTxBatchResponse.Results[0]
var result transaction.TxnOutput
cbor.MustUnmarshal(resultRaw, &result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why panic on unmarshal? It should log the error and return ErrCheckTxFailed.

if n.checkTxEnabled {
// Check transaction before queuing it.
err := n.CheckTx(ctx, call)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be if err := n.CheckTx(...); err != nil { ....

@abukosek abukosek force-pushed the andrej/feature/txsched-checktx branch from b5106ee to 84f9f98 Compare January 14, 2020 13:25
@abukosek
Copy link
Contributor Author

Fixed! Ready for re-review :)

@abukosek abukosek force-pushed the andrej/feature/txsched-checktx branch from 84f9f98 to 25e1f50 Compare January 14, 2020 14:57
@abukosek abukosek merged commit d5df3bd into master Jan 14, 2020
@abukosek abukosek deleted the andrej/feature/txsched-checktx branch January 14, 2020 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transaction scheduler should use CheckTx
2 participants