Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution Cost Flamegraph #1834

Merged
merged 19 commits into from
Jun 28, 2024
Merged

Execution Cost Flamegraph #1834

merged 19 commits into from
Jun 28, 2024

Conversation

0xOmarA
Copy link
Member

@0xOmarA 0xOmarA commented Jun 14, 2024

Summary

This PR adds a more detailed execution cost breakdown and also adds the ability for a flamegraph to be generated it.

faucet-free-xrd

(Save the above SVG locally to view it correctly)

Details

This PR adds the ability for the costing module to collect more detailed information on the cost unit consumption during execution. This information is then surfaced in transaction receipts and can be graphed and represented as a flamegraph/flamechart. Note that in flamegraphs the events are ordered alphabetically and not in the order in which they occured.

Please note that this PR makes a breaking change to transaction receipts. I suggest that we keep this breaking change until we introduce the transaction receipt DTO.

Copy link

github-actions bot commented Jun 14, 2024

Docker tags
docker.io/radixdlt/private-scrypto-builder:fc0e3b0b67

Copy link

github-actions bot commented Jun 14, 2024

Benchmark for fc0e3b0

Click to view benchmark
Test Base PR %
costing::bench_prepare_wasm 65.9±0.14ms 66.4±0.37ms +0.76%
costing::decode_sbor 10.8±0.03µs 11.0±0.02µs +1.85%
costing::decode_sbor_bytes 29.5±0.03µs 29.1±0.07µs -1.36%
costing::deserialize_wasm 1285.1±4.48µs 1304.5±3.12µs +1.51%
costing::instantiate_flash_loan 4.0±0.82ms 3.8±0.65ms -5.00%
costing::instantiate_radiswap 5.7±0.06ms 5.7±0.06ms 0.00%
costing::spin_loop 21.5±0.08ms 21.4±0.03ms -0.47%
costing::validate_sbor_payload 27.8±0.06µs 27.4±0.09µs -1.44%
costing::validate_sbor_payload_bytes 232.6±0.74ns 236.1±1.23ns +1.50%
costing::validate_secp256k1 76.4±0.06µs 76.2±0.04µs -0.26%
costing::validate_wasm 37.0±0.06ms 36.9±0.03ms -0.27%
decimal::add/0 8.4±0.00ns 8.4±0.00ns 0.00%
decimal::add/rust-native 9.8±0.00ns 9.8±0.01ns 0.00%
decimal::add/wasmer 114.7±0.05ns 116.3±0.24ns +1.39%
decimal::add/wasmer-call-native 457.6±0.54ns 456.2±0.57ns -0.31%
decimal::add/wasmi 611.8±1.75ns 704.1±3.49ns +15.09%
decimal::add/wasmi-call-native 5.4±0.01µs 5.3±0.02µs -1.85%
decimal::div/0 189.4±0.14ns 191.8±0.27ns +1.27%
decimal::from_string/0 152.5±0.15ns 154.4±0.18ns +1.25%
decimal::mul/0 138.8±0.18ns 142.0±1.17ns +2.31%
decimal::mul/rust-native 139.3±0.09ns 138.0±0.53ns -0.93%
decimal::mul/wasmer 1512.0±1.55ns 1496.8±0.75ns -1.01%
decimal::mul/wasmer-call-native 582.4±0.45ns 581.0±0.47ns -0.24%
decimal::mul/wasmi 40.8±0.04µs 41.2±0.10µs +0.98%
decimal::mul/wasmi-call-native 5.5±0.04µs 5.5±0.01µs 0.00%
decimal::pow/0 655.7±0.45ns 650.3±0.89ns -0.82%
decimal::pow/rust-native 633.5±0.72ns 634.3±0.55ns +0.13%
decimal::pow/wasmer 6.5±0.00µs 6.7±0.00µs +3.08%
decimal::pow/wasmer-call-native 1049.6±2.84ns 1027.2±0.58ns -2.13%
decimal::pow/wasmi 191.4±0.47µs 194.9±0.33µs +1.83%
decimal::pow/wasmi-call-native 5.2±0.01µs 5.2±0.01µs 0.00%
decimal::root/0 7.9±0.01µs 7.7±0.01µs -2.53%
decimal::sub/0 8.5±0.01ns 8.5±0.01ns 0.00%
decimal::to_string/0 459.0±0.74ns 457.1±0.74ns -0.41%
precise_decimal::add/0 9.2±0.10ns 9.4±0.02ns +2.17%
precise_decimal::add/rust-native 11.4±0.00ns 11.4±0.00ns 0.00%
precise_decimal::add/wasmer 125.4±0.16ns 127.6±0.18ns +1.75%
precise_decimal::add/wasmer-call-native 490.4±0.43ns 484.6±0.46ns -1.18%
precise_decimal::add/wasmi 775.7±1.03ns 869.4±18.30ns +12.08%
precise_decimal::add/wasmi-call-native 6.8±0.05µs 6.6±0.01µs -2.94%
precise_decimal::div/0 303.3±0.58ns 300.7±0.42ns -0.86%
precise_decimal::from_string/0 198.4±0.23ns 199.4±0.29ns +0.50%
precise_decimal::mul/0 386.2±1.73ns 383.7±1.16ns -0.65%
precise_decimal::mul/rust-native 301.7±0.34ns 304.4±0.64ns +0.89%
precise_decimal::mul/wasmer 3.4±0.00µs 3.4±0.00µs 0.00%
precise_decimal::mul/wasmer-call-native 805.0±1.62ns 811.8±0.98ns +0.84%
precise_decimal::mul/wasmi 105.7±0.15µs 105.4±0.09µs -0.28%
precise_decimal::mul/wasmi-call-native 7.3±0.01µs 7.1±0.01µs -2.74%
precise_decimal::pow/0 1971.5±2.67ns 1970.7±0.99ns -0.04%
precise_decimal::pow/rust-native 1463.9±2.29ns 1467.1±3.43ns +0.22%
precise_decimal::pow/wasmer 16.1±0.01µs 16.1±0.02µs 0.00%
precise_decimal::pow/wasmer-call-native 2.1±0.00µs 2.1±0.00µs 0.00%
precise_decimal::pow/wasmi 509.5±2.70µs 515.5±0.68µs +1.18%
precise_decimal::pow/wasmi-call-native 12.9±0.02µs 13.2±0.05µs +2.33%
precise_decimal::root/0 57.8±0.06µs 57.8±0.06µs 0.00%
precise_decimal::sub/0 9.5±0.01ns 9.5±0.01ns 0.00%
precise_decimal::to_string/0 721.0±1.23ns 733.4±1.46ns +1.72%
schema::validate_payload 343.9±0.38µs 343.4±0.30µs -0.15%
transaction::radiswap 5.3±0.02ms 5.4±0.02ms +1.89%
transaction::transfer 1762.2±5.29µs 1751.7±4.63µs -0.60%
transaction_processing::prepare 2.2±0.00ms 2.2±0.00ms 0.00%
transaction_processing::prepare_and_decompile 6.1±0.03ms 6.1±0.02ms 0.00%
transaction_processing::prepare_and_decompile_and_recompile 24.7±1.32ms 24.0±0.31ms -2.83%
transaction_validation::validate_manifest 42.2±0.05µs 42.1±0.05µs -0.24%
transaction_validation::verify_bls_2KB 1073.2±136.42µs 978.0±11.44µs -8.87%
transaction_validation::verify_bls_32B 1010.4±31.55µs 972.1±12.40µs -3.79%
transaction_validation::verify_ecdsa 74.3±0.04µs 74.4±0.08µs +0.13%
transaction_validation::verify_ed25519 54.9±0.11µs 54.7±0.10µs -0.36%

Copy link
Contributor

@dhedey dhedey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice output! :)

But before considering merging this, I think we need to be more careful about:

  • What we're adding to our interfaces and the consequences. Indeed, I don't think we want to add lots of internal engine models to the transaction receipt, as it makes it much harder to change them without breaking things.
  • Reducing performance impact for normal preview (which full nodes do a lot of)

Instead:

  • Perhaps we may wish to flatten lots of things out into Strings on the way to the receipt
  • OR consider pushing the flamegraph generation further into the engine, to improve perf and reduce the overhead.

If we want a more structured execution trace for other reasons, (which might be really cool, and was my original model for preview), we should start by considering what the format of its DTO should be - rather than just exposing engine models on the receipt interface.

scrypto-test/src/environment/env.rs Show resolved Hide resolved
radix-engine/src/transaction/transaction_receipt.rs Outdated Show resolved Hide resolved
radix-engine/src/transaction/transaction_receipt.rs Outdated Show resolved Hide resolved
radix-engine/src/system/actor.rs Show resolved Hide resolved

// Add an entry for the more detailed execution cost
cost_breakdown.detailed_execution_cost_breakdown.push((
self.current_depth,
Copy link
Contributor

@dhedey dhedey Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have this method take a &KernelApi or a current_depth parameter, than risk the depth getting out of sync.

(That would enable us to delete current_depth from the costing module state, and clear up a lot of possibly flakey code).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend leaving this change for a later PR since it's quite a big change and the logic around the depth I think is somewhat simple and can easily be verified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you did fix this in the end, thanks :).

But yeah, I think it makes sense to change in this PR so we don't introduce something and then remove it which creates unneeded churn.

radix-engine/src/track/interface.rs Show resolved Hide resolved
@0xOmarA
Copy link
Member Author

0xOmarA commented Jun 19, 2024

@dhedey Thank you for the review! I'm responding to your review comment in this comment.

  • What we're adding to our interfaces and the consequences. Indeed, I don't think we want to add lots of internal engine models to the transaction receipt, as it makes it much harder to change them without breaking things.

So, I thought about this a little bit more. I think that I would like not to include this information in the transaction receipt. Instead, I want to propose that execution returns two transactions receipts instead of one: TransactionReceiptV1 and something like DebugTransactionReceipt.

The TransactionReceiptV1 is the original transaction receipt that we've had for a while now, no changes to that, not even the detailed execution cost units breakdown that I added with this PR. The DebugTransactionReceipt (name is just an example for the time being) is another receipt returned that is not versioned and that would include information that we would never persist and would just include information that we want to expose to tests or for debugging. This model would not be versioned since it would not need versioning as its never persisted. We would never use this outside of tests. Additionally, this model can use as much of the internal engine types as it needs, again, since it's never persisted and not used anywhere other than tests.

This would address the backward compatibility issues we talked about and also any concerns we have related to the size of the preview receipts since adding this detailed execution cost information to the transaction receipt will make it massive.

The idea would be as follows: information that we're ready to persist, write DTOs for, and expose to clients would be part of the TransactionReceiptVN and information that we're primarily using for development would be part of the DebugTransactionReceipt. If we become confident in the models for some of the data exposed in the DebugTransactionReceipt we can move it to the TransactionReceiptVN if there is a need for it. I think that this would've been useful to have on a few occasions the ones that come to mind is the information on the resource use which Michal worked on and this information too on the detailed cost breakdown.

What do you think?

  • Reducing performance impact for normal preview (which full nodes do a lot of)

Totally agree, need to rework this bit. I'm now concerned about how much the size of the receipt increases with this information and about the performance impact. So, will rework this bit.

Instead:

  • Perhaps we may wish to flatten lots of things out into Strings on the way to the receipt

At the current moment of time, I would recommend that we do not flatten them into strings and keep them this way so that we can extract as much information out of them as needed during the analysis. You'll notice that only a small amount of the information is used in the flamegraph and a large part goes unused, this is intentionally done so that we have this information when we need further analysis.

If we do the DebugTransactionReceipt idea, do you think that this would alleviate your concerns about the size?

  • OR consider pushing the flamegraph generation further into the engine, to improve perf and reduce the overhead.

To understand, in this case the flamegraph would not be generated from the receipt, it would be generated on the fly (perhaps in the costing module) as the transaction is executed?

If we want a more structured execution trace for other reasons, (which might be really cool, and was my original model for preview), we should start by considering what the format of its DTO should be - rather than just exposing engine models on the receipt interface.

Yup, I agree, I'm personally very much in favor of us not exposing this information at the time being since it's still not very mature and can very much change and instead we would surface it to tests in a DebugTransactionReceipt that accompanies any TransactionReceipt we get when executing tests.

This information is not currently in a state where I'd say "yeah let's commit to this format long term" and it could very much change in the future because we want to surface more information. So, this is where the debug transaction receipt idea comes from.

Copy link
Member

@iamyulong iamyulong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but ideally we merge after David's approval as he've given most of the comments. :-D

Copy link
Contributor

@dhedey dhedey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks. One minor comment about a TODO that we should probably fix, otherwise good to go.

@@ -871,6 +867,7 @@ impl<C: SystemCallbackObject> KernelCallbackObject for System<C> {
let limits_module = { LimitsModule::from_params(system_parameters.limit_parameters) };

let costing_module = CostingModule {
current_depth: 0, // TODO: Is it correct to assume that the current depth is zero here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was missed? We should resolve this TODO - either commenting why we can assume this, or handling it properly?

Seems reasonable to me that init sets current_depth to 0 though 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I missed this, I've removed the TODO comment and will add a comment explaining why we set it to zero.

@@ -1125,6 +1127,15 @@ impl<C: SystemCallbackObject> KernelCallbackObject for System<C> {
None
};

let debug_information = match (&costing_module.detailed_cost_breakdown,) {
(Some(detailed_cost_breakdown),) => Some(TransactionDebugInformation {
detailed_execution_cost_breakdown: detailed_cost_breakdown
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we create a costing_module.deconstruct() which splits out the detailed_cost_breakdown from the fee_reserve etc then can we avoid the expensive clone here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. In this case avoiding the clone didn't require adding a new method. The current context already owns the CostingModule so I can simply operate over the owned value instead of operating over a reference and this avoids the clone.

radix-engine/src/system/actor.rs Show resolved Hide resolved

// Add an entry for the more detailed execution cost
cost_breakdown.detailed_execution_cost_breakdown.push((
self.current_depth,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you did fix this in the end, thanks :).

But yeah, I think it makes sense to change in this PR so we don't introduce something and then remove it which creates unneeded churn.

@0xOmarA 0xOmarA merged commit 0b54a0d into develop Jun 28, 2024
28 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants