Execution Cost Flamegraph #1834

0xOmarA · 2024-06-14T14:04:51Z

Summary

This PR adds a more detailed execution cost breakdown and also adds the ability for a flamegraph to be generated it.

(Save the above SVG locally to view it correctly)

Details

This PR adds the ability for the costing module to collect more detailed information on the cost unit consumption during execution. This information is then surfaced in transaction receipts and can be graphed and represented as a flamegraph/flamechart. Note that in flamegraphs the events are ordered alphabetically and not in the order in which they occured.

Please note that this PR makes a breaking change to transaction receipts. I suggest that we keep this breaking change until we introduce the transaction receipt DTO.

github-actions · 2024-06-14T14:18:38Z

Docker tags
docker.io/radixdlt/private-scrypto-builder:fc0e3b0b67

radix-engine/src/transaction/transaction_receipt.rs

github-actions · 2024-06-14T14:43:08Z

Benchmark for `fc0e3b0`

Click to view benchmark

Test	Base	PR	%
costing::bench_prepare_wasm	65.9±0.14ms	66.4±0.37ms	+0.76%
costing::decode_sbor	10.8±0.03µs	11.0±0.02µs	+1.85%
costing::decode_sbor_bytes	29.5±0.03µs	29.1±0.07µs	-1.36%
costing::deserialize_wasm	1285.1±4.48µs	1304.5±3.12µs	+1.51%
costing::instantiate_flash_loan	4.0±0.82ms	3.8±0.65ms	-5.00%
costing::instantiate_radiswap	5.7±0.06ms	5.7±0.06ms	0.00%
costing::spin_loop	21.5±0.08ms	21.4±0.03ms	-0.47%
costing::validate_sbor_payload	27.8±0.06µs	27.4±0.09µs	-1.44%
costing::validate_sbor_payload_bytes	232.6±0.74ns	236.1±1.23ns	+1.50%
costing::validate_secp256k1	76.4±0.06µs	76.2±0.04µs	-0.26%
costing::validate_wasm	37.0±0.06ms	36.9±0.03ms	-0.27%
decimal::add/0	8.4±0.00ns	8.4±0.00ns	0.00%
decimal::add/rust-native	9.8±0.00ns	9.8±0.01ns	0.00%
decimal::add/wasmer	114.7±0.05ns	116.3±0.24ns	+1.39%
decimal::add/wasmer-call-native	457.6±0.54ns	456.2±0.57ns	-0.31%
decimal::add/wasmi	611.8±1.75ns	704.1±3.49ns	+15.09%
decimal::add/wasmi-call-native	5.4±0.01µs	5.3±0.02µs	-1.85%
decimal::div/0	189.4±0.14ns	191.8±0.27ns	+1.27%
decimal::from_string/0	152.5±0.15ns	154.4±0.18ns	+1.25%
decimal::mul/0	138.8±0.18ns	142.0±1.17ns	+2.31%
decimal::mul/rust-native	139.3±0.09ns	138.0±0.53ns	-0.93%
decimal::mul/wasmer	1512.0±1.55ns	1496.8±0.75ns	-1.01%
decimal::mul/wasmer-call-native	582.4±0.45ns	581.0±0.47ns	-0.24%
decimal::mul/wasmi	40.8±0.04µs	41.2±0.10µs	+0.98%
decimal::mul/wasmi-call-native	5.5±0.04µs	5.5±0.01µs	0.00%
decimal::pow/0	655.7±0.45ns	650.3±0.89ns	-0.82%
decimal::pow/rust-native	633.5±0.72ns	634.3±0.55ns	+0.13%
decimal::pow/wasmer	6.5±0.00µs	6.7±0.00µs	+3.08%
decimal::pow/wasmer-call-native	1049.6±2.84ns	1027.2±0.58ns	-2.13%
decimal::pow/wasmi	191.4±0.47µs	194.9±0.33µs	+1.83%
decimal::pow/wasmi-call-native	5.2±0.01µs	5.2±0.01µs	0.00%
decimal::root/0	7.9±0.01µs	7.7±0.01µs	-2.53%
decimal::sub/0	8.5±0.01ns	8.5±0.01ns	0.00%
decimal::to_string/0	459.0±0.74ns	457.1±0.74ns	-0.41%
precise_decimal::add/0	9.2±0.10ns	9.4±0.02ns	+2.17%
precise_decimal::add/rust-native	11.4±0.00ns	11.4±0.00ns	0.00%
precise_decimal::add/wasmer	125.4±0.16ns	127.6±0.18ns	+1.75%
precise_decimal::add/wasmer-call-native	490.4±0.43ns	484.6±0.46ns	-1.18%
precise_decimal::add/wasmi	775.7±1.03ns	869.4±18.30ns	+12.08%
precise_decimal::add/wasmi-call-native	6.8±0.05µs	6.6±0.01µs	-2.94%
precise_decimal::div/0	303.3±0.58ns	300.7±0.42ns	-0.86%
precise_decimal::from_string/0	198.4±0.23ns	199.4±0.29ns	+0.50%
precise_decimal::mul/0	386.2±1.73ns	383.7±1.16ns	-0.65%
precise_decimal::mul/rust-native	301.7±0.34ns	304.4±0.64ns	+0.89%
precise_decimal::mul/wasmer	3.4±0.00µs	3.4±0.00µs	0.00%
precise_decimal::mul/wasmer-call-native	805.0±1.62ns	811.8±0.98ns	+0.84%
precise_decimal::mul/wasmi	105.7±0.15µs	105.4±0.09µs	-0.28%
precise_decimal::mul/wasmi-call-native	7.3±0.01µs	7.1±0.01µs	-2.74%
precise_decimal::pow/0	1971.5±2.67ns	1970.7±0.99ns	-0.04%
precise_decimal::pow/rust-native	1463.9±2.29ns	1467.1±3.43ns	+0.22%
precise_decimal::pow/wasmer	16.1±0.01µs	16.1±0.02µs	0.00%
precise_decimal::pow/wasmer-call-native	2.1±0.00µs	2.1±0.00µs	0.00%
precise_decimal::pow/wasmi	509.5±2.70µs	515.5±0.68µs	+1.18%
precise_decimal::pow/wasmi-call-native	12.9±0.02µs	13.2±0.05µs	+2.33%
precise_decimal::root/0	57.8±0.06µs	57.8±0.06µs	0.00%
precise_decimal::sub/0	9.5±0.01ns	9.5±0.01ns	0.00%
precise_decimal::to_string/0	721.0±1.23ns	733.4±1.46ns	+1.72%
schema::validate_payload	343.9±0.38µs	343.4±0.30µs	-0.15%
transaction::radiswap	5.3±0.02ms	5.4±0.02ms	+1.89%
transaction::transfer	1762.2±5.29µs	1751.7±4.63µs	-0.60%
transaction_processing::prepare	2.2±0.00ms	2.2±0.00ms	0.00%
transaction_processing::prepare_and_decompile	6.1±0.03ms	6.1±0.02ms	0.00%
transaction_processing::prepare_and_decompile_and_recompile	24.7±1.32ms	24.0±0.31ms	-2.83%
transaction_validation::validate_manifest	42.2±0.05µs	42.1±0.05µs	-0.24%
transaction_validation::verify_bls_2KB	1073.2±136.42µs	978.0±11.44µs	-8.87%
transaction_validation::verify_bls_32B	1010.4±31.55µs	972.1±12.40µs	-3.79%
transaction_validation::verify_ecdsa	74.3±0.04µs	74.4±0.08µs	+0.13%
transaction_validation::verify_ed25519	54.9±0.11µs	54.7±0.10µs	-0.36%

dhedey

Really nice output! :)

But before considering merging this, I think we need to be more careful about:

What we're adding to our interfaces and the consequences. Indeed, I don't think we want to add lots of internal engine models to the transaction receipt, as it makes it much harder to change them without breaking things.
Reducing performance impact for normal preview (which full nodes do a lot of)

Instead:

Perhaps we may wish to flatten lots of things out into Strings on the way to the receipt
OR consider pushing the flamegraph generation further into the engine, to improve perf and reduce the overhead.

If we want a more structured execution trace for other reasons, (which might be really cool, and was my original model for preview), we should start by considering what the format of its DTO should be - rather than just exposing engine models on the receipt interface.

scrypto-test/src/environment/env.rs

radix-engine/src/transaction/transaction_receipt.rs

radix-engine/src/system/actor.rs

radix-engine/src/system/system_modules/costing/costing_module.rs

dhedey · 2024-06-17T10:53:01Z

radix-engine/src/system/system_modules/costing/costing_module.rs

+
+            // Add an entry for the more detailed execution cost
+            cost_breakdown.detailed_execution_cost_breakdown.push((
+                self.current_depth,


I'd rather have this method take a &KernelApi or a current_depth parameter, than risk the depth getting out of sync.

(That would enable us to delete current_depth from the costing module state, and clear up a lot of possibly flakey code).

Sounds good, will do that.

I would recommend leaving this change for a later PR since it's quite a big change and the logic around the depth I think is somewhat simple and can easily be verified.

Looks like you did fix this in the end, thanks :).

But yeah, I think it makes sense to change in this PR so we don't introduce something and then remove it which creates unneeded churn.

radix-engine/src/system/system_modules/costing/costing_module.rs

radix-engine/src/track/interface.rs

0xOmarA · 2024-06-19T12:56:32Z

@dhedey Thank you for the review! I'm responding to your review comment in this comment.

What we're adding to our interfaces and the consequences. Indeed, I don't think we want to add lots of internal engine models to the transaction receipt, as it makes it much harder to change them without breaking things.

So, I thought about this a little bit more. I think that I would like not to include this information in the transaction receipt. Instead, I want to propose that execution returns two transactions receipts instead of one: TransactionReceiptV1 and something like DebugTransactionReceipt.

The TransactionReceiptV1 is the original transaction receipt that we've had for a while now, no changes to that, not even the detailed execution cost units breakdown that I added with this PR. The DebugTransactionReceipt (name is just an example for the time being) is another receipt returned that is not versioned and that would include information that we would never persist and would just include information that we want to expose to tests or for debugging. This model would not be versioned since it would not need versioning as its never persisted. We would never use this outside of tests. Additionally, this model can use as much of the internal engine types as it needs, again, since it's never persisted and not used anywhere other than tests.

This would address the backward compatibility issues we talked about and also any concerns we have related to the size of the preview receipts since adding this detailed execution cost information to the transaction receipt will make it massive.

The idea would be as follows: information that we're ready to persist, write DTOs for, and expose to clients would be part of the TransactionReceiptVN and information that we're primarily using for development would be part of the DebugTransactionReceipt. If we become confident in the models for some of the data exposed in the DebugTransactionReceipt we can move it to the TransactionReceiptVN if there is a need for it. I think that this would've been useful to have on a few occasions the ones that come to mind is the information on the resource use which Michal worked on and this information too on the detailed cost breakdown.

What do you think?

Reducing performance impact for normal preview (which full nodes do a lot of)

Totally agree, need to rework this bit. I'm now concerned about how much the size of the receipt increases with this information and about the performance impact. So, will rework this bit.

Instead:

Perhaps we may wish to flatten lots of things out into Strings on the way to the receipt

At the current moment of time, I would recommend that we do not flatten them into strings and keep them this way so that we can extract as much information out of them as needed during the analysis. You'll notice that only a small amount of the information is used in the flamegraph and a large part goes unused, this is intentionally done so that we have this information when we need further analysis.

If we do the DebugTransactionReceipt idea, do you think that this would alleviate your concerns about the size?

OR consider pushing the flamegraph generation further into the engine, to improve perf and reduce the overhead.

To understand, in this case the flamegraph would not be generated from the receipt, it would be generated on the fly (perhaps in the costing module) as the transaction is executed?

If we want a more structured execution trace for other reasons, (which might be really cool, and was my original model for preview), we should start by considering what the format of its DTO should be - rather than just exposing engine models on the receipt interface.

Yup, I agree, I'm personally very much in favor of us not exposing this information at the time being since it's still not very mature and can very much change and instead we would surface it to tests in a DebugTransactionReceipt that accompanies any TransactionReceipt we get when executing tests.

This information is not currently in a state where I'd say "yeah let's commit to this format long term" and it could very much change in the future because we want to surface more information. So, this is where the debug transaction receipt idea comes from.

…-generation

iamyulong

LGTM, but ideally we merge after David's approval as he've given most of the comments. :-D

This reverts commit 7b525bb.

dhedey

Looks good, thanks. One minor comment about a TODO that we should probably fix, otherwise good to go.

dhedey · 2024-06-28T10:04:44Z

radix-engine/src/system/system_callback.rs

@@ -871,6 +867,7 @@ impl<C: SystemCallbackObject> KernelCallbackObject for System<C> {
        let limits_module = { LimitsModule::from_params(system_parameters.limit_parameters) };

        let costing_module = CostingModule {
+            current_depth: 0, // TODO: Is it correct to assume that the current depth is zero here?


I think this was missed? We should resolve this TODO - either commenting why we can assume this, or handling it properly?

Seems reasonable to me that init sets current_depth to 0 though 👍

Yup I missed this, I've removed the TODO comment and will add a comment explaining why we set it to zero.

dhedey · 2024-06-28T10:06:59Z

radix-engine/src/system/system_callback.rs

@@ -1125,6 +1127,15 @@ impl<C: SystemCallbackObject> KernelCallbackObject for System<C> {
            None
        };

+        let debug_information = match (&costing_module.detailed_cost_breakdown,) {
+            (Some(detailed_cost_breakdown),) => Some(TransactionDebugInformation {
+                detailed_execution_cost_breakdown: detailed_cost_breakdown


If we create a costing_module.deconstruct() which splits out the detailed_cost_breakdown from the fee_reserve etc then can we avoid the expensive clone here?

Sounds good to me. In this case avoiding the clone didn't require adding a new method. The current context already owns the CostingModule so I can simply operate over the owned value instead of operating over a reference and this avoids the clone.

radix-engine/src/system/actor.rs

dhedey · 2024-06-28T10:17:05Z

radix-engine/src/system/system_modules/costing/costing_module.rs

+
+            // Add an entry for the more detailed execution cost
+            cost_breakdown.detailed_execution_cost_breakdown.push((
+                self.current_depth,


Looks like you did fix this in the end, thanks :).

But yeah, I think it makes sense to change in this PR so we don't introduce something and then remove it which creates unneeded churn.

0xOmarA added 6 commits June 12, 2024 15:56

Add methods to test environment for accessing the kernel.

d253cac

Add an owned costing entry for use with the costing module.

acde1a9

Add a more detailed execution cost breakdown.

d32967a

Add logic for flamegraph generation

87f0a46

Expose the needed information through receipt

1e3bc58

fix no-std errors

8302315

fix no-std errors

30f6a90

0xOmarA commented Jun 14, 2024

View reviewed changes

radix-engine/src/transaction/transaction_receipt.rs Outdated Show resolved Hide resolved

dhedey reviewed Jun 17, 2024

View reviewed changes

0xOmarA added 7 commits June 20, 2024 17:18

Address the receipt's backward compatibility

d5826ed

Add an additional execution config for debug transactions

1518183

Merge remote-tracking branch 'origin/develop' into feature/flamegraph…

7759c24

…-generation

Remove un-needed ScryptoSbor

7b525bb

Change how we do regions

47e1553

Pass the network definition as input for flamegraph generation.

f1e2d9d

Small fix

dc2beaa

iamyulong approved these changes Jun 20, 2024

View reviewed changes

0xOmarA added 3 commits June 21, 2024 14:28

Revert "Remove un-needed ScryptoSbor"

0e64dfe

This reverts commit 7b525bb.

Change the format of the analysis output

c2e5ab4

Change the format of the execution cost breakdown test output

93ab383

dhedey approved these changes Jun 28, 2024

View reviewed changes

0xOmarA added 2 commits June 28, 2024 13:33

Addressing comments

31e827e

pass the current depth as an argument to modules

50ab8d2

0xOmarA merged commit 0b54a0d into develop Jun 28, 2024
28 of 29 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Execution Cost Flamegraph #1834

Execution Cost Flamegraph #1834

0xOmarA commented Jun 14, 2024 •

edited

Loading

github-actions bot commented Jun 14, 2024 •

edited

Loading

github-actions bot commented Jun 14, 2024 •

edited

Loading

dhedey left a comment

dhedey Jun 17, 2024 •

edited

Loading

0xOmarA Jun 19, 2024

0xOmarA Jun 20, 2024

dhedey Jun 28, 2024

0xOmarA commented Jun 19, 2024 •

edited

Loading

iamyulong left a comment

dhedey left a comment

dhedey Jun 28, 2024

0xOmarA Jun 28, 2024

dhedey Jun 28, 2024

0xOmarA Jun 28, 2024

dhedey Jun 28, 2024

Execution Cost Flamegraph #1834

Execution Cost Flamegraph #1834

Conversation

0xOmarA commented Jun 14, 2024 • edited Loading

Summary

Details

github-actions bot commented Jun 14, 2024 • edited Loading

github-actions bot commented Jun 14, 2024 • edited Loading

Benchmark for fc0e3b0

dhedey left a comment

Choose a reason for hiding this comment

dhedey Jun 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

0xOmarA commented Jun 19, 2024 • edited Loading

iamyulong left a comment

Choose a reason for hiding this comment

dhedey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

0xOmarA commented Jun 14, 2024 •

edited

Loading

github-actions bot commented Jun 14, 2024 •

edited

Loading

github-actions bot commented Jun 14, 2024 •

edited

Loading

Benchmark for `fc0e3b0`

dhedey Jun 17, 2024 •

edited

Loading

0xOmarA commented Jun 19, 2024 •

edited

Loading