This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Allow separate max-heap-pages for offchain (non-consensus) execution context. #8885

Closed
wants to merge 34 commits

Conversation

kianenigma
Contributor

@kianenigma kianenigma commented May 22, 2021

  • The previous ExecutionStrategy is now wrapped in ExecutionConfig, which is the strategy plus an ExecutionContext (see the sketch after this list). This propagated into a lot of strategy -> config replacements. I like the outcome.

  • polkadot companion: Companion for /substrate/pull/8885 polkadot#3378
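For readers following along, here is a minimal Rust sketch of the shape described in the first bullet. The type names follow the PR description; the variants and fields are illustrative, not the actual PR code.

```rust
/// The pre-existing strategy type (variants abbreviated for illustration).
#[derive(Copy, Clone, Debug)]
pub enum ExecutionStrategy {
    NativeWhenPossible,
    AlwaysWasm,
}

/// The context an execution runs in: consensus-critical or offchain.
#[derive(Copy, Clone, Debug)]
pub enum ExecutionContext {
    Consensus,
    Offchain,
}

/// The wrapper the PR introduces: the old strategy plus its context,
/// so offchain calls can carry their own heap-pages configuration.
#[derive(Copy, Clone, Debug)]
pub struct ExecutionConfig {
    pub strategy: ExecutionStrategy,
    pub context: ExecutionContext,
}
```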

@github-actions github-actions bot added the A3-in_progress Pull request is in progress. No review needed at this stage. label May 22, 2021
@kianenigma kianenigma requested review from bkchr and tomusdrw May 22, 2021 21:52
@kianenigma kianenigma changed the title Allow separate max-heap-pages for offchain execution context. Allow separate max-heap-pages for offchain (non-consensus) execution context. May 22, 2021
@bkchr
Member

bkchr commented May 23, 2021

We cache the wasm instances based on the wasm method, runtime version and the heap pages, meaning we probably need to increase the cache size, which also leads to higher memory usage.

(just some note on what we should not forget)
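Roughly, the cache key being described has this shape (a sketch with invented names, not the actual sc-executor internals); a second heap-pages value doubles the number of distinct keys, hence the memory concern:

```rust
use std::collections::HashMap;

#[derive(Hash, PartialEq, Eq, Clone, Copy)]
enum WasmExecutionMethod { Interpreted, Compiled }

/// Invented key shape: one cached wasm instance per distinct combination.
#[derive(Hash, PartialEq, Eq)]
struct RuntimeCacheKey {
    wasm_method: WasmExecutionMethod,
    runtime_version: u32, // stands in for the full RuntimeVersion
    heap_pages: u64,      // a separate offchain value means a separate entry
}

struct CachedRuntime; // placeholder for an instantiated runtime

type RuntimeCache = HashMap<RuntimeCacheKey, CachedRuntime>;
```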

@kianenigma kianenigma marked this pull request as ready for review May 25, 2021 09:24
@github-actions github-actions bot added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels May 25, 2021
@kianenigma
Contributor Author

might not be needed, in favour of #8893

@gavofyork gavofyork added B5-clientnoteworthy C3-medium PR touches the given topic and has a medium impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. labels May 25, 2021
@kianenigma
Contributor Author

@cheme @bkchr this can be given a look now.

@kianenigma
Contributor Author

I'm kinda lost and uncertain if this PR is really needed.

This only adds support for setting the offchain heap pages on-chain, after which it can be used. Deploying this today would not help at all unless we set that value, and setting it has its own caveats: once it is set on-chain, we can no longer replace it via a client upgrade.
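For context, the consensus heap pages can already be set on-chain under the well-known storage key `:heappages` (sp_core::storage::well_known_keys::HEAP_PAGES); the PR adds an analogous offchain value. A minimal sketch of reading such a value, with `storage_get` standing in as a hypothetical backend accessor:

```rust
use sp_core::storage::well_known_keys::HEAP_PAGES; // b":heappages"

/// Hypothetical helper: decode the on-chain heap-pages override, if set.
fn onchain_heap_pages(storage_get: impl Fn(&[u8]) -> Option<Vec<u8>>) -> Option<u64> {
    storage_get(HEAP_PAGES).and_then(|raw| {
        // The value is stored as an 8-byte little-endian u64.
        <[u8; 8]>::try_from(raw.as_slice()).ok().map(u64::from_le_bytes)
    })
}
```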

@cheme
Contributor

cheme commented Jun 28, 2021

> I'm kinda lost and uncertain if this PR is really needed.
>
> This only adds support for setting the offchain heap pages on-chain, after which it can be used. Deploying this today would not help at all unless we set that value, and setting it has its own caveats: once it is set on-chain, we can no longer replace it via a client upgrade.

I am not sure what the plans are for heap-pages on-chain, but if that mechanism is here to stay, having the offchain one seems ok to me.
Generally I think of offchain as a default execution context.
I mean, if I were a validator I would like to be able to switch it at any time, so I would probably hack a way to override it.
Maybe ExecutionContext could contain a heap-pages override value from a CLI parameter (we have that for the runtime (wasm-runtime-override), so it would also make sense for the heap-pages value).
We could also probably allow overriding the runtime only for offchain, which could help in some cases (when you don't want to risk breaking consensus with a custom wasm build).
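A CLI override along the lines suggested above might look like this (flag and struct names invented for illustration, written with clap-style derives; not part of the PR):

```rust
/// Hypothetical params struct, in the spirit of the existing
/// `--wasm-runtime-overrides` flag.
#[derive(Debug, clap::Parser)]
pub struct OffchainHeapParams {
    /// Override the number of heap pages used for the offchain execution context.
    #[clap(long, value_name = "PAGES")]
    pub offchain_heap_pages: Option<u64>,
}
```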

@@ -626,22 +626,24 @@ pub enum Profile {
 }

 impl Profile {
-	fn into_execution_strategies(self) -> ExecutionStrategies {
+	fn into_execution_configs(self) -> ExecutionConfigs {
Contributor

Suggested change:
-	fn into_execution_configs(self) -> ExecutionConfigs {
+	fn into_offchain_execution_configs(self) -> ExecutionConfigs {

Contributor Author

why?

Contributor

@cheme cheme Jul 8, 2021

Code is calling new_offchain only, so I was just being more precise.

Contributor

Actually shouldn't it be new_offchain only for 'offchain_worker'?

frame/system/src/lib.rs (resolved)
primitives/state-machine/src/backend.rs (resolved)

/// The context of the execution.
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
pub enum ExecutionContext {
Contributor

There is an existing ExecutionContext enum in sp_core.
So I would suggest renaming this one (though I have no naming idea).

Contributor Author

yeah, I noticed that as well, but can't find another alternative for now.

primitives/state-machine/src/lib.rs (resolved)
primitives/state-machine/src/lib.rs (resolved)
primitives/storage/src/lib.rs (resolved)
test-utils/client/src/lib.rs (resolved)
kianenigma and others added 2 commits July 2, 2021 09:43
Co-authored-by: cheme <emericchevalier.pro@gmail.com>
@kianenigma
Contributor Author

> …memory usage of a node running your branch vs master? Wasmtime memory usage per instance is quite high AFAIR.

Yeah, I'll check on the burn-in. Note that now we create ONLY one additional instance with this higher number of heap pages.

@kianenigma
Contributor Author

Okay, I've done the burn-in. Two observations:

  1. 1 wasm instance for OCW is too little and will cause a lot of warnings around each election, because the election OCW thread runs for longer than a typical block time. Instead, what we want is to set the instances to 2, where the second one is the backup for the duration of the election.

  2. With 1 wasm instance, I could not see any observable memory increase on our burn-in nodes. I'm doing a new burn-in now with 2 instances, and I hope the answer is still the same.

@kianenigma
Contributor Author

kianenigma commented Aug 6, 2021

burn-in with two instances for offchain:
[Screenshot: 2021-08-06 at 12:02:07]

Contributor

@cheme cheme left a comment

Changes look good. I am just not sure if the additional memory usage is fine, but otherwise I approve.

Generally I would find it good for the offchain worker to be able to configure the max number of instances and the number of running instances, and to override the page size from the CLI, but that is another question. Still, it would make it easy to handle memory issues in the future.

Also, the Offchain context currently only changes the heap size (and the pool). Omitting the pool, it could also just have been an 'Override' or 'Custom' heap-size option that offchain specifically uses with the executor, making the changes smaller, but I guess it can be good to distinguish the offchain case (using a different wasm for offchain could be nice, but that is also another question).


 	let exec = &self.execution_strategies;
-	let exec_all_or = |strat: Option<ExecutionStrategy>, default: ExecutionStrategy| {
+	let exec_all_or = |start: Option<ExecutionStrategy>, default: ExecutionStrategy| {
Contributor

Why the renaming? ('strat' was for strategy I guess)

@@ -262,12 +269,13 @@ impl sp_core::traits::ReadRuntimeVersion for WasmExecutor {
 	true,
 	"Core_version",
 	&[],
+	DEFAULT_HEAP_PAGES_CONSENSUS,
Contributor

Could it be hardcoded to some lower value (it is just reading a version)?
Or that might be the worst idea, as the instance could then not be reused 🤔

max_consensus_runtime_instances: usize,
/// Same as `consensus_runtimes`, but used in offchain code context. Their size is always bound
/// by 1 (can easily be configurable, but we don't need this right now).
offchain_runtimes: Mutex<[Option<Arc<VersionedRuntime>>; MAX_RUNTIMES]>,
Contributor

I guess offchain is still using the same wasm as the runtime.
So we can now use twice as much memory for instantiated wasm; I guess that is fine.

Contributor

Could it make sense to have MAX_RUNTIMES_OFFCHAIN? Did we consider having only one runtime with two instances?

Contributor

Afterthought: one instance would be really bad, with im-online potentially stuck behind phragmen.

// this must be released prior to calling f.
let mut runtimes = match runtime_code.context {
sp_core::traits::CodeContext::Consensus => self.consensus_runtimes.lock(),
sp_core::traits::CodeContext::Offchain => self.offchain_runtimes.lock(),
Contributor

So offchain could not reuse a consensus instance; since the heap pages differ, I guess it would not be a good idea to try anyway.
I do like that there is a clean separation.

-	let version = WasmOverride::runtime_version(&executor, &wasm, Some(128))
-		.expect("should get the `RuntimeVersion` of the test-runtime wasm blob");
+	let version =
+		WasmOverride::runtime_version(&executor, &wasm, Some(128), CodeContext::Consensus)
Contributor

Does the wasm override apply to offchain?
If so, will it run with the consensus context?

/// Default num of pages for the heap of the runtime, when used for offchain operations.
///
/// 256mb per instance.
const DEFAULT_HEAP_PAGES_OFFCHAIN: u64 = DEFAULT_HEAP_PAGES_CONSENSUS * 2;
Contributor

Having a bigger requirement for offchain is a bit awkward.
I mean, most offchain processes should be light (I am only aware of im-online, but its requirements are certainly small).
So we use a big memory profile mostly for offchain phragmen.
This makes me think that the offchain worker's default heap pages could depend on the offchain method being called.
Same if considering an override from the CLI.
(I am not sure the pool allows switching instances depending on heap pages, but we could also use a single-use instance for this special case.)
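For the record, the "256mb" in the doc comment above checks out under the usual assumptions (DEFAULT_HEAP_PAGES_CONSENSUS = 2048, Substrate's historical default, and 64 KiB wasm pages):

```rust
// Assumed values, for the arithmetic only:
// DEFAULT_HEAP_PAGES_CONSENSUS = 2048 pages * 64 KiB = 128 MiB per instance
// DEFAULT_HEAP_PAGES_OFFCHAIN  = 4096 pages * 64 KiB = 256 MiB per instance
```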

@stale

stale bot commented Sep 5, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Sep 5, 2021
@kianenigma
Contributor Author

we will probably kill this in favour of whatever @bkchr is cooking.

@stale stale bot removed the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Sep 7, 2021
@stale

stale bot commented Oct 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Oct 7, 2021
@bkchr
Member

bkchr commented Oct 7, 2021

What @kianenigma said :P

@bkchr bkchr closed this Oct 7, 2021
@bkchr bkchr deleted the kiz-ocw-max-heap-pages branch October 7, 2021 09:49
@kianenigma
Contributor Author

Wen dynamic heap pages @bkchr?

@bkchr
Member

bkchr commented Oct 12, 2021

++++++++++

@kianenigma
Contributor Author

honestly, I still come back to this and think it is a good interim feature to have. WDYT @paritytech/sdk-node?

@bkchr
Member

bkchr commented Apr 19, 2022

🙈

@koute
Contributor

koute commented Apr 20, 2022

> honestly, I still come back to this and think it is a good interim feature to have. WDYT @paritytech/sdk-node?

The heap_pages parameter doesn't actually change the maximum number of pages the runtime's memory is allowed to grow to, but the minimum number of pages which will be allocated by default. Basically the original code did this ("did" because I've merged steps 1 and 2 in a subsequent refactoring, but the behavior's still the same; also, I'm talking about the wasmtime executor, not sure if others have exactly the same behavior):

  1. Instantiate a new runtime with whatever the default number of pages is specified inside of the WASM blob. (This can be changed when the runtime is compiled by passing an appropriate flag to the linker, so in theory it can be anything, although in our case we don't change it.)
  2. Grow the memory by extra heap_pages specified. Now we have N + heap_pages preallocated for our memory.
  3. (N + heap_pages) * PAGE_SIZE - __heap_base bytes of memory are now available for any subsequent heap allocation (where __heap_base is also determined at compile-time so it also can be anything, and PAGE_SIZE is 64K).

Just for your reference, here's the memory map of a current-ish Kusama runtime I have laying around:

0x000000 - 0x100000: stack
0x100000 - 0x159778: rodata
0x159778 - 0x159910: data
0x159910 - 0x15991c: bss
0x15991c - 0x159920: padding
0x159920 - 0x160000: heap

So basically any dynamic allocation made inside of the runtime is served by that heap chunk at the end, and the heap_pages enlarges that chunk by heap_pages * PAGE_SIZE.
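As a worked example against the memory map above (heap_pages = 2048 is an assumed value, chosen only to make the arithmetic concrete):

```rust
fn main() {
    const PAGE_SIZE: u64 = 64 * 1024;              // one wasm page = 64 KiB
    let heap_base: u64 = 0x159920;                 // __heap_base from the map above
    let initial_pages: u64 = 0x160000 / PAGE_SIZE; // N = 22 pages declared in the blob
    let heap_pages: u64 = 2048;                    // assumed extra pages

    // (N + heap_pages) * PAGE_SIZE - __heap_base, per step 3 above.
    let available = (initial_pages + heap_pages) * PAGE_SIZE - heap_base;
    println!("{available} bytes (~{} MiB) available for allocations", available / (1024 * 1024));
    // => 134244064 bytes, roughly 128 MiB
}
```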

Now, our runtime doesn't do this, but the heap can already be dynamically grown from within the runtime (using this intrinsic) and from what I can see our allocator supports this! And before you ask, yes, there is actually a limit: the max_memory_size specified in the sc_executor_wasmtime::Config struct caps how big the memory can grow, however for the normal runtime this is None, so essentially you could (in theory) allocate as much memory as you want. (And for PVFs this is capped at 2080 pages.)

Of course this is not very convenient, since it's not automatic. The allocator host function will panic outside of the runtime when running out of memory instead of automatically growing the heap, and you need to grow the heap from within the runtime, so essentially you'd have to either blindly grow the memory or somehow know that the next allocation will fail. We could easily modify the allocator to automatically try to grow the memory if we wanted to though.
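For reference, a sketch of the cap mentioned above (the field name is taken from the comment; the exact sc_executor_wasmtime::Config layout has changed across versions, so treat this as illustrative):

```rust
/// Illustrative subset of the wasmtime executor config described above.
struct Config {
    /// Hard cap on how large the wasm memory may grow.
    /// `None` (the normal-runtime case) means growth is effectively unbounded.
    max_memory_size: Option<usize>,
}

// The PVF cap mentioned above, in bytes:
// 2080 pages * 64 KiB = 136_314_880 bytes (130 MiB).
const PVF_MEMORY_CAP: usize = 2080 * 64 * 1024;
```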

> Wen dynamic heap pages @bkchr?

TLDR: Technically it's already here due to how the system works, although it's not very convenient, and it's debatable whether it's intended to be actually used or not... 🤷

@bkchr
Member

bkchr commented Apr 20, 2022

Point here being that I have a branch that has all this already prepared. I just need to clean it up and fix two tests 🙈

@kianenigma
Contributor Author

@koute thanks for the explainer, and I think you basically unwrapped what basti has implemented in his head and WIP branch.

Please keep me up to date with further changes regarding this, and do bear in mind that allowing some OCWs to allocate more memory (both DEFAULT_HEAP_PAGES and MAX_POSSIBLE_ALLOCATION, both of which are somewhat arbitrary constants in our codebase dictated to the whole world) will be of huge advantage.

If you happen to know another workaround to achieve this in the short term, please do let me know.

@kianenigma
Contributor Author

FYI, we're still running into this on a semi-random basis on the offchain workers of Westend, in a rather unpredictable fashion: https://grafana.parity-mgmt.parity.io/goto/DF1cyNX7z?orgId=1

@louismerlin louismerlin removed the D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. label Oct 13, 2022