chain-spec building failure due to WASM allocation size #5419

gpestana · 2024-08-20T14:06:44Z

I'm trying to build a chain spec with a large number of nominators and validators for the multi-block election tests, but given the number of stakers to add, chain-spec generation fails with:

❯ ~/cargo_target/debug/staking-node build-spec --disable-default-bootnode > chain-specs/staking-max.spec --raw
2024-08-20 14:58:04 Building chain spec
2024-08-20 14:58:07 going to fail due to allocating 54354570
Error: Service(Other("wasm call error Requested allocation size is too large"))

Is there any way to increase the WASM allocation size at the spec building time?

This issue seems to be similar to paritytech/substrate#11132 but at the time of chain-spec building, not block production.


michalkucharczyk · 2024-08-26T07:33:36Z

Could you try this? It seems that this error is reported when the allocation exceeds this value:

polkadot-sdk/substrate/primitives/core/src/lib.rs, lines 408 to 411 in 178e699:

    /// The maximum number of bytes that can be allocated at one time.
    // The maximum possible allocation size was chosen rather arbitrary, 32 MiB should be enough for
    // everybody.
    pub const MAX_POSSIBLE_ALLOCATION: u32 = 33554432; // 2^25 bytes, 32 MiB
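A minimal sketch of the check behind the error message, assuming the allocator simply rejects any single request larger than `MAX_POSSIBLE_ALLOCATION`. The function `check_allocation` and the `AllocError` type are hypothetical stand-ins, not the real executor API. The size from the failing `build-spec` log above, 54354570 bytes (about 51.8 MiB), exceeds the 32 MiB cap:

```rust
/// Same value as the constant quoted above: 2^25 bytes, 32 MiB.
const MAX_POSSIBLE_ALLOCATION: u32 = 33_554_432;

/// Hypothetical error type standing in for the executor's allocation error.
#[derive(Debug, PartialEq)]
enum AllocError {
    RequestedAllocationTooLarge,
}

/// Hypothetical helper: reject any single request above the fixed cap.
fn check_allocation(size: u32) -> Result<u32, AllocError> {
    if size > MAX_POSSIBLE_ALLOCATION {
        Err(AllocError::RequestedAllocationTooLarge)
    } else {
        Ok(size)
    }
}

fn main() {
    // The request size from the failing `build-spec` run.
    let requested = 54_354_570;
    println!("{:?}", check_allocation(requested));
    // The cap really is 2^25 bytes.
    assert_eq!(1u32 << 25, MAX_POSSIBLE_ALLOCATION);
}
```

Because the cap applies per request, splitting the data so that no single buffer exceeds 32 MiB avoids this particular error; it does not raise the total heap available.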

michalkucharczyk · 2024-08-26T07:36:11Z

From what I know, it is not possible to change this value during chain-spec building.

gpestana · 2024-08-27T15:57:28Z

@michalkucharczyk thanks for the tip. It also required a few other changes, but in the end I got it working. For the record, I posted the solution on Substrate StackExchange: https://substrate.stackexchange.com/questions/11863/allocate-extra-wasm-memory-to-generate-large-chainspecs/11864

michalkucharczyk · 2024-08-27T18:42:33Z

Thank you for writing this down.

kianenigma · 2024-09-03T11:02:35Z

Should we address this at a more fundamental level? We can't expect everyone to change the code.

All WASM memory limits should ideally be removed in the chain-spec generation and OCW code paths.

bkchr · 2024-09-03T19:00:36Z

What you want is this: polkadot-fellows/RFCs#4 and then to move the allocator inside of the runtime.

gpestana · 2024-09-10T12:39:43Z

@bkchr we've been discussing increasing the max number of validators that can be registered in the system and be part of the snapshot. From a few local experiments, the offchain miner panics with a failure to allocate memory due to the current limits when calculating the election. I wonder if we could double MAX_POSSIBLE_ALLOCATION to 64 MiB and increase the max number of memory pages in the current hard-coded limits.

I'm now trying to improve the code so it requires less memory allocation. In any case, if we need to increase the limits, is there anything to consider, or any side effects of increasing them?

bkchr · 2024-09-12T12:49:19Z

We can not just bump these limits. This changes the behavior of the allocator. If possible, I would not like to touch this at all before we move this to the runtime.

I think I proposed this already somewhere, but can we for now just not change the way we register them at genesis? Either pass them with two fields into the runtime or use append to add them.

@gpestana do you know where exactly the allocation is failing?

kianenigma · 2024-09-12T16:31:47Z

> I think I proposed this already somewhere, but can we for now just not change the way we register them at genesis? Either pass them with two fields into the runtime or use append to add them.

The current issue is actually about the OCW code path, no longer genesis, but @gpestana knows better and can point out exactly where we run OOM.

It is kind of silly to have the OCW/genesis code paths, which are by no means "consensus-critical", be subject to memory limits that are, afaik, only arguably needed in consensus code paths. I hope we can hack around it.

In the old substrate days, I made a PR that never got merged, but it did something like this:

On the client side, we attached an enum Execution { Consensus, Offchain } to every wasm invocation. If Execution::Offchain, then these memory limits were a lot higher, or nonexistent. Would something like that make sense?
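A minimal sketch of the idea described above, assuming a hypothetical `Execution` enum attached to each invocation and hypothetical limit constants (this is not the client's actual API, and the offchain value of "no cap" is only one possible choice):

```rust
/// Hypothetical: the context a wasm invocation runs in.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Execution {
    Consensus,
    Offchain,
}

/// Hypothetical: consensus keeps today's 32 MiB per-allocation cap.
const CONSENSUS_MAX_ALLOCATION: u32 = 32 * 1024 * 1024;
/// Hypothetical: offchain invocations get no per-allocation cap at all.
const OFFCHAIN_MAX_ALLOCATION: Option<u32> = None;

/// Per-allocation cap for this invocation; `None` means unlimited.
fn allocation_cap(exec: Execution) -> Option<u32> {
    match exec {
        Execution::Consensus => Some(CONSENSUS_MAX_ALLOCATION),
        Execution::Offchain => OFFCHAIN_MAX_ALLOCATION,
    }
}

/// Whether a single request of `size` bytes would be allowed.
fn allows(exec: Execution, size: u32) -> bool {
    allocation_cap(exec).map_or(true, |cap| size <= cap)
}

fn main() {
    let size = 54_354_570; // the request from the failing chain-spec build
    println!("consensus allows it: {}", allows(Execution::Consensus, size));
    println!("offchain allows it: {}", allows(Execution::Offchain, size));
}
```

The design choice here is that the limit becomes a property of the invocation rather than a global constant, so consensus execution is unaffected by whatever the offchain path permits.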

bkchr · 2024-09-12T20:17:06Z

> On the client side, we attached an enum Execution { Consensus, Offchain } to every wasm invocation. If Execution::Offchain, then these memory limits were a lot higher, or nonexistent. Would something like that make sense?

We have this, but even in the old days you did not modify these constants.

gpestana · 2024-09-18T22:53:32Z

As a way to be more lenient with respect to memory constraints in the offchain execution mode, I'm parameterising the const MAX_ALLOCATING_MEMORY. This would unlock a couple of things: (1) chain-spec generation won't fail for large states (e.g. initialising the chain with a staking state similar to the current one in Polkadot), and (2) offchain staking miners and other offchain workloads that hit the constant memory limits currently imposed by the client get more headroom.

My current direction is to expose those params in the CLI and overwrite the constants if the call context is of type CallContext::Offchain. The new parameters could be piggybacked into the struct HeapAllocStrategy, which is already taken in by the WasmExecutor::with_instance.
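A minimal sketch of that direction, assuming a hypothetical `OffchainOverrides` struct filled from CLI flags. The local `CallContext` enum only mirrors the two variants of the `CallContext` type mentioned above; it is not the real type, and the real `HeapAllocStrategy` is not used here:

```rust
/// Local stand-in mirroring the `Onchain`/`Offchain` call contexts.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CallContext {
    Onchain,
    Offchain,
}

/// Hypothetical CLI-provided overrides; `None` keeps the hard-coded default.
#[derive(Clone, Copy, Debug, Default)]
struct OffchainOverrides {
    max_allocation: Option<u32>,
}

/// Today's hard-coded per-allocation cap (32 MiB).
const DEFAULT_MAX_ALLOCATION: u32 = 33_554_432;

/// Effective per-allocation cap: overrides are honoured only offchain.
fn effective_max_allocation(ctx: CallContext, overrides: OffchainOverrides) -> u32 {
    match ctx {
        // Onchain execution always keeps the default, whatever the CLI says.
        CallContext::Onchain => DEFAULT_MAX_ALLOCATION,
        CallContext::Offchain => overrides.max_allocation.unwrap_or(DEFAULT_MAX_ALLOCATION),
    }
}

fn main() {
    // e.g. an operator passing a hypothetical 64 MiB offchain limit
    let overrides = OffchainOverrides { max_allocation: Some(64 * 1024 * 1024) };
    println!("onchain:  {}", effective_max_allocation(CallContext::Onchain, overrides));
    println!("offchain: {}", effective_max_allocation(CallContext::Offchain, overrides));
}
```

Whether changing this value alone is safe depends on the allocator's internals, which is exactly the concern raised in the replies.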

wdyt about this approach?

gpestana · 2024-10-16T13:08:23Z

PR to address this: #6081

bkchr · 2024-10-16T20:07:38Z

> It is kind of silly to have the OCW/Genesis code path, which are by no means "consensus-critical" be subject to the memory limits that are afaik only arguable in consensus code paths. I hope we can hack around it.

Just because you don't know how the allocator works, doesn't mean that the current way is "silly". As I said, you can not just change these constants without knowing what they are doing or how the allocator works.

When you change the allocator as done in #6081, it means that the moment there is a two gigabyte allocation, this memory is never "freed" again; you will only be able to reuse this space. Maybe you are lucky and what you are doing works, but I clearly don't want to depend on luck for this kind of change. For example, a Vec probably grows in power-of-two steps, so reaching a 2 gigabyte vector means a series of ever-larger allocations. This means that the total number of allocations the heap can still serve is reduced quite a lot.
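A simplified model of why this matters, assuming an allocator that rounds each request up to a power-of-two size class, takes fresh space from a bump pointer, and puts freed blocks on a per-class free list that only serves requests of that same class. This is roughly how the freeing-bump design is usually described, but the code below is an illustration, not the real allocator: it ignores block headers, the real size-class bounds, and wasm address-space limits. `FreeingBumpModel` and `vec_growth_high_water` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified freeing-bump model: space taken from `bump` is never returned;
/// freed blocks are only reused for requests of exactly the same size class.
struct FreeingBumpModel {
    bump: u64,                          // high-water mark of the heap
    free_lists: HashMap<u64, Vec<u64>>, // size class -> offsets of freed blocks
}

impl FreeingBumpModel {
    fn new() -> Self {
        Self { bump: 0, free_lists: HashMap::new() }
    }

    /// Returns (offset, size class) of the block handed out.
    fn alloc(&mut self, size: u64) -> (u64, u64) {
        let class = size.next_power_of_two();
        if let Some(off) = self.free_lists.get_mut(&class).and_then(|l| l.pop()) {
            return (off, class); // reuse a freed block of exactly this class
        }
        let off = self.bump;
        self.bump += class; // otherwise take fresh space from the bump pointer
        (off, class)
    }

    fn free(&mut self, block: (u64, u64)) {
        let (off, class) = block;
        self.free_lists.entry(class).or_default().push(off);
    }
}

/// Grow a buffer by doubling until it holds `target` bytes, freeing the old
/// buffer after each reallocation, and return the heap high-water mark.
fn vec_growth_high_water(target: u64) -> u64 {
    let mut heap = FreeingBumpModel::new();
    let mut cap = 8;
    let mut block = heap.alloc(cap);
    while cap < target {
        cap *= 2;
        let new_block = heap.alloc(cap); // new buffer while the old one is live
        heap.free(block); // old buffer goes to its class's free list
        block = new_block;
    }
    heap.bump
}

fn main() {
    let target = 1u64 << 31; // a 2 GiB buffer
    let high_water = vec_growth_high_water(target);
    println!("live buffer: {} bytes, heap consumed: {} bytes", target, high_water);
}
```

In this model, growing one buffer to 2 GiB consumes 8 + 16 + ... + 2^31 = 2^32 - 8 bytes of heap, and none of the freed smaller blocks can be merged back into space for larger requests.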

> What you want is this: polkadot-fellows/RFCs#4 and then to move the allocator inside of the runtime.

What I said here is the solution. It would be a better use of time to try to push this forward, instead of trying to come up with hacks that we can not accept.

kianenigma · 2024-10-18T14:39:01Z

> When you change the allocator as done in #6081, it means the moment there is a two gigabyte allocation, this memory is never "freed" again.

If that WASM instance is only used to generate the genesis config, or a one-off offchain worker, and then it is thrown away, is this a real issue?

You are right to protest that I don't know exactly how this wasm executor/allocator stuff works under the hood, and perhaps my use of the word "silly" was not good, so let me rephrase: the current situation seems "unnecessarily restrictive".

Offchain WASM instances should have higher memory limits, and ideally kept separate from all the ones that are used for onchain execution. The offchain ones might be prone to this flaw that you are talking about, but, so be it, they are not particularly important anyways 🤷

bkchr · 2024-10-18T22:38:48Z

> If that WASM instance is only used to generate the genesis config, or a one-off offchain worker, and then it is thrown away, is this a real issue?

For sure that will not lead to any problem with the on chain execution. However, depending on the access in the offchain worker, the offchain worker could die because it runs out of memory. And it will run out of memory faster with the proposed solution.

Could someone maybe explain to me what we mean by "allowing more validators"? That more validators can apply to get nominated? Also, don't we have the solution miner that runs totally independently of the on-chain runtime?

michalkucharczyk · 2024-10-20T09:58:01Z

Dumb thought: we could increase the limit only in a particular binary (chain-spec-builder), while keeping the old limit in the node.

bkchr · 2024-10-20T19:19:32Z

Yeah, that would be an acceptable solution for now.

gpestana added the T0-node label ("This PR/Issue is related to the topic “node”.") Aug 20, 2024

gpestana closed this as completed Aug 27, 2024

kianenigma reopened this Sep 3, 2024

michalkucharczyk mentioned this issue Sep 17, 2024: Can't run the node after updating to polkadot-v1.8.0 #5744 (Closed)

skunert mentioned this issue Oct 1, 2024: Add overhead benchmark to frame-omni-bencher #5891 (Merged)

michalkucharczyk mentioned this issue Oct 15, 2024: [DNM] Add offchain executor params #5278 (Closed)

gpestana linked a pull request Oct 15, 2024 that will close this issue: Changes client memory allocation limits when executing in the offchain context #6081 (Open)
