Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic INITIAL_MEMORY for statically large modules #20888

Open
SingleAccretion opened this issue Dec 10, 2023 · 14 comments
Open

Automatic INITIAL_MEMORY for statically large modules #20888

SingleAccretion opened this issue Dec 10, 2023 · 14 comments

Comments

@SingleAccretion
Copy link
Contributor

Right now the default value of INITIAL_MEMORY is 16MB, which sets a hard limit on the amount of static data in a module - the linker will error out if the amount is insufficient. To work around this, you need to be able to calculate the number beforehand, e. g. the official .NET SDK guesstimates by doing something to the effect of going over all of the inputs and summing their size.

This all seems rather self-inflicted. The linker already knows how much memory will it need, and indeed, if you don't pass --initial-memory, it will already "do the right thing".

The question is, can this automatic behavior be supported by Emscripten natively?

Note: this is all assuming memory growth is allowed, of course.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 10, 2023

The problem I suppose is the INITIAL_MEMORY really means + stack + static data + heap, but what you really want is a way to specify just the heap size.

I guess for most folks this isn't a huge issue since most programs don't tend to have a lot of static data. 16mb of static data is fairly unusual I think.

Would -sINITIAL_HEAP help here? We could have it default to 16mb too, but that would be 16mb on top of the static data. Right now we don't have a wasm-ld flag for this so we would need to add a new one.

@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Dec 10, 2023

My initial thought was a way to simply not pass anything (effectively, initial heap size of zero). Allowing one to configure the initial heap size on top of that would be an optimization - how significant is it?

I guess for most folks this isn't a huge issue since most programs don't tend to have a lot of static data. 16mb of static data is fairly unusual I think.

That's true, the motivation here is removing the (unnecessary) complexity from the toolchain.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 12, 2023

My initial thought was a way to simply not pass anything (effectively, initial heap size of zero). Allowing one to configure the initial heap size on top of that would be an optimization - how significant is it?

Sadly this won't work without some other changes since we default to a non-growable memory. Most programs would not work with a heap size of zero.

I guess for most folks this isn't a huge issue since most programs don't tend to have a lot of static data. 16mb of static data is fairly unusual I think.

That's true, the motivation here is removing the (unnecessary) complexity from the toolchain.

Are you referring to existing complexity in emscripten? Or in the .NET toolchain?

@sbc100
Copy link
Collaborator

sbc100 commented Dec 12, 2023

Ah, I assume you are referring to the referring guesstimate code that you linked to above. I agree that should really not be necessary.

@SingleAccretion
Copy link
Contributor Author

Sadly this won't work without some other changes since we default to a non-growable memory. Most programs would not work with a heap size of zero.

Would making this only supported with -sALLOW_MEMORY_GROWTH=1 work (it was my intention)? I can implement the larger feature too, it just seemed to me that -sALLOW_MEMORY_GROWTH=0 is not really a used-in-production sort of setting (is that wrong?).

@sbc100
Copy link
Collaborator

sbc100 commented Dec 13, 2023

Sadly this won't work without some other changes since we default to a non-growable memory. Most programs would not work with a heap size of zero.

Would making this only supported with -sALLOW_MEMORY_GROWTH=1 work (it was my intention)? I can implement the larger feature too, it just seemed to me that -sALLOW_MEMORY_GROWTH=0 is not really a used-in-production sort of setting (is that wrong?).

Sure, I think making this work for -sALLOW_MEMORY_GROWTH=1 is conceivable.

But the default setting is -sALLOW_MEMORY_GROWTH=0 and I assume that what most folks use. Allowing memory growth does have some downsides and I imagine changing the default to -sALLOW_MEMORY_GROWTH=1 could have some negative repercussions.

But defaulting to simply not passing --initial-memory in the case when the memory is growable seems like it could.. assuming the default for max memory is high enough (we still need to pass that default).

@SingleAccretion
Copy link
Contributor Author

I will work on the full INITIAL_HEAP feature then. What is the usual delay between adding an LLVM feature and being able to consume it in Emscripten (weeks/months)?

@sbc100
Copy link
Collaborator

sbc100 commented Dec 13, 2023

Emscripten uses tip of tree llvm. We integrate changes right away.

sbc100 pushed a commit to llvm/llvm-project that referenced this issue Dec 15, 2023
It is beneficial to preallocate a certain number of pages in the linear
memory (i. e. use the "minimum" field of WASM memories) so that fewer
"memory.grow"s are needed at startup.

So far, the way to do that has been to pass the "--initial-memory"
option to the linker. It works, but has the very significant downside of
requiring the user to know the size of static data beforehand, as it
must not exceed the number of bytes passed-in as "--initial-memory".

The new "--initial-heap" option avoids this downside by simply appending
the specified number of pages to static data (and stack), regardless of
how large they already are.

Ref: emscripten-core/emscripten#20888.
@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Dec 16, 2023

@sbc100 I've drawn up the design sketch for the Emscripten side of this work:

First-class Emscripten setting ("-sINITIAL_HEAP").

Defaults:
  Old: INITIAL_MEMORY=16MB
  New: INITIAL_MEMORY=<unspecified>
       INITIAL_HEAP=16MB; backwards compatibility: cannot be less than previously set

Interactions:
 -- With MAXIMUM_MEMORY
    -- Default: cannot be larger than (MAXIMUM_MEMORY - (stack size + size of static data)).
       -- We don't the size of static data, hence to preserve compatibility, we cannot use
          INITIAL_HEAP when MAXIMUM_MEMORY is specified.
    -- Explicit: defer validation to wasm-ld.
 -- With INITIAL_MEMORY
    -- Default: should not be passed.
    -- Explicit: defer validation to wasm-ld.
       -- Alternative: make it an error?

Does it look good to you?

As INITIAL_MEMORY is already defaulted right now, it seems hard to not expose this as a first-class setting - that would require digging through user-supplied linker args (-Wl) for --initial-heap presence, and omitting --initial-memory if it is present.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 18, 2023

This sounds good to me.

I think we could probably also deprecate INITIAL_MEMORY at some point since it seems strictly better to use STACK_SIZE and INITIAL_HEAP.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 19, 2023

The llvm change should now be available in emscripten, of you would like a set a PR here.

@SingleAccretion
Copy link
Contributor Author

Yep, working on it. It turns out there are a few places that rely on INITIAL_MEMORY (in link.py), and a certain amount of care is needed to have things not break.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 19, 2023

Yep, working on it. It turns out there are a few places that rely on INITIAL_MEMORY (in link.py), and a certain amount of care is needed to have things not break.

I wonder if we could just make INITIAL_MEMORY into an alias for INITIAL_HEAP to avoid some code complexity. It would mean that users of INITIAL_MEMORY would actually get a little larger of a memory than they asked for.. but maybe that would be OK?

Alternatively we just deprecate INITIAL_MEMORY completely after a certain amount of time, but keep any associated code complexity until that point?

@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Dec 19, 2023

I wonder if we could just make INITIAL_MEMORY into an alias for INITIAL_HEAP to avoid some code complexity. It would mean that users of INITIAL_MEMORY would actually get a little larger of a memory than they asked for.. but maybe that would be OK?

There is a significant difference between the two settings: INITIAL_MEMORY is a hard/known ceiling on the initial memory. With it you know the exact amount of initial memory. With INITIAL_HEAP, you don't know that number until after link, which can be too late.

Here's the current version of the changes, I think it illustrates the differences: https://github.com/emscripten-core/emscripten/compare/main...SingleAccretion:emscripten:Initial-Heap?expand=1. I still need to fix up the case where INITIAL_MEMORY is used to create the memory at runtime.

Alternatively we just deprecate INITIAL_MEMORY completely after a certain amount of time, but keep any associated code complexity until that point?

It is certainly possible to make breaking changes. INITIAL_MEMORY seems fairly fundamental / widely used, so it would need a good deprecation period. If/when the break would justify itself, I suppose I do not know (due to lack of data).

qihangkong pushed a commit to rvgpu/rvgpu-llvm that referenced this issue Apr 23, 2024
It is beneficial to preallocate a certain number of pages in the linear
memory (i. e. use the "minimum" field of WASM memories) so that fewer
"memory.grow"s are needed at startup.

So far, the way to do that has been to pass the "--initial-memory"
option to the linker. It works, but has the very significant downside of
requiring the user to know the size of static data beforehand, as it
must not exceed the number of bytes passed-in as "--initial-memory".

The new "--initial-heap" option avoids this downside by simply appending
the specified number of pages to static data (and stack), regardless of
how large they already are.

Ref: emscripten-core/emscripten#20888.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants