Add emscripten_memset_js and use it from memset #21683

kg · 2024-04-02T21:17:35Z

Addresses #21620

Sorry if I got something wrong here, I'm kind of flying blind based on the documentation... I was only able to successfully run the "other" tests and some of them fail out of the box on an unmodified checkout for me.

I'm guessing I need to update all of the text files under test/ that mention _emscripten_memcpy_js to also mention _emscripten_memset_js? Or are they automatically generated? When I did a local run it only seemed like my changes added one additional test failure, so I'm guessing I ran the wrong tests.

sbc100

lgtm!

If your change affects the codesize tests you will want to run test/runner other.*code_size* other.*metadce* --rebase.

Bear in mind these tests are fairly sensitive, so you will want to make sure you are using llvm and binaryen from the tip-of-tree version of emsdk. To do this, install emsdk with emsdk install tot && emsdk activate tot and then use export EM_CONFIG=/path/to/emsdk/.emscripten. The result of this is that when you run ./emcc or ./test/runner from your emscripten checkout (wherever it is) it will inherit everything else from emsdk (i.e. llvm and binaryen).

sbc100 · 2024-04-03T00:08:50Z

src/library.js

@@ -1237,7 +1238,7 @@ addToLibrary({
      {{{ makeSetValue('tm', C_STRUCTS.tm.tm_yday, 'arraySum(isLeapYear(fullDate.getFullYear()) ? MONTH_DAYS_LEAP : MONTH_DAYS_REGULAR, fullDate.getMonth()-1)+fullDate.getDate()-1', 'i32') }}};
      {{{ makeSetValue('tm', C_STRUCTS.tm.tm_isdst, '0', 'i32') }}};
      {{{ makeSetValue('tm', C_STRUCTS.tm.tm_gmtoff, 'date.gmtoff', LONG_TYPE) }}};
-


Maybe revert these whitespace changes.

kripken · 2024-04-03T03:31:37Z

system/lib/libc/emscripten_memset.c

+    _emscripten_memset_js(str, c, n);
+    return str;
+  }
+#endif


It looks like emscripten_memcpy.c only uses the _js version once in that file, in the last case. That is, it doesn't use it in the first case, here, for -Oz/ASan. Is there a reason to do things differently for memset?

We want good performance in -Oz, right? I can understand not wanting it to apply for asan, though, since the JS fill will bypass asan checks.

Yeah, for ASan I think we want to be able to instrument it as you said, and performance is less critical there.

For -Oz we've focused on reducing size at all costs, which includes avoiding loop unrolling here. I think avoiding adding a fast path through JS is consistent with that?
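To make the size-versus-speed trade-off in this thread concrete, here is a minimal sketch contrasting a one-byte-per-iteration loop with a wider-store fill. The names fill_small and fill_wide are hypothetical and this is not the code in emscripten_memset.c; it only illustrates why the faster path needs more instructions (alignment head, wide body, tail).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size-first variant (hypothetical name): one byte per iteration. */
void *fill_small(void *dst, int c, size_t n) {
  unsigned char *d = (unsigned char *)dst;
  while (n--) *d++ = (unsigned char)c;
  return dst;
}

/* Speed-first variant (hypothetical name): 4 bytes per store once aligned. */
void *fill_wide(void *dst, int c, size_t n) {
  unsigned char *d = (unsigned char *)dst;
  unsigned char b = (unsigned char)c;

  /* Head: byte stores until d is 4-byte aligned (or n runs out). */
  while (n && ((uintptr_t)d & 3)) {
    *d++ = b;
    n--;
  }

  /* Body: replicate the byte into a 32-bit pattern and store it whole. */
  uint32_t pattern = (uint32_t)b * 0x01010101u;
  while (n >= 4) {
    memcpy(d, &pattern, 4); /* fixed-size copy, avoids aliasing problems */
    d += 4;
    n -= 4;
  }

  /* Tail: the remaining 0..3 bytes. */
  while (n--) *d++ = b;
  return dst;
}
```

Both functions produce the same bytes; the second simply trades code size for fewer stores on large fills.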

juj · 2024-04-03T07:35:09Z

I see that we currently have the following:

I wonder why bulk memory operations are not used in -Oz? I would have thought that they should be, the Wasm opcodes for memset and memcpy should provide the smallest code?

Reading PR #19128 I was not able to find why EMSCRIPTEN_OPTIMIZE_FOR_OZ should take precedence over __wasm_bulk_memory__?

Unless there is some counterintuitive reason (which would be a bit sad if that is the case :( ), I think the code for memset would rather want to look like:

#ifdef __wasm_bulk_memory__

void *__memset(void *str, int c, size_t n) {
  return _emscripten_memset_bulkmem(str, c, n);
}

#elif defined(EMSCRIPTEN_OPTIMIZE_FOR_OZ)

void *__memset(void *str, int c, size_t n) {
  unsigned char *s = (unsigned char *)str;
#pragma clang loop unroll(disable)
  while(n--) *s++ = c;
  return str;
}

#else
...

That should make the memset run fast also in interpreted wasm mode I'd believe? (in Wasm VMs, memsets and memcpys should call fast SIMD operations even when interpreting)

Looks like the same situation applies to memcpy as well: EMSCRIPTEN_OPTIMIZE_FOR_OZ is set to take precedence over __wasm_bulk_memory__, but the other way around would be smaller code size, and more performant as well?

kripken · 2024-04-03T16:30:30Z

@juj I think you're right, we should use bulk memory in -Oz if available since it is compact. In fact that might make the other question I had above a moot point.

kg · 2024-04-03T23:54:19Z

It sounds like the correct construction for both memset and memcpy, then, is:

if (asan)
  simple
elif (bulk-memory)
  bulk-memory
elif (-Oz)
  smallest
else
  fastest non-bulk-memory

Is that right?

For 'smallest' there is the difficulty in weighing the extra bytes for the JS helper against the performance loss from a bare memcpy/memset loop - it seems like it could be significant. And then I guess there's the standalone flag that enters the equation, in standalone mode we can't rely on JS.
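The ordering proposed above could be expressed as a preprocessor chain roughly like the following. This is a sketch under assumptions, not the actual emscripten source: sketch_memset, sketch_byte_loop, sketch_memset_variant and SKETCH_ASAN are hypothetical names, __builtin_memset stands in for the _emscripten_memset_bulkmem call from the earlier comment, and the final branch uses the plain loop as a placeholder for the faster non-bulk-memory implementation.

```c
#include <stddef.h>

/* Detect AddressSanitizer on both gcc (__SANITIZE_ADDRESS__) and clang. */
#if defined(__SANITIZE_ADDRESS__)
#define SKETCH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define SKETCH_ASAN 1
#endif
#endif

/* Plain byte loop: easy to instrument and very small. */
void *sketch_byte_loop(void *str, int c, size_t n) {
  unsigned char *s = (unsigned char *)str;
  while (n--) *s++ = (unsigned char)c;
  return str;
}

#if defined(SKETCH_ASAN)
/* 1. ASan: keep the simple loop so the stores stay instrumented. */
#define SKETCH_VARIANT "simple"
void *sketch_memset(void *str, int c, size_t n) {
  return sketch_byte_loop(str, c, n);
}
#elif defined(__wasm_bulk_memory__)
/* 2. Bulk memory: compact and fast, so it wins even under -Oz. */
#define SKETCH_VARIANT "bulk-memory"
void *sketch_memset(void *str, int c, size_t n) {
  return __builtin_memset(str, c, n); /* stand-in for the bulkmem helper */
}
#elif defined(EMSCRIPTEN_OPTIMIZE_FOR_OZ)
/* 3. -Oz without bulk memory: smallest code, no extra JS import. */
#define SKETCH_VARIANT "smallest"
void *sketch_memset(void *str, int c, size_t n) {
  return sketch_byte_loop(str, c, n);
}
#else
/* 4. Otherwise: the fastest non-bulk-memory path would go here. */
#define SKETCH_VARIANT "fastest non-bulk-memory"
void *sketch_memset(void *str, int c, size_t n) {
  return sketch_byte_loop(str, c, n); /* placeholder only */
}
#endif

/* Reports which branch of the chain was compiled in. */
const char *sketch_memset_variant(void) { return SKETCH_VARIANT; }
```

Which branch is selected depends entirely on the macros the compiler defines for the build, so a standalone-mode check, if wanted, would be one more condition in the same chain.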

sbc100 · 2024-04-04T00:14:07Z

Indeed. In standalone mode we try our best to avoid JS imports.

kripken · 2024-04-04T00:17:38Z

Yes, exactly, I believe that's the optimal order. I think we should do that in this PR and then as a separate followup we can fix memcpy.

kg · 2024-04-04T00:19:48Z

Can we fit the JS helper into -Oz? I think that's what we use on our end. I imagine it adds a couple hundred bytes pre-minification, and I don't know how significant that is for size-constrained emscripten users.

sbc100 · 2024-04-04T00:26:28Z

> Can we fit the JS helper into -Oz? I think that's what we use on our end. I imagine it adds a couple hundred bytes pre-minification, and I don't know how significant that is for size-constrained emscripten users.

I think it's best to stick with the simple/small solution in -Oz, without that extra JS import (the import itself takes a few bytes).

Going forward I imagine most folks will be using bulk memory so it should be a moot point anyway soon.

sbc100 · 2024-04-04T00:28:55Z

In fact, soon we can completely remove the explicit call to JS anyway, since the plan for enabling bulk memory is to build with bulk-memory enabled. For users targeting legacy browsers we will then have binaryen lower the bulk memory instructions into something else (maybe a call to an import, maybe an inline implementation).

kg · 2024-04-30T19:44:06Z

It feels like this particular change is probably not worth it for most users, due to the conflict between code size demands and performance. (Plus I never figured out how to make the tests pass.) So I'm going to close it. Feel free to ping me if you'd like to see me bring it back.

kg added 2 commits April 2, 2024 14:12

Add emscripten_memset_js and use it from memset

5d15dd6

Fix build

efdfe40

sbc100 reviewed Apr 3, 2024

kripken reviewed Apr 3, 2024

kg closed this Apr 30, 2024
