Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add emscripten_memset_js and use it from memset #21683

Closed
wants to merge 2 commits into from

Conversation

kg
Copy link

@kg kg commented Apr 2, 2024

Addresses #21620

Sorry if I got something wrong here, I'm kind of flying blind based on the documentation... I was only able to successfully run the "other" tests and some of them fail out of the box on an unmodified checkout for me.

I'm guessing I need to update all of the text files under test/ that mention _emscripten_memcpy_js to also mention _emscripten_memset_js? Or are they automatically generated? When I did a local run it only seemed like my changes added one additional test failure, so I'm guessing I ran the wrong tests.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

If your change effects the codesize tests you will want to run test/runner other.*code_size* other.*metadce* --rebase.

Bare in mind these tests are fairly sensitive, so you will want to make sure you hare using llvm and binary from the tip-of-tree version of emsdk. To do this you install emsdk with emsdk install tot && emsdk activate tot and then use export EM_CONFIG=/path/to/emsdk/.emscripten. The result if this is that when you run ./emcc or ./test/runner from your emscripten checkout (whereever it is) it will inherit everything else from emsdk (i.e. llvm and binaryen)

@@ -1237,7 +1238,7 @@ addToLibrary({
{{{ makeSetValue('tm', C_STRUCTS.tm.tm_yday, 'arraySum(isLeapYear(fullDate.getFullYear()) ? MONTH_DAYS_LEAP : MONTH_DAYS_REGULAR, fullDate.getMonth()-1)+fullDate.getDate()-1', 'i32') }}};
{{{ makeSetValue('tm', C_STRUCTS.tm.tm_isdst, '0', 'i32') }}};
{{{ makeSetValue('tm', C_STRUCTS.tm.tm_gmtoff, 'date.gmtoff', LONG_TYPE) }}};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe revert these whitespace changes.

_emscripten_memset_js(str, c, n);
return str;
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like emscripten_memcpy.c only uses the _js version once in that file, in the last case. That is, it doesn't use it in the first case, here, for -Oz/ASan. Is there a reason to do things differently for memset?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want good performance in -Oz, right? I can understand not wanting it to apply for asan, though, since the JS fill will bypass asan checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for ASan I think we want to be able to instrument it as you said, and performance is less critical there.

For -Oz we've focused on reducing size at all costs, which includes avoiding loop unrolling here. I think avoiding adding a fast path through JS is consistent with that?

@juj
Copy link
Collaborator

juj commented Apr 3, 2024

I see that we currently have the following:

image

I wonder why bulk memory operations are not used in -Oz? I would have though that they should be, the Wasm opcodes for memset and memcpy should provide the smallest code?

Reading PR #19128 I was not able to find why EMSCRIPTEN_OPTIMIZE_FOR_OZ should take precedence over __wasm_bulk_memory__?

Unless there is some counterintuitive reason (which would be a bit sad if that is the case :( ), I think the code for memset would rather want to look like:

#ifdef __wasm_bulk_memory__

void *__memset(void *str, int c, size_t n) {
  return _emscripten_memset_bulkmem(str, c, n);
}

#elif defined(EMSCRIPTEN_OPTIMIZE_FOR_OZ)

void *__memset(void *str, int c, size_t n) {
  unsigned char *s = (unsigned char *)str;
#pragma clang loop unroll(disable)
  while(n--) *s++ = c;
  return str;
}

#else
...

That should make the memset run fast also in interpreted wasm mode I'd believe? (in Wasm VMs, memsets and memcpys should call fast SIMD operations even when interpreting)

Looks like the same situation applies to memcpy as well: EMSCRIPTEN_OPTIMIZE_FOR_OZ is set to take precedence over __wasm_bulk_memory__, but the other way around would be smaller code size, and more performant as well?

@kripken
Copy link
Member

kripken commented Apr 3, 2024

@juj I think you're right, we should use bulk memory in -Oz if available since it is compact. In fact that might make the other question I had above a moot point.

@kg
Copy link
Author

kg commented Apr 3, 2024

It sounds like the correct construction for both memset and memcpy, then, is:

if (asan)
  simple
elif (bulk-memory)
  bulk-memory
elif (-Oz)
  smallest
else
  fastest non-bulk-memory

Is that right?

For 'smallest' there is the difficulty in weighing the extra bytes for the JS helper against the performance loss from a bare memcpy/memset loop - it seems like it could be significant. And then I guess there's the standalone flag that enters the equation, in standalone mode we can't rely on JS.

@sbc100
Copy link
Collaborator

sbc100 commented Apr 4, 2024

Indeed. In standalone mode we try our best to avoid JS imports.

@kripken
Copy link
Member

kripken commented Apr 4, 2024

Yes, exactly, I believe that's the optimal order. I think we should do that in this PR and then as a separate followup we can fix memcpy.

@kg
Copy link
Author

kg commented Apr 4, 2024

Can we fit the JS helper into -Oz? I think that's what we use on our end. I imagine it adds a couple hundred bytes pre-minification, and I don't know how significant that is for size-constrained emscripten users.

@sbc100
Copy link
Collaborator

sbc100 commented Apr 4, 2024

Can we fit the JS helper into -Oz? I think that's what we use on our end. I imagine it adds a couple hundred bytes pre-minification, and I don't know how significant that is for size-constrained emscripten users.

I think its best to stick the simple/small solution in -Oz, without that extra JS import (the import itself takes a few bytes).

Going forward I imagine most folks will be using bulk memory so it should be a mute point anyway soon.

@sbc100
Copy link
Collaborator

sbc100 commented Apr 4, 2024

In fact.. soon we can completely remove the explicit call to JS any since the plan for enabling bulk memory is to build with bulk-memory enabled. For users targeting legacy browsers we will then have binaryen lower the bulk memory instructions into something else (maybe a call to an import, maybe an inline implementation).

@kg
Copy link
Author

kg commented Apr 30, 2024

It feels like this particular change is probably not worth it for most users, due to the conflict between code size demands and performance. (Plus I never figured out how to make the tests pass.) So I'm going to close it. Feel free to ping me if you'd like to see me bring it back.

@kg kg closed this Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants