Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Imply -msimd128 when -msse* or other similar flags are specified #12714

Open
RReverser opened this issue Nov 5, 2020 · 19 comments
Open

Imply -msimd128 when -msse* or other similar flags are specified #12714

RReverser opened this issue Nov 5, 2020 · 19 comments

Comments

@RReverser
Copy link
Collaborator

Right now, Emscripten errors out if one of the SIMD flags, e.g. -msse2, is used without also specifying -msimd128.

While this works fine for Wasm-specific projects, it adds friction to building existing projects relying on SSE intrinsics. For example, CMake checks that try to use those flags, error out by default, and the only ways to fix it are either 1) modify CFLAGS / CXXFLAGS / etc. on the caller side to include -msimd128, which will now affect all targets in CMake and not only those that used -msse2 or 2) selectively modify CMakeFiles to include this extra flag only in places where -msse2 is used too and only on Emscripten target.

Both those options are a bit cumbersome and I'd argue the underlying friction is unnecessary. Just like -pthread implies enabling atomics, bulk-memory etc., -msse2 and friends should imply enabling -msimd128 feature, as on their own they already are a clear signal that user intends to use SIMD instructions.

@RReverser RReverser added the SIMD label Nov 5, 2020
@tlively
Copy link
Member

tlively commented Nov 7, 2020

I'm slightly worried about folks getting SIMD when they don't mean to and being confused when their program fails to validate. This would happen if we make this change they naively run emcmake or emconfigure on a project that uses SSE in its native builds. If we make this change, I expect to get issues about that. On the other hand, the extra build system friction when users do intend to use SIMD is not great either. I wonder if we should just tell users to use EMCC_CFLAGS to sneak -msimd128 into the flags when they use CMake and want SSE to be detected correctly? That seems super hacky, though. What do you think, @sbc100?

@sbc100
Copy link
Collaborator

sbc100 commented Nov 7, 2020

Are you talking about projects that want to use our SSE2 emulation? So they are not actually using wasm simd directly? i.e. would they have otherwise unmodified projects?

Re (1): if your final binary is going to require wasm simd is there any downside to passing -msimd128 to only some of your targets.. its seems like it should be more of project-side settings (like the architecture you are targetting).

Re (2): this is seems like a low bar if they are also modifying sources, but if they are relying on the emulation of SSE2 I agree we should have a way to make it "just work".

Do we only support SSE2 emulation with wasm-simd enabled? (i.e. we don't have scaler fallback for SSE2 users)? If so, then this seems different to pthreads where we do support running pthread-enabled code in single threaded binaries, reducing all atomics to no-ops. So with -pthread its reasonable for libraries to build with -pthread enabled and then be linked into single threaded programs. For SIMD is sounds like that is not true, so explicitly opting in with the platform-specific flag seems more appropriate here.

Use EMCC_CFLAGS would make me sad.. but maybe I should just get over it.

@RReverser
Copy link
Collaborator Author

I don't think EMCC_CFLAGS is really a solution, because, like CFLAGS override mentioned in (1) in the description, it will cause - msimd128 to be passed even to targets that don't want to use SIMD, and it has side effect of e.g. autovectorising code in binaries that were explicitly built without any of - msse* or similar SIMD options. This can be surprising and problematic IMO.

@RReverser
Copy link
Collaborator Author

@tlively Yeah, I hear and am slightly worried about that concern too... I wonder if another compromise could be to have Emscripten-specific -s ENABLE_SIMD_INTRINSIC_EMULATION (or something similar with a better name) that would control enabling SIMD only for targets that explicitly pass one of the other flags. That is, when it's enabled, target using -msse2 would be treated as if it was built with -msimd128 - msse2, whereas target without one of the SSE/AVX flags would be treated as normal, without -msimd128.

If that option is disabled by default, it will be backward-compatible too, but when enabled, it would make portability simpler without user having to change all build scripts with Emscripten-specific logic and flags.

@tlively
Copy link
Member

tlively commented Nov 7, 2020

Are there really use cases where you would want to pass -msimd128 for some objects and not others in the same project? I don't think I buy that. The decision to use -msimd128 should be based on the capabilities of your target engine, not on what you want the optimizer to do. If we want users to pass an extra flag to make their projects build correctly, it might as well be -msimd128 rather than a new flag.

@RReverser
Copy link
Collaborator Author

Are there really use cases where you would want to pass -msimd128 for some objects and not others in the same project?

Yes, if a project in question (like one that motivated this issue) already compiles several implementations of the algorithm - a scalar, SSE version, AVX version, etc. - and then uses feature detection to choose one.

Obviously, the feature detection still needs to be modified for the Web, as there's no cpuid or similar equivalent for dynamically switching between implementations (yet), but there's no reason to add friction to the build system as well. We can allow scalar versions to be compiled as scalar, and -msse* / -mavx* versions to be built with SIMD, just like they're on native targets.

@juj
Copy link
Collaborator

juj commented Nov 8, 2020

I'm slightly worried about folks getting SIMD when they don't mean to and being confused when their program fails to validate.

This is the age old conversation we have had Emscripten since its inception. On one side of the conversation are arguments saying "what if I am porting an existing project and did not intend to use the feature but accidentally my project build system did?". We used to have dozens of examples of this: not supporting -I/absolute/path/to/a/directory/, ignoring if linked in input files are not found, ignoring if linked in symbols are not found, making C exit(x) be a no-op, not supporting -lfoo to link, ignoring -pthread and requiring -s USE_PTHREADS=1 instead, and so on come to mind with a quick memory probe...

The core train of though is that "what if you had a project build system that was strictly speaking incorrect for the web, but you wanted/expected some kind of "soft cushion" in that build system for sloppy cross-compilation purposes"?

Historically we have dropped such soft cushions one by one to make the compiler behave more standards-wise on all aspects. You ask for X and you will get X, instead of needing a Emscripten-specific way to ask for X.

My stand on that is that on the short run it can impact portability of a project that is starting out cross-compiling their codebase (but then need to work out a lot of enable-this/disable-thats to properly target the platform), but on the long term it has always been a benefit to act uniformly and rigorously across platforms.

Part of the issue is that when people expect soft cushions, they would often expect different soft cushions, and the soft cushions that are in place for one person, may get in the way for someone else, and then they would get very angry when they observe the wrong guesses in Emscripten of what soft cushions needed to be in place for them.

Exactly that happened on this "should -msse work standalone" conversation. I originally landed the SSE implementation with a "of course it should" perspective: -msse should enable targeting Wasm SIMD and SSE, but not enable LLVM autovectorization.

Then shortly after I vividly remember getting vocal angry feedback about that decision over at Discord from @Akaricchi on how I had broken their build system, which led to them authoring PR #11312 to add back in the soft cushion, which required having both -msse and -msimd128.

Part of the argument was that Emscripten does not have "real SSE" so of course -msse should not work on its own. Likewise Emscripten does not have "real pthreads", but they have limitations that can deadlock (so should we have -pthread?), and it does not have real GLES2, but GLSL must avoid dynamic loops and client-side rendering (so should we support -lGLESv2?)

Like I mentioned back then, I don't mind which way this kind of thing flips. We pass -msse2 -msimd128 -fno-vectorize for SIMD builds, and no flags for scalar builds. I am not convinced about the arguments that led to requiring both -msse and -msimd128, but if this is the kind of soft cushion that people expect to be there, then it is fine by me.

is there any downside to passing -msimd128 to only some of your targets..

-msimd128 enables autovectorization in LLVM by default. That can be bad for generated code size and often in our view carries no noticeable effect on performance beyond toy cases (YMMV though). So it may be the case that in order to build with -msse, one will also need to add -msimd128, but then remember to also add -fno-vectorize to undo the side effects of -msimd128.

Do we only support SSE2 emulation with wasm-simd enabled? (i.e. we don't have scaler fallback for SSE2 users)?

We do not support scalar emulation of SSEx intrinsics.

Are there really use cases where you would want to pass -msimd128 for some objects and not others in the same project?

Given that -msimd128 implies autovectorization, not passing it for some object files may be a way to selectively enable/disable vectorization - users may not be familiar with -fno-vectorize.

@Akaricchi
Copy link
Contributor

Here's the thing: when you pass a flag like -msse to a normal compiler, if the program compiles at all, then you're virtually guaranteed that this program will run on basically any CPU that's in use today that would've also been able to run that program compiled without -msse. This is a pretty reasonable assumption to make in 2020. By overloading that flag to imply -msimd128, that assumption is flipped on its head. Suddenly you're now asking the compiler to produce code that is not even compatible with the latest stable version of the wasm spec. Stuff like this must at the very least come with a BIG FAT WARNING. This may be ok to do when simd128 is stable and as ubiquitous as SSE is on x86, but that time is not now.

@juj
Copy link
Collaborator

juj commented Nov 8, 2020

Having a -s ENABLE_EXPERIMENTAL_SIMD128=1 flag that is required to enable -msse (without -msimd128) does sound good to me, default to 0, and when SIMD spec is stable, flip it over to default to ENABLE_EXPERIMENTAL_SIMD128=1.

Requiring a flag -msimd128 to enable -msse because "it is experimental" is not a good design, because -msimd128 has side effects beyond enabling SSE. Hence I would recommend:

  • default -s ENABLE_EXPERIMENTAL_SIMD128 to 0,
  • if -msimd128 is passed and -s ENABLE_EXPERIMENTAL_SIMD128=1 is not passed, error out with build error,
  • if -msse is passed and -s ENABLE_EXPERIMENTAL_SIMD128=1 is not passed, error out with build error,
  • when wasm simd is ratified to be stable, flip default of -s ENABLE_EXPERIMENTAL_SIMD128 to 1, so both -msse and -msimd128 will begin to work by default without errors.

That way users will not need to pass -msimd128 to enable targeting SSE when people are comfortable with it, and users will not need to use a flag -msimd128 with side effects to enable -msse.

@tlively
Copy link
Member

tlively commented Nov 8, 2020

I agree with @juj's analysis that we've tended toward removing "soft cushions" (that I would call "magic") from Emscripten, and I agree that trying to behave more predictably like a native compiler is a good goal. However, the fundamental disagreement here seems to be whether Emscripten should behave like a native compiler that targets Wasm or a like a native compiler that targets x86 or ARM.

If we are trying to behave like a native compiler targeting x86, then of course -msse should just work, but I would actually prefer Emscripten to behave like a native compiler targeting Wasm (i.e. behave like bare clang). In that case, -msse should not be expected to work on its own any more than -msse would be expected to work when compiling to ARM. That's why I don't see any problems with requiring the user to pass -msimd128 whenever they want to enable Wasm SIMD, just as they would with bare clang.

@RReverser
Copy link
Collaborator Author

I would actually prefer Emscripten to behave like a native compiler targeting Wasm (i.e. behave like bare clang)

I feel like that's counter to common use-cases (or, dare I say, even goals) of Emscripten, where the whole point is making it easier to port code originally written for x86 and POSIX to Wasm. If the goal was to behave like bare Clang, we'd lose many more "magical" features that make such portability easier.

Even in this particular case, Clang will unlikely ever have support for SSE / AVX / etc. intrinsics for the Wasm target as they're x86-specific, but Emscripten does provide support for their emulation. Since it does, it might as well make such emulation more user-friendly and closer to x86-oriented build script expectations.

As such, I agree with @juj's suggestion in #12714 (comment), which, IIUC, is building on the idea of flag I proposed above, but also suggesting to make it experimental and remove it in some future when SIMD becomes ubiquitous on Wasm.

RReverser added a commit to RReverser/libwebp2 that referenced this issue Nov 10, 2020
 - Switch from `-msse` to `-msse4.2`. In theory, those should be identical, but Emscripten currently has a bug where only latter form is supported: emscripten-core/emscripten#12746. Since we're using only SSE 4.2, an explicit form seems to be preferable anyway.
 - Add `-msimd128` to flags to actually enable WebAssembly SIMD when performing SIMD detection. It's currently required in addition to `-msse*` / `-mfpu=neon` flags which only perform translation of corresponding intrinsics to Wasm SIMD ones. See a discussion at emscripten-core/emscripten#12714 for automating this and making easier in the future.
 - Remove compilation branch that prevented definitions of `WP2_USE_SSE` and `WP2_USE_NEON` on Emscripten even when SIMD support was detected at compile-time.
 - Add an implementation of `WP2GetCPUInfo` for Emscripten which uses static `WP2_USE_*` flags to determine if a corresponding SIMD instruction is supported. This is because Wasm doesn't have proper feature detection (yet) and requires making separate build for SIMD version anyway.
RReverser added a commit to RReverser/libwebp2 that referenced this issue Nov 10, 2020
 - Switch from `-msse` to `-msse4.2`. In theory, those should be identical, but Emscripten currently has a bug where only latter form is supported: emscripten-core/emscripten#12746. Since we're using only SSE 4.2, an explicit form seems to be preferable anyway.
 - Add `-msimd128` to flags to actually enable WebAssembly SIMD when performing SIMD detection. It's currently required in addition to `-msse*` / `-mfpu=neon` flags which only perform translation of corresponding intrinsics to Wasm SIMD ones. See a discussion at emscripten-core/emscripten#12714 for automating this and making easier in the future.
 - Remove compilation branch that prevented definitions of `WP2_USE_SSE` and `WP2_USE_NEON` on Emscripten even when SIMD support was detected at compile-time.
 - Add an implementation of `WP2GetCPUInfo` for Emscripten which uses static `WP2_USE_*` flags to determine if a corresponding SIMD instruction is supported. This is because Wasm doesn't have proper feature detection (yet) and requires making separate build for SIMD version anyway.
RReverser added a commit to RReverser/libwebp2 that referenced this issue Nov 10, 2020
 - Switch from `-msse` to `-msse4.2`. In theory, those should be identical, but Emscripten currently has a bug where only latter form is supported: emscripten-core/emscripten#12746. Since we're using only SSE 4.2, an explicit form seems to be preferable anyway.
 - Add `-msimd128` to flags to actually enable WebAssembly SIMD when performing SIMD detection. It's currently required in addition to `-msse*` / `-mfpu=neon` flags which only perform translation of corresponding intrinsics to Wasm SIMD ones. See a discussion at emscripten-core/emscripten#12714 for automating this and making easier in the future.
 - Remove compilation branch that prevented definitions of `WP2_USE_SSE` and `WP2_USE_NEON` on Emscripten even when SIMD support was detected at compile-time.
 - Add an implementation of `WP2GetCPUInfo` for Emscripten which uses static `WP2_USE_*` flags to determine if a corresponding SIMD instruction is supported. This is because Wasm doesn't have proper feature detection (yet) and requires making separate build for SIMD version anyway.
@tlively
Copy link
Member

tlively commented Nov 11, 2020

I don't see how replacing one flag (-msimd128) with another flag (-sENABLE_EXPERIMENTAL_SIMD128=1) would reduce friction, although I do see that removing the extra flag entirely would reduce friction. How about we revisit this once SIMD is standardized and ubiquitous so that we can be reasonably sure that users won't be unpleasantly surprised when building their unmodified native project gives them SIMD?

The fact that -msimd128 can have additional undesirable side effects (e.g. turning on the vautovectorizer for no benefit) is mostly a temporary bug that should not be worked around by adding new Emscripten flags, either. The proper fix for the underlying problem would be to tune the autovectorizer in the WebAssembly LLVM backend. If we want to let users work around that in the short term, it would be better to document the issue and recommend using -fno-vectorize than to add a new Emscripten flag.

@RReverser
Copy link
Collaborator Author

I don't see how replacing one flag (-msimd128) with another flag (-sENABLE_EXPERIMENTAL_SIMD128=1) would reduce friction, although I do see that removing the extra flag entirely would reduce friction.

The important difference between the two flags is that, as mentioned above, passing -msimd128 unconditionally would affect scalar objects and targets too and make them dependant on SIMD, whereas an ENABLE_EXPERIMENTAL_SIMD128 (or whatever we name it) would only affect those targets that already make use of SSE, AVX or NEON.

Making use of such conditional flag would reduce friction, because, otherwise, developer has no choice but to carefully add -msimd128 as an extra flag conditionally only on Emscripten and only to commands that already make use of -msse* in the build scripts.

@juj
Copy link
Collaborator

juj commented Nov 11, 2020

The proper fix for the underlying problem would be to tune the autovectorizer in the WebAssembly LLVM backend.

It is not possible to fix the increased code size caused by autovectorization in the backend, because the backend cannot know when the added code size is not worth the diminishing benefit on performance.

and recommend using -fno-vectorize than to add a new Emscripten flag.

That works for me too.

I am not convinced by Akaricchi's argument above either - the issue is not about wasm simd being experimental, but, again, about people expecting soft cushions when porting projects. This trend has occurred so many times already. If one asks for SSE and Emscripten provides SSE, and it was not desired, the real fix should go in the build system in question. That would be the least friction path. Whether SSE is experimental or not is not particularly relevant - if one has a project build system that happened to sneak in that -msse and they got bit by it when porting, it might seem like it somehow was.

But in any case, I don't care strongly, so willing to accommodate to the people who do, hence proposing the specific flag for that.

@Akaricchi
Copy link
Contributor

I am not convinced by Akaricchi's argument above either - the issue is not about wasm simd being experimental, but, again, about people expecting soft cushions when porting projects.

If you seriously think overloading a traditionally x86-specific flag to mean "enable experimental feature for a completely different target" is not the issue here, fine. I can then just as easily claim that this behavior is a "soft cushion" for people who expect their x86-specific code to "just werk" in a non-x86 environment (at a detriment to people who want maximum portability, i.e. WASM MVP compliance). Why the soft cushion here? Just fix your build system and add the -ffuck-portability flag, then at least you get to know that you're operating out of spec for the sake of x86 simulation.

Funnily enough, the reason why I even noticed the problem originally is because -msse was actually broken without explicit -msimd128 at the time - it would enable the SSE simulation, but would fail to compile the fake intrinsics because -msimd128 was not enabled. Had it worked as intended, I might have not noticed it for a long time, unknowingly relying on an unstable feature, until it just happened to bite me in the ass one day. This is just not acceptable. This shouldn't even be an argument. Don't enable unstable features unless I explicitly request it, and especially don't overload well-established flags to do that!

I'm not asking for a soft cushion, I'm asking to please not fill my cushion with spikes and glass shards.

@juj
Copy link
Collaborator

juj commented Nov 11, 2020

Thank you, I believe you stated that viewpoint already. Please be mindful about the tone and language you contribute with, may be good to refer to https://webassembly.org/community/code-of-conduct/ , under which Emscripten also operates.

RReverser added a commit to RReverser/libwebp that referenced this issue Nov 11, 2020
 - Add `-msimd128` to flags to actually enable WebAssembly SIMD when performing SIMD detection. It's currently required in addition to `-msse*` / `-mfpu=neon` flags which only perform translation of corresponding intrinsics to Wasm SIMD ones. See a discussion at emscripten-core/emscripten#12714 for automating this and making easier in the future.
 - Remove compilation branch that prevented definitions of `WEBP_USE_SSE` and `WEBP_USE_NEON` on Emscripten even when SIMD support was detected at compile-time.
 - Add an implementation of `VP8GetCPUInfo` for Emscripten which uses static `WEBP_USE_*` flags to determine if a corresponding SIMD instruction is supported. This is because Wasm doesn't have proper feature detection (yet) and requires making separate build for SIMD version anyway.

Note: this is essentially a port of a similar commit on WebP2 project: RReverser/libwebp2@f8911dc
RReverser added a commit to RReverser/libwebp that referenced this issue Nov 11, 2020
 - Add `-msimd128` to flags to actually enable WebAssembly SIMD when performing SIMD detection. It's currently required in addition to `-msse*` / `-mfpu=neon` flags which only perform translation of corresponding intrinsics to Wasm SIMD ones. See a discussion at emscripten-core/emscripten#12714 for automating this and making easier in the future.
 - Remove compilation branch that prevented definitions of `WEBP_USE_SSE` and `WEBP_USE_NEON` on Emscripten even when SIMD support was detected at compile-time.
 - Add an implementation of `VP8GetCPUInfo` for Emscripten which uses static `WEBP_USE_*` flags to determine if a corresponding SIMD instruction is supported. This is because Wasm doesn't have proper feature detection (yet) and requires making separate build for SIMD version anyway.

Note: this is essentially a port of a similar commit on WebP2 project: RReverser/libwebp2@f8911dc
PKRoma pushed a commit to PKRoma/libwebp that referenced this issue Nov 20, 2020
 - Add `-msimd128` to flags to actually enable WebAssembly SIMD
   when performing SIMD detection. It's currently required in
   addition to `-msse*` / `-mfpu=neon` flags which only perform
   translation of corresponding intrinsics to Wasm SIMD ones.
   See a discussion at emscripten-core/emscripten#12714 for
   automating this and making easier in the future.
 - Remove compilation branch that prevented definitions of
   `WEBP_USE_SSE` and `WEBP_USE_NEON` on Emscripten even when
   SIMD support was detected at compile-time.
 - Add an implementation of `VP8GetCPUInfo` for Emscripten which
   uses static `WEBP_USE_*` flags to determine if a corresponding
   SIMD instruction is supported. This is because Wasm doesn't
   have proper feature detection (yet) and requires making separate
   build for SIMD version anyway.

Change-Id: I77592081b91fd0e4cbc9242f5600ce905184f506
@ngzhian
Copy link
Collaborator

ngzhian commented Apr 28, 2021

Having a -s ENABLE_EXPERIMENTAL_SIMD128=1 flag that is required to enable -msse (without -msimd128) does sound good to me, default to 0, and when SIMD spec is stable, flip it over to default to ENABLE_EXPERIMENTAL_SIMD128=1.

As an FYI, SIMD is on in Phase 4 which is considered stable. Browsers support is still spotty, Chrome 91 (next version) will have it, Firefox will have it very soon (unsure of exact version), Safari doesn't.

@RReverser
Copy link
Collaborator Author

RReverser commented Apr 28, 2021

Firefox will have it very soon (unsure of exact version)

Firefox 89 on desktop, see WebAssembly/website#233 (comment).

@stale
Copy link

stale bot commented Apr 28, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants