Add initial WASM SIMD128 support #8
I agree that, in the long term, we will probably want to autogenerate the tests, so that you write a test once with the expected output, and then compare it across all implementations. But, happy to land this as an intermediate version so we have at least some test coverage.
Thank you!
I tried a couple of different macro approaches, but they ended up being extremely confusing. Maybe there is a way to express the tests as a nice macro as well. I am also not very confident with macros.
I also don't yet have a good idea of how to test neon. Maybe the CI Mac already supports it?
The CI Macs should support this, because they're physical M1 machines. In theory, you should be running the same code with different level enum values, I think.
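As a rough illustration of running the same test body across level values, here is a minimal sketch. The `Level` enum, its variants, and the helper functions are hypothetical stand-ins written for this example; they are not the fearless_simd API, and the `Unrolled` variant merely plays the role of a hardware-backed level.

```rust
/// Hypothetical level enum; the variants are stand-ins, not the fearless_simd API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Level {
    Fallback,
    Unrolled,
}

/// Every level the test should visit.
const ALL_LEVELS: [Level; 2] = [Level::Fallback, Level::Unrolled];

/// Adds two 4-lane vectors, dispatching on the level.
fn add_f32x4(level: Level, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    match level {
        // Reference path: a plain per-lane loop.
        Level::Fallback => {
            let mut out = [0.0; 4];
            for i in 0..4 {
                out[i] = a[i] + b[i];
            }
            out
        }
        // Second path, standing in for a SIMD-backed implementation.
        Level::Unrolled => [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]],
    }
}

/// Runs the same case on every level and returns the levels whose output
/// differs from `expected`.
fn failing_levels(expected: [f32; 4]) -> Vec<Level> {
    ALL_LEVELS
        .iter()
        .copied()
        .filter(|&level| add_f32x4(level, [1.0, 2.0, 3.0, 4.0], [0.5; 4]) != expected)
        .collect()
}
```

With this shape, adding a new level means extending `ALL_LEVELS` rather than writing a new test; a real version would only include levels the current machine supports.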
#[cfg(target_arch = "wasm32")]
use wasm_bindgen_test::*;

/// `test_wasm_simd_parity` enforces that the fallback level and +simd128 levels output the same
Ideally the reference result should be provided manually (since it's possible the fallback is wrong as well), but that's for the future when we implement a proper test suite.
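One possible shape for such a test, sketched under assumptions: the expected value is written by hand, and every implementation (the fallback included) is checked against it. The macro name `mismatches!` and the two `sum_*` functions are hypothetical and exist only for this illustration.

```rust
/// Hypothetical implementations of the same operation (sum of four lanes).
fn sum_fallback(v: [i32; 4]) -> i32 {
    v.iter().sum()
}

fn sum_pairwise(v: [i32; 4]) -> i32 {
    (v[0] + v[1]) + (v[2] + v[3])
}

/// Hypothetical macro: evaluates each listed function on `$input` and returns
/// the names of those whose result differs from the hand-written `$expected`.
macro_rules! mismatches {
    ($input:expr, $expected:expr, [$($f:ident),+ $(,)?]) => {{
        let mut failed: Vec<&'static str> = Vec::new();
        $(
            if $f($input) != $expected {
                failed.push(stringify!($f));
            }
        )+
        failed
    }};
}
```

Because the reference is a literal rather than the fallback's output, a bug in the fallback shows up as a mismatch instead of silently becoming the expected value.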
  let expr = Fallback.expr(method, vec_ty, &args);
  let mask_ty = mask_type.scalar.rust(scalar_bits);
- quote! { #expr as #mask_ty }
+ quote! { -(#expr as #mask_ty) }
Oh I see now, this is because in neon, the mask for true is all bits set to 1? I assumed it doesn't matter what the representation for true is as long as false is 0, but I guess it makes sense to keep it consistent. Do you know if all SIMD variants are guaranteed to have that representation?
From a quick look, SSE/AVX (x86) and neon both set all bits to 1 for true, and all to 0 for false. So this change will be consistent with them.
That is, Neon, x86, fallback, and Wasm should all be identical.
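To make the mask convention concrete, here is a small scalar sketch in plain Rust (not fearless_simd code; both function names are hypothetical). Casting a `bool` to an integer gives 0 or 1, and negating that gives 0 or -1, which in two's complement has every bit set. The all-ones form matters because masks are commonly consumed by a bitwise select.

```rust
/// Hypothetical helper: turns a comparison result into a SIMD-style lane mask.
/// `true as i32` is 1, so negating it yields -1 (all 32 bits set); false stays 0.
fn lane_mask(cond: bool) -> i32 {
    -(cond as i32)
}

/// Hypothetical helper: bitwise select between two values.
/// Only correct when `mask` is all ones or all zeros.
fn bit_select(mask: i32, if_true: i32, if_false: i32) -> i32 {
    (mask & if_true) | (!mask & if_false)
}
```

With a 0/1 mask the same select would mix bits from both inputs, which is one reason to keep the representation identical across levels.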
Validated that tests run on CI: https://github.com/raphlinus/fearless_simd/actions/runs/15732084678/job/44335357500#step:6:3110
Note
The measurements in this PR were done very roughly via a quick Chrome profile. Because the numbers are small, it will be hard to validate the impact until we run a rigorous benchmark, especially across browser engines and with JS engines optimising WASM and fusing operators.
Overview
This PR adds initial WASM SIMD support to fearless_simd, implementing enough operations to enable WASM SIMD in linebender/vello#1053. Rather than implementing all operations in one large PR, this focuses on the essential subset needed for Vello and breaks up an otherwise huge change. There are also some tricky operations to implement using the small number of WASM instructions 😅.
Performance Impact
Tested with Ghost Tiger rendering in Vello:
Without fast kurbo (baseline):
With fast kurbo (linebender/kurbo#427):
Test methodology: I linked vello locally to fearless_simd via path reference, and modified vello_hybrid to use the WASM SIMD level.
Changes
add, sub, mul, min, max
simd_eq, simd_ne, simd_lt, etc.
sqrt, madd
(0/1 instead of 0/-1)
Test Plan
Added test_wasm_simd_parity! macro that verifies operations produce identical results across Fallback and WASM SIMD implementations.
I only tested a small subset. Maybe in the future we code-gen the tests as well?
Next Steps
Future PRs will add more operations to achieve full WASM SIMD coverage.