Collaborator

@ajakubowicz-canva ajakubowicz-canva commented Jun 18, 2025

Note

The measurements in this PR were taken very roughly with a quick Chrome profile. Because the numbers are small, it will be hard to validate the impact until we run a rigorous benchmark, especially across browser engines and with JS engines optimising WASM and fusing operators.

Overview

This PR adds initial WASM SIMD support to fearless_simd, implementing enough operations to enable WASM SIMD in linebender/vello#1053. Rather than implementing all operations in one large PR, this focuses on the essential subset needed for Vello and breaks up an otherwise huge change. Some operations are also tricky to implement with the small number of WASM SIMD instructions available 😅.

Performance Impact

Tested with Ghost Tiger rendering in Vello:

Without fast kurbo (baseline):

  • Without SIMD: ~42.31ms per frame
  • With SIMD: ~39.91ms per frame
  • Improvement: ~6% faster (about 2.4ms saved per frame)

With fast kurbo (linebender/kurbo#427):

  • Without SIMD: ~6ms per frame
  • With SIMD: ~4ms per frame
  • Improvement: ~30% faster (Take this with a huge grain of salt)

Test methodology: I linked vello locally to fearless_simd via path reference, and modified vello_hybrid to use the WASM SIMD level.

Changes

  • New architecture: Added WASM SIMD128 support
  • Operations implemented: Core subset including:
    • Binary ops: add, sub, mul, min, max
    • Comparison ops: simd_eq, simd_ne, simd_lt, etc.
    • Math ops: sqrt, madd
  • Testing: Added parity tests ensuring Fallback and WASM SIMD produce identical results
  • Bug fix: Fixed incorrect mask generation in Fallback comparison operations (was returning 0/1 instead of 0/-1)
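
To illustrate the mask bug fix listed above, here is a minimal scalar sketch of a lane-wise comparison that returns 0/-1 masks. The function name `simd_lt_f32x4` and the plain-array representation are assumptions made for illustration; this is not the code that fearless_simd generates.

```rust
/// Lane-wise `a < b` over four f32 lanes, written in scalar style.
/// Hypothetical helper for illustration only.
/// Each result lane is 0 (false) or -1 (true, i.e. all 32 bits set).
fn simd_lt_f32x4(a: [f32; 4], b: [f32; 4]) -> [i32; 4] {
    // `bool as i32` yields 0 or 1; negating maps 1 to -1 (0xFFFF_FFFF),
    // which is the change from 0/1 to 0/-1 described in the bug fix.
    std::array::from_fn(|i| -((a[i] < b[i]) as i32))
}
```

For example, comparing `[1.0, 5.0, 3.0, 0.0]` against `[2.0, 4.0, 3.0, 1.0]` gives `[-1, 0, 0, -1]`.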

Test Plan

Added test_wasm_simd_parity! macro that verifies operations produce identical results across Fallback and WASM SIMD implementations.

I only tested a small subset. Maybe in the future we code-gen the tests as well?
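
As a rough idea of what a parity macro can look like, here is a self-contained sketch. Everything in it is hypothetical: the `Level` trait, `ScalarLevel`, `UnrolledLevel`, and `parity_check!` are stand-ins invented for this example and do not reflect the actual `test_wasm_simd_parity!` macro or the fearless_simd API.

```rust
/// Hypothetical stand-in for a SIMD level; not the fearless_simd trait.
trait Level {
    fn add_f32x4(&self, a: [f32; 4], b: [f32; 4]) -> [f32; 4];
}

/// Plays the role of the Fallback level in this sketch.
struct ScalarLevel;
/// Plays the role of the accelerated level in this sketch.
struct UnrolledLevel;

impl Level for ScalarLevel {
    fn add_f32x4(&self, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        std::array::from_fn(|i| a[i] + b[i])
    }
}

impl Level for UnrolledLevel {
    fn add_f32x4(&self, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
    }
}

/// Expands to a function that evaluates the same expression once per level
/// and asserts that both results are equal.
macro_rules! parity_check {
    ($name:ident, |$level:ident| $body:expr) => {
        fn $name() {
            let reference = {
                let $level = &ScalarLevel;
                $body
            };
            let candidate = {
                let $level = &UnrolledLevel;
                $body
            };
            assert_eq!(reference, candidate, "levels disagree in {}", stringify!($name));
        }
    };
}

parity_check!(add_parity, |level| level.add_f32x4(
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5]
));
```

Calling `add_parity()` panics if the two levels disagree. Note that this only checks agreement between implementations, so a bug shared by both would go unnoticed.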

Next Steps

Future PRs will add more operations to achieve full WASM SIMD coverage.

@ajakubowicz-canva ajakubowicz-canva changed the title from "Start on wasm32 simd128 architecture" to "Add initial WASM SIMD128 support" Jun 18, 2025
Collaborator

I agree that, in the long term, we will probably want to autogenerate the tests, so that you write a test once with the expected output, and then compare it across all implementations. But, happy to land this as an intermediate version so we have at least some test coverage.

Collaborator Author

Thank you!

I tried a couple of different macro approaches, but they ended up being extremely confusing. Maybe there is a way to express the tests as a nice macro as well. I am also not very confident with macros.

Collaborator Author

I also don't yet have a good idea of how to test NEON. Maybe the CI Mac already supports it?

Member

The CI Macs should support this, because they're physical M1 machines. In theory, you should be running the same code with different level enum values, I think.

#[cfg(target_arch = "wasm32")]
use wasm_bindgen_test::*;

/// `test_wasm_simd_parity` enforces that the fallback level and +simd128 levels output the same
Collaborator

Ideally the reference result should be provided manually (since it's possible the fallback is wrong as well), but that's for the future when we implement a proper test suite
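
A sketch of that idea, combined with the earlier suggestion of running the same code under different level enum values: the expected output is written by hand, and every implementation is checked against it. The `Level` enum, its variants, `min_f32x4`, and `check_all_levels` are all hypothetical names for illustration, not part of fearless_simd.

```rust
/// Hypothetical level enum; the variants are stand-ins, not real fearless_simd levels.
#[derive(Debug, Clone, Copy)]
enum Level {
    Fallback,
    Wide,
}

/// Lane-wise minimum, dispatched on the level. Both arms are scalar here;
/// in a real crate each arm would call a different backend.
fn min_f32x4(level: Level, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    match level {
        Level::Fallback => std::array::from_fn(|i| if a[i] < b[i] { a[i] } else { b[i] }),
        Level::Wide => std::array::from_fn(|i| a[i].min(b[i])),
    }
}

/// Runs `op` once per level and compares each result with the hand-written
/// `expected` value, so a wrong fallback is caught as well.
fn check_all_levels(
    expected: [f32; 4],
    op: impl Fn(Level) -> [f32; 4],
) -> Result<(), String> {
    for level in [Level::Fallback, Level::Wide] {
        let got = op(level);
        if got != expected {
            return Err(format!("{level:?}: expected {expected:?}, got {got:?}"));
        }
    }
    Ok(())
}
```

A test would then call something like `check_all_levels([1.0, 2.0, 0.0, -1.0], |l| min_f32x4(l, [1.0, 5.0, 0.0, -1.0], [2.0, 2.0, 3.0, 4.0]))` and assert that the result is `Ok`.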

  let expr = Fallback.expr(method, vec_ty, &args);
  let mask_ty = mask_type.scalar.rust(scalar_bits);
- quote! { #expr as #mask_ty }
+ quote! { -(#expr as #mask_ty) }
Collaborator

Oh I see now, this is because in NEON, the mask for true is all bits set to 1? I assumed it doesn't matter what the representation for true is as long as false is all zeros, but I guess it makes sense to keep it consistent. Do you know if all SIMD variants are guaranteed to have that representation?

Collaborator Author

@ajakubowicz-canva ajakubowicz-canva Jun 18, 2025

From a quick look, SSE/AVX (x86) and neon both set all bits to 1 for true, and all to 0 for false. So this change will be consistent with them.

That is, NEON, x86, Fallback, and WASM should all use the same mask representation.
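
One practical reason the all-bits-set representation matters is that masks are often fed into a bitwise select (WASM's `v128.bitselect` works this way). The scalar sketch below uses a hypothetical `select_i32x4` helper, not a fearless_simd function, to show that a 0/-1 mask picks whole lanes while a 0/1 mask does not.

```rust
/// Bitwise select over four i32 lanes: takes bits from `if_true` where the
/// mask bit is 1 and from `if_false` where it is 0.
/// Hypothetical helper for illustration only.
fn select_i32x4(mask: [i32; 4], if_true: [i32; 4], if_false: [i32; 4]) -> [i32; 4] {
    std::array::from_fn(|i| (if_true[i] & mask[i]) | (if_false[i] & !mask[i]))
}
```

With the mask `[-1, 0, -1, 0]`, selecting between `[10, 20, 30, 40]` and `[1, 2, 3, 4]` gives `[10, 2, 30, 4]`. With a "true" lane encoded as 1 instead of -1, only the lowest bit of `if_true` would be taken, so the first lane would come out as 0, which is neither input.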

@ajakubowicz-canva ajakubowicz-canva merged commit e46bcfd into main Jun 18, 2025
6 checks passed
@ajakubowicz-canva ajakubowicz-canva deleted the ajakubowicz-wasm32-simd128 branch June 18, 2025 12:13