Add nightly support for more SIMD implementations #31

Open
mooman219 opened this issue Aug 19, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@mooman219
Owner

There are unstable APIs for SIMD on platforms like Web, ARM, and POWER. Ideally, add a way to use these APIs if the user is on nightly.

@mooman219 mooman219 added the enhancement New feature or request label Aug 19, 2020
@john01dav

john01dav commented Nov 24, 2021

https://blog.rust-lang.org/inside-rust/2020/09/29/Portable-SIMD-PG.html

It might be better to use this than to support lots of platforms manually. That way you get support for more platforms (presumably) with less work.

@mooman219
Owner Author

mooman219 commented Nov 24, 2021

@john01dav There are some spicy bits that will need hand-written intrinsics. See:

// x = Read 4 floats from self.a
let mut x = _mm_loadu_ps(a.get_unchecked(i));
// x += (0.0, x[0], x[1], x[2])
x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 4)));
// x += (0.0, 0.0, x[0], x[1])
x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8)));
// x += offset
x = _mm_add_ps(x, offset);
// y = x * 255.9
let y = _mm_mul_ps(x, _mm_set1_ps(255.9));
// y = abs(y)
let y = _mm_andnot_ps(_mm_castsi128_ps(nzero), y);
// y = Convert y to i32s and truncate
let mut y = _mm_cvttps_epi32(y);
// y = Take the first byte of each of the 4 values in y and pack them into
// the first 4 bytes of y.
y = _mm_packus_epi16(_mm_packs_epi32(y, nzero), nzero);
// Store the first 4 u8s from y in output.
let pointer: &mut i32 = core::mem::transmute(output.get_unchecked_mut(i));
*pointer = core::mem::transmute::<__m128i, [i32; 4]>(y)[0];
// offset = (x[3], x[3], x[3], x[3])
offset = _mm_set1_ps(core::mem::transmute::<__m128, [f32; 4]>(x)[3]);
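
For readers less familiar with the SSE intrinsics, here is a rough scalar sketch of what the snippet above appears to compute: a running sum over the input, scaled by 255.9, made non-negative, truncated, and clamped into a byte. The function name `accumulate_scalar` and the `Vec<u8>` return type are assumptions made for illustration; they are not taken from the project. Results may also differ slightly from the SIMD path, because the additions happen in a different order and because very large or NaN intermediate values are converted differently by `_mm_cvttps_epi32` than by Rust's `as` cast.

```rust
/// Scalar sketch (hypothetical helper, not the project's code) of the
/// accumulate step: running sum, scale by 255.9, abs, truncate, clamp to u8.
fn accumulate_scalar(a: &[f32]) -> Vec<u8> {
    // Running total; plays the role of `offset` carried between 4-wide chunks.
    let mut acc = 0.0f32;
    a.iter()
        .map(|&v| {
            // Prefix sum: each output depends on all inputs before it.
            acc += v;
            // Scale into the byte range and drop the sign.
            let y = (acc * 255.9).abs();
            // `as i32` truncates toward zero; the clamp approximates the
            // saturating packs (`_mm_packs_epi32` + `_mm_packus_epi16`).
            (y as i32).clamp(0, 255) as u8
        })
        .collect()
}
```

For example, `accumulate_scalar(&[0.25, 0.25, 0.5, -1.0])` yields `[63, 127, 255, 0]`. The prefix-sum dependency and the saturating narrow-to-byte step are the parts that a portable abstraction would need to express well for this loop to avoid hand-written intrinsics.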
