
Feature Tracking #1

Open
5 of 31 tasks
novacrazy opened this issue Dec 8, 2020 · 4 comments


novacrazy commented Dec 8, 2020

Backends

  • Scalar
  • SSE2 (in progress)
  • SSE4.2 (in progress)
  • AVX (in progress)
  • AVX2
  • AVX512F
  • WASM SIMD
  • ARM/aarch64 NEON

Extra data types

  • i16/u16
  • i8/u8

These can use 128-bit registers even on AVX/AVX2, and 256-bit registers on AVX512

Polyfills

  • Emulated FMA on older platforms
    • For f32, promote to f64 and back.
    • For f64, implement this method
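
The f32 route above can be sketched in scalar form (my own illustration of the idea, not Thermite's code; note that the final rounding back to f32 can differ from a true fused result in rare double-rounding edge cases):

```rust
/// Emulated fused multiply-add for f32 by promoting to f64.
/// The product of two f32 values (24-bit significands) is exact in f64
/// (53-bit significand), so the only inexact step is the final rounding
/// back down to f32.
fn fma_f32(a: f32, b: f32, c: f32) -> f32 {
    (a as f64 * b as f64 + c as f64) as f32
}
```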

Iterator library

  • Prototype

Vectorized math library

Currently fully implemented for single and double precision:
sin, cos, tan, asin, acos, atan, atan2, sinh, cosh, tanh, asinh, acosh, atanh, exp, exp2, exph (0.5 * exp), exp10, exp_m1, cbrt, powf, ln, ln_1p, ln2, ln10, erf, erfinv, tgamma, lgamma, next_float, prev_float

Precision-agnostic implementations: lerp, scale, fmod, powi (single and vector exponents), poly, poly_f, poly_rational, summation_f, product_f, smoothstep, smootherstep, smootheststep, hermite (single and vector degrees), jacobi, legendre, bessel_y
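
The poly-style helpers listed above are typically Horner evaluations built on FMA; a minimal scalar sketch of the concept (my own illustration, not Thermite's actual API):

```rust
/// Evaluate a polynomial with coefficients ordered from the highest degree
/// down, using Horner's scheme. `mul_add` lowers to a hardware FMA where
/// one is available, improving both speed and rounding behavior.
fn poly(x: f64, coeffs: &[f64]) -> f64 {
    coeffs.iter().fold(0.0, |acc, &c| acc.mul_add(x, c))
}
```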

TODO:

  • Beta function
  • Zeta function
  • Digamma function

Bessel functions:

  • Bessel J_n for n > 1 (n = 0 and n = 1 are already implemented)
  • Bessel J_f (Bessel function of the first kind with real order)
  • Bessel Y_f (Bessel function of the second kind with real order)
  • Bessel I_n (Modified Bessel function of the first kind)
  • Bessel K_n (Modified Bessel function of the second kind)
  • Hankel function?

Complex and Dual number libraries

  • Make difficult parts branchless, ideally.

Precision Improvements

  • Improve precision of lgamma where possible.
    • Should it fall back to ln(tgamma(x)) when we know it won't overflow?
  • Improve precision of trig functions when the angle is a multiple of π (sin(x*π), etc.)
  • Compensated float fallbacks on platforms without FMA

Performance improvements:

  • Investigate ways to improve non-FMA operations.
  • Look for ways to simplify more expressions algebraically.
  • Experiment with the "crush denormals" trick to remove denormal inputs?
    • 1 - (1 - x) is the trick.
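
The trick can be sketched in scalar form (assuming round-to-nearest; note it flushes anything smaller than half an ULP of 1.0, which is a superset of the denormal range):

```rust
/// "Crush" tiny inputs to zero: for any |x| below half an ULP of 1.0
/// (which includes all f64 denormals), 1.0 - x rounds to exactly 1.0,
/// so the result is exactly 0.0 and later arithmetic never touches a
/// denormal operand. Larger values lose at most one rounding step.
fn crush_denormals(x: f64) -> f64 {
    1.0 - (1.0 - x)
}
```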

Policy improvements:

  • Improve codegen size for Size policy, especially when WASM support is added (both scalar and SIMD)

Testing

  • Structured tests for all vector types and backends (some partial tests exist, but I need to clean them up)
  • Tests for the math library
@novacrazy novacrazy added this to the 0.1 milestone Dec 8, 2020
@novacrazy novacrazy self-assigned this Dec 8, 2020
@novacrazy novacrazy pinned this issue Dec 8, 2020
@dragostis

How do you plan to support NEON? Would you be willing to help stabilize std::arch's NEON API? Also, are you aware of the stdsimd effort to provide an MVP for backend-agnostic SIMD in std?

@novacrazy

@dragostis I'm planning to use the arm and aarch64 NEON intrinsics behind a feature-flag until they are stabilized.

As for stdsimd, I find the design unusable in practice. The lack of dynamic dispatch, or even consistent static dispatch, is a deal-breaker. If anything causes an stdsimd function to de-inline out of the #[target_feature(...)] scope, it falls back to scalar, because that's just how #[target_feature(...)] works in Rust (and in C++ as well, I think). The only way to use it correctly is to set target-feature or target-cpu for the entire binary and give up dispatch entirely, which is unacceptable in a real application. packed_simd had the same problem.
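
The de-inlining hazard comes from how #[target_feature] scoping works; here is a minimal sketch of the standard detect-then-call pattern it forces on you (my own illustration, not stdsimd or Thermite code):

```rust
/// Runtime-dispatched entry point. Code inside `sum_avx2` may use AVX2,
/// but any helper the compiler fails to inline into it is compiled with
/// only the baseline target features and silently reverts to scalar.
fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            return unsafe { sum_avx2(xs) };
        }
    }
    xs.iter().sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    // The autovectorizer may emit AVX2 here, but only within this
    // function's inlining boundary.
    xs.iter().sum()
}
```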

Thermite, on the other hand, uses its Simd trait alongside the #[dispatch] macro to ensure all functions are properly monomorphized for the correct instruction set, regardless of whether they were inlined, without any extra machinery on the user's part.
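
The monomorphization idea can be sketched in plain Rust (an illustration of the concept only; Thermite's real Simd trait and #[dispatch] macro are more involved than this):

```rust
/// Each instruction set is a zero-sized marker type implementing a common
/// trait, so generic code is compiled once per backend and the chosen
/// instruction set survives even when a call is not inlined.
trait Simd {
    fn add(a: f32, b: f32) -> f32;
}

struct Scalar;

impl Simd for Scalar {
    fn add(a: f32, b: f32) -> f32 {
        a + b
    }
}

/// A kernel written once; `kernel::<Scalar>` (and each other backend's
/// instantiation) is a separate monomorphized function.
fn kernel<S: Simd>(a: f32, b: f32) -> f32 {
    S::add(a, b)
}
```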

@dragostis

dragostis commented Jan 7, 2021

Thanks for the detailed reply. I see what you mean about stdsimd. Dynamic dispatch sounds like something that a mature version of std::simd would have eventually.

I was actually more curious about how they're using LLVM intrinsics for the arm part. Since this is not what you want to do, do you have plans to push forward the stabilization of the arm part of core::arch? I've read a bit about it, and it seems like it will require quite a bit more work before it is close to stabilization.

@novacrazy

I am entirely unaffiliated with Rust core or the stabilization efforts. I'm not familiar with what it would take to advance stabilization, either.

Regarding the LLVM intrinsics (platform intrinsics): they are both great and annoying at the same time. LLVM has implemented some great codegen algorithms for a variety of tasks, but it's missing some operations that do exist as dedicated instructions, and the code it generates can be slightly rigid and overly conservative at times (shuffles and selects/blends come to mind).

After having used packed_simd for a couple of years, I prefer to stay away from the platform intrinsics. Individual instruction intrinsics are far more predictable.

However, Rust's internal use of platform intrinsics with arbitrary types generates a lot of extra LLVM IR where I would expect a single intrinsic call. That has led to small deoptimizations in isolated cases, mostly centered around const-folding (not Rust const, but LLVM constant folding) and algebraic simplification. I've tried to minimize that as much as possible in Thermite, but it probably doesn't matter much at a larger scale anyway. Just a nitpick.

Also, while I'm here, I'm going to find some time soon to continue on the other backends. Scalar is mostly complete, but I need to be careful with select/blend ops to ensure good codegen with those abstractions. SSE4.2 will be next.
