Closed
Labels
enhancement: New feature or request
Description
ARM NEON's rcp estimate and rsqrt estimate have less accuracy than the corresponding SSE2 ops.
https://qiita.com/sanmanyannyan/items/62bb5ce6ada975a7106a
We need to increase the number of Newton-Raphson steps (2 or 3 times more iterations) to get the same level of accuracy as rcp and rsqrt in the SSE2 code path (estimate + one round of Newton-Raphson).
Currently we use 2 iterations for the NEON code path (vrecpsq_f32, vrsqrtsq_f32).
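To make the trade-off concrete, here is a minimal, portable sketch of estimate-plus-Newton-Raphson refinement. It is not the embree-aarch64 code: `truncate_mantissa` is a hypothetical stand-in for a hardware estimate (it keeps only the top `estimate_bits` mantissa bits of the exact result), and all function names are made up for illustration. The step formulas mirror what the NEON step intrinsics compute: `vrecpsq_f32(a, x)` returns `2 - a*x`, and `vrsqrtsq_f32(a, b)` returns `(3 - a*b) / 2`.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical model of a low-precision hardware estimate: keep only the
// top `bits` mantissa bits (1..23) of an exactly computed float.
static float truncate_mantissa(float v, int bits) {
    std::uint32_t u;
    std::memcpy(&u, &v, sizeof u);
    u &= ~((1u << (23 - bits)) - 1u);  // clear the low mantissa bits
    std::memcpy(&v, &u, sizeof v);
    return v;
}

// One Newton-Raphson step for 1/a: x' = x * (2 - a*x).
// On NEON the factor (2 - a*x) is what vrecpsq_f32(a, x) returns.
static float rcp_step(float a, float x) {
    return x * (2.0f - a * x);
}

// One Newton-Raphson step for 1/sqrt(a): x' = x * (3 - a*x*x) / 2.
// On NEON the factor (3 - a*x*x) / 2 is what vrsqrtsq_f32(a*x, x) returns.
static float rsqrt_step(float a, float x) {
    return x * (3.0f - a * x * x) * 0.5f;
}

// Reciprocal: modeled estimate followed by `steps` refinement steps.
float rcp_refined(float a, int estimate_bits, int steps) {
    float x = truncate_mantissa(1.0f / a, estimate_bits);
    for (int i = 0; i < steps; ++i) x = rcp_step(a, x);
    return x;
}

// Reciprocal square root: modeled estimate followed by `steps` steps.
float rsqrt_refined(float a, int estimate_bits, int steps) {
    float x = truncate_mantissa(1.0f / std::sqrt(a), estimate_bits);
    for (int i = 0; i < steps; ++i) x = rsqrt_step(a, x);
    return x;
}

// Relative error of `approx` against a double-precision reference.
double rel_error(double approx, double exact) {
    return std::fabs(approx - exact) / std::fabs(exact);
}
```

Each step roughly squares the relative error, so the number of steps needed depends on how many bits the initial estimate provides. Comparing `rel_error(rcp_refined(a, 8, n), 1.0 / a)` for increasing `n` against the same call with a higher `estimate_bits` gives a rough feel for why a coarser estimate needs extra iterations; real hardware estimates will behave somewhat differently from this truncation model.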
embree-aarch64/common/math/math.h, line 73 in 3f75f8c:
    float32x4_t reciprocal = vrecpeq_f32(a);
Related to the hole issue in #20.