We pass `_Sign` in `minmax_element` as a runtime parameter, and then have to dispatch on it for ARM64 (STL/stl/src/vector_algorithms.cpp, lines 3286 to 3302 in 2626cf1):
```cpp
#if defined(_M_ARM64) || defined(_M_ARM64EC)
    if constexpr (!std::is_same_v<typename _Traits::_Neon, _Traits_8_neon>) {
        if (_Byte_length(_First, _Last) >= 16) {
            if (_Sign) {
                return _Minmax_element_impl<_Mode, typename _Traits::_Neon, true>(_First, _Last);
            } else {
                return _Minmax_element_impl<_Mode, typename _Traits::_Neon, false>(_First, _Last);
            }
        }
    }

    if (_Sign) {
        return _Minmax_element_impl<_Mode, typename _Traits::_Scalar, true>(_First, _Last);
    } else {
        return _Minmax_element_impl<_Mode, typename _Traits::_Scalar, false>(_First, _Last);
    }
#else // ^^^ defined(_M_ARM64) || defined(_M_ARM64EC) / !defined(_M_ARM64) && !defined(_M_ARM64EC) vvv
```
We should have the entry functions based on sign too, like these (STL/stl/inc/algorithm, lines 75 to 84 in 2626cf1):
```cpp
__declspec(noalias) _Min_max_1i __stdcall __std_minmax_1i(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_1u __stdcall __std_minmax_1u(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_2i __stdcall __std_minmax_2i(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_2u __stdcall __std_minmax_2u(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_4i __stdcall __std_minmax_4i(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_4u __stdcall __std_minmax_4u(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_8i __stdcall __std_minmax_8i(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_8u __stdcall __std_minmax_8u(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_f __stdcall __std_minmax_f(const void* _First, const void* _Last) noexcept;
__declspec(noalias) _Min_max_d __stdcall __std_minmax_d(const void* _First, const void* _Last) noexcept;
```
The current status quo is the result of (micro-)optimization for SSE4.2 and AVX2: they don't have unsigned comparisons, so the signed intrinsics have to be used on the unsigned paths too, and the runtime parameter unifies the signed and the unsigned path at a very minor runtime cost.

But now that we have ARM64, and AVX-512 does provide unsigned comparisons, we need separate code paths.
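For context, one common way to serve unsigned data with signed comparison intrinsics is to flip the sign bit of each lane first, which maps the unsigned ordering onto the signed one. The scalar sketch below illustrates that trick on 8-bit values; it is not the STL's actual code, just the principle behind reusing signed comparisons on the unsigned path.

```cpp
#include <cstdint>

// Flipping the sign bit maps the unsigned range [0, 255] onto the signed
// range [-128, 127] while preserving the relative order of the values.
std::int8_t bias(std::uint8_t v) {
    return static_cast<std::int8_t>(v ^ 0x80u);
}

// Same result as a < b on the unsigned values, but computed with a signed
// comparison, as the SSE4.2/AVX2 paths must do with signed intrinsics.
bool unsigned_less_via_signed(std::uint8_t a, std::uint8_t b) {
    return bias(a) < bias(b);
}
```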
This also incorporates the removal of an unnecessary parameter for floats (STL/stl/src/vector_algorithms.cpp, lines 3770 to 3773 in 2626cf1):
```cpp
// TRANSITION, ABI: remove unused `bool`
const void* __stdcall __std_min_element_f(const void* const _First, const void* const _Last, bool) noexcept {
    return _Sorting::_Minmax_element_disp<_Sorting::_Mode_min, _Sorting::_Traits_f>(_First, _Last, false);
}
```
I'm not sure whether we should fix these now (and keep the old functions for compatibility), or wait for vNext and do it cleanly all at once.