BUG: Build failure (of 1.26.2) on SapphireRapids (`avx512_spr`) due to multiple definition of `avx512_qsort` and `avx512_qselect` #25274

branfosj · 2023-11-29T17:33:42Z

Describe the issue:

Building 1.26.2 on SapphireRapids with

spin build -- -Dcpu-baseline=native

or, on IceLake with

spin build -- -Dcpu-baseline=avx512_spr

The link step fails with multiple definitions of `void avx512_qsort<_Float16>(_Float16*, long)` and `void avx512_qselect<_Float16>(_Float16*, long, long)`.

Reproduce the code example:

n/a

Error message:

FAILED: numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so                                                                                                                                          c++  -o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_arraytypes.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_einsum.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_einsum_sumprod.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_lowlevel_strided_loops.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_nditer_templ.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_scalartypes.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_loops.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_matmul.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/meson-generated_scalarmath.c.o ../numpy/core/src/umath/svml/linux/avx512/svml_z0_acos_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_acos_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_acos_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_acosh_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_acosh_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_acosh_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_asin_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_asin_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_asin_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_asinh_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_asinh_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_asinh_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_atan2_d_la.s 
../numpy/core/src/umath/svml/linux/avx512/svml_z0_atan2_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_atan2_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_atan_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_atan_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_atan_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_atanh_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_atanh_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_atanh_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cbrt_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cbrt_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cbrt_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cos_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cos_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cos_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cosh_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cosh_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_cosh_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_exp2_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_exp2_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_exp2_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_exp_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_exp_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_exp_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_expm1_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_expm1_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_expm1_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log10_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log10_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log10_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log1p_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log1p_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log1p_d_ha.s 
../numpy/core/src/umath/svml/linux/avx512/svml_z0_log2_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log2_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log2_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_log_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_pow_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_pow_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_pow_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_sin_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_sin_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_sin_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_sinh_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_sinh_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_sinh_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_tan_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_tan_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_tan_d_ha.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_tanh_d_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_tanh_s_la.s ../numpy/core/src/umath/svml/linux/avx512/svml_z0_tanh_d_ha.s numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_abstractdtypes.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_alloc.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_arrayobject.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_array_coercion.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_array_method.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_array_assign_scalar.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_array_assign_array.c.o 
numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_arrayfunction_override.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_buffer.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_calculation.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_compiled_base.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_common.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_common_dtype.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_convert.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_convert_datatype.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_conversion_utils.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_ctors.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_datetime.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_datetime_strings.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_datetime_busday.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_datetime_busdaycal.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_descriptor.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_dlpack.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_dtypemeta.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_dragon4.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_dtype_transfer.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_dtype_traversal.c.o 
numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_experimental_public_dtype_api.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_flagsobject.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_getset.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_hashdescr.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_item_selection.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_iterators.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_legacy_dtype_implementation.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_mapping.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_methods.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_multiarraymodule.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_nditer_api.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_nditer_constr.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_nditer_pywrap.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_number.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_refcount.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_sequence.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_shape.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_scalarapi.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_strfuncs.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_temp_elide.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_typeinfo.c.o 
numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_usertypes.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_vdot.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_npysort_quicksort.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_npysort_mergesort.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_npysort_timsort.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_npysort_heapsort.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_npysort_radixsort.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_npysort_selection.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_npysort_binsearch.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_textreading_conversions.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_textreading_field_types.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_textreading_growth.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_textreading_readtext.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_textreading_rows.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_textreading_stream_pyobject.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_textreading_str_to_int.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_multiarray_textreading_tokenize.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_npymath_arm64_exports.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_array_assign.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_mem_overlap.c.o 
numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_npy_argparse.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_npy_hashtable.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_npy_longdouble.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_ucsnarrow.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_ufunc_override.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_numpyos.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_npy_cpu_features.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_cblasfuncs.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_common_python_xerbla.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_ufunc_type_resolution.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_clip.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_dispatching.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_extobj.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_legacy_array_method.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_override.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_reduction.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_ufunc_object.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_umathmodule.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_string_ufuncs.cpp.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath_wrapping_array_method.c.o numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so.p/src_umath__scaled_float_dtype.c.o -Wl,--as-needed -Wl,--allow-shlib-undefined -shared -fPIC -Wl,--start-group 
numpy/core/libnpymath.a numpy/core/lib_multiarray_umath_mtargets.a /rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/OpenBLAS/0.3.24-GCC-13.2.0/lib/libopenblas.so -Wl,--end-group
/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/binutils/2.40-GCCcore-13.2.0/bin/ld: numpy/core/libsimd_qsort_16bit.dispatch.h_AVX512_SPR.a.p/src_npysort_simd_qsort_16bit.dispatch.cpp.o: in function `void avx512_qsort<_Float16>(_Float16*, long)':
/rds/projects/2017/branfosj-rse/ProblemSolving/numpy/numpy-1.26.2/build/../numpy/core/src/npysort/x86-simd-sort/src/avx512fp16-16bit-qsort.hpp:161: multiple definition of `void avx512_qsort<_Float16>(_Float16*, long)'; numpy/core/libsimd_qsort_16bit.dispatch.h_AVX512_ICL.a.p/src_npysort_simd_qsort_16bit.dispatch.cpp.o:/rds/projects/2017/branfosj-rse/ProblemSolving/numpy/numpy-1.26.2/build/../numpy/core/src/npysort/x86-simd-sort/src/avx512fp16-16bit-qsort.hpp:161: first defined here
/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/binutils/2.40-GCCcore-13.2.0/bin/ld: numpy/core/libsimd_qsort_16bit.dispatch.h_AVX512_SPR.a.p/src_npysort_simd_qsort_16bit.dispatch.cpp.o: in function `void avx512_qselect<_Float16>(_Float16*, long, long)':
/rds/projects/2017/branfosj-rse/ProblemSolving/numpy/numpy-1.26.2/build/../numpy/core/src/npysort/x86-simd-sort/src/avx512fp16-16bit-qsort.hpp:149: multiple definition of `void avx512_qselect<_Float16>(_Float16*, long, long)'; numpy/core/libsimd_qsort_16bit.dispatch.h_AVX512_ICL.a.p/src_npysort_simd_qsort_16bit.dispatch.cpp.o:/rds/projects/2017/branfosj-rse/ProblemSolving/numpy/numpy-1.26.2/build/../numpy/core/src/npysort/x86-simd-sort/src/avx512fp16-16bit-qsort.hpp:149: first defined here
collect2: error: ld returned 1 exit status

Runtime information:

The Meson build system 
Version: 1.2.99
Source dir: /rds/projects/2017/branfosj-rse/ProblemSolving/numpy/numpy-1.26.2
Build dir: /rds/projects/2017/branfosj-rse/ProblemSolving/numpy/numpy-1.26.2/build
Build type: native build
Project name: NumPy
Project version: 1.26.2
C compiler for the host machine: cc (gcc 13.2.0 "cc (GCC) 13.2.0")
C linker for the host machine: cc ld.bfd 2.40
C++ compiler for the host machine: c++ (gcc 13.2.0 "c++ (GCC) 13.2.0")
C++ linker for the host machine: c++ ld.bfd 2.40
Cython compiler for the host machine: cython (cython 3.0.4)
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program python3 found: YES (/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/Python/3.11.5-GCCcore-13.2.0/bin/python)
Found pkg-config: /rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/pkgconf/2.0.3-GCCcore-13.2.0/bin/pkg-config (2.0.3)
Run-time dependency python found: YES 3.11
Has header "Python.h" with dependency python-3.11: YES
Compiler for C supports arguments -fno-strict-aliasing: YES
Test features "SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL AVX512_SPR" : Supported
Test features "AVX512_KNL" : Supported
Test features "AVX512_KNM" : Supported
Configuring npy_cpu_dispatch_config.h using configuration
Message:
     CPU Optimization Options
baseline:
        Requested : avx512_spr
        Enabled   : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL AVX512_SPR
dispatch:
        Requested : max -xop -fma4
        Enabled   : AVX512_KNL AVX512_KNM
Library m found: YES
Run-time dependency scipy-openblas found: NO (tried pkgconfig)
Run-time dependency mkl found: NO (tried pkgconfig and system)
Run-time dependency mkl found: NO (tried pkgconfig and system)
Run-time dependency accelerate found: NO (tried system)
Run-time dependency openblas found: YES 0.3.24
Message: BLAS symbol suffix:
Run-time dependency mkl found: NO (tried pkgconfig and system)
Run-time dependency accelerate found: NO (tried system)
Run-time dependency openblas found: YES 0.3.24

And the last part of the configure output:

Generating multi-targets for "_umath_tests.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "argfunc.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "simd_qsort.dispatch.h"
  Enabled targets: AVX512_SKX
Generating multi-targets for "simd_qsort_16bit.dispatch.h"
  Enabled targets: AVX512_SPR, AVX512_ICL
Generating multi-targets for "loops_arithm_fp.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_arithmetic.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_comparison.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_exponent_log.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_hyperbolic.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_logical.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_minmax.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_modulo.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_trigonometric.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_umath_fp.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_unary.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_unary_fp.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_unary_fp_le.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_unary_complex.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_autovec.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "_simd.dispatch.h"
  Enabled targets: baseline
Build targets in project: 62

NumPy 1.26.2

  User defined options
    prefix      : /usr
    cpu-baseline: avx512_spr

Context for the issue:

No response


r-devulap · 2023-12-05T05:30:11Z

@branfosj thanks for reporting. I am taking a look.

r-devulap · 2023-12-11T18:19:49Z

This looks like a bug in the build system. The issue seems to be that the qsort_16bit dispatch file is built with the baseline CPU flags on top of the target-specific dispatch flags, which, if I understand correctly, is not intended. When using -Dcpu-baseline=avx512_spr, the avx512_icl dispatch effectively gets built with the avx512_spr flags, leading to the multiple-definition error. See the commands used to build the x86_simd_qsort_16bit.dispatch.cpp file below:

{
    "directory": "/home/raghuveer/MyFiles/src/wrkdir_numpy/numpy/build",
    "command": "g++-12 -Inumpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_SPR.a.p -Inumpy/_core -I../numpy/_core -Inumpy/_core/include -I../numpy/_core/include -I../numpy/_core/src/common -I../numpy/_core/src/multiarray -I../numpy/_core/src/npymath -I../numpy/_core/src/umath -I../numpy/_core/src/highway -I/home/raghuveer/anaconda3/envs/np-dev/include/python3.11 -I/home/raghuveer/MyFiles/src/wrkdir_numpy/numpy/build/meson_cpu -fdiagnostics-color=always -Wall -Winvalid-pch -std=c++17 -O2 -g -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mno-mmx -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512vnni -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx512bitalg -mavx512vpopcntdq -mavx512fp16 -fPIC -DNPY_INTERNAL_BUILD -DHAVE_NPY_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -D__STDC_VERSION__=0 -fno-exceptions -fno-rtti -O3 -DNPY_HAVE_SSE2 -DNPY_HAVE_SSE -DNPY_HAVE_SSE3 -DNPY_HAVE_SSSE3 -DNPY_HAVE_SSE41 -DNPY_HAVE_POPCNT -DNPY_HAVE_SSE42 -DNPY_HAVE_AVX -DNPY_HAVE_F16C -DNPY_HAVE_FMA3 -DNPY_HAVE_AVX2 -DNPY_HAVE_AVX512F -DNPY_HAVE_AVX512F_REDUCE -DNPY_HAVE_AVX512CD -DNPY_HAVE_AVX512_SKX -DNPY_HAVE_AVX512VL -DNPY_HAVE_AVX512BW -DNPY_HAVE_AVX512DQ -DNPY_HAVE_AVX512BW_MASK -DNPY_HAVE_AVX512DQ_MASK -DNPY_HAVE_AVX512_CLX -DNPY_HAVE_AVX512VNNI -DNPY_HAVE_AVX512_CNL -DNPY_HAVE_AVX512IFMA -DNPY_HAVE_AVX512VBMI -DNPY_HAVE_AVX512_ICL -DNPY_HAVE_AVX512VBMI2 -DNPY_HAVE_AVX512BITALG -DNPY_HAVE_AVX512VPOPCNTDQ -DNPY_HAVE_AVX512_SPR -DNPY_HAVE_AVX512FP16 -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mno-mmx -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512vnni -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx512bitalg -mavx512vpopcntdq -mavx512fp16 -DNPY_MTARGETS_CURRENT=AVX512_SPR -MD -MQ numpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_SPR.a.p/src_npysort_x86_simd_qsort_16bit.dispatch.cpp.o -MF 
numpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_SPR.a.p/src_npysort_x86_simd_qsort_16bit.dispatch.cpp.o.d -o numpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_SPR.a.p/src_npysort_x86_simd_qsort_16bit.dispatch.cpp.o -c ../numpy/_core/src/npysort/x86_simd_qsort_16bit.dispatch.cpp",
    "file": "../numpy/_core/src/npysort/x86_simd_qsort_16bit.dispatch.cpp",
    "output": "numpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_SPR.a.p/src_npysort_x86_simd_qsort_16bit.dispatch.cpp.o"
  },
  {
    "directory": "/home/raghuveer/MyFiles/src/wrkdir_numpy/numpy/build",
    "command": "g++-12 -Inumpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_ICL.a.p -Inumpy/_core -I../numpy/_core -Inumpy/_core/include -I../numpy/_core/include -I../numpy/_core/src/common -I../numpy/_core/src/multiarray -I../numpy/_core/src/npymath -I../numpy/_core/src/umath -I../numpy/_core/src/highway -I/home/raghuveer/anaconda3/envs/np-dev/include/python3.11 -I/home/raghuveer/MyFiles/src/wrkdir_numpy/numpy/build/meson_cpu -fdiagnostics-color=always -Wall -Winvalid-pch -std=c++17 -O2 -g -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mno-mmx -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512vnni -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx512bitalg -mavx512vpopcntdq -mavx512fp16 -DNPY_HAVE_AVX512_SPR -DNPY_HAVE_AVX512FP16 -fPIC -DNPY_INTERNAL_BUILD -DHAVE_NPY_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -D__STDC_VERSION__=0 -fno-exceptions -fno-rtti -O3 -DNPY_HAVE_SSE2 -DNPY_HAVE_SSE -DNPY_HAVE_SSE3 -DNPY_HAVE_SSSE3 -DNPY_HAVE_SSE41 -DNPY_HAVE_POPCNT -DNPY_HAVE_SSE42 -DNPY_HAVE_AVX -DNPY_HAVE_F16C -DNPY_HAVE_FMA3 -DNPY_HAVE_AVX2 -DNPY_HAVE_AVX512F -DNPY_HAVE_AVX512F_REDUCE -DNPY_HAVE_AVX512CD -DNPY_HAVE_AVX512_SKX -DNPY_HAVE_AVX512VL -DNPY_HAVE_AVX512BW -DNPY_HAVE_AVX512DQ -DNPY_HAVE_AVX512BW_MASK -DNPY_HAVE_AVX512DQ_MASK -DNPY_HAVE_AVX512_CLX -DNPY_HAVE_AVX512VNNI -DNPY_HAVE_AVX512_CNL -DNPY_HAVE_AVX512IFMA -DNPY_HAVE_AVX512VBMI -DNPY_HAVE_AVX512_ICL -DNPY_HAVE_AVX512VBMI2 -DNPY_HAVE_AVX512BITALG -DNPY_HAVE_AVX512VPOPCNTDQ -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mno-mmx -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512vnni -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx512bitalg -mavx512vpopcntdq -DNPY_MTARGETS_CURRENT=AVX512_ICL -MD -MQ numpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_ICL.a.p/src_npysort_x86_simd_qsort_16bit.dispatch.cpp.o -MF 
numpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_ICL.a.p/src_npysort_x86_simd_qsort_16bit.dispatch.cpp.o.d -o numpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_ICL.a.p/src_npysort_x86_simd_qsort_16bit.dispatch.cpp.o -c ../numpy/_core/src/npysort/x86_simd_qsort_16bit.dispatch.cpp",
    "file": "../numpy/_core/src/npysort/x86_simd_qsort_16bit.dispatch.cpp",
    "output": "numpy/_core/libx86_simd_qsort_16bit.dispatch.h_AVX512_ICL.a.p/src_npysort_x86_simd_qsort_16bit.dispatch.cpp.o"
  },
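To make the flag-stacking argument concrete, here is a small C++ model (a sketch, not NumPy's build code). It treats each dispatch translation unit's enabled CPU features as the union of the baseline features and the target's own features, since the baseline `-m...` and `-DNPY_HAVE_...` flags appear in both commands above. The feature sets are abbreviated by hand from those commands, and `Features`, `effective_features`, and the `k...` constants are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model: a set of CPU feature names such as "AVX512_ICL".
using Features = std::set<std::string>;

// A dispatch TU is compiled with the baseline flags *and* its target flags,
// so the compiler effectively sees the union of both feature sets.
Features effective_features(const Features& baseline, const Features& target) {
    Features out = baseline;
    out.insert(target.begin(), target.end());
    return out;
}

// Abbreviated feature sets, hand-picked from the NPY_HAVE_* defines above.
const Features kBaselineSpr = {"AVX512F", "AVX512_SKX", "AVX512_ICL", "AVX512_SPR", "AVX512FP16"};
const Features kTargetIcl   = {"AVX512F", "AVX512_SKX", "AVX512_ICL"};
const Features kTargetSpr   = {"AVX512F", "AVX512_SKX", "AVX512_ICL", "AVX512_SPR", "AVX512FP16"};
```

With an `avx512_spr` baseline, `effective_features(kBaselineSpr, kTargetIcl)` equals `effective_features(kBaselineSpr, kTargetSpr)`: in this model the AVX512_ICL object is compiled with FP16 enabled, which is consistent with the linker finding the `_Float16` definitions from `avx512fp16-16bit-qsort.hpp` in both objects.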

r-devulap · 2023-12-11T18:20:11Z

ping @seiko2plus

tylerjereddy · 2023-12-11T18:25:19Z

~~See also SapphireRapids and IceLake sorting concerns on main reproduced at: #24842 (comment). Not sure if related though.~~

seiko2plus · 2023-12-12T09:58:23Z


The behavior observed is not a bug but a consequence of defining non-inline functions in the C++ header files included by x86_simd_qsort_, which violates the One Definition Rule (ODR). An explicit specialization of a function template is not implicitly inline, so every translation unit that includes such a header emits its own strong definition. To resolve this, qsort and any other non-inline functions defined in headers should either be declared inline, which emits them as weak symbols and allows multiple definitions across translation units, or be defined within an anonymous namespace. The anonymous namespace approach gives each translation unit its own unique copy of the function, preventing ODR violations while maintaining encapsulation.
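A minimal sketch of the `inline` fix, assuming a header shaped like the explicit-specialization pattern discussed in this thread. `avx512_qsort_demo` is a hypothetical name, and `std::sort` stands in for the real SIMD kernel:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Imagine this is a header included by several .cpp files
// (e.g. one per dispatch target). avx512_qsort_demo is hypothetical.
template <typename T>
void avx512_qsort_demo(T *arr, int64_t arrsize);

// An explicit specialization is NOT implicitly inline. Without the `inline`
// below, every .cpp including the header would emit a strong definition and
// the link would fail with "multiple definition of ...". With `inline`, the
// duplicate definitions are allowed and the linker keeps one of them.
template <>
inline void avx512_qsort_demo<int16_t>(int16_t *arr, int64_t arrsize)
{
    std::sort(arr, arr + arrsize);  // placeholder for the SIMD kernel
}
```

Note that `inline` only makes the duplicates legal; if the copies were compiled with different target flags, which one survives is up to the linker, which is the concern raised in the replies that follow.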

Flamefire · 2023-12-12T12:47:24Z

@seiko2plus I concluded the same in intel/x86-simd-sort#111 (comment)

TL;DR: Adding inline does fix the duplicate symbol. However, I fear numpy itself is violating the ODR:

- "inline" allows a symbol to be defined in multiple TUs; the linker keeps any one of them, assuming they all have the same code
- template functions are "inline" by default (this holds for implicitly instantiated templates, not for explicit full specializations)
- numpy compiles the same file, with the same includes, multiple times with different AVX flags
- the generated code of those inline functions now differs (e.g. a function is compiled with AVX2 in one .cpp file and with AVX512 in another)
- linking those object files together discards all but one of those inline function instances -> you end up with only the AVX2 or only the AVX512 variant (or one of their sub-variants) as a single function
- dispatching (likely; I haven't fully investigated how numpy dispatching works) will then call a supposedly AVX2 function, but the linker might have chosen the AVX512 one -> crash at runtime on CPUs not supporting AVX512

Giving all functions internal linkage fixes that:

- many functions in x86-simd-sort are already defined as static inline, which is similar to an anonymous namespace
- trouble: static inline cannot be applied to these template specializations (a storage class is not allowed on an explicit specialization), so an anonymous namespace has to be used instead, which means changes in more places
- numpy includes x86-simd-sort, so all functions in x86-simd-sort would need internal linkage. That pessimizes all other users of the library: under "normal circumstances" inline would be enough, but now a binary may contain multiple copies of a function where one would have sufficed
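The anonymous-namespace alternative can be sketched as follows. `qsort_kernel` is a hypothetical name and `std::sort` again stands in for the vectorized code; the point is that `static` is rejected on an explicit specialization, so the primary template and its specializations are wrapped in an unnamed namespace instead:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Everything in an unnamed namespace has internal linkage: each .cpp that
// includes this (hypothetical) header gets a private copy, so copies compiled
// with different -m flags are never merged by the linker.
namespace {

template <typename T>
void qsort_kernel(T *arr, int64_t arrsize);

// Writing `template <> static void qsort_kernel...` would be rejected: an
// explicit specialization may not carry a storage class and takes its linkage
// from the primary template, which is internal here.
template <>
void qsort_kernel<uint16_t>(uint16_t *arr, int64_t arrsize)
{
    std::sort(arr, arr + arrsize);  // placeholder for the SIMD kernel
}

}  // namespace
```

The cost is the one described above: every TU that includes the header carries its own copy, even when a single shared copy would have been fine.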

And finally, I think templates and explicit specializations are overused or even misused in both numpy and x86-simd-sort. A common pattern seems to be:

#include <cstdint>

template <typename T>
void foo(T *arr, int64_t arrsize);
template <>
void foo(int16_t *arr, int64_t arrsize) { /* impl */ }
template <>
void foo(uint16_t *arr, int64_t arrsize) { /* impl */ }
// ...

Why are these templates and not simply overloaded functions? That would make, e.g., adding static for internal linkage (as above) much easier.
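For comparison, a sketch of the overload-based alternative suggested here, with hypothetical names and `std::sort` as a placeholder kernel. Because these are ordinary functions, `static` can be applied to each one directly:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical overload set replacing `template <> void foo(int16_t *, ...)`
// etc. Ordinary functions accept `static`, giving each TU its own copy.
static void qsort_overload(int16_t *arr, int64_t arrsize)
{
    std::sort(arr, arr + arrsize);  // placeholder for the int16 kernel
}

static void qsort_overload(uint16_t *arr, int64_t arrsize)
{
    std::sort(arr, arr + arrsize);  // placeholder for the uint16 kernel
}
```

A call such as `qsort_overload(ptr, n)` picks the right kernel from the pointer type, much as template argument deduction did, without needing a primary template declaration.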

r-devulap · 2023-12-13T04:51:02Z

#25376 should fix this build issue. Could you please verify?

Flamefire · 2023-12-13T08:15:46Z

I checked out the PR locally and ran `pip install .` in an environment where the build failed before; with that PR it succeeds.

However, I still think numpy violates the ODR by linking together functions compiled with different architecture flags, which may lead to runtime crashes depending on the linker and the target environment/CPU.

seiko2plus · 2023-12-13T08:45:32Z

> numpy compiles the same file, with the same includes multiple times with different AVX flags

That's true, but each compilation exports symbols with unique suffixes based on a preprocessor definition passed to the compiler, e.g. -DNPY_MTARGETS_CURRENT=AVX512_ICL.
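A sketch of how such per-target suffixing can be done with token pasting. The `DEMO_*` macros are hypothetical stand-ins, not NumPy's actual dispatch macros; only `NPY_MTARGETS_CURRENT` comes from the compile command above, and it is given a default here so the snippet compiles on its own:

```cpp
#include <cassert>
#include <cstring>

// In a real build this is passed on the command line, e.g.
// -DNPY_MTARGETS_CURRENT=AVX512_ICL; defaulted here so the sketch compiles.
#ifndef NPY_MTARGETS_CURRENT
#define NPY_MTARGETS_CURRENT AVX512_ICL
#endif

// Hypothetical helpers. The two-level expansion makes sure
// NPY_MTARGETS_CURRENT is replaced by its value before pasting.
#define DEMO_CAT_(a, b) a##_##b
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)
#define DEMO_CURFX(name) DEMO_CAT(name, NPY_MTARGETS_CURRENT)
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)

// Compiled once per target, this defines sort_kernel_name_AVX512_ICL,
// sort_kernel_name_AVX2, ... so the object files never share a symbol name.
const char *DEMO_CURFX(sort_kernel_name)()
{
    return DEMO_STR(DEMO_CURFX(sort_kernel));
}
```

Only functions spelled through such a macro get a unique name; helpers pulled in from shared headers keep their plain names, which is where the disagreement in the following replies lies.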

> generated code of those inline functions is now different (e.g. function is compiled with AVX2 and AVX512 in 2 cpp files)
> dispatching now (likely, haven't fully investigated how the numpy dispatching works) will dispatch to a supposedly AVX2 function but the linker might have chosen the AVX512 function -> crash at runtime on CPUs not supporting AVX512

If the compiler fails to inline a function, the priority goes to the lowest-interest target; that's why we tend to export unique weak symbols for each TU. See #25045 (comment) for more clarification.

> And finally I think in both numpy and x86-simd-sort the use of templates and explicit specializations is overused or even misused. A common pattern seems to be:

Both NumPy and x86-simd-sort use suffixed functions for SIMD kernels in both C and C++ sources to avoid ODR violations.

Flamefire · 2023-12-13T09:01:19Z

But the suffixing is not exhaustive, as this issue shows: avx512_qsort is called from a file compiled with -mavx512fp16 and again from a file compiled without it, as shown in #25274 (comment). It is possible that this function and everything it calls are not fully inlined into the TU.

The same applies to most functions in x86-simd-sort: they are templates that are only "inline", i.e. weak symbols rather than unique ones (which would require static or an anonymous namespace). So this relies heavily on the linker sorting it out correctly.

seiko2plus · 2023-12-13T10:43:09Z

> But the suffixing is not exhaustive as can be seen by this issue: avx512_qsort is called from a file compiled with -mavx512fp16

That's because you are raising the baseline feature set itself. When the NumPy module is loaded, a validation step raises a Python runtime error if the running machine does not support the baseline features, in order to avoid illegal-instruction errors.
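A rough sketch of what such a load-time baseline check can look like, using the GCC/Clang builtin `__builtin_cpu_supports`. The function name and the two-feature list are illustrative assumptions, not NumPy's implementation; a real check would cover every feature in the configured baseline and turn a non-empty result into a Python exception:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the baseline features the running CPU lacks.
// The feature list is an illustrative subset, not NumPy's real baseline.
std::vector<std::string> missing_baseline_features()
{
    std::vector<std::string> missing;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initializes the CPU model data the checks read
    if (!__builtin_cpu_supports("avx2")) {
        missing.push_back("avx2");
    }
    if (!__builtin_cpu_supports("avx512f")) {
        missing.push_back("avx512f");
    }
#endif
    return missing;  // empty: safe to run code compiled for this baseline
}
```

Running this before any baseline-compiled kernel executes is what turns a would-be SIGILL into a readable error at import time.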

> It might be possible that this function and all possibly called functions are not fully inlined into the TU
> This applies similar to most functions in x86-simd-sort which are templates that are only "inline", i.e. weak symbols and not

Let's distinguish between two situations. The first involves weak symbols, which occur when the C++ compiler fails to inline inline functions; in this case, the linker silently selects the first of the duplicated symbols. The second involves global symbols, which occur when the C++ compiler fails to inline non-inline functions (our issue here); any duplicated symbol then leads to a link-time error.

As for duplicated weak symbols, the current approach is safe as long as SIMD kernels have unique symbols and, for non-suffixed functions, the copies from the lowest-interest target are the ones chosen. Duplicated global symbols shouldn't occur if we adhere to the standard, and if they do, they are detected at build time.

seiko2plus · 2023-12-15T10:39:02Z

Re-opened this issue until the backport of #25376 is merged.

Flamefire · 2023-12-15T13:03:00Z

> re-opened this issue till the backport of #25376 gets merged.

Didn't you merge this 2 minutes earlier, which is why this got closed?

seiko2plus · 2023-12-15T13:38:07Z

> Didn't you merge this 2 minutes prior which is why this got closed?

Yes, I forgot to unlink it. However, since this issue relates to 1.26.x, I thought it better to leave it open until the backport lands and the author of the issue confirms the fix.

Flamefire · 2023-12-15T14:13:11Z

OK, so you meant until it is merged into 1.26.x.

FWIW, I have already made backport patches for 1.25.1 that also apply to 1.26.2.

@branfosj confirmed that those solve the issue on his system too.

seiko2plus · 2023-12-23T13:11:49Z

Backported by #25475, thank you everyone.

branfosj added the 00 - Bug label Nov 29, 2023

r-devulap self-assigned this Dec 5, 2023

boegel mentioned this issue Dec 6, 2023

{lang}[gfbf/2023.09] SciPy-bundle v2023.11 easybuilders/easybuild-easyconfigs#19318 (merged)

Flamefire mentioned this issue Dec 10, 2023

Multiple definition issue due to explicit function template specialization intel/x86-simd-sort#111 (closed)

charris added this to the 1.26.3 release milestone Dec 11, 2023

r-devulap mentioned this issue Dec 12, 2023

BUG: Fix build issues on SPR and avx512_qsort float16 #25376 (merged)

This was referenced Dec 14, 2023

Fix numpy build on Sapphire Rapids CPUs in SciPy-bundle-2023.07-gfbf-2023a easybuilders/easybuild-easyconfigs#19419 (merged)

Fix numpy build on Sapphire Rapids CPUs in SciPy-bundle-2023.11-gfbf-2023.09 easybuilders/easybuild-easyconfigs#19425 (merged)

seiko2plus added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Dec 14, 2023

seiko2plus closed this as completed in #25376 Dec 15, 2023

seiko2plus reopened this Dec 15, 2023

seiko2plus closed this as completed Dec 23, 2023



