The current loop in function lookup_2_lanes in faiss/utils/simdlib_emulated.h iterates j from 0 to 31. Inside the loop, an if statement performs one assignment for j < 16 and a different assignment for j >= 16. Unrolling the loop so that the j < 16 and j >= 16 iterations are done together (j and j + 16 in the same iteration) eliminates the if (j < 16) test and halves the number of loop iterations. The loop is then unrolled by a further depth of 2 for both the j < 16 and the j >= 16 halves.

This change reduces the execution time of the bench_ivf_fastscan.py workload by approximately 55% on Power 10 when compiled with CMAKE_INSTALL_CONFIG_NAME=Release. Removing the if (j < 16) statement and unrolling the loop eliminates branch stall cycles and the register dependencies that delay instruction issue. As a result, the unrolled code can issue instructions earlier, reducing the total number of cycles required to execute the function.

This patch copies faiss/utils/simdlib_emulated.h to a new file, faiss/utils/simdlib_emulated_ppc64.h, which contains the new version of lookup_2_lanes. faiss/utils/simdlib.h includes the new file when the macro __PPC64__ is defined, as it is by GCC on Linux and by the XLC clang compiler on AIX. Otherwise, the original simdlib_emulated.h is included.
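The loop transformation described above can be illustrated with a minimal sketch. It is not the code from this patch: it works on a plain 32-byte std::array instead of faiss's simd32uint8, and the names Bytes32, lookup_2_lanes_reference, lookup_2_lanes_unrolled and check_equivalence are hypothetical. It assumes the emulated lookup follows the per-byte semantics of the x86 _mm256_shuffle_epi8 intrinsic: an index byte with the high bit set yields 0; otherwise its low 4 bits select a byte from the 16-byte table of its own 128-bit lane (table bytes 0..15 for the low lane, 16..31 for the high lane).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 32-byte emulated SIMD register (not the faiss type).
using Bytes32 = std::array<uint8_t, 32>;

// Reference form: one pass over all 32 bytes with a branch on j.
// Assumed layout: table[0..15] is the low-lane table, table[16..31] the high-lane table.
Bytes32 lookup_2_lanes_reference(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 32; j++) {
        if (idx[j] & 0x80) {
            c[j] = 0;  // high bit set: output byte is zeroed
        } else {
            uint8_t i = idx[j] & 15;  // low 4 bits select a table entry
            if (j < 16) {
                c[j] = table[i];       // low lane
            } else {
                c[j] = table[16 + i];  // high lane
            }
        }
    }
    return c;
}

// Unrolled form: each iteration handles bytes j and j + 16 together, so the
// branch on j disappears, and is unrolled a further depth of 2 (j + 1, j + 17).
// The loop runs 8 times instead of 32.
Bytes32 lookup_2_lanes_unrolled(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 16; j += 2) {
        // Low lane: bytes j and j + 1 read table[0..15].
        c[j] = (idx[j] & 0x80) ? 0 : table[idx[j] & 15];
        c[j + 1] = (idx[j + 1] & 0x80) ? 0 : table[idx[j + 1] & 15];
        // High lane: bytes j + 16 and j + 17 read table[16..31].
        c[j + 16] = (idx[j + 16] & 0x80) ? 0 : table[16 + (idx[j + 16] & 15)];
        c[j + 17] = (idx[j + 17] & 0x80) ? 0 : table[16 + (idx[j + 17] & 15)];
    }
    return c;
}

// Self-check: for each starting value v (0..255), fill the index bytes with
// v + k (wrapping mod 256, so high-bit cases are covered) and assert that
// both forms produce identical output.
void check_equivalence() {
    Bytes32 table{};
    for (int k = 0; k < 32; k++) {
        table[k] = static_cast<uint8_t>(100 + k);
    }
    for (int v = 0; v < 256; v++) {
        Bytes32 idx{};
        for (int k = 0; k < 32; k++) {
            idx[k] = static_cast<uint8_t>(v + k);
        }
        assert(lookup_2_lanes_reference(table, idx) ==
               lookup_2_lanes_unrolled(table, idx));
    }
}
```

The four independent assignments per iteration give the compiler and the core more work that does not depend on a preceding compare-and-branch, which is the effect the commit message attributes the speedup to. Whether the remaining 0x80 test is compiled to a branch or a select depends on the target and optimization level.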