Unroll loop in lookup_2_lanes
The current loop in function lookup_2_lanes in faiss/utils/simdlib_emulated.h
goes from 0 to 31.  It has an if statement that does one assignment for
j < 16 and a different assignment for j >= 16.  Unrolling the loop so that
the j < 16 and the j >= 16 iterations are handled in the same pass
eliminates the if (j < 16) test and cuts the number of loop iterations in
half.

Then unroll the loop for the j < 16 and the j >= 16 cases to a depth of 2.
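
The two steps above can be sketched as follows. This is a minimal
illustration, not the code from the patch: the Bytes32 type, the lookup_byte
helper, and both function names are hypothetical, and the per-byte rule
(a set high bit yields 0, otherwise the low 4 bits index a 16-byte lane) is
an assumption made only so the example is complete.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the emulated 32-byte SIMD register.
using Bytes32 = std::array<uint8_t, 32>;

// Assumed per-byte rule (not taken from the patch): a set high bit yields 0,
// otherwise the low 4 bits select a byte in the 16-byte lane starting at base.
static inline uint8_t lookup_byte(const Bytes32& table, uint8_t idx, int base) {
    return (idx & 0x80) ? 0 : table[base + (idx & 15)];
}

// Original shape: 32 iterations, each one branching on j < 16.
Bytes32 lookup_2_lanes_loop(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 32; j++) {
        if (j < 16) {
            c[j] = lookup_byte(table, idx[j], 0);
        } else {
            c[j] = lookup_byte(table, idx[j], 16);
        }
    }
    return c;
}

// Unrolled shape: both lanes are handled in the same pass (no branch on j),
// and each lane is unrolled to a depth of 2, leaving 8 iterations.
Bytes32 lookup_2_lanes_unrolled(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 16; j += 2) {
        c[j] = lookup_byte(table, idx[j], 0);
        c[j + 1] = lookup_byte(table, idx[j + 1], 0);
        c[j + 16] = lookup_byte(table, idx[j + 16], 16);
        c[j + 17] = lookup_byte(table, idx[j + 17], 16);
    }
    return c;
}
```

The four assignments in the unrolled body write to different output bytes and
do not read each other's results, which is what gives the processor
independent work to issue in the same pass.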

This change results in approximately a 55% reduction in the execution time
for the bench_ivf_fastscan.py workload on Power 10 when compiled with
CMAKE_INSTALL_CONFIG_NAME=Release.

Removing the if (j < 16) statement and unrolling the loop removes branch
stall cycles and register dependencies that delay instruction issue.  As a
result, the unrolled code is able to issue instructions earlier, reducing
the total number of cycles required to execute the function.

This patch makes a copy of faiss/utils/simdlib_emulated.h and names it
faiss/utils/simdlib_emulated_ppc64.h.  The new file has the new version
of lookup_2_lanes.  faiss/utils/simdlib.h includes the new file if
__PPC64__ is defined, as it is by the GCC compiler on Linux or the XLC
clang compiler for AIX.  Otherwise, the original file simdlib_emulated.h
is included.
Carl Love committed Apr 16, 2024
1 parent ab2b7f5 commit f1c2700
Showing 2 changed files with 1,092 additions and 0 deletions.
8 changes: 8 additions & 0 deletions faiss/utils/simdlib.h
@@ -30,8 +30,16 @@
#else

// emulated = all operations are implemented as scalars
#if defined(__PPC64__)

#include <faiss/utils/simdlib_emulated_ppc64.h>

#else

#include <faiss/utils/simdlib_emulated.h>

#endif

// FIXME: make a SSE version
// is this ever going to happen? We will probably rather implement AVX512
