
Improve performance of IndexPQFastScan and IndexIVFPQFastScan on aarch64 and non-AVX2 devices #1812

@matsui528

Summary

Dear @mdouze and all,

Currently, IndexPQFastScan and IndexIVFPQFastScan are optimized only for AVX2, so they are not very fast in non-AVX2 environments.

Our team (@vorj, @n-miyamoto-fixstars, @LWisteria, and @matsui528) has accelerated the above two algorithms for the following two environments:

  • Non-AVX2 environments (e.g., older x86 machines without AVX2)
    • We optimized simdlib_emulated.h, yielding a 4x speedup on SIFT1M.
  • aarch64 (e.g., Raspberry Pi)
    • We implemented simdlib_neon.h, a NEON counterpart of simdlib_avx2.h.
    • This is up to approx. 60x faster than the current code on SIFT1M.
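
To illustrate why the SIMD backend matters so much here, the following is a minimal sketch of the kind of operation a fast-scan kernel relies on with 4-bit codes: a 16-entry byte table lookup applied to 16 lanes at once. This is not the Faiss implementation and none of these names come from simdlib_emulated.h or simdlib_neon.h; `Lanes16`, `lookup16_scalar`, `lookup16_neon`, and `scan16` are hypothetical helpers, and the one-code-per-byte layout is a simplifying assumption. The scalar version needs 16 separate loads, whereas on aarch64 the same lookup maps to a single `vqtbl1q_u8` instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: 16 byte-sized lanes.
using Lanes16 = std::array<std::uint8_t, 16>;

// Scalar emulation of a 16-lane table lookup: out[i] = table[idx[i] & 0x0F].
inline Lanes16 lookup16_scalar(const Lanes16& table, const Lanes16& idx) {
    Lanes16 out{};
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = table[idx[i] & 0x0F];
    }
    return out;
}

#if defined(__aarch64__)
#include <arm_neon.h>
// The same lookup using one NEON table instruction (AArch64 only).
inline Lanes16 lookup16_neon(const Lanes16& table, const Lanes16& idx) {
    const uint8x16_t t = vld1q_u8(table.data());
    // Mask to 4 bits so both versions agree for any input byte.
    const uint8x16_t i = vandq_u8(vld1q_u8(idx.data()), vdupq_n_u8(0x0F));
    Lanes16 out{};
    vst1q_u8(out.data(), vqtbl1q_u8(t, i));
    return out;
}
#endif

// Accumulates quantized distances for 16 database vectors.
// luts[m][c]  : quantized distance from the query to centroid c of
//               sub-quantizer m (16 centroids, i.e. nbits=4).
// codes[m][v] : 4-bit code of vector v for sub-quantizer m
//               (assumed layout: one code per byte, for clarity).
inline std::array<std::uint16_t, 16> scan16(
        const std::vector<Lanes16>& luts,
        const std::vector<Lanes16>& codes) {
    std::array<std::uint16_t, 16> acc{};
    for (std::size_t m = 0; m < luts.size(); ++m) {
        const Lanes16 d = lookup16_scalar(luts[m], codes[m]);
        for (std::size_t v = 0; v < 16; ++v) {
            acc[v] = static_cast<std::uint16_t>(acc[v] + d[v]);
        }
    }
    return acc;
}
```

In this sketch, `scan16` calls the scalar lookup so that it compiles everywhere; swapping in `lookup16_neon` on aarch64 would give the same result for the same inputs.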

If you would be open to merging these changes, we plan to submit a PR. What do you think? The changes are mainly confined to the header files above and do not significantly change the faiss codebase.

Kind regards,

Platform

OS: Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-1038-aws aarch64)

Faiss version: fe7b061

Installed from: compiled by myself

Faiss compilation options: cmake -DFAISS_ENABLE_GPU=OFF -DFAISS_OPT_LEVEL=aarch64 -DCMAKE_BUILD_TYPE=Release

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

We post only the results here. If you need more detailed information, please let us know.
[Image 1]

  • Evaluated on an AWS EC2 ARM instance (c6g.4xlarge)
  • original is the current code
  • improved-emulated is the result of optimizing simdlib_emulated.h, which is faster than original
  • neon is the result with simdlib_neon.h, which is much faster than original

[Image 2]
The image above shows the speedup ratio relative to original.

  • In the best case (M=32, nbits=4, nprobe=16), neon is approx. 60x faster than original (4.5 ms / 0.077 ms ≈ 58x)
    • original: 4.5 ms
    • neon: 0.077 ms
