Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve IndexPQFastScan and IndexIVFPQFastScan performance for aarch64 devices #1815

Closed
wants to merge 7 commits into from

Conversation

vorj
Copy link
Contributor

@vorj vorj commented Apr 12, 2021

related: #1812

This PR improves the performance of IndexPQFastScan and IndexIVFPQFastScan on aarch64 devices, e.g., 60x faster on an AWS Arm instance with the SIFT1M dataset.
The contents of this PR are below:

  • Add simdlib_neon.h
    • simdlib_neon.h has simdlib compatible API, and they are implemented with Arm NEON intrinsics.
    • simdlib.h includes simdlib_neon.h if __aarch64__ is defined.
  • Move geteven , getodd , getlow128 , and gethigh128 from distances_simd.cpp to simdlib_avx2.h .
  • Port geteven , getodd , getlow128 , and gethigh128 for non-AVX2 environments.
    • These codes are implemented with AVX2 intrinsics, so they have prevented to implement compute_PQ_dis_tables_dsub2 for non-AVX2 environments.
    • Now simdlib_avx2.h , simdlib_emulated.h , and simdlib_neon.h all have those functions.
  • Enable compute_PQ_dis_tables_dsub2 on aarch64
    • Above change makes compute_PQ_dis_tables_dsub2 independent from geteven and so on.
    • compute_PQ_dis_tables_dsub2 implemented with simdlib_neon.h is little faster than current implementation, so enabling that.
      • In contrast, compute_PQ_dis_tables_dsub2 implemented with simdlib_emulated.h is slower than current implementation, so we have not enabled it in our PR.

@mdouze
Copy link
Contributor

mdouze commented Apr 14, 2021

Looks good! Unfortunately there is no continuous build for ARM yet.
But it should be fine to integrate in Faiss.

@facebook-github-bot
Copy link
Contributor

@mdouze has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mdouze merged this pull request in fe777d8.

facebook-github-bot pushed a commit that referenced this pull request May 20, 2021
Summary:
related: #1815,  #1880

`vshl` / `vshr` of ARM NEON requires immediate (compiletime constant) value as shift parameter.
However, the implementations of those intrinsics on GCC can receive runtime value.
Current faiss implementation depends on this, so some correct-behavioring compilers like Clang can't build faiss for aarch64.
This PR fix this issue; thus faiss applied this PR can be built with Clang for aarch64 machines like M1 Mac.

Pull Request resolved: #1882

Reviewed By: beauby

Differential Revision: D28465563

Pulled By: mdouze

fbshipit-source-id: e431dfb3b27c9728072f50b4bf9445a3f4a5ac43
@vorj vorj mentioned this pull request Mar 17, 2023
facebook-github-bot pushed a commit that referenced this pull request Mar 23, 2023
Summary:
related: #1916, #1979, #2009, #2013, #2195, #2210

After my PR #1815 had been merged `-DCMAKE_BUILD_TYPE=Debug` has been invalid on aarch64, and many people have been hit the problem. (sorry to inconvenience...)
This PR fixes this.

### Details:

Using the function pointers of intrinsics on run-time context causes the link errors.
`-DCMAKE_BUILD_TYPE=Release` has been available because compiler optimizer can propagate and collapse the function pointers as constant.
However, when the pointers doesn't collapsed the link errors occurred, so `-DCMAKE_BUILD_TYPE=Debug` has been unavailable.
To prevent the link errors, I've replaced the function pointers of intrinsics on run-time context to on compile-time context explicitly.

Pull Request resolved: #2768

Reviewed By: mdouze

Differential Revision: D44296147

Pulled By: alexanderguzhva

fbshipit-source-id: 81fa013c5e05a486b6b82cb85d76eeefdefca891
mdouze added a commit that referenced this pull request Mar 23, 2023
* fix windows test (#2775)

Summary: Pull Request resolved: #2775

Reviewed By: algoriddle

Differential Revision: D44210010

fbshipit-source-id: b9b620a4b0a874e09ee2f6082ff0f9463716fdf4

* faiss/utils/simdlib_avx2.h: avoid C++20 ambiguous overloaded operator (#2772)

Summary:
Pull Request resolved: #2772

Resolves errors from overloaded ambiguous operators:
```
faiss/utils/partitioning.cpp:283:34: error: ISO C++20 considers use of overloaded operator '==' (with operand types 'faiss::simd16uint16' and 'faiss::simd16uint16') to be ambiguous despite there being a unique best viable function [-Werror,-Wambiguous-reversed-operator]
```

Reviewed By: alexanderguzhva, meyering

Differential Revision: D44186458

fbshipit-source-id: 0257fa0aaa4fe74c056bef751591f5f7e5357c9d

* Fix Debug Build on aarch64 (#2768)

Summary:
related: #1916, #1979, #2009, #2013, #2195, #2210

After my PR #1815 had been merged `-DCMAKE_BUILD_TYPE=Debug` has been invalid on aarch64, and many people have been hit the problem. (sorry to inconvenience...)
This PR fixes this.

### Details:

Using the function pointers of intrinsics on run-time context causes the link errors.
`-DCMAKE_BUILD_TYPE=Release` has been available because compiler optimizer can propagate and collapse the function pointers as constant.
However, when the pointers doesn't collapsed the link errors occurred, so `-DCMAKE_BUILD_TYPE=Debug` has been unavailable.
To prevent the link errors, I've replaced the function pointers of intrinsics on run-time context to on compile-time context explicitly.

Pull Request resolved: #2768

Reviewed By: mdouze

Differential Revision: D44296147

Pulled By: alexanderguzhva

fbshipit-source-id: 81fa013c5e05a486b6b82cb85d76eeefdefca891

---------

Co-authored-by: Matthijs Douze <matthijs@meta.com>
Co-authored-by: Jeff Palm <palmje@meta.com>
Co-authored-by: Y.Imaizumi <40021161+vorj@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants