
Faiss runs very slowly on M1 Mac #2386

SupreethRao99 opened this issue Jul 15, 2022 · 18 comments

@SupreethRao99

SupreethRao99 commented Jul 15, 2022

Summary

Running inference on a saved index is painfully slow on an M1 Pro (10-core CPU, 16-core GPU). The index is about 3.4 GB in size; a query takes about 1.5 seconds on the Colab CPU backend but more than 20 minutes on the M1 CPU. What could be the reason for such slow performance?

Platform

OS: macOS 12.4
Faiss version: 1.7.2
Installed from: compiled by self following install.md, and this issue

Faiss compilation options:

LDFLAGS="-L/opt/homebrew/opt/llvm/lib" CPPFLAGS="-I/opt/homebrew/opt/llvm/include" CXX=/opt/homebrew/opt/llvm/bin/clang++ CC=/opt/homebrew/opt/llvm/bin/clang cmake -DFAISS_ENABLE_GPU=OFF -B build .

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

The code that I'm running is as follows:

import time

import numpy as np
import pandas as pd
import faiss
from sentence_transformers import SentenceTransformer

# Training Index
df = pd.read_csv('abcnews-data-text.csv')
data = df.headline_text.to_list()

model = SentenceTransformer('distilbert-base-nli-mean-tokens')
encoded_data = model.encode(data)

index = faiss.IndexIDMap(faiss.IndexFlatIP(768))
index.add_with_ids(encoded_data, np.arange(len(data), dtype=np.int64))
faiss.write_index(index, 'abc_news')

# Inference

def search(query):
    t = time.time()
    query_vector = model.encode([query])
    k = 5
    top_k = index.search(query_vector, k)
    print('totaltime: {}'.format(time.time() - t))
    return [data[_id] for _id in top_k[1].tolist()[0]]

index = faiss.read_index('abc_news')
query = str(input())
results = search(query)
print('results :')
for result in results:
    print('\t', result)
@wx257osn2
Contributor

@SupreethRao99 It seems that you built Faiss in Debug mode. Adding -DCMAKE_BUILD_TYPE=Release may help you.

@SupreethRao99
Author

@wx257osn2 Thank you. I tried the approach you suggested, adding -DCMAKE_BUILD_TYPE=Release when building FAISS, but inference still takes >10 minutes.

@wx257osn2
Contributor

wx257osn2 commented Jul 19, 2022

@SupreethRao99 Thanks for trying. Hmm... which BLAS library did you install? It seems that IndexFlatIP calls into it.

@mdouze mdouze added the install label Jul 20, 2022
@mdouze
Contributor

mdouze commented Jul 20, 2022

Indeed, the speed of flat search is dominated by the BLAS sgemm call. The CMake logs may indicate which BLAS implementation is used.
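To illustrate why the BLAS choice matters so much here: an IndexFlatIP search is conceptually one sgemm (queries times database transposed) followed by a top-k selection. This NumPy sketch is an illustration of that computation, not Faiss's actual code path; the array sizes are made up.

```python
import numpy as np

# Illustration: flat inner-product search is one matrix product (the sgemm
# that Faiss delegates to the linked BLAS) plus a top-k selection.
rng = np.random.default_rng(0)
d, nb, nq, k = 64, 1000, 5, 3
xb = rng.standard_normal((nb, d)).astype(np.float32)  # database vectors
xq = rng.standard_normal((nq, d)).astype(np.float32)  # query vectors

scores = xq @ xb.T  # the sgemm call: (nq, nb) matrix of inner products
topk = np.argpartition(-scores, k, axis=1)[:, :k]  # unordered top-k ids
order = np.argsort(-np.take_along_axis(scores, topk, axis=1), axis=1)
ids = np.take_along_axis(topk, order, axis=1)  # top-k ids, best first
```

Since the matrix product dominates the cost for a large flat index, a slow sgemm makes the whole search slow, regardless of anything Faiss does around it.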

@SupreethRao99
Author

Thank you @mdouze @wx257osn2, the CMake logs are as follows:

(SemanticSearch) supreeth@Supreeths-MacBook-Pro faiss % LDFLAGS="-L/opt/homebrew/opt/llvm/lib" CPPFLAGS="-I/opt/homebrew/opt/llvm/include" CXX=/opt/homebrew/opt/llvm/bin/clang++ CC=/opt/homebrew/opt/llvm/bin/clang cmake -DCMAKE_BUILD_TYPE=Release -DFAISS_ENABLE_GPU=OFF -B build .
-- The CXX compiler identification is Clang 14.0.6
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/homebrew/opt/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0") 
-- Found OpenMP: TRUE (found version "5.0")  
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Could NOT find MKL (missing: MKL_LIBRARIES) 
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Looking for dgemm_
-- Looking for dgemm_ - found
-- Found BLAS: /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework  
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework;-lm;-ldl  
-- Found SWIG: /opt/homebrew/bin/swig (found version "4.0.2") found components: python 
-- Found Python: /Users/supreeth/miniforge3/envs/SemanticSearch/include/python3.8 (found version "3.8.13") found components: Development NumPy Interpreter Development.Module Development.Embed 
CMake Deprecation Warning at build/_deps/googletest-src/CMakeLists.txt:4 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is Clang 14.0.6
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/homebrew/opt/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
CMake Deprecation Warning at build/_deps/googletest-src/googletest/CMakeLists.txt:56 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Found PythonInterp: /Users/supreeth/miniforge3/envs/SemanticSearch/bin/python (found version "3.8.13") 
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/supreeth/SemanticSearch/faiss/build
(SemanticSearch) supreeth@Supreeths-MacBook-Pro faiss % make -C build -j faiss 
[  0%] Building CXX object faiss/CMakeFiles/faiss.dir/AutoTune.cpp.o
[  0%] Building CXX object faiss/CMakeFiles/faiss.dir/IVFlib.cpp.o
[  2%] Building CXX object faiss/CMakeFiles/faiss.dir/Clustering.cpp.o
[  2%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexBinary.cpp.o
[  5%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexAdditiveQuantizer.cpp.o
[  8%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexBinaryHNSW.cpp.o
[  8%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexBinaryHash.cpp.o
[  8%] Building CXX object faiss/CMakeFiles/faiss.dir/Index2Layer.cpp.o
[ 10%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexBinaryFlat.cpp.o
[ 13%] Building CXX object faiss/CMakeFiles/faiss.dir/Index.cpp.o
[ 13%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexBinaryFromFloat.cpp.o
[ 13%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexHNSW.cpp.o
[ 13%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexFlat.cpp.o
[ 16%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexFlatCodes.cpp.o
[ 16%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVFAdditiveQuantizer.cpp.o
[ 18%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVF.cpp.o
[ 18%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVFSpectralHash.cpp.o
[ 21%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVFPQFastScan.cpp.o
[ 24%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexBinaryIVF.cpp.o
[ 27%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVFFlat.cpp.o
[ 27%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVFPQ.cpp.o
[ 27%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexNNDescent.cpp.o
[ 27%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVFPQR.cpp.o
[ 29%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexLSH.cpp.o
[ 32%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVFFastScan.cpp.o
[ 35%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexLattice.cpp.o
[ 35%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexIVFAdditiveQuantizerFastScan.cpp.o
[ 35%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexNSG.cpp.o
[ 35%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexFastScan.cpp.o
[ 37%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexPQ.cpp.o
[ 40%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexAdditiveQuantizerFastScan.cpp.o
[ 40%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexPQFastScan.cpp.o
[ 43%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexShards.cpp.o
[ 43%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexScalarQuantizer.cpp.o
[ 45%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexPreTransform.cpp.o
[ 48%] Building CXX object faiss/CMakeFiles/faiss.dir/clone_index.cpp.o
[ 51%] Building CXX object faiss/CMakeFiles/faiss.dir/MetaIndexes.cpp.o
[ 54%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/HNSW.cpp.o
[ 54%] Building CXX object faiss/CMakeFiles/faiss.dir/index_factory.cpp.o
[ 56%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexReplicas.cpp.o
[ 56%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/AdditiveQuantizer.cpp.o
[ 59%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/ProductQuantizer.cpp.o
[ 59%] Building CXX object faiss/CMakeFiles/faiss.dir/VectorTransform.cpp.o
[ 59%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/ScalarQuantizer.cpp.o
[ 59%] Building CXX object faiss/CMakeFiles/faiss.dir/MatrixStats.cpp.o
[ 62%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/AuxIndexStructures.cpp.o
[ 62%] Building CXX object faiss/CMakeFiles/faiss.dir/IndexRefine.cpp.o
[ 62%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/NSG.cpp.o
[ 62%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/PolysemousTraining.cpp.o
[ 62%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/LocalSearchQuantizer.cpp.o
[ 64%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/FaissException.cpp.o
[ 64%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/ResidualQuantizer.cpp.o
[ 67%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/index_read.cpp.o
[ 70%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/ProductAdditiveQuantizer.cpp.o
[ 70%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/pq4_fast_scan.cpp.o
[ 72%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/kmeans1d.cpp.o
[ 72%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/io.cpp.o
[ 72%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/index_write.cpp.o
[ 75%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/pq4_fast_scan_search_1.cpp.o
[ 78%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/lattice_Zn.cpp.o
[ 78%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/pq4_fast_scan_search_qbs.cpp.o
[ 81%] Building CXX object faiss/CMakeFiles/faiss.dir/impl/NNDescent.cpp.o
[ 81%] Building CXX object faiss/CMakeFiles/faiss.dir/invlists/BlockInvertedLists.cpp.o
[ 83%] Building CXX object faiss/CMakeFiles/faiss.dir/invlists/DirectMap.cpp.o
[ 86%] Building CXX object faiss/CMakeFiles/faiss.dir/invlists/InvertedListsIOHook.cpp.o
[ 89%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/distances.cpp.o
[ 89%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/extra_distances.cpp.o
[ 89%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/hamming.cpp.o
[ 89%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/distances_simd.cpp.o
[ 89%] Building CXX object faiss/CMakeFiles/faiss.dir/invlists/InvertedLists.cpp.o
[ 91%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/WorkerThread.cpp.o
[ 91%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/Heap.cpp.o
[ 91%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/quantize_lut.cpp.o
[ 94%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/partitioning.cpp.o
[ 97%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/random.cpp.o
[ 97%] Building CXX object faiss/CMakeFiles/faiss.dir/utils/utils.cpp.o
[100%] Building CXX object faiss/CMakeFiles/faiss.dir/invlists/OnDiskInvertedLists.cpp.o
[100%] Linking CXX static library libfaiss.a
[100%] Built target faiss
(SemanticSearch) supreeth@Supreeths-MacBook-Pro faiss % 

@SupreethRao99
Author

Also, is there any way I can help build and upload FAISS to conda so that people don't have to build from source? Furthermore, are there plans to support GPU acceleration on M1 processors?

@wx257osn2
Contributor

wx257osn2 commented Jul 20, 2022

@SupreethRao99

-- Found BLAS: /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework

Ah, that's it. According to this and this, Apple's Accelerate framework on M1 runs on the AMX coprocessor. The coprocessor seems to be good at power efficiency, but not at run-time speed, especially under multi-threaded execution.
Could you try OpenBLAS built with OpenMP enabled? I'm not sure OpenBLAS will help, but it may, with an appropriate thread count.
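For reference, a rebuild against OpenBLAS could look roughly like this. This is an untested sketch: the Homebrew package names and the /opt/homebrew paths are assumptions about a typical Apple Silicon setup, and CMake's BLA_VENDOR hint is used to steer FindBLAS away from Accelerate.

```shell
# Sketch (untested; assumes Homebrew on Apple Silicon).
brew install openblas libomp

# Point CMake at OpenBLAS instead of letting it pick Accelerate.
cmake -B build . \
  -DCMAKE_BUILD_TYPE=Release \
  -DFAISS_ENABLE_GPU=OFF \
  -DBLA_VENDOR=OpenBLAS \
  -DCMAKE_PREFIX_PATH=/opt/homebrew/opt/openblas

make -C build -j faiss
```

After configuring, the "Found BLAS:" line in the CMake output should name the OpenBLAS library rather than Accelerate.framework.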


Furthermore, are there plans to support GPU acceleration on M1 processors?

I'm not a Meta employee working on Faiss, nor a contributor to the GPU implementation, so the following is just my estimate: I think M1 GPU support is currently not planned, and it would require a lot of new implementation work even if it were, because Faiss uses CUDA for its GPGPU code and CUDA cannot run on the M1 GPU.
It would be a hard road, but the Faiss team would probably welcome your contribution if you implemented it.

@SupreethRao99
Author

Thank you @wx257osn2, I will rebuild FAISS with OpenBLAS and give it a try.

@mdouze
Contributor

mdouze commented Jul 21, 2022

Also, is there any way I can help build and upload FAISS to conda so that people don't have to build from source? Furthermore, are there plans to support GPU acceleration on M1 processors?

Right, as @wx257osn2 says, it is a significant effort to support the GPU version of FAISS in CUDA, so adding support for other types of GPUs, like the M1's, Intel's, or AMD's, is not planned.

@wx257osn2
Contributor

wx257osn2 commented Jul 22, 2022

Let me add a brief note about porting GPGPU code from CUDA to other environments. The cost of porting to AMD GPUs on Linux is relatively low compared with M1 GPUs, Intel GPUs, or AMD GPUs on Windows. AMD is developing a GPGPU environment called ROCm, along with HIP, a wrapper over CUDA and ROCm. The HIP API is mostly CUDA's own, so the cost of porting to HIP is not so high, and HIP code can run on both CUDA and ROCm. There are also high-performance libraries and wrappers such as rocBLAS and hipBLAS. CuPy, a GPU-accelerated NumPy/SciPy-compatible Python module, is a well-known product written in HIP: it was originally written in CUDA and was ported to HIP a few years ago. However, the porting cost is only relatively low, not zero; CuPy showed that even porting to HIP is not easy.
On the other hand, the cost of porting to OpenCL (for AMD on Windows and for Intel) or Metal (for Apple) is much higher. Moreover, maintenance costs would keep growing if those GPUs were supported, because the code cannot be straightforwardly integrated with the CUDA implementation, unlike HIP. That would be too hard. For these reasons, it seems reasonable to me that there is no plan beyond contributors like us doing this work as needed.

@mdouze
Contributor

mdouze commented Jul 26, 2022

Thanks @wx257osn2 for the overview.

To summarize, the issues for us to support alternative hardware are:

  • we need to be able to test the support with CircleCI to track regressions, i.e. the appropriate hardware must be available in CircleCI

  • the stages for support are (1) compiling, (2) passing tests, and (3) optimizing. For step (3), unfortunately, due to hardware and compiler specificities, it is not obvious that the speed of the hardware accelerator is competitive with existing accelerators; sometimes it is even slower than the CPU.

  • we prioritize the hardware we work on ourselves, which is currently NVIDIA GPUs.

  • and finally, we already have trouble maintaining the precompiled packages on the set of platforms we support...

So if anyone is willing to take ownership of other hardware accelerators, we'd be very happy to collaborate ;-)

@kaanbursa

On my M1 Mac, searching from a Jupyter notebook kills my session. I've tried Python 3.7 and 3.9; both kill the session. However, it does work when I start a Python app from PyCharm or the terminal. I'm using faiss-cpu.

@wx257osn2
Contributor

@kaanbursa This issue is about performance, not about whether Faiss works at all. You should open a new issue about your problem, with more information about the installation method, error messages, and so on.

@wx257osn2
Contributor

@SupreethRao99 Do you have any update? Has OpenBLAS helped you?

@SupreethRao99
Author

@wx257osn2 Yes, OpenBLAS does give a good speed-up. Thank you!
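Since wx257osn2 noted that the thread count matters, here is a hedged sketch of capping OpenMP/OpenBLAS threads before running a search. The value 8 and the script name your_search_script.py are illustrative placeholders, not values from this thread; the right count depends on the machine.

```shell
# Assumption: Faiss linked against an OpenMP-enabled OpenBLAS. Capping the
# thread count (e.g. to the performance-core count) can avoid
# oversubscription; the values below are illustrative, tune for your machine.
export OMP_NUM_THREADS=8
export OPENBLAS_NUM_THREADS=8
python your_search_script.py
```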

@nullhook

Potentially this should run on the GPU on M1, but that requires porting the CUDA kernels to Metal.

@ellesharma

@SupreethRao99 How did you rebuild FAISS with OpenBLAS? Newbie to GenAI stuff.

@yusufsyaifudin

@wx257osn2 Yes, OpenBLAS does give a good speed up. Thank you !

@SupreethRao99 can you share how to rebuild FAISS using OpenBLAS and make it faster on M1?
