
faiss::gpu::runMatrixMult ... cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256 #2064

Open
anirudhajith opened this issue Sep 23, 2021 · 23 comments

@anirudhajith

anirudhajith commented Sep 23, 2021

Summary

I'm trying to train an IVFPQ index on 100,000 768-dimensional embeddings on an NVIDIA GPU with 40537MiB of memory. The code fails at index.train() with the following error message:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256
Aborted (core dumped)
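
The number in `cublas failed (13)` is a cuBLAS status code. Below is a minimal sketch that extracts it from a faiss error message and maps it to its `cublasStatus_t` name. The table lists only a subset of the codes from the cuBLAS documentation, and the helper name is hypothetical. Code 13 is `CUBLAS_STATUS_EXECUTION_FAILED`, which cuBLAS returns when the GPU program fails to launch or run. This is distinct from `CUBLAS_STATUS_INVALID_VALUE` (7), which signals rejected parameters.

```python
import re
from typing import Optional

# Subset of cublasStatus_t values as listed in the cuBLAS documentation.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_error(message: str) -> Optional[str]:
    """Hypothetical helper: name the status in a 'cublas failed (N)' message, or None if absent."""
    match = re.search(r"cublas failed \((\d+)\)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return CUBLAS_STATUS.get(code, f"unknown status {code}")


msg = "details: cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256)"
print(decode_cublas_error(msg))  # CUBLAS_STATUS_EXECUTION_FAILED
```
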

Platform

OS: Ubuntu 20.04

Faiss version: faiss-gpu 1.7.1.post2

Installed from: anaconda (pip install faiss-gpu)

Faiss compilation options: Nothing explicitly

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

import faiss
from math import log2

flatK, D, K = 100, 64, 256    # IVF lists, PQ sub-quantizers, centroids per sub-quantizer
res = faiss.StandardGpuResources()
n = train_embeddings.shape[1]    # train_embeddings: float32 array of shape (100000, 768), so n = 768
quantizer = faiss.IndexFlatL2(n)
index = faiss.IndexIVFPQ(quantizer, n, flatK, D, round(log2(K)))    # round(log2(256)) = 8 bits per code
co = faiss.GpuClonerOptions()
co.useFloat16 = True
index = faiss.index_cpu_to_gpu(res, 2, index, co)    # to use GPU2 on a multi-GPU VM
index.train(train_embeddings)                        # error
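
The shapes in the failing GEMM appear to come from PQ training inside `index.train()`. With n = 768 and D = 64 sub-quantizers, each sub-vector has 12 dimensions, which matches the `k 12` in the message. `round(log2(K))` = 8 bits gives 256 centroids, which matches the 256 columns. The 1024 rows are presumably a batch of training sub-vectors chosen internally by faiss; that part is an assumption. The sketch below works through this arithmetic, using a hypothetical helper name:

```python
from math import log2
from typing import Tuple


def pq_subproblem_shapes(d: int, m: int, k: int) -> Tuple[int, int, int]:
    """Hypothetical helper: sub-vector dimension, bits per code and centroids per sub-quantizer."""
    if d % m != 0:
        raise ValueError(f"d={d} is not divisible by m={m}")
    nbits = round(log2(k))  # same expression as in the reproduction
    dsub = d // m           # dimension of each PQ sub-vector (the GEMM's inner dimension)
    ksub = 2 ** nbits       # centroids per sub-quantizer
    return dsub, nbits, ksub


print(pq_subproblem_shapes(768, 64, 256))  # (12, 8, 256)
```
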
@xzyaoi

xzyaoi commented Sep 24, 2021

What is your CUDA version? If it is >=11.2, have you tried on CUDA 10?

@anirudhajith
Author

anirudhajith commented Sep 24, 2021

@xzyaoi My CUDA version is 10.1.
I also checked if the same code runs correctly with the GPU-specific lines commented out, and it does. I'm still not able to get it to run using the GPU though.

@anirudhajith changed the title from the full assertion message to the shortened title above, Sep 24, 2021
@mdouze added the GPU label Sep 28, 2021
@wickedfoo
Contributor

What kind of GPU are you using? 40 GiB makes me think of A100, which really should require CUDA 11?
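
To make this point concrete: each GPU generation needs a minimum CUDA toolkit version. The sketch below uses a hand-written table covering the GPUs mentioned in this thread:

- A100 is compute capability 8.0, first supported in CUDA 11.0.
- RTX 3090 and RTX A5000 are 8.6, first supported in CUDA 11.1.
- Quadro RTX 5000 is 7.5, supported since CUDA 10.0.

The table and helper are illustrative and hypothetical. Whether a particular faiss-gpu binary actually contains kernels for a given architecture depends on how it was compiled, which this does not check. Several later reports use CUDA 11.1 or newer and still fail, so this check alone does not explain the error.

```python
from typing import Tuple

# First CUDA toolkit release supporting each compute capability (subset for this thread).
MIN_CUDA_FOR_SM = {
    (7, 5): (10, 0),  # Turing, e.g. Quadro RTX 5000
    (8, 0): (11, 0),  # Ampere GA100, e.g. A100
    (8, 6): (11, 1),  # Ampere GA10x, e.g. RTX 3090, RTX A5000
}


def toolkit_supports(compute_capability: Tuple[int, int], cuda_version: Tuple[int, int]) -> bool:
    """Hypothetical helper: True if the CUDA toolkit version can target the GPU architecture."""
    required = MIN_CUDA_FOR_SM.get(compute_capability)
    if required is None:
        raise KeyError(f"no entry for compute capability {compute_capability}")
    return cuda_version >= required  # tuples compare as (major, minor)


print(toolkit_supports((8, 0), (10, 1)))  # False: an A100 with CUDA 10.1
print(toolkit_supports((8, 6), (11, 6)))  # True
```
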

@MotiBaadror

Hi @wickedfoo, my CUDA version is 11, but it still shows the error.

@monchin

monchin commented Jan 24, 2022

Same error here. Have you solved it?

@tengteng-Lin

I have the same error with RTX3090

@MotiBaadror

MotiBaadror commented Mar 2, 2022 via email

@anirudhajith
Author

anirudhajith commented Mar 2, 2022

@MotiBaadror Can you tell us which versions of CUDA, faiss, faiss-gpu, etc. you were using when you finally got it to work? Were you using A100 GPUs?

@zhoujianch

I have the same error with an RTX 3090. Could someone please help?

@tengteng-Lin

tengteng-Lin commented Mar 22, 2022 via email

@zhoujianch

@tengteng-Lin Thanks for your reply.
My faiss versions are faiss-cpu=1.7.2 and faiss-gpu=1.7.2, but it still does not work for me.
Did you compile the library from source?

@anirudhajith reopened this Mar 22, 2022
@TevenLeScao

Seeing this with A100 / CUDA 11.5 / faiss-gpu=1.7.2

@Yu-Shi

Yu-Shi commented May 10, 2022

Seeing this with A100 / CUDA 11.1 / faiss-gpu=1.7.2. The error occurs at the search step of a flat index.

@F0rt1s

F0rt1s commented Jun 20, 2022

I am running into the same issue on an RTX 3090 (Ubuntu, driver 510.73.05, CUDA 11.6).

@ghost

ghost commented Jun 24, 2022

Showing me this error with CUDA 11.6, RTX 3090, faiss-gpu=1.7.2.

@AlexGreason

Seeing this with CUDA 11.1, RTX 3090, faiss-gpu=1.7.2.

@vlievin

vlievin commented Jul 30, 2022

Same error with:

  • faiss-gpu==1.7.2 as well as faiss-gpu==1.6.5
  • cudatoolkit==11.6.0 and cudatoolkit==11.3.1
  • Quadro RTX 5000

Reinstalling from scratch and upgrading or downgrading faiss did not solve the problem. Any hint would be appreciated.

@future-xy

Same here:
faiss-gpu 1.7.2
cuda 11.6
RTX A5000

@zhangxiangxiao

Same with
faiss-gpu 1.7.2
CUDA 11.7
RTX 3090

@zhangxiangxiao

zhangxiangxiao commented Sep 12, 2022

Update: The error occurs when I use the faiss-gpu PIP package from https://github.com/kyamagu/faiss-wheels (in Rocky Linux 9 with Python 3.9 and CUDA 11.7). If I use Anaconda3 with Python 3.8 and install the faiss-gpu from pytorch conda repo with cuda 11.3 (which is the officially supported manner), the error no longer appears. Perhaps this should have been an issue in that repo instead.
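
The working and failing setups in this thread differ mainly in how faiss was installed: the kyamagu/faiss-wheels PIP package versus the pytorch conda channel. It may therefore help to record exactly which faiss distributions are present. Below is a stdlib-only sketch, assuming Python 3.8+ for `importlib.metadata`:

- It only sees packages that ship Python distribution metadata, so some conda builds may not show up.
- The list of names it checks is an assumption.
- It also flags the case where both faiss-cpu and faiss-gpu are installed (as in one report in this thread), since both provide the same `faiss` module.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names to look for (an assumption; adjust for your environment).
FAISS_DISTRIBUTIONS = ("faiss-cpu", "faiss-gpu", "faiss")


def installed_faiss(names: Iterable[str] = FAISS_DISTRIBUTIONS) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is not found."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


report = installed_faiss()
print(report)
if report["faiss-cpu"] and report["faiss-gpu"]:
    print("warning: both faiss-cpu and faiss-gpu are installed; both provide the 'faiss' module")
```
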

@Victorwz

Victorwz commented Nov 7, 2022

Replying to @zhangxiangxiao's update above:

Thank you so much. I also fixed this issue on an A100 GPU by following your suggestion. My environment is python==3.8, cuda==11.3, faiss-gpu==1.7.2, torch==1.9.1+cu111.

@Kin-Zhang

Kin-Zhang commented Jul 19, 2023

  • env: python==3.9, cuda==11.4, faiss-gpu==1.7.4 / A100

I also hit this error in the environment above, but haven't yet tried the suggested fix of downgrading the Python version.

After downgrading Python to 3.8 and following #2064 (comment), it works!

@vikmary

vikmary commented Dec 21, 2023

What helped me was installing this specific faiss-gpu==1.7.3 wheel:

pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
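
The wheel above is built for CPython 3.10 on x86-64 Linux, as shown by the `cp310` and `manylinux` tags in its filename. pip will therefore refuse it on other interpreters, such as the Python 3.8 environments mentioned earlier. Below is a small sketch that reads the Python tag from a wheel filename, following the PEP 427 naming convention. The helper names are hypothetical, and the check only looks for an exact CPython tag; generic tags such as `py3` are not treated as matches.

```python
import sys
from typing import List, Sequence


def wheel_python_tags(filename: str) -> List[str]:
    """Return the Python tag(s) from a wheel filename (PEP 427 naming convention)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # name-version[-build]-python-abi-platform
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[-3].split(".")  # compressed tag sets such as 'py2.py3' are dot-separated


def matches_interpreter(filename: str, version: Sequence[int] = sys.version_info[:2]) -> bool:
    """True if the wheel has an exact CPython tag for the given (major, minor) version."""
    wanted = f"cp{version[0]}{version[1]}"
    return wanted in wheel_python_tags(filename)


wheel = "faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
print(wheel_python_tags(wheel))             # ['cp310']
print(matches_interpreter(wheel, (3, 10)))  # True
print(matches_interpreter(wheel, (3, 8)))   # False
```
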
