running demo_ivfpq_indexing_gpu Segmentation fault #67

Closed
Geek0x0 opened this issue Apr 5, 2017 · 11 comments

@Geek0x0

Geek0x0 commented Apr 5, 2017

1. The results are as follows:

[0.382 s] Generating 100000 vectors in 128D for training
[0.540 s] Training the index
Training IVF quantizer on 100000 vectors in 128D
Clustering 100000 points in 128D to 1788 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.032731 s
  Iteration 0 (0.12 s, search 0.09 s): objective=1.43954e+06 imbalance=2.907 nsplit=0
  Iteration 9 (3.68 s, search 3.61 s): objective=930934 imbalance=1.255 nsplit=0
computing residuals
training 4 x 256 product quantizer on 16384 vectors in 128D
Training PQ slice 0/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00141504 s
  Iteration 24 (2.73 s, search 2.45 s): objective=27271.5 imbalance=1.018 nsplit=0       
Training PQ slice 1/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.000438965 s
  Iteration 24 (2.37 s, search 2.06 s): objective=27193.4 imbalance=1.016 nsplit=0       
Training PQ slice 2/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.000931885 s
  Iteration 24 (2.59 s, search 2.25 s): objective=27230.8 imbalance=1.021 nsplit=0       
Training PQ slice 3/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.000437988 s
  Iteration 24 (1.96 s, search 1.78 s): objective=27174 imbalance=1.023 nsplit=0         
[14.164 s] storing the pre-trained index to /tmp/index_trained.faissindex
[14.186 s] Building a dataset of 200000 vectors to index
[14.506 s] Adding the vectors to the index
Segmentation fault (core dumped)

2. Library dependency:

        linux-vdso.so.1 =>  (0x00007ffc876da000)
	libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007f1f42cfb000)
	libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007f1f40263000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1f4005a000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1f3fe3d000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1f3fc39000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1f3f8b6000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1f3f5ad000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f1f3f38b000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1f3f174000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1f3edab000)
	/lib64/ld-linux-x86-64.so.2 (0x0000564584980000)
	libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f1f3ea80000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f1f3e840000)

3. Stack trace (gdb backtrace):

  #0  0x00007fffe9828c9a in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
  #1  0x00007fffe974f696 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
  #2  0x00007fffe9884992 in cuEventDestroy_v2 ()
   from /usr/lib/x86_64-linux-gnu/libcuda.so.1
  #3  0x00000000004f62f4 in cudart::cudaApiEventDestroy(CUevent_st*) ()
  #4  0x0000000000524b94 in cudaEventDestroy ()
  #5  0x0000000000440af8 in faiss::gpu::streamWaitBase<std::vector<CUstream_st*, std::allocator<CUstream_st*> >, std::initializer_list<CUstream_st*> > (
      listWaiting=std::vector of length 2, capacity 2 = {...}, listWaitOn=...)
      at impl/../utils/DeviceUtils.h:131
  #6  0x0000000000479294 in faiss::gpu::streamWait<std::vector<CUstream_st*,   std::allocator<CUstream_st*> > > (b=..., a=std::vector of length 2, capacity 2 = {...})
      at impl/../utils/DeviceUtils.h:140
  #7  faiss::gpu::runL2Distance<float> (resources=0x7fffffffe780, centroids=..., 
    centroidNorms=centroidNorms@entry=0xb9e1a30, queries=..., k=k@entry=1, 
    outDistances=..., outIndices=..., ignoreOutDistances=true, tileSize=256)
    at impl/Distance.cu:110
  #8  0x000000000047032e in faiss::gpu::runL2Distance (resources=<optimized out>, 
    vectors=..., vectorNorms=vectorNorms@entry=0xb9e1a30, queries=..., k=k@entry=1, 
    outDistances=..., outIndices=..., ignoreOutDistances=<optimized out>, tileSize=-1)
    at impl/Distance.cu:307
  #9  0x000000000042e574 in faiss::gpu::FlatIndex::query (this=0xb9e1970, vecs=..., 
    k=k@entry=1, outDistances=..., outIndices=..., 
    exactDistance=exactDistance@entry=false, tileSize=-1) at impl/FlatIndex.cu:121
  #10 0x00000000004432c3 in faiss::gpu::IVFPQ::classifyAndAddVectors (this=0xb9e2d80, 
    vecs=..., indices=...) at impl/IVFPQ.cu:138
  #11 0x0000000000424fd0 in faiss::gpu::GpuIndexIVFPQ::add_with_ids (this=0x7fffffffe8d0, 
    n=200000, x=0x7fffcf580010, xids=0xc0ab390) at GpuIndexIVFPQ.cu:355
  #12 0x000000000042092b in faiss::gpu::GpuIndexIVF::add (this=0x7fffffffe8d0, n=200000, 
    x=0x7fffcf580010) at GpuIndexIVF.cu:254
  #13 0x000000000040e8bf in main () at test/demo_ivfpq_indexing_gpu.cpp:114

4. Hardware information

        01:00.0 3D controller: NVIDIA Corporation GM206M [GeForce GTX 965M] (rev a1)
	DeviceName: NVIDIA N16E-GR
	Subsystem: Hewlett-Packard Company GM206M [GeForce GTX 965M]
	Flags: bus master, fast devsel, latency 0, IRQ 134
	Memory at a3000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 90000000 (64-bit, prefetchable) [size=256M]
	Memory at a0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 4000 [size=128]
	[virtual] Expansion ROM at a4000000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_375_drm, nvidia_375
@mdouze
Contributor

mdouze commented Apr 5, 2017

Hi @caydyn-skd

Thanks for the extensive bug report. I believe this is linked to the low-memory GPU issue discussed in #66 (comment).

Please stay tuned until we have a fix.

@mdouze
Contributor

mdouze commented Apr 6, 2017

Could you try with the current version? It has better low-mem GPU support.
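
For anyone hitting this in the meantime, one thing that may be worth trying is feeding the database to the index in smaller chunks instead of a single index.add (nb, database.data()) call. The sketch below is only an illustration and not a confirmed fix for this crash: addInBatches and batchSize are hypothetical names, and the only assumption about the index is that it exposes add(n, x) the way the demo's index does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Faiss): add `n` vectors of dimension `d`
// to `index` in chunks of at most `batchSize` vectors.
// `Index` is any type with an add(n, x) member, as the demo's index has.
// Returns the number of add() calls that were issued.
template <typename Index>
std::size_t addInBatches(Index& index, std::size_t n, std::size_t d,
                         const float* x, std::size_t batchSize) {
    assert(batchSize > 0);  // a zero batch size would never make progress
    std::size_t calls = 0;
    for (std::size_t start = 0; start < n; start += batchSize) {
        // The last chunk may be shorter than batchSize.
        const std::size_t count = std::min(batchSize, n - start);
        // Vectors are stored contiguously, d floats per vector.
        index.add(count, x + start * d);
        ++calls;
    }
    return calls;
}
```

In the demo this would replace the single add call, with the vector dimension (128 here) passed as d. The batch size is a guess to be tuned; if the crash still happens with small batches, that would suggest the problem is not simply the size of one add.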

@namhyungk

I also get an error when running demo_ivfpq_indexing_gpu, shown below. I am using the most recent version, on a TITAN X (Maxwell) with 12 GB of memory.

$ ./test/demo_ivfpq_indexing_gpu
[0.561 s] Generating 100000 vectors in 128D for training
[0.699 s] Training the index
Training IVF quantizer on 100000 vectors in 128D
Clustering 100000 points in 128D to 1788 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.01 s
  Iteration 9 (0.34 s, search 0.26 s): objective=930934 imbalance=1.255 nsplit=0
computing residuals
training 4 x 256 product quantizer on 16384 vectors in 128D
Training PQ slice 0/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (1.89 s, search 1.52 s): objective=27271.5 imbalance=1.018 nsplit=0
Training PQ slice 1/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (1.99 s, search 1.62 s): objective=27193.4 imbalance=1.016 nsplit=0
Training PQ slice 2/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (1.97 s, search 1.60 s): objective=27230.8 imbalance=1.021 nsplit=0
Training PQ slice 3/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (1.49 s, search 1.20 s): objective=27174 imbalance=1.023 nsplit=0
[8.526 s] storing the pre-trained index to /tmp/index_trained.faissindex
[8.573 s] Building a dataset of 200000 vectors to index
[8.841 s] Adding the vectors to the index
Faiss assertion err == CUBLAS_STATUS_SUCCESS failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with T = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at utils/MatrixMult.cu:141
Aborted (core dumped)

@Geek0x0
Author

Geek0x0 commented Apr 7, 2017

Hi @mdouze, I tried running the latest version but still have the same problem:

caydyn@dev:/home/caydyn/faiss$ git log -1
commit 7abe81b4f6abad56731ec1c27968173c8ce0d322
Author: matthijs <matthijs@fb.com>
Date:   Thu Apr 6 04:33:41 2017 -0700

    Better support for low-mem GPUs
    avoid reading beyond the end of an array in fvec_L2sqr and related functions

#0  0x00007fffe9828c9a in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1  0x00007fffe974f696 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007fffe9884992 in cuEventDestroy_v2 ()
   from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00000000004f9124 in cudart::cudaApiEventDestroy(CUevent_st*) ()
#4  0x00000000005279c4 in cudaEventDestroy ()
#5  0x0000000000442458 in faiss::gpu::streamWaitBase<std::vector<CUstream_st*, std::allocator<CUstream_st*> >, std::initializer_list<CUstream_st*> > (
    listWaiting=std::vector of length 2, capacity 2 = {...}, listWaitOn=...)
    at impl/../utils/DeviceUtils.h:131
#6  0x000000000047a906 in faiss::gpu::streamWait<std::vector<CUstream_st*, std::allocator<CUstream_st*> > > (b=..., a=std::vector of length 2, capacity 2 = {...})
    at impl/../utils/DeviceUtils.h:140
#7  faiss::gpu::runL2Distance<float> (resources=0x7fffffffe770, centroids=..., 
    centroidsTransposed=0x0, centroidNorms=centroidNorms@entry=0xb9e8570, queries=..., 
    k=k@entry=1, outDistances=..., outIndices=..., ignoreOutDistances=true, 
    tileSizeOverride=-1) at impl/Distance.cu:145
#8  0x0000000000471aee in faiss::gpu::runL2Distance (resources=<optimized out>, 
    vectors=..., vectorsTransposed=<optimized out>, 
    vectorNorms=vectorNorms@entry=0xb9e8570, queries=..., k=k@entry=1, outDistances=..., 
    outIndices=..., ignoreOutDistances=<optimized out>, tileSizeOverride=-1)
    at impl/Distance.cu:349
#9  0x000000000042f402 in faiss::gpu::FlatIndex::query (this=0xb9e8420, input=..., 
    k=k@entry=1, outDistances=..., outIndices=..., 
    exactDistance=exactDistance@entry=false, tileSize=-1) at impl/FlatIndex.cu:124
#10 0x0000000000444c23 in faiss::gpu::IVFPQ::classifyAndAddVectors (this=0xc37f3c0, 
    vecs=..., indices=...) at impl/IVFPQ.cu:138
#11 0x0000000000425abb in faiss::gpu::GpuIndexIVFPQ::addImpl_ (this=0x7fffffffe8c0, 
    n=200000, x=<optimized out>, xids=<optimized out>) at GpuIndexIVFPQ.cu:352
#12 0x0000000000417424 in faiss::gpu::GpuIndex::addInternal_ (this=0x7fffffffe8c0, 
    n=200000, x=0x7fffcfd81010, ids=0xbae4260) at GpuIndex.cu:74
#13 0x000000000042134b in faiss::gpu::GpuIndexIVF::add (this=0x7fffffffe8c0, n=200000, 
    x=0x7fffcfd81010) at GpuIndexIVF.cu:259
#14 0x000000000040ea8f in main () at test/demo_ivfpq_indexing_gpu.cpp:114

(gdb) f 14
#14 0x000000000040ea8f in main () at test/demo_ivfpq_indexing_gpu.cpp:114
114	        index.add (nb, database.data());
(gdb) l
109	        }
110	
111	        printf ("[%.3f s] Adding the vectors to the index\n",
112	                elapsed() - t0);
113	
114	        index.add (nb, database.data());
115	
116	        printf ("[%.3f s] done\n", elapsed() - t0);
117	
118	        // remember a few elements from the database as queries


[0.379 s] Generating 100000 vectors in 128D for training
[0.526 s] Training the index
Training IVF quantizer on 100000 vectors in 128D
Clustering 100000 points in 128D to 1788 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.03 s
  Iteration 0 (0.15 s, search 0.12 s): objective=1.43954e+06 imbalance=2.907 nsplit=0
  Iteration 9 (3.68 s, search 3.59 s): objective=930934 imbalance=1.255 nsplit=0
computing residuals
training 4 x 256 product quantizer on 16384 vectors in 128D
Training PQ slice 0/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (3.20 s, search 2.67 s): objective=27271.5 imbalance=1.018 nsplit=0       
Training PQ slice 1/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (2.95 s, search 2.49 s): objective=27193.4 imbalance=1.016 nsplit=0       
Training PQ slice 2/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (3.24 s, search 2.75 s): objective=27230.8 imbalance=1.021 nsplit=0       
Training PQ slice 3/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (2.43 s, search 2.15 s): objective=27174 imbalance=1.023 nsplit=0         
[16.307 s] storing the pre-trained index to /tmp/index_trained.faissindex
[16.353 s] Building a dataset of 200000 vectors to index
[16.649 s] Adding the vectors to the index
Segmentation fault (core dumped)

linux-vdso.so.1 =>  (0x00007fff637dd000)
libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007f0182d83000)
libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007f01802eb000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f01800e2000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f017fec5000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f017fcc1000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f017f93e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f017f635000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f017f413000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f017f1fc000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f017ee33000)
/lib64/ld-linux-x86-64.so.2 (0x00005628e2e86000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f017eb08000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f017e8c8000)

@wickedfoo
Contributor

@eduj36 Can you run nvidia-smi and copy the output here? What version is your driver? Does it match your CUDA SDK version (8.0)?

@caydyn-skd can you run nvidia-smi and copy the output here as well?
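
To make the driver/SDK question concrete: the CUDA runtime API can report both versions through cudaDriverGetVersion(int*) and cudaRuntimeGetVersion(int*), each encoded as 1000 * major + 10 * minor (so 8000 means 8.0). The sketch below only shows how such values could be decoded and compared; it does not call CUDA itself, and decodeCudaVersion and driverSupportsRuntime are hypothetical helper names. Note that the "Driver Version" printed by nvidia-smi is the driver release number, which uses a different numbering scheme.

```cpp
#include <cassert>

// Decoded CUDA version, e.g. {8, 0} for CUDA 8.0.
struct CudaVersion {
    int majorVer;
    int minorVer;
};

// Hypothetical helper: decode the integer returned by
// cudaDriverGetVersion() or cudaRuntimeGetVersion(),
// which is encoded as 1000 * major + 10 * minor.
CudaVersion decodeCudaVersion(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical helper: the installed driver has to support at least
// the CUDA version that the runtime (the SDK you built against) reports.
bool driverSupportsRuntime(int driverVersion, int runtimeVersion) {
    return driverVersion >= runtimeVersion;
}
```

If the driver value turns out to be lower than the runtime value, a driver/SDK mismatch would be the first thing to rule out before looking at memory.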

@wag

wag commented Apr 24, 2017

I'm experiencing the same issue with a GTX 970 (4 GB) on the latest version, 2816831:

dev:~/build/faiss/gpu/test$ gdb ./demo_ivfpq_indexing_gpu
[...]
[0.365 s] Generating 100000 vectors in 128D for training
[0.483 s] Training the index
Training IVF quantizer on 100000 vectors in 128D
Clustering 100000 points in 128D to 1788 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.03 s
  Iteration 9 (0.43 s, search 0.37 s): objective=930934 imbalance=1.255 nsplit=0            
computing residuals
training 4 x 256 product quantizer on 16384 vectors in 128D
Training PQ slice 0/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (4.50 s, search 3.79 s): objective=27271.5 imbalance=1.018 nsplit=0       
Training PQ slice 1/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (4.59 s, search 3.97 s): objective=27193.4 imbalance=1.016 nsplit=0       
Training PQ slice 2/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (4.72 s, search 4.07 s): objective=27230.8 imbalance=1.021 nsplit=0       
Training PQ slice 3/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (4.60 s, search 3.93 s): objective=27174 imbalance=1.023 nsplit=0         
[19.516 s] storing the pre-trained index to /tmp/index_trained.faissindex
[19.550 s] Building a dataset of 200000 vectors to index
[19.785 s] Adding the vectors to the index

Thread 1 "demo_ivfpq_inde" received signal SIGSEGV, Segmentation fault.
0x00007fffe1825caa in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) bt
#0  0x00007fffe1825caa in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1  0x00007fffe174c696 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007fffe1881962 in cuEventDestroy_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00000000004f9124 in cudart::cudaApiEventDestroy(CUevent_st*) ()
#4  0x00000000005279c4 in cudaEventDestroy ()
#5  0x0000000000442458 in faiss::gpu::streamWaitBase<std::vector<CUstream_st*, std::allocator<CUstream_st*> >, std::initializer_list<CUstream_st*> > (
    listWaiting=std::vector of length 2, capacity 2 = {...}, listWaitOn=...) at impl/../utils/DeviceUtils.h:131
#6  0x000000000047a906 in faiss::gpu::streamWait<std::vector<CUstream_st*, std::allocator<CUstream_st*> > > (b=..., a=std::vector of length 2, capacity 2 = {...})
    at impl/../utils/DeviceUtils.h:140
#7  faiss::gpu::runL2Distance<float> (resources=0x7fffffffd970, centroids=..., centroidsTransposed=0x0, centroidNorms=centroidNorms@entry=0xb523740, queries=..., k=k@entry=1, 
    outDistances=..., outIndices=..., ignoreOutDistances=true, tileSizeOverride=-1) at impl/Distance.cu:145
#8  0x0000000000471aee in faiss::gpu::runL2Distance (resources=<optimized out>, vectors=..., vectorsTransposed=<optimized out>, vectorNorms=vectorNorms@entry=0xb523740, queries=..., 
    k=k@entry=1, outDistances=..., outIndices=..., ignoreOutDistances=<optimized out>, tileSizeOverride=-1) at impl/Distance.cu:349
#9  0x000000000042f402 in faiss::gpu::FlatIndex::query (this=0xb5235f0, input=..., k=k@entry=1, outDistances=..., outIndices=..., exactDistance=exactDistance@entry=false, tileSize=-1)
    at impl/FlatIndex.cu:124
#10 0x0000000000444c23 in faiss::gpu::IVFPQ::classifyAndAddVectors (this=0xbebae20, vecs=..., indices=...) at impl/IVFPQ.cu:138
#11 0x0000000000425abb in faiss::gpu::GpuIndexIVFPQ::addImpl_ (this=0x7fffffffdac0, n=200000, x=<optimized out>, xids=<optimized out>) at GpuIndexIVFPQ.cu:352
#12 0x0000000000417424 in faiss::gpu::GpuIndex::addInternal_ (this=0x7fffffffdac0, n=200000, x=0x7fffc364f010, ids=0xb61f440) at GpuIndex.cu:74
#13 0x000000000042134b in faiss::gpu::GpuIndexIVF::add (this=0x7fffffffdac0, n=200000, x=0x7fffc364f010) at GpuIndexIVF.cu:259
#14 0x000000000040ea8f in main ()
dev:~/build/faiss/gpu/test$ ldd demo_ivfpq_indexing_gpu
	linux-vdso.so.1 =>  (0x00007fff112e6000)
	libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007fb9fb90a000)
	libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007fb9f8e72000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb9f8c69000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb9f8a4c000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb9f8848000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb9f84c5000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb9f81bc000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fb9f7f9a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb9f7d83000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb9f79ba000)
	/lib64/ld-linux-x86-64.so.2 (0x000055c5d2d2d000)
	libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fb9f768f000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fb9f744f000)
dev:~$ lspci -vvv -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Elitegroup Computer Systems GM204 [GeForce GTX 970]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 319
	Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at d0000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at e000 [size=128]
	[virtual] Expansion ROM at df000000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_375_drm, nvidia_375
dev:~$ nvidia-smi 
Mon Apr 24 09:41:45 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 0000:01:00.0      On |                  N/A |
| 35%   30C    P8    19W / 151W |    872MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1986    G   /usr/lib/xorg/Xorg                             658MiB |
|    0      4119    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   212MiB |
+-----------------------------------------------------------------------------+

@wag

wag commented Apr 26, 2017

Just tried the same on a system with 2 GTX 1070 (8GB each) without any problems.

@wickedfoo
Contributor

wickedfoo commented Apr 26, 2017

@wag Can you try it on the GTX 970 without other processes (like X) running on the GPU, e.g. straight from the console? Your nvidia-smi output shows 2 processes using resources on the GPU. Internally, I think we've only used server GPUs with nothing else running on them and no other resources consumed, so I'm curious whether this conflicts somehow on lower-memory GPUs with 4 GB.
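
As a rough way to reason about the memory side of this, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that classifying the added vectors materializes a dense float32 distance matrix with one row per added vector and one column per IVF centroid; the thread does not confirm that this is what runL2Distance allocates, and the tileSize arguments in the backtraces suggest the work may already be tiled. distanceMatrixBytes and maxQueriesPerTile are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a dense float32 distance matrix:
// one row per query vector, one column per centroid.
// (Assumption for illustration; not taken from the Faiss source.)
std::uint64_t distanceMatrixBytes(std::uint64_t nQueries,
                                  std::uint64_t nCentroids) {
    return nQueries * nCentroids * sizeof(float);
}

// Largest number of query rows whose distance tile fits in `freeBytes`.
std::uint64_t maxQueriesPerTile(std::uint64_t freeBytes,
                                std::uint64_t nCentroids) {
    assert(nCentroids > 0);
    return freeBytes / (nCentroids * sizeof(float));
}
```

For the demo's 200000 added vectors and 1788 centroids, distanceMatrixBytes gives 1430400000 bytes, about 1364 MiB. The nvidia-smi output above shows 872 MiB of 4036 MiB already in use on the GTX 970, which leaves roughly 3164 MiB, so under this assumption the matrix alone would still fit. That is one more reason to test with nothing else on the GPU rather than to assume memory pressure is the cause.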

@namhyungk

@wickedfoo
Hi, sorry for the late response. Here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   52C    P8    16W / 250W |      2MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:03:00.0     Off |                  N/A |
| 22%   50C    P8    16W / 250W |      2MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

And the output of nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

@wag

wag commented Apr 27, 2017

@wickedfoo Same segfault without X or any other processes running on the GPU.

@mdouze
Contributor

mdouze commented Jun 23, 2017

Closing for now. Please re-open if the bug occurs with the current version of Faiss.

mdouze closed this as completed Jun 23, 2017
mqnfred pushed a commit to mqnfred/faiss that referenced this issue Oct 23, 2023