running demo_ivfpq_indexing_gpu: Segmentation fault #67

Closed
caydyn-skd opened this Issue Apr 5, 2017 · 11 comments


caydyn-skd commented Apr 5, 2017

1. The output is as follows:

[0.382 s] Generating 100000 vectors in 128D for training
[0.540 s] Training the index
Training IVF quantizer on 100000 vectors in 128D
Clustering 100000 points in 128D to 1788 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.032731 s
  Iteration 0 (0.12 s, search 0.09 s): objective=1.43954e+06 imbalance=2.907 nsplit=0
  Iteration 9 (3.68 s, search 3.61 s): objective=930934 imbalance=1.255 nsplit=0
computing residuals
training 4 x 256 product quantizer on 16384 vectors in 128D
Training PQ slice 0/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00141504 s
  Iteration 24 (2.73 s, search 2.45 s): objective=27271.5 imbalance=1.018 nsplit=0       
Training PQ slice 1/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.000438965 s
  Iteration 24 (2.37 s, search 2.06 s): objective=27193.4 imbalance=1.016 nsplit=0       
Training PQ slice 2/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.000931885 s
  Iteration 24 (2.59 s, search 2.25 s): objective=27230.8 imbalance=1.021 nsplit=0       
Training PQ slice 3/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.000437988 s
  Iteration 24 (1.96 s, search 1.78 s): objective=27174 imbalance=1.023 nsplit=0         
[14.164 s] storing the pre-trained index to /tmp/index_trained.faissindex
[14.186 s] Building a dataset of 200000 vectors to index
[14.506 s] Adding the vectors to the index
Segmentation fault (core dumped)

2. Library dependencies (ldd output):

        linux-vdso.so.1 =>  (0x00007ffc876da000)
	libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007f1f42cfb000)
	libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007f1f40263000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1f4005a000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1f3fe3d000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1f3fc39000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1f3f8b6000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1f3f5ad000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f1f3f38b000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1f3f174000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1f3edab000)
	/lib64/ld-linux-x86-64.so.2 (0x0000564584980000)
	libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f1f3ea80000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f1f3e840000)

3. Stack backtrace (gdb):

  #0  0x00007fffe9828c9a in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
  #1  0x00007fffe974f696 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
  #2  0x00007fffe9884992 in cuEventDestroy_v2 ()
   from /usr/lib/x86_64-linux-gnu/libcuda.so.1
  #3  0x00000000004f62f4 in cudart::cudaApiEventDestroy(CUevent_st*) ()
  #4  0x0000000000524b94 in cudaEventDestroy ()
  #5  0x0000000000440af8 in faiss::gpu::streamWaitBase<std::vector<CUstream_st*, std::allocator<CUstream_st*> >, std::initializer_list<CUstream_st*> > (
      listWaiting=std::vector of length 2, capacity 2 = {...}, listWaitOn=...)
      at impl/../utils/DeviceUtils.h:131
  #6  0x0000000000479294 in faiss::gpu::streamWait<std::vector<CUstream_st*,   std::allocator<CUstream_st*> > > (b=..., a=std::vector of length 2, capacity 2 = {...})
      at impl/../utils/DeviceUtils.h:140
  #7  faiss::gpu::runL2Distance<float> (resources=0x7fffffffe780, centroids=..., 
    centroidNorms=centroidNorms@entry=0xb9e1a30, queries=..., k=k@entry=1, 
    outDistances=..., outIndices=..., ignoreOutDistances=true, tileSize=256)
    at impl/Distance.cu:110
  #8  0x000000000047032e in faiss::gpu::runL2Distance (resources=<optimized out>, 
    vectors=..., vectorNorms=vectorNorms@entry=0xb9e1a30, queries=..., k=k@entry=1, 
    outDistances=..., outIndices=..., ignoreOutDistances=<optimized out>, tileSize=-1)
    at impl/Distance.cu:307
  #9  0x000000000042e574 in faiss::gpu::FlatIndex::query (this=0xb9e1970, vecs=..., 
    k=k@entry=1, outDistances=..., outIndices=..., 
    exactDistance=exactDistance@entry=false, tileSize=-1) at impl/FlatIndex.cu:121
  #10 0x00000000004432c3 in faiss::gpu::IVFPQ::classifyAndAddVectors (this=0xb9e2d80, 
    vecs=..., indices=...) at impl/IVFPQ.cu:138
  #11 0x0000000000424fd0 in faiss::gpu::GpuIndexIVFPQ::add_with_ids (this=0x7fffffffe8d0, 
    n=200000, x=0x7fffcf580010, xids=0xc0ab390) at GpuIndexIVFPQ.cu:355
  #12 0x000000000042092b in faiss::gpu::GpuIndexIVF::add (this=0x7fffffffe8d0, n=200000, 
    x=0x7fffcf580010) at GpuIndexIVF.cu:254
  #13 0x000000000040e8bf in main () at test/demo_ivfpq_indexing_gpu.cpp:114

4. Hardware information (lspci):

        01:00.0 3D controller: NVIDIA Corporation GM206M [GeForce GTX 965M] (rev a1)
	DeviceName: NVIDIA N16E-GR
	Subsystem: Hewlett-Packard Company GM206M [GeForce GTX 965M]
	Flags: bus master, fast devsel, latency 0, IRQ 134
	Memory at a3000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 90000000 (64-bit, prefetchable) [size=256M]
	Memory at a0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 4000 [size=128]
	[virtual] Expansion ROM at a4000000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_375_drm, nvidia_375

mdouze added the duplicate label on Apr 5, 2017


Contributor

mdouze commented Apr 5, 2017

Hi @caydyn-skd

Thanks for the extensive bug report. I believe this is linked to the low-mem issue #66 mentioned here

#66 (comment)

Please stay tuned until we have a fix.


Contributor

mdouze commented Apr 6, 2017

Could you try with the current version? It has better low-mem GPU support.


eduj36 commented Apr 6, 2017

I also get an error running demo_ivfpq_indexing_gpu, shown below. I'm on the most recent version, running on a TITAN X (Maxwell) with 12 GB of memory.

$ ./test/demo_ivfpq_indexing_gpu
[0.561 s] Generating 100000 vectors in 128D for training
[0.699 s] Training the index
Training IVF quantizer on 100000 vectors in 128D
Clustering 100000 points in 128D to 1788 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.01 s
  Iteration 9 (0.34 s, search 0.26 s): objective=930934 imbalance=1.255 nsplit=0
computing residuals
training 4 x 256 product quantizer on 16384 vectors in 128D
Training PQ slice 0/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (1.89 s, search 1.52 s): objective=27271.5 imbalance=1.018 nsplit=0
Training PQ slice 1/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (1.99 s, search 1.62 s): objective=27193.4 imbalance=1.016 nsplit=0
Training PQ slice 2/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (1.97 s, search 1.60 s): objective=27230.8 imbalance=1.021 nsplit=0
Training PQ slice 3/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (1.49 s, search 1.20 s): objective=27174 imbalance=1.023 nsplit=0
[8.526 s] storing the pre-trained index to /tmp/index_trained.faissindex
[8.573 s] Building a dataset of 200000 vectors to index
[8.841 s] Adding the vectors to the index
Faiss assertion err == CUBLAS_STATUS_SUCCESS failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with T = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at utils/MatrixMult.cu:141
Aborted (core dumped)
caydyn-skd commented Apr 7, 2017

Hi @mdouze, I tried running the latest version, but I still have the same problem:

caydyn@dev:/home/caydyn/faiss$ git log -1
commit 7abe81b4f6abad56731ec1c27968173c8ce0d322
Author: matthijs <matthijs@fb.com>
Date:   Thu Apr 6 04:33:41 2017 -0700

    Better support for low-mem GPUs
    avoid reading beyond the end of an array in fvec_L2sqr and related functions

#0  0x00007fffe9828c9a in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1  0x00007fffe974f696 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007fffe9884992 in cuEventDestroy_v2 ()
   from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00000000004f9124 in cudart::cudaApiEventDestroy(CUevent_st*) ()
#4  0x00000000005279c4 in cudaEventDestroy ()
#5  0x0000000000442458 in faiss::gpu::streamWaitBase<std::vector<CUstream_st*, std::allocator<CUstream_st*> >, std::initializer_list<CUstream_st*> > (
    listWaiting=std::vector of length 2, capacity 2 = {...}, listWaitOn=...)
    at impl/../utils/DeviceUtils.h:131
#6  0x000000000047a906 in faiss::gpu::streamWait<std::vector<CUstream_st*, std::allocator<CUstream_st*> > > (b=..., a=std::vector of length 2, capacity 2 = {...})
    at impl/../utils/DeviceUtils.h:140
#7  faiss::gpu::runL2Distance<float> (resources=0x7fffffffe770, centroids=..., 
    centroidsTransposed=0x0, centroidNorms=centroidNorms@entry=0xb9e8570, queries=..., 
    k=k@entry=1, outDistances=..., outIndices=..., ignoreOutDistances=true, 
    tileSizeOverride=-1) at impl/Distance.cu:145
#8  0x0000000000471aee in faiss::gpu::runL2Distance (resources=<optimized out>, 
    vectors=..., vectorsTransposed=<optimized out>, 
    vectorNorms=vectorNorms@entry=0xb9e8570, queries=..., k=k@entry=1, outDistances=..., 
    outIndices=..., ignoreOutDistances=<optimized out>, tileSizeOverride=-1)
    at impl/Distance.cu:349
#9  0x000000000042f402 in faiss::gpu::FlatIndex::query (this=0xb9e8420, input=..., 
    k=k@entry=1, outDistances=..., outIndices=..., 
    exactDistance=exactDistance@entry=false, tileSize=-1) at impl/FlatIndex.cu:124
#10 0x0000000000444c23 in faiss::gpu::IVFPQ::classifyAndAddVectors (this=0xc37f3c0, 
    vecs=..., indices=...) at impl/IVFPQ.cu:138
#11 0x0000000000425abb in faiss::gpu::GpuIndexIVFPQ::addImpl_ (this=0x7fffffffe8c0, 
    n=200000, x=<optimized out>, xids=<optimized out>) at GpuIndexIVFPQ.cu:352
#12 0x0000000000417424 in faiss::gpu::GpuIndex::addInternal_ (this=0x7fffffffe8c0, 
    n=200000, x=0x7fffcfd81010, ids=0xbae4260) at GpuIndex.cu:74
#13 0x000000000042134b in faiss::gpu::GpuIndexIVF::add (this=0x7fffffffe8c0, n=200000, 
    x=0x7fffcfd81010) at GpuIndexIVF.cu:259
#14 0x000000000040ea8f in main () at test/demo_ivfpq_indexing_gpu.cpp:114

(gdb) f 14
#14 0x000000000040ea8f in main () at test/demo_ivfpq_indexing_gpu.cpp:114
114	        index.add (nb, database.data());
(gdb) l
109	        }
110	
111	        printf ("[%.3f s] Adding the vectors to the index\n",
112	                elapsed() - t0);
113	
114	        index.add (nb, database.data());
115	
116	        printf ("[%.3f s] done\n", elapsed() - t0);
117	
118	        // remember a few elements from the database as queries


[0.379 s] Generating 100000 vectors in 128D for training
[0.526 s] Training the index
Training IVF quantizer on 100000 vectors in 128D
Clustering 100000 points in 128D to 1788 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.03 s
  Iteration 0 (0.15 s, search 0.12 s): objective=1.43954e+06 imbalance=2.907 nsplit=0
  Iteration 9 (3.68 s, search 3.59 s): objective=930934 imbalance=1.255 nsplit=0
computing residuals
training 4 x 256 product quantizer on 16384 vectors in 128D
Training PQ slice 0/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (3.20 s, search 2.67 s): objective=27271.5 imbalance=1.018 nsplit=0       
Training PQ slice 1/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (2.95 s, search 2.49 s): objective=27193.4 imbalance=1.016 nsplit=0       
Training PQ slice 2/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (3.24 s, search 2.75 s): objective=27230.8 imbalance=1.021 nsplit=0       
Training PQ slice 3/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (2.43 s, search 2.15 s): objective=27174 imbalance=1.023 nsplit=0         
[16.307 s] storing the pre-trained index to /tmp/index_trained.faissindex
[16.353 s] Building a dataset of 200000 vectors to index
[16.649 s] Adding the vectors to the index
Segmentation fault (core dumped)
linux-vdso.so.1 =>  (0x00007fff637dd000)
libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007f0182d83000)
libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007f01802eb000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f01800e2000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f017fec5000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f017fcc1000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f017f93e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f017f635000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f017f413000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f017f1fc000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f017ee33000)
/lib64/ld-linux-x86-64.so.2 (0x00005628e2e86000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f017eb08000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f017e8c8000)


Contributor

wickedfoo commented Apr 19, 2017

@eduj36 Can you run nvidia-smi and copy the output here? What version is your driver? Does it match your CUDA SDK version (8.0)?

@caydyn-skd can you run nvidia-smi and copy the output here as well?

wag commented Apr 24, 2017

I'm experiencing the same issue with a GTX 970 (4 GB) on the latest version (commit 2816831):

dev:~/build/faiss/gpu/test$ gdb ./demo_ivfpq_indexing_gpu
[...]
[0.365 s] Generating 100000 vectors in 128D for training
[0.483 s] Training the index
Training IVF quantizer on 100000 vectors in 128D
Clustering 100000 points in 128D to 1788 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.03 s
  Iteration 9 (0.43 s, search 0.37 s): objective=930934 imbalance=1.255 nsplit=0            
computing residuals
training 4 x 256 product quantizer on 16384 vectors in 128D
Training PQ slice 0/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (4.50 s, search 3.79 s): objective=27271.5 imbalance=1.018 nsplit=0       
Training PQ slice 1/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (4.59 s, search 3.97 s): objective=27193.4 imbalance=1.016 nsplit=0       
Training PQ slice 2/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (4.72 s, search 4.07 s): objective=27230.8 imbalance=1.021 nsplit=0       
Training PQ slice 3/4
Clustering 16384 points in 32D to 256 clusters, redo 1 times, 25 iterations
  Preprocessing in 0.00 s
  Iteration 24 (4.60 s, search 3.93 s): objective=27174 imbalance=1.023 nsplit=0         
[19.516 s] storing the pre-trained index to /tmp/index_trained.faissindex
[19.550 s] Building a dataset of 200000 vectors to index
[19.785 s] Adding the vectors to the index

Thread 1 "demo_ivfpq_inde" received signal SIGSEGV, Segmentation fault.
0x00007fffe1825caa in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) bt
#0  0x00007fffe1825caa in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1  0x00007fffe174c696 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007fffe1881962 in cuEventDestroy_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00000000004f9124 in cudart::cudaApiEventDestroy(CUevent_st*) ()
#4  0x00000000005279c4 in cudaEventDestroy ()
#5  0x0000000000442458 in faiss::gpu::streamWaitBase<std::vector<CUstream_st*, std::allocator<CUstream_st*> >, std::initializer_list<CUstream_st*> > (
    listWaiting=std::vector of length 2, capacity 2 = {...}, listWaitOn=...) at impl/../utils/DeviceUtils.h:131
#6  0x000000000047a906 in faiss::gpu::streamWait<std::vector<CUstream_st*, std::allocator<CUstream_st*> > > (b=..., a=std::vector of length 2, capacity 2 = {...})
    at impl/../utils/DeviceUtils.h:140
#7  faiss::gpu::runL2Distance<float> (resources=0x7fffffffd970, centroids=..., centroidsTransposed=0x0, centroidNorms=centroidNorms@entry=0xb523740, queries=..., k=k@entry=1, 
    outDistances=..., outIndices=..., ignoreOutDistances=true, tileSizeOverride=-1) at impl/Distance.cu:145
#8  0x0000000000471aee in faiss::gpu::runL2Distance (resources=<optimized out>, vectors=..., vectorsTransposed=<optimized out>, vectorNorms=vectorNorms@entry=0xb523740, queries=..., 
    k=k@entry=1, outDistances=..., outIndices=..., ignoreOutDistances=<optimized out>, tileSizeOverride=-1) at impl/Distance.cu:349
#9  0x000000000042f402 in faiss::gpu::FlatIndex::query (this=0xb5235f0, input=..., k=k@entry=1, outDistances=..., outIndices=..., exactDistance=exactDistance@entry=false, tileSize=-1)
    at impl/FlatIndex.cu:124
#10 0x0000000000444c23 in faiss::gpu::IVFPQ::classifyAndAddVectors (this=0xbebae20, vecs=..., indices=...) at impl/IVFPQ.cu:138
#11 0x0000000000425abb in faiss::gpu::GpuIndexIVFPQ::addImpl_ (this=0x7fffffffdac0, n=200000, x=<optimized out>, xids=<optimized out>) at GpuIndexIVFPQ.cu:352
#12 0x0000000000417424 in faiss::gpu::GpuIndex::addInternal_ (this=0x7fffffffdac0, n=200000, x=0x7fffc364f010, ids=0xb61f440) at GpuIndex.cu:74
#13 0x000000000042134b in faiss::gpu::GpuIndexIVF::add (this=0x7fffffffdac0, n=200000, x=0x7fffc364f010) at GpuIndexIVF.cu:259
#14 0x000000000040ea8f in main ()
dev:~/build/faiss/gpu/test$ ldd demo_ivfpq_indexing_gpu
	linux-vdso.so.1 =>  (0x00007fff112e6000)
	libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007fb9fb90a000)
	libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007fb9f8e72000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb9f8c69000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb9f8a4c000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb9f8848000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb9f84c5000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb9f81bc000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fb9f7f9a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb9f7d83000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb9f79ba000)
	/lib64/ld-linux-x86-64.so.2 (0x000055c5d2d2d000)
	libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fb9f768f000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fb9f744f000)
dev:~$ lspci -vvv -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Elitegroup Computer Systems GM204 [GeForce GTX 970]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 319
	Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at d0000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at e000 [size=128]
	[virtual] Expansion ROM at df000000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_375_drm, nvidia_375
dev:~$ nvidia-smi 
Mon Apr 24 09:41:45 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 0000:01:00.0      On |                  N/A |
| 35%   30C    P8    19W / 151W |    872MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1986    G   /usr/lib/xorg/Xorg                             658MiB |
|    0      4119    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   212MiB |
+-----------------------------------------------------------------------------+

#14 0x000000000040ea8f in main ()
dev:~/build/faiss/gpu/test$ ldd demo_ivfpq_indexing_gpu
	linux-vdso.so.1 =>  (0x00007fff112e6000)
	libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007fb9fb90a000)
	libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007fb9f8e72000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb9f8c69000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb9f8a4c000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb9f8848000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb9f84c5000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb9f81bc000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fb9f7f9a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb9f7d83000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb9f79ba000)
	/lib64/ld-linux-x86-64.so.2 (0x000055c5d2d2d000)
	libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fb9f768f000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fb9f744f000)
dev:~$ lspci -vvv -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Elitegroup Computer Systems GM204 [GeForce GTX 970]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 319
	Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at d0000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at e000 [size=128]
	[virtual] Expansion ROM at df000000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_375_drm, nvidia_375
dev:~$ nvidia-smi 
Mon Apr 24 09:41:45 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 0000:01:00.0      On |                  N/A |
| 35%   30C    P8    19W / 151W |    872MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1986    G   /usr/lib/xorg/Xorg                             658MiB |
|    0      4119    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   212MiB |
+-----------------------------------------------------------------------------+
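For reference, frames #4–#6 of the backtrace point at Faiss's event-based cross-stream synchronization in utils/DeviceUtils.h. A minimal sketch of that pattern follows (illustrative names, not Faiss's actual code); note that the crash above happens inside the final cudaEventDestroy call:

```cpp
// Sketch of event-based stream synchronization (cf. streamWaitBase in
// utils/DeviceUtils.h): streams in `waiting` do not proceed past this
// point until the work currently queued on `waitOn` has completed.
#include <cuda_runtime.h>
#include <vector>

void streamWaitSketch(const std::vector<cudaStream_t>& waiting,
                      cudaStream_t waitOn) {
  cudaEvent_t event;
  // cudaEventDisableTiming makes the event cheaper to record and wait on.
  cudaEventCreateWithFlags(&event, cudaEventDisableTiming);

  // Mark the point in `waitOn` that the dependent streams must reach.
  cudaEventRecord(event, waitOn);

  // Queue a wait on every dependent stream (asynchronous; returns at once).
  for (cudaStream_t s : waiting) {
    cudaStreamWaitEvent(s, event, 0);
  }

  // Destroying immediately is legal: CUDA defers the actual release until
  // the pending stream waits have consumed the event. This is the call
  // that segfaults in the backtrace (frame #4, cudaEventDestroy).
  cudaEventDestroy(event);
}
```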
wag commented Apr 26, 2017

Just tried the same on a system with 2 GTX 1070 (8GB each) without any problems.

Contributor

wickedfoo commented Apr 26, 2017

@wag Can you try it on the GTX 970 without any other processes (such as X) running on the GPU, e.g. straight from the console? There appear to be two processes using resources on the GPU. Internally we've only used server GPUs with nothing else running on them and no other resources consumed, so I'm curious whether something conflicts on lower-memory GPUs with 4 GB.

eduj36 commented Apr 26, 2017

@wickedfoo
Hi, sorry for the late response. Here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   52C    P8    16W / 250W |      2MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:03:00.0     Off |                  N/A |
| 22%   50C    P8    16W / 250W |      2MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

And from nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
wag commented Apr 27, 2017

@wickedfoo Same segfault without X or any other processes running on the GPU.

Contributor

mdouze commented Jun 23, 2017

Closing for now. Please re-open if the bug occurs with the current version of Faiss.

@mdouze mdouze closed this Jun 23, 2017
