Fails to search top-k vectors among 20000 identical vectors #484
Comments
Hi @mdouze, I may not be able to provide the entire code. However, the main procedure is almost the same as this demo, except that I fill the search database with the same number instead of drand48(): it contains 100000 128-D vectors, indexed with PQ8, on one 8 GB P4 GPU. Thank you very much!
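For concreteness, the change relative to the demo's data generation is presumably something like the following (a sketch; the buffer name xb and the constant 0.5f are illustrative, not the reporter's code):

```cpp
// The demo fills the database randomly:  xb[i] = drand48();
// The reporter instead uses one constant, so all vectors are identical:
for (size_t i = 0; i < nb * d; i++) {
    xb[i] = 0.5f; // same value everywhere -> 100000 identical vectors
}
```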
It seems your GPU memory is exhausted, or the temp memory is not set.
Disable precomputed codes on the index; it appears that you do not have enough memory to use precomputed codes.
Also, how many coarse IVF centroids does your index use? |
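For reference, toggling precomputed codes on a GpuIndexIVFPQ is a single call (a sketch; index construction elided):

```cpp
// 'index' is an already-constructed faiss::gpu::GpuIndexIVFPQ.
index.setPrecomputedCodes(false); // lower memory use at some lookup-speed cost
```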
@shutcode Thanks, but I did set the temp memory to 18%. And you are right, the GPU memory is exhausted, but I do not think it should be: I am only trying to search 100000 vectors on a P4 GPU with 8 GB of memory, which I assume should be enough.
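What "temp mem to 18%" presumably corresponds to (a sketch, assuming the setTempMemoryFraction knob that StandardGpuResources exposed at the time):

```cpp
faiss::gpu::StandardGpuResources res;
// ~18% of the P4's 8 GB, i.e. roughly 1.4 GB of scratch space
res.setTempMemoryFraction(0.18f);
```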
@wickedfoo Hi wickedfoo, thank you for your reply. I have tried both disabling and enabling the precomputed codes, but it makes no difference. I have also tried 100, 500, and 1000 coarse centroids; none of them seem to work.
What puzzles me is why the search asks for so much memory when I only search 100000 vectors. If the vectors are not exactly the same (e.g. I add a random number to each vector), the search needs only a little memory; when they are exactly the same, it requires this much.
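The jitter workaround described above ("add a random number to each vector") might look like this (a hypothetical sketch; the jitter scale 1e-3 is illustrative):

```cpp
// Perturb each coordinate slightly so no two database vectors are identical;
// with this change the reporter observes only a small memory footprint.
for (size_t i = 0; i < nb * d; i++) {
    xb[i] = 0.5f + 1e-3f * drand48();
}
```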
No activity, closing.
Summary
I was trying to search top-k vectors among 20000 exactly identical vectors and got a segmentation fault.
Platform
OS: Ubuntu 14.04
Running on: Tesla P4
Reproduction instructions
Briefly, I was trying to search among 20000 128-D vectors. I used GpuIndexIVFPQ (PQ8) on a Tesla P4 with 8 GB of memory. It crashes when I search for the top 100 nearest neighbors. A sketch of the setup follows; after it are the WARN messages and the gdb backtrace.
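A minimal sketch of the setup, assuming the Faiss GPU C++ API of this period; the header paths, the constructor signature, the setTempMemoryFraction/setNumProbes calls, and the constant fill value are assumptions reconstructed from the thread, not the reporter's actual code:

```cpp
#include <faiss/gpu/GpuIndexIVFPQ.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <vector>

int main() {
    const int d = 128;       // vector dimension (per the report)
    const size_t nb = 20000; // database size (per the summary)
    const size_t nq = 1000;  // queries per batch, as in the backtrace (n=1000)
    const int nlist = 500;   // coarse centroids (one of the values tried)
    const int k = 100;       // top-100 nearest neighbors

    faiss::gpu::StandardGpuResources res;
    res.setTempMemoryFraction(0.18f); // "temp mem to 18%" per the thread

    faiss::gpu::GpuIndexIVFPQ index(
            &res, d, nlist,
            8, // PQ8: 8 subquantizers
            8, // 8 bits per code
            faiss::METRIC_L2, faiss::gpu::GpuIndexIVFPQConfig());
    index.setNumProbes(500); // nprobe=500, as seen in frame #7

    // Every database vector is identical -- the trigger for the crash.
    std::vector<float> xb(nb * d, 0.5f);
    index.train(nb, xb.data());
    index.add(nb, xb.data());

    std::vector<float> dist(nq * k);
    std::vector<faiss::Index::idx_t> ids(nq * k);
    index.search(nq, xb.data(), k, dist.data(), ids.data()); // aborts here
    return 0;
}
```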
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 3473344000 B, highwater 0 B)
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 3473344000 B, highwater 3473344000 B)
Faiss assertion 'err == cudaSuccess' failed in char* faiss::gpu::StackDeviceMemory::Stack::getAlloc(size_t, cudaStream_t) at utils/StackDeviceMemory.cpp:77; details: cudaMalloc error 2 on alloc size 3473344000
gdb backtrace:
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffdf3dff700 (LWP 8711)]
0x00007fffeea9fc37 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fffeea9fc37 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fffeeaa3028 in __GI_abort () at abort.c:89
#2 0x00000000004f258b in faiss::gpu::StackDeviceMemory::Stack::getAlloc (
this=0xd145a90, size=3473344000, stream=0x36059a0)
at utils/StackDeviceMemory.cpp:75
#3 0x00000000004f2fb4 in faiss::gpu::StackDeviceMemory::getMemory (this=0xd145a80,
stream=0x36059a0, size=3473344000) at utils/StackDeviceMemory.cpp:207
#4 0x000000000046bce0 in faiss::gpu::DeviceTensor<float, 1, true, int, faiss::gpu::traits::DefaultPtrTraits>::DeviceTensor (this=0x7ffdf3dfe0e0, m=..., sizes=...,
stream=0x36059a0, space=faiss::gpu::Device)
at impl/../utils/DeviceTensor-inl.cuh:132
#5 0x0000000000464f0d in faiss::gpu::runPQScanMultiPassPrecomputed (queries=...,
precompTerm1=..., precompTerm2=..., precompTerm3=..., topQueryToCentroid=...,
useFloat16Lookup=true, bytesPerCode=8, numSubQuantizers=8,
numSubQuantizerCodes=256, listCodes=..., listIndices=...,
indicesOptions=faiss::gpu::INDICES_64_BIT, listLengths=..., maxListLength=217084,
k=100, outDistances=..., outIndices=..., res=0x3605840)
at impl/PQScanMultiPassPrecomputed.cu:488
#6 0x00000000004521e1 in faiss::gpu::IVFPQ::runPQPrecomputedCodes (
this=0x7ffdec36e140, queries=..., coarseDistances=..., coarseIndices=..., k=100,
outDistances=..., outIndices=...) at impl/IVFPQ.cu:661
#7 0x0000000000451903 in faiss::gpu::IVFPQ::query (this=0x7ffdec36e140, queries=...,
nprobe=500, k=100, outDistances=..., outIndices=...) at impl/IVFPQ.cu:551
#8 0x000000000044b952 in faiss::gpu::GpuIndexIVFPQ::searchImpl (this=0xd0bfb10,
n=1000, x=0x7ffdec000c40, k=100, distances=0x7ffdec204670, labels=0x7ffdec141160)
at GpuIndexIVFPQ.cu:380
#9 0x0000000000447cef in faiss::gpu::GpuIndex::search (this=0xd0bfb10, n=1000,
x=0x7ffdec000c40, k=100, distances=0x7ffdec204670, labels=0x7ffdec141160)
at GpuIndex.cu:143
#10 0x0000000000410776 in CwAnnShardImpl::search_cw_feat_unit (this=0x1bc86a0,
vec_query_feats=0x7fffa13b5010, feat_num=1000, feat_dim=130,
vec_added_feats=0xd145ce0, topk=100, res_dists=0x7ffe599de010,
res_nns=0x7ffe03c9d010) at CwAnnMTImpl.cpp:1102
#11 0x00000000004103c0 in CwAnnShardImpl::search_cw_feats_with_batch_gpu (
this=0x1bc86a0, vec_query_feats=0x7fffa13b5010, feat_num=1000, feat_dim=130,
vec_added_feats=0xd145ce0, k=100, res_dists=0x7ffe599de010, res_nns=0x7ffe03c9d010)
at CwAnnMTImpl.cpp:1026
#12 0x000000000040ebce in CwAnnShardMTImpl::__lambda9::operator() (__closure=0xd1340e0)
at CwAnnMTImpl.cpp:725
#13 0x000000000041a0dd in std::_Function_handler<void(), CwAnnShardMTImpl::search_cw_batch_unit(float const*, int, int, int, float*, long int*)::__lambda9>::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/4.8/functional:2071
#14 0x00000000004b3810 in std::function<void ()>::operator()() const (
this=0x7ffdf3dfecd0) at /usr/include/c++/4.8/functional:2471
#15 0x00000000004b0f49 in faiss::gpu::WorkerThread::threadLoop (this=0xd0be020)
at utils/WorkerThread.cpp:100
#16 0x00000000004b0d70 in faiss::gpu::WorkerThread::threadMain (this=0xd0be020)
at utils/WorkerThread.cpp:69
#17 0x00000000004b0a97 in faiss::gpu::WorkerThread::__lambda5::operator() (
__closure=0x35d92a0) at utils/WorkerThread.cpp:31
#18 0x00000000004b2140 in std::_Bind_simple<faiss::gpu::WorkerThread::startThread()::__lambda5()>::_M_invoke<>(std::_Index_tuple<>) (this=0x35d92a0)
at /usr/include/c++/4.8/functional:1732
#19 0x00000000004b2097 in std::_Bind_simple<faiss::gpu::WorkerThread::startThread()::__lambda5()>::operator()(void) (this=0x35d92a0) at /usr/include/c++/4.8/functional:1720
#20 0x00000000004b2030 in std::thread::_Impl<std::_Bind_simple<faiss::gpu::WorkerThread::startThread()::__lambda5()> >::_M_run(void) (this=0x35d9288)
at /usr/include/c++/4.8/thread:115
#21 0x00007fffef60ea60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#22 0x00007ffff2237184 in start_thread (arg=0x7ffdf3dff700) at pthread_create.c:312
#23 0x00007fffeeb6703d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Appreciate any help.