
Fails to search topk vectors among 20000 same vectors #484

Closed

ZhuoranLyu opened this issue Jun 7, 2018 · 9 comments

ZhuoranLyu commented Jun 7, 2018

Summary

I was trying to search for the top-k nearest neighbors among 20000 identical vectors and hit a segmentation fault.

Platform

OS: Ubuntu 14.04

Running on:

  • GPU: Tesla P4

Reproduction instructions

Briefly, I was searching over 20000 identical 128-D vectors. I used GpuIndexIVFPQ (PQ8) on a Tesla P4 with 8 GB of memory. It crashes when I search for the top 100 nearest neighbors. Here are the WARN messages and the gdb backtrace.

WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 3473344000 B, highwater 0 B)
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 3473344000 B, highwater 3473344000 B)
Faiss assertion 'err == cudaSuccess' failed in char* faiss::gpu::StackDeviceMemory::Stack::getAlloc(size_t, cudaStream_t) at utils/StackDeviceMemory.cpp:77; details: cudaMalloc error 2 on alloc size 3473344000

gdb backtrace:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffdf3dff700 (LWP 8711)]
0x00007fffeea9fc37 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fffeea9fc37 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fffeeaa3028 in __GI_abort () at abort.c:89
#2 0x00000000004f258b in faiss::gpu::StackDeviceMemory::Stack::getAlloc (
this=0xd145a90, size=3473344000, stream=0x36059a0)
at utils/StackDeviceMemory.cpp:75
#3 0x00000000004f2fb4 in faiss::gpu::StackDeviceMemory::getMemory (this=0xd145a80,
stream=0x36059a0, size=3473344000) at utils/StackDeviceMemory.cpp:207
#4 0x000000000046bce0 in faiss::gpu::DeviceTensor<float, 1, true, int, faiss::gpu::traits::DefaultPtrTraits>::DeviceTensor (this=0x7ffdf3dfe0e0, m=..., sizes=...,
stream=0x36059a0, space=faiss::gpu::Device)
at impl/../utils/DeviceTensor-inl.cuh:132
#5 0x0000000000464f0d in faiss::gpu::runPQScanMultiPassPrecomputed (queries=...,
precompTerm1=..., precompTerm2=..., precompTerm3=..., topQueryToCentroid=...,
useFloat16Lookup=true, bytesPerCode=8, numSubQuantizers=8,
numSubQuantizerCodes=256, listCodes=..., listIndices=...,
indicesOptions=faiss::gpu::INDICES_64_BIT, listLengths=..., maxListLength=217084,
k=100, outDistances=..., outIndices=..., res=0x3605840)
at impl/PQScanMultiPassPrecomputed.cu:488
#6 0x00000000004521e1 in faiss::gpu::IVFPQ::runPQPrecomputedCodes (
this=0x7ffdec36e140, queries=..., coarseDistances=..., coarseIndices=..., k=100,
outDistances=..., outIndices=...) at impl/IVFPQ.cu:661
#7 0x0000000000451903 in faiss::gpu::IVFPQ::query (this=0x7ffdec36e140, queries=...,
nprobe=500, k=100, outDistances=..., outIndices=...) at impl/IVFPQ.cu:551
#8 0x000000000044b952 in faiss::gpu::GpuIndexIVFPQ::searchImpl (this=0xd0bfb10,
n=1000, x=0x7ffdec000c40, k=100, distances=0x7ffdec204670, labels=0x7ffdec141160)
at GpuIndexIVFPQ.cu:380
#9 0x0000000000447cef in faiss::gpu::GpuIndex::search (this=0xd0bfb10, n=1000,
x=0x7ffdec000c40, k=100, distances=0x7ffdec204670, labels=0x7ffdec141160)
at GpuIndex.cu:143
#10 0x0000000000410776 in CwAnnShardImpl::search_cw_feat_unit (this=0x1bc86a0,
vec_query_feats=0x7fffa13b5010, feat_num=1000, feat_dim=130,
vec_added_feats=0xd145ce0, topk=100, res_dists=0x7ffe599de010,
res_nns=0x7ffe03c9d010) at CwAnnMTImpl.cpp:1102
#11 0x00000000004103c0 in CwAnnShardImpl::search_cw_feats_with_batch_gpu (
this=0x1bc86a0, vec_query_feats=0x7fffa13b5010, feat_num=1000, feat_dim=130,
vec_added_feats=0xd145ce0, k=100, res_dists=0x7ffe599de010, res_nns=0x7ffe03c9d010)
at CwAnnMTImpl.cpp:1026
#12 0x000000000040ebce in CwAnnShardMTImpl::__lambda9::operator() (__closure=0xd1340e0)
at CwAnnMTImpl.cpp:725
#13 0x000000000041a0dd in std::_Function_handler<void(), CwAnnShardMTImpl::search_cw_batch_unit(float const*, int, int, int, float*, long int*)::__lambda9>::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/4.8/functional:2071
#14 0x00000000004b3810 in std::function<void ()>::operator()() const (
this=0x7ffdf3dfecd0) at /usr/include/c++/4.8/functional:2471
#15 0x00000000004b0f49 in faiss::gpu::WorkerThread::threadLoop (this=0xd0be020)
at utils/WorkerThread.cpp:100
#16 0x00000000004b0d70 in faiss::gpu::WorkerThread::threadMain (this=0xd0be020)
at utils/WorkerThread.cpp:69
#17 0x00000000004b0a97 in faiss::gpu::WorkerThread::__lambda5::operator() (
__closure=0x35d92a0) at utils/WorkerThread.cpp:31
#18 0x00000000004b2140 in std::_Bind_simple<faiss::gpu::WorkerThread::startThread()::__lambda5()>::_M_invoke<>(std::_Index_tuple<>) (this=0x35d92a0)
at /usr/include/c++/4.8/functional:1732
#19 0x00000000004b2097 in std::_Bind_simple<faiss::gpu::WorkerThread::startThread()::__lambda5()>::operator()(void) (this=0x35d92a0) at /usr/include/c++/4.8/functional:1720
#20 0x00000000004b2030 in std::thread::_Impl<std::_Bind_simple<faiss::gpu::WorkerThread::startThread()::__lambda5()> >::_M_run(void) (this=0x35d9288)
at /usr/include/c++/4.8/thread:115
#21 0x00007fffef60ea60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#22 0x00007ffff2237184 in start_thread (arg=0x7ffdf3dff700) at pthread_create.c:312
#23 0x00007fffeeb6703d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Appreciate any help.

Contributor

mdouze commented Jun 8, 2018

Hi,
This is a corner case, but we may want to look into it.
Could you post minimal code that reproduces it, preferably in C++?

@ZhuoranLyu
Author

@mdouze I may not be able to provide the entire code. However, the main procedure is almost the same as this demo, except that I fill the search database with the same constant value instead of drand48():

https://github.com/facebookresearch/faiss/blob/1fe2872013685092d697f08a2a48e110acd25b2b/gpu/test/demo_ivfpq_indexing_gpu.cpp

I use 100000 128-D vectors in the search database with PQ8, on one P4 GPU with 8 GB of memory.
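Something like this (a sketch, not my production code: the constant 0.5f, the include paths, and the exact parameters here are illustrative):

```cpp
#include <vector>

#include <faiss/gpu/GpuIndexIVFPQ.h>
#include <faiss/gpu/StandardGpuResources.h>

int main() {
    int d = 128;        // vector dimension
    size_t nb = 100000; // database size
    size_t nq = 1000;   // queries per batch
    int nlist = 500;    // number of coarse IVF centroids
    int k = 100;        // top-k neighbors

    // Every database (and query) vector is identical -- this is the only
    // difference from the demo, which fills them with drand48().
    std::vector<float> xb(nb * d, 0.5f);
    std::vector<float> xq(nq * d, 0.5f);

    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuIndexIVFPQ index(
        &res, d, nlist, 8 /* PQ8: 8 sub-quantizers */, 8 /* bits per code */,
        faiss::METRIC_L2);

    index.train(nb, xb.data());
    index.add(nb, xb.data());
    index.setNumProbes(500);

    std::vector<float> distances(nq * k);
    std::vector<faiss::Index::idx_t> labels(nq * k);
    index.search(nq, xq.data(), k, distances.data(), labels.data()); // crashes here
    return 0;
}
```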

Thank you very much!

@shutcode

It seems your GPU memory is exhausted, or the temp memory is not set.

@wickedfoo
Contributor

Disable precomputed codes on the index; it appears that you do not have enough memory to use precomputed codes.
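If it helps, disabling them looks roughly like this (a sketch against the GPU API; the dimensions and list counts are placeholders):

```cpp
#include <faiss/gpu/GpuIndexIVFPQ.h>
#include <faiss/gpu/StandardGpuResources.h>

void buildWithoutPrecomputedCodes(faiss::gpu::StandardGpuResources& res) {
    // Option 1: disable precomputed tables at construction time.
    faiss::gpu::GpuIndexIVFPQConfig config;
    config.usePrecomputedTables = false;
    faiss::gpu::GpuIndexIVFPQ index(
        &res, 128, 500, 8, 8, faiss::METRIC_L2, config);

    // Option 2: toggle it on an already-constructed index.
    index.setPrecomputedCodes(false);
}
```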

@wickedfoo
Contributor

Also, how many coarse IVF centroids does your index use?

@ZhuoranLyu
Author

@shutcode Thanks, but I do set the temp memory to 18%. You are right that the GPU memory is exhausted, but I do not think it should be. I am only searching 100000 vectors on a P4 GPU with 8 GB of memory, which should be more than enough.
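For reference, my resource setup is just this (a sketch; the byte count approximates 18% of the 8 GB card):

```cpp
#include <faiss/gpu/StandardGpuResources.h>

void configureResources(faiss::gpu::StandardGpuResources& res) {
    // Pin the temp-memory pool to ~18% of 8 GB (~1.47 GB).
    size_t total = 8ULL * 1024 * 1024 * 1024;
    res.setTempMemory(static_cast<size_t>(total * 0.18));
}
```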

@ZhuoranLyu
Author

@wickedfoo Hi wickedfoo, thank you for your reply. I have tried both disabling and enabling precomputed codes, but it makes no difference. I have also tried 100, 500, and 1000 coarse centroids, and none of them works.

@ZhuoranLyu
Author

The question is why it asks for so much memory when I am only searching 100000 vectors. If the vectors are not exactly the same (I add a random number to each vector), the search needs only a little memory. But if they are exactly identical, it requires this much.
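One back-of-the-envelope check against the backtrace (the 16-bytes-per-entry factor is inferred from the numbers, not read from the source):

```cpp
#include <cstdio>

int main() {
    // Values taken from the backtrace above.
    long long nq = 1000;              // queries per batch (n=1000)
    long long maxListLength = 217084; // longest inverted list
    long long bytesPerEntry = 16;     // inferred from the failed alloc size

    // Prints 3473344000 -- exactly the cudaMalloc size that fails.
    std::printf("%lld\n", nq * maxListLength * bytesPerEntry);
    return 0;
}
```

So the scan's temporary buffer appears to scale with nq × maxListLength. With identical vectors, (almost) the whole database lands in a single inverted list, so maxListLength is roughly the dataset size instead of roughly nb/nlist, which would explain the blow-up.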

Contributor

mdouze commented Aug 28, 2018

No activity, closing.

mdouze closed this as completed Aug 28, 2018