
GPU Search billion vectors failed #379

Closed
0DF0Arc opened this issue Mar 26, 2018 · 5 comments
0DF0Arc commented Mar 26, 2018

Summary

Platform

OS: Ubuntu 14.04

Faiss version:
Running on :

  • GPU: P40 × 2

Reproduction instructions

Using an IVF10000,PQ24 index: when I add 1 billion randomly generated vectors to the index and then search, I get the following error:

WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 18446744065779875840 B, highwater 0 B)
Faiss assertion 'err == cudaSuccess' failed in char* faiss::gpu::StackDeviceMemory::Stack::getAlloc(size_t, cudaStream_t) at /root/workspace/backup/back/SimilaritySearch/src/gpu/utils/StackDeviceMemory.cpp:77; details: cudaMalloc error 2 on alloc size 18446744065770205184

With 400 million vectors, the same index and code work fine.

code:
faiss::Index* cpu_index = faiss::read_index("/home/zxin10/index08b/index_800m.index", false);
faiss::Index* gpu_index = faiss::gpu::index_cpu_to_gpu_multiple(
    (std::vector<faiss::gpu::GpuResources*>&)gpu_memory.res_mul, devices, cpu_index,
    gpu_memory.options_mul);
faiss::Index::idx_t* indices = new faiss::Index::idx_t[nq * k];
float* distances = new float[nq * k];
gpu_index->search(nq, query_vecs.data(), k, distances, indices);

@wickedfoo
Contributor

Your code snippet is not complete, as it won't compile. Are you enabling sharding?

400 million * 24 bytes per vector will fit on a single GPU, whereas 1 billion will not. Can 500 million fit on a single GPU? If so, then sharding should allow it to fit on 2 GPUs.

@wickedfoo
Contributor

Since your GPU appears to have 24 GB of memory, note that 18% of it is eaten up by default for temp scratch space in StandardGpuResources; you can reduce this to 1.5 GB or so. Also note that there are overheads.

Try aiming for 700 million or so on a single GPU and see if that works.

@0DF0Arc
Author

0DF0Arc commented Mar 27, 2018

@wickedfoo Hi, actually I tried sharding 800 million vectors across the 2 GPUs with shard mode = 1; GPU memory consumption was around 18 GB on each GPU after index_cpu_to_gpu_multiple, and I still hit the same issue. Could this have something to do with the index?

@mdouze mdouze added the GPU label Mar 27, 2018
@ZhuoranLyu

@0DF0Arc Same issue. I tried to use one P4 (8 GB) to index 30 million 128-D vectors with PQ8. However, it fails while adding the vectors to the index. Here is the backtrace:

#0 memmove_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1848
#1 0x00007fffdff48b4f in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007fffe00f1faf in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007fffdfff563e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007fffdfff63bc in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007fffdff184c8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007fffdff199e0 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007fffe0059612 in cuMemcpyHtoDAsync_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#8 0x00007ffff247c8cc in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#9 0x00007ffff2458b5b in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#10 0x00007ffff2492b08 in cudaMemcpyAsync () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#11 0x0000000000433f73 in faiss::gpu::Tensor<float, 2, true, int, faiss::gpu::traits::DefaultPtrTraits>::copyFrom(faiss::gpu::Tensor<float, 2, true, int, faiss::gpu::traits::DefaultPtrTraits>&, CUstream_st*) ()
#12 0x0000000000432357 in faiss::gpu::DeviceTensor<float, 2, true, int, faiss::gpu::traits::DefaultPtrTraits> faiss::gpu::toDevice<float, 2>(faiss::gpu::GpuResources*, int, float*, CUstream_st*, std::initializer_list) ()
#13 0x000000000043c833 in faiss::gpu::GpuIndexIVFPQ::addImpl_(long, float const*, long const*) ()
#14 0x00000000004393aa in faiss::gpu::GpuIndex::addInternal_(long, float const*, long const*) ()
#15 0x000000000043910c in faiss::gpu::GpuIndex::add_with_ids(long, float const*, long const*) ()
#16 0x000000000040ce65 in CwAnnTopkImpl::add_with_batch_gpu (this=0x7fffffffc240,
vec_feats=0x7ff42a0e9010, feat_num=31061938, ids=0x7fff3c56c010) at CwAnnTopkImpl.cpp:219
#17 0x000000000040e02d in CwAnnTopkImpl::add_with_ids_cwfeat_gpu (this=0x7fffffffc240,
vec_feats=0x7ff42a0e9010, feat_num=31061938, feat_dim=128, ids=0x7fff3c56c010)
at CwAnnTopkImpl.cpp:559
#18 0x000000000040a060 in main (argc=1, argv=0x7fffffffe2d8) at test/test_cwimpl_testhitrate.cpp:143

Appreciate any help.

@mdouze
Contributor

mdouze commented May 11, 2018

No activity, closing.

@mdouze mdouze closed this as completed May 11, 2018