Search failed with flat index #1901

Closed
del-zhenwu opened this issue Apr 11, 2020 · 5 comments
Comments

@del-zhenwu
Contributor

del-zhenwu commented Apr 11, 2020

Describe the bug

09:49:47  2020-04-11:01:49:47,135 INFO     [k8s_runner.py:349] Table: sift_128_euclidean, row count: 1000000
09:49:47  2020-04-11:01:49:47,141 DEBUG    [k8s_runner.py:355] Building index with param: {"nlist": 16384}
09:49:47  2020-04-11:01:49:47,141 INFO     [client.py:123] Building index start, collection_name: sift_128_euclidean, index_type: FLAT
09:49:47  2020-04-11:01:49:47,142 INFO     [client.py:125] {'nlist': 16384}
09:49:49  2020-04-11:01:49:49,169 INFO     [client.py:34] Milvus create_index run in 2.03s
09:49:49  2020-04-11:01:49:49,173 INFO     [k8s_runner.py:357] {'index_type': 'flat', 'index_param': {'nlist': 16384}}
09:49:49  2020-04-11:01:49:49,173 INFO     [k8s_runner.py:358] Start preload collection: sift_128_euclidean
09:49:51  2020-04-11:01:49:50,604 INFO     [client.py:34] Milvus preload_collection run in 1.43s
09:49:51  2020-04-11:01:49:50,604 DEBUG    [k8s_runner.py:364] {'index_type': 'flat', 'index_param': {'nlist': 16384}}
09:49:51  2020-04-11:01:49:50,636 DEBUG    [k8s_runner.py:374] {'nq': 10000, 'topk': 10, 'search_param': {'nprobe': 1}}
09:49:52  2020-04-11:01:49:52,320 INFO     [client.py:34] Milvus query run in 1.65s
09:49:52  2020-04-11:01:49:52,348 INFO     [k8s_runner.py:381] Query ann_accuracy: 0.999
09:49:52  2020-04-11:01:49:52,350 INFO     [client.py:209] Server command: mode, result: GPU
09:49:52  2020-04-11:01:49:52,351 INFO     [client.py:209] Server command: build_commit_id, result: c8a59b273c90f31012bac02c7df6b131f8015488
09:49:52  /usr/local/lib/python3.6/site-packages/pymongo/topology.py:155: UserWarning: MongoClient opened before fork. Create MongoClient only after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#is-pymongo-fork-safe
09:49:52    "MongoClient opened before fork. Create MongoClient only "
09:49:52  2020-04-11:01:49:52,363 DEBUG    [k8s_runner.py:374] {'nq': 10000, 'topk': 10, 'search_param': {'nprobe': 2}}
09:51:43  <_MultiThreadedRendezvous of RPC that terminated with:
09:51:43  	status = StatusCode.UNAVAILABLE
09:51:43  	details = "Socket closed"
09:51:43  	debug_error_string = "{"created":"@1586569891.594461646","description":"Error received from peer ipv4:10.34.0.2:19530","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket closed","grpc_status":14}"
09:51:43  >
09:51:43  2020-04-11:01:51:31,600 ERROR    [grpc_handler.py:919] <_MultiThreadedRendezvous of RPC that terminated with:
09:51:43  	status = StatusCode.UNAVAILABLE
09:51:43  	details = "Socket closed"
09:51:43  	debug_error_string = "{"created":"@1586569891.594461646","description":"Error received from peer ipv4:10.34.0.2:19530","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket closed","grpc_status":14}"
09:51:43  >
09:51:43  2020-04-11:01:51:31,600 ERROR    [client.py:72] Error occurred: Socket closed
09:51:43  2020-04-11:01:51:31,600 ERROR    [main.py:69] Status not ok
09:51:43  2020-04-11:01:51:31,601 ERROR    [main.py:70] Traceback (most recent call last):
09:51:43    File "main.py", line 67, in queue_worker
09:51:43      runner.run(run_type, collection)
09:51:43    File "/home/jenkins/agent/workspace/milvus-debug/milvus_benchmark/k8s_runner.py", line 376, in run
09:51:43      result = milvus_instance.query(query_vectors.tolist(), top_k, search_param=search_param)
09:51:43    File "/home/jenkins/agent/workspace/milvus-debug/milvus_benchmark/client.py", line 32, in wrapper
09:51:43      result = func(*args, **kwargs)
09:51:43    File "/home/jenkins/agent/workspace/milvus-debug/milvus_benchmark/client.py", line 146, in query
09:51:43      self.check_status(status)
09:51:43    File "/home/jenkins/agent/workspace/milvus-debug/milvus_benchmark/client.py", line 73, in check_status
09:51:43      raise Exception("Status not ok")
09:51:43  Exception: Status not ok
09:51:43  
09:51:43  2020-04-11:01:51:31,601 DEBUG    [k8s_runner.py:63] benchmark-test-zjrsnvun
09:51:43  Error: uninstall: Release not loaded: benchmark-test-gzelwvgk: release: not found
09:51:43  2020-04-11:01:51:31,657 DEBUG    [utils.py:259] helm uninstall -n milvus benchmark-test-zjrsnvun
09:51:43  release "benchmark-test-zjrsnvun" uninstalled
09:51:43  2020-04-11:01:51:31,994 DEBUG    [main.py:75] All task finished in queue: eros
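The `debug_error_string` in the log above is JSON, so the status code and peer can be pulled out mechanically when triaging many runs. A minimal sketch, assuming the string is copied verbatim from the log; `summarize_grpc_error` and the small status table are hypothetical helpers, not part of the benchmark code:

```python
import json

# debug_error_string copied from the client log above.
DEBUG_ERROR_STRING = (
    '{"created":"@1586569891.594461646",'
    '"description":"Error received from peer ipv4:10.34.0.2:19530",'
    '"file":"src/core/lib/surface/call.cc","file_line":1056,'
    '"grpc_message":"Socket closed","grpc_status":14}'
)

# Small subset of the gRPC status codes, enough for this log.
GRPC_STATUS_NAMES = {0: "OK", 4: "DEADLINE_EXCEEDED", 13: "INTERNAL", 14: "UNAVAILABLE"}


def summarize_grpc_error(debug_error_string: str) -> dict:
    """Extract status name, message and peer from a gRPC debug_error_string."""
    info = json.loads(debug_error_string)
    code = info.get("grpc_status")
    return {
        "status": GRPC_STATUS_NAMES.get(code, str(code)),
        "message": info.get("grpc_message"),
        # The peer address follows the word "peer" in the description field.
        "peer": info.get("description", "").rsplit("peer ", 1)[-1],
    }


summary = summarize_grpc_error(DEBUG_ERROR_STRING)
print(summary)
```

`UNAVAILABLE` with "Socket closed" only says the connection to 10.34.0.2:19530 went away; the reason has to come from the server side.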

Steps/Code to reproduce behavior


Environment details
GPU-version
commit: c8a59b


@del-zhenwu
Contributor Author

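  1. Insert sift-1m.
  2. Build an index of type FLAT.
  3. Preload the collection, then search with {'nq': 10000, 'topk': 10, 'search_param': {'nprobe': 1}}. In the log above this query succeeds; the next query, with nprobe 2, fails with "Socket closed".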
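The steps can be sketched as a small driver. The `create_index`, `preload_collection` and `query` names come from the client log and traceback above; the `insert` method, the argument shapes, and the `RecordingClient` stand-in are assumptions made so the sketch runs without a server:

```python
from typing import Any, List, Tuple


class RecordingClient:
    """Hypothetical stand-in for the benchmark client: records calls only."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def insert(self, vectors):
        self.calls.append(("insert", len(vectors)))

    def create_index(self, index_type, index_param):
        self.calls.append(("create_index", (index_type, index_param)))

    def preload_collection(self):
        self.calls.append(("preload_collection", None))

    def query(self, vectors, top_k, search_param=None):
        self.calls.append(("query", (len(vectors), top_k, search_param)))
        return []


def reproduce(client, base_vectors, query_vectors, top_k=10, nprobes=(1, 2)):
    """Insert, build a FLAT index, preload, then search once per nprobe value."""
    client.insert(base_vectors)
    client.create_index("flat", {"nlist": 16384})  # params as shown in the log
    client.preload_collection()
    return [
        client.query(query_vectors, top_k, search_param={"nprobe": nprobe})
        for nprobe in nprobes
    ]


# Dry run with tiny 128-dimensional dummy data instead of sift-1m.
dry_run = RecordingClient()
reproduce(dry_run, base_vectors=[[0.0] * 128] * 4, query_vectors=[[0.0] * 128] * 2)
for name, args in dry_run.calls:
    print(name, args)
```

Swapping `RecordingClient` for the real client and the dummy vectors for sift-1m (nq 10000) would match the failing run; the log shows the nprobe 1 query returning and the nprobe 2 query failing.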

@tinkerlin tinkerlin added the kind/bug Issues or changes related a bug label Apr 11, 2020
@tinkerlin tinkerlin added this to To do in 0.8.0 via automation Apr 11, 2020
@tinkerlin tinkerlin added priority/urgent Must be staffed and worked on either currently, or very soon, ideally in time for the next release. severity/major Major, major function doesn't work under some condition. labels Apr 11, 2020
@del-zhenwu
Contributor Author

09:49:23  2020-04-11:01:49:23,90 INFO     [client.py:209] Server command: get_config *, result: {"cache_config":{"cache_insert_data":"false","cpu_cache_capacity":"16","cpu_cache_threshold":"0.7","insert_buffer_size":"1"},"db_config":{"archive_days_threshold":"0","archive_disk_threshold":"0","auto_flush_interval":"1","backend_url":"sqlite://:@:/","preload_table":""},"engine_config":{"gpu_search_threshold":"1","omp_thread_num":"0","use_avx512":"true","use_blas_threshold":"1100"},"gpu_resource_config":{"build_index_resources":"gpu0,gpu1","cache_capacity":"4","cache_threshold":"0.7","enable":"true","search_resources":"gpu0,gpu1"},"metric_config":{"address":"192.168.1.237","enable_monitor":"true","port":"9091"},"server_config":{"address":"0.0.0.0","deploy_mode":"single","port":"19530","time_zone":"UTC+8","web_port":"19121"},"storage_config":{"primary_path":"/var/lib/milvus/data","s3_access_key":"minioadmin","s3_address":"127.0.0.1","s3_bucket":"milvus-bucket","s3_enable":"false","s3_port":"9000","s3_secret_key":"minioadmin","secondary_path":""},"tracing_config":{"json_config_path":""},"wal_config":{"buffer_size":"128","enable":"true","recovery_error_ignore":"true","wal_path":"/var/lib/milvus/data/wal"}}
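The crash stack later in this thread fails in a CPU-to-GPU copy, so the GPU-related settings are the interesting part of this dump. A minimal sketch that pulls them out, assuming the `get_config *` result is JSON as shown; the excerpt keeps only two sections of it, and `gpu_settings` is a hypothetical helper:

```python
import json

# Two sections copied from the get_config output above.
CONFIG_EXCERPT = (
    '{"engine_config":{"gpu_search_threshold":"1","omp_thread_num":"0",'
    '"use_avx512":"true","use_blas_threshold":"1100"},'
    '"gpu_resource_config":{"build_index_resources":"gpu0,gpu1",'
    '"cache_capacity":"4","cache_threshold":"0.7","enable":"true",'
    '"search_resources":"gpu0,gpu1"}}'
)


def gpu_settings(config_json: str) -> dict:
    """Return the GPU-related values; the server reports every value as a string."""
    cfg = json.loads(config_json)
    gpu = cfg["gpu_resource_config"]
    engine = cfg["engine_config"]
    return {
        "gpu_enabled": gpu["enable"] == "true",
        "search_resources": gpu["search_resources"].split(","),
        "build_index_resources": gpu["build_index_resources"].split(","),
        "gpu_cache_capacity": int(gpu["cache_capacity"]),
        "gpu_search_threshold": int(engine["gpu_search_threshold"]),
    }


settings = gpu_settings(CONFIG_EXCERPT)
print(settings)
```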

@del-zhenwu
Contributor Author

Server log from ELK:

Pod: benchmark-test-zjrsnvun-milvus-7988d49d58-2np5r. ELK timestamps: April 11th 2020, 09:49:52.879 to 09:49:52.880. Entries are in ELK display order, not chronological order.
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-37-generic x86_64)
2020-04-11 09:49:52,481 | INFO | default | [SERVER] ../bin/milvus_server(+0x13032b) [0x565207d8532b]
2020-04-11 09:49:52,481 | INFO | default | [SERVER] ../bin/milvus_server(+0x139bdd) [0x565207d8ebdd]
2020-04-11 09:49:52,481 | INFO | default | [SERVER] /lib/x86_64-linux-gnu/libc.so.6(+0x5aa45) [0x7f948b58fa45]
2020-04-11 09:49:52,481 | INFO | default | [SERVER] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd66f) [0x7f948bf9966f]
2020-04-11 09:49:52,481 | INFO | default | [SERVER] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f948c26c6db]
2020-04-11 09:49:52,481 | INFO | default | [SERVER] /lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x192a) [0x7f948b591cba]
2020-04-11 09:49:52,475 | INFO | default | [SERVER] [search][0] Search start in gRPC server
2020-04-11 09:49:52,479 | INFO | default | [SERVER] [search][0] Search pre-execute. Check search parameters
2020-04-11 09:49:52,481 | INFO | default | [SERVER] ../bin/milvus_server(+0x289a80) [0x565207edea80]
2020-04-11 09:49:52,275 | INFO | default | [SERVER] [search][0] Search done.
2020-04-11 09:49:52,479 | INFO | default | [SERVER] [search][0] Search execute.
2020-04-11 09:49:52,481 | INFO | default | [SERVER] [SERVER] Server received critical signal: 11
2020-04-11 09:49:52,481 | INFO | default | [SERVER] ../bin/milvus_server(+0x28a031) [0x565207edf031]
2020-04-11 09:49:52,481 | INFO | default | [SERVER] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20) [0x7f948b573f20]
2020-04-11 09:49:52,481 | INFO | default | [SERVER] /lib/x86_64-linux-gnu/libc.so.6(+0x18e5a1) [0x7f948b6c35a1]
2020-04-11 09:49:52,154 | INFO | default | [WAL] record type 5 collection  lsn 140273086613296
2020-04-11 09:49:52,481 | INFO | default | [SERVER] Call stack:
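The call-stack entries in this log carry only module paths and offsets, no function names. A minimal sketch for extracting them so they could be handed to a symbolizer (for example `addr2line`, given a binary with debug info); the regular expression and `parse_frames` are hypothetical helpers written against the line format shown here:

```python
import re

# Matches e.g. "[SERVER] ../bin/milvus_server(+0x13032b) [0x565207d8532b]".
FRAME_RE = re.compile(
    r"\[SERVER\] (?P<module>\S+?)\((?P<symbol>[^)]*)\) \[(?P<address>0x[0-9a-f]+)\]"
)

# Three lines copied from the server log above.
LOG_LINES = [
    "2020-04-11 09:49:52,481 | INFO | default | [SERVER] ../bin/milvus_server(+0x13032b) [0x565207d8532b]",
    "2020-04-11 09:49:52,481 | INFO | default | [SERVER] /lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x192a) [0x7f948b591cba]",
    "2020-04-11 09:49:52,481 | INFO | default | [SERVER] [SERVER] Server received critical signal: 11",
]


def parse_frames(lines):
    """Return one dict per stack-frame line; non-frame lines are skipped."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None:
            continue
        # "name+0xoff" or "+0xoff": split on the last "+".
        name, _, offset = match.group("symbol").rpartition("+")
        frames.append({
            "module": match.group("module"),
            "function": name or None,
            "offset": offset,
            "address": match.group("address"),
        })
    return frames


frames = parse_frames(LOG_LINES)
for frame in frames:
    print(frame)
```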

@op-hunter
Contributor

crash stack:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f80773dd801 in __GI_abort () at abort.c:79
#2 0x000056016949de7d in faiss::gpu::IVFFlat::copyCodeVectorsFromCpu(float const*, long const*, std::vector<unsigned long, std::allocator<unsigned long> > const&) ()
#3 0x00005601694989ac in faiss::gpu::GpuIndexIVFFlat::copyFrom(faiss::IndexIVFFlat const*) ()
#4 0x0000560169481f60 in faiss::gpu::ToGpuCloner::clone_Index (this=0x7f8047ffd6e0, index=<optimized out>) at gpu/GpuCloner.cpp:211
#5 0x000056016948289b in faiss::gpu::index_cpu_to_gpu (resources=0x7f7fb8001450, device=0, index=0x7f7fc8001c00, options=<optimized out>) at gpu/GpuCloner.cpp:270
#6 0x00005601692aaac2 in milvus::knowhere::IVF::CopyCpuToGpu (this=0x7f7fc8006fe0, device_id=0, config=...) at /home/zilliz/workspace/dev/milvus/milvus/core/src/index/knowhere/knowhere/index/vector_index/IndexIVF.cpp:230
#7 0x0000560169286d3d in milvus::knowhere::cloner::CopyCpuToGpu (index=std::shared_ptr<milvus::knowhere::VecIndex> (use count 3, weak count 0) = {...}, device_id=0, config=...) at /home/zilliz/workspace/dev/milvus/milvus/core/src/index/knowhere/knowhere/index/vector_index/helpers/Cloner.cpp:60
#8 0x0000560169021c1a in milvus::engine::ExecutionEngineImpl::CopyToGpu (this=0x7f7fcc0032f0, device_id=0, hybrid=false) at /home/zilliz/workspace/dev/milvus/milvus/core/src/db/engine/ExecutionEngineImpl.cpp:567
#9 0x0000560168e1dcf4 in milvus::scheduler::XSearchTask::Load (this=0x7f7fcc004760, type=milvus::scheduler::LoadType::CPU2GPU, device_id=0 '\000') at /home/zilliz/workspace/dev/milvus/milvus/core/src/scheduler/task/SearchTask.cpp:150
#10 0x0000560168df9c6b in milvus::scheduler::GpuResource::LoadFile (this=0x5601841c0850, task=std::shared_ptr<milvus::scheduler::Task> (use count 4, weak count 0) = {...}) at /home/zilliz/workspace/dev/milvus/milvus/core/src/scheduler/resource/GpuResource.cpp:29
#11 0x0000560168dff182 in milvus::scheduler::Resource::loader_function (this=0x5601841c0850) at /home/zilliz/workspace/dev/milvus/milvus/core/src/scheduler/resource/Resource.cpp:170
#12 0x0000560168e01583 in std::__invoke_impl<void, void (milvus::scheduler::Resource::*)(), milvus::scheduler::Resource*> (__f=@0x5601841c1c90: (void (milvus::scheduler::Resource::*)(milvus::scheduler::Resource * const)) 0x560168dfeeea <milvus::scheduler::Resource::loader_function()>, __t=@0x5601841c1c88: 0x5601841c0850) at /usr/include/c++/7/bits/invoke.h:73
#13 0x0000560168e00cfe in std::__invoke<void (milvus::scheduler::Resource::*)(), milvus::scheduler::Resource*> (__fn=@0x5601841c1c90: (void (milvus::scheduler::Resource::*)(milvus::scheduler::Resource * const)) 0x560168dfeeea <milvus::scheduler::Resource::loader_function()>) at /usr/include/c++/7/bits/invoke.h:95
#14 0x0000560168e05df5 in std::thread::_Invoker<std::tuple<void (milvus::scheduler::Resource::*)(), milvus::scheduler::Resource*> >::_M_invoke<0ul, 1ul> (this=0x5601841c1c88) at /usr/include/c++/7/thread:234
#15 0x0000560168e05c1b in std::thread::_Invoker<std::tuple<void (milvus::scheduler::Resource::*)(), milvus::scheduler::Resource*> >::operator() (this=0x5601841c1c88) at /usr/include/c++/7/thread:243
#16 0x0000560168e05afa in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (milvus::scheduler::Resource::*)(), milvus::scheduler::Resource*> > >::_M_run (this=0x5601841c1c80) at /usr/include/c++/7/thread:186
#17 0x00007f8077e016df in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x00007f80780d46db in start_thread (arg=0x7f8047fff700) at pthread_create.c:463
#19 0x00007f80774be88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

server output:
Faiss assertion 'list_length.size() == this->getNumLists()' failed in void faiss::gpu::IVFFlat::copyCodeVectorsFromCpu(const float*, const long int*, const std::vector<long unsigned int>&) at gpu/impl/IVFFlat.cu:57; details: Expect list size 16384 but 683088 received!
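The assertion compares the number of per-list lengths handed over from the CPU index with the number of inverted lists the GPU index was created with. A minimal Python sketch of that invariant, using the two numbers from the message; this is an illustration of the check, not the Faiss code, and `check_list_count` is a hypothetical name:

```python
def check_list_count(list_lengths, num_lists):
    """Raise if the per-list length vector does not have one entry per inverted list."""
    if len(list_lengths) != num_lists:
        raise ValueError(
            f"Expect list size {num_lists} but {len(list_lengths)} received!"
        )


# One length per inverted list (nlist = 16384): passes silently.
check_list_count([0] * 16384, 16384)

# The mismatch reported by the server: 683088 entries for 16384 lists.
try:
    check_list_count([0] * 683088, 16384)
except ValueError as exc:
    mismatch_message = str(exc)
    print(mismatch_message)
```

Why the CPU side supplied 683088 entries is not established in this thread.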

@del-zhenwu
Contributor Author

Could not reproduce.

0.8.0 automation moved this from To do to Done Apr 14, 2020