Search performance gap between cpu and gpu(nq<gpu_search_threshold) #2437

Closed
del-zhenwu opened this issue May 26, 2020 · 8 comments
Labels
area/performance Performance issues priority/backlog Higher priority than priority/awaiting-more-evidence.


@del-zhenwu
Contributor

del-zhenwu commented May 26, 2020

Describe the bug
Search performance gap between the CPU and GPU builds when nq < gpu_search_threshold:
cpu:

15:52:38  2020-05-26:07:52:38,482 INFO     [k8s_runner.py:321] Search param: {"nprobe": 8}
╭─────────────┬─────────────┬─────────────┬─────────────┬─────────────╮
│  Nq/Top-k   │      1      │     10      │     100     │    1000     │
├─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤
│           1 │        1.86 │        1.81 │        1.81 │        1.77 │
│          10 │        2.98 │           3 │         3.2 │        3.16 │
│         100 │        8.04 │        8.22 │        9.18 │       11.43 │
╰─────────────┴─────────────┴─────────────┴─────────────┴─────────────╯

[Screenshot: cpu_sift_1b_search_sq8(1), 2020-05-26]

===
gpu:

18:34:24  2020-05-26:10:34:21,192 INFO     [k8s_runner.py:321] Search param: {"nprobe": 8}
╭─────────────┬─────────────┬─────────────┬─────────────┬─────────────╮
│  Nq/Top-k   │      1      │     10      │     100     │    1000     │
├─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤
│           1 │        3.07 │        2.99 │        2.42 │        2.72 │
│          10 │        3.81 │        4.14 │        4.62 │        5.49 │
│         100 │       10.73 │        10.7 │       11.46 │       13.37 │
╰─────────────┴─────────────┴─────────────┴─────────────┴─────────────╯

[Screenshot: cpu_sift_1b_search_sq8, 2020-05-26]
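To quantify the gap, the two tables can be compared cell by cell. The sketch below divides each GPU-build time by the matching CPU-build time; the values are copied by hand from the tables, and `slowdown` is a hypothetical helper, not part of the test framework.

```python
# Search times copied from the CPU and GPU tables above (same units as the logs).
TOPK = [1, 10, 100, 1000]
CPU = {
    1: [1.86, 1.81, 1.81, 1.77],
    10: [2.98, 3.0, 3.2, 3.16],
    100: [8.04, 8.22, 9.18, 11.43],
}
GPU = {
    1: [3.07, 2.99, 2.42, 2.72],
    10: [3.81, 4.14, 4.62, 5.49],
    100: [10.73, 10.7, 11.46, 13.37],
}


def slowdown(cpu, gpu):
    """GPU-build time divided by CPU-build time for every (nq, topk) cell."""
    return {
        nq: [round(g / c, 2) for c, g in zip(cpu[nq], gpu[nq])]
        for nq in cpu
    }


ratios = slowdown(CPU, GPU)
print("nq/topk", *TOPK, sep="\t")
for nq, row in ratios.items():
    print(nq, *row, sep="\t")
```

Every ratio is above 1, so the GPU build is slower in all twelve cells, by roughly 1.2x to 1.7x.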

Steps/Code to reproduce behavior
dataset: sift-1b


Environment details
branch: 0.9.1
index:

15:25:42  2020-05-26:07:25:37,896 INFO     [k8s_runner.py:311] {'index_type': 'ivf_sq8', 'index_param': {'nlist': 16384}}

config:

'engine_config.use_blas_threshold': 0, 'engine_config.gpu_search_threshold': 200
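The benchmark only uses nq values of 1, 10 and 100, all below gpu_search_threshold = 200. Assuming, as the issue title implies, that the GPU build searches on the CPU whenever nq is below the threshold, the routing can be modeled as below. `search_device` is a hypothetical name for illustration, not a Milvus function.

```python
def search_device(nq: int, gpu_search_threshold: int) -> str:
    """Hypothetical model: batches below the threshold are searched on the CPU."""
    # Assumption: nq >= threshold goes to the GPU, smaller batches stay on the CPU.
    return "gpu" if nq >= gpu_search_threshold else "cpu"


# The nq values benchmarked in this issue, with the threshold from the config above.
for nq in (1, 10, 100):
    print(nq, search_device(nq, gpu_search_threshold=200))
```

Under this assumption, every cell in the benchmark tables is a CPU search in both builds, which is why the two builds would be expected to perform the same.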

cpu config:

15:25:42  2020-05-26:07:25:37,891 INFO     [client.py:222] Server command: get_config *, result: {
"cache_config":
{"cache_insert_data":"false","cpu_cache_capacity":"150","cpu_cache_threshold":"0.7","insert_buffer_size":"1"},
"db_config":
{"archive_days_threshold":"0","archive_disk_threshold":"0","auto_flush_interval":"1","backend_url":"sqlite://:@:/","preload_collection":""},
"engine_config":
{"omp_thread_num":"0","simd_type":"auto","use_blas_threshold":"0"},
"logs":
{"debug.enable":"true","error.enable":"true","fatal.enable":"true","info.enable":"true","log_rotate_num":"0","max_log_file_size":"1024","path":"/test/milvus/db_data_8/sift_1b_2048_128_l2_sq8/logs","trace.enable":"true","warning.enable":"true"},
"metric_config":
{"address":"192.168.1.237","enable_monitor":"true","port":"9091"},
"server_config":
{"address":"0.0.0.0","deploy_mode":"single","port":"19530","time_zone":"UTC+8","web_enable":"true","web_port":"19121"},
"storage_config":
{"file_cleanup_timeout":"10","primary_path":"/test/milvus/db_data_8/sift_1b_2048_128_l2_sq8","secondary_path":""},
"tracing_config":
{"json_config_path":""},
"wal_config":
{"buffer_size":"256","enable":"true","recovery_error_ignore":"true","wal_path":"/test/milvus/db_data_8/sift_1b_2048_128_l2_sq8/wal"}}

===
gpu config:

17:42:42  2020-05-26:09:42:41,877 INFO     [client.py:222] Server command: get_config *, result: {
"cache_config":
{"cache_insert_data":"false","cpu_cache_capacity":"150","cpu_cache_threshold":"0.7","insert_buffer_size":"1"},
"db_config":
{"archive_days_threshold":"0","archive_disk_threshold":"0","auto_flush_interval":"1","backend_url":"sqlite://:@:/","preload_collection":""},
"engine_config":
{"gpu_search_threshold":"200","omp_thread_num":"0","simd_type":"auto","use_blas_threshold":"0"},
"gpu_resource_config":
{"build_index_resources":"gpu0,gpu1","cache_capacity":"6","cache_threshold":"0.7","enable":"true","search_resources":"gpu0,gpu1"},
"logs":
{"debug.enable":"true","error.enable":"true","fatal.enable":"true","info.enable":"true","log_rotate_num":"0","max_log_file_size":"1024","path":"/test/milvus/db_data_8/sift_1b_2048_128_l2_sq8h/logs","trace.enable":"true","warning.enable":"true"},
"metric_config":
{"address":"192.168.1.237","enable_monitor":"true","port":"9091"},
"server_config":
{"address":"0.0.0.0","deploy_mode":"single","port":"19530","time_zone":"UTC+8","web_enable":"true","web_port":"19121"},
"storage_config":
{"file_cleanup_timeout":"10","primary_path":"/test/milvus/db_data_8/sift_1b_2048_128_l2_sq8h","secondary_path":""},
"tracing_config":
{"json_config_path":""},
"wal_config":
{"buffer_size":"256","enable":"true","recovery_error_ignore":"true","wal_path":"/test/milvus/db_data_8/sift_1b_2048_128_l2_sq8h/wal"}}
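The two get_config dumps above are long, so diffing them mechanically shows what actually differs. The sketch below works on hand-copied subsets of the dumps; `flatten` and `config_diff` are illustrative helpers, not Milvus APIs. Across the full dumps, the only differences are engine_config.gpu_search_threshold, the gpu_resource_config section, and the log, storage and WAL paths.

```python
# Hand-copied subsets of the two get_config results above (values are strings,
# as returned by the server). Only a few sections are included for brevity.
CPU_CONFIG = {
    "cache_config": {"cpu_cache_capacity": "150", "cpu_cache_threshold": "0.7"},
    "engine_config": {"omp_thread_num": "0", "simd_type": "auto", "use_blas_threshold": "0"},
    "storage_config": {"primary_path": "/test/milvus/db_data_8/sift_1b_2048_128_l2_sq8"},
}
GPU_CONFIG = {
    "cache_config": {"cpu_cache_capacity": "150", "cpu_cache_threshold": "0.7"},
    "engine_config": {"gpu_search_threshold": "200", "omp_thread_num": "0",
                      "simd_type": "auto", "use_blas_threshold": "0"},
    "gpu_resource_config": {"build_index_resources": "gpu0,gpu1", "cache_capacity": "6",
                            "cache_threshold": "0.7", "enable": "true",
                            "search_resources": "gpu0,gpu1"},
    "storage_config": {"primary_path": "/test/milvus/db_data_8/sift_1b_2048_128_l2_sq8h"},
}


def flatten(cfg, prefix=""):
    """Turn nested sections into dotted keys, e.g. 'engine_config.simd_type'."""
    flat = {}
    for key, value in cfg.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def config_diff(a, b):
    """Map every dotted key whose value differs to (value_in_a, value_in_b)."""
    fa, fb = flatten(a), flatten(b)
    return {k: (fa.get(k), fb.get(k)) for k in sorted(fa.keys() | fb.keys())
            if fa.get(k) != fb.get(k)}


for key, (cpu_value, gpu_value) in config_diff(CPU_CONFIG, GPU_CONFIG).items():
    print(f"{key}: cpu={cpu_value!r} gpu={gpu_value!r}")
```

A key present in only one config shows up with `None` on the other side, which is how the GPU-only settings appear in the output.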


@del-zhenwu del-zhenwu added the area/performance Performance issues label May 26, 2020
@JinHai-CN JinHai-CN assigned wxyucs and unassigned wxyucs May 26, 2020
@cydrain cydrain self-assigned this May 28, 2020
@cydrain
Contributor

cydrain commented May 28, 2020

Comparison between CPU and GPU with nq=100, k=100:

     │ map uids │quantization│ data search │  IDMAP  │   total   │
CPU  │   662.47 │   3954.04  │   1951.63   │ 2515.82 │  9106.22  │
GPU  │  1447.21 │   4536.22  │   1957.53   │ 3367.76 │ 11325.16  │

The GPU build is slower than the CPU build in three places:

  1. IDMAP search: of the 233 index files, 1 is an IDMAP index file, and its query time is 2515.82 (CPU) vs. 3367.76 (GPU).
  2. Quantization: 3954.04 (CPU) vs. 4536.22 (GPU).
  3. Mapping uids: 662.47 (CPU) vs. 1447.21 (GPU).

==================================================================
Comparison between CPU and GPU with nq=100, k=100 (latest 0.9.1):

     │ map uids │quantization│ data search │  IDMAP  │   total   │
CPU  │  117.45  │   3821.89  │   1840.22   │ 2685.12 │  8510.53  │
GPU  │  949.17  │   5878.83  │   2011.31   │ 3047.41 │ 11922.17  │
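Ranking the per-phase differences makes it easier to see which phase drives the gap in each breakdown above. A minimal sketch with the numbers copied by hand (`phase_gaps` is a hypothetical helper). The four phases do not sum exactly to the reported totals, so the per-phase gaps are shown next to the total gap rather than assumed to add up to it. In the first run, IDMAP (851.94), map uids (784.74) and quantization (582.18) account for most of the 2218.94 gap; in the latest run, quantization (2056.94) dominates the 3411.64 gap.

```python
PHASES = ["map uids", "quantization", "data search", "IDMAP"]

# (per-phase times, total) copied from the two breakdown tables above.
BREAKDOWNS = {
    "first run": {
        "cpu": ([662.47, 3954.04, 1951.63, 2515.82], 9106.22),
        "gpu": ([1447.21, 4536.22, 1957.53, 3367.76], 11325.16),
    },
    "latest 0.9.1": {
        "cpu": ([117.45, 3821.89, 1840.22, 2685.12], 8510.53),
        "gpu": ([949.17, 5878.83, 2011.31, 3047.41], 11922.17),
    },
}


def phase_gaps(cpu, gpu):
    """Per-phase GPU-minus-CPU time, largest first, plus the total gap."""
    (cpu_phases, cpu_total), (gpu_phases, gpu_total) = cpu, gpu
    gaps = {p: round(g - c, 2) for p, c, g in zip(PHASES, cpu_phases, gpu_phases)}
    ranked = sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, round(gpu_total - cpu_total, 2)


for run, sides in BREAKDOWNS.items():
    ranked, total_gap = phase_gaps(sides["cpu"], sides["gpu"])
    print(f"{run}: total gap {total_gap}")
    for phase, gap in ranked:
        print(f"  {phase:<12} {gap:>8.2f}")
```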

@cydrain
Contributor

cydrain commented Jun 2, 2020

To do:

  1. Run objdump to compare the assembly code of the CPU and GPU builds.
  2. Profile IDMAP search in both the CPU and GPU builds.
  3. Add more debug logging for quantization.

@op-hunter
Contributor

I compared the disassembled code of the CPU and GPU versions; they are almost identical.

@op-hunter
Contributor

op-hunter commented Jun 3, 2020

CPU version, map uids time statistics:

nq = 10, topk = 10
the 1th map uids costs: 281.457584 ms
the 2th map uids costs: 241.540864 ms
nq = 10, topk = 100
the 1th map uids costs: 629.247534 ms
the 2th map uids costs: 342.6071419999999 ms
nq = 100, topk = 10
the 1th map uids costs: 563.6161859999999 ms
the 2th map uids costs: 285.4171279999998 ms
nq = 100, topk = 100
the 1th map uids costs: 1923.391010999999 ms
the 2th map uids costs: 699.3380979999998 ms
total time: 4966.6155469999985

CPU version, IDMAP search time statistics:

nq = 10, topk = 10
the 1th idmap search costs: 1420.886106ms
the 2th idmap search costs: 1241.22002ms
nq = 10, topk = 100
the 1th idmap search costs: 1313.316275ms
the 2th idmap search costs: 1470.983243ms
nq = 100, topk = 10
the 1th idmap search costs: 2967.846867ms
the 2th idmap search costs: 3112.891742ms
nq = 100, topk = 100
the 1th idmap search costs: 2807.553905ms
the 2th idmap search costs: 3096.952711ms
total time: 17431.650868999997ms

GPU version, map uids time statistics:

nq = 10, topk = 10
the 1th map uids costs: 359.0528389999999 ms
the 2th map uids costs: 77.25043499999995 ms
nq = 10, topk = 100
the 1th map uids costs: 321.13985700000006 ms
the 2th map uids costs: 601.8976440000005 ms
nq = 100, topk = 10
the 1th map uids costs: 394.6957380000002 ms
the 2th map uids costs: 397.812442 ms
nq = 100, topk = 100
the 1th map uids costs: 1520.9041959999995 ms
the 2th map uids costs: 2018.7300160000007 ms
total time: 5691.483167000001

GPU version, IDMAP search time statistics:

nq = 10, topk = 10
the 1th idmap search costs: 1285.741843ms
the 2th idmap search costs: 1068.580862ms
nq = 10, topk = 100
the 1th idmap search costs: 1260.300283ms
the 2th idmap search costs: 1230.708255ms
nq = 100, topk = 10
the 1th idmap search costs: 3347.814468ms
the 2th idmap search costs: 2947.540391ms
nq = 100, topk = 100
the 1th idmap search costs: 3115.975143ms
the 2th idmap search costs: 3121.549829ms
total time: 17378.211074000003ms

As the results above show, individual runs are not stable because of system jitter.
I think the strategy of taking the fastest run as the final test result concealed this jitter.
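The logs above contain only two runs per configuration, so any statistic is rough; still, reporting the mean and the slowest/fastest ratio next to the fastest run makes the jitter visible instead of hiding it. A minimal sketch with the logged values copied in (the `RUNS` layout and `summarize` helper are illustrative, not part of the test framework):

```python
from statistics import mean

# Two runs per (nq, topk) for each stage, copied from the logs above (ms).
RUNS = {
    "cpu map uids": {
        (10, 10): [281.457584, 241.540864],
        (10, 100): [629.247534, 342.6071419999999],
        (100, 10): [563.6161859999999, 285.4171279999998],
        (100, 100): [1923.391010999999, 699.3380979999998],
    },
    "cpu idmap": {
        (10, 10): [1420.886106, 1241.22002],
        (10, 100): [1313.316275, 1470.983243],
        (100, 10): [2967.846867, 3112.891742],
        (100, 100): [2807.553905, 3096.952711],
    },
    "gpu map uids": {
        (10, 10): [359.0528389999999, 77.25043499999995],
        (10, 100): [321.13985700000006, 601.8976440000005],
        (100, 10): [394.6957380000002, 397.812442],
        (100, 100): [1520.9041959999995, 2018.7300160000007],
    },
    "gpu idmap": {
        (10, 10): [1285.741843, 1068.580862],
        (10, 100): [1260.300283, 1230.708255],
        (100, 10): [3347.814468, 2947.540391],
        (100, 100): [3115.975143, 3121.549829],
    },
}


def summarize(runs):
    """Return {(nq, topk): (fastest, mean, slowest/fastest)} for one stage."""
    return {key: (min(ts), mean(ts), max(ts) / min(ts)) for key, ts in runs.items()}


for stage, runs in RUNS.items():
    print(stage)
    for (nq, topk), (fastest, avg, spread) in summarize(runs).items():
        print(f"  nq={nq:<4} topk={topk:<4} fastest={fastest:8.2f} "
              f"mean={avg:8.2f} slowest/fastest={spread:.2f}")
```

With only two samples the mean is just the midpoint; more repetitions per configuration would be needed before a median or percentile says much. Even so, a slowest/fastest ratio above 4 (GPU map uids, nq=10, topk=10) shows how much a fastest-only report can hide.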

@op-hunter
Contributor

I used google-perftools to profile the IDMAP search. The result shows that over 90% of the time is spent in omp_get_num_procs, which is very strange.

@del-zhenwu
Contributor Author

The range of search times is expected; putting this issue on hold.

@del-zhenwu del-zhenwu added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jun 6, 2020
@shengjun1985
Contributor

@cydrain @del-zhenwu
The GPU version makes the index read-only.
It also uses PageLockMemory, which degrades CPU IVF search performance.

Here is the code that differs between the two versions:
[image]

Here is the description:
[image]

@shengjun1985 shengjun1985 assigned del-zhenwu and unassigned cydrain Oct 14, 2020
@del-zhenwu
Contributor Author

Closing.
