Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poll-cv: Async-client/Sync-server streaming perf test fails/timeouts with poll-cv engine #8545

Closed
sreecha opened this issue Oct 27, 2016 · 1 comment
Assignees
Milestone

Comments

@sreecha
Copy link
Contributor

sreecha commented Oct 27, 2016

The following tests fail with poll-cv engine:
cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_secure
cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure

  1. To reproduce, do the following:
$ # Go to your grpc local repo root
$ GRPC_POLL_STRATEGY=poll-cv  tools/run_tests/run_performance_test.py -l c++ -r cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure

You will also notice that if you run qps_worker and qps_json_driver processes manually (or perhaps give a longer timeout in run_performance.py above), then it will complete after a long time (around 5 mins or even 10 mins).

  1. Another way to reproduce (i.e running qps_workers and qps_json_driver manually)
$ # Go to grpc local repo root and build
$ CONFIG=dbg make qps_json_driver qps_worker -j

$ # Open three terminal windows

$ # Window 1
$ GRPC_POLL_STRATEGY=poll-cv GRPC_VERBOSITY=DEBUG bins/dbg/qps_worker --driver_port=10001

$ # Window 2
$ GRPC_POLL_STRATEGY=poll-cv GRPC_VERBOSITY=DEBUG bins/dbg/qps_worker --driver_port=10002

$ # Window 3 (Notice that this need not use  poll-cv. It uses epoll by default and it is okay)
$ export QPS_WORKERS=localhost:10001,localhost:10002
$ GRPC_VERBOSITY=DEBUG bins/dbg/qps_json_driver  '--scenarios_json={"scenarios": [{"name": "cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure", "warmup_seconds": 5, "benchmark_seconds": 30, "num_servers": 1, "server_config": {"async_server_threads": 0, "core_limit": 0, "security_params": null, "server_type": "SYNC_SERVER"}, "client_config": {"client_type": "ASYNC_CLIENT", "security_params": null, "payload_config": {"simple_params": {"resp_size": 0, "req_size": 0}}, "client_channels": 64, "async_client_threads": 0, "outstanding_rpcs_per_channel": 100, "rpc_type": "STREAMING", "load_params": {"closed_loop": {}}, "histogram_params": {"max_possible": 60000000000.0, "resolution": 0.01}}, "num_clients": 0}]}'
@kpayson64
Copy link
Contributor

@sreecha @ctiller
I think #8982 actually substantially helps this case:

With that change:

kpayson@kpayson0:/workspace/grpc2$ GRPC_POLL_STRATEGY=poll-cv tools/run_tests/run_performance_tests.py -l c++ -r cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure
START: Preparing hosts.
PASSED: local_prepare [time=0.3sec; retries=0:0]
SUCCESS: Prepare step completed successfully.
START: Building.
PASSED: local_build [time=1.3sec; retries=0:0]
SUCCESS: Built successfully.
START: qps_worker_c++_0
START: qps_worker_c++_1

++ dirname tools/run_tests/performance/run_qps_driver.sh
+ cd tools/run_tests/performance/../../..
+ bins/opt/qps_json_driver '--scenarios_json={"scenarios": [{"name": "cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure", "warmup_seconds": 5, "benchmark_seconds": 30, "num_servers": 1, "server_config": {"async_server_threads": 0, "core_limit": 0, "security_params": null, "server_type": "SYNC_SERVER"}, "client_config": {"client_type": "ASYNC_CLIENT", "security_params": null, "payload_config": {"simple_params": {"resp_size": 0, "req_size": 0}}, "client_channels": 64, "async_client_threads": 0, "outstanding_rpcs_per_channel": 100, "rpc_type": "STREAMING", "load_params": {"closed_loop": {}}, "histogram_params": {"max_possible": 60000000000.0, "resolution": 0.01}}, "num_clients": 0}]}' --scenario_result_file=scenario_result.json
RUNNING SCENARIO: cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure
D0206 12:35:17.315935701   58764 ev_posix.c:105]             Using polling engine: poll-cv
I0206 12:35:18.322355554   58764 driver.cc:268]              Starting server on localhost:10000 (worker #0)
D0206 12:35:18.322451586   58764 ev_posix.c:105]             Using polling engine: poll-cv
I0206 12:35:18.343740214   58764 driver.cc:336]              Starting client on localhost:10010 (worker #1)
D0206 12:35:18.343844104   58764 driver.cc:375]              Client 0 gets 64 channels
I0206 12:35:18.465201668   58764 driver.cc:396]              Initiating
I0206 12:35:18.476654821   58764 driver.cc:417]              Warming up
I0206 12:35:23.476763322   58764 driver.cc:423]              Starting
I0206 12:35:23.495435406   58764 driver.cc:450]              Running
I0206 12:35:53.476777367   58764 driver.cc:464]              Finishing clients
I0206 12:35:53.477892372   58764 driver.cc:478]              Received final status from client 0
I0206 12:35:53.806017085   58764 driver.cc:510]              Finishing servers
I0206 12:35:53.811958723   58764 driver.cc:524]              Received final status from server 0
I0206 12:35:57.056021375   58764 report.cc:81]               QPS: 35206.0
I0206 12:35:57.058179710   58764 report.cc:91]               QPS: 35206.0 (2933.8/server core)
I0206 12:35:57.058211257   58764 report.cc:96]               Latencies (50/90/95/99/99.9%-ile): 157399.5/228166.4/330301.3/717298.3/1086503.3 us
I0206 12:35:57.058224571   58764 report.cc:106]              Server system time: 496.20%
I0206 12:35:57.058235606   58764 report.cc:108]              Server user time:   108.57%
I0206 12:35:57.058246214   58764 report.cc:110]              Client system time: 37.81%
I0206 12:35:57.058256758   58764 report.cc:112]              Client user time:   29.70%
I0206 12:35:57.058267667   58764 report.cc:117]              Server CPU usage: 66.82%
+ '[' '' '!=' '' ']'
PASSED: qps_json_driver.cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure [time=39.8sec; retries=0:0]

PASSED: qps_json_driver.quit [time=2.0sec; retries=0:0]

I0206 12:35:17.302615496   58760 server_builder.cc:247]      Synchronous server. Num CQs: 12, Min pollers: 1, Max Pollers: 2147483647, CQ timeout (msec): 1000
PASSED: qps_worker_c++_0 [time=41.8sec; retries=0:0]
QPS worker "localhost:10010" is still running.
QPS worker "localhost:10010" is still running.

I0206 12:35:17.305455078   58761 server_builder.cc:247]      Synchronous server. Num CQs: 12, Min pollers: 1, Max Pollers: 2147483647, CQ timeout (msec): 1000
PASSED: qps_worker_c++_1 [time=46.1sec; retries=0:0]
All QPS workers finished.

Without that change:

GRPC_POLL_STRATEGY=poll-cv tools/run_tests/run_performance_tests.py -l c++ -r cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure
START: Preparing hosts.
PASSED: local_prepare [time=0.3sec; retries=0:0]
SUCCESS: Prepare step completed successfully.
START: Building.
PASSED: local_build [time=31.0sec; retries=0:0]
SUCCESS: Built successfully.
START: qps_worker_c++_0
START: qps_worker_c++_1

++ dirname tools/run_tests/performance/run_qps_driver.sh
+ cd tools/run_tests/performance/../../..
+ bins/opt/qps_json_driver '--scenarios_json={"scenarios": [{"name": "cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure", "warmup_seconds": 5, "benchmark_seconds": 30, "num_servers": 1, "server_config": {"async_server_threads": 0, "security_params": null, "payload_config": null, "server_type": "SYNC_SERVER"}, "client_config": {"client_type": "ASYNC_CLIENT", "security_params": null, "payload_config": null, "client_channels": 64, "async_client_threads": 0, "outstanding_rpcs_per_channel": 100, "rpc_type": "STREAMING", "load_params": {"closed_loop": {}}, "histogram_params": {"max_possible": 60000000000.0, "resolution": 0.01}}, "num_clients": 0}]}' --scenario_result_file=scenario_result.json
RUNNING SCENARIO: cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure
I0206 12:43:59.642962276   75267 driver.cc:259]              Starting server on localhost:10000 (worker #0)
D0206 12:43:59.643045299   75267 ev_posix.c:107]             Using polling engine: poll-cv
I0206 12:43:59.648651425   75267 driver.cc:306]              Starting client on localhost:10010 (worker #1)
D0206 12:43:59.648677461   75267 driver.cc:322]              Client 0 gets 64 channels
I0206 12:43:59.757211908   75267 driver.cc:343]              Initiating
I0206 12:43:59.770925615   75267 driver.cc:364]              Warming up
I0206 12:44:04.771027461   75267 driver.cc:370]              Starting
I0206 12:44:04.863617798   75267 driver.cc:397]              Running
I0206 12:44:34.771065355   75267 driver.cc:411]              Finishing clients
I0206 12:44:34.771800498   75267 driver.cc:425]              Received final status from client 0
I0206 12:50:24.111830214   75267 driver.cc:457]              Finishing servers
I0206 12:50:24.118813390   75267 driver.cc:471]              Received final status from server 0
I0206 12:51:21.321436807   75267 report.cc:81]               QPS: 1015.1
I0206 12:51:21.323846692   75267 report.cc:91]               QPS: 1015.1 (84.6/server core)
I0206 12:51:21.323880727   75267 report.cc:96]               Latencies (50/90/95/99/99.9%-ile): 2688554.1/6099182.0/10075834.0/27601184.2/34718785.0 us
I0206 12:51:21.323893818   75267 report.cc:106]              Server system time: 46.95%
I0206 12:51:21.323904589   75267 report.cc:108]              Server user time:   88.74%
I0206 12:51:21.323914730   75267 report.cc:110]              Client system time: 12.43%
I0206 12:51:21.323924663   75267 report.cc:112]              Client user time:   4.07%
I0206 12:51:21.323935180   75267 report.cc:117]              Server CPU usage: 17.49%
+ '[' '' '!=' '' ']'
PASSED: qps_json_driver.cpp_protobuf_async_client_sync_server_streaming_qps_unconstrained_insecure [time=441.8sec; retries=0:0]

PASSED: qps_json_driver.quit [time=0.0sec; retries=0:0]

I0206 12:43:59.511150308   75259 server_builder.cc:247]      Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2147483647, CQ timeout (msec): 1000
TIMEOUT: qps_worker_c++_0 [pid=75259]

I0206 12:43:59.518450586   75260 server_builder.cc:247]      Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2147483647, CQ timeout (msec): 1000
TIMEOUT: qps_worker_c++_1 [pid=75260]

@lock lock bot locked as resolved and limited conversation to collaborators Oct 1, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants