Python tests often don't finish (in grpc/ubuntu/master/grpc_basictests_multilang) #12479

Closed
jtattermusch opened this issue Sep 11, 2017 · 5 comments · Fixed by #12507

Comments

@jtattermusch
Contributor

Python tests start, but in ~50% (!!!) of runs they never finish, and the kokoro build eventually times out once the top-level timeout set for the entire job elapses.

  • why doesn't run_tests_matrix.py make them time out?
  • why do they hang?

In this log, run_tests_python_linux_opt_native is started but never finishes (run_tests_python_linux_dbg_native does pass):

tools/run_tests/run_tests_matrix.py -f basictests linux multilang --inner_jobs 16 -j 2 --internal_ci --bq_result_table aggregate_results
IMPORTANT: The changes you are testing need to be locally committed
because only the committed changes in the current branch will be
copied to the docker environment or into subworkspaces.
Will run these tests:
  run_tests_sanity_linux_dbg_native
  run_tests_sanity_linux_opt_native
  run_tests_php7_linux_dbg_native
  run_tests_php7_linux_opt_native
  run_tests_csharp_linux_dbg_native
  run_tests_csharp_linux_opt_native
  run_tests_node_linux_dbg_native
  run_tests_node_linux_opt_native
  run_tests_python_linux_dbg_native
  run_tests_python_linux_opt_native
  run_tests_ruby_linux_dbg_native
  run_tests_ruby_linux_opt_native
  run_tests_php_linux_dbg_native
  run_tests_php_linux_opt_native
2017-09-10 03:20:34,313 START: Running test matrix.
2017-09-10 03:20:34,313 START: run_tests_sanity_linux_dbg_native
2017-09-10 03:20:34,315 START: run_tests_sanity_linux_opt_native
2017-09-10 03:30:29,842 PASSED: run_tests_sanity_linux_dbg_native [time=595.5sec; retries=0:0]
2017-09-10 03:30:29,842 START: run_tests_php7_linux_dbg_native
2017-09-10 03:30:34,334 PASSED: run_tests_sanity_linux_opt_native [time=600.0sec; retries=0:0]
2017-09-10 03:30:34,334 START: run_tests_php7_linux_opt_native
2017-09-10 03:31:32,362 PASSED: run_tests_php7_linux_dbg_native [time=62.5sec; retries=0:0]
2017-09-10 03:31:32,362 START: run_tests_csharp_linux_dbg_native
2017-09-10 03:31:33,393 PASSED: run_tests_php7_linux_opt_native [time=59.1sec; retries=0:0]
2017-09-10 03:31:33,393 START: run_tests_csharp_linux_opt_native
2017-09-10 03:33:35,669 PASSED: run_tests_csharp_linux_dbg_native [time=123.3sec; retries=0:0]
2017-09-10 03:33:35,670 START: run_tests_node_linux_dbg_native
2017-09-10 03:33:44,282 PASSED: run_tests_csharp_linux_opt_native [time=130.9sec; retries=0:0]
2017-09-10 03:33:44,282 START: run_tests_node_linux_opt_native
2017-09-10 03:35:44,860 PASSED: run_tests_node_linux_opt_native [time=120.6sec; retries=0:0]
2017-09-10 03:35:44,860 START: run_tests_python_linux_dbg_native
2017-09-10 03:35:45,536 PASSED: run_tests_node_linux_dbg_native [time=129.9sec; retries=0:0]
2017-09-10 03:35:45,536 START: run_tests_python_linux_opt_native
2017-09-10 03:54:09,747 PASSED: run_tests_python_linux_dbg_native [time=1104.9sec; retries=0:0]
2017-09-10 03:54:09,747 START: run_tests_ruby_linux_dbg_native
2017-09-10 03:57:02,813 PASSED: run_tests_ruby_linux_dbg_native [time=173.1sec; retries=0:0]
2017-09-10 03:57:02,814 START: run_tests_ruby_linux_opt_native
2017-09-10 03:59:44,416 PASSED: run_tests_ruby_linux_opt_native [time=161.6sec; retries=0:0]
2017-09-10 03:59:44,416 START: run_tests_php_linux_dbg_native
2017-09-10 04:00:21,782 PASSED: run_tests_php_linux_dbg_native [time=37.4sec; retries=0:0]
2017-09-10 04:00:21,782 START: run_tests_php_linux_opt_native
2017-09-10 04:00:44,305 PASSED: run_tests_php_linux_opt_native [time=22.5sec; retries=0:0]
...
ERROR: Aborting VM command due to timeout of 14400 seconds
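
To spot the hanging job in a log like the one above without reading it line by line, one option is to pair each START line with its terminal line. The following is a minimal sketch, not part of run_tests_matrix.py: the function name `unfinished_jobs` is hypothetical, and `FAILED` is an assumed status name (only START, PASSED and TIMEOUT appear in this thread).

```python
import re

# Matches "<STATUS>: <job_name>" where the job name is followed either by
# " [" (details such as time=... or pid=...) or by the end of the line.
# This skips the banner line "START: Running test matrix.".
# FAILED is an assumed status; START, PASSED and TIMEOUT appear in the logs.
_EVENT_RE = re.compile(r"(START|PASSED|FAILED|TIMEOUT): (\w+)(?= \[|$)")


def unfinished_jobs(log_text):
    """Return names of jobs that have a START line but no terminal line."""
    started = []
    finished = set()
    for line in log_text.splitlines():
        match = _EVENT_RE.search(line.strip())
        if not match:
            continue
        event, name = match.groups()
        if event == "START":
            started.append(name)
        else:
            finished.add(name)
    # Keep the original start order so the report follows the log.
    return [name for name in started if name not in finished]
```

Applied to the log above, this would report run_tests_python_linux_opt_native, which has a START entry at 03:35:45 and no matching PASSED entry.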

Instances
https://sponge.corp.google.com/invocation?id=ec0684b0-cfda-4619-b36b-7f924e83a48a&searchFor=
https://sponge.corp.google.com/invocation?tab=Kokoro&id=b6e845a6-4bb9-44cc-bad2-f0ed5ee5147d&searchFor=
.. and many more

@jtattermusch
Contributor Author

We'll need to differentiate the timeouts in run_tests_matrix.py a little bit; currently there is a single value for every job:

_RUNTESTS_TIMEOUT = 4*60*60

@jtattermusch
Contributor Author

Also seen happening on Jenkins:
06:56:02 2017-09-11 13:56:02,242 TIMEOUT: run_tests_python_linux_opt_native [pid=16893]

@jtattermusch changed the title from "Python tests often never finish (in grpc/ubuntu/master/grpc_basictests_multilang)" to "Python tests often don't finish (in grpc/ubuntu/master/grpc_basictests_multilang)" on Sep 12, 2017
@apolcyn
Contributor

apolcyn commented Sep 14, 2017

Reopening, since I'm still seeing the Python test timeout issue (was it closed manually in #12507?).

Seen again on Jenkins, where the suite aggregate_tests.run_tests_python_linux_opt_native timed out.

https://grpc-testing.appspot.com/job/gRPC_pull_requests_linux_opt/1295/testReport/junit/(root)/aggregate_tests/run_tests_python_linux_opt_native/

@jtattermusch
Contributor Author

Thanks for reopening. I actually never closed this. It looks like GitHub is acting pretty stupid: mentioning "will not fix #XXXX" in a pull request triggers closing of #XXXX upon merging :-)

@jtattermusch
Contributor Author

I haven't seen this for a while.

@lock (bot) locked this as resolved and limited conversation to collaborators on Oct 1, 2018