gemini stuck during high network latency #237

Closed
aleksbykov opened this issue May 11, 2020 · 7 comments

@aleksbykov
Contributor

The gemini command was running in the job https://jenkins.scylladb.com/view/master/job/scylla-master/view/master-test/job/gemini-/job/gemini-3h/92/:

gemini -d --duration 10800s --warmup 1800s -c 100 -m mixed -f --non-interactive --cql-features normal --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" --test-cluster=10.0.163.195 --outfile /home/centos/gemini_result_479cc503-b34d-48da-88fa-acced65bb1ba.log --seed 8 --oracle-cluster=10.0.25.75

During this job, network latency was high and gemini slowed to a crawl. Instead of 3 hours, it ran for 8 hours:

{"L":"INFO","T":"2020-05-07T20:44:44.740Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2020-05-07T21:14:44.828Z","M":"Warmup done"}
{"L":"INFO","T":"2020-05-07T21:16:16.154Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:17.994Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:20.774Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:27.440Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:28.606Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:28.623Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:30.228Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:30.314Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:30.840Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:31.642Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:31.868Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:31.983Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:33.751Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:35.169Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:35.324Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:36.009Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:36.338Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:38.152Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:38.320Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:38.877Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:38.942Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:16:39.834Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:17:09.080Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:17:53.585Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:17:59.921Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:18:41.093Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:19:53.461Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:19:52.973Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:20:49.058Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:20:52.262Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:20:55.805Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2020-05-07T21:21:15.652Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}

After that, the job was interrupted by Jenkins.
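To see how the rate of validation retries changed over a run, the retry messages in a JSON log like the excerpt above can be counted per minute. This is a minimal sketch, assuming one JSON object per line with a `T` timestamp in the format shown; the function name `retries_per_minute` is hypothetical and not part of gemini.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gemini): count
# "validation failed for possible async operation" retries per minute.
# Assumes one JSON object per line with a "T" field such as
# "2020-05-07T21:16:16.154Z".
retries_per_minute() {
  local log_file=$1
  # Keep only the retry messages emitted by the validation job,
  # truncate each timestamp to the minute (YYYY-MM-DDTHH:MM),
  # then count how many retries fall into each minute.
  grep -F '"M":"validation failed for possible async operation"' "$log_file" \
    | sed -E 's/.*"T":"([0-9-]+T[0-9]{2}:[0-9]{2}).*/\1/' \
    | sort \
    | uniq -c
}
```

For example, `retries_per_minute gemini_output.log` (file name hypothetical) prints one line per minute with its retry count; a count that keeps growing over the run would be consistent with the slowdown described above.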

@aleksbykov
Contributor Author

Link to the gemini output log: gemini_output_log

@dahankzter dahankzter added this to the 1.7 milestone May 14, 2020
@dahankzter
Contributor

Did it continue to run queries just slower? Did it stall completely?

@dahankzter
Contributor

@aleksbykov, can you verify which signal is used to kill gemini?

@aleksbykov
Contributor Author

@dahankzter If the job exceeds its timeout and Jenkins stops it, the loader instance where gemini was running is terminated.
If our SCT code terminates gemini, it uses the TERM signal.
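One possible safeguard is to cap gemini's wall-clock time on the loader itself, so that it receives TERM well before the Jenkins job timeout tears down the instance. This is a minimal sketch, assuming GNU coreutils `timeout` is installed on the loader host; the wrapper name `run_with_limit` and the 60-second kill grace period are hypothetical choices, not existing SCT or gemini behavior.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of SCT or gemini): run a command with a
# hard wall-clock limit. Assumes GNU coreutils `timeout`.
# Usage: run_with_limit LIMIT_SECONDS COMMAND [ARGS...]
run_with_limit() {
  local limit_s=$1 status
  shift
  # Send SIGTERM when the limit expires; if the command is still alive
  # 60 seconds later, send SIGKILL.
  timeout --signal=TERM --kill-after=60s "${limit_s}s" "$@"
  status=$?
  # GNU timeout exits with 124 when the command timed out, or 137 if it
  # had to fall back to SIGKILL.
  if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
    echo "command exceeded ${limit_s}s and was terminated" >&2
  fi
  return "$status"
}
```

For example, `run_with_limit 14400 gemini -d --duration 10800s --warmup 1800s ...` would allow for the 3-hour `--duration`, the 30-minute `--warmup` (assuming it is not counted within `--duration`) and a hypothetical 30-minute margin.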

@aleksbykov
Contributor Author

> Did it continue to run queries just slower? Did it stall completely?

I saw gemini stall completely only once; every other time it kept running queries, just at a very low request rate (req/s).

@dahankzter
Contributor

@aleksbykov I think it should be faster in the 1.6.10 release.

@dkropachev
Collaborator

Fixed

3 participants