Allows overriding 'timeout' and 'gather_job_timeout' to 'manage.up' runner call #40018

meaksh · 2017-03-14T17:25:40Z

What does this PR do?

This PR allows overriding the timeout and gather_job_timeout parameters for the manage.up and manage.status runners. We set these two parameters to a high value in our master configuration because, by default, we want to wait for jobs that take a long time to execute:

timeout: 120
gather_job_timeout: 120

When some minions are down and we execute the manage.up runner to check which minions are up and running, it only returns after both timeouts have been reached. This makes the manage.up runner unusable as a presence mechanism, because we cannot get a response within a short, fixed time.

With this PR, we can call the manage.up runner with timeout and gather_job_timeout parameters, which are then used for the test.ping job that the runner triggers.

This way we get a quick response from the manage.up runner when minions are unreachable, even when we call the runner via the API.
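
To illustrate the override behaviour described above, here is a minimal sketch. It is not Salt's actual code: `effective_timeouts` and the `master_opts` dictionary are hypothetical names used only for illustration. The idea is that a value passed as a kwarg wins, and the master configuration value is used otherwise. The integer coercion is an assumption, loosely motivated by the follow-up PR referenced in this thread (#40264, "Makes sure "gather_job_timeout" is an Integer").

```python
# Hypothetical helper for illustration only; this is NOT Salt's implementation.
def effective_timeouts(master_opts, timeout=None, gather_job_timeout=None):
    """Return the (timeout, gather_job_timeout) pair a runner call would use.

    A kwarg that is not None overrides the master configuration value.
    """
    def pick(name, override):
        value = master_opts[name] if override is None else override
        # Assumption: values given on the command line may arrive as strings,
        # so coerce them to integers before use.
        return int(value)

    return pick("timeout", timeout), pick("gather_job_timeout", gather_job_timeout)


# Values taken from the master configuration shown in this PR description.
master_opts = {"timeout": 120, "gather_job_timeout": 120}

print(effective_timeouts(master_opts))  # no overrides: (120, 120)
print(effective_timeouts(master_opts, timeout=2, gather_job_timeout="1"))  # (2, 1)
```

Comparing against `None` rather than relying on truthiness means an explicit `timeout=0` would still count as an override in this sketch.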

Previous Behavior

$ time salt-run manage.up
- headref-minsles12sp2.tf.local

real	4m1.052s
user	0m3.729s
sys	0m0.264s

New Behavior

$ time salt-run manage.up timeout=2 gather_job_timeout=1
- headref-minsles12sp2.tf.local

real	0m3.964s
user	0m0.693s
sys	0m0.119s

Tests written?

No

What do you think about this?

I also tried the manage.present runner, but it doesn't fit our scenario well. Is there another quick way to check for the actual presence of minions when some of them are unreachable?

Any feedback or opinions are more than welcome! 😄
Thanks

/cc @moio

ghost · 2017-03-14T17:25:46Z

@meaksh, thanks for your PR! By analyzing the history of the files in this pull request, we identified @0xDEC0DE, @cachedout and @cvrebert to be potential reviewers.

cachedout

This seems totally reasonable. I like it.

meaksh · 2017-03-15T10:04:39Z

Great @cachedout. Do you think we should also allow those parameters for the general test.ping command?

BTW, there are some conflicts when merging this PR forward to the 2016.11 branch. If needed, I can open another PR with the fixed version for 2016.11, or you can just ping me to verify the resolution.

cachedout · 2017-03-15T19:42:44Z

@meaksh If you want to do it to test.ping I have no objection so long as it's done as a kwarg.

rallytime · 2017-03-15T21:11:02Z

@meaksh If you have the time to put a PR together for 2016.11, that would be awesome. The merge conflict isn't too terrible, but a little bit trickier than normal. That way I won't miss something in resolving the conflict.

meaksh · 2017-03-16T09:15:02Z

@rallytime - Here is the PR for the 2016.11 branch: #40072

meaksh added 2 commits March 14, 2017 16:39

Allows to set 'timeout' and 'gather_job_timeout' via kwargs

2102d9c

Allows to set custom timeouts for 'manage.up' and 'manage.status'

9f5c3b7

cachedout approved these changes Mar 14, 2017

meaksh mentioned this pull request Mar 15, 2017

Allowing custom timeouts for Manage.up(), Test.ping() and on a generic LocalCall SUSE/salt-netapi-client#195

Merged

cachedout merged commit 8dcffc7 into saltstack:2016.3 Mar 15, 2017

meaksh mentioned this pull request Mar 16, 2017

[2016.11] Allows overriding 'timeout' and 'gather_job_timeout' to 'manage.up' runner call #40072

Merged

meaksh deleted the 2016.3-handling-timeouts-for-manage.up-runner branch March 16, 2017 09:15

meaksh mentioned this pull request Mar 23, 2017

Makes sure "gather_job_timeout" is an Integer #40264

Merged
