Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding command_timeout option for run() #645

Closed
wants to merge 1 commit into from

Conversation

fruch
Copy link
Contributor

@fruch fruch commented Jul 1, 2019

Adding a parameter command_timeout for run() calls,
on Local() runner it would set a threading.Timer and when expire would call os.kill(pid, signal.SIGKILL)
would raise an CommandTimedOut exception.

This implementation would enable fabric to use this new parameter on it's paramiko channel.settimeout() before running the command.

@fruch fruch force-pushed the add_command_timeout branch 4 times, most recently from 23f8169 to ce0e7ce Compare July 1, 2019 17:15
@fruch
Copy link
Contributor Author

fruch commented Jul 1, 2019

current fabric would raise:
TypeError: start() got an unexpected keyword argument 'command_timeout'
I'll fix it raise it only if command_timeout is actually passed

invoke/__init__.py Outdated Show resolved Hide resolved
invoke/exceptions.py Outdated Show resolved Hide resolved
invoke/runners.py Outdated Show resolved Hide resolved
invoke/runners.py Outdated Show resolved Hide resolved
invoke/exceptions.py Outdated Show resolved Hide resolved
@fruch
Copy link
Contributor Author

fruch commented Jul 2, 2019

thanks @bentsi for the review

@bentsi
Copy link

bentsi commented Jul 2, 2019

@fruch please edit PR info with updated details

@bitprophet
Copy link
Member

bitprophet commented Jul 19, 2019

My personal checklist for this (will branch and hack shortly):

  • Doublecheck prior art re: the high level approach of "threading timer + SIGKILL"
    • I don't think Fabric 1 had support for local timeout, only remote, but it's worth a revisit.
    • Even the approach it seems to use for remote timeout - checking whether the time-since-last-recv went over a timeout threshold - might be applicable here if we want? though I'm not at all sure that is actually preferable, and it's technically different behavior than an overall start-to-finish timeout.
  • Black/flake8/style-and-naming pass
  • Make sure docstrings, etc are up to snuff
  • Changelog entry
  • Close out Add support for command_timeout in run() fabric/fabric#1989 in case doing so exposes issues on this end
  • Merge

@bitprophet
Copy link
Member

bitprophet commented Jul 19, 2019

Notes for posterity about tweaks I'm making here:

  • Not a fan of using the kwarg command_timeout here: in Invoke, there is no other useful kind of timeout (vs in Fabric, at least v1, where it was to distinguish from the connection oriented timeout setting that came before it) so it's unnecessarily verbose. I'll just call it timeout since it is "the timeout you want for this command execution".
    • One could argue that it's still necessary to name it this way on the Fabric side & then to be consistent between APIs. However, I don't think this holds water - Fabric 2's connect timeout is in Connection's constructor (and in the timeouts.connect config value) and then this command timeout only comes into play within Connection.run. The two are neatly separated and should IMO be obvious from context.
    • Furthermore, and somewhat ironically, in Fabric, the connection timeout is now explicitly named (it's Connection(..., connect_timeout=xxx)) so even if they were in the same signature there'd still be no problem. But they're not so it's moot.
    • This seems like a good spot to leverage the fact that Fabric, at least, set up its config structure to consider >1 type of timeout - we may want to simply pull that down into Invoke as well.
  • CommandTimedOut inherits from UnexpectedExit, but I think that's incorrect-ish, given there is a further superclass of Failure, and the string display of UnexpectedExit is specifically "the command ran, and exited nonzero/unexpected-exit-code".
    • I think the obvious fix here is to refactor the bits that ARE useful in UnexpectedExit (such as the conditional display of the tail of stdout/err - very useful for figuring out why a program may have timed out!) into either a subroutine or Failure, depending.
    • Also not sure if it should be CommandTimedOut or CommandTimeout (or even simply Timeout) but I'll have to think about that, it may just be me wanting to change all the things to prove I'm adding value 🙃
    • Also also, I don't see why this exception class allows None for the timeout argument (why would it have timed out if there was no specific timeout requested?) so I'm going to see how well it works as a required argument.
  • Not sure why the new code does TypeError checking around the new argument; if it's mostly for backwards compat (e.g. if a subclass doesn't implement timeout support for some reason) there's a better pattern elsewhere that boils down to "only inject kwargs if given" which I'll use instead.
  • Migrated the "actually run a subprocess that sleeps and do a real timeout" test(s) to the integration suite and left basic unit style tests in the unit suite.

@bitprophet
Copy link
Member

Here's an interesting wrinkle, this patch implicitly decides that we don't timeout if warn=True. Is that right? Prior to now warn only said "if the command would have exited nonzero, print a message and keep going". Should exceeding timeout be treated identically to this?

Arguably, there's precedent, which is that warn does not mute exceptions raised within execution, such as inner thread errors (usually from IO problems), and especially also not watcher errors (e.g. sudo prompt auth fails).

This is really subtle; in fact I'm not even sure that watcher errors are correct in being exempt from warn, looking at it right now. But that's the established precedent - warn only guarding against "finished running and did not exit what was expected". Any other problem preventing the command from finishing, currently ignores warn. So I think that applies here too.

@fruch
Copy link
Contributor Author

fruch commented Jul 21, 2019

Thanks @bitprophet,

As for the warn=True, cause that logic was there already ontop of UnexpectedExit... I've put it into my logic too, didn't thought much about if it's makes sense or not.

As the naming of things, I was using something we already had in our code, it can be shorter and clean here.

One more thing to add,
We are using this for a few weeks now, and it's working great for us, we are using mostly the remote timeout, and not the local timeout.

Thanks again for looking into it

@bitprophet
Copy link
Member

bitprophet commented Aug 5, 2019

Got back on this today. Still a little on the fence about the exception name, but everything else is cleaned up at this point.

I moved the timer implementation to Local where I think it belongs (Runner itself should not know or care about the details, only whether the subprocess is considered to have timed out - thus a new property is born) and most of my above checklist as well - many more tests, including integration, edge cases etc. Docstrings, versionadded's, changelog entry. See https://github.com/pyinvoke/invoke/compare/645-int

Tackling Fabric 2's side should go swiftly, tomorrow, and then I can merge both and move on to other tickets. This all took a lot of effort for w/e reason :(

@bitprophet
Copy link
Member

Famous last words, eh? May need to tweak this further for it to work for Fabric, see fabric/fabric#1989 (comment) and maybe subsequent.

@bitprophet
Copy link
Member

bitprophet commented Aug 6, 2019

Did a big ol overhaul based on those comments and I think I like the result better. Runner itself now owns the timer stuff, defines a new API member kill(), and subclasses just define that to say what they want to happen on a timeout-driven termination. So Local says os.kill(me, SIGKILL), and fabric's Remote says Channel.close().

This also handily removes a lot of the worry about backwards compatibility - eg a currently-released Fabric 2 trying to use this new Invoke version will only end up invoking a too-new API on an actual timeout - which that version of Fabric won't know how to request. I think. It's been a long day.

bitprophet added a commit to fabric/fabric that referenced this pull request Aug 6, 2019
@bitprophet
Copy link
Member

Added CLI flags, tidied up the Fabric side of things, and merged everything to respective masters. Done?!

@bitprophet bitprophet closed this Aug 6, 2019
@fruch
Copy link
Contributor Author

fruch commented Aug 6, 2019

@bitprophet thanks a lot, there new versions uploaded to pypi ? for this and for fabric ?

@bitprophet
Copy link
Member

@fruch Yes, Invoke 1.3 and Fabric 2.5 went up yesterday!

fruch added a commit to fruch/scylla-cluster-tests that referenced this pull request Aug 7, 2019
fruch added a commit to fruch/scylla-cluster-tests that referenced this pull request Aug 7, 2019
fruch added a commit to fruch/scylla-cluster-tests that referenced this pull request Aug 7, 2019
fruch added a commit to fruch/scylla-cluster-tests that referenced this pull request Aug 7, 2019
bentsi pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Aug 8, 2019
bentsi pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Aug 8, 2019
since those two PR were merged by Jeff:
* pyinvoke/invoke#645
* fabric/fabric#1989

(cherry picked from commit 58eb4ee)
bentsi pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Aug 8, 2019
since those two PR were merged by Jeff:
* pyinvoke/invoke#645
* fabric/fabric#1989

(cherry picked from commit 58eb4ee)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants