Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task abort and other stop fix-ups #253

Merged
merged 9 commits into from
Oct 17, 2022
Merged

Task abort and other stop fix-ups #253

merged 9 commits into from
Oct 17, 2022

Conversation

mhasself
Copy link
Member

@mhasself mhasself commented Jan 19, 2022

Description

  • Task abort is supported by OCSAgent.
  • stopper / aborter functions run in same threading model as the start function (by default; can override).
  • stopper / aborter wrapping code performs checks to avoid running the stopper / aborter if op is not in progress (solves Prevent stop calls to already 'done' operations #214).

Motivation and Context

Task abort has always been a planned part of the API, with similar behavior to Process stop. As we transition to more realistic and automated operation, long-running Tasks need to have smart abort capabilities (we can't just cycle the Agent whenever a Task needs to be interrupted).

Resolves #214.

How Has This Been Tested?

I tested this using the FakeDataAgent and HostManager, through ocs-web. I also confirmed the OCSClient interface passes abort requests successfully.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

The potentially breaking change here is that stopper functions (for Processes) will now be run in the same threading pattern (reactor or worker thread) as the starter function, unless the user explicitly requests the other pattern when registering the Process. Previously, the default was to always launch the stopper in a worker thread.

So the new default could affect any Process that is currently registered with blocking=False. However (a) such Processes are rare and (b) there's a bit of code that will catch the case that a stopper function doesn't return a Deferred.

  • W.r.t. (a), I've fixed up the stopper functions in OCS to be compatible with the new defaults. In SOCS, the only cases seem to be the acu_agent (which is going to be modified in response to this work anyway) and the meinberg_m1000 (which looks like it will be handled properly by (b)).
  • W.r.t. (b), if blocking=False, then the stopper function is invoked from the reactor. If the function is decorated with inlineCallbacks, then this will return a Deferred. So: if the function doesn't return a Deferred, it must have been implemented to run in a worker thread. We just ran it in the reactor, which is not the same, but is probably ok since stopper functions usually just set update some instance variable so the Process knows to exit. So, mission accomplished. It's not exactly the same as running in the worker thread but is probably safe and leads to the same result. (When this work-around is applied, a message is printed to the logs.)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • Unless I am preparing a release, I have opened this PR onto the develop branch.

@mhasself mhasself marked this pull request as draft January 19, 2022 21:10
@mhasself
Copy link
Member Author

mhasself commented Jan 19, 2022

This is still in draft ... the other things I'd like to add (or someone else can):

  • Make process "stop" get called in the same way that start is: either in a thread or not depending on "blocking"
  • Enable task "abort" function. (Default behavior is that caller gets not-implemented error, but I think we want to make this an official part of the API.)

@mhasself mhasself marked this pull request as ready for review July 6, 2022 02:03
@mhasself mhasself changed the title OCSAgent: make process stop() report it's been run Task abort and other stop fix-ups Jul 6, 2022
@BrianJKoopman
Copy link
Member

@mhasself, mind rebasing this now that #284 is merged? Testing locally it applies cleanly.

simonsobs and others added 6 commits October 10, 2022 20:39
Also catch errors in the stop function.  Document the signatures for
process/task start/stop functions.
By default, the code will expect the stop function to run the same way
as the start function.  Some existing agents have non-blocking start
functions, but the stoppers are not written with inlineCallbacks; the
current code will notice this and print a warning message.
They should run in the reactor now.
This functions the same way as Process "stop", except that it is
entirely optional and should normally cause the operation to exit with
error.
In addition to errback there is now also a callback to print something
to the log on success.  The errback had a typo so it probably never
worked in the past.
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I caught a couple of small things, but otherwise looks good.

I would like to incorporate this into the Agent writing guide so that there's documentation for how to implement a task aborter. I'm thinking maybe the barebones agent's print task could be renamed "repeat" or something, having it repeat a string X number of times, waiting a second in between, and allowing us the ability to abort. Just trying to keep the task simple...

What about for more complicated, longer running tasks? Should the Agent just occasionally check session.status and return if it sees it's no longer "running"?

ocs/ocs_agent.py Outdated Show resolved Hide resolved
ocs/agents/fake_data/agent.py Outdated Show resolved Hide resolved
@BrianJKoopman
Copy link
Member

Alright, short new section in the task part of the Agent writing guide introduced in d36c161.

One thing that occurred to me is that maybe we should have some indication of which tasks can be aborted in the docstrings for tasks that do have an aborter implemented? Any ideas here?

@mhasself
Copy link
Member Author

Thanks for the fixes and docs.

One thing that occurred to me is that maybe we should have some indication of which tasks can be aborted in the docstrings for tasks that do have an aborter implemented? Any ideas here?

Ya, perhaps:?

**Task** (abortable) - Print some text passed to the Task.

Also it may be worth saying, in the section that you added, that stop functions in processes are basically the same thing. Also please mention that when a Task is aborted it should normally return with error.

@BrianJKoopman BrianJKoopman self-requested a review October 17, 2022 14:32
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for the suggestions. I pushed a small update. Will go ahead with the merge when the checks pass. Then prep a new ocs release.

@BrianJKoopman BrianJKoopman merged commit a9093ef into develop Oct 17, 2022
@BrianJKoopman BrianJKoopman deleted the stop-better branch October 17, 2022 15:04
@BrianJKoopman BrianJKoopman mentioned this pull request Oct 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prevent stop calls to already 'done' operations
2 participants