Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-34076: Improve timeout handling by MPGraphExecutor #174

Merged
merged 2 commits into from Mar 17, 2022

Conversation

andy-slac
Copy link
Collaborator

Detect the case when a task manages to finish successfully before it
gets killed due to timeout. This guarantees more consistent status in
a quantum report.

Increase running time for timed out task in a unit test.
Running time for timing-out task increased to 100 seconds, this should
reduce the chance for it to finish successfully before timeout is
detected by a parent.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks okay. My main confusion is why something that is timed out but still finishes before the time out completes can't be treated as a success.


@property
def state(self):
"""Job processing state (JobState)"""
return self._state

@property
def terminated(self):
"""Return True if job was killed by stop() method (`bool`)"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but also if the exit code was negative?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of implied, if stop was called but did not kill the process then it was not killed by stop. I'll expand the docstring.

jobs.setJobState(job, JobState.FAILED)
if job.terminated:
# Was killed due to timeout.
if self.report.status == ExecutionStatus.SUCCESS:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is killed due to time out and reports success why can't that be treated as success? How can you be timed out and successful?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is treated as success, terminated is False for that case, and check for exitcode == 0 is in the if branch above. This branch is only for the case when it was actually killed by a signal (had non-zero exitcode)

Detect the case when a task manages to finish successfully before it
gets killed due to timeout. This guarantees more consistent status in
a quantum report.
Running time for timing-out task increased to 100 seconds, this should
reduce the chance for it to finish successfully before timeout is
detected by a parent.
@andy-slac andy-slac merged commit 768b03a into main Mar 17, 2022
@andy-slac andy-slac deleted the tickets/DM-34076 branch March 17, 2022 02:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants