Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3 enhancements for the SLURM job runner #3218

Merged
merged 4 commits into from Nov 30, 2016

Conversation

Projects
None yet
3 participants
@nsoranzo
Copy link
Member

commented Nov 25, 2016

  • Fix _get_slurm_state_with_sacct() for not completed jobs
  • Handle also COMPLETED state in _complete_terminal_job
  • Fix _get_slurm_state_with_sacct() for not completed jobs

Ping @natefoo and @lparsons.

@lparsons
Copy link
Contributor

left a comment

Looks good to me, though I don't really have a way to test it.

@@ -78,6 +91,8 @@ def _get_slurm_state():
if slurm_state == 'NOT_FOUND':
log.warning( '(%s/%s) Job not found, assuming job check exceeded MinJobAge and completing as successful', ajs.job_wrapper.get_id_tag(), ajs.job_id )
drmaa_state = self.drmaa_job_states.DONE
elif slurm_state == 'COMPLETED':
drmaa_state = self.drmaa_job_states.DONE

This comment has been minimized.

Copy link
@natefoo

natefoo Nov 29, 2016

Member

Can we log something here? I think the case that DRMAA reports failure but Slurm reports success and thus Galaxy reports success should probably be noted somehow.

if stripped_line == SLURM_MEMORY_LIMIT_EXCEEDED_MSG:
return 'This job was terminated because it used more memory than it was allocated.'
elif any(_ in stripped_line for _ in SLURM_MEMORY_LIMIT_EXCEEDED_PARTIAL_WARNINGS):
return 'This job was cancelled probably because it used more memory than it was allocated.'

This comment has been minimized.

Copy link
@natefoo

natefoo Nov 29, 2016

Member

This now causes jobs w/ the warning to fail, right? Is that intentional?

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 30, 2016

Author Member

No, this function is called only when slurm_state is already 'CANCELLED', it will only change the fail_message.

This comment has been minimized.

Copy link
@natefoo

natefoo Nov 30, 2016

Member

Right, ok, thanks!

@nsoranzo nsoranzo force-pushed the nsoranzo:slurm_fixes branch from a13bfb5 to cc9b741 Nov 30, 2016

@natefoo

This comment has been minimized.

Copy link
Member

commented Nov 30, 2016

Not sure what's up with Travis, but...

@natefoo natefoo merged commit ead8e15 into galaxyproject:dev Nov 30, 2016

3 of 4 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
api test Build finished. 234 tests run, 0 skipped, 0 failed.
Details
framework test Build finished. 126 tests run, 0 skipped, 0 failed.
Details
toolshed test Build finished. 580 tests run, 0 skipped, 0 failed.
Details

@nsoranzo nsoranzo deleted the nsoranzo:slurm_fixes branch Nov 30, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.