"unknown signal?!" reported from JobInfo terminatedSignal #26

Open
EricR86 opened this issue May 29, 2019 · 0 comments

EricR86 commented May 29, 2019

Hello,

I'm not sure if this project is the origin of this particular bug, but I have not been able to reproduce the error on other DRMAA implementations.

I've submitted jobs to my SLURM 18.08 system where, occasionally, I get a reported "unknown signal?!". The exact same job, when resubmitted, may or may not have this issue. I cannot track down exactly what happens when this occurs or what causes it.

I have run strace on two equivalent submitted jobs, one that reports the "unknown signal" and one that exits normally, and I cannot find any discernible difference between them, even when tracing specifically for signals.

sacct reports nothing unusual, and actually seems to indicate that the job exited without issue. The sysadmin for our cluster system seems to agree and cannot find any issue.
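
For anyone who wants to reproduce the comparison, here is a minimal sketch of the cross-check in Python. It assumes the ExitCode column of sacct (for example from `sacct -j <jobid> --format=JobID,State,ExitCode --parsable2 --noheader`) has the form `<exit>:<signal>`. `WaitResult` is a hypothetical stand-in that mirrors only the JobInfo fields relevant here, and `disagrees_with_sacct` is a helper name made up for illustration, not part of any DRMAA binding.

```python
from collections import namedtuple

# Hypothetical stand-in for the JobInfo fields of interest; not the real class.
WaitResult = namedtuple(
    "WaitResult", "jobId hasExited hasSignal terminatedSignal exitStatus"
)


def parse_sacct_exit_code(field):
    """Split an sacct ExitCode value ("<exit>:<signal>") into two ints."""
    exit_part, _, signal_part = field.strip().partition(":")
    # A missing signal part is treated as 0 (assumption for robustness).
    return int(exit_part), int(signal_part or 0)


def disagrees_with_sacct(result, sacct_exit_code):
    """True when DRMAA reports a signal but sacct reports a clean exit."""
    exit_code, signal_number = parse_sacct_exit_code(sacct_exit_code)
    sacct_clean = exit_code == 0 and signal_number == 0
    return bool(result.hasSignal) and sacct_clean


# Example values only: a job flagged as signalled while sacct shows "0:0".
suspect = WaitResult("12345", False, True, "unknown signal?!", 0)
print(disagrees_with_sacct(suspect, "0:0"))
```

Logging every job for which this returns true, together with its job ID, might help narrow down whether the mismatch correlates with a particular node or time window.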

This could be a cluster-specific issue, a DRMAA issue, or neither. If I'm looking in the wrong place, please kindly redirect me. I'm not sure where or how to start tracking this down.

Thanks for your time.
