Communicate held jobs to the user #10

Open
JaikrishnaTS opened this issue Apr 20, 2017 · 0 comments

@JaikrishnaTS
Contributor

Jobs that condor places on hold because of errors need to be reported to the client in the status message/email. There are two options: (1) let the user cancel the remaining jobs and return with the logs (for a power user, or when debugging is enabled); or (2) remove the held jobs from condor, include a descriptive file about them in the results, and proceed with the other jobs.
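A minimal sketch of how the two options could be chosen, assuming the job ClassAds are already available as plain dicts (for example from `condor_q -json`). `JobStatus`, `ClusterId`, `ProcId` and `HoldReason` are standard HTCondor job attributes, and status code 5 means "Held"; the function names, the `debug_enabled` flag and the returned action strings are hypothetical and are not part of GWS.

```python
from typing import Dict, Iterable, List, Tuple

# HTCondor's JobStatus code for a held job.
HELD = 5


def split_held_jobs(job_ads: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition job ads into (held, other) based on JobStatus."""
    held, other = [], []
    for ad in job_ads:
        (held if ad.get("JobStatus") == HELD else other).append(ad)
    return held, other


def job_id(ad: Dict) -> str:
    """Format a job as the usual 'Cluster.Proc' identifier."""
    return f"{ad.get('ClusterId')}.{ad.get('ProcId')}"


def held_jobs_report(held: List[Dict]) -> str:
    """Text for the descriptive file that explains why each job was held."""
    lines = [f"{len(held)} job(s) were held by condor:"]
    for ad in held:
        reason = ad.get("HoldReason", "no HoldReason recorded")
        lines.append(f"  {job_id(ad)}: {reason}")
    return "\n".join(lines)


def plan_for_held_jobs(job_ads: Iterable[Dict], debug_enabled: bool) -> Dict:
    """Hypothetical policy: pick option (1) or (2) from the issue."""
    held, _other = split_held_jobs(job_ads)
    if not held:
        return {"action": "proceed", "report": None, "remove": []}
    report = held_jobs_report(held)
    if debug_enabled:
        # Option 1: power user / debug mode, stop everything and return logs.
        return {"action": "cancel_all_return_logs", "report": report, "remove": []}
    # Option 2: drop only the held jobs and ship the report with the results.
    return {
        "action": "remove_held_and_proceed",
        "report": report,
        "remove": [job_id(ad) for ad in held],
    }
```

The job IDs in `remove` could then be passed to `condor_rm`, and `report` written into the result archive and quoted in the status email.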

Also, jobs whose state is changed directly through condor (held, stopped, etc.) don't propagate their status to the DB, so EMS keeps querying them over and over, which causes a performance problem. Unless the DB is cleared during maintenance, this leads to condor_history using a lot of CPU. The fix for the first issue should be designed so that it also avoids this one. This particular issue could be fixed by modifying the process_once function (https://github.com/GRAPLE/GWS/blob/master/ems.py#L140) to also account for held jobs, by introducing a new experiment status such as 'held'/'error'.
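A hedged sketch of the status mapping that process_once could apply, assuming it can see the numeric HTCondor `JobStatus` of every job in an experiment. The codes are HTCondor's; `derive_experiment_status`, `needs_polling` and the status strings are hypothetical and do not reflect the actual contents of ems.py. The point is that an experiment whose jobs are all finished or stuck reaches a terminal status ('completed', 'held' or 'error'), so EMS can stop re-querying it.

```python
from typing import Iterable

# HTCondor JobStatus codes.
IDLE, RUNNING, REMOVED, COMPLETED, HELD = 1, 2, 3, 4, 5
TRANSFERRING_OUTPUT, SUSPENDED = 6, 7

# States in which a job may still make progress on its own.
ACTIVE = {IDLE, RUNNING, TRANSFERRING_OUTPUT, SUSPENDED}

# Hypothetical experiment statuses that EMS would no longer poll.
TERMINAL_STATUSES = {"completed", "held", "error"}


def derive_experiment_status(job_statuses: Iterable[int]) -> str:
    """Collapse per-job JobStatus codes into one experiment status."""
    statuses = list(job_statuses)
    if any(s in ACTIVE for s in statuses):
        return "running"  # keep polling while some job can still progress
    if any(s == HELD for s in statuses):
        return "held"  # nothing active is left, but some jobs are stuck on hold
    if statuses and all(s == COMPLETED for s in statuses):
        return "completed"
    return "error"  # removed jobs, unknown codes, or no jobs at all


def needs_polling(experiment_status: str) -> bool:
    """True if EMS should query condor for this experiment again."""
    return experiment_status not in TERMINAL_STATUSES
```

Writing the derived status back to the DB means held or removed experiments drop out of the polling loop instead of hitting condor_history on every pass.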
