Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix command line crashing #1109

Merged
merged 1 commit into from
Jun 2, 2023
Merged

Fix command line crashing #1109

merged 1 commit into from
Jun 2, 2023

Conversation

jan-janssen
Copy link
Member

Certain queuing systems like flux list pyiron jobs as failed, this is fixed now.

Certain queuing systems like flux list pyiron jobs as failed, this is fixed now.
@jan-janssen jan-janssen merged commit 6f44f34 into main Jun 2, 2023
23 checks passed
@delete-merged-branch delete-merged-branch bot deleted the command_line_fix branch June 2, 2023 05:09
@pmrv
Copy link
Contributor

pmrv commented Jun 2, 2023

Does this mean on certain queuing systems the log file is not properly written or we are running in a different directory?

@jan-janssen
Copy link
Member Author

Everything that happens in pyiron_base/jobs/job/wrapper.py uses its own logger. So in this case no pyiron.log file exists. When we submit jobs to the queue it internally calls python -m pyiron_base.cli wrapper -p {{working_directory}} -j {{job_id}} so these calls do not write log files as they use the wrapper module. Consequently, when we try to delete the log file but not log file exists the job crashes and returns an exit status 1 rather than 0. At least the flux queuing system lists these jobs as failed, while pyiron lists them as finished correctly. With this pull request this is fixed.

@pmrv
Copy link
Contributor

pmrv commented Jun 15, 2023

The patch is perfectly good, I'm just trying to understand why this happens only with flux and not pysqa. From #1120 I guess that it also just calls python -m pyiron_base.cli wrapper -p {{working_directory}} -j {{job_id}}. I'll try and see if I can at least make everything use the same logging file.

@pmrv
Copy link
Contributor

pmrv commented Jun 15, 2023

So I found that wrapper does its own logger instantiation (of the same name) here as opposed the global one, which doesn't add a FileHandler for pyiron.log. I'm guessing this is because we don't want to have a pyiron.log in the job folder (at least on our default SLURM we capture stdout/stderr [from the default StreamHandler] into error.log instead). Then again pyiron_base.jobs.job.wrapper does import pyiron_base.state.logger via pyiron_base.state, which should then install also the FileHandler for pyiron.log. So the only influence of pyiron_base.jobs.job.wrapper.setup_logger should be to change the default logging levels.

The only way I can then see the logging file not being present is if the python program crashes very early, before either state or the wrapper are imported, probably because of some other import error. Is it possible that you only had those errors with flux because some environment variable or other was not forwarded and that made it fail so early? (Guessing, I have no idea how flux works.)

Either way, we should maybe discuss what is the intended logging that we want and simplify the situation accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants