While debugging flux-framework/flux-sched#1182 it took much longer than necessary to determine what was going on because the prolog was failing silently.
It so happens that flux-perilog-run.py checks for offline ranks so it can avoid targeting them with flux-exec(1):
    # Check for any offline ranks and subtract them from targets.
    # Optionally drain offline ranks with a unique message that prolog/epilog
    # failed due to offline state:
    #
    offline = offline_ranks(handle) & ranks
    if offline:
        returncode = 1
        LOGGER.info("%s: %s: ranks %s offline. Skipping.", jobid, name, offline)
        ranks.subtract(offline)
        if args.drain_offline:
            drain(handle, offline, f"offline for {jobid} {name}")
I guess LOGGER.info() messages are not emitted by default, because the "Skipping" message is not emitted without -v, thus the silent treatment. Also, the prolog is set to fail if there are any offline ranks.
This should be improved to log the error message. Also, it would be helpful to emit a specific exception here instead of the generic "prolog failed with exit code=1" exception.
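To illustrate the logging half of this, here is a minimal standalone sketch using only the Python standard library. It is not the flux-perilog-run.py code: `make_logger` and `report_offline` are hypothetical helpers, and it assumes the script's logger defaults to WARNING and that -v lowers it to INFO, which is a guess consistent with the behavior described above.

```python
import io
import logging


def make_logger(verbose):
    """Return (logger, stream). Assumption: default level is WARNING, -v gives INFO."""
    stream = io.StringIO()
    # Instantiate directly so the sketch does not touch the global logger registry.
    logger = logging.Logger("perilog-sketch")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    return logger, stream


def report_offline(logger, jobid, name, offline, level=logging.INFO):
    """Log the offline-ranks message at the given level (hypothetical helper)."""
    logger.log(level, "%s: %s: ranks %s offline. Skipping.", jobid, name, offline)


if __name__ == "__main__":
    logger, stream = make_logger(verbose=False)

    # At INFO the message is filtered out, which matches the silent failure.
    report_offline(logger, "f123", "prolog", "2-3")
    print(repr(stream.getvalue()))

    # At ERROR the same message is emitted without needing -v.
    report_offline(logger, "f123", "prolog", "2-3", level=logging.ERROR)
    print(stream.getvalue(), end="")
```

Since the offline check already sets returncode to 1, logging that message at ERROR (or at least WARNING) would keep the log consistent with the failure the job actually sees.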