forked from flux-framework/flux-core
Commit
job-manager: make some replay errors non-fatal
Problem: if a few jobs get messed up in the KVS due to an improper shutdown, recovery is a tedious process involving starting flux in --recovery mode, fixing one job, and starting again.

When a job cannot be replayed from the KVS because its directory is incomplete, log the failure at LOG_ERR level but let replay continue, so that the flux restart ultimately succeeds. If a job has more serious problems, such as incorrect content in its eventlog, treat that as a fatal error as before. Limiting the non-fatal case this way avoids breaking the 'valid' tests that check backwards compatibility with older KVS dumps, which might use an older eventlog format.

Update t2219-job-manage-restart.t to expect warnings rather than failure when such jobs are encountered during replay.

Fixes flux-framework#5147
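The replay policy described above can be sketched in C. This is a hypothetical illustration, not the flux-core implementation: `struct job_entry`, `replay_one()` and `replay_all()` are invented names. The sketch also assumes that an incomplete job directory surfaces as `ENOENT`, while any other error (for example, a malformed eventlog) stays fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>   /* LOG_ERR priority constant */

/* Hypothetical job record: replay_errno is 0 if the job loads cleanly,
 * otherwise the errno value its replay would fail with. */
struct job_entry {
    const char *id;
    int replay_errno;
};

/* Hypothetical stand-in for loading one job from the KVS.
 * Returns 0 on success, or -1 with errno set on failure. */
static int replay_one (const struct job_entry *job)
{
    if (job->replay_errno != 0) {
        errno = job->replay_errno;
        return -1;
    }
    return 0;
}

/* Replay all jobs. An incomplete directory (assumed to be ENOENT) is
 * logged at LOG_ERR and the job is skipped; any other error aborts.
 * Returns the number of jobs replayed, or -1 on a fatal error.
 * *skipped receives the number of jobs skipped before returning. */
static int replay_all (const struct job_entry *jobs, int n, int *skipped)
{
    int count = 0;

    assert (skipped != NULL);
    *skipped = 0;
    for (int i = 0; i < n; i++) {
        if (replay_one (&jobs[i]) < 0) {
            int err = errno;    /* save before any stdio call */
            if (err == ENOENT) {
                /* non-fatal: log with a syslog-style priority prefix */
                fprintf (stderr, "<%d> job %s: incomplete KVS directory, skipping\n",
                         LOG_ERR, jobs[i].id);
                (*skipped)++;
                continue;
            }
            /* anything else, e.g. bad eventlog content, is fatal */
            fprintf (stderr, "<%d> job %s: fatal replay error: %s\n",
                     LOG_ERR, jobs[i].id, strerror (err));
            return -1;
        }
        count++;
    }
    return count;
}
```

With jobs whose errors are `{ 0, ENOENT, 0 }`, `replay_all()` returns 2 and sets `*skipped` to 1. An `EINVAL` anywhere aborts replay with -1, mirroring the "fatal as before" behavior.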
Showing 4 changed files with 45 additions and 4 deletions.
File renamed without changes.
File renamed without changes.