DM-29614: Have bps report show info from multiple submit nodes #61
Merged
Some of the functions responsible for extracting job information from DAGMan files were raising a StopIteration exception when a file was missing. I modified them to raise FileNotFoundError instead, to make things less confusing.
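The failure mode is easy to reproduce. Calling `next()` on an empty iterator, such as the result of a glob that matches nothing, raises a bare StopIteration that does not say which file was missing. Below is a minimal sketch of the fix, assuming a hypothetical helper `find_dag_file()` that looks for a `*.dag` file in a submission directory. It is an illustration, not the plugin's actual code.

```python
import tempfile
from pathlib import Path


def find_dag_file(submit_dir):
    """Return the path of the DAGMan file (*.dag) in submit_dir.

    Hypothetical helper used for illustration only.
    Raises FileNotFoundError instead of leaking StopIteration.
    """
    try:
        # Path.glob() returns a generator; next() raises StopIteration
        # when nothing matches.
        return next(Path(submit_dir).glob("*.dag"))
    except StopIteration:
        # 'from None' hides the uninformative StopIteration from the traceback.
        raise FileNotFoundError(f"No DAGMan file (*.dag) found in {submit_dir}") from None
```

Raising FileNotFoundError also avoids a subtler problem: a StopIteration that escapes inside a generator is turned into a RuntimeError by Python (PEP 479), which is even harder to trace back to a missing file.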
MichelleGower approved these changes on Oct 21, 2021
Some questions regarding the correctness of the code, along with some docstring/comment suggestions. Can be merged after these are addressed.
Some WMSes (e.g. HTCondor) use distributed job queues. I added a new flag, --global, to bps report, allowing one to control whether it should query all available job queues for information or only the local one (the default).
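One practical detail when adding a `--global` flag in Python is that `global` is a reserved keyword, so the parsed value cannot be accessed as `args.global`. The sketch below uses the standard-library `argparse` purely as a stand-in for whatever CLI framework bps actually uses, and stores the flag under the hypothetical name `is_global`.

```python
import argparse


def build_report_parser():
    """Build a toy 'report' parser with a --global flag (illustrative only)."""
    parser = argparse.ArgumentParser(prog="report")
    # 'global' is a Python keyword, so keep the flag's value under another name.
    parser.add_argument(
        "--global",
        dest="is_global",
        action="store_true",
        help="query all available job queues instead of only the local one",
    )
    return parser
```

With no flag, `is_global` is `False`, matching the "local queue only" default described above.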
When a user provides a submission directory as the run id to bps report, it retrieves the required information from various HTCondor files in the submission directory. However, none of these files contains the global job id of the DAGMan job (at least in ver. 8.8 and 9.0 of HTCondor). Hence I added an additional step to HTCondor submission which persists the global id in a JSON file for future reference.
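The persist-then-reload step can be sketched with the standard `json` module. The file name `wms_job_id.json`, the key `global_job_id`, and both helper functions are hypothetical; the PR does not say what the file or its keys are called. The sample id in the test only mimics HTCondor's `host#cluster.proc#timestamp` shape and is made up.

```python
import json
from pathlib import Path

# Hypothetical file name and key; not taken from the PR.
ID_FILE = "wms_job_id.json"
ID_KEY = "global_job_id"


def save_global_id(submit_dir, global_id):
    """Write the DAGMan job's global id into the submission directory."""
    path = Path(submit_dir) / ID_FILE
    with open(path, "w") as f:
        json.dump({ID_KEY: global_id}, f)
    return path


def load_global_id(submit_dir):
    """Return the saved global id, or None if it was never persisted."""
    path = Path(submit_dir) / ID_FILE
    try:
        with open(path) as f:
            return json.load(f)[ID_KEY]
    except FileNotFoundError:
        # Runs submitted before this step existed have no such file.
        return None
```

Returning `None` for a missing file keeps reports working for submission directories created before the extra step was added.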
The function htc_version() was returning the HTCondor version with the major, minor, and revision numbers padded with zeros, i.e., 000X.000Y.000Z. However, other parts of the code compared this string against the unpadded X.Y.Z form. I removed the padding and used the third-party `packaging.version` module (included in the LSST stack) to make these comparisons work as expected.
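Plain string comparison goes wrong in two ways here: a padded string sorts before any unpadded one because `'0' < '8'`, and even unpadded strings misorder multi-digit parts (`"8.10.0"` sorts before `"8.9.0"`). The sketch below demonstrates both pitfalls and uses a tuple of integers as a stdlib-only stand-in for the `packaging.version` comparison the PR actually adopted.

```python
def parse_version(text):
    """Turn 'X.Y.Z' (zero-padded or not) into a tuple of ints.

    Stdlib-only stand-in for packaging.version; it handles plain numeric
    versions only, not pre-release or local-version suffixes.
    """
    return tuple(int(part) for part in text.split("."))


padded = "0008.0009.0000"  # 8.9.0 with zero padding
plain = "8.8.0"

# Lexicographic comparison: '0' < '8', so the newer padded version looks older.
string_padded_newer = padded > plain                              # False
tuple_padded_newer = parse_version(padded) > parse_version(plain)  # True

# Even without padding, strings compare digit by digit: '1' < '9'.
string_ten_newer = "8.10.0" > "8.9.0"                              # False
tuple_ten_newer = parse_version("8.10.0") > parse_version("8.9.0")  # True
```

`packaging.version` additionally understands pre-release and post-release tags, which is why a dedicated library is preferable to a hand-rolled tuple in real code.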
Some local variables referring to objects of the same type had quite different names in different functions. While the code worked correctly regardless, I renamed some of them for consistency to hopefully make the code easier to follow.
Probably due to an earlier git rebase gone wrong, the arguments of cancel() were mixed up. Fixed them.
Some WMSes (e.g. HTCondor) use distributed job queues. I added a new flag, --global, to bps cancel, allowing one to control whether it should attempt to remove jobs matching the search criteria from all available job queues or only from the local one (the default).
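The effect of the flag on cancellation can be sketched without a real HTCondor pool. `FakeQueue`, its `remove()` method, and the convention that the local queue comes first in the list are all hypothetical stand-ins for HTCondor schedds; the sketch only illustrates "local queue only" versus "every queue".

```python
class FakeQueue:
    """Hypothetical stand-in for one job queue (e.g. one HTCondor schedd)."""

    def __init__(self, name, jobs):
        self.name = name
        self.jobs = dict(jobs)  # job id -> owner

    def remove(self, predicate):
        """Remove jobs matching predicate(job_id, owner); return their ids."""
        matched = [jid for jid, owner in self.jobs.items() if predicate(jid, owner)]
        for jid in matched:
            del self.jobs[jid]
        return matched


def cancel(queues, predicate, is_global=False):
    """Cancel matching jobs in the local queue only, or in all queues.

    Assumes (hypothetically) that queues[0] is the local queue.
    """
    targets = queues if is_global else queues[:1]
    return {q.name: q.remove(predicate) for q in targets}
```

For example, with a local queue holding alice's job `1.0` and a remote queue holding alice's job `7.0`, `cancel(queues, by_alice)` removes only `1.0`, while `is_global=True` also reaches `7.0`.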
The PanDA plugin overrides only the report() method (though it doesn't look like it uses it anyway). I updated its signature to reflect the addition of the `--global` option to the bps report command.
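Why an override's signature must follow the base class: once the caller starts passing the new keyword, any plugin whose `report()` does not accept it fails with a TypeError. The class names and the minimal `report(self, run_id=None, is_global=False)` signature below are hypothetical and are not the actual ctrl_bps API.

```python
class BaseService:
    """Hypothetical base class for WMS plugins (illustrative signature)."""

    def report(self, run_id=None, is_global=False):
        raise NotImplementedError


class StalePlugin(BaseService):
    # Override written before the new option existed.
    def report(self, run_id=None):
        return []


class UpdatedPlugin(BaseService):
    # Override accepts the new keyword, even if it ignores it.
    def report(self, run_id=None, is_global=False):
        return []
```

Calling `StalePlugin().report(is_global=True)` raises TypeError (unexpected keyword argument), while `UpdatedPlugin` handles the same call.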
Made bps report show the global job id when printing a detailed report for a given job.
In case of errors, the additional information displayed by report() was shown before the error messages reported by the plugin, which was slightly confusing. Switched the order in which these messages are displayed.
The HTCondor docs for ver. 8.8 say one can use 'location_ad' as the name argument when initializing a Schedd (an object representing condor_schedd). However, HTCondor throws a fit when it is actually used, so I removed it (it works fine with ver. 9.0, though).
When the submission path was used as the run id, bps report was not displaying the correct status for deleted jobs. Fixed that.
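One way to make sure deleted jobs are reported correctly is an explicit mapping from HTCondor's numeric `JobStatus` codes (1 Idle, 2 Running, 3 Removed, 4 Completed, 5 Held, 6 Transferring Output, 7 Suspended) to report states. The `ReportState` enum and the chosen mapping below are hypothetical stand-ins, not the plugin's actual tables; the point is that code 3 (Removed) needs its own state rather than falling through to a generic one.

```python
from enum import Enum


class ReportState(Enum):
    """Hypothetical stand-in for the states printed by a report."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DELETED = "DELETED"
    COMPLETED = "COMPLETED"
    HELD = "HELD"
    UNKNOWN = "UNKNOWN"


# HTCondor JobStatus codes -> report states (illustrative mapping).
_JOB_STATUS_TO_STATE = {
    1: ReportState.PENDING,    # Idle
    2: ReportState.RUNNING,    # Running
    3: ReportState.DELETED,    # Removed (e.g. via condor_rm)
    4: ReportState.COMPLETED,  # Completed; exit code must be checked separately
    5: ReportState.HELD,       # Held
    6: ReportState.RUNNING,    # Transferring Output
}


def state_from_job_status(code):
    """Map a JobStatus code to a report state; unmapped codes are UNKNOWN."""
    return _JOB_STATUS_TO_STATE.get(code, ReportState.UNKNOWN)
```

Suspended (7) is deliberately left unmapped here; how to present it is a judgment call.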
Some of the earlier changes might have broken the Pegasus plugin. I made adjustments to prevent that from happening.
mxk62 force-pushed the tickets/DM-29614 branch 2 times, most recently from 65e71cf to cfd1fee on October 25, 2021 18:00
Checklist
doc/changes