Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43381: bps report doesn't show error codes with submit directory as the run id #31

Merged
merged 16 commits into from Apr 10, 2024

Conversation

mxk62
Copy link
Collaborator

@mxk62 mxk62 commented Mar 20, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Copy link

codecov bot commented Mar 20, 2024

Codecov Report

Attention: Patch coverage is 97.89474% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 35.34%. Comparing base (d7ef384) to head (42c63d8).
Report is 1 commits behind head on main.

❗ Current head 42c63d8 differs from pull request most recent head 4ec2a92. Consider uploading reports for the commit 4ec2a92 to get more accurate results

Files Patch % Lines
python/lsst/ctrl/bps/htcondor/lssthtc.py 78.57% 6 Missing ⚠️
tests/test_handlers.py 98.86% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #31       +/-   ##
===========================================
+ Coverage   17.01%   35.34%   +18.33%     
===========================================
  Files           5        7        +2     
  Lines        1387     1737      +350     
  Branches      292      318       +26     
===========================================
+ Hits          236      614      +378     
+ Misses       1148     1121       -27     
+ Partials        3        2        -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@MichelleGower MichelleGower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions. Merge approved.

doc/changes/DM-43381.bugfix.rst Outdated Show resolved Hide resolved
python/lsst/ctrl/bps/htcondor/lssthtc.py Show resolved Hide resolved
@mxk62 mxk62 force-pushed the tickets/DM-43381 branch 3 times, most recently from 24c6cfe to 26736dd Compare March 26, 2024 19:24
Copy link
Contributor

@MichelleGower MichelleGower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge approved. Minor comments mostly suggestions about comments.

Overall I have concerns that someday calling multiple functions per job trying to find the right handler vs just the nested if conditions may end up being too much overhead when there are 100k+ jobs. Right now these overheads are hidden by the time for reading multiple files. Also, this approach was to make future changes easier by just writing another handler. Note, however, the code will need changes if the handlers are needed for values independent of exit values (i.e., can't just quit trying handlers if exit values are handled).

python/lsst/ctrl/bps/htcondor/handlers.py Outdated Show resolved Hide resolved
python/lsst/ctrl/bps/htcondor/lssthtc.py Outdated Show resolved Hide resolved
python/lsst/ctrl/bps/htcondor/handlers.py Outdated Show resolved Hide resolved
python/lsst/ctrl/bps/htcondor/handlers.py Show resolved Hide resolved
python/lsst/ctrl/bps/htcondor/handlers.py Outdated Show resolved Hide resolved
python/lsst/ctrl/bps/htcondor/lssthtc.py Outdated Show resolved Hide resolved
tests/test_handlers.py Show resolved Hide resolved
python/lsst/ctrl/bps/htcondor/handlers.py Show resolved Hide resolved
tests/test_handlers.py Outdated Show resolved Hide resolved
tests/test_handlers.py Show resolved Hide resolved
mxk62 and others added 16 commits April 8, 2024 17:06
``bps report`` wasn't showing error codes/counts when the submit
directory was used as the run id due to missing attributes in jobs'
ClassAds. Made changes to ensure that these attributes will be added to
the ClassAds.
Existing unit tests for ``_get_exit_codes_summary()`` were covering only
the main use case scenario.  Added tests that also covered some of the
possible alternative scenarios (e.g. missing attribute, unknown job
status).
With multiple "corner" cases the function responsible for
including jobs' exit statuses became too complex.  I divided it to
smaller, more manageable pieces.
Both 'Handler' and 'Chain' classes are intended to be fairly generic.
The docstrings and some error messages implied that they can be only
used for dealing with job ClassAds.  Edited the docstrings and slightly
refactored Chain.append() to make it clear.
Some unit tests were using assertIs() instead of just
assertFalse()/assertTrue() and were implicitly taking advantage of the
fact that the ad returned by the handler is the same ad that is passed
to it which in general doesn't need to be true.  Fixed all of those.
Added extra checks to handlers which should allow handlers to be more
specific about what job ClassAds they can handle to avoid situation in
which, for example, a handler for completed jobs will attempt to process
a ClassAd for a held job.
@mxk62
Copy link
Collaborator Author

mxk62 commented Apr 10, 2024

Overall I have concerns that someday calling multiple functions per job trying to find the right handler vs just the nested if conditions may end up being too much overhead when there are 100k+ jobs.

Based on some ad hoc tests I made on my workstation with a workflow consisting of approx. 100k jobs (99,215), this version (i.e. with the bug fix) is roughly 1 s (or ~3%) slower than the original version (without the bug fix). However, the original version effectively does next to nothing in terms of determining job exist status when the submit directory is used as the run id (hence this bug fix). I'd expect the difference between the new version and the a version which works correctly, but uses multi-branch, likely nested if statements would be smaller (if any).

Also, this approach was to make future changes easier by just writing another handler. Note, however, the code will need changes if the handlers are needed for values independent of exit values (i.e., can't just quit trying handlers if exit values are handled).

Of course, it's a matter of personal preferences, but I do think that implementing future changes (and writing unit tests for them) should be easier when the "business logic" of dealing with a job of the given type is encapsulated in a fairly limited, isolated piece of code instead being a part of a (large) set of if statements.

@mxk62 mxk62 merged commit 63a1caa into main Apr 10, 2024
11 checks passed
@mxk62 mxk62 deleted the tickets/DM-43381 branch April 10, 2024 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants