New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-36145: Add additional quanta information for pipetask run #207
Conversation
eaa5348
to
2c8d074
Compare
nQuanta, | ||
len(qgraph.taskGraph), | ||
qgraph.graphID, | ||
qg_tasks = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since iterating through the graph is not "free" can you add a if _LOG.isEnabledFor(logging.INFO)
around this?
6ee3e71
to
36a78fd
Compare
Codecov ReportBase: 83.38% // Head: 83.43% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #207 +/- ##
==========================================
+ Coverage 83.38% 83.43% +0.04%
==========================================
Files 48 48
Lines 3870 3881 +11
Branches 589 591 +2
==========================================
+ Hits 3227 3238 +11
Misses 473 473
Partials 170 170
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
12d2ba5
to
d6ef725
Compare
f"QuantumGraph contains {nQuanta} quanta for {len(qgraph.taskGraph)} tasks, " | ||
f"graph ID: '{qgraph.graphID}'" | ||
) | ||
qg_title = f"\033[31m Quanta {'Task'.center(task_col_max_len)}\033[m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't embed opaque color escape codes in print statements. No-one knows what they mean. I think we have a python package in rubin-env that does let you declare them by name, although I worry that using color here is a bit risky given you don't know the color of someone's terminal. Why do we need color?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Colors were chosen to match the outputs of, e.g., butler query-collections
. Is that not desirable? If not, I can revert to no color overrides. How/where is this handled by butler command line queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In butler scripts we create an astropy Table and print it. It might be better for you to do that so you get more flexibility in the output rather than trying to do the padding yourself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to avoid making multiple calls to the logger for this info block. That would ensure that the preceeding blurb (INFO 2022-10-10T14:49:27.472-07:00 lsst.ctrl.mpexec.cmdLineFwk ()(cmdLineFwk.py:606)
) only gets printed once.
When I switch to using an astropy table and adding it into the logger, e.g.: _LOG.info([qg_header] + [qg_table])
, I get output that looks like this:
INFO 2022-10-10T14:49:27.472-07:00 lsst.ctrl.mpexec.cmdLineFwk ()(cmdLineFwk.py:606) - ["QuantumGraph contains 10 quanta
for 5 tasks, graph ID: '1665438567.4693098-3251389'", <Table length=5>
Quanta Tasks
int64 str23
------ -----------------------
2 isr
2 characterizeImage
2 calibrate
2 writePreSourceTable
2 transformPreSourceTable]
which doesn't seem optimal. Is there a way to get around this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to use str()
there? Inside a stringifying a list results in each element using repr()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the purpose of the _LOG.isEnabledFor
clause just to be able to use f-strings (which I thought we should avoid to keep log message evaluation deferred)? It seems like you could keep a single call like _LOG.info("QG has %d quanta for %d tasks, graph ID: %r\n%s", nQuanta, len(qgraph.taskGraph), qgraph.graphID, str(get_qgraph_task_table()))
. Either way I suggest making a separate function to return the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the isEnabledFor is that I was worried this could be a non trivial calculation and we should only do it if we know the message will be issued
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback both. I've just pushed with these edits in place:
- Astropy Table is now used to collect and print the results on screen
- Reverted back to the former logger approach (avoiding use of f-strings)
- Made a separate convenience method,
_generateTaskTable
, which does the heavy lifting in generating these statistics.
In addition, over on the sibling pipe_base
PR (lsst/pipe_base#285), an extra method (getNumberOfQuantaForTask
) has been added which simplifies and speeds up this kind of number generation exercise. In testing for RC2 step 1 (~220000 quanta), this additional for
loop takes ~0.01 seconds when using the older len(qnodes)
framework, and a staggeringly fast ~0.001 seconds when using the newer getNumberOfQuantaForTask
method.
9ebbb11
to
34aba1f
Compare
34aba1f
to
ed4f937
Compare
Checklist
doc/changes