Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-36145: Add additional quanta information for pipetask run #207

Merged
merged 1 commit into from Oct 12, 2022

Conversation

leeskelvin
Copy link
Contributor

@leeskelvin leeskelvin commented Oct 5, 2022

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@leeskelvin leeskelvin force-pushed the tickets/DM-36145 branch 3 times, most recently from eaa5348 to 2c8d074 Compare October 5, 2022 21:55
nQuanta,
len(qgraph.taskGraph),
qgraph.graphID,
qg_tasks = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since iterating through the graph is not "free" can you add a if _LOG.isEnabledFor(logging.INFO) around this?

@leeskelvin leeskelvin force-pushed the tickets/DM-36145 branch 4 times, most recently from 6ee3e71 to 36a78fd Compare October 10, 2022 18:39
@codecov
Copy link

codecov bot commented Oct 10, 2022

Codecov Report

Base: 83.38% // Head: 83.43% // Increases project coverage by +0.04% 🎉

Coverage data is based on head (ed4f937) compared to base (52ca013).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #207      +/-   ##
==========================================
+ Coverage   83.38%   83.43%   +0.04%     
==========================================
  Files          48       48              
  Lines        3870     3881      +11     
  Branches      589      591       +2     
==========================================
+ Hits         3227     3238      +11     
  Misses        473      473              
  Partials      170      170              
Impacted Files Coverage Δ
python/lsst/ctrl/mpexec/cmdLineFwk.py 69.56% <100.00%> (+1.16%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@leeskelvin leeskelvin force-pushed the tickets/DM-36145 branch 2 times, most recently from 12d2ba5 to d6ef725 Compare October 10, 2022 20:45
f"QuantumGraph contains {nQuanta} quanta for {len(qgraph.taskGraph)} tasks, "
f"graph ID: '{qgraph.graphID}'"
)
qg_title = f"\033[31m Quanta {'Task'.center(task_col_max_len)}\033[m"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't embed opaque color escape codes in print statements. No-one knows what they mean. I think we have a python package in rubin-env that does let you declare them by name, although I worry that using color here is a bit risky given you don't know the color of someone's terminal. Why do we need color?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Colors were chosen to match the outputs of, e.g., butler query-collections. Is that not desirable? If not, I can revert to no color overrides. How/where is this handled by butler command line queries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In butler scripts we create an astropy Table and print it. It might be better for you to do that so you get more flexibility in the output rather than trying to do the padding yourself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to avoid making multiple calls to the logger for this info block. That would ensure that the preceeding blurb (INFO 2022-10-10T14:49:27.472-07:00 lsst.ctrl.mpexec.cmdLineFwk ()(cmdLineFwk.py:606)) only gets printed once.

When I switch to using an astropy table and adding it into the logger, e.g.: _LOG.info([qg_header] + [qg_table]), I get output that looks like this:

INFO 2022-10-10T14:49:27.472-07:00 lsst.ctrl.mpexec.cmdLineFwk ()(cmdLineFwk.py:606) - ["QuantumGraph contains 10 quanta
 for 5 tasks, graph ID: '1665438567.4693098-3251389'", <Table length=5>
Quanta          Tasks         
int64           str23         
------ -----------------------
     2                     isr
     2       characterizeImage
     2               calibrate
     2     writePreSourceTable
     2 transformPreSourceTable]

which doesn't seem optimal. Is there a way to get around this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to use str() there? Inside a stringifying a list results in each element using repr().

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of the _LOG.isEnabledFor clause just to be able to use f-strings (which I thought we should avoid to keep log message evaluation deferred)? It seems like you could keep a single call like _LOG.info("QG has %d quanta for %d tasks, graph ID: %r\n%s", nQuanta, len(qgraph.taskGraph), qgraph.graphID, str(get_qgraph_task_table())). Either way I suggest making a separate function to return the table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the isEnabledFor is that I was worried this could be a non trivial calculation and we should only do it if we know the message will be issued

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback both. I've just pushed with these edits in place:

  • Astropy Table is now used to collect and print the results on screen
  • Reverted back to the former logger approach (avoiding use of f-strings)
  • Made a separate convenience method, _generateTaskTable, which does the heavy lifting in generating these statistics.

In addition, over on the sibling pipe_base PR (lsst/pipe_base#285), an extra method (getNumberOfQuantaForTask) has been added which simplifies and speeds up this kind of number generation exercise. In testing for RC2 step 1 (~220000 quanta), this additional for loop takes ~0.01 seconds when using the older len(qnodes) framework, and a staggeringly fast ~0.001 seconds when using the newer getNumberOfQuantaForTask method.

@leeskelvin leeskelvin force-pushed the tickets/DM-36145 branch 2 times, most recently from 9ebbb11 to 34aba1f Compare October 11, 2022 20:14
@leeskelvin leeskelvin merged commit 14023ac into main Oct 12, 2022
@leeskelvin leeskelvin deleted the tickets/DM-36145 branch October 12, 2022 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants