1 change: 1 addition & 0 deletions doc/changes/DM-52360.perf.md
@@ -0,0 +1 @@
+Switch the default `finalJob` implementation to the new `aggregate-graph` command, which makes use of multiple cores much more effectively than `transfer-from-graph`.
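For readers tuning the new default, a minimal sketch of a submission-YAML override that dials the final job's parallelism and memory up or down, using only the keys this PR adds to bps_defaults.yaml (requestCpus feeds the command's -j option); the values are illustrative and the usual BPS config override precedence is assumed:

finalJob:
  requestCpus: 8
  requestMemory: 16384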
2 changes: 1 addition & 1 deletion doc/lsst.ctrl.bps/quickstart.rst
@@ -1161,7 +1161,7 @@ New YAML Section
   implementation: JOB
   concurrencyLimit: db_limit
   command1: >-
-    ${DAF_BUTLER_DIR}/bin/butler transfer-from-graph
+    ${DAF_BUTLER_DIR}/bin/butler aggregate-graph
     {fileDistributionEndPoint}{qgraphFile}
     {butlerConfig}
     --register-dataset-types
7 changes: 6 additions & 1 deletion python/lsst/ctrl/bps/etc/bps_defaults.yaml
@@ -114,6 +114,7 @@ useLazyCommands: True
 executionButler:
   whenCreate: "NEVER"
 
+extraAggregateOptions: ""
 finalJob:
   retryUnlessExit: 2
   updateOutputChain: "--update-output-chain"
@@ -124,12 +125,16 @@ finalJob:
   implementation: JOB
   concurrencyLimit: db_limit
   finalPreCmdOpts: "{defaultPreCmdOpts}"
+  requestCpus: 16
+  requestMemory: 32768
   command1: >-
-    ${DAF_BUTLER_DIR}/bin/butler {finalPreCmdOpts} transfer-from-graph
+    ${DAF_BUTLER_DIR}/bin/butler {finalPreCmdOpts} aggregate-graph
     {fileDistributionEndPoint}{qgraphFile}
     {butlerConfig}
+    -j {requestCpus}
     --register-dataset-types
     {updateOutputChain}
Collaborator

The arguments don't match the current implementation. Also, the value for -j doesn't match the value given for requestCpus. Should it?

I would add a couple more variables to make making changes and running with mocks easier. If a different name makes more sense, please change it.

finalJobNumProcesses: 32 
extraAggregateOptions: ""
finalJob:
  retryUnlessExit: 2
  updateOutputChain: "--update-output-chain"
  whenSetup: "NEVER"
  whenRun: "ALWAYS"
  # Added for future flexibility, e.g., if prefer workflow instead of shell
  # script.
  implementation: JOB
  concurrencyLimit: db_limit
  finalPreCmdOpts: "{defaultPreCmdOpts}"
  requestCpus: "{finalJobNumProcesses}"
  requestMemory: 65536
  command1: >-
    ${DAF_BUTLER_DIR}/bin/butler {finalPreCmdOpts} aggregate-graph
    {fileDistributionEndPoint}{qgraphFile}
    {butlerConfig} 
    -j {finalJobNumProcesses} 
    --register-dataset-types
    {updateOutputChain}
    {extraAggregateOptions}

So on my smaller workstation running a ci_middleware workflow with mocked failures, I needed to add the following to my submit yaml:

finalJobNumProcesses: 10
extraAggregateOptions: "--mock-storage-classes"

P.S. I wouldn't object to these going inside the finalJob section like finalPreCmdOpts. But looking forward to having a side aggregate job, setting the values once at the root level, where they would be used by both the aggregate and final jobs, would be ideal if they would generally match.

Member Author

I've adopted your recommended configuration pretty much as-is (I just revised the default resource usage lower, per the other thread). I didn't realize one could define a new BPS variable implicitly like this.

As for using the finalJob section for the new variables, at this point I'm not really anticipating adding a side aggregate, but I don't think we'll know for sure until CM starts using this in production. I am still hoping to support running it manually on the side, much as one can already do with the transfer-from-graph finalJob. Should we just move the new variables into that section, then?

+    {extraAggregateOptions}
 
 # Set a global default memory limit (in MiB) for the automatic memory scaling
 # mechanism. The value, 480 GiB, picked based on the cluster specifications
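Picking up the author's question above about moving the new variables into the finalJob section, a minimal sketch of what that could look like: it simply nests the reviewer's proposed finalJobNumProcesses and extraAggregateOptions under finalJob and otherwise mirrors the defaults in this diff. Whether nested definitions resolve the same way under BPS variable lookup is an assumption here, not something this PR verifies.

finalJob:
  # Hypothetical placement; these two keys sit at the root level in the PR as merged.
  finalJobNumProcesses: 16
  extraAggregateOptions: ""
  requestCpus: "{finalJobNumProcesses}"
  requestMemory: 32768
  command1: >-
    ${DAF_BUTLER_DIR}/bin/butler {finalPreCmdOpts} aggregate-graph
    {fileDistributionEndPoint}{qgraphFile}
    {butlerConfig}
    -j {finalJobNumProcesses}
    --register-dataset-types
    {updateOutputChain}
    {extraAggregateOptions}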