DM-26385: Perform pipetask --initonly in a single job. #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

MichelleGower merged 2 commits into master from tickets/DM-26385

Oct 12, 2020

Collaborator

MichelleGower commented Oct 9, 2020

Prior to running actual Quanta, an intialization step should be run
to save config files, software versions, schemas, etc. Previously,
there was a single job per PipelineTask in the pipeline that called
pipetask --initonly. The work in this ticket changed that to be a
single job per submission (i.e., once for entire QuantumGraph).

Note: This code change requires removing --extend-run from the
runQuantumArgs entry for pipetask_init in user BPS configs.

Review Note: Did not write unit test for these code changes as
bps_core.py's code will be reorganized in next ticket (DM-27039).


          Perform pipetask --initonly in a single job.

0cd11ee

Prior to running actual Quanta, an intialization step should be run
to save config files, software versions, schemas, etc.  Previously,
there was a single job per PipelineTask in the pipeline that called
`pipetask --initonly`.  The work in this ticket changed that to be a
single job per submission (i.e., once for entire QuantumGraph).

Note:  This code change requires removing --extend-run from the
runQuantumArgs entry for pipetask_init in user BPS configs.

mxk62 approved these changes

View reviewed changes

Contributor

mxk62 left a comment

I'm fine with merging it to master. I'd appreciate if you could address few minor issues though.

python/lsst/ctrl/bps/bps_core.py Outdated

    
                      initgraph = networkx.DiGraph()

                      # create nodes for executing --init-only

                      tnode_name = f"pipetask_init"

Contributor

mxk62 Oct 12, 2020

It doesn't look like you need an f-string here.

python/lsst/ctrl/bps/bps_core.py Outdated

    
                      # create nodes for executing --init-only

                      tnode_name = f"pipetask_init"

                      lfn = "pipetask_init"

Contributor

mxk62 Oct 12, 2020

This variable seems to be unused until it's assigned a new value in line 443. I suspect that your intend was to use it in line 433 (analogously to line 449). If it is not the case, remove it. if you intend to keep it around though, could you please add a comment what lfn (which, as I understand, stands for "logical file name") means in the context of a task node?

Collaborator Author

MichelleGower Oct 12, 2020

Not needed for task node. Removed.

python/lsst/ctrl/bps/bps_core.py Outdated

    
                          tnode_name,

                          node_type=TASKNODE,

                          task_abbrev="pipetask_init",

                          label="pipetask_init",

Contributor

mxk62 Oct 12, 2020

I don't quite follow what's going on here. It looks liketask_abbrev and label holds the same value as tnode_name. Do they have to be initialized independently? In other words, why just don't use tnode_name to initialize them?

python/lsst/ctrl/bps/bps_core.py Outdated

    
                          shape="box",

                          style="rounded",

                      )

                      initgraph.add_edge(qnode_name, tnode_name)

Contributor

mxk62 Oct 12, 2020

I would suggest renaming qnode_name to inode_name similarly to onode_name which is used later (line 460). Alternatively, I'd use fnode_name in both cases as it's quite clear from the context what node we are dealing with.

Collaborator Author

MichelleGower Oct 12, 2020

The code for init was kept similar to the main code. qnode_name was purposefully picked because it stood for QuantumGraph pickle file. But changing it to fnode_name.

python/lsst/ctrl/bps/bps_core.py Outdated

    
                      _LOG.info("creating init task input(s)")

                      lfn = basename(self.qgraph_filename)

                      qnode_name = lfn

Contributor

mxk62 Oct 12, 2020

If lfn is used as a file node name later (line 446), why do you need a separate variable for it, to improve code readability? Or lfn being a node name is only applicable to the initialization graph, but not to the entire workflow graph?

(The comment also applies to lines 460-461.)

Collaborator Author

MichelleGower Oct 12, 2020

Just changed all the separate variables to fnode_name.

python/lsst/ctrl/bps/bps_core.py Outdated

    
                          job_attrib={"bps_jobabbrev": "pipetask_init"},

                          shape="box",

                          fillcolor="gray",

                          # style='"filled,bold"',

Contributor

mxk62 Oct 12, 2020

If not used, what is the reason behind keeping it?

python/lsst/ctrl/bps/bps_core.py Outdated

    
                      ----------

                      init_nodes: `dict`

                          Dict of task and file nodes for init tasks

                      Assumes initialization jobs must be new (and only) source in workflow graph.

Contributor

mxk62 Oct 12, 2020

I have problems with interpreting this sentence. Initially you talk about multiple jobs, but then refer to them as a single source. Did you mean that each of the initialization jobs needs to be a source or are you saying that initialization jobs cannot be represented as a disjoint graph (i.e. they need to have a single source)?

Collaborator Author

MichelleGower Oct 12, 2020

I meant there would no longer be any job sources from the original workflow graph still as job sources. This subgraph becomes the parent of all job sources in the original workflow graph.

Collaborator Author

MichelleGower Oct 12, 2020

Changing comment to: Assumes that all of the initialization should be executed prior to any of the current workflow.


          Make changes recommended by reviewer.

ab04aaa

MichelleGower merged commit a1fe33c into master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet