
Conversation

@MichelleGower (Collaborator)

Prior to running actual Quanta, an initialization step should be run
to save config files, software versions, schemas, etc. Previously,
there was a single job per PipelineTask in the pipeline that called
`pipetask --init-only`. The work in this ticket changed that to be a
single job per submission (i.e., once for the entire QuantumGraph).
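
For illustration, a minimal networkx sketch of the new per-submission init subgraph (the node-type constants and the pickle path here are stand-ins, not the actual bps_core.py definitions):

import networkx
from os.path import basename

TASKNODE, FILENODE = "task", "file"  # placeholders for the bps_core.py constants
qgraph_filename = "submit/run1.pickle"  # hypothetical QuantumGraph path

initgraph = networkx.DiGraph()

# One init task node per submission, running `pipetask --init-only`.
tnode_name = "pipetask_init"
initgraph.add_node(tnode_name, node_type=TASKNODE, label=tnode_name)

# The serialized QuantumGraph is the single input to that job.
fnode_name = basename(qgraph_filename)
initgraph.add_node(fnode_name, node_type=FILENODE)
initgraph.add_edge(fnode_name, tnode_name)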

Note: This code change requires removing `--extend-run` from the
`runQuantumArgs` entry for `pipetask_init` in user BPS configs.

Review Note: Did not write unit tests for these code changes because
bps_core.py's code will be reorganized in the next ticket (DM-27039).

@mxk62 (Contributor) left a comment:

I'm fine with merging it to master. I'd appreciate it if you could address a few minor issues, though.

initgraph = networkx.DiGraph()

# create nodes for executing --init-only
tnode_name = f"pipetask_init"

@mxk62 (Contributor):

It doesn't look like you need an f-string here.
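
That is, a plain string literal suffices here:

tnode_name = "pipetask_init"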


# create nodes for executing --init-only
tnode_name = f"pipetask_init"
lfn = "pipetask_init"

@mxk62 (Contributor):

This variable seems to be unused until it's assigned a new value in line 443. I suspect that your intent was to use it in line 433 (analogously to line 449). If that is not the case, remove it. If you do intend to keep it around, though, could you please add a comment explaining what `lfn` (which, as I understand, stands for "logical file name") means in the context of a task node?

@MichelleGower (Collaborator, Author):

Not needed for the task node. Removed.

tnode_name,
node_type=TASKNODE,
task_abbrev="pipetask_init",
label="pipetask_init",

@mxk62 (Contributor):

I don't quite follow what's going on here. It looks like `task_abbrev` and `label` hold the same value as `tnode_name`. Do they have to be initialized independently? In other words, why not just use `tnode_name` to initialize them?
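
A sketch of the suggested simplification (assuming, as the snippet implies, these are arguments to an initgraph.add_node call):

initgraph.add_node(
    tnode_name,
    node_type=TASKNODE,
    task_abbrev=tnode_name,  # same value, so reuse the variable
    label=tnode_name,
    shape="box",
    style="rounded",
)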

shape="box",
style="rounded",
)
initgraph.add_edge(qnode_name, tnode_name)

@mxk62 (Contributor):

I would suggest renaming `qnode_name` to `inode_name`, similarly to `onode_name`, which is used later (line 460). Alternatively, I'd use `fnode_name` in both cases, as it's quite clear from the context what node we are dealing with.

@MichelleGower (Collaborator, Author):

The code for init was kept similar to the main code. `qnode_name` was purposely picked because it stood for the QuantumGraph pickle file, but I am changing it to `fnode_name`.


_LOG.info("creating init task input(s)")
lfn = basename(self.qgraph_filename)
qnode_name = lfn

@mxk62 (Contributor):

If `lfn` is used as a file node name later (line 446), why do you need a separate variable for it? Is it to improve code readability? Or is `lfn` being a node name only applicable to the initialization graph, but not to the entire workflow graph?

(The comment also applies to lines 460-461.)

@MichelleGower (Collaborator, Author):

Just changed all the separate variables to `fnode_name`.
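
For reference, the consolidated form might look roughly like this (a sketch reusing names from the snippets above, not the final diff):

_LOG.info("creating init task input(s)")
fnode_name = basename(self.qgraph_filename)
initgraph.add_node(fnode_name, node_type=FILENODE)
initgraph.add_edge(fnode_name, tnode_name)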

job_attrib={"bps_jobabbrev": "pipetask_init"},
shape="box",
fillcolor="gray",
# style='"filled,bold"',

@mxk62 (Contributor):

If not used, what is the reason behind keeping it?

----------
init_nodes: `dict`
Dict of task and file nodes for init tasks
Assumes initialization jobs must be new (and only) source in workflow graph.

@mxk62 (Contributor):

I have problems interpreting this sentence. Initially you talk about multiple jobs, but then refer to them as a single source. Did you mean that each of the initialization jobs needs to be a source, or are you saying that initialization jobs cannot be represented as a disjoint graph (i.e., they need to have a single source)?

@MichelleGower (Collaborator, Author):

I meant there would no longer be any job sources from the original workflow graph still acting as job sources. This subgraph becomes the parent of all job sources in the original workflow graph.

@MichelleGower (Collaborator, Author):

Changing comment to: Assumes that all of the initialization should be executed prior to any of the current workflow.
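
A rough sketch of that intent in networkx terms, as a hypothetical helper (workflow_graph and the function itself are stand-ins, not the actual bps_core.py code):

import networkx

def prepend_init(initgraph, workflow_graph, init_task="pipetask_init"):
    """Make every former source of workflow_graph depend on the init job."""
    combined = networkx.compose(initgraph, workflow_graph)
    for node in workflow_graph:
        if workflow_graph.in_degree(node) == 0:
            combined.add_edge(init_task, node)
    return combined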

@MichelleGower merged commit a1fe33c into master on Oct 12, 2020