reusable output node #40

Open
eirrgang opened this issue Apr 29, 2018 · 3 comments
eirrgang commented Apr 29, 2018

We need to be able to append trajectories sensibly and robustly along a single simulation pipeline.

Most basically, some workflows will include several stages of simulation that should produce a single continuous trajectory. Whether the lower-level implementation involves multiple GROMACS program launches or a single launch (that runs for a bit, changes parameters, and runs more) is not relevant at the higher-level interface. So this issue has a few parts:

  1. What does the workflow graph look like for output in multi-stage / adaptive simulations?
  2. What is the sensible implementation in the short and long term?
  3. What does that look like in the execution graph?

One way is to have a single output node represented in the workflow. Multiple simulation nodes in the workflow graph could run as a pipeline. For the output node to be responsible for writing the entire trajectory, the intermediate simulation nodes would "pass through" trajectory frames from time steps before they become active (see the sketch below).
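A minimal sketch of that pipeline shape, assuming nothing about gmxapi itself (`simulation_stage` and `output_node` are hypothetical names for illustration): each stage forwards upstream frames before emitting its own, so one output node can own the whole file.

```python
# Hypothetical sketch of the single-output-node pipeline; names are
# illustrative, not gmxapi API.
from typing import Iterable, Iterator


def simulation_stage(upstream: Iterable[dict], n_steps: int) -> Iterator[dict]:
    """Pass through frames from earlier stages, then emit this stage's frames."""
    last_step = -1
    for frame in upstream:
        last_step = frame["step"]
        yield frame
    for i in range(1, n_steps + 1):
        yield {"step": last_step + i}


def output_node(frames: Iterable[dict], path: str) -> None:
    """The single node responsible for the entire trajectory file."""
    with open(path, "a") as traj:
        for frame in frames:
            traj.write(f"frame {frame['step']}\n")  # stand-in for real trajectory I/O


# Two chained stages appear to the output node as one continuous trajectory.
pipeline = simulation_stage(simulation_stage([], n_steps=100), n_steps=100)
output_node(pipeline, "traj.txt")
```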

Two obvious alternatives are

  1. Each simulation node has an output node to perform the operation of writing trajectory data out (filesystem I/O is not a native workflow data stream type). We could try to handle appending to the same trajectory automagically, or allow input parameters for the output nodes to specify accumulating frames by appending to a single trajectory file.
  2. More fully embrace the idea of data as graph edges. When the stream is initialized, specify that it is a file-backed stream and carry the necessary metadata to properly maintain the trajectory file as the graph is executed.

I like the latter, and it seems more TensorFlow-ish, but I have thought about it less, and it implies introducing more formalism into one or both of our graph schemas: the workflow specification graph (specified in the high-level API) and the execution / data flow DAG (currently evolving fluidly).
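For concreteness, here is one way the file-backed-stream idea could be carried on an edge. This is a sketch under assumed names (`FileBackedStream`, `accept`), not a proposed schema:

```python
# Sketch of alternative 2: edge metadata for a file-backed trajectory stream.
# All names are assumptions for illustration, not gmxapi schema.
from dataclasses import dataclass


@dataclass
class FileBackedStream:
    """Metadata carried by a graph edge whose data is persisted to a file."""
    path: str             # backing trajectory file
    fmt: str = "trr"      # on-disk format
    append: bool = True   # extend the existing file rather than start a new part
    last_step: int = -1   # high-water mark already present in the file

    def accept(self, step: int) -> bool:
        """True if a frame at `step` should be written (append semantics)."""
        if self.append and step <= self.last_step:
            return False  # frame is already on disk; skip on restart/replay
        self.last_step = max(self.last_step, step)
        return True
```

Because the metadata travels with the edge, whichever node executes next can keep the file consistent without the workflow author coordinating output nodes by hand.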

peterkasson commented

Just checking--this is a workflow node but not one that requires communication between contexts, right? i.e. it could effectively be a map() operation so contexts do their output independently.
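Reading that literally, the map() interpretation would look something like the sketch below (`run_member` is hypothetical; each context writes its own file with no coordination):

```python
# Hypothetical map()-style output: each context writes independently.
def run_member(member: int) -> str:
    path = f"member{member:04d}/traj.trr"  # one trajectory per ensemble member
    # ... run the simulation for this context and write its own output ...
    return path


trajectories = list(map(run_member, range(4)))  # no cross-context communication
```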

eirrgang commented

I see what you mean about a map() operation, but that's not quite what I was getting at here. I've updated the description in the hopes of clarifying.

The map() operation is a good thing to bring up, though. The grand plan is to have parallel execution essentially work that way, but right now it is rigidly structured as an array of pipelines with possible operations across the full array (ensemble operations). That is, we have simulation scope and ensemble scope for operations or data flow.

I think the first step in allowing something more map()-like would be to explicitly expand the parallel edges and nodes in the execution DAG. Right now we have it more as "array operations of a given width can feed array operations of the same width, with the exception of many-to-one and one-to-many". Otherwise, we start dealing with more scheduling logic, which I would rather defer to specific context implementations (and try to pawn off to third-party data flow graph execution managers).

I think some of the necessary metadata will become clearer as we implement full workflow checkpointing.
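The width rule quoted above can be stated as a small check. This is only an illustration of the stated constraint, not the execution-DAG code:

```python
# Illustration of the constraint: width-N arrays feed width-N arrays, with
# one-to-many and many-to-one as the only exceptions.
def check_edge(producer_width: int, consumer_width: int) -> None:
    if producer_width == consumer_width:
        return  # ordinary array edge: member i feeds member i
    if producer_width == 1 or consumer_width == 1:
        return  # broadcast (one-to-many) or ensemble reduction (many-to-one)
    raise ValueError(
        f"cannot connect width {producer_width} to width {consumer_width}"
    )
```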

@eirrgang eirrgang added this to To do in 0.0.6 May 24, 2018
@eirrgang eirrgang added this to the gmxapi_workspec_0_2 milestone May 28, 2018
@eirrgang eirrgang added this to To do in 0.0.7.2 via automation May 29, 2018
@eirrgang eirrgang removed this from To do in 0.0.6 May 29, 2018
eirrgang added a commit to eirrgang/gmxapi that referenced this issue Jun 26, 2018
Relates to issue kassonlab#40

Functionally, add a boolean `append_output` property to MD simulations
that defaults to `true`. Implemented with an optional keyword / param
that, if set `false`, gives the `mdrun -noappend` behavior.
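A usage sketch of that change: `append_output` is taken from the commit message, and the `from_tpr`/`run` calls reflect the gmxapi 0.0.x workflow interface, but treat the exact signature as an assumption rather than the merged behavior.

```python
import gmx

# append_output defaults to True (continuous trajectory). Setting it False
# should give `mdrun -noappend` behavior per the commit message; this call
# signature is assumed, not verified against the merged change.
md = gmx.workflow.from_tpr(["topol.tpr"], append_output=False)
gmx.run(md)
```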
eirrgang commented

Recording some comments from the pull request that may not be resolved in that pull request:

In current plans for gmxapi 0.1.0, each operation in each identifiably distinct work description is uniquely tagged. In order to implement [more output flexibility], we need to be specific about behavior and artifact handling.

  • What indicates whether two MD operations should continue a previous simulation as if continuing from checkpoint?
  • What indicates that an MD operation should use another simulation's output as input, but start fresh trajectory and log files?
  • For an MD operation that continues by appending to another's output, how do we determine whether to retain the snapshot of the output from the first operation (distinct artifacts) rather than to let the second operation completely take ownership (as if they were the same operation)?

An update should specify behavior in enough detail that we can discuss whether the behavior as written should be different.
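One way to make those three questions concrete (purely illustrative, not a proposal from the thread) is an explicit continuation mode tagged onto each MD operation:

```python
from enum import Enum


class Continuation(Enum):
    """Hypothetical per-operation tag answering the three questions above."""
    APPEND = "append"      # continue as if from checkpoint; second op owns the output
    SNAPSHOT = "snapshot"  # append, but retain the first op's output as a distinct artifact
    BRANCH = "branch"      # use prior output as input, but start fresh trajectory/logs
```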

eirrgang added a commit to eirrgang/gmxapi that referenced this issue Jun 28, 2018
eirrgang added a commit that referenced this issue Jul 24, 2018
Relates to issue #40

Functionally, add a boolean `append_output` property to MD simulations
that defaults to `true`. Implemented with an optional keyword / param
that, if set `false`, gives the `mdrun -noappend` behavior.
eirrgang added a commit that referenced this issue Jul 24, 2018
Relates to issue #40 and to bug #130

Resolves task in #130 when merged.
@eirrgang eirrgang moved this from To do to On deck in 0.0.7.2 Nov 15, 2018
@eirrgang eirrgang moved this from On deck to To do in 0.0.7.2 Nov 15, 2018
@eirrgang eirrgang moved this from To do to On deck in 0.0.7.2 Dec 6, 2018
@eirrgang eirrgang moved this from On deck to To do in 0.0.7.2 Jul 2, 2019