more execution graph node types #36

Open · eirrgang opened this issue Apr 29, 2018 · 2 comments

eirrgang (Collaborator) commented Apr 29, 2018

For flexible workflow configuration, we need to move forward with abstractions for more input and output nodes. Coupled to this, we need a more concrete design for distinguishing types of dependency relationships / edge types. This is partly to clarify the binding process, both for high-level users and for implementation purposes.

The current sense is that workflow elements should have each of their interactions made explicit to avoid unexpected behavior. This can be done by requiring the elements to be named in keyword parameters that the Context can resolve to API-specified connection types during graph translation. For instance, a plugin that can provide both a force-calculation interface and a stop condition to an MD operation would be listed twice in the MD parameters, such as (restraint=myplugin, stop=myplugin). During translation to the executable graph, the translator for myplugin will participate in two binding protocols, one for each interaction type.
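
As a rough sketch of that proposal (a hypothetical fragment; neither the parameter names nor the element name are settled), the same plugin element would appear under two different keywords of the MD element's parameters:

```python
# Hypothetical work-element parameters for an MD operation. Naming the same
# plugin under two keywords asks the Context to resolve two distinct,
# API-specified connection types during graph translation.
md_params = {
    "restraint": "myplugin",  # tightly coupled force-calculation interface
    "stop": "myplugin",       # stop-condition hook (a data-event edge)
}
```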

In the long run, there is an important distinction between interactive edge types and data-flow edge types that will need to be worked out. In the above case, a stop condition can ultimately be a data event, but the restraint force-calculation interface is a tightly bound interaction, with data flowing in both directions during a single time step, and it depends on the MD engine implementation. In reality, though, this just means that the MD engine is represented by several nodes corresponding to different phases of the MD loop iteration. Right now this is implicit, but maybe we should make it explicit for consistency and be clear that those several nodes of MD engine and MD plugin are fused from the perspective of the workflow-level scheduler and deferred to the simulation-level scheduler.

In addition, we need to figure out where to put the protocol for declaring that no interaction needs to take place for a given number of steps, which is a lower-level optimization for infrequent callbacks.

MD input

  • structure/configuration
  • topology
  • integrator state
  • simulation parameters (such as nsteps and other MDP options)?
  • stop condition (an edge type); see Stop condition hook #62

MD output

  • structure/configuration
  • checkpoint information(?)
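
To make the lists above concrete, here is a purely illustrative sketch of the MD hooks as named ports; none of these names are specified by any schema yet:

```python
# Hypothetical port names for an MD element, for discussion only.
md_ports = {
    "input": ["structure", "topology", "integrator_state", "parameters", "stop"],
    "output": ["structure", "checkpoint"],
}
```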

Data operations

  • simulation operations
    • add
    • mean (the client of the operation should not need to know the nature of the domain decomposition)
  • ensemble operations
    • add
    • mean (the client of the operation should not need to know the size of the ensemble; see the sketch after this list)
  • logical operations: may be necessary to implement workflow logic not already available, such as to produce a stop condition for a simulation that has converged or run long enough.
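
As referenced from the ensemble mean item above, such an operation might be expressed roughly as follows (the operation and port names are hypothetical). In an MPI-backed ensemble Context it would presumably map to a reduction across ensemble members:

```python
# Hypothetical ensemble-mean element. The client only names the upstream data;
# the Context supplies the ensemble size and the reduction machinery.
ensemble_mean = {
    "operation": "ensemble.mean",
    "input": {"values": "myplugin.output.alpha"},  # per-member scalar or array
    "output": ["mean"],                            # single averaged data event
}
```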

Data source

Define a data structure (array, scalar, or key-value block) that can be initialized, checkpointed, and updated as it passes through other operations on a sweep of the graph, in a TensorFlow-like manner.
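
A minimal sketch of such a data-source element, with all names hypothetical:

```python
# Hypothetical data source: a persistent array that is defined once,
# checkpointed with the workflow, and updated on each sweep of the graph.
data_source = {
    "operation": "make_array",                               # illustrative name
    "parameters": {"shape": [100], "dtype": "float64", "initial": 0.0},
    "output": ["data"],
}
```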

eirrgang (Collaborator, Author) commented

Per conference with @peterkasson and @jmhays, these ideas have been fleshed out a bit. We want to avoid ambiguity in what the params of an element are for. Are they general work-element parameters? Parameters to the operation and/or its translator? Or parameters to be passed to the underlying API object?

To resolve the ambiguity, the parameters structure will be more rigidly specified as a hierarchical map. The top-level keys will have specified meanings that allow the Context to interpret the intended behavior. This exposes some details and constraints of the data flow graph, which is not yet specified, to a higher-level interface, but that may be appropriate at this point.

Specified top-level keys:

  • input: maps named input "ports" of the element to named output ports of other defined elements
  • data: a serializable data structure (most likely a JSON-compatible key-value map) that is provided as an argument to the translator. (Both keys are illustrated in the sketch after the grammar list below.)

Specified grammar:

  • The period (.) has special meaning for symbols in a work specification, indicating namespace hierarchy.
  • The interfaces for the input and output hooks of an element are input and output, respectively.
  • Within input and output are named "ports" that are not yet specified by the schema, but which we probably want to standardize for some categorization of graph edge data types or named streams.
  • output ports produce a single data event at the end of an operation and provide some sort of immutable datagram, such as a scalar constant or static structured data.
  • ports for asynchronous data streams or that produce more than a single data event are not yet specified.
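
Putting the two top-level keys and the dotted namespace grammar together, the params of an element might look something like the following; the element names, port names, and data fields are only illustrative:

```python
# Hypothetical params for an element named "myplugin". The "data" block is
# handed to the translator as-is; the "input" block wires named input ports to
# the output ports of other elements using the dotted namespace grammar.
myplugin_params = {
    "data": {"alpha": 0.1, "target": [2.0, 3.5], "nbins": 20},
    "input": {
        "structure": "md_element.output.structure",
        "convergence": "logic_element.output.stop",
    },
}
```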

The first use case is as a way to allow parameters of API objects to be provided either as constructor arguments or as a data event. To avoid ambiguity, @peterkasson proposed that it be an error for an element to have both specified. I'm not sure that constraint needs to be specified, though. If both are provided, the intuitive behavior is that the object is initialized with data and updated by input. For operations for which that doesn't make sense, we can certainly add error checking, but I don't think the constraint needs to be part of the API at this point. Since users will not generally access this schema directly, additional logic and error checking can be built into the helper functions.

Additionally, we haven't yet fully specified the mechanism by which the Context maps elements to a translator to build the data flow graph, and it makes sense that this map would be the place to put that sort of metadata, along with the metadata for context/data-flow requirements and constraints that has been looking for a permanent home.

Note: the depends list now contains redundant information, but that is okay for now. Suggestion: it should be an error for the dependency list not to contain all of the elements used as input ports, and we retain the constraint that elements in depends must exist in the work specification before the element can be added.

Note: as we improve serializability and element generation, we can relax the current constraints on how elements are updated after they are added to the graph. In the Python module, Element objects will be proper proxies to the work specification and will not store copies of their parameters. This simplifies the model mapping Python objects to shared structured data: an Element object is a "view" into the workspec object.

It seems worth pointing out how this compares (so far) to the TensorFlow model. Up to now, the only kind of data represented by gmxapi is graph edges, with a vague, implicit way to pass the output stream of one operation to the input stream of another operation on the subsequent sweep. In TensorFlow, persistence between sweeps of the graph is through variables, which live outside of the data flow graph. If variables are used in a way such that the order of operations matters, TensorFlow produces an error. Instead of transformations of a data stream, data-flow conditions can be explicitly specified for variables through separate API calls. The analog for gmxapi would be that the current data streams are variables with implicit flow conditions, and the more tightly coupled interactions are the only TensorFlow-like graph edges.

The Restraint force calculation, then, is a "layer" in which an MD operation provides a coordinates edge to the calculate operation, which provides a forces edge to a summing operation in the MD, all of which are "optimized" into a fused operation: the Restraint::evaluate() call. In the above grammar, output is a collection of operations producing tensors, and input is a collection of scoped variables (which may or may not be placeholders, initialized from constants, or part of the TRAINABLE set). We can do more to apply additional wisdom about ensemble simulation data flows to refine the gmxapi model, but it seems like we just need to clarify the distinct forms of data and data interactions.
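
For reference, the TensorFlow behavior described above can be sketched in a few lines (using current TensorFlow 2.x syntax rather than the 1.x API that was contemporary with this discussion):

```python
import tensorflow as tf

# A Variable persists outside the data flow graph, across "sweeps" (calls).
state = tf.Variable(0.0)

@tf.function
def sweep(x):
    # Tensors created here are the graph edges; the variable update is an
    # explicit, separately ordered operation on persistent state.
    state.assign_add(x)
    return state * 2.0

print(sweep(tf.constant(1.0)))  # tf.Tensor(2.0, ...) -- state is now 1.0
print(sweep(tf.constant(1.0)))  # tf.Tensor(4.0, ...) -- state is now 2.0
```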

eirrgang (Collaborator, Author) commented

An early idea was to add operations whose job is to initiate a data stream and provide hooks for sending data through other operations, broadcasting, or collecting. As workspec_0_2 is formalized, we may find that we need a node to represent the operation that turns persistent data into flow data at the upstream end of the DAG and captures it back to persistent data for subsequent passes. In early workspec_0_1 syntax, an example is embedded as comments in src/gmx/context.py and src/gmx/test/test_mpiarraycontext.py.

Projects: 0.0.9 · To do