COMPUTE computation The definition & execution of a networked operation is split into 1+2 phases:
- COMPOSITION
- COMPILATION
- EXECUTION
... it is constrained by these IO data-structures:
- operation(s) (with needs & provides for each one)
- given inputs
- asked outputs
... populates these low-level data-structures:
- network graph (COMPOSE time)
- execution dag (COMPILE time)
- execution steps (COMPILE time)
- solution (EXECUTE time)
... and utilizes these main classes:
- graphtik.op.FunctionalOperation
- graphtik.netop.NetworkOperation
- graphtik.network.Network
- graphtik.network.ExecutionPlan
- graphtik.network.Solution
compose COMPOSITION The phase where operations are constructed and grouped into netops and corresponding networks.
Tip
- Use the graphtik.operation() builder class to construct .FunctionalOperation instances.
- Use the graphtik.compose() factory to prepare the net internally, and build .NetworkOperation instances.
compile COMPILATION The phase where the .Network creates a new execution plan by pruning all graph nodes into a subgraph dag, and deriving the execution steps.
execute EXECUTION sequential The phase where the .ExecutionPlan calls the underlying functions of all operations contained in the execution steps, with inputs/outputs taken from the solution.
Currently there are 2 ways to execute:
- sequential
- parallel, with a multiprocessing.Pool
Plans may abort their execution by setting the abort run global flag.
parallel parallel execution execution pool task Execute operations in parallel, with a thread pool or process pool (instead of sequential). Operations and netops are marked as such on construction, or enabled globally from configurations.
Note that sideffects are not expected to function with process pools, certainly not when marshalling is enabled.
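Process pools have to pickle each task to ship it to a worker process. The following minimal sketch uses only the standard pickle module (not graphtik itself) to show the kind of object that is safe and the kind that is not; the helper name `can_pickle` is hypothetical.

```python
import operator
import pickle


def can_pickle(obj):
    """Return True if `obj` can be serialized with the standard pickle module."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A function importable by name is pickled by reference,
# so it can be shipped to a worker process.
importable_ok = can_pickle(operator.add)

# A lambda has no importable name, so plain pickle refuses it;
# this is the kind of failure where dill-based marshalling may help.
lambda_ok = can_pickle(lambda a, b: a + b)
```

Thread pools avoid the problem altogether, since tasks never leave the process.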
- process pool
When the multiprocessing.Pool class is used for parallel execution, the tasks must be communicated to/from the worker process, which requires pickling, and that may fail. With pickling failures you may try marshalling with the dill library, and see if that helps.
Note that sideffects are not expected to function at all, certainly not when marshalling is enabled.
- thread pool
When the multiprocessing.dummy.Pool class is used for parallel execution, the tasks are run in-process, so no marshalling is needed.
- marshalling
Pickling parallel operations and their inputs/outputs using the dill module. It is configured (see configurations) either globally with .set_marshal_tasks() or set with a flag on each operation/netop.
Note that sideffects do not work when this is enabled.
- configurations
The functions controlling compile & execution globally are defined in the .config module; their underlying global data are stored in contextvars.ContextVar instances, to allow for nested control.
All boolean configuration flags are tri-state (None, False, True), allowing to "force" all operations, when they are not set to the None value. All of them default to None (false).
graph network graph The .Network.graph (currently a DAG) contains all FunctionalOperation and _DataNode nodes of some netop.
They are laid out and connected by repeated calls of .Network._append_operation() by the Network constructor.
This graph is then pruned to extract the dag, and the execution steps are calculated, all ingredients for a new ExecutionPlan.
dag execution dag solution dag There are 2 directed-acyclic-graph instances used:
- the .ExecutionPlan.dag, in the execution plan, which contains the pruned nodes, used to decide the execution steps;
- the .Solution.dag, in the solution, which derives the canceled operations due to rescheduled/failed operations upstream.
steps execution steps The ExecutionPlan.steps contains a list of the operation-nodes only from the dag, topologically sorted, and interspersed with instruction steps needed to compute the asked outputs from the given inputs.
It is built by .Network._build_execution_steps() based on the subgraph dag.
The only instruction step is for performing evictions.
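The idea of a topologically sorted list interspersed with eviction instructions can be sketched with the standard graphlib module. This is a toy model, not the code of .Network._build_execution_steps(): the `OPERATIONS` table, the `build_steps` function and the `("evict", name)` tuples are all hypothetical, and it assumes that a value may be evicted right after its last consumer unless it is an asked output.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical toy operations: name -> (needs, provides).
OPERATIONS = {
    "add": (["a", "b"], ["ab"]),
    "double": (["ab"], ["ab2"]),
    "negate": (["ab2"], ["result"]),
}


def build_steps(operations, asked_outputs):
    """Sort operations topologically and add an eviction after each value's last use."""
    producer = {
        name: op for op, (_, provides) in operations.items() for name in provides
    }
    # Map every operation to the operations whose outputs it consumes.
    predecessors = {
        op: {producer[need] for need in needs if need in producer}
        for op, (needs, _) in operations.items()
    }
    order = list(TopologicalSorter(predecessors).static_order())

    # Remember the position of the last operation consuming each value.
    last_use = {}
    for position, op in enumerate(order):
        for need in operations[op][0]:
            last_use[need] = position

    steps = []
    for position, op in enumerate(order):
        steps.append(op)
        for name in sorted(last_use):
            if last_use[name] == position and name not in asked_outputs:
                steps.append(("evict", name))
    return steps


steps = build_steps(OPERATIONS, asked_outputs={"result"})
```

For this linear chain, each intermediate value is dropped as soon as the next operation has consumed it, so at most one operation's needs and provides are held at a time.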
- evictions
The _EvictInstruction steps erase items from the solution as soon as they are not needed further down the dag, to reduce memory footprint while computing.
- solution
A .Solution instance created internally by .NetworkOperation.compute() to hold the values of both inputs & outputs, and the status of executed operations. It is based on a collections.ChainMap, to keep one dictionary for each operation executed, +1 for inputs.
The results of the last operation executed "win" in the final outputs produced, BUT while executing, the needs of each operation receive the solution values in reversed order, that is, the 1st operation result (or given input) wins for some needs name.
Rationale:
During execution we want stability (the same input value used by all operations), and that is most important when consuming input values; otherwise, we would use (possibly overwritten and thus changing) intermediate ones.
But at the end we want to affect the calculation results by adding operations into some netop; furthermore, it wouldn't be very useful to get back the given inputs in case of overwrites.
- overwrites
Values in the solution that have been written by more than one operation, accessed by Solution.overwrites.
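The two lookup orders and the notion of overwrites can be illustrated with a plain collections.ChainMap. This is a sketch of the idea only, not the .Solution class: the layer dictionaries and their values are made up, and the shape of `overwrites` (name mapped to all written values, oldest first) is an assumption.

```python
from collections import ChainMap

given_inputs = {"a": 1}
# One dictionary per executed operation, in execution order (hypothetical values).
op1_outputs = {"b": 2}
op2_outputs = {"b": 20, "c": 3}
layers = [given_inputs, op1_outputs, op2_outputs]

# A ChainMap returns the value from the first mapping that contains the key.
final_view = ChainMap(*reversed(layers))  # last executed operation wins
needs_view = ChainMap(*layers)            # given input or 1st operation wins

# Names written by more than one layer, with every value in write order.
overwrites = {
    key: [layer[key] for layer in layers if key in layer]
    for key in final_view
    if sum(key in layer for layer in layers) > 1
}
```

Here `final_view["b"]` yields 20 while `needs_view["b"]` yields 2, which is the stability-versus-final-result trade-off described in the rationale.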
net network The .Network contains a graph of operations and can compile an execution plan or prune a cloned network for given inputs/outputs/node predicate.
plan execution plan Class .ExecutionPlan performs the execution phase, and contains the dag and the steps.
Compiled execution plans are cached in .Network._cached_plans across runs with (inputs, outputs, predicate) as key.
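A cache keyed on (inputs, outputs, predicate) might look roughly like the sketch below. `TinyNetwork` is a hypothetical stand-in, not the real .Network class; it assumes that the key is built from the names of the inputs (not their values) and that name order does not matter.

```python
class TinyNetwork:
    """Hypothetical stand-in showing plan caching only."""

    def __init__(self):
        self._cached_plans = {}
        self.compilations = 0

    def compile(self, inputs, outputs=None, predicate=None):
        # Normalize names so that ordering does not produce distinct keys.
        key = (tuple(sorted(inputs)), tuple(sorted(outputs or ())), predicate)
        plan = self._cached_plans.get(key)
        if plan is None:
            self.compilations += 1
            # Placeholder for the real pruning and step-building work.
            plan = {"inputs": key[0], "outputs": key[1]}
            self._cached_plans[key] = plan
        return plan


net = TinyNetwork()
first = net.compile({"a": 1, "b": 2}, outputs=["c"])
second = net.compile(["b", "a"], outputs=["c"])  # same names: cache hit
third = net.compile(["a", "b"], outputs=["d"])   # different outputs: new plan
```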
- inputs
The named input values that are fed into an operation (or netop) through the .Operation.compute() method according to its needs.
These values are either:
  - given by the user to the outer netop, at the start of a computation, or
  - derived from the solution using needs as keys, during intermediate execution.
- outputs
The dictionary of computed values returned by an operation (or a netop) matching its provides, when the .Operation.compute() method is called.
Those values are either:
  - retained in the solution, internally during execution, keyed by the respective provide, or
  - returned to the user after the outer netop has finished computation.
When no specific outputs are requested from a netop, .NetworkOperation.compute() returns all intermediate inputs along with the outputs, that is, no evictions happen.
An operation may return partial outputs.
- returns dictionary
When an operation is marked with this flag, the underlying function is not expected to return a sequence but a dictionary; hence, no "zipping" of outputs/provides takes place.
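The "zipping" of provides with returned values, and how the returns-dictionary flag bypasses it, can be sketched as below. The function `zip_outputs` is hypothetical, and treating a single provide as "the whole return value" is an assumption of this sketch rather than documented behavior.

```python
def zip_outputs(provides, result, returns_dict=False):
    """Turn the return value of an underlying function into an outputs dictionary."""
    if returns_dict:
        # The function already names its outputs; nothing to zip.
        return dict(result)
    if len(provides) == 1:
        # Assumption: a lone provide receives the whole return value.
        return {provides[0]: result}
    return dict(zip(provides, result))


zipped = zip_outputs(["sum", "diff"], (3, -1))
single = zip_outputs(["total"], 42)
as_dict = zip_outputs(["sum", "diff"], {"sum": 3, "diff": -1}, returns_dict=True)
```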
- operation
Either the abstract notion of an action with specified needs and provides, or the concrete wrapper .FunctionalOperation for arbitrary functions (any callable), that feeds on inputs and updates outputs, from/to the solution, or given-by/returned-to the user by a netop.
The distinction between needs/provides and inputs/outputs is akin to function parameters and arguments during define-time and run-time.
netop network operation The .NetworkOperation class holding a network of operations.
- needs
A list of (positionally ordered) names of the data needed by an operation to receive as inputs, roughly corresponding to the arguments of the underlying callable. The corresponding data-values will be extracted from the solution (or given by the user) when .Operation.compute() is called during execution.
Modifiers may annotate certain names as optionals, sideffects, or map them to differently named function arguments.
The graph is laid out by matching the needs & provides of all operations.
- provides
A list of names to be zipped with the data-values produced when the operation's underlying callable executes. The resulting outputs dictionary will be stored into the solution or returned to the user after .Operation.compute() is called during execution.
Modifiers may annotate certain names as sideffects.
The graph is laid out by matching the needs & provides of all operations.
- modifiers
Annotations on specific arguments of needs and/or provides such as optionals & sideffects (see the graphtik.modifiers module).
- optionals
Needs corresponding either:
  - to function arguments-with-defaults (annotated with .optional), or
  - to *args (annotated with .vararg & .varargs),
that do not hinder execution of the operation if absent from inputs.
- sideffects
Fictive needs or provides not consumed/produced by the underlying function of an operation, annotated with .sideffect. A sideffect participates in the compilation of the graph, and is updated into the solution, but is never given/asked to/from functions.
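How optionals and sideffects change the way needs are turned into function arguments can be sketched in plain Python. This does not use the graphtik.modifiers annotations; `gather_arguments` and its set-based markers are hypothetical, and passing optionals as keyword arguments is an assumption of this sketch.

```python
def gather_arguments(needs, optionals, sideffects, solution):
    """Split the needs of one operation into positional and keyword arguments."""
    positional, keywords = [], {}
    for name in needs:
        if name in sideffects:
            # A sideffect only has to be present; its value is never passed on.
            if name not in solution:
                raise KeyError(name)
        elif name in optionals:
            # An absent optional is skipped, so the function default applies.
            if name in solution:
                keywords[name] = solution[name]
        else:
            positional.append(solution[name])  # KeyError if a mandatory need is missing
    return positional, keywords


def greet(name, greeting="Hello"):
    return f"{greeting}, {name}"


needs = ["name", "greeting", "db ready"]
args, kwargs = gather_arguments(
    needs, {"greeting"}, {"db ready"}, {"name": "Ann", "db ready": True}
)
default_greeting = greet(*args, **kwargs)

args, kwargs = gather_arguments(
    needs, {"greeting"}, {"db ready"},
    {"name": "Ann", "greeting": "Hi", "db ready": True},
)
custom_greeting = greet(*args, **kwargs)
```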
prune pruning A subphase of compilation performed by method .Network._prune_graph(), which extracts a subgraph dag that does not contain any unsatisfied operations.
It topologically sorts the graph, and prunes based on given inputs, asked outputs, node predicate and operation needs & provides.
- unsatisfied operation
The core of pruning & rescheduling, performed by the .network._unsatisfied_operations() function, which collects all operations that fall into any of these 2 cases:
  - they have needs that do not correspond to any of the given inputs or the intermediately computed outputs of the solution;
  - all their provides are NOT needed by any other operation, nor are asked as outputs.
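The two cases above can be sketched as a fixed-point loop over a table of operations. This is a simplification under stated assumptions, not the .network._unsatisfied_operations() function: it works on a hypothetical `name -> (needs, provides)` mapping instead of a graph, ignores optionals and predicates, and repeats until nothing more can be dropped rather than walking a topological order.

```python
def prune(operations, inputs, outputs):
    """Drop operations with unmet needs or with no useful provides, until stable."""
    kept = dict(operations)
    changed = True
    while changed:
        changed = False
        # Everything that is given or can still be computed by a kept operation.
        available = set(inputs) | {p for _, provides in kept.values() for p in provides}
        # Everything that is asked or still consumed by a kept operation.
        wanted = set(outputs) | {n for needs, _ in kept.values() for n in needs}
        for name, (needs, provides) in list(kept.items()):
            unmet_needs = not set(needs) <= available
            useless = not (set(provides) & wanted)
            if unmet_needs or useless:
                del kept[name]
                changed = True
    return kept


operations = {
    "add": (["a", "b"], ["ab"]),
    "log": (["ab"], ["unused"]),  # nobody needs or asks for "unused"
    "div": (["ab", "c"], ["q"]),  # "c" is neither given nor computed
}
pruned = prune(operations, inputs={"a", "b"}, outputs={"ab"})
```

The loop matters because dropping one operation can make another one unsatisfied in turn, for example when the dropped operation was its only consumer.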
reschedule rescheduling partial outputs partial operation canceled operation The partial pruning of the solution's dag during execution. It happens when any of these 2 conditions apply:
- an operation is marked with the FunctionalOperation.rescheduled attribute, which means that its underlying callable may produce only a subset of its provides (partial outputs);
- endurance is enabled, either globally (in the configurations), or for a specific operation.
The solution must then reschedule the remaining operations downstream, and possibly cancel some of those (assigned in .Solution.canceled).
- endurance
Keep executing as many operations as possible, even if some of them fail. Endurance for an operation is enabled if .set_endure_operations() is true globally in the configurations or if .FunctionalOperation.endurance is true.
You may interrogate .Solution.executed to discover the status of each executed operation or call .scream_if_incomplete().
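Endured execution with downstream cancellation can be sketched as a loop that records failures instead of raising them. This is not the actual plan or solution code: `run_with_endurance` and the step tuples are hypothetical, and recording `None` for success and the exception for failure in `executed` is an assumption of this sketch.

```python
def run_with_endurance(steps, solution):
    """Run every step whose needs are available; record failures and cancellations."""
    executed = {}  # operation name -> None on success, the exception on failure
    canceled = []  # operations skipped because an upstream value never arrived
    for name, function, needs, provides in steps:
        if any(need not in solution for need in needs):
            canceled.append(name)
            continue
        try:
            result = function(*(solution[need] for need in needs))
        except Exception as ex:
            executed[name] = ex
        else:
            solution.update(zip(provides, result))
            executed[name] = None
    return executed, canceled


def fail(_a):
    raise ValueError("boom")


steps = [
    ("load", fail, ["a"], ["b"]),
    ("use_b", lambda b: (b + 1,), ["b"], ["c"]),        # depends on the failed step
    ("independent", lambda a: (a * 2,), ["a"], ["d"]),  # unaffected by the failure
]
solution = {"a": 5}
executed, canceled = run_with_endurance(steps, solution)
```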
predicate node predicate A callable(op, node-data) that should return true for nodes to be included in the graph during compilation.
- abort run
A global configurations flag that, when set with the .abort_run() function, halts the execution of all current or future plans.
It is reset automatically on every call of .NetworkOperation.compute() (after a successful intermediate compilation), or manually, by calling .reset_abort().