compute computation phase
The definition & execution of networked operations is split in 1+2 phases:
- composition
- compilation
- execution
... it is constrained by these IO data-structures:
- operation(s)
- dependencies (needs & provides)
- given inputs
- asked outputs
... populates these low-level data-structures:
- network graph (COMPOSE time)
- execution dag (COMPILE time)
- execution steps (COMPILE time)
- solution (EXECUTE time)
... and utilizes these main classes:
- graphtik.op.FunctionalOperation
- graphtik.pipeline.Pipeline
- graphtik.network.Network
- graphtik.execution.ExecutionPlan
- graphtik.execution.Solution
... plus those for plotting:
- graphtik.plot.Plotter
- graphtik.plot.Theme
compose composition
The phase where operations are constructed and grouped into pipelines and corresponding networks based on their dependencies.
Tip
- Use the .operation factory to construct .FunctionalOperation instances (a.k.a. operations).
- Use the .compose() factory to build .Pipeline instances (a.k.a. pipelines).
- combine pipelines
When operations and/or pipelines are composed together, there are two ways to combine the operations contained into the new pipeline: operation merging (default) and operation nesting.
They are selected by the nest parameter of the .compose() factory.
- operation merging
The default method to combine pipelines, also applied when simply merging operations.
Any identically-named operations override each other, with the operations added earlier in the .compose() call (further to the left) winning over those added later (further to the right).
See also: operation-merging
- operation nesting
The elaborate method to combine pipelines, forming clusters.
The original pipelines are preserved intact in "isolated" clusters, by prefixing the names of their operations (and optionally data) with the name of the respective original pipeline that contained them (or with renames defined by the user).
See also: operation-nesting, .compose, .RenArgs, .nest_any_node(), .dep_renamed(), .PlotArgs.clusters
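The difference between merging and nesting can be sketched with plain dictionaries standing in for pipelines. This is only an illustration of the naming rules described above, not Graphtik's implementation: the helpers `merge_ops` and `nest_ops`, and the `.` separator used for prefixing, are assumptions made for this example.

```python
def merge_ops(*pipelines):
    """Merge: flatten all operations; for equal names, the earliest one is kept."""
    merged = {}
    for ops in pipelines:
        for name, fn in ops.items():
            merged.setdefault(name, fn)  # earlier (further left) wins
    return merged


def nest_ops(**pipelines):
    """Nest: prefix each operation name with the name of its original pipeline."""
    return {
        f"{pipe_name}.{op_name}": fn
        for pipe_name, ops in pipelines.items()
        for op_name, fn in ops.items()
    }


# Two hypothetical pipelines that both define an operation named "load".
weekday = {"load": lambda: "weekday-data", "clean": lambda: "cleaned"}
weekend = {"load": lambda: "weekend-data"}

merged = merge_ops(weekday, weekend)                  # one "load" survives
nested = nest_ops(weekday=weekday, weekend=weekend)   # both "load"s survive, renamed
```

With merging, the clashing name collapses into a single operation; with nesting, the prefixes keep both operations apart.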
compile compilation
The phase where the .Network creates a new execution plan by pruning all graph nodes into a subgraph dag, and deriving the execution steps.
execute execution sequential
The phase where the .ExecutionPlan calls the underlying functions of all operations contained in the execution steps, with inputs/outputs taken from the solution.
Currently there are 2 ways to execute:
- sequential
- parallel, with a multiprocessing.pool.Pool
Plans may abort their execution by setting the abort run global flag.
net network
The .Network contains a graph of operations and can compile (and cache) execution plans, or prune a cloned network for given inputs/outputs/node predicate.
plan execution plan
Class .ExecutionPlan performs the execution phase, and contains the dag and the steps.
Compiled execution plans are cached in .Network._cached_plans across runs, with (inputs, outputs, predicate) as key.
- solution
A .Solution instance created internally by .Pipeline.compute() to hold the values of both inputs & outputs, and the status of executed operations. It is based on a collections.ChainMap, to keep one dictionary for each operation executed, +1 for inputs.
The results of the last operation executed "win" in the outputs produced, and the base (least precedence) is the inputs given when the execution started.
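The layering can be illustrated with a bare collections.ChainMap. This is a minimal sketch of the idea only; the helper `record` and the way overwrites are derived here are assumptions for the example, not the actual .Solution code.

```python
from collections import ChainMap

inputs = {"a": 1, "b": 2}
solution = ChainMap(inputs)  # base layer (least precedence): the given inputs


def record(solution, outputs):
    """Push one new layer holding the outputs of an executed operation."""
    return solution.new_child(dict(outputs))


solution = record(solution, {"c": 3})            # first operation
solution = record(solution, {"c": 30, "d": 4})   # second operation rewrites "c"

# Names written by more than one operation (the inputs layer is left out).
layers = solution.maps[:-1]
overwrites = {
    key
    for i, layer in enumerate(layers)
    for key in layer
    if any(key in older for older in layers[i + 1:])
}
```

Looking up `solution["c"]` returns the value of the most recent layer, while `solution["a"]` falls through to the inputs.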
graph network graph
A graph of operations linked by their dependencies, forming a pipeline.
The .Network.graph (currently a DAG) contains all .FunctionalOperation and data-nodes (string or modifier) of a pipeline.
They are laid out and connected by repeated calls of .Network._append_operation() by the Network constructor during composition.
This graph is then pruned to extract the dag, and the execution steps are calculated, all ingredients for a new .ExecutionPlan.
prune pruning
A subphase of compilation performed by method .Network._prune_graph(), which extracts a subgraph dag that does not contain any unsatisfied operations.
It topologically sorts the graph, and prunes based on given inputs, asked outputs, node predicate and operation needs & provides.
- unsatisfied operation
The core of pruning & rescheduling, performed by the .network.unsatisfied_operations() function, which collects all operations with unreachable dependencies:
- they have needs that do not correspond to any of the given inputs or the intermediately computed outputs of the solution;
- all their provides are NOT needed by any other operation, nor are asked as outputs.
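The two conditions above can be sketched as a forward and a backward pass over a table of operations. This is a simplified stand-in, not the real .network.unsatisfied_operations() function: the name `unsatisfied`, the `{name: (needs, provides)}` layout, and the choice to treat all needs as compulsory and either condition as enough to prune are assumptions of this example.

```python
def unsatisfied(ops, inputs, outputs):
    """Return the names of operations to prune.

    ops: {name: (needs, provides)}; every need is treated as compulsory.
    """
    # Forward pass: operations whose needs are all reachable from the inputs.
    available, reachable = set(inputs), set()
    changed = True
    while changed:
        changed = False
        for name, (needs, provides) in ops.items():
            if name not in reachable and set(needs) <= available:
                reachable.add(name)
                available |= set(provides)
                changed = True

    # Backward pass: keep only operations that contribute to the asked outputs.
    wanted, useful = set(outputs), set()
    changed = True
    while changed:
        changed = False
        for name in reachable - useful:
            needs, provides = ops[name]
            if set(provides) & wanted:
                useful.add(name)
                wanted |= set(needs)
                changed = True

    return set(ops) - useful


ops = {
    "add": (["a", "b"], ["ab"]),
    "square": (["ab"], ["ab2"]),
    "log": (["x"], ["logx"]),        # "x" is never given nor computed
    "neg": (["a"], ["minus_a"]),     # nobody needs or asks for "minus_a"
}
pruned = unsatisfied(ops, inputs=["a", "b"], outputs=["ab2"])
```

Here `log` is dropped by the first condition and `neg` by the second.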
dag execution dag solution dag
There are 2 directed-acyclic-graph instances used:
- the .ExecutionPlan.dag, in the execution plan, which contains the pruned nodes, used to decide the execution steps;
- the .Solution.dag in the solution, which derives the canceled operations due to rescheduled/failed operations upstream.
steps execution steps
The plan contains a list of the operation-nodes only from the dag, topologically sorted, and interspersed with instruction steps needed to compute the asked outputs from the given inputs.
They are built by .Network._build_execution_steps() based on the subgraph dag.
The only instruction step is for performing evictions.
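How operation steps might be interspersed with eviction instructions can be sketched as follows. It assumes the operations arrive already topologically sorted; the function `build_steps` and the `("evict", name)` tuples are hypothetical stand-ins, not the structures produced by .Network._build_execution_steps().

```python
def build_steps(ordered_ops, inputs, outputs):
    """ordered_ops: [(name, needs, provides)], already topologically sorted."""
    if not outputs:
        # Nothing specific was asked: keep every value, so no eviction steps.
        return [name for name, _, _ in ordered_ops]

    # Index of the last operation that reads each data name.
    last_use = {}
    for i, (_, needs, _) in enumerate(ordered_ops):
        for need in needs:
            last_use[need] = i

    steps, alive = [], set(inputs)
    for i, (name, needs, provides) in enumerate(ordered_ops):
        steps.append(name)
        alive |= set(provides)
        for data in sorted(alive):
            # Evict values nobody reads further down, unless they were asked.
            if data not in outputs and last_use.get(data, -1) <= i:
                steps.append(("evict", data))
                alive.discard(data)
    return steps


ordered = [
    ("add", ["a", "b"], ["ab"]),
    ("square", ["ab"], ["ab2"]),
]
steps = build_steps(ordered, inputs=["a", "b"], outputs=["ab2"])
```

The eviction positions depend only on the graph, which is why they can be decided before any function runs.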
- evictions
A memory footprint optimization where intermediate inputs & outputs are erased from the solution as soon as they are not needed further down the dag.
Evictions are pre-calculated during compilation, where ._EvictInstruction steps are inserted in the execution plan.
- overwrite
Values in the solution that have been written by more than one operation, accessed by .Solution.overwrites. Note that a sideffected dependency usually produces an overwrite.
- inputs
The named input values that are fed into an operation (or pipeline) through the .Operation.compute() method according to its needs.
These values are either:
- given by the user to the outer pipeline, at the start of a computation, or
- derived from the solution using needs as keys, during intermediate execution.
- outputs
The dictionary of computed values returned by an operation (or a pipeline) matching its provides, when the .Operation.compute() method is called.
Those values are either:
- retained in the solution, internally during execution, keyed by the respective provide, or
- returned to the user after the outer pipeline has finished computation.
When no specific outputs are requested from a pipeline, .Pipeline.compute() returns all intermediate inputs along with the outputs, that is, no evictions happen.
An operation may return partial outputs.
- pipeline
The .Pipeline class holding a network of operations and dependencies.
- operation
Either the abstract notion of an action with specified needs and provides (its dependencies), or the concrete wrapper .FunctionalOperation for any callable, which feeds on inputs and updates outputs, from/to the solution, or given-by/returned-to the user by a pipeline.
The distinction between needs/provides and inputs/outputs is akin to function parameters and arguments during define-time and run-time, respectively.
- dependency
The name of a solution value an operation needs or provides.
- Dependencies are declared during composition, when building .FunctionalOperation instances. Operations are then interlinked together, by matching the needs & provides of all operations contained in a pipeline.
- During compilation the graph is then pruned based on the reachability (see unsatisfied operation) of the dependencies.
- During execution .Operation.compute() performs 2 "matchings":
  - inputs & outputs in the solution are accessed by the needs & provides names of the operations;
  - operation needs & provides are zipped against the underlying function's arguments and results.
  These matchings are affected by modifiers, printed out with diacritics.
needs fn_needs
The list of dependency names an operation requires from the solution as inputs, roughly corresponding to the underlying function's arguments (fn_needs).
Specifically, .Operation.compute() extracts input values from the solution by these names, and matches them against function arguments, mostly by their positional order. Whenever this matching is not 1-to-1, and function-arguments differ from the regular needs, modifiers must be used.
provides op_provides fn_provides
The list of dependency names an operation writes to the solution as outputs, roughly corresponding to the underlying function's results (fn_provides).
Specifically, .Operation.compute() "zips" this list-of-names with the output values produced when the operation's function is called. Whenever this "zipping" is not 1-to-1, and function-results differ from the regular operation provides (op_provides) (or results are not a list), it is possible to:
- mark the operation that its function returns dictionary,
- artificially extend the provides with aliased fn_provides, or
- use modifiers to annotate certain names as sideffects.
- alias
Map an existing name in fn_provides into a duplicate, artificial one in op_provides.
You cannot alias an alias. See aliases.
- returns dictionary
When an operation is marked with the FunctionalOperation.returns_dict flag, the underlying function is not expected to return fn_provides as a sequence but as a dictionary; hence, no "zipping" of function-results --> fn_provides takes place.
Useful for operations returning partial outputs, to have full control over which outputs were actually produced, or to cancel sideffects.
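The "zipping" of names with results, the returns-dictionary shortcut and aliasing can be sketched with one small helper. It is a toy model under stated assumptions: `call_operation` and its parameters are hypothetical, and the real .Operation.compute() handles many more cases (modifiers, validation, sideffects).

```python
def call_operation(fn, needs, provides, solution, returns_dict=False, aliases=()):
    """Run fn on the values named by needs; return outputs keyed by provides."""
    args = [solution[name] for name in needs]   # matched by positional order
    result = fn(*args)
    if returns_dict:
        outputs = dict(result)                  # the function names its own outputs
    elif len(provides) == 1:
        outputs = {provides[0]: result}         # a single result is not unpacked
    else:
        outputs = dict(zip(provides, result))   # "zip" names with results
    for src, dst in aliases:                    # duplicate a result under a new name
        if src in outputs:
            outputs[dst] = outputs[src]
    return outputs


zipped = call_operation(divmod, ["a", "b"], ["quot", "rem"], {"a": 7, "b": 2})

# A function returning a dictionary may produce only some of its provides.
partial = call_operation(
    lambda a: {"even": a} if a % 2 == 0 else {},
    ["a"], ["even", "odd"], {"a": 7}, returns_dict=True,
)

aliased = call_operation(abs, ["a"], ["mag"], {"a": -5}, aliases=[("mag", "size")])
```

Returning a dictionary sidesteps the positional pairing entirely, which is what makes partial outputs easy to express.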
modifier diacritic
A modifier changes dependency behavior during compilation or execution.
For instance, needs may be annotated as optionals function arguments, and provides and needs can be annotated as "ghost" sideffects.
See the graphtik.modifiers module.
- optionals
A needs-only modifier for inputs that do not hinder operation execution (prune) if absent from the solution.
In the underlying function it corresponds to either:
- non-compulsory function arguments (with defaults), annotated with .optional, or
- varargish arguments, annotated with .vararg or .varargs.
- varargish
A needs-only modifier for inputs to be appended as *args (if present in the solution).
There are 2 kinds, both, by definition, optionals:
- the .vararg annotates any solution value to be appended once in the *args;
- the .varargs annotates iterable values, and all its items are appended in the *args one-by-one.
In printouts, it is denoted either with the * or + diacritic.
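The difference between the two varargish kinds can be sketched without the library. The `(name, kind)` tuples and the helper `collect_args` below are assumptions made for this illustration; in Graphtik the annotation is done with the .vararg and .varargs modifiers instead.

```python
def collect_args(needs, solution):
    """needs: [(name, kind)], with kind in {"plain", "vararg", "varargs"}."""
    args = []
    for name, kind in needs:
        if kind == "plain":
            args.append(solution[name])       # compulsory: KeyError if missing
        elif name in solution:                # varargish needs are optional
            if kind == "vararg":
                args.append(solution[name])   # appended once, as-is
            else:
                args.extend(solution[name])   # items appended one-by-one
    return args


def total(first, *rest):
    return first + sum(rest)


needs = [("a", "plain"), ("b", "vararg"), ("more", "varargs"), ("missing", "vararg")]
args = collect_args(needs, {"a": 1, "b": 2, "more": [3, 4]})
result = total(*args)
```

The absent `missing` value is simply skipped, which is what makes both kinds optional by definition.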
- sideffects
A modifier denoting a fictive dependency linking operations into virtual flows, without real data exchanges.
The side-effect modification may happen to some internal state not fully represented in the graph & solution.
There are actually 2 relevant modifiers:
- An abstract sideffect modifier (annotated with .sfx) describing modifications taking place beyond the scope of the solution. It may have just the "optional" diacritic in printouts.
- The sideffected modifier (annotated with .sfxed) denoting modifications on a real dependency read from and written to the solution.
Both kinds of sideffects participate in the compilation of the graph, and both may be given or asked in the inputs & outputs of a pipeline, but they are never given to functions. A function of a returns dictionary operation can return a falsy value to declare it as canceled (see partial outputs).
- sideffected
A modifier that denotes sideffects on a dependency that exists in the solution, allowing to declare an operation that both needs and provides that sideffected dependency.
Note
To be precise, the "sideffected dependency" is the name held in the ._Modifier.sideffected attribute of a modifier created by the .sfxed function.
The outputs of a sideffected dependency will produce an overwrite if the sideffected dependency is declared both as needs and provides of some operation.
It is annotated with .sfxed; it may have all diacritics in printouts.
reschedule rescheduling partial outputs canceled operation
The partial pruning of the solution's dag during execution. It happens when any of these 2 conditions apply:
- an operation is marked with the .FunctionalOperation.rescheduled attribute, which means that its underlying callable may produce only a subset of its provides (partial outputs);
- endurance is enabled, either globally (in the configurations), or for a specific operation.
The solution must then reschedule the remaining operations downstream, and possibly cancel some of those (assigned in .Solution.canceled).
Partial operations are usually declared with returns dictionary so that the underlying function can control which of the outputs are returned.
See rescheduled.
endurance endured
Keep executing as many operations as possible, even if some of them fail. Endurance for an operation is enabled if .set_endure_operations() is true globally in the configurations, or if .FunctionalOperation.endured is true.
You may interrogate .Solution.executed to discover the status of each executed operation, or call one of .check_if_incomplete() or .scream_if_incomplete().
See endured.
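The behaviour can be sketched as a loop that records failures instead of stopping. The function `run_endured` and the shapes of its `executed` and `canceled` results are hypothetical; they only mimic the roles that .Solution.executed and .Solution.canceled are described to play.

```python
def run_endured(ordered_ops, inputs):
    """ordered_ops: [(name, fn, needs, provide)], topologically sorted."""
    solution = dict(inputs)
    executed, canceled = {}, []
    for name, fn, needs, provide in ordered_ops:
        if any(need not in solution for need in needs):
            canceled.append(name)      # starved by a failure upstream
            continue
        try:
            solution[provide] = fn(*(solution[need] for need in needs))
            executed[name] = None      # None marks a successful run
        except Exception as ex:
            executed[name] = ex        # remember the error and keep going
    return solution, executed, canceled


ops = [
    ("div", lambda a, b: a / b, ["a", "b"], "ratio"),
    ("double", lambda r: 2 * r, ["ratio"], "twice"),
    ("inc", lambda a: a + 1, ["a"], "a1"),
]
solution, executed, canceled = run_endured(ops, {"a": 1, "b": 0})
```

The division fails, its dependent operation is canceled, and the unrelated `inc` still runs.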
predicate node predicate
A callable(op, node-data) that should return true for nodes to be included in the graph during compilation.
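A predicate of that shape might look like the sketch below. The `tags` key in the node-data and the plain dictionary of nodes are assumptions for illustration; the actual contents of node-data depend on how operations were constructed.

```python
def not_slow(op, node_data):
    """Return True for nodes that should stay in the graph."""
    return "slow" not in node_data.get("tags", ())


# Hypothetical nodes: operation name -> node-data dictionary.
nodes = {
    "load": {"tags": ()},
    "train": {"tags": ("slow",)},
    "report": {},
}
kept = [op for op, data in nodes.items() if not_slow(op, data)]
```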
- abort run
A global configurations flag that, when set with the .abort_run() function, halts the execution of all currently running or future plans.
It is reset automatically on every call of .Pipeline.compute() (after a successful intermediate compilation), or manually, by calling .reset_abort().
parallel parallel execution execution pool task
Execute operations in parallel, with a thread pool or process pool (instead of sequential). Operations and pipelines are marked as such on construction, or enabled globally from the configurations.
Note that sideffects are not expected to function with process pools, certainly not when marshalling is enabled.
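The contrast with sequential execution can be sketched with the standard library's thread-backed pool. The tasks below are hypothetical stand-ins for independent operations; how Graphtik itself schedules tasks onto a pool is not shown here.

```python
from multiprocessing.dummy import Pool  # thread-based pool exposing the Pool API


def square(x):
    return x * x


def cube(x):
    return x ** 3


# Independent tasks: none of them needs the result of another.
tasks = [(square, 3), (cube, 2), (square, 5)]


def run_task(task):
    fn, arg = task
    return fn(arg)


sequential = [run_task(task) for task in tasks]

with Pool(3) as pool:
    parallel = pool.map(run_task, tasks)  # map() preserves the order of tasks
```

Since threads share memory with the caller, nothing has to be pickled; swapping in a process pool would require every task and value to be picklable.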
- process pool
When the multiprocessing.pool.Pool class is used for parallel execution, the tasks must be communicated to/from the worker process, which requires pickling, and that may fail. With pickling failures you may try marshalling with the dill library, and see if that helps.
Note that sideffects are not expected to function at all, certainly not when marshalling is enabled.
- thread pool
When the multiprocessing.dummy.Pool class is used for parallel execution, the tasks are run in-process, so no marshalling is needed.
- marshalling
Pickling parallel operations and their inputs/outputs using the dill module. It is configured (see configurations) either globally with .set_marshal_tasks() or set with a flag on each operation / pipeline.
Note that sideffects do not work when this is enabled.
- plottable
Objects that can plot their graph network, such as those inheriting .Plottable (.FunctionalOperation, .Pipeline, .Network, .ExecutionPlan, .Solution) or a _ instance (the result of the .Plottable.plot() method).
Such objects may render as SVG in Jupyter notebooks (through their plot() method) and can render in a Sphinx site with the graphtik RsT directive. You may control the rendered image as explained in the tip of the plotting section.
SVGs are rendered with a zoom-and-pan javascript library.
plotter plotting
A .Plotter is responsible for rendering plottables as images. It is the active plotter that does that, unless overridden in a .Plottable.plot() call. Plotters can be customized by various means (see plot-customizations), such as the plot theme.
active plotter default active plotter
The plotter currently installed "in-context" of the respective graphtik configuration; this term also implies any plot-customizations done on the active plotter (such as the plot theme).
Installation happens by calling one of the .active_plotter_plugged() or .set_active_plotter functions.
The default active plotter is the plotter instance that this project comes pre-configured with, i.e., when no plot-customizations have yet happened.
plot theme current theme
The mergeable and expandable styles contained in a .plot.Theme instance.
The current theme in-use is the .Plotter.default_theme attribute of the active plotter, unless overridden with the theme parameter when calling .Plottable.plot() (conveyed internally as the value of the .PlotArgs.theme attribute).
style style expansion
A style is an attribute of a plot theme, either a scalar value or a dictionary.
Styles are collected in stacks (.StylesStack) and are merged (.StylesStack.merge) into a single dictionary after performing expansions (.StylesStack.expand).
Tip
If DEBUG (is_debug) is enabled, the provenance of all style values appears in the tooltips of plotted graphs.
configurations graphtik configuration
The functions controlling compile & execution globally are defined in the .config module, and +1 in the graphtik.plot module; the underlying global data are stored in contextvars.ContextVar instances, to allow for nested control.
All boolean configuration flags are tri-state (None, False, True), allowing to "force" all operations, when they are not set to the None value. All of them default to None (false).
- jetsam
When operations fail, the original exception gets annotated with salvaged values from locals() and raised intact.
See jetsam.
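The idea can be sketched with a wrapper that attaches selected locals to the exception before re-raising it. The attribute name `jetsam` and the choice of which locals to salvage are assumptions of this example, and `run_annotated` is a hypothetical helper.

```python
def run_annotated(fn, *args):
    """Call fn; on failure attach salvaged locals to the very same exception."""
    try:
        return fn(*args)
    except Exception as ex:
        # Salvage a chosen subset of locals() (assumed attribute layout).
        ex.jetsam = {k: v for k, v in locals().items() if k in ("fn", "args")}
        raise  # re-raise the original exception, traceback intact


try:
    run_annotated(int, "not-a-number")
except ValueError as ex:
    salvaged = ex.jetsam
```

The caller still catches the original exception type and can inspect the salvaged values for debugging.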