
Operations

An operation is a function in a computation pipeline, abstractly represented by the .Operation class. This class specifies the dependencies forming the pipeline's network.

Defining Operations

You may inherit from the .Operation abstract class and override its .Operation.compute() method to manually do the following:

  • read the values declared as needs from the solution,
  • match those values to the function's arguments,
  • call your function to do its business,
  • "zip" the function's results with the operation's declared provides, and finally
  • hand those zipped values back to the solution for further actions.
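The five steps above can be sketched in plain Python. The `ToyOperation` class below is hypothetical and does not subclass graphtik's .Operation; it assumes the solution is a plain dict and that needs map positionally onto the function's arguments.

```python
from typing import Any, Callable, Dict, Mapping, Sequence


class ToyOperation:
    """Hypothetical stand-in for an operation (not graphtik's `.Operation`)."""

    def __init__(self, fn: Callable, needs: Sequence[str], provides: Sequence[str]):
        self.fn = fn
        self.needs = list(needs)
        self.provides = list(provides)

    def compute(self, solution: Mapping[str, Any]) -> Dict[str, Any]:
        # 1. read the values declared as `needs` from the solution
        values = [solution[name] for name in self.needs]
        # 2 & 3. match them positionally to the arguments, and call the function
        results = self.fn(*values)
        # 4. "zip" the results with the declared `provides`
        if len(self.provides) == 1:
            results = (results,)
        results = tuple(results)
        if len(results) != len(self.provides):
            raise ValueError(
                f"expected {len(self.provides)} results, got {len(results)}"
            )
        # 5. hand the zipped values back, for the caller to merge into the solution
        return dict(zip(self.provides, results))


solution = {"a": 3, "b": 4}
op = ToyOperation(lambda a, b: (a + b, a * b), needs=["a", "b"], provides=["sum", "prod"])
solution.update(op.compute(solution))
```

After the update, `solution` holds the inputs plus `sum` and `prod`; this bookkeeping is what the wrappers described next spare you from writing.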

But there is an easier way: in fact, half of the code in this project is dedicated to retrofitting existing functions, unaware of all this, into operations.

Operations from existing functions

The .FunctionalOperation class provides a concrete, lightweight wrapper around any arbitrary function, so that it can be defined and executed within a pipeline. Use the .operation() factory to instantiate one:

>>> from operator import add
>>> from graphtik import operation

>>> add_op = operation(add,
...                    needs=['a', 'b'],
...                    provides=['a_plus_b'])
>>> add_op
FunctionalOperation(name='add', needs=['a', 'b'], provides=['a_plus_b'], fn='add')

You may still call the original function at .FunctionalOperation.fn, thus bypassing any operation pre-processing:

>>> add_op.fn(3, 4)
7

But the proper way is to call the operation (either directly or by calling the .FunctionalOperation.compute() method). Notice though that unnamed positional parameters are not supported:

>>> add_op(a=3, b=4)
{'a_plus_b': 7}
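To illustrate why only named inputs are accepted, here is a hypothetical `ToyFunctionalOp` wrapper (not graphtik's .FunctionalOperation) whose `__call__` takes keyword arguments only, each keyword naming a need, while the raw function stays reachable at `fn`. It handles a single provide only, for brevity.

```python
from operator import add


class ToyFunctionalOp:
    """Hypothetical wrapper mimicking only the calling convention described above."""

    def __init__(self, fn, needs, provides):
        self.fn = fn  # the raw function, callable directly
        self.needs = list(needs)
        self.provides = list(provides)

    def __call__(self, **named_inputs):
        # Inputs arrive by need name; their order comes from `needs`, not the caller.
        args = [named_inputs[name] for name in self.needs]
        return {self.provides[0]: self.fn(*args)}


add_op = ToyFunctionalOp(add, needs=["a", "b"], provides=["a_plus_b"])
direct = add_op.fn(3, 4)     # bypasses the wrapper entirely
wrapped = add_op(b=4, a=3)   # keyword order does not matter
```

Calling `add_op(3, 4)` raises `TypeError`, since there is no way to tell which need an unnamed value belongs to.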

Tip

In case your function needs to access the execution machinery or its wrapping operation, it can do that through the .task_context (unstable API).
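One common Python mechanism for exposing such per-call information is a `contextvars.ContextVar` that is set only while the function runs. The sketch below is an assumption about the general technique, not graphtik's .task_context API; `current_op` and `run_op` are hypothetical names.

```python
import contextvars

# Hypothetical context variable holding the name of the running operation.
current_op = contextvars.ContextVar("current_op")


def run_op(op_name, fn, *args):
    token = current_op.set(op_name)  # expose the wrapping operation
    try:
        return fn(*args)
    finally:
        current_op.reset(token)      # restore whatever was there before


def my_fn(x):
    # The function looks up which operation is executing it.
    return f"{current_op.get()} got {x}"


result = run_op("double", my_fn, 21)
```

Outside of `run_op()` the variable is unset again, so `current_op.get(None)` returns `None`.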

Builder pattern

There are two ways to instantiate a .FunctionalOperation, each one suitable for different scenarios.

We've seen that calling .operation() manually allows putting into a pipeline functions that are defined elsewhere (e.g. in another module, or system functions).

But that method is also useful if you want to create multiple operation instances with similar attributes, e.g. needs:

>>> op_factory = operation(str, needs=['a'])

Notice that we specified a fn, in order to get back a .FunctionalOperation instance (and not a decorator).

>>> from graphtik import operation, compose
>>> from functools import partial

>>> def mypow(a, p=2):
...     return a ** p

>>> pow_op2 = op_factory.withset(fn=mypow, provides="^2")
>>> pow_op3 = op_factory.withset(fn=partial(mypow, p=3), name='pow_3', provides='^3')
>>> pow_op0 = op_factory.withset(fn=lambda a: 1, name='pow_0', provides='^0')

>>> graphop = compose('powers', pow_op2, pow_op3, pow_op0)
>>> graphop
Pipeline('powers', needs=['a'], provides=['^2', '^3', '^0'], x3 ops: mypow, pow_3, pow_0)

>>> graphop(a=2)
{'a': 2, '^2': 4, '^3': 8, '^0': 1}

Tip

See plotting on how to make diagrams like this.
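The builder idea behind `withset()` (derive modified copies from a shared template, leaving the template untouched) can be sketched with a frozen dataclass and `dataclasses.replace()`. `OpSpec` is a hypothetical stand-in, not graphtik's class, and re-deducing the name from a new `fn` is an assumption made to mirror the `mypow` name in the repr above.

```python
from dataclasses import dataclass, replace
from functools import partial
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class OpSpec:
    """Hypothetical immutable operation spec."""

    fn: Callable
    name: Optional[str] = None
    needs: Tuple[str, ...] = ()
    provides: Tuple[str, ...] = ()

    def withset(self, **changes) -> "OpSpec":
        # Return a modified copy; the factory instance itself never changes.
        new = replace(self, **changes)
        if "fn" in changes and "name" not in changes:
            # re-deduce the name from the new function (partials have no __name__)
            new = replace(new, name=getattr(new.fn, "__name__", None))
        return new


def mypow(a, p=2):
    return a ** p


op_factory = OpSpec(str, needs=("a",))
pow_op2 = op_factory.withset(fn=mypow, provides=("^2",))
pow_op3 = op_factory.withset(fn=partial(mypow, p=3), name="pow_3", provides=("^3",))
```

Both derived specs inherit `needs=("a",)` from the factory, which is the point of the pattern.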

Decorator specification

If you are defining your computation graph and the functions that comprise it all in the same script, the decorator specification of operation instances might be particularly useful, as it allows you to assign computation graph structure to functions as they are defined. Here's an example:

>>> from graphtik import operation, compose

>>> @operation(needs=['b', 'a', 'r'], provides='bar')
... def foo(a, b, c):
...     return c * (a + b)

>>> graphop = compose('foo_graph', foo)

  • Notice that if name is not given, it is deduced from the function name.
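A factory that works both when called with a function and as a decorator can be sketched as below: with `fn` it returns the operation directly, without it a decorator. `make_operation` and `Op` are hypothetical names, not graphtik's implementation.

```python
from typing import Callable, Optional


class Op:
    """Hypothetical minimal operation record."""

    def __init__(self, fn, name, needs, provides):
        self.fn, self.name, self.needs, self.provides = fn, name, needs, provides


def make_operation(fn: Optional[Callable] = None, *, name=None, needs=(), provides=()):
    """Return an `Op` if `fn` is given, otherwise a decorator building one."""

    def build(f):
        # When no name is given, deduce it from the function name.
        return Op(f, name or f.__name__, list(needs), list(provides))

    return build(fn) if fn is not None else build


@make_operation(needs=["b", "a", "r"], provides=["bar"])
def foo(a, b, c):
    return c * (a + b)
```

Note that, as with the decorator above, the name `foo` ends up bound to the operation object, while the original function remains at `foo.fn`.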

Specifying graph structure: provides and needs

Each operation is a node in a computation graph, depending on and supplying data from and to other nodes (via the solution) in order to compute.

This graph structure is specified (mostly) via the provides and needs arguments to the .operation factory, specifically:

needs

this argument names the list of (positionally ordered) input data the operation needs to receive from the solution. The list corresponds, roughly, to the arguments of the underlying function (plus any sideffects).

It can be a single string, in which case a 1-element iterable is assumed.

seealso

needs, modifier, .FunctionalOperation.needs, .FunctionalOperation.op_needs, .FunctionalOperation._fn_needs
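The "single string means a 1-element iterable" rule matters because a Python string is itself iterable: `tuple("ab")` gives `('a', 'b')`. A hypothetical normalizing helper (not a graphtik function) could look like this:

```python
from typing import Iterable, Optional, Tuple, Union


def as_deps(deps: Union[str, Iterable[str], None]) -> Tuple[str, ...]:
    """Hypothetical helper: normalize a `needs`/`provides` declaration to a tuple."""
    if deps is None:
        return ()
    if isinstance(deps, str):
        # a lone string is one dependency, not a sequence of characters
        return (deps,)
    return tuple(deps)


single = as_deps("anything")
several = as_deps(["b", "a", "r"])
```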

provides

this argument names the list of (positionally ordered) output data the operation provides into the solution. The list corresponds, roughly, to the values returned by the fn (plus any sideffects & aliases).

It can be a single string, in which case a 1-element iterable is assumed.

If there is more than one, the underlying function must return an iterable with the same number of elements (unless it returns a dictionary).

seealso

provides, modifier, .FunctionalOperation.provides, .FunctionalOperation.op_provides, .FunctionalOperation._fn_provides
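The rules for mapping a return value onto provides can be sketched as follows. `zip_provides` is hypothetical, and treating a dict result as already keyed by output name (only when there are several provides) is an assumption based on the sentence above.

```python
from typing import Any, Dict, Sequence


def zip_provides(provides: Sequence[str], results: Any) -> Dict[str, Any]:
    """Hypothetical helper: map a function's return value onto `provides`."""
    if len(provides) == 1:
        # a single provide receives the whole return value as-is
        return {provides[0]: results}
    if isinstance(results, dict):
        # assumed: a dict result is already keyed by output name
        return dict(results)
    results = list(results)
    if len(results) != len(provides):
        raise ValueError(
            f"expected {len(provides)} results for {list(provides)}, got {len(results)}"
        )
    return dict(zip(provides, results))


pair = zip_provides(["quotient", "remainder"], divmod(7, 2))
whole = zip_provides(["qr"], divmod(7, 2))
```

With a single provide, the tuple from `divmod()` is stored whole; with two, it is split element by element.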

Declarations of needs and provides are affected by modifiers like .keyword:

Map inputs to different function arguments

graphtik.modifiers.keyword
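The idea of mapping an input to a differently named function argument can be illustrated without graphtik: each solution value is passed as a keyword chosen independently of its name in the solution. `call_with_keywords` and its mapping argument are hypothetical, not the `keyword` modifier's actual signature.

```python
from typing import Any, Callable, Dict, Mapping


def call_with_keywords(fn: Callable, solution: Mapping[str, Any],
                       keyword_map: Dict[str, str]) -> Any:
    """Hypothetical: pass solution[need] to `fn` as keyword `keyword_map[need]`."""
    kwargs = {kw: solution[need] for need, kw in keyword_map.items()}
    return fn(**kwargs)


def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"


# The solution names its values differently from greet()'s parameters.
solution = {"user_name": "Ada", "salutation": "Hi"}
msg = call_with_keywords(greet, solution, {"user_name": "name", "salutation": "greeting"})
```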

Operations may execute with missing inputs

graphtik.modifiers.optional
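Executing with missing inputs can be pictured as passing optional values only when they exist, so the function's own defaults apply otherwise. This is a hypothetical sketch (`call_with_optionals` is not a graphtik function); passing optionals as keywords is an assumption of the sketch.

```python
from typing import Any, Callable, Iterable, Mapping


def call_with_optionals(fn: Callable, solution: Mapping[str, Any],
                        needs: Iterable[str], optionals: Iterable[str]) -> Any:
    """Hypothetical: required needs must be present; absent optional ones are skipped."""
    needs = list(needs)
    missing = [n for n in needs if n not in solution]
    if missing:
        raise KeyError(f"missing required inputs: {missing}")
    kwargs = {n: solution[n] for n in needs}
    # optional inputs are passed only when present, so the defaults apply otherwise
    kwargs.update({n: solution[n] for n in optionals if n in solution})
    return fn(**kwargs)


def mypow(a, p=2):
    return a ** p


squared = call_with_optionals(mypow, {"a": 3}, ["a"], ["p"])
cubed = call_with_optionals(mypow, {"a": 3, "p": 3}, ["a"], ["p"])
```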

Calling functions with varargs (*args)

graphtik.modifiers.vararg

graphtik.modifiers.varargs
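Feeding `*args` from the solution can be sketched as below, assuming `vararg` contributes one positional value per present input and `varargs` an iterable whose items are all appended. `call_with_varargs` is hypothetical, and its ordering (regular needs, then `vararg`, then `varargs`) is a simplification.

```python
from typing import Any, Callable, Iterable, Mapping


def call_with_varargs(fn: Callable, solution: Mapping[str, Any], needs: Iterable[str],
                      vararg: Iterable[str] = (), varargs: Iterable[str] = ()) -> Any:
    """Hypothetical: regular needs go first, then *args collected from the solution."""
    args = [solution[n] for n in needs]
    # each present `vararg` input contributes a single positional argument
    args.extend(solution[n] for n in vararg if n in solution)
    # each present `varargs` input is an iterable whose items are all appended
    for n in varargs:
        if n in solution:
            args.extend(solution[n])
    return fn(*args)


def total(first, *rest):
    return first + sum(rest)


sol = {"a": 1, "b": 2, "more": [3, 4]}
# "c" is absent from the solution, so it is simply skipped.
grand_total = call_with_varargs(total, sol, ["a"], vararg=["b", "c"], varargs=["more"])
```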

Aliased provides

Sometimes you need to interface functions & operations that name a dependency differently. This is doable without introducing a "pipe-through" interface operation, either by annotating certain needs with .keyword modifiers (above), or by aliasing certain provides to different names:

>>> op = operation(str,
...                name="provides with aliases",
...                needs="anything",
...                provides="real thing",
...                aliases=("real thing", "phony"))
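The effect of an alias can be pictured as copying a provided value under an extra name, so downstream operations needing either name are satisfied. `apply_aliases` is a hypothetical helper that takes a list of `(source, alias)` pairs; it is not graphtik's implementation.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple


def apply_aliases(outputs: Mapping[str, Any],
                  aliases: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Hypothetical: copy each provided value under its alias name."""
    result = dict(outputs)
    for source, alias in aliases:
        result[alias] = outputs[source]
    return result


outputs = {"real thing": str(42)}  # what the wrapped `str` would provide
aliased = apply_aliases(outputs, [("real thing", "phony")])
```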

Considerations when building pipelines

When many operations are composed into a computation graph, Graphtik matches up the values in their needs and provides to form the edges of that graph (see graph-composition for more on that), like the operations from the script in quick-start:

>>> from operator import mul, sub
>>> from functools import partial
>>> from graphtik import compose, operation

>>> def abspow(a, p):
...     """Compute |a|^p. """
...     c = abs(a) ** p
...     return c

>>> # Compose the mul, sub, and abspow operations into a computation graph.
>>> graphop = compose("graphop",
...     operation(mul, needs=["a", "b"], provides=["ab"]),
...     operation(sub, needs=["a", "ab"], provides=["a_minus_ab"]),
...     operation(name="abspow1", needs=["a_minus_ab"], provides=["abs_a_minus_ab_cubed"])
...     (partial(abspow, p=3))
... )
>>> graphop
Pipeline('graphop', needs=['a', 'b', 'ab', 'a_minus_ab'], provides=['ab', 'a_minus_ab', 'abs_a_minus_ab_cubed'], x3 ops: mul, sub, abspow1)

  • Notice the use of functools.partial() to set parameter p to a constant value.
  • And this is done by calling once more the "decorator" returned by operation(), when it is called without a function.
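How needs are matched to provides can be sketched without graphtik: the operation producing a value becomes a predecessor of every operation needing it, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) yields an execution order. The plain-dict `OPS` description and the `run_pipeline()` and `unique()` helpers are hypothetical; graphtik's own planning is more involved.

```python
from functools import partial
from graphlib import TopologicalSorter  # Python 3.9+
from operator import mul, sub


def abspow(a, p):
    return abs(a) ** p


# Hypothetical description of the three operations above: name -> (fn, needs, provides)
OPS = {
    "mul": (mul, ["a", "b"], ["ab"]),
    "sub": (sub, ["a", "ab"], ["a_minus_ab"]),
    "abspow1": (partial(abspow, p=3), ["a_minus_ab"], ["abs_a_minus_ab_cubed"]),
}


def unique(items):
    """Deduplicate while keeping first-seen order."""
    return list(dict.fromkeys(items))


def run_pipeline(ops, inputs):
    # An edge runs from the op providing a value to every op that needs it.
    producer = {p: name for name, (_, _, provides) in ops.items() for p in provides}
    deps = {
        name: {producer[n] for n in needs if n in producer}
        for name, (_, needs, _) in ops.items()
    }
    solution = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        fn, needs, provides = ops[name]
        # every op in this sketch has exactly one provide
        solution[provides[0]] = fn(*(solution[n] for n in needs))
    return solution


all_needs = unique(n for _, needs, _ in OPS.values() for n in needs)
all_provides = unique(p for _, _, provides in OPS.values() for p in provides)
solution = run_pipeline(OPS, {"a": 2, "b": 5})
```

With `a=2, b=5` the chain computes `ab=10`, `a_minus_ab=-8` and `abs_a_minus_ab_cubed=512`. The collected `all_needs` and `all_provides` come out in the same order as in the Pipeline repr above.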

The needs and provides arguments to the operations in this script define a computation graph that looks like this: