Plotting and Debugging

Plotting

For debugging, it is often necessary to visualize the graph-operation. You may plot any plottable and annotate on top of it the execution plan and the solution of the last computation, by calling its plot() method with arguments like these:

pipeline.plot(True)                   # open a matplotlib window
pipeline.plot("pipeline.svg")         # other supported formats: png, jpg, pdf, ...
pipeline.plot()                       # without arguments, return a pydot.Dot object
pipeline.plot(solution=solution)      # annotate graph with solution values
solution.plot()                       # plot solution only

... or for the last ...:

solution.plot(...)

execution plan

The legend for all graphtik diagrams, generated by .legend().


The same .Plottable.plot() method applies also for:

  • .FunctionalOperation
  • .Pipeline
  • .Network
  • .ExecutionPlan
  • .Solution

each one capable of producing diagrams of increasing complexity.

For instance, when a pipeline has just been composed, plotting it comes out bare-bone, with just the 2 types of nodes (data & operations), their dependencies, and (optionally, if the plot theme's include_steps is true) the sequence of the execution steps of the plan.

barebone graph

But as soon as you run it, subsequent plot calls will show more of the internals. Internally, the call delegates to .ExecutionPlan.plot() of the cached plan, which retains the last run to facilitate debugging. If you want the bare-bone diagram, plot the network:

pipeline.net.plot(...)

If you want all details, plot the solution:

solution.plot(...)

Note

For plots, the Graphviz program must be in your PATH, and the pydot & matplotlib Python packages must be installed. You may install both packages when installing graphtik with its plot extras:

pip install graphtik[plot]
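
A quick preflight check can confirm these prerequisites before attempting to plot. The following is a minimal sketch using only the standard library; the helper name check_plot_prerequisites is hypothetical (it is not part of graphtik), and it assumes that Graphviz is detected through its dot executable:

```python
import importlib.util
import shutil


def check_plot_prerequisites(packages=("pydot", "matplotlib")):
    """Return a list of human-readable problems (empty if all seems fine)."""
    problems = []
    # Graphviz ships the `dot` executable, which must be discoverable in PATH.
    if shutil.which("dot") is None:
        problems.append("Graphviz `dot` executable not found in PATH")
    # The packages only need to be importable; nothing is imported here.
    for pkg in packages:
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"Python package {pkg!r} is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_plot_prerequisites():
        print("WARNING:", problem)
```

An empty list means that plotting has a fair chance to work; it does not prove that the installed versions are compatible with each other.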

Tip

A description of an API similar to that of the pydot.Dot instance returned by plot() methods is here: https://pydotplus.readthedocs.io/reference.html#pydotplus.graphviz.Dot

Jupyter notebooks

The pydot.Dot instances returned by .Plottable.plot() are rendered directly in Jupyter/IPython notebooks as SVG images.

You may increase the height of the SVG cell output with something like this:

pipeline.plot(jupyter_render={"svg_element_styles": "height: 600px; width: 100%"})

See .default_jupyter_render for those defaults and recommendations.

Plot customizations

Rendering of plots is performed by the active plotter (class .plot.Plotter). All Graphviz styling attributes are controlled by the active plot theme, which is the .plot.Theme instance installed in its .Plotter.default_theme attribute.

The following style expansions apply in the attribute values of Theme instances:

You may customize the theme and/or plotter behavior with various strategies, ordered by the breadth of their effects (the most broadly affecting method at the top):

  0. (zeroth, because it is discouraged!)

    Modify in-place .Theme class attributes, and monkeypatch .Plotter methods.

    This is the most invasive method, affecting all past and future plotter instances, and future only(!) themes used during a Python session.

  1. Modify the .default_theme attribute of the default active plotter, like this:

    get_active_plotter().default_theme.kw_op["fillcolor"] = "purple"

    This will affect all .Plottable.plot() calls for a Python session.

  2. Create a new .Plotter with customized .Plotter.default_theme, or clone and customize the theme of an existing plotter by the use of its .Plotter.with_styles method, and make that the new active plotter.
    • This will affect all calls in context <contextvars.ContextVar>.
    • If customizing theme constants is not enough, you may subclass and install a new Plotter class in context.
  3. Pass theme or plotter arguments when calling .Plottable.plot():

    pipeline.plot(plotter=Plotter(kw_legend=None))
    pipeline.plot(theme=Theme(include_steps=True))

    You may clone and customize an existing plotter, to preserve any pre-existing customizations:

    active_plotter = get_active_plotter()
    pipeline.plot(theme={"include_steps": True})

    ... OR:

    pipeline.plot(plotter=active_plotter.with_styles(kw_legend=None))

    You may create a new class to override Plotter's methods that way.

    Hint

    This project dogfoods (3) in its own docs/source/conf.py sphinx file. In particular, it configures the base-url of operation node links (by default, nodes do not link to any url):

    ## Plot graphtik SVGs with links to docs.
    #
    def _make_py_item_url(fn):
        if not inspect.isbuiltin(fn):
            fn_name = base.func_name(fn, None, mod=1, fqdn=1, human=0)
            if fn_name:
                return f"../reference.html#{fn_name}"


    plotter = plot.get_active_plotter()
    plot.set_active_plotter(
        plot.get_active_plotter().with_styles(
            kw_op_label={
                **plotter.default_theme.kw_op_label,
                "op_url": lambda plot_args: _make_py_item_url(plot_args.nx_item),
                "fn_url": lambda plot_args: _make_py_item_url(plot_args.nx_item.fn),
            }
        )
    )

Sphinx-generated sites

This library contains a new Sphinx extension (adapted from the sphinx.ext.doctest) that can render plottables in sites from python code in "doctests".

To enable it, append the module graphtik.sphinxext as a string to the extensions list in your docs/conf.py, and then intersperse the graphtik or graphtik-output directives with regular doctest code to embed graph plots into the site; you may refer to those plotted graphs with the graphtik role, referring to their :name: option (see the examples below).
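
As a minimal sketch, the relevant fragment of a docs/conf.py could look like this; the pre-existing entry in the list is only a placeholder for whatever extensions a site already uses:

```python
# docs/conf.py (fragment)

# Extensions already used by the site; this entry is only a placeholder.
extensions = [
    "sphinx.ext.autodoc",
]

# Enable the graphtik directives & role by appending the module name as a string.
extensions.append("graphtik.sphinxext")
```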

Hint

Note that Sphinx is not doctesting the actual Python modules, unless the plotting code has ended up, somehow, in the site (e.g. through some autodoc directive). Contrary to pytest and the standard doctest module, the module's globals are not imported (until sphinx#6590 is resolved), so you may need to import them in your doctests, like this:

>>> from <this.module> import *
>>> __name__ = "<this.module>"

Unfortunately, you cannot use relative imports, and have to write your module's full name.

Directives

Configurations

graphtik_default_graph_format

  • type: Union[str, None]
  • default: None

The file extension of the generated plot images (without the leading dot .), used when no :graph-format: option is given in a graphtik or graphtik-output directive.

If None, the format is chosen from graphtik_graph_formats_by_builder configuration.

graphtik_graph_formats_by_builder

  • type: Map[str, str]
  • default: check the sources

A dictionary defining which plot image formats to choose, depending on the active builder.

  • Keys are regexes matching the name of the active builder;
  • values are strings from the supported formats for pydot library, e.g. png (see .supported_plot_formats()).

If a builder does not match any key, and no format is given in the directive, no graphtik plot is rendered; so by default, it only generates plots for html & latex.

Warning

Latex is probably not working :-(
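
The selection rules described for the two format configurations above can be sketched as a small function. This is only an illustration of the documented precedence, not the extension's actual code: the function name, the use of re.search() (rather than some other matching mode) and the example mapping are all assumptions.

```python
import re
from typing import Mapping, Optional


def choose_graph_format(
    builder_name: str,
    directive_format: Optional[str],
    default_format: Optional[str],
    formats_by_builder: Mapping[str, str],
) -> Optional[str]:
    """Return the image format to plot with, or None to render no plot."""
    # 1. An explicit `:graph-format:` option on the directive wins.
    if directive_format:
        return directive_format
    # 2. Then comes `graphtik_default_graph_format`, unless it is None.
    if default_format:
        return default_format
    # 3. Otherwise, the first regex key matching the builder name decides.
    for pattern, fmt in formats_by_builder.items():
        if re.search(pattern, builder_name):
            return fmt
    # 4. No match: no graphtik plot is rendered for this builder.
    return None


# Hypothetical mapping (for the real defaults, check the sources).
formats = {"html": "svg", "latex": "pdf"}
print(choose_graph_format("html", None, None, formats))   # svg
print(choose_graph_format("man", None, None, formats))    # None
print(choose_graph_format("man", "png", None, formats))   # png
```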

graphtik_zoomable_svg

  • type: bool
  • default: True

Whether to render SVGs with the zoom-and-pan javascript library, unless the :zoomable: directive-option is given (and not empty).

Attention

Zoom-and-pan does not work in Sphinx sites for Chrome locally - serve the HTML files through some HTTP server, e.g. launch this command to view the site of this project:

python -m http.server 8080 --directory build/sphinx/html/

graphtik_zoomable_options

  • type: str
  • default: {controlIconsEnabled: true, zoomScaleSensitivity: 0.4, fit: true}

A JS-object with the options for the interactive zoom+pan of SVGs, when the :zoomable-opts: directive option is missing. If empty, {} is assumed (the library's default options).

graphtik_plot_keywords

  • type: dict
  • default: {}

Arguments for .build_pydot() to apply when rendering plottables.

graphtik_save_dot_files

  • type: bool, None
  • default: None

For debugging purposes, if enabled, store another <img>.txt file next to each image file with the DOT text that produced it.

When None (default), it is controlled by .config.is_debug from configurations (which by default obeys the GRAPHTIK_DEBUG environment variable); otherwise, any boolean takes precedence here.

graphtik_warning_is_error

  • type: bool
  • default: false

If false, suppress doctest errors, and avoid failures when building the site with the -W option, since these are unrelated to the building of the site.

doctest_test_doctest_blocks (foreign config)

Don't disable doctesting of literal blocks, i.e., don't reset the doctest_test_doctest_blocks configuration value, or else such code would be invisible to the graphtik directive.

trim_doctest_flags (foreign config)

This configuration is forced to False (default was True).

Attention

This means that in the rendered site, options-in-comments like # doctest: +SKIP and <BLANKLINE> artifacts will be visible.
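
For reference, the configurations of this section could be spelled out in a docs/conf.py roughly as below. This is a sketch: the values mirror the defaults listed above, except for the graphtik_graph_formats_by_builder mapping, whose entries are hypothetical (the real default lives in the sources).

```python
# docs/conf.py (fragment): graphtik-related settings, spelled out explicitly.

# None: pick the format per builder from the mapping below.
graphtik_default_graph_format = None

# Regex on the builder name -> pydot format (hypothetical values).
graphtik_graph_formats_by_builder = {"html": "svg", "latex": "pdf"}

graphtik_zoomable_svg = True
# A JS-object, given as a string.
graphtik_zoomable_options = (
    "{controlIconsEnabled: true, zoomScaleSensitivity: 0.4, fit: true}"
)

graphtik_plot_keywords = {}

# None: follow the debug flag; a boolean would take precedence.
graphtik_save_dot_files = None

graphtik_warning_is_error = False
```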

Examples

The following directive renders a diagram of its doctest code, beneath it:

.. graphtik::
   :graphvar: addmul
   :name: addmul-operation

   >>> from graphtik import compose, operation
   >>> addmul = compose(
   ...       "addmul",
   ...       operation(name="add", needs="abc".split(), provides="ab")(lambda a, b, c: (a + b) * c)
   ... )

>>> from graphtik import compose, operation
>>> addmul = compose(
...       "addmul",
...       operation(name="add", needs="abc".split(), provides="ab")(lambda a, b, c: (a + b) * c)
... )

which you may reference with this syntax:

you may :graphtik:`reference <addmul-operation>` with ...

Hint

In this case, the :graphvar: parameter is not really needed, since the code contains just one variable assignment receiving a subclass of .Plottable or a pydot.Dot instance.

Additionally, the doctest code producing the plottables does not have to be contained in the graphtik directive as a whole.

So the above could have been simply written like this:

>>> from graphtik import compose, operation
>>> addmul = compose(
...       "addmul",
...       operation(name="add", needs="abc".split(), provides="ab")(lambda a, b, c: (a + b) * c)
... )

.. graphtik::
   :name: addmul-operation

Errors & debugging

Graphs may become arbitrarily deep. Launching a debugger session to inspect deeply nested stacks is notoriously hard.

Logging

Increase the logging verbosity; logging statements have been placed meticulously to describe the execution flows (but not compilation :-(), with each log statement accompanied by the solution id (.Solution.solid) of that flow, like the (3C40) & (8697) below, which is important when running pipelines in parallel:

--------------------- Captured log call ---------------------
DEBUG    === Compiling pipeline(t)...
DEBUG    ... cache-updated key: ((), None, None)
DEBUG    === (3C40) Executing pipeline(t), in parallel, on inputs[], according to ExecutionPlan(needs=[], provides=['b'], x2 steps: op1, op2)...
DEBUG    +++ (3C40) Parallel batch['op1'] on solution[].
DEBUG    +++ (3C40) Executing OpTask(FunctionalOperation|(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>'), sol_keys=[])...
INFO     graphtik.op.py:534 Results[sfx: 'b'] contained +1 unknown provides[sfx: 'b']
FunctionalOperation|(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>')
DEBUG    ... (3C40) op(op1) completed in 1.406ms.

...

DEBUG    === Compiling pipeline(t)...
DEBUG    ... cache-hit key: ((), None, None)
DEBUG    === (8697) Executing pipeline(t), evicting, on inputs[], according to ExecutionPlan(needs=[], provides=['b'], x3 steps: op1, op2, sfx: 'b')...
DEBUG    +++ (8697) Executing OpTask(FunctionalOperation(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>'), sol_keys=[])...
INFO     graphtik.op.py:534 Results[sfx: 'b'] contained +1 unknown provides[sfx: 'b']
FunctionalOperation(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>')
DEBUG    ... (8697) op(op1) completed in 0.149ms.
DEBUG    +++ (8697) Executing OpTask(FunctionalOperation(name='op2', needs=[sfx: 'b'], provides=['b'], fn='<lambda>'), sol_keys=[sfx: 'b'])...
DEBUG    ... (8697) op(op2) completed in 0.08ms.
DEBUG    ... (8697) evicting 'sfx: 'b'' from solution[sfx: 'b', 'b'].
DEBUG    === (8697) Completed pipeline(t) in 0.229ms.
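
One way to raise the verbosity is through the standard logging module, as in the sketch below. It assumes that the library's loggers live under the "graphtik" name (suggested by the graphtik.op.py prefix in the captured log above, but not confirmed here); the format string is just an example.

```python
import logging

# Print all records, including DEBUG ones, with the logger's name prepended.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)-8s %(name)s:%(lineno)d %(message)s",
)

# Assumption: the library's loggers are children of the "graphtik" logger.
graphtik_logger = logging.getLogger("graphtik")
graphtik_logger.setLevel(logging.DEBUG)
```

To quiet things down again, set the level of the same logger back to logging.INFO or higher.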

DEBUG flag

Enable the .set_debug() in configurations, or externally, by setting the GRAPHTIK_DEBUG environment variable, to enact the following:

Tip

From code you may wrap the code you are interested in with .config.debug_enabled "context-manager", to get augmented print-outs for selected code-paths only.

Jetsam on exceptions

Additionally, when some operation fails, the original exception gets annotated with the following properties, as a debug aid:

>>> from graphtik import compose, operation
>>> from pprint import pprint

>>> def scream(*args):
...     raise ValueError("Wrong!")

>>> try:
...     compose("errgraph",
...             operation(name="screamer", needs=['a'], provides=["foo"])(scream)
...     )(a=None)
... except ValueError as ex:
...     pprint(ex.jetsam)
{'aliases': None,
 'args': {'kwargs': {}, 'positional': [None], 'varargs': []},
 'network': Network(x3 nodes, x1 ops: screamer),
 'operation': FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream'),
 'outputs': None,
 'plan': ExecutionPlan(needs=['a'], provides=['foo'], x1 steps: screamer),
 'results_fn': None,
 'results_op': None,
 'solution': {'a': None},
 'task': OpTask(FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream'), sol_keys=['a'])}

In an interactive REPL console you may use this to get the last raised exception:

import sys

sys.last_value.jetsam

The following annotated attributes might have meaningful value on an exception:

network

the innermost network owning the failed operation/function

plan

the innermost plan that was executing when an operation crashed

operation

the innermost operation that failed

args

either the input arguments list fed into the function, or a dict with both args & kwargs keys in it.

outputs

the names of the outputs the function was expected to return

provides

the names the graph eventually needed from the operation; a subset of the above, and not always what has been declared in the operation.

results_fn

the raw results of the operation's function, if any

results_op

the results, always a dictionary, as matched with operation's provides

solution

an instance of .Solution, containing the inputs & outputs up to the point where the error happened; note that .Solution.executed contains the list of the operations executed so far.

Of course you may use many of the above "jetsam" values when plotting.
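
When handling failures in code, the annotated values can be picked defensively, since not every exception carries them. The sketch below assumes that jetsam behaves like a dictionary (as the pprint output above suggests); the helper name summarize_jetsam and the stand-in exception are hypothetical:

```python
from pprint import pformat


def summarize_jetsam(ex, keys=("operation", "args", "solution")):
    """Format selected "jetsam" items of an exception, if it carries any."""
    jetsam = getattr(ex, "jetsam", None)
    if not jetsam:
        return "<no jetsam on exception>"
    return pformat({k: jetsam[k] for k in keys if k in jetsam})


# Stand-in exception, annotated by hand only for demonstration.
err = ValueError("Wrong!")
err.jetsam = {"operation": "screamer", "args": [None], "solution": {"a": None}}
print(summarize_jetsam(err))
```

In a real session, the exception would be the one caught around the pipeline call, or sys.last_value in a REPL.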

Debugger

The plotting capabilities, along with the above annotation of exceptions with the internal state of the plan/operation, often render a debugger session unnecessary. But since the state of the annotated values might be incomplete, you may not always avoid one.

You may want to enable "post mortem debugging".
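
A minimal sketch of post-mortem debugging with the standard pdb module follows. The wrapper name run_with_post_mortem is hypothetical, and the debugger argument exists only so that the debugger can be swapped out (e.g. in tests):

```python
import pdb
import sys


def run_with_post_mortem(fn, *args, debugger=pdb.post_mortem, **kwargs):
    """Call `fn`; on failure, open the debugger at the frame that raised."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        # The traceback of the exception currently being handled.
        traceback = sys.exc_info()[2]
        debugger(traceback)
        raise
```

For instance, run_with_post_mortem(pipeline, a=None) would drop into pdb at the failing frame, where pipeline stands for a composed pipeline like the one in the jetsam example above.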

If you set a breakpoint() in one of your functions, move up a few frames to find the .ExecutionPlan._handle_task() method, where the "live" .ExecutionPlan & .Solution instances live, useful when investigating problems with computed values.

Accessing wrapper operation from task-context

Alternatively, when the debugger is stopped inside an underlying function, you may access the wrapper .FunctionalOperation and the .Solution through the graphtik.execution.task_context context-var. This is populated with the ._OpTask instance of the currently executing operation, as shown in the pdb session printout, below:

(Pdb) from graphtik.execution import task_context
(Pdb) op_task = task_context.get()

Get possible completions on the returned operation-task with [TAB]:

(Pdb) p op_task.[TAB][TAB]
op_task.__call__
op_task.__class__
...
op_task.get
op_task.logname
op_task.marshalled
op_task.op
op_task.result
op_task.sol
op_task.solid

Printing the operation-task gives you a quick overview of the operation and the available solution keys (but not the values, not to clutter the debugger console):

(Pdb) p op_task
OpTask(FunctionalOperation(name=..., needs=..., provides=..., fn=...), sol_keys=[...])

Print the wrapper operation:

(Pdb) p op_task.op
...

Print the solution:

(Pdb) p op_task.sol
...
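
The mechanism behind this access is a plain context variable from the standard contextvars module. The following self-contained sketch only illustrates that mechanism; all names in it (FakeOpTask, run_task, and the local task_context) are hypothetical stand-ins, not graphtik's actual implementation:

```python
import contextvars

# Hypothetical stand-in for `graphtik.execution.task_context`.
task_context = contextvars.ContextVar("task_context")


class FakeOpTask:
    """Minimal stand-in carrying the wrapper operation & the solution."""

    def __init__(self, op, sol):
        self.op = op
        self.sol = sol


def underlying_function(a):
    # From inside the function (or a debugger stopped here), the currently
    # executing task is reachable without being passed in as an argument.
    op_task = task_context.get()
    return f"{op_task.op} sees solution keys {sorted(op_task.sol)}"


def run_task(op_name, fn, solution):
    # The executor publishes the task before calling the function ...
    token = task_context.set(FakeOpTask(op_name, solution))
    try:
        return fn(**solution)
    finally:
        # ... and restores the previous value afterwards.
        task_context.reset(token)


print(run_task("op1", underlying_function, {"a": 1}))
```

Because the value is reset after each call, the context variable never leaks one task into the next.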