Concepts and usage

This page introduces the essential concepts of genno and demonstrates basic usage. Click on the names of classes and methods to access complete descriptions in the API documentation.

Quantity

genno.Quantity represents a sparse, multi-dimensional array with labels and units. In research code it is common to use terms like ‘variables’, ‘parameters’, etc.; in genno, all data is ‘quantities’.

A Quantity has:

  • 0 or more dimensions, with labels along those dimensions (e.g. specific years; the names of specific technologies);
    • A 0-dimensional Quantity is a single or ‘scalar’ (as opposed to ‘vector’) value.
  • sparse coverage or “missingness,” i.e. there is not necessarily a value for each combination of labels; and
  • associated units.

Notation:

$$\begin{aligned} A^{ij} & = \left[ a_{i,j} \right] \\ i & \in I \\ j & \in J \\ a_{i,j} & \in \left\{ \mathbb{R}, \text{NaN} \right\} \\ a_{i,j} & [=]\, \text{units of X} \end{aligned}$$
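To make the three properties concrete, here is a minimal sketch in plain Python. It is not how genno.Quantity is implemented; the names `values`, `labels`, `units` and `lookup` are hypothetical and chosen only for illustration. A dict keyed by label tuples stores only the combinations that have a value, and a missing combination is read as NaN.

```python
import math

# Hypothetical stand-in for a 2-dimensional quantity A^{ij}; not genno.Quantity.
dims = ("i", "j")
labels = {"i": ["i1", "i2"], "j": ["j1", "j2", "j3"]}
units = "kg"

# Only some label combinations carry a value: sparse coverage.
values = {
    ("i1", "j1"): 1.5,
    ("i1", "j3"): 0.5,
    ("i2", "j2"): 4.0,
}


def lookup(i, j):
    """Return the value at (i, j), or NaN where no value is stored."""
    return values.get((i, j), math.nan)


n_possible = len(labels["i"]) * len(labels["j"])
print(f"{len(values)} of {n_possible} combinations have a value, in {units}")
print(lookup("i1", "j1"), lookup("i2", "j1"))
```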

Dimensionality of quantities

Quantities may have many dimensions. For instance, consider $X^{abcdefghij}$, which has ten dimensions. For some calculations, we may not need all of these dimensions. In that case, we do not want the full 10-dimensional quantity, but its partial sum over some dimensions, with the others retained.

Notation. Consider a quantity with three dimensions, $A^{ijk}$, another with two, $B^{kl}$, and a scalar $C$. We define partial sums over every possible combination of dimensions:

$$\begin{array}{lll} A^{ij} = \left[ a_{i,j} \right], & a_{i,j} = \sum_{k}{a_{i,j,k}} \ \forall \ i, j & \text{similarly } A^{ik}, A^{jk} \\ A^{i} = \left[ a_i \right], & a_i = \sum_j\sum_{k}{a_{i,j,k}} \ \forall \ i & \text{similarly } A^j, A^k \\ A = \sum_i\sum_j\sum_k{a_{i,j,k}} & & \text{(a scalar)} \end{array}$$
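The partial sums defined above can be sketched in plain Python, assuming a quantity is stored as a dict from label tuples to values. The helper `partial_sum` is hypothetical and is not part of genno; it only illustrates the arithmetic.

```python
from collections import defaultdict


def partial_sum(values, dims, keep):
    """Sum `values` over every dimension in `dims` that is not in `keep`."""
    idx = [dims.index(d) for d in keep]
    result = defaultdict(float)
    for labels, value in values.items():
        result[tuple(labels[n] for n in idx)] += value
    return dict(result)


# A^{ijk}, stored sparsely as {(i, j, k): value}
A_dims = ("i", "j", "k")
A = {
    ("i1", "j1", "k1"): 1.0,
    ("i1", "j1", "k2"): 2.0,
    ("i1", "j2", "k1"): 3.0,
    ("i2", "j1", "k2"): 4.0,
}

A_ij = partial_sum(A, A_dims, keep=("i", "j"))  # sum over k
A_i = partial_sum(A, A_dims, keep=("i",))  # sum over j and k
A_scalar = partial_sum(A, A_dims, keep=())  # sum over all dimensions

print(A_ij)
print(A_i)
print(A_scalar)
```

The scalar case is stored under the empty tuple `()`, since no labels remain once every dimension has been summed.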

Note that A and B share one dimension, k, but the other dimensions are distinct. We specify that simple arithmetic operations result in a quantity whose dimensions are the union of the dimensions of the operands. In other words:

$$\begin{array}{ll} C + A^{i} = X^{i} = \left[ x_{i} \right], & x_{i} = C + a_{i} \ \forall \ i \\ A^{jk} \times B^{kl} = Y^{jkl} = \left[ y_{j,k,l} \right], & y_{j,k,l} = a_{j,k} \times b_{k,l} \ \forall \ j, k, l \\ A^{j} - B^{j} = Z^{j} = \left[ z_{j} \right], & z_{j} = a_{j} - b_{j} \ \forall \ j \end{array}$$

As a result of this rule:

  • The difference $Z^{j}$ has the same dimensionality as both of its operands.
  • The sum $X^{i}$ has the same dimensionality as one of its operands.
  • The product $Y^{jkl}$ has a different dimensionality from each of its operands.

These operations are called broadcasting and alignment: the scalar value $C$ is broadcast across all labels on the dimension $i$ that it lacks, in order to calculate $x_{i}$. $A^{jk}$ and $B^{kl}$ are aligned on matching values of $k$, but broadcast over dimensions $l$ and $j$, respectively, since each is broadcast over the dimension it lacks.
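Broadcasting and alignment can be sketched in plain Python, again assuming quantities stored as dicts from label tuples to values. This is an illustration only, not genno's implementation. In particular, it simply omits any combination for which one operand has no value, which is an assumption about how missing values are treated.

```python
# A^{jk} and B^{kl} as {labels: value}; A^{i} likewise; C is a scalar
A_jk = {("j1", "k1"): 2.0, ("j1", "k2"): 3.0, ("j2", "k1"): 5.0}
B_kl = {("k1", "l1"): 10.0, ("k2", "l1"): 100.0, ("k2", "l2"): 1000.0}
A_i = {("i1",): 1.0, ("i2",): 2.0}
C = 0.5

# X^{i} = C + A^{i}: C is broadcast across every label on dimension i
X_i = {labels: C + a for labels, a in A_i.items()}

# Y^{jkl} = A^{jk} x B^{kl}: align on matching k, broadcast over j and l
Y_jkl = {
    (j, k_a, ell): a * b
    for (j, k_a), a in A_jk.items()
    for (k_b, ell), b in B_kl.items()
    if k_a == k_b
}

print(X_i)
print(Y_jkl)
```

The result `Y_jkl` carries the union of the operands' dimensions (j, k, l), as the rule above specifies.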

Key

genno.Key is used to refer to a Quantity before it is computed. For multi-dimensional calculations, we need keys that distinguish $A^{i}$ (the partial sum of $A^{ijk}$ used in the calculation of $X^{i}$) from $A^{jk}$ (a different partial sum used in the calculation of $Y^{jkl}$). It is not sufficient to refer to both as 'A', since this is ambiguous about which calculation we want to perform.

A Key has a name, zero or more dimensions, and an optional tag:

python

from genno import Key

# Quantity named 'A' with dimensions i, j, k
A_ijk = Key("A", ["i", "j", "k"])
type(A_ijk)
repr(A_ijk)
str(A_ijk)

# With different dimensions
A_jk = Key("A", ["j", "k"])
A_jk

Key has methods that allow producing related keys:

python

# Drop dimensions from a key
A_ijk.drop("i")

# Describe a key that is the product of two others; add a tag
B_kl = Key("B", ["k", "l"])
B_kl
Key.product("Y", A_ijk.drop("i"), B_kl, tag="initial")

A Key object can also be produced by parsing a string representation:

python

Z_j = Key.from_str_or_key("Z:j")
Z_j

# Keys compare and hash() identically to their str() representation
Z_j == "Z:j"

Z_j == "Y:i-j-k"
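The idea that a key compares and hashes like its string form can be sketched with a small stand-in class. `SketchKey` is hypothetical and is not genno.Key. The `name:dim-dim` layout follows the examples above ("Z:j", "Y:i-j-k"); placing the tag after a second colon is an assumption made for this sketch.

```python
class SketchKey:
    """Hypothetical stand-in for genno.Key; illustrates the idea only."""

    def __init__(self, name, dims=(), tag=None):
        self.name = name
        self.dims = tuple(dims)
        self.tag = tag

    def __str__(self):
        # Assumed layout: name, then dimensions joined by "-", then the tag
        parts = [self.name, "-".join(self.dims)]
        if self.tag is not None:
            parts.append(self.tag)
        return ":".join(parts)

    def __eq__(self, other):
        # Compare through the string form, so SketchKey("Z", ["j"]) == "Z:j"
        return str(self) == str(other)

    def __hash__(self):
        return hash(str(self))

    def drop(self, *dims):
        """Return a new key without the named dimensions."""
        kept = [d for d in self.dims if d not in dims]
        return SketchKey(self.name, kept, self.tag)


A_ijk = SketchKey("A", ["i", "j", "k"])
A_jk = A_ijk.drop("i")
print(str(A_ijk), str(A_jk))

# Because hash and equality go through str(), a plain string finds the entry
lookup = {A_jk: "partial sum over i"}
print(lookup["A:j-k"])
```

Comparing through the string form is what lets a graph be addressed with either key objects or plain strings.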

Computer

Computer provides the main interface of genno. Usage of a Computer involves two steps:

  1. Use Computer.add and other helper methods to describe all the tasks the Computer might perform.
  2. Use Computer.get to trigger the execution of one or more tasks.

This two-step process allows genno to deliver good performance by skipping irrelevant tasks and avoiding re-computing intermediate results that are used in multiple places.

Graph

Computer is built around a graph of nodes and edges; specifically, a directed, acyclic graph. This means:

  • Every edge has a direction, from one node to another.
  • There are no loops in the graph; i.e. no node is its own ancestor.
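The two properties can be checked with Python's standard-library `graphlib` module (Python 3.9 or later). This is a generic sketch and not something genno is stated to do; the node names are made up for illustration. Each node maps to the set of nodes it depends on.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to the set of nodes it depends on (its incoming edges)
dag = {"A": set(), "B": set(), "C": {"A", "B"}}

# A valid ordering exists: every node appears after the nodes it depends on
order = list(TopologicalSorter(dag).static_order())
print(order)

# Adding an edge from "C" into "A" makes "A" its own ancestor
cyclic = {"A": {"C"}, "B": set(), "C": {"A", "B"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    is_acyclic = True
except CycleError:
    is_acyclic = False

print(is_acyclic)
```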

In the graph, every node represents a task, usually a tuple wherein the first element is a callable like a function. This callable can be:

  • a numerical calculation operating on one or more Quantities;
  • more generally, a computation, including other actions like transforming data formats, reading and writing files, writing plots, etc.

Other elements in the task are arguments to the callable, such as literal values or references to other nodes. For a complete description of tasks, see the dask graph specification (dask:spec).

Every node has a unique label, describing the results of its task. These labels can be Key (if the task produces a Quantity), str (most other cases) or generally any other hashable object.

A node's computation may depend on certain inputs. These are represented by the edges of the graph.

Describe tasks

For example, the following equation:


C = A + B

…is represented by:

  • A node named "A" that provides the value of A.
  • A node named "B" that provides the value of B.
  • A node named "C" that computes a sum of its inputs.
  • An edge from "A" to "C", indicating that the value of A is an input to C.
  • An edge from "B" to "C".

To describe this using the Computer (step 1):

python

from genno import Computer

# Create a new Computer object
c = Computer()

# Add two nodes
# These have no inputs; they only return a literal value.
c.add("A", 1)
c.add("B", 2)

# Add one node and two edges
c.add("C", (lambda *inputs: sum(inputs), "A", "B"))

# Equivalent, without parentheses
c.add("C", lambda *inputs: sum(inputs), "A", "B")

To unpack this code:

  • Computer.add is used to build the graph.
  • The first argument to add is the label or key of the node; the description of what it will produce.
  • The following arguments describe the task, calculation, or computation to be performed:
    • For nodes ‘A’ and ‘B’, these are simply a raw or literal value. When the node is executed, this value is returned.
    • For node ‘C’, it is a tuple with 3 items: (lambda *inputs: sum(inputs), 'A', 'B').
      1. lambda *inputs: sum(inputs) is an anonymous or ‘lambda’ function that computes the sum of its inputs.
      2. The label "A" is a reference to another node. This indicates that there is a graph edge from node "A" into node "C".
      3. The label "B" is likewise a reference to another node, giving a graph edge from node "B" into node "C".
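The same three nodes can be written as a plain dict in the style of the dask graph specification, where each label maps to a literal value or a task tuple. This is a sketch to show the structure; it does not claim to reproduce how Computer stores its graph internally. The helper `edges` is hypothetical.

```python
# Three nodes in dask graph style: a dict of label -> literal or task tuple
graph = {
    "A": 1,
    "B": 2,
    "C": (lambda *inputs: sum(inputs), "A", "B"),
}


def edges(graph):
    """Yield (source, target) for every task argument that names another node."""
    for target, task in graph.items():
        if isinstance(task, tuple) and callable(task[0]):
            for arg in task[1:]:
                if isinstance(arg, str) and arg in graph:
                    yield (arg, target)


print(list(edges(graph)))
```

The edges are not stored separately: they are implied by the labels that appear as arguments in each task tuple.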

All the keys in a Computer can be listed with Computer.keys.

Execute tasks

The task to produce "C", and any direct or indirect inputs required, is executed using Computer.get:

python

c.get("C")

Computer.describe displays a simple textual trace of the tasks used in this computation. A portion of the graph is printed out as a nested list:

python

print(c.describe("C"))

This description shows how genno traverses the graph in order to calculate the desired quantity:

  1. The desired value is from node "C", which computes a function of some arguments.
  2. The first argument is "A".
  3. "A" is the name of another node.
  4. Node "A" gives a literal value int(1), which is stored.
  5. The Computer returns to "C" and moves on to the next argument, "B".
  6. Steps 3 and 4 are repeated for "B", giving int(2).
  7. All of the arguments to "C" have been processed.
  8. The computation function for "C" is called.

    As arguments, instead of the strings "A" and "B", this function receives the computed int values from steps 4 and 6 respectively.

  9. The result is returned.
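The traversal in the steps above can be sketched as a small recursive function over a dict-style graph. This is an illustration, not genno's or dask's actual scheduler. The function `evaluate` and the `trace` list are hypothetical, and for simplicity every task argument is assumed to be the label of another node.

```python
graph = {"A": 1, "B": 2, "C": (lambda *inputs: sum(inputs), "A", "B")}
trace = []


def evaluate(graph, label):
    """Compute the value of node `label`, recording each step in `trace`."""
    task = graph[label]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        values = []
        for arg in args:
            # Each argument names another node; compute that node first
            trace.append(f"{label}: argument {arg!r}")
            values.append(evaluate(graph, arg))
        # All arguments processed: call the function with computed values
        trace.append(f"{label}: call function with {values}")
        return func(*values)
    # Not a task: the node holds a literal value
    trace.append(f"{label}: literal {task!r}")
    return task


result = evaluate(graph, "C")
print("\n".join(trace))
print(result)
```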

In this example, "A" and "B" are, at most, 1 step away from the node requested, and are each used once. In more realistic examples, the graph can have:

  • Long chains of calculations, each depending on the output of its ancestors, and/or
  • Multiple connections, so that results like "A" are used by more than one child calculation.

However, the Computer still follows the same procedure to traverse the graph and calculate the results.
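A sketch of such a graph shows why describing tasks first and executing later pays off. Assuming a simple recursive evaluator with a cache (hypothetical; not genno's scheduler), a shared input is computed once and a node that the requested result does not depend on is never run. The names `record`, `get` and `calls` exist only for this illustration.

```python
calls = []


def record(name, func):
    """Wrap `func` so that each call is logged under `name`."""

    def wrapper(*args):
        calls.append(name)
        return func(*args)

    return wrapper


# "A" feeds both "B" and "C"; "unused" is not needed to compute "D"
graph = {
    "A": (record("A", lambda: 10),),
    "B": (record("B", lambda a: a + 1), "A"),
    "C": (record("C", lambda a: a * 2), "A"),
    "D": (record("D", lambda b, c: b + c), "B", "C"),
    "unused": (record("unused", lambda a: -a), "A"),
}


def get(graph, label, cache=None):
    """Compute `label`, storing each intermediate result in `cache`."""
    cache = {} if cache is None else cache
    if label not in cache:
        func, *args = graph[label]
        cache[label] = func(*(get(graph, a, cache) for a in args))
    return cache[label]


result = get(graph, "D")
print(result, calls)
```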

Computations

A computation is any Python function or callable that operates on Quantities or other data. genno.computations includes many common computations; see the API documentation for descriptions of each.

The power of genno is the ability to link any code, no matter how complex, into the graph, and have it operate on the results of other code. Tasks can perform complex actions such as:

  • Read in exogenous data, including over a network connection,
  • Trigger output to file(s) or a database, or
  • Execute user-defined methods.