# Basics: Node and MessageNode

`trace` is a computational graph framework for tracing and optimizing code. Its core data structure is the "node", a container of Python objects. To create a node, use the `node` function, which creates a `Node` object. To access the content of a node, use the `data` attribute.

In [17]:
from autogen.trace import node

x = node(1)  # node of int
print('node of int', x.data)
x = node('string')  # node of str
print(x.data)
x = node([1,2,3])   # node of list
print(x.data)
x = node({'a': 1, 'b': 2})  # node of dict
print(x.data)

class Foo:
    def __init__(self, x):
        self.x = x
        self.secret = 'secret'
    def print(self, val):
        print(val)

x = node(Foo('foo'))  # node of a class instance
print(x.data)

node of int 1
string
[1, 2, 3]
{'a': 1, 'b': 2}
<__main__.Foo object at 0x7fd4984da730>


`trace` overloads Python's magic methods that explicitly return a value (such as `__add__`), except for logical operations such as `__bool__` and setters. (The comparison magic methods compare the levels of the nodes in the global graph rather than their data.) When nodes are used with these magic methods, the output is a `MessageNode`, a subclass of `Node` that has the inputs of the method as its parents. The `description` attribute of a `MessageNode` documents the method's function.

In [18]:
def print_node(node):
    print(node)
    print(f"parents: {[p.name for p in node.parents]}")

# Basic arithmetic operations
x = node(1, name='node_x')
y = node(3, name='node_y')
z = x/y
print(z)
print_node(z)
print('\n')

# Index a node
dict_node = node({'a': 1, 'b': 2}, name='dict_node')
a = dict_node['a']
print_node(a)
print('len(dict_node) =', dict_node.len())

print('\n')

# Getting class attribute and calling class method
x = node(Foo('foo'))
x.call('print', 'hello world')
print_node(x.getattr('secret'))


MessageNode: (divide:2, dtype=<class 'float'>, data=0.3333333333333333)
MessageNode: (divide:2, dtype=<class 'float'>, data=0.3333333333333333)
parents: ['node_x:26', 'node_y:20']


MessageNode: (getitem:2, dtype=<class 'int'>, data=1)
parents: ['dict_node:2', 'str:5']
len(dict_node) = MessageNode: (len:2, dtype=<class 'int'>, data=2)


hello world
MessageNode: (getattr:2, dtype=<class 'str'>, data=secret)
parents: ['Foo:5']
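
To make the parent-tracking idea concrete, here is a minimal, self-contained sketch of how overloading a magic method can produce a child node that remembers its inputs. It is a toy under stated assumptions, not the `trace` implementation: `ToyNode`, `ToyMessageNode`, and the `_apply` helper are hypothetical names invented for this illustration.

```python
# Toy illustration only: ToyNode and ToyMessageNode are hypothetical names,
# not classes from the trace library.
class ToyNode:
    def __init__(self, data, name="node"):
        self.data = data
        self.name = name
        self.parents = []  # a node created directly has no parents

    def _apply(self, op_name, description, func, *others):
        # Wrap raw Python values so that every input of the operation is a node.
        inputs = [self] + [
            o if isinstance(o, ToyNode) else ToyNode(o, name=type(o).__name__)
            for o in others
        ]
        result = func(*(n.data for n in inputs))
        return ToyMessageNode(result, op_name, description, inputs)

    def __truediv__(self, other):
        return self._apply("divide", "[divide] Divide x by y", lambda a, b: a / b, other)

    def __getitem__(self, key):
        return self._apply("getitem", "[getitem] Index x with key", lambda a, k: a[k], key)


class ToyMessageNode(ToyNode):
    """Result of an operation; keeps the operation's inputs as parents."""

    def __init__(self, data, name, description, parents):
        super().__init__(data, name=name)
        self.description = description
        self.parents = list(parents)


x = ToyNode(1, name="node_x")
y = ToyNode(3, name="node_y")
z = x / y
print(type(z).__name__, z.data, [p.name for p in z.parents])

d = ToyNode({"a": 1, "b": 2}, name="dict_node")
a = d["a"]
print(a.data, [p.name for p in a.parents])
```

In this sketch the raw key `'a'` is wrapped into a node before the operation runs, which is one plausible reason a `str` parent shows up next to `dict_node` in the output above.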


Nodes cannot be used directly in logical operations such as `and`, `or`, `not`, or `in`. This is a deliberate design choice: it forces the code to state explicitly, through `x.data`, when a logical operation is applied to the content of a node.

In [19]:
x = node(True)
try:
    if x:
        print('True')
except Exception as e:
    print(e)
    print('Use if x.data instead of if x')


x = node([1,2,3])
try:
    1 in x
except Exception as e:
    print(e)
    print('Use 1 in x.data instead of 1 in x')


Cannot convert Node: (bool:2, dtype=<class 'bool'>, data=True) to bool.
Use if x.data instead of if x
Cannot use 'in' operator on Node: (list:5, dtype=<class 'list'>, data=[1, 2, 3]).
Use 1 in x.data instead of 1 in x
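
One way such a restriction can be enforced in plain Python is to make `__bool__` and `__contains__` raise. The sketch below is a toy, not the library's code: `StrictNode` is a hypothetical class, and raising `TypeError` is an assumption (the output above only shows the error messages, not the exception type).

```python
# Toy illustration only: StrictNode is a hypothetical class, and the choice
# of TypeError is an assumption made for this sketch.
class StrictNode:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f"StrictNode(data={self.data!r})"

    def __bool__(self):
        # Called by `if node`, `not node`, `node and ...`, `node or ...`.
        raise TypeError(f"Cannot convert {self!r} to bool.")

    def __contains__(self, item):
        # Called by `item in node`.
        raise TypeError(f"Cannot use 'in' operator on {self!r}.")


flag = StrictNode(True)
items = StrictNode([1, 2, 3])

errors = []
try:
    if flag:
        pass
except TypeError as e:
    errors.append(str(e))

try:
    1 in items
except TypeError as e:
    errors.append(str(e))

print(errors)

# Logical operations on the content itself still work.
ok = flag.data and (1 in items.data)
print(ok)
```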


## Writing Custom Node Operators
In addition to magic methods, we can use `trace_op` to write custom methods that are traceable. When decorating a method with `trace_op`, a description of the method is required, in the format `[method_name] description`. `trace_op` automatically adds every node whose `data` attribute is accessed as a parent of the output `MessageNode`.

In [20]:
from autogen.trace import trace_op


@trace_op('[add_1] Add 1 to input x')
def foo(x):
    return x.data + 1
x = node(1, name='node_x')
z = foo(x)
print_node(z)
print('\n')


@trace_op('[add] Add input x and input y')
def foo(x, y):
    return x+y
x = node(1, name='node_x')
y = node(2, name='node_y')
z = foo(x, y)
print_node(z)
print('\n')


# The output is a node of a tuple of two nodes
@trace_op('[pass_through] No operation, just return inputs')
def foo(x, y):
    return x, y
x = node(1, name='node_x')
y = node(2, name='node_y')
z = foo(x, y)
print(z)
from autogen.trace.nodes import Node
assert isinstance(z, Node)
assert isinstance(z.data, tuple)
assert len(z.data) == 2
print('\n')


# The output is a tuple of two nodes
@trace_op('[pass_through] No operation, just return inputs', n_outputs=2)
def foo(x, y):
    return x, y
x = node(1, name='node_x')
y = node(2, name='node_y')
z = foo(x, y)
print(z)
assert isinstance(z, tuple)
assert len(z) == 2

MessageNode: (add_1:16, dtype=<class 'int'>, data=2)
parents: ['node_x:27']


MessageNode: (add:6, dtype=<class 'int'>, data=3)
parents: ['node_x:28', 'node_y:21']


MessageNode: (pass_through:0, dtype=<class 'tuple'>, data=(<autogen.trace.nodes.Node object at 0x7fd4984ff7f0>, <autogen.trace.nodes.Node object at 0x7fd474578b80>))


(<autogen.trace.nodes.Node object at 0x7fd4984fffa0>, <autogen.trace.nodes.Node object at 0x7fd3b0b302e0>)


### Describing Relationship between Inputs and Outputs and Nodes in the Graph
One can additionally provide `node_dict` to specify how each variable mentioned in `description` is related to the nodes in the graph. This relationship is stored in the `inputs` attribute of the `MessageNode`. See the examples below.

In [21]:

# The default value of node_dict is None. In this case, the key of the inputs dict is the name of the input nodes.
@trace_op('[add_1] Add 1 to input x')
def foo(x):
    return x.data + 1
z = foo(x)
print({k:(v.name, v.data) for k,v in z.inputs.items()})

# When node_dict is set to 'auto', the key of the inputs dict is the name specified in the function signature.
@trace_op('[add_1] Add 1 to input x', node_dict='auto')
def foo(input):
    return input.data + 1
z = foo(x)
print({k:(v.name, v.data) for k,v in z.inputs.items()})

# When node_dict is set to a dict, the key of the inputs dict is the name specified in the dict.
node_dict = {'custom_x':x}
@trace_op('[add_1] Add 1 to input x', node_dict=node_dict)
def foo(input):
    return x.data + 1
z = foo(x)
print({k:(v.name, v.data) for k,v in z.inputs.items()})


{'node_x:30': ('node_x:30', 1)}
{'input': ('node_x:30', 1)}
{'input': ('node_x:30', 1), 'custom_x': ('node_x:30', 1)}


`node_dict` is useful when the function uses nodes that are not in its signature.


In [22]:
# With node_dict='auto', the inputs dict only covers the nodes in the function signature. Pass a dict as node_dict to add other nodes.
x = node(1, name='node_x')
y = node(2, name='node_y')

# Here the keys of the inputs dict are the names of the nodes.
@trace_op('[add_1] Add input x to node_y.')
def foo(x):
    return x.data + y.data
z = foo(x)
print({k:(v.name, v.data) for k,v in z.inputs.items()})


@trace_op('[add_1] Add input x to node_y.', node_dict='auto')
def foo(x):
    return x.data + y.data
try:
    z = foo(x)
except Exception as e:
    # Since the function signature does not contain y, the function will raise an error.
    print(e)

# We can use node_dict to add y to the inputs dict.
node_dict = {'node_y':y}
@trace_op('[add_1] Add input x to node_y.', node_dict=node_dict)
def foo(x):
    return x.data + y.data
z = foo(x)
print({k:(v.name, v.data) for k,v in z.inputs.items()})

{'node_x:31': ('node_x:31', 1), 'node_y:24': ('node_y:24', 2)}
All used_nodes must be in the spec.
{'x': ('node_x:31', 1), 'node_y': ('node_y:24', 2)}
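
The three `node_dict` modes seen in the outputs above can be summarized by a small resolution function. This is a sketch inferred from those outputs only, not the library's code: `resolve_inputs` and the `ToyNode` tuple are hypothetical, and raising `ValueError` is an assumption (the source shows only the error message).

```python
import inspect
from collections import namedtuple

# Toy illustration only: ToyNode and resolve_inputs are hypothetical names.
ToyNode = namedtuple("ToyNode", ["name", "data"])


def resolve_inputs(func, args, used_nodes, node_dict=None):
    """Build the inputs dict for one call, following the three modes."""
    if node_dict is None:
        # Default: key each used node by its own name.
        return {n.name: n for n in used_nodes}
    # 'auto' and dict modes start from the names in the function signature.
    spec = dict(inspect.signature(func).bind(*args).arguments)
    if isinstance(node_dict, dict):
        spec.update(node_dict)  # add the user-specified names
    elif node_dict != "auto":
        raise ValueError("node_dict must be None, 'auto', or a dict")
    # Every node the function actually read must be covered by the spec.
    if not all(any(n is v for v in spec.values()) for n in used_nodes):
        raise ValueError("All used_nodes must be in the spec.")
    return spec


x = ToyNode("node_x", 1)
y = ToyNode("node_y", 2)


def foo(x):
    return x.data + y.data


used = [x, y]  # nodes read by foo, assumed to be recorded elsewhere

default_keys = list(resolve_inputs(foo, (x,), used))
print(default_keys)

auto_error = None
try:
    resolve_inputs(foo, (x,), used, node_dict="auto")
except ValueError as e:
    auto_error = str(e)
print(auto_error)

dict_keys = list(resolve_inputs(foo, (x,), used, node_dict={"node_y": y}))
print(dict_keys)
```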
