## Compiling a neural network as in the nnvm tutorial

Let's define a small neural network and compile it. After that we'll look into the nnvm passes that are applied during compilation.

In [1]:
import nnvm.compiler
import nnvm.symbol as sym
import tvm

Let's create a simple neural-network-like graph.

In [2]:
x = sym.Variable("x")
w = sym.Variable("w")
b = sym.Variable("b")
z = sym.conv2d(data=x, weight=w, bias=b, channels=3, kernel_size=(5,5))
z = z + sym.conv2d(data=z, weight=w, bias=b, channels=3, kernel_size=(5,5), padding=[2,2])

compute_graph = nnvm.graph.create(z)

We can print the graph in human-readable form with the `ir` method

In [3]:
print(compute_graph.ir())

Graph(%x, %w, %b) {
  %3 = conv2d(%x, %w, %b, kernel_size='(5, 5)', channels='3')
  %4 = conv2d(%3, %w, %b, channels='3', padding='[2, 2]', kernel_size='(5, 5)')
  %5 = broadcast_add(%3, %4)
  ret %5
}


Before building the graph, let's perform an ugly trick and modify the function `Graph.apply`, which applies a list of passes to a graph, so that it prints the applied passes and the resulting graph. This will help us figure out what the build process consists of. Note, though, that this won't show passes that are called from C++ code (like PlanMemory, which is applied inside GraphFuseCompile).

In [4]:
old_graph_apply = nnvm.graph.Graph.apply

In [5]:
def my_modified_graph_apply(self, passes):
    res = old_graph_apply(self, passes)
    # printing ir is also implemented as a pass, so we have to prevent an infinite loop
    if passes not in [["PrintGraphIR"], "PrintGraphIR", ["SaveJSON"], "SaveJSON"]:
        print("Applied passes " + str(passes))
        print(res.ir())
    return res

nnvm.graph.Graph.apply = my_modified_graph_apply

Now let's call the build function

In [6]:
with nnvm.compiler.build_config(opt_level=2):
    deploy_graph, lib, params = nnvm.compiler.build(
        compute_graph, target="llvm", shape={"x": (7,3,11,13)}, dtype="float32")

Applied passes CorrectLayout
Graph(%x, %w, %b) {
  %3 = conv2d(%x, %w, %b, kernel_size='(5, 5)', channels='3')
  %4 = conv2d(%3, %w, %b, channels='3', padding='[2, 2]', kernel_size='(5, 5)')
  %5 = broadcast_add(%3, %4)
  ret %5
}
graph_attr_keys = [layout]

Applied passes InferShape
Graph(%x, %w, %b) {
  %3 = conv2d(%x, %w, %b, kernel_size='(5, 5)', channels='3')
  %4 = conv2d(%3, %w, %b, channels='3', padding='[2, 2]', kernel_size='(5, 5)')
  %5 = broadcast_add(%3, %4)
  ret %5
}
graph_attr_keys = [shape_num_unknown_nodes, layout, shape]

Applied passes ['InferShape', 'SimplifyInference']
Graph(%x, %w, %b) {
  %3 = conv2d(%x, %w, %b, kernel_size='(5, 5)', channels='3')
  %4 = conv2d(%3, %w, %b, channels='3', padding='[2, 2]', kernel_size='(5, 5)')
  %5 = broadcast_add(%3, %4)
  ret %5
}
Applied passes InferShape
Graph(%x, %w, %b) {
  %3 = conv2d(%x, %w, %b, kernel_size='(5, 5)', channels='3')
  %4 = conv2d(%3, %w, %b, channels='3', padding='[2, 2]', kernel_size='(5, 5)')
  %5 = broad

There is a lot of output. First the graph is transformed by several passes. Then lowering is performed in the GraphFuseCompile pass. If you look at the source code, you'll see that it looks up a function dynamically by its registered name, `"nnvm.compiler.lower"`. The Python function is actually called `_lower` and can be found in `nnvm/python/nnvm/compiler/build_module.py`. It calls `tvm.lower`, which performs the actual lowering (here is the boundary between nnvm and tvm!) and, if the logging level is debug, also prints a lowered representation (not exactly the one actually used, though). The lowering process is investigated in a separate notebook.


In [7]:
# Let's return the function to its original state
nnvm.graph.Graph.apply = old_graph_apply
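
The patch-and-restore trick above is easy to get wrong if the build raises before the restore cell runs. Here is a minimal sketch of the same idea as a context manager that always restores the original method. `ToyGraph` and `traced_method` are hypothetical names made up for this sketch; `ToyGraph` only stands in for `nnvm.graph.Graph` so that the example runs without nnvm.

```python
from contextlib import contextmanager


@contextmanager
def traced_method(cls, name, on_result):
    """Temporarily wrap cls.<name> so on_result(args, result) runs after each call."""
    original = getattr(cls, name)

    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        on_result(args, result)
        return result

    setattr(cls, name, wrapper)
    try:
        yield
    finally:
        # Put the original method back even if the body raised.
        setattr(cls, name, original)


# Hypothetical stand-in for nnvm.graph.Graph, only here to keep the sketch runnable.
class ToyGraph:
    def apply(self, passes):
        return "graph after " + str(passes)


log = []
with traced_method(ToyGraph, "apply", lambda args, res: log.append((args[0], res))):
    ToyGraph().apply("InferShape")

print(log)
print(ToyGraph.apply.__name__)  # the original method is back in place
```

With the real class one would presumably write `with traced_method(nnvm.graph.Graph, "apply", ...)` around the build call and keep the `PrintGraphIR` check inside the callback.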

## NNVM Passes

Here is the list of all passes registered with `NNVM_REGISTER_PASS` (TODO: how to get this list from Python?):
- AlterOpLayout
- CorrectLayout
- PlanMemory
- InferShape
- LoadJSON
- SaveJSON
- GraphFusePartition
- GraphFuseCompile
- Gradient
- PlaceDevice
- OrderMutation
- FoldScaleAxis
- SimplifyInference
- PrecomputePrune
- InferType
- PrintGraphIR

Only a few of them are applied during compilation. In my opinion, the most important ones are GraphFusePartition and GraphFuseCompile. The best source of information on what the passes do is the tests (or just grep for the pass name).
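
Lacking a Python API for it, the list can be reproduced by scanning the C++ sources for the registration macro. This is a minimal sketch of that grep in Python. The function name `find_registered_passes` and the sample C++ fragment are made up for illustration; only the macro name `NNVM_REGISTER_PASS` comes from nnvm.

```python
import re

# Matches NNVM_REGISTER_PASS(Name) and captures the pass name.
PASS_RE = re.compile(r"NNVM_REGISTER_PASS\(\s*(\w+)\s*\)")


def find_registered_passes(source_text):
    """Return the sorted, de-duplicated pass names registered in a chunk of C++ source."""
    return sorted(set(PASS_RE.findall(source_text)))


# Made-up C++ fragment, used only to exercise the function.
sample = """
NNVM_REGISTER_PASS(InferShape)
.describe("Infer the shape of each node entry.");
NNVM_REGISTER_PASS(InferType)
NNVM_REGISTER_PASS(InferShape)
"""

print(find_registered_passes(sample))
```

To cover a whole checkout, one would read every `.cc` file under the source tree (for example with `os.walk`) and feed the concatenated text to the function.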

## Graph Fusion

Let's look at how graph fusion works.

First, let's define a neural net. We have to assign shapes and dtypes to the input variables because we will perform the compilation steps manually. The dtype is given as a numeric code (TODO: figure out what type each number means). You may notice that weights can be omitted, in which case they are added automatically as variables with sensible names.
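
As a starting point for the dtype TODO: as far as I remember, nnvm keeps a name-to-code table called `DTYPE_TO_TCODE` in `nnvm/python/nnvm/compiler/graph_attr.py`. The table below is an assumption reproduced from memory, not something verified in this notebook, so check it against your checkout. The helper `dtype_name` is made up for this sketch.

```python
# ASSUMPTION: copy (from memory) of the table in nnvm/python/nnvm/compiler/graph_attr.py.
# Verify against the installed nnvm before relying on any of these codes.
DTYPE_TO_TCODE = {
    "float32": 0,
    "float64": 1,
    "float16": 2,
    "uint8": 3,
    "int32": 4,
    "int8": 5,
    "int64": 6,
}
TCODE_TO_DTYPE = {code: name for name, code in DTYPE_TO_TCODE.items()}


def dtype_name(code):
    """Translate a numeric dtype code into a readable name, if the table knows it."""
    return TCODE_TO_DTYPE.get(code, "unknown(%d)" % code)


print(dtype_name(1))
```

If the table is right, `dtype=1` in the cell below would mean `float64` rather than `float32`.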

In [8]:
x = sym.Variable("x", shape=(1,13,100,100), dtype=1)
z = x
z = sym.conv2d(data=z, channels=13, kernel_size=(3,3), padding=[1,1])
z = sym.relu(data=z)
z = sym.conv2d(data=z, channels=13, kernel_size=(3,3), padding=[1,1])
z = sym.relu(data=z)
z = x + z
z = sym.flatten(z)
z = sym.dense(data=z, units=10)
z = sym.softmax(data=z)
# TODO: Doesn't work without keepdims which might be a bug in nnvm, investigate
z = sym.sum(z, keepdims=True)

graph = nnvm.graph.create(z)
print(graph.ir())

Graph(%x,
      %conv2d2_weight,
      %conv2d2_bias,
      %conv2d3_weight,
      %conv2d3_bias,
      %dense0_weight,
      %dense0_bias) {
  %3 = conv2d(%x, %conv2d2_weight, %conv2d2_bias, channels='13', padding='[1, 1]', kernel_size='(3, 3)')
  %4 = relu(%3)
  %7 = conv2d(%4, %conv2d3_weight, %conv2d3_bias, channels='13', padding='[1, 1]', kernel_size='(3, 3)')
  %8 = relu(%7)
  %9 = broadcast_add(%x, %8)
  %10 = flatten(%9)
  %13 = dense(%10, %dense0_weight, %dense0_bias, units='10')
  %14 = softmax(%13)
  %15 = sum(%14, keepdims='True')
  ret %15
}


We need to infer shapes and types first (and it may actually be interesting to look at them). Such passes add information as graph attributes, which are usually just lists of per-node attribute values.

In [9]:
graph = graph.apply('InferShape').apply('InferType')

print("shapes of nodes: " + str(graph.json_attr('shape')))
print("types of nodes: " + str(graph.json_attr('dtype')))

shapes of nodes: [[1, 13, 100, 100], [13, 13, 3, 3], [13], [1, 13, 100, 100], [1, 13, 100, 100], [13, 13, 3, 3], [13], [1, 13, 100, 100], [1, 13, 100, 100], [1, 13, 100, 100], [1, 130000], [10, 130000], [10], [1, 10], [1, 10], [1, 1]]
types of nodes: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
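
Since these attributes are plain lists indexed like the nodes, they are easier to read when zipped with the node names. The sketch below hard-codes the names and values printed in this notebook so that it runs without nnvm; with a live graph the names would come from `graph.index.nodes`. It relies on every node here having a single output, so that list position and node index coincide.

```python
# Node names and inferred shapes copied from the outputs in this notebook.
node_names = [
    "x", "conv2d2_weight", "conv2d2_bias", "conv2d2", "relu0",
    "conv2d3_weight", "conv2d3_bias", "conv2d3", "relu1", "__add_symbol__1",
    "flatten0", "dense0_weight", "dense0_bias", "dense0", "softmax0", "sum0",
]
shapes = [
    [1, 13, 100, 100], [13, 13, 3, 3], [13], [1, 13, 100, 100], [1, 13, 100, 100],
    [13, 13, 3, 3], [13], [1, 13, 100, 100], [1, 13, 100, 100], [1, 13, 100, 100],
    [1, 130000], [10, 130000], [10], [1, 10], [1, 10], [1, 1],
]

# One attribute value per node, in node order.
shape_of = dict(zip(node_names, shapes))

for index, name in enumerate(node_names):
    print("%2d  %-16s %s" % (index, name, shape_of[name]))
```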


We can also print out nodes of the graph.

In [10]:
graph.index.nodes

[{'attrs': {'dtype': '1', 'shape': '(1, 13, 100, 100)'},
  'inputs': [],
  'name': 'x',
  'op': 'null'},
 {'attrs': {'channels': '13', 'kernel_size': '(3, 3)', 'padding': '[1, 1]'},
  'inputs': [],
  'name': 'conv2d2_weight',
  'op': 'null'},
 {'attrs': {'channels': '13', 'kernel_size': '(3, 3)', 'padding': '[1, 1]'},
  'inputs': [],
  'name': 'conv2d2_bias',
  'op': 'null'},
 {'attrs': {'channels': '13', 'kernel_size': '(3, 3)', 'padding': '[1, 1]'},
  'inputs': [[0, 0, 0], [1, 0, 0], [2, 0, 0]],
  'name': 'conv2d2',
  'op': 'conv2d'},
 {'inputs': [[3, 0, 0]], 'name': 'relu0', 'op': 'relu'},
 {'attrs': {'channels': '13', 'kernel_size': '(3, 3)', 'padding': '[1, 1]'},
  'inputs': [],
  'name': 'conv2d3_weight',
  'op': 'null'},
 {'attrs': {'channels': '13', 'kernel_size': '(3, 3)', 'padding': '[1, 1]'},
  'inputs': [],
  'name': 'conv2d3_bias',
  'op': 'null'},
 {'attrs': {'channels': '13', 'kernel_size': '(3, 3)', 'padding': '[1, 1]'},
  'inputs': [[4, 0, 0], [5, 0, 0], [6, 0, 0]],
  

In [11]:
print("10th node:\t" + str(graph.index.nodes[10]))
print("and its shape:\t" + str(graph.json_attr('shape')[10]))

10th node:	{'op': 'flatten', 'name': 'flatten0', 'inputs': [[9, 0, 0]]}
and its shape:	[1, 130000]
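
Each entry in a node's `inputs` list is a triple `[node_id, output_index, version]`, so the producers of a node can be looked up by the first element. A small sketch on the first five nodes from the listing above (attributes dropped for brevity; `input_names` is a helper made up for this example):

```python
# First five nodes of the graph, copied from the listing above without 'attrs'.
nodes = [
    {"op": "null", "name": "x", "inputs": []},
    {"op": "null", "name": "conv2d2_weight", "inputs": []},
    {"op": "null", "name": "conv2d2_bias", "inputs": []},
    {"op": "conv2d", "name": "conv2d2", "inputs": [[0, 0, 0], [1, 0, 0], [2, 0, 0]]},
    {"op": "relu", "name": "relu0", "inputs": [[3, 0, 0]]},
]


def input_names(nodes, index):
    """Names of the nodes feeding nodes[index]; only node_id of each triple is used."""
    return [nodes[node_id]["name"]
            for node_id, _output_index, _version in nodes[index]["inputs"]]


print(input_names(nodes, 3))
print(input_names(nodes, 4))
```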


Now let's partition the graph and fuse its parts.

In [12]:
graph = graph.apply('GraphFusePartition')

Partitioning is done according to rules that can be found in one of the papers about tvm/nnvm. All operations are divided into several categories ("patterns"): ElemWise, Broadcast, Injective, CommReduce, OutEWiseFusable, Opaque. The rules control whether two operations belonging to given categories can be fused together and what the category of the resulting fused operation will be. Operators are categorized in the Python part of the code; grep for `register_pattern`.
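
To get a feeling for such rules, here is a toy fusion pass over a linear chain of operators. It is a deliberately simplified sketch, not nnvm's actual algorithm: it ignores the graph structure (including the skip connection in our net), the rule in `can_join` is my own simplification, and the pattern assigned to each operator in `PATTERN` is an assumption that should be checked against the `register_pattern` calls.

```python
ELEMWISE, BROADCAST, INJECTIVE, COMM_REDUCE, OUT_EWISE_FUSABLE, OPAQUE = range(6)

# ASSUMED pattern of each operator; the real values live next to register_pattern.
PATTERN = {
    "conv2d": OUT_EWISE_FUSABLE,
    "dense": OUT_EWISE_FUSABLE,
    "relu": ELEMWISE,
    "broadcast_add": BROADCAST,
    "flatten": INJECTIVE,
    "softmax": OPAQUE,
    "sum": COMM_REDUCE,
}


def can_join(group_pattern, op_pattern):
    """Toy rule: may an op be appended to the group that produces its input?"""
    if op_pattern in (ELEMWISE, BROADCAST):
        # Per-element ops may follow anything except a reduction or an opaque op.
        return group_pattern in (ELEMWISE, BROADCAST, INJECTIVE, OUT_EWISE_FUSABLE)
    if op_pattern == INJECTIVE:
        # Injective ops only join groups that are themselves at most injective.
        return group_pattern in (ELEMWISE, BROADCAST, INJECTIVE)
    # Everything else starts a new group in this toy version.
    return False


def fuse_chain(ops):
    """Greedily group a chain of op names and return fused function names."""
    groups = []  # list of (group pattern, [op names])
    for op in ops:
        pattern = PATTERN[op]
        if groups and can_join(groups[-1][0], pattern):
            group_pattern, members = groups[-1]
            # The fused group takes the more restrictive of the two patterns.
            groups[-1] = (max(group_pattern, pattern), members + [op])
        else:
            groups.append((pattern, [op]))
    return ["fuse_" + "_".join(members) for _, members in groups]


chain = ["conv2d", "relu", "conv2d", "relu", "broadcast_add",
         "flatten", "dense", "softmax", "sum"]
for name in fuse_chain(chain):
    print(name)
```

The point is only to show the shape of the rules: each operator has a pattern, a compatibility check decides whether it joins the previous group, and the group gets a pattern of its own.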

TODO: What are these? What are root and master nodes?

In [13]:
print("roots: " + str(graph.json_attr('group_root')))
print("masters: " + str(graph.json_attr('group_master')))
print()

for n, r, m in zip(graph.index.nodes, graph.json_attr('group_root'), graph.json_attr('group_master')):
    print(n['name'] + 
          " has root " + (graph.index.nodes[r]['name'] if r >= 0 else "None") + 
          " and master " + (graph.index.nodes[m]['name'] if m >= 0 else "None"))

roots: [0, 1, 2, 4, 4, 5, 6, 9, 9, 9, 10, 11, 12, 13, 14, 15]
masters: [-1, -1, -1, 3, 3, -1, -1, 7, 7, 7, 10, -1, -1, 13, 14, 15]

x has root x and master None
conv2d2_weight has root conv2d2_weight and master None
conv2d2_bias has root conv2d2_bias and master None
conv2d2 has root relu0 and master conv2d2
relu0 has root relu0 and master conv2d2
conv2d3_weight has root conv2d3_weight and master None
conv2d3_bias has root conv2d3_bias and master None
conv2d3 has root __add_symbol__1 and master conv2d3
relu1 has root __add_symbol__1 and master conv2d3
__add_symbol__1 has root __add_symbol__1 and master conv2d3
flatten0 has root flatten0 and master flatten0
dense0_weight has root dense0_weight and master None
dense0_bias has root dense0_bias and master None
dense0 has root dense0 and master dense0
softmax0 has root softmax0 and master softmax0
sum0 has root sum0 and master sum0
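
Judging only from this output (I have not confirmed it in the source), nodes that share a root seem to form one fused group, with the root being the last node of the group and the master being the node with the "heaviest" pattern, such as conv2d. The sketch below groups the nodes by root using the values printed above, hard-coded so that it runs without nnvm.

```python
# Values copied from the output above.
names = [
    "x", "conv2d2_weight", "conv2d2_bias", "conv2d2", "relu0",
    "conv2d3_weight", "conv2d3_bias", "conv2d3", "relu1", "__add_symbol__1",
    "flatten0", "dense0_weight", "dense0_bias", "dense0", "softmax0", "sum0",
]
group_root = [0, 1, 2, 4, 4, 5, 6, 9, 9, 9, 10, 11, 12, 13, 14, 15]
group_master = [-1, -1, -1, 3, 3, -1, -1, 7, 7, 7, 10, -1, -1, 13, 14, 15]

# Collect the member names of each group, keyed by the root's node index.
groups = {}
for index, root in enumerate(group_root):
    groups.setdefault(root, []).append(names[index])

for root, members in groups.items():
    master = group_master[root]
    if master < 0:
        continue  # variables (inputs and weights) have no master
    print("%s (master %s): %s" % (names[root], names[master], ", ".join(members)))
```

Under this reading there are six groups with a master, two of which contain more than one operator.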


For now, to understand how the graph was partitioned, let's just call GraphFuseCompile.

In [14]:
graph._set_json_attr("target", "llvm", "str")
with tvm.target.create("llvm"):
    compiled = graph.apply('GraphFuseCompile')

In [15]:
print(compiled.ir())

Graph(%x,
      %conv2d2_weight,
      %conv2d2_bias,
      %conv2d3_weight,
      %conv2d3_bias,
      %dense0_weight,
      %dense0_bias) {
  %3 = tvm_op(%x, %conv2d2_weight, %conv2d2_bias, num_outputs='1', num_inputs='3', flatten_data='0', func_name='fuse_conv2d_relu')
  %6 = tvm_op(%x, %3, %conv2d3_weight, %conv2d3_bias, num_outputs='1', num_inputs='4', flatten_data='0', func_name='fuse_conv2d_relu_broadcast_add')
  %7 = tvm_op(%6, num_outputs='1', num_inputs='1', flatten_data='0', func_name='fuse_flatten')
  %10 = tvm_op(%7, %dense0_weight, %dense0_bias, num_outputs='1', num_inputs='3', flatten_data='0', func_name='fuse_dense')
  %11 = tvm_op(%10, num_outputs='1', num_inputs='1', flatten_data='0', func_name='fuse_softmax')
  %12 = tvm_op(%11, num_outputs='1', num_inputs='1', flatten_data='0', func_name='fuse_sum')
  ret %12
}
graph_attr_keys = [storage_id, dtype, dltype, shape, module]



So you can guess from the `func_name` values which operations have been fused. Not much, to my taste.

## Gradients

There is a pass called Gradient, so let's compute some gradients. Actually, there is a convenient Python function that passes the needed parameters to this pass.

In [16]:
from nnvm.compiler.graph_util import gradients, get_gradient_graph

Our previous graph contains broadcast_add, whose gradient was not implemented at the time of writing, so let's define another network.

In [17]:
x = sym.Variable("x", shape=(1,13,100,100), dtype=1)
z = x
z = sym.conv2d(data=z, channels=13, kernel_size=(3,3), padding=[1,1])
z = sym.relu(data=z)
z = sym.flatten(z)
z = sym.dense(data=z, units=10)
z = sym.softmax(data=z)
z = sym.sum(z, keepdims=True)

We can create a graph returning gradients.

In [18]:
grad_graph = get_gradient_graph(z, x)
print(grad_graph.ir())

Graph(%x,
      %conv2d4_weight,
      %conv2d4_bias,
      %dense1_weight,
      %dense1_bias) {
  %3 = conv2d(%x, %conv2d4_weight, %conv2d4_bias, channels='13', padding='[1, 1]', kernel_size='(3, 3)')
  %4 = relu(%3)
  %5 = flatten(%4)
  %8 = dense(%5, %dense1_weight, %dense1_bias, units='10')
  %9 = softmax(%8)
  %10 = sum(%9, keepdims='True')
  %11 = ones_like(%10)
  %12 = expand_like(%11, %9, exclude='0', axis='[]')
  %13 = elemwise_mul(%12, %9)
  %14 = sum(%13, keepdims='true', axis='-1')
  %15 = broadcast_mul(%14, %9)
  %16 = elemwise_sub(%13, %15)
  %17 = matmul(%16, %dense1_weight)
  %18 = reshape_like(%17, %4, ='')
  %19 = zeros_like(%3)
  %20 = greater(%3, %19, exclude='true')
  %21 = elemwise_mul(%18, %20)
  %22 = _conv2d_grad(%21, %x, %conv2d4_weight, channels='13', padding='[1, 1]', kernel_size='(3, 3)')
  ret %22.0
}
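
Nodes %13 to %16 in this graph are the backward pass of softmax: with output `y` and upstream gradient `g`, they compute `g*y - sum(g*y) * y`. Here is a small pure-Python sanity check of that formula against central finite differences. The function names and the test vectors are made up for this sketch, and it works on a single vector rather than a batch.

```python
import math


def softmax(v):
    """Numerically stable softmax of a list of floats."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(y, g):
    """Mirror nodes %13-%16 of the graph: g*y - sum(g*y) * y."""
    gy = [gi * yi for gi, yi in zip(g, y)]        # %13: elemwise_mul
    s = sum(gy)                                   # %14: sum over the last axis
    return [a - s * yi for a, yi in zip(gy, y)]   # %15 broadcast_mul, %16 elemwise_sub


def numeric_grad(v, g, eps=1e-6):
    """Central differences of L(v) = sum_i g[i] * softmax(v)[i]."""
    def loss(w):
        return sum(gi * yi for gi, yi in zip(g, softmax(w)))

    grads = []
    for k in range(len(v)):
        up = list(v)
        down = list(v)
        up[k] += eps
        down[k] -= eps
        grads.append((loss(up) - loss(down)) / (2 * eps))
    return grads


v = [0.5, -1.0, 2.0]   # arbitrary logits
g = [1.0, 2.0, -0.5]   # arbitrary upstream gradient
analytic = softmax_backward(softmax(v), g)
numeric = numeric_grad(v, g)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print("largest difference:", max_diff)
```

Note that in the graph above the upstream gradient is all ones (`ones_like` followed by `expand_like`). Since the softmax outputs sum to one, the formula then gives zero for every element, which is what one expects from differentiating the constant `sum(softmax(...))`.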


Or we can use the function `gradients` to create symbols that compute the gradients (behind the scenes this function creates a graph and calls `get_gradient_graph`). If we want to return several values from a graph, we can group symbols with `nnvm.sym.Group`.

In [19]:
dz_dx = gradients(z, x)
# dz_dx is a list containing one symbol, so it looks a bit awkward
graph = nnvm.graph.create(nnvm.sym.Group([z] + dz_dx))
print(graph.ir())

Graph(%x,
      %conv2d4_weight,
      %conv2d4_bias,
      %dense1_weight,
      %dense1_bias) {
  %3 = conv2d(%x, %conv2d4_weight, %conv2d4_bias, channels='13', padding='[1, 1]', kernel_size='(3, 3)')
  %4 = relu(%3)
  %5 = flatten(%4)
  %8 = dense(%5, %dense1_weight, %dense1_bias, units='10')
  %9 = softmax(%8)
  %10 = sum(%9, keepdims='True')
  %11 = ones_like(%10)
  %12 = expand_like(%11, %9, exclude='0', axis='[]')
  %13 = elemwise_mul(%12, %9)
  %14 = sum(%13, keepdims='true', axis='-1')
  %15 = broadcast_mul(%14, %9)
  %16 = elemwise_sub(%13, %15)
  %17 = matmul(%16, %dense1_weight)
  %18 = reshape_like(%17, %4, ='')
  %19 = zeros_like(%3)
  %20 = greater(%3, %19, exclude='true')
  %21 = elemwise_mul(%18, %20)
  %22 = _conv2d_grad(%21, %x, %conv2d4_weight, channels='13', padding='[1, 1]', kernel_size='(3, 3)')
  ret %10, %22.0
}


There are many operations for which gradients are not implemented; here is the list:
- `__add_symbol__`
- `__div_symbol__`
- `__layout_transform__`
- `__mul_symbol__`
- `__sub_symbol__`
- `__undef__`
- `_contrib_conv2d_NCHWc`
- `_conv2d_grad`
- `_max_pool2d_grad`
- `avg_pool2d`
- `batch_norm`
- `broadcast_add`
- `broadcast_div`
- `broadcast_mul`
- `broadcast_sub`
- `broadcast_to`
- `cast`
- `concatenate`
- `conv2d_transpose`
- `dropout`
- `flip`
- `full`
- `global_avg_pool2d`
- `global_max_pool2d`
- `multibox_transform_loc`
- `nms`
- `ones`
- `pad`
- `prelu`
- `resize`
- `split`
- `tvm_op`
- `upsampling`
- `yolo2_reorg`
- `zeros`