# Caffe2 model

I'm going to explore _ResNet50_, whose official model definition can be downloaded with:
```
curl -LO https://raw.githubusercontent.com/caffe2/models/master/resnet50/predict_net.pbtxt
```

This file defines the network operations but omits tensor shapes. Those are stored separately, along with the pre-trained parameters:

```
curl -LO https://github.com/caffe2/models/raw/master/resnet50/init_net.pb
```

The file is quite large (over 100 MB), and when loaded as-is, it can bring the Python kernel to its knees. Hence we remove the fields that hold parameter values from the protobuf message definition before deserializing it:

```
curl -LO https://raw.githubusercontent.com/caffe2/caffe2/master/caffe2/proto/caffe2.proto
sed -i '/floats\|float_data\|int32_data\|byte_data\|string_data\|double_data\|int64_data/d' caffe2.proto
```

Finally, we compile the definition to a Python file (you need the [protocol compiler](https://github.com/google/protobuf#protocol-compiler-installation) installed on your system):
```
protoc -I=. --python_out=. caffe2.proto
```

For any given operation, it's sufficient to know the number of output channels -- spatial dimensions can be computed from kernel size and stride/padding information.
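
As a quick sanity check of that claim, here is the usual convolution arithmetic applied to a few layers. The numbers (a 7x7 kernel with stride 2 and padding 3 on a 224x224 input, followed by 3x3 and 1x1 kernels on a 56x56 input) are typical ResNet settings used here as assumptions, not values read from the model files; `conv_out_size` is a hypothetical helper.

```python
def conv_out_size(in_size, kernel, padding=0, stride=1):
  # floor((in - kernel + 2 * padding) / stride) + 1, using integer division
  return (in_size - kernel + 2 * padding) // stride + 1

stem = conv_out_size(224, kernel=7, padding=3, stride=2)
same_3x3 = conv_out_size(56, kernel=3, padding=1, stride=1)
pointwise = conv_out_size(56, kernel=1)

print(stem, same_3x3, pointwise)  # 112 56 56
```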

In [None]:
import caffe2_pb2

with open('init_net.pb', 'rb') as f:
  caffe2_tensor_defs = caffe2_pb2.NetDef()
  caffe2_tensor_defs.ParseFromString(f.read())

# We're only interested in weight shapes (ending with _w)
tensor_out_channels = {o.output[0]: o.arg[0].ints[0] for o in caffe2_tensor_defs.op if o.output[0].endswith('_w')}
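
The comprehension above assumes that the first argument of each fill op holds the tensor shape, and that the first entry of that shape is the number of output channels. A slightly more defensive variant looks the argument up by name instead. This is only a sketch, assuming the argument is called `shape`; the `SimpleNamespace` objects are stand-ins for the protobuf ops so that the snippet runs on its own, and the tensor names and shapes are made up.

```python
from types import SimpleNamespace as NS

def out_channels_by_name(ops, arg_name='shape'):
  """Map each '*_w' tensor to the first entry of its shape argument."""
  result = {}
  for op in ops:
    name = op.output[0]
    if not name.endswith('_w'):
      continue
    shape_arg = next((a for a in op.arg if a.name == arg_name), None)
    if shape_arg is not None and len(shape_arg.ints) > 0:
      result[name] = shape_arg.ints[0]
  return result

# Stand-in ops (hypothetical names and shapes, not read from init_net.pb).
fake_ops = [
  NS(output=['conv1_w'], arg=[NS(name='shape', ints=[64, 3, 7, 7])]),
  NS(output=['conv1_b'], arg=[NS(name='shape', ints=[64])]),
  NS(output=['fc_w'], arg=[NS(name='values', ints=[]),
                           NS(name='shape', ints=[1000, 2048])]),
]
channels = out_channels_by_name(fake_ops)
print(channels)
```

The same function should work on `caffe2_tensor_defs.op`, provided the argument name matches.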

In [None]:
import google.protobuf.text_format as pb_text_format

with open('predict_net.pbtxt') as f:
  caffe2_model_def = pb_text_format.Parse(f.read(), caffe2_pb2.NetDef())

A seemingly straightforward way to compute spatial dimensions is to feed the output shape of each op to the op that follows it. This doesn't work for ResNet, however, because its topology features shortcut connections: an op's input is not always the output of the op listed just before it. Hence we use a dictionary that maps each output name to its dimensions.
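
To illustrate why a lookup table is needed, consider a toy residual block in which the shortcut convolution reads the block's input rather than the output of the op listed just before it. The op and blob names below are invented for the example, and each op is simply assigned a fixed output shape instead of computing one.

```python
# (op type, input names, output name, output shape or None to pass the input through)
toy_ops = [
  ('Conv', ['data'], 'conv1', (56, 56, 64)),
  ('Conv', ['conv1'], 'branch_a', (56, 56, 256)),
  ('Conv', ['conv1'], 'shortcut', (56, 56, 256)),  # reads conv1, not branch_a
  ('Sum', ['branch_a', 'shortcut'], 'block_out', None),
]

shapes = {'data': (56, 56, 64)}
previous = shapes['data']   # what "feed the previous output forward" would use
mismatches = []

for op_type, inputs, output, out_shape in toy_ops:
  by_name = shapes[inputs[0]]   # correct: look the input up by name
  if by_name != previous:
    mismatches.append(output)   # the naive approach would get this op wrong
  shapes[output] = out_shape or by_name
  previous = shapes[output]

print(mismatches)
print(shapes['block_out'])
```

Only the `shortcut` op is flagged: its real input has 64 channels, while the previously listed op produced 256.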

In [None]:
from math import floor

def compute_out_dimensions(in_h, in_w, kernel_h, kernel_w, padding, stride):
  out_h = floor((in_h - kernel_h + 2 * (padding or 0)) / (stride or 1)) + 1
  out_w = floor((in_w - kernel_w + 2 * (padding or 0)) / (stride or 1)) + 1
  return out_h, out_w

def parse_caffe2_op(op, in_h, in_w, in_channels):
  args = {a.name: a.i for a in op.arg}
  padding = args.get('pad')
  stride = args.get('stride')
  kernel_size = args.get('kernel')
  weights_tensor = next((inp for inp in op.input if inp.endswith('_w')), None)
  out_channels = tensor_out_channels[weights_tensor] if weights_tensor else in_channels
  out_h, out_w = compute_out_dimensions(in_h, in_w, kernel_size, kernel_size, padding, stride) if kernel_size else [in_h, in_w]
  
  description = [op.type, in_channels, out_channels, kernel_size, kernel_size, padding, stride, out_h, out_w]

  return (out_h, out_w, out_channels, description)

descriptions = []
dimensions = {'gpu_0/data': (224, 224, 3)}

for op in caffe2_model_def.op:
  input_h, input_w, input_channels = dimensions[op.input[0]]
  output_h, output_w, output_channels, descr = parse_caffe2_op(op, input_h, input_w, input_channels)
  dimensions[op.output[0]] = (output_h, output_w, output_channels)
  descriptions.append(descr)

Finally, we convert model descriptions to CSV to analyze them later in a spreadsheet form. We also translate operation names so as to stay consistent between Caffe2 and TensorFlow.

In [None]:
op_names = {'Conv': 'Conv', 'SpatialBN': 'BatchNorm', 'Relu': 'Relu', 'MaxPool': 'MaxPool', 'Sum': 'Sum',
            'AveragePool': 'AveragePool', 'FC': 'FullyConnected', 'Softmax': 'Softmax'}

def translate_op_name(descr):
  descr[0] = op_names[descr[0]]
  return descr

def stringify(val):
  return str(val) if val is not None else ''

descriptions = [','.join(translate_op_name(list(map(stringify, descr)))) for descr in descriptions]
for d in descriptions: print(d)
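
Joining with commas works here because none of the values contain commas or quotes. If the descriptions should go straight into a file, the standard `csv` module is a slightly more robust option. The sketch below writes to an in-memory buffer; the header names and the sample rows are assumptions chosen to mirror the column order used above.

```python
import csv
import io

# Column order mirrors the description lists built above (header names are illustrative).
header = ['op', 'in_channels', 'out_channels', 'kernel_h', 'kernel_w',
          'padding', 'stride', 'out_h', 'out_w']
sample_rows = [
  ['Conv', 3, 64, 7, 7, 3, 2, 112, 112],
  ['Relu', 64, 64, None, None, None, None, 112, 112],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(header)
for row in sample_rows:
  # Missing values become empty cells, like stringify() does above
  writer.writerow(['' if v is None else v for v in row])

csv_text = buffer.getvalue()
print(csv_text)
```

Swapping the buffer for `open('caffe2.csv', 'w', newline='')` (a hypothetical file name) would write the same rows to disk.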

# TensorFlow model

I have a graph definition file already prepared -- copied from a checkpoint -- for the same model (_ResNet50_):

In [None]:
import tensorflow as tf

# Note: tf.GraphDef is the TensorFlow 1.x name; in TensorFlow 2.x use tf.compat.v1.GraphDef
with open('graph.pbtxt') as f:
  graph_def = pb_text_format.Parse(f.read(), tf.GraphDef())

Output shape information is present in the graph definition, so we don't have to hunt for additional files as with Caffe2. Convolution kernel shape is not stated explicitly, but we can extract it from weight tensor shapes.
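
As an illustration of the last point: TensorFlow stores 2-D convolution filters as `[kernel_h, kernel_w, in_channels, out_channels]`, and the code below unpacks output shapes as `[batch, height, width, channels]`. The shapes in this sketch are made-up values for a 7x7 convolution, not read from `graph.pbtxt`, and `split_filter_shape` is a hypothetical helper.

```python
def split_filter_shape(filter_shape):
  """Split an HWIO filter shape into (kernel_h, kernel_w), in_channels, out_channels."""
  kernel_h, kernel_w, in_channels, out_channels = filter_shape
  return (kernel_h, kernel_w), in_channels, out_channels

# Hypothetical shapes for illustration only.
filter_shape = [7, 7, 3, 64]
output_shape = [64, 112, 112, 64]  # batch, height, width, channels

kernel_hw, in_ch, out_ch = split_filter_shape(filter_shape)
_, out_h, out_w, out_channels = output_shape
print(kernel_hw, in_ch, out_ch, out_h, out_w)
```

The first two entries of the filter shape are the ones that `extract_conv_param_hw` picks out with `[0:2]`.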

In [None]:
ops = ['Conv2D', 'FusedBatchNorm', 'MaxPool', 'Relu', 'Add']
op_names = {'Conv2D': 'Conv', 'FusedBatchNorm': 'BatchNorm', 'Relu': 'Relu', 'MaxPool': 'MaxPool', 'Add': 'Sum'}

def extract_conv_param_hw(node, graph_def):
  param_node_name = next(name for name in node.input if name.endswith('read'))
  return var_dimensions[param_node_name][0:2]

def extract_output_dims(node):
  return [d.size for d in node.attr['_output_shapes'].list.shape[0].dim]

def describe_tf_node(node, in_channels):
  _, out_h, out_w, out_channels = extract_output_dims(node)
  
  kernel_h, kernel_w = (node.attr['ksize'].list.i[1:3] if 'ksize' in node.attr
    else extract_conv_param_hw(node, graph_def) if node.op == 'Conv2D'
    else [None] * 2)

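  # Assuming vertical stride == horizontal stride. The membership test avoids
  # creating an empty 'strides' entry, which node.attr['strides'] would do.
  strides = node.attr['strides'].list.i if 'strides' in node.attr else []
  stride = strides[1] if len(strides) > 1 else None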
  
  return (out_channels, list(map(stringify,
    [op_names[node.op], in_channels, out_channels, kernel_h, kernel_w, None, stride, out_h, out_w])))

var_dimensions = {n.name: extract_output_dims(n) for n in graph_def.node if n.name.endswith('read')}

nodes = [n for n in graph_def.node if n.op in ops and n.name.startswith('v/tower_0/cg')]
descriptions = []
output_channels_per_input = {nodes[0].input[0]: 3}
for n in nodes:
  out_channels, descr = describe_tf_node(n, output_channels_per_input[n.input[0]])
  output_channels_per_input[n.name] = out_channels
  descriptions.append(descr)

for d in descriptions: print(','.join(d))