Skip to content
Browse files

trace json parsing and statistics utility (#3638)

A helper utility to interpret and generate statistics for glow traces, hopefully this will make understanding performance a bit easier.

By default the tool prints some basic statistics about the run:
❯ ./ ../build_Release/trace.json
Total time of trace: 11.13s
Time covered by events: 11.12s
Unattributed time: 11.32ms

More usefully, with the `--layers` argument it will print statistics about each operator.
❯ ./bin/image-classifier -m=resnet50 -minibatch=1 -model-input-name=gpu_0/data -use-imagenet-normalization -image-mode=0to1 --auto-instrument --trace-path=trace.json ../tests/images/imagenet/*
❯ ./ --layers ../build_Release/trace.json | head -10
gpu_0_conv11 (convolution) 3 events, mean: 142.25ms, stddev: 1.61ms, total: 426.74ms
gpu_0_res2_1_branch2b1 (convolution) 3 events, mean: 105.86ms, stddev: 1.27ms, total: 317.58ms
gpu_0_res3_3_branch2b1 (convolution) 3 events, mean: 103.90ms, stddev: 6.53ms, total: 311.71ms
gpu_0_res2_2_branch2b1 (convolution) 3 events, mean: 103.73ms, stddev: 1.57ms, total: 311.19ms
gpu_0_res3_0_branch2b1 (convolution) 3 events, mean: 102.76ms, stddev: 1.01ms, total: 308.29ms
gpu_0_res2_0_branch2b1 (convolution) 3 events, mean: 102.59ms, stddev: 550.93us, total: 307.76ms
gpu_0_res3_2_branch2b1 (convolution) 3 events, mean: 101.82ms, stddev: 1.87ms, total: 305.45ms
gpu_0_res3_1_branch2b1 (convolution) 3 events, mean: 100.72ms, stddev: 2.14ms, total: 302.17ms
gpu_0_res4_0_branch2b1 (convolution) 3 events, mean: 100.13ms, stddev: 1.52ms, total: 300.40ms
gpu_0_res4_5_branch2b1 (convolution) 3 events, mean: 99.86ms, stddev: 2.47ms, total: 299.60ms

With the `--kinds` argument it will aggregate on operator Kind instead:
❯ ./ --kinds ../build_Release/trace.json
convolution 159 events, mean: 69.09ms, stddev: 27.82ms, total: 10.98s
splat 36 events, mean: 821.47us, stddev: 779.49us, total: 29.57ms
deallocactivation 207 events, mean: 129.20us, stddev: 345.37us, total: 26.75ms
elementmax 147 events, mean: 172.35us, stddev: 186.90us, total: 25.34ms
maxpool 3 events, mean: 7.92ms, stddev: 618.31us, total: 23.76ms
matmul 3 events, mean: 6.10ms, stddev: 56.39us, total: 18.30ms
elementadd 48 events, mean: 142.52us, stddev: 96.59us, total: 6.84ms
transpose 3 events, mean: 1.31ms, stddev: 44.61us, total: 3.92ms
allocactivation 207 events, mean: 1.87us, stddev: 1.72us, total: 387.00us
avgpool 3 events, mean: 125.00us, stddev: 11.34us, total: 375.00us
runtime 21 events, mean: 15.24us, stddev: 21.23us, total: 320.00us
softmax 3 events, mean: 13.67us, stddev: 9.43us, total: 41.00us
batchedadd 3 events, mean: 5.33us, stddev: 4.03us, total: 16.00us
tensorview 3 events, mean: 0.67us, stddev: 0.47us, total: 2.00us

This works across backends with different Kind names (e.g. the OCL backend uses kernel name right now):

❯ ./ --kinds ../build_Release/trace-ocl.json
conv_forward_mem 159 events, mean: 29.38ms, stddev: 13.78ms, total: 4.67s
runtime 189 events, mean: 639.20us, stddev: 61.65ms, total: 120.81ms
elementmaxW16 147 events, mean: 66.05us, stddev: 52.83us, total: 9.71ms
elementaddW16 48 events, mean: 160.04us, stddev: 91.36us, total: 7.68ms
matmulW 3 events, mean: 1.38ms, stddev: 72.21us, total: 4.13ms
splatW16 36 events, mean: 109.75us, stddev: 270.26us, total: 3.95ms
oclmaxpoolW 3 events, mean: 1.11ms, stddev: 36.94us, total: 3.32ms
copyInputsToDevice 6 events, mean: 82.00us, stddev: 122.41us, total: 492.00us
oclavgpoolW 3 events, mean: 106.67us, stddev: 52.26us, total: 320.00us
copyOutputsFromDevice 6 events, mean: 42.67us, stddev: 44.98us, total: 256.00us
batchedaddW 3 events, mean: 52.33us, stddev: 49.47us, total: 157.00us
softmaxW 3 events, mean: 43.00us, stddev: 33.35us, total: 129.00us

Total time of trace: 4.83s
Time covered by events: 4.82s
Unattributed time: 12.02ms

Documentation: ./ --help

(depends on #3637 for json parsing)
Pull Request resolved: #3638

Test Plan: Ran against traces produced by a variety of networks on a variety of backends.

Differential Revision: D17989618

Pulled By: nickgg

fbshipit-source-id: 1cb923668a1f85f7df73809a69074446b418eb45
  • Loading branch information...
nickgg authored and facebook-github-bot committed Oct 18, 2019
1 parent ebad15a commit e0a3a5e10e125807bea4290318e5af7551c09902
Showing with 199 additions and 0 deletions.
  1. +199 −0 utils/
@@ -0,0 +1,199 @@
#!/usr/bin/env python3
# Copyright (c) Glow Contributors. See CONTRIBUTORS file.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import json
import numpy
from collections import defaultdict
from operator import attrgetter, itemgetter

def formatUs(time):
""" Format human readable time (input in us). """
if time < 1000:
return f"{time:.2f} us"

time = time / 1000
if time < 1000:
return f"{time:.2f} ms"

time = time / 1000
return f"{time:.2f} s"

class Event:
""" Class to hold TraceEvents, matches glow::TraceEvent. """

def __init__(self, name, start, end, optype): = name
self.start = start
self.end = end
self.optype = optype
self.children = []
self.child_time = 0

def __repr__(self):
return f"Event({}, {self.start}, {self.end}, {self.optype})"

def printTree(self, tabs):
""" Pretty print the tree. """
indent = tabs * "\t"
print(f"{indent}{} ({self.optype})")
for c in self.children:
c.printTree(tabs + 1)

def totalOverlap(self, event):
""" Returns True if this Event completely incloses the provided event. """
return self.start <= event.start and self.end >= event.end

def addChild(self, event):
""" Add an enclosed event. """

def updateChildTime(self):
""" Determine the total time cost of all children. """
self.child_time = 0
for child in self.children:
self.child_time += child.end - child.start

def selfTime(self):
""" Return this Event's time cost above the sum of its children. """
return (self.end - self.start) - self.child_time

def loadEvents(filename):
""" Load the json trace file and create Events. """
trace = None
with open(filename) as f:
trace = json.load(f)
events = []
partialEvents = {}
for line in trace:
if "name" in line:
name = line["name"]
evtype = line["ph"]
start = int(line["ts"])
optype = "runtime"
if "args" in line:
if "type" in line["args"]:
optype = line["args"]["type"]
elif "kind" in line["args"]:
optype = line["args"]["kind"]
end = 0
if evtype == "X":
end = start + int(line["dur"])
events.append(Event(name, start, end, optype))
elif evtype == "B":
partialEvents[name] = Event(name, start, end, optype)
elif evtype == "E":
if not name in partialEvents:
# This is a bug in Glow tracing, but ignore for now.
ev = partialEvents[name]
ev.end = start

return events

def stackEvents(events):
""" Find all enclosed events and move them to be children. Returns a tree of Events
where parents completely enclose the timeline of their children. """
# Ensure events are sorted by time.
events = sorted(events, key=attrgetter("end"), reverse=True)
events = sorted(events, key=attrgetter("start"))
result = []
lastEvent = None
for ev in events:
# If ev is enclosed by the previous event, add it as a child.
if lastEvent:
if lastEvent.totalOverlap(ev):
# If we're closing the previous event, recursively stack its children.
if lastEvent.children:
lastEvent.children = stackEvents(lastEvent.children)
# If not enclosed its a new top-level event, which may enclose other events.
lastEvent = ev
# Stack children of the last Event.
if lastEvent.children:
lastEvent.children = stackEvents(lastEvent.children)
return result

def dumpAccumulate(events, keyfunc, traceTime):
""" Accumulate Event durations by a key produced by keyfunc. Keyfunc is a lambda which
takes an Event as a parameter. """
nameMap = defaultdict(list)
for ev in events:
name = keyfunc(ev)

layers = []
for (name, times) in nameMap.items():
layers.append((name, len(times), numpy.mean(times),
numpy.std(times), numpy.sum(times)))

# Iterate sorted by total time.
for (name, num, mean, stddev, total) in sorted(layers,
key=itemgetter(4), reverse=True):
mean = formatUs(mean)
stddev = formatUs(stddev)
pc = (total / traceTime) * 100
total = formatUs(total)
f"{name} {num} events, mean: {mean}, stddev: {stddev}, total: {total} ({pc:.2f}%)")

def main():
parser = argparse.ArgumentParser(description='process trace json')
parser.add_argument('filename', type=str,
help='filename for trace file to load')
parser.add_argument("--layers", action='store_true')
parser.add_argument("--kinds", action='store_true')

args = parser.parse_args()
events = loadEvents(args.filename)

# Stack events so we can determine selfTime.
stacked = stackEvents(events)

# Ensure events are sorted by startTime.
stacked = sorted(stacked, key=attrgetter("start"))
totalTime = stacked[-1].end - stacked[0].start
coveredTime = 0
for ev in stacked:
coveredTime += ev.end - ev.start

if args.layers:
dumpAccumulate(events, lambda ev: "%s (%s)" %
(, ev.optype), coveredTime)

if args.kinds:
dumpAccumulate(events, lambda ev: ev.optype, coveredTime)

print("Total time of trace:", formatUs(totalTime))
print("Time covered by events:", formatUs(coveredTime))
print("Unattributed time:", formatUs(totalTime - coveredTime))

if __name__ == "__main__":

0 comments on commit e0a3a5e

Please sign in to comment.
You can’t perform that action at this time.