
[Tracing] trace json parsing and statistics utility #3638

Closed · 1 commit

Conversation

nickgg (Contributor) commented Oct 17, 2019

Summary: A helper utility to interpret and generate statistics for glow traces, hopefully this will make understanding performance a bit easier.

By default the tool prints some basic statistics about the run:

❯ ./trace_parser.py ../build_Release/trace.json
Total time of trace: 11.13s
Time covered by events: 11.12s
Unattributed time: 11.32ms
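The summary numbers above can be derived from the Chrome Trace Event data Glow emits: total time spans the first event start to the last event end, covered time merges overlapping event intervals, and the remainder is unattributed. A minimal sketch of that calculation (the function name and exact handling are assumptions, not the PR's code):

```python
def summarize(events):
    """Sketch: events are complete trace events carrying "ts" (start)
    and "dur" (duration), both in microseconds."""
    spans = sorted((e["ts"], e["ts"] + e["dur"]) for e in events if "dur" in e)
    total = max(end for _, end in spans) - spans[0][0]
    covered = 0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start > cur_end:              # gap: flush the merged interval
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                            # overlap or nesting: extend it
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start
    return total, covered, total - covered  # unattributed = total - covered
```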

More usefully, with the --layers argument it will print statistics about each operator.

❯ ./bin/image-classifier -m=resnet50 -minibatch=1 -model-input-name=gpu_0/data -use-imagenet-normalization -image-mode=0to1 --auto-instrument --trace-path=trace.json ../tests/images/imagenet/*
❯ ./trace_parser.py --layers ../build_Release/trace.json | head -10
gpu_0_conv11 (convolution) 3 events, mean: 142.25ms, stddev: 1.61ms, total: 426.74ms
gpu_0_res2_1_branch2b1 (convolution) 3 events, mean: 105.86ms, stddev: 1.27ms, total: 317.58ms
gpu_0_res3_3_branch2b1 (convolution) 3 events, mean: 103.90ms, stddev: 6.53ms, total: 311.71ms
gpu_0_res2_2_branch2b1 (convolution) 3 events, mean: 103.73ms, stddev: 1.57ms, total: 311.19ms
gpu_0_res3_0_branch2b1 (convolution) 3 events, mean: 102.76ms, stddev: 1.01ms, total: 308.29ms
gpu_0_res2_0_branch2b1 (convolution) 3 events, mean: 102.59ms, stddev: 550.93us, total: 307.76ms
gpu_0_res3_2_branch2b1 (convolution) 3 events, mean: 101.82ms, stddev: 1.87ms, total: 305.45ms
gpu_0_res3_1_branch2b1 (convolution) 3 events, mean: 100.72ms, stddev: 2.14ms, total: 302.17ms
gpu_0_res4_0_branch2b1 (convolution) 3 events, mean: 100.13ms, stddev: 1.52ms, total: 300.40ms
gpu_0_res4_5_branch2b1 (convolution) 3 events, mean: 99.86ms, stddev: 2.47ms, total: 299.60ms
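Per-layer rows like these amount to grouping complete events by a key and computing count, mean, stddev, and total per group, sorted hottest-first. A sketch under that assumption (aggregate and its key argument are illustrative names, not the tool's actual internals):

```python
import statistics
from collections import defaultdict

def aggregate(events, key):
    """Group complete events by key(event); report count, mean, population
    stddev, and total duration per group (microseconds)."""
    groups = defaultdict(list)
    for e in events:
        if "dur" in e:
            groups[key(e)].append(e["dur"])
    rows = [(name, len(durs), statistics.mean(durs),
             statistics.pstdev(durs), sum(durs))
            for name, durs in groups.items()]
    return sorted(rows, key=lambda r: r[4], reverse=True)  # hottest first
```

Under this sketch, --layers would key on the event name and --kinds on the kind field; only the key function differs.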

With the --kinds argument it will aggregate on operator Kind instead:

❯ ./trace_parser.py --kinds ../build_Release/trace.json
convolution 159 events, mean: 69.09ms, stddev: 27.82ms, total: 10.98s
splat 36 events, mean: 821.47us, stddev: 779.49us, total: 29.57ms
deallocactivation 207 events, mean: 129.20us, stddev: 345.37us, total: 26.75ms
elementmax 147 events, mean: 172.35us, stddev: 186.90us, total: 25.34ms
maxpool 3 events, mean: 7.92ms, stddev: 618.31us, total: 23.76ms
matmul 3 events, mean: 6.10ms, stddev: 56.39us, total: 18.30ms
elementadd 48 events, mean: 142.52us, stddev: 96.59us, total: 6.84ms
transpose 3 events, mean: 1.31ms, stddev: 44.61us, total: 3.92ms
allocactivation 207 events, mean: 1.87us, stddev: 1.72us, total: 387.00us
avgpool 3 events, mean: 125.00us, stddev: 11.34us, total: 375.00us
runtime 21 events, mean: 15.24us, stddev: 21.23us, total: 320.00us
softmax 3 events, mean: 13.67us, stddev: 9.43us, total: 41.00us
batchedadd 3 events, mean: 5.33us, stddev: 4.03us, total: 16.00us
tensorview 3 events, mean: 0.67us, stddev: 0.47us, total: 2.00us

This works across backends with different Kind names (e.g. the OCL backend uses kernel name right now):

❯ ./trace_parser.py --kinds ../build_Release/trace-ocl.json
conv_forward_mem 159 events, mean: 29.38ms, stddev: 13.78ms, total: 4.67s
runtime 189 events, mean: 639.20us, stddev: 61.65ms, total: 120.81ms
elementmaxW16 147 events, mean: 66.05us, stddev: 52.83us, total: 9.71ms
elementaddW16 48 events, mean: 160.04us, stddev: 91.36us, total: 7.68ms
matmulW 3 events, mean: 1.38ms, stddev: 72.21us, total: 4.13ms
splatW16 36 events, mean: 109.75us, stddev: 270.26us, total: 3.95ms
oclmaxpoolW 3 events, mean: 1.11ms, stddev: 36.94us, total: 3.32ms
copyInputsToDevice 6 events, mean: 82.00us, stddev: 122.41us, total: 492.00us
oclavgpoolW 3 events, mean: 106.67us, stddev: 52.26us, total: 320.00us
copyOutputsFromDevice 6 events, mean: 42.67us, stddev: 44.98us, total: 256.00us
batchedaddW 3 events, mean: 52.33us, stddev: 49.47us, total: 157.00us
softmaxW 3 events, mean: 43.00us, stddev: 33.35us, total: 129.00us


Total time of trace: 4.83s
Time covered by events: 4.82s
Unattributed time: 12.02ms

Documentation: ./trace_parser.py --help

(depends on #3637 for json parsing)

Test Plan: Ran against traces produced by a variety of networks on a variety of backends.

SplitInfinity (Contributor) left a comment:

Very cool, my comments are all nits. Three things I want to point out:

  1. I think we should use f-strings instead of the old string formatting syntax. I suggested edits in a few places to demonstrate this.
  2. Docstrings for functions are usually placed below def func(...): and indented; the same goes for classes.
  3. I personally love type annotations so if you could add those that would be great but if not that's cool too.
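Point 1 can be illustrated with a tiny comparison (made-up values, not from the PR):

```python
name, mean_ms = "convolution", 69.09  # illustrative values

old_style = "%s mean: %.2fms" % (name, mean_ms)
f_string = f"{name} mean: {mean_ms:.2f}ms"

assert old_style == f_string == "convolution mean: 69.09ms"
```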


def loadEvents(filename):
    trace = json.load(open(filename))

SplitInfinity (Contributor) suggested a change on Oct 17, 2019:
-    trace = json.load(open(filename))
+    with open(filename) as f:
+        trace = json.load(f)

nickgg (Author) replied Oct 17, 2019:

This feels like a bad place for with: it will indent a large block, and the result when the file is invalid is the same either way: an exception is thrown.

SplitInfinity (Contributor) replied Oct 17, 2019:

I was more worried about closing the file. Also, I don't think you have to indent the whole thing; you can close the file right after loading the JSON into trace.
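The compromise described here, a with-block scoped to just the load so the file closes immediately and the rest of the function stays unindented, would look something like this (the traceEvents fallback is an assumption about the file layout, not necessarily the PR's code):

```python
import json

def loadEvents(filename):
    # Only json.load happens inside the with-block; the file is closed
    # as soon as the block exits, and `trace` remains visible afterward.
    with open(filename) as f:
        trace = json.load(f)
    # Chrome traces are either a bare event list or an object whose
    # "traceEvents" key holds the list (assumed layout).
    return trace.get("traceEvents", trace) if isinstance(trace, dict) else trace
```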

nickgg (Author) replied Oct 17, 2019:

Generally I don't like defining a var in an internal scope, but ok.

# Find all enclosed events and move them to be children.
def stackEvents(events):
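One way to read "find all enclosed events and move them to be children" is as interval nesting. The sketch below shows the general idea, not necessarily the PR's implementation; it assumes complete events with "ts"/"dur" that are well nested (never partially overlapping):

```python
def stackEvents(events):
    # Sorting by (start, -duration) ensures an enclosing event appears
    # immediately before anything it contains.
    roots, open_stack = [], []
    for e in sorted(events, key=lambda ev: (ev["ts"], -ev["dur"])):
        # Pop enclosers that already ended before this event starts.
        while open_stack and e["ts"] >= open_stack[-1]["ts"] + open_stack[-1]["dur"]:
            open_stack.pop()
        e["children"] = []
        (open_stack[-1]["children"] if open_stack else roots).append(e)
        open_stack.append(e)
    return roots  # top-level events, each with nested "children"
```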

SplitInfinity (Contributor) commented Oct 17, 2019:

I'm having some trouble following what's going on here. I think some comments might help.

@nickgg nickgg force-pushed the nickgg:traceParse branch from 9611623 to 07f33d6 Oct 17, 2019
nickgg (Author) left a comment:

Addressed comments, but I'm kind of ideologically against type annotations in Python.

@nickgg nickgg force-pushed the nickgg:traceParse branch 2 times, most recently from 6fa0c92 to 31b5cfe Oct 17, 2019
SplitInfinity (Contributor) left a comment:

Just a few nits left

facebook-github-bot left a comment:

@rdzhabarov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

opti-mix (Contributor) commented Oct 18, 2019

@nickgg Very cool!

I've got two simple questions:

  • Does it have an option to sort by time? It's usually very useful to see the hotspots at a glance.
  • Could you show not only absolute times, but also the percentage of total time used by a given (aggregated) operator? E.g. conv_forward_mem took 45% of the overall execution time.
nickgg (Author) commented Oct 18, 2019

@opti-mix It's sorted by time by default. I could add percentages if you think that would be easier to read.

@nickgg nickgg force-pushed the nickgg:traceParse branch from 31b5cfe to cf71652 Oct 18, 2019
nickgg (Author) commented Oct 18, 2019

Addressed comments and added percentages for layers/kinds. Looks like this now:

❯ ./trace_parser.py --kinds ../build_Release/trace.json
conv_forward_mem 159 events, mean: 37.09 ms, stddev: 18.87 ms, total: 5.90 s (96.01%)
runtime 189 events, mean: 1.11 ms, stddev: 17.51 ms, total: 210.12 ms (3.42%)
elementmaxW16 147 events, mean: 60.38 us, stddev: 73.11 us, total: 8.88 ms (0.14%)
oclmaxpoolW 3 events, mean: 2.67 ms, stddev: 2.05 ms, total: 8.01 ms (0.13%)
elementaddW16 48 events, mean: 158.44 us, stddev: 96.98 us, total: 7.61 ms (0.12%)
matmulW 3 events, mean: 1.79 ms, stddev: 253.11 us, total: 5.36 ms (0.09%)
splatW16 36 events, mean: 106.83 us, stddev: 316.82 us, total: 3.85 ms (0.06%)
copy 12 events, mean: 52.67 us, stddev: 96.41 us, total: 632.00 us (0.01%)
oclavgpoolW 3 events, mean: 97.00 us, stddev: 22.05 us, total: 291.00 us (0.00%)
batchedaddW 3 events, mean: 25.33 us, stddev: 14.64 us, total: 76.00 us (0.00%)
softmaxW 3 events, mean: 21.67 us, stddev: 11.56 us, total: 65.00 us (0.00%)
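The new percentage column is just each aggregate's share of the summed totals; a sketch with a hypothetical helper name (e.g. conv_forward_mem at 5.90 s of a ~6.15 s sum comes out near 96%, as above):

```python
def with_percentages(rows):
    """rows: (kind, total_us) pairs; append each kind's share of the sum."""
    grand_total = sum(total for _, total in rows)
    return [(kind, total, 100.0 * total / grand_total) for kind, total in rows]
```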

facebook-github-bot left a comment:

@nickgg is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot commented Oct 18, 2019

@nickgg merged this pull request in e0a3a5e.
