Benchmarks: Unify metric names of benchmarks #252

yukirora · 2021-12-01T12:04:42Z

Description
Revise metric names of benchmarks.

codecov · 2021-12-01T12:13:05Z

Codecov Report

Merging #252 (5a176fa) into main (c13ed2a) will increase coverage by 0.06%.
The diff coverage is 98.48%.

@@            Coverage Diff             @@
##             main     #252      +/-   ##
==========================================
+ Coverage   87.61%   87.67%   +0.06%     
==========================================
  Files          70       70              
  Lines        3947     3967      +20     
==========================================
+ Hits         3458     3478      +20     
  Misses        489      489

Flag	Coverage Δ
cpu-unit-test	`72.19% <89.39%> (+0.14%)`	⬆️
cuda-unit-test	`87.62% <98.48%> (+0.06%)`	⬆️

Impacted Files	Coverage Δ
...arks/micro_benchmarks/ib_validation_performance.py	`86.97% <95.00%> (+0.64%)`	⬆️
.../docker_benchmarks/rocm_onnxruntime_performance.py	`90.90% <100.00%> (ø)`
...ro_benchmarks/computation_communication_overlap.py	`83.01% <100.00%> (ø)`
...ks/micro_benchmarks/cuda_gemm_flops_performance.py	`71.69% <100.00%> (ø)`
...ch/benchmarks/micro_benchmarks/disk_performance.py	`98.96% <100.00%> (ø)`
...ks/micro_benchmarks/gemm_flops_performance_base.py	`96.77% <100.00%> (+0.10%)`	⬆️
.../benchmarks/micro_benchmarks/gpcnet_performance.py	`93.10% <100.00%> (+1.10%)`	⬆️
...hmarks/micro_benchmarks/gpu_copy_bw_performance.py	`97.56% <100.00%> (ø)`
...hmarks/micro_benchmarks/ib_loopback_performance.py	`88.88% <100.00%> (ø)`
...chmarks/micro_benchmarks/kernel_launch_overhead.py	`85.00% <100.00%> (ø)`
... and 7 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c13ed2a...5a176fa.

docs/user-tutorial/benchmarks/micro-benchmarks.md

abuccts

Two general questions:

1. Do we need a unified metric name format, e.g., benchmark/description_type_statistic? Here type can be time/bw/flops/count, etc., and statistic can be min/max/avg, etc.
2. Is it necessary to include the unit in the metric name, e.g., time_ms?
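
To make the proposed format concrete, here is a minimal sketch of a helper that assembles and validates names of the form benchmark/description_type_statistic. It is an illustration only, not code from this PR: `build_metric_name`, the two vocabularies, the placement of the optional unit directly after the type (as in time_ms), and the benchmark names in the usage lines are all assumptions.

```python
from typing import Optional

# Hypothetical vocabularies copied from the examples in the question above.
# The "etc." there suggests the real sets would be larger.
METRIC_TYPES = {"time", "bw", "flops", "count"}
STATISTICS = {"min", "max", "avg"}


def build_metric_name(
    benchmark: str,
    description: str,
    metric_type: str,
    statistic: str,
    unit: Optional[str] = None,
) -> str:
    """Return a name shaped like benchmark/description_type_statistic.

    If a unit is given, it is appended to the type (e.g. time_ms). Where the
    unit belongs in the name is an assumption, not something the PR decides.
    """
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unknown metric type: {metric_type!r}")
    if statistic not in STATISTICS:
        raise ValueError(f"unknown statistic: {statistic!r}")
    type_part = f"{metric_type}_{unit}" if unit else metric_type
    return f"{benchmark}/{description}_{type_part}_{statistic}"


# Usage with made-up benchmark and description names:
print(build_metric_name("example-benchmark", "seq_read", "bw", "avg"))
print(build_metric_name("example-benchmark", "launch", "time", "avg", unit="ms"))
```

Validating against a fixed vocabulary is one way to keep names consistent across benchmarks; whether the unit should be part of the name at all is the open question raised above.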

superbench/benchmarks/micro_benchmarks/ib_loopback_performance.py

superbench/benchmarks/micro_benchmarks/ib_validation_performance.py

superbench/benchmarks/micro_benchmarks/disk_performance.py

cp5555 · 2021-12-08T10:01:10Z

docs/design-docs/benchmarks.md

+        'throughput-train-fp32': [[step1_time, ..., stepK_time], ..., […]],
+        'throughput-train-fp16': [[step1_time, ..., stepK_time], ..., […]],
+        'throughput-inference-fp32': [[step1_time, ..., stepK_time], ..., […]],
+        'throughput-inference-fp16': [[step1_time, ..., stepK_time], ..., […]],


Change to avg_throughput

…uperbenchmark into v-yujiang/doc-metric

format metric doc

5af1e8c

yukirora added the documentation label (Improvements or additions to documentation) Dec 1, 2021

yukirora requested a review from abuccts December 1, 2021 12:04

yukirora requested a review from cp5555 as a code owner December 1, 2021 12:04