TorchDynamo Performance DashBoard #93794
Compilation Profile

The tables show the worst 50 models for each metric.

Compilation Latency (dtype=float32, unit=seconds)

Peak Memory (dtype=float32, unit=GB)

Number of graphs (dtype=float32, unit=graphs)
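The "worst 50" selection above is just a top-k over per-model metrics. A minimal sketch with hypothetical field names and data (the real dashboard reads these values from the benchmark runs):

```python
def worst_models(results, metric, n=50):
    """Return the n models with the highest (worst) value for `metric`."""
    return sorted(results, key=lambda r: r[metric], reverse=True)[:n]

# Hypothetical per-model results; names and numbers are illustrative only.
results = [
    {"model": "resnet50",  "compile_latency_s": 42.1},
    {"model": "bert_base", "compile_latency_s": 95.7},
    {"model": "hf_T5",     "compile_latency_s": 63.0},
]

worst = worst_models(results, "compile_latency_s", n=2)
```

With the sample data above, `worst` holds bert_base and hf_T5, the two slowest-to-compile models.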
Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint compression, we remove the models that fail accuracy checks.

Passrate

Geometric mean speedup

Mean compilation time (seconds)

Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup

Accuracy

Compilation latency (sec)

Peak memory compression ratio

huggingface suite with float32 precision

Performance speedup

Accuracy

Compilation latency (sec)

Peak memory compression ratio

timm_models suite with float32 precision

Performance speedup

Accuracy

Compilation latency (sec)

Peak memory compression ratio

Performance graphs
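The summary metrics above (passrate, geometric mean speedup, mean compilation time, mean memory compression ratio) can be aggregated from per-model results as sketched below. Field names and values are hypothetical; per the caveat, models that fail the accuracy check are excluded before averaging.

```python
import statistics

# Hypothetical per-model results. speedup = eager_time / compiled_time;
# mem_ratio = eager_peak_mem / compiled_peak_mem (higher is better).
results = [
    {"model": "resnet50",  "passed": True,  "speedup": 1.40, "compile_s": 35.0, "mem_ratio": 0.95},
    {"model": "bert_base", "passed": True,  "speedup": 1.10, "compile_s": 80.0, "mem_ratio": 1.05},
    {"model": "hf_T5",     "passed": False, "speedup": 0.0,  "compile_s": 0.0,  "mem_ratio": 0.0},
]

# Exclude models that fail the accuracy check from performance metrics.
ok = [r for r in results if r["passed"]]

passrate = len(ok) / len(results)
geomean_speedup = statistics.geometric_mean([r["speedup"] for r in ok])
mean_compile_s = statistics.fmean([r["compile_s"] for r in ok])
mean_mem_ratio = statistics.fmean([r["mem_ratio"] for r in ok])
```

A geometric mean is used for speedups (rather than an arithmetic mean) so that a 2x speedup on one model and a 0.5x slowdown on another average to 1.0, not 1.25.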
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint compression, we remove the models that fail accuracy checks.

Passrate

Geometric mean speedup

Mean compilation time (seconds)

Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup

Accuracy

Compilation latency (sec)

Peak memory compression ratio

huggingface suite with amp precision

Performance speedup

Accuracy

Compilation latency (sec)

Peak memory compression ratio

timm_models suite with amp precision

Performance speedup

Accuracy

Compilation latency (sec)

Peak memory compression ratio

Performance graphs
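The accuracy check described in the summaries compares forward-pass outputs and gradients against native PyTorch within a tolerance. A simplified pure-Python analogue of the `torch.allclose` comparison (the tolerances and values here are illustrative; the real harness operates on tensors and also compares gradients):

```python
def allclose(a, b, rtol=1e-3, atol=1e-3):
    """Elementwise closeness check mirroring torch.allclose semantics:
    |a - b| <= atol + rtol * |b| for every element (shapes assumed equal)."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

# Hypothetical outputs from the eager (reference) and compiled runs.
eager_out = [0.5000, -1.2500, 3.1000]
compiled_out = [0.5001, -1.2498, 3.1002]

accurate = allclose(compiled_out, eager_out)  # small deviations pass
diverged = allclose([1.0], [2.0])             # large deviations fail
```

A model counts toward the passrate only if all such comparisons succeed; otherwise it is dropped from the performance tables.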
Performance Dashboard for float32 precision

Metrics over time

huggingface suite with float32 precision

Performance speedup

Accuracy

Compilation latency (sec)

Peak memory compression ratio

Performance graphs
Performance Dashboard for amp precision

Metrics over time

huggingface suite with amp precision

Performance speedup

Accuracy

Compilation latency (sec)

Peak memory compression ratio

Performance graphs
Dashboard to track the performance of different backends.
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire