[torchbench] The official benchmark for performance and accuracy check #7040
@zpcore Can you provide more details on how to run torchbench with pytorch/xla?
Here is the configuration script we use to run torchbench on TPU/GPU: https://github.com/GoogleCloudPlatform/ml-auto-solutions/blob/master/dags/pytorch_xla/configs/pytorchxla_torchbench_config.py. For example, when targeting TPU, get_torchbench_tpu_config() is the main entry function that constructs all the commands: installing dependencies and the torchbench models, running torchbench, and uploading results to a GCS bucket (which you may not need). GPU is similar, except that all commands run inside our torch_xla GPU release Docker image.
I don't think native torchbench supports the openxla backend. We need to move the model to XLA devices ourselves, which is handled in https://github.com/pytorch/xla/tree/master/benchmarks.
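As a concrete illustration, the pytorch/xla benchmarks directory drives torchbench models through an experiment runner script. The flag names below are an assumption based on the repository's benchmarks README at the time and may have changed, so treat this as a sketch rather than an authoritative invocation:

```shell
# Sketch: run a torchbench model through pytorch/xla's benchmark harness.
# Assumes pytorch/xla is cloned locally; flag names are assumptions —
# check xla/benchmarks/README.md for the current interface.
python xla/benchmarks/experiment_runner.py \
  --suite-name=torchbench \
  --accelerator=cuda \
  --xla=PJRT \
  --dynamo=openxla \
  --test=eval \
  --filter=BERT_pytorch
```

This path handles moving the model to the XLA device for you, which is what native torchbench does not do.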
We don't have plans to add an accuracy metric at this time.
Thank you for your reply. I did not use Google Cloud. It seems that this mainly uses the running method of …
Okay, I think the accuracy check is probably quite important. FYI, I modified torchbench under pytorch/benchmarks/dynamo to use its accuracy-check method for correctness verification (I moved both the model and example_inputs to the xla device). Below is the correctness verification I did for the models that remained after excluding examples that still could not run with pytorch/xla within torchbench. This may be a helpful reference for investigating correctness issues with pytorch/xla.

Environment: NVIDIA A100 80G with CUDA 12.1

Experimental control groups:
1. dynamo-openxla vs eager
2. dynamo-inductor vs eager

```
./benchmarks/dynamo/torchbench.py --device=cuda --iterations-per-run=1 --output=torchbench_training_fp32_xla.csv --output-directory=./reports_only --trace-on-xla --backend=openxla --accuracy --train --iterations=10 --xla-tolerance 0.1 --only=dcgan --float32
```

List the three control groups in the table, where:
Note: Exp1 is dynamo-openxla with tolerance=0.01. Comparing Exp1 and Exp3, it can be observed that under the same tolerance, dynamo-inductor is very close to eager in terms of accuracy, while dynamo-openxla shows a significant accuracy difference. Contrasting Exp1 and Exp2, it is noticeable that when the accuracy tolerance threshold is relaxed to a relatively high level, some cases can pass. However, the cases that still fail under Exp2 are likely due to compilation bugs, such as the issue mentioned in #7042, which is present in most …
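For intuition about what a tolerance threshold like `--xla-tolerance 0.1` means, the accuracy check is conceptually an element-wise allclose comparison between the backend's outputs and the eager baseline. The helper below is a dependency-free sketch of that idea, not the harness's actual code; the function name and the sample values are illustrative:

```python
def within_tolerance(actual, expected, rtol=0.01, atol=1e-5):
    """Element-wise |a - e| <= atol + rtol * |e|, in the spirit of allclose."""
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )

# Hypothetical outputs: eager baseline vs. a backend with small numeric drift.
eager = [1.0, -2.5, 0.125]
backend = [1.02, -2.55, 0.13]

print(within_tolerance(backend, eager, rtol=0.01))  # → False (tight tolerance)
print(within_tolerance(backend, eager, rtol=0.1))   # → True  (relaxed tolerance)
```

This mirrors the observation above: relaxing the tolerance lets cases with small numeric drift pass, while genuine compilation bugs fail at any reasonable threshold.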
There are a couple of possibilities: …
I think the easiest way to check is to use LazyTensor to run the model (pretty much just drop …). BTW, we don't really recommend users to do …
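By "use LazyTensor", the suggestion above presumably means running the model directly on an XLA device without any dynamo backend, so operations are traced lazily and compiled when the graph is cut. A minimal, untested sketch, assuming torch and torch_xla are installed and an XLA runtime is available (the model and input shapes are placeholders):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()            # XLA device (TPU, or GPU via PJRT)
model = MyModel().to(device)        # placeholder model class
inputs = torch.randn(8, 3, 224, 224, device=device)

output = model(inputs)              # recorded lazily, not yet executed
xm.mark_step()                      # cut the graph: compile and run on XLA
```

If this LazyTensor path produces correct results while dynamo-openxla does not, that points at the dynamo integration rather than the XLA compiler itself.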
❓ Questions and Help
Hi, I found two available codebases for testing torchbench with pytorch/xla:
However, for the first codebase, it seems that its dynamo + openxla backend support does not actually trigger XLA compilation. Is it no longer maintained?
As for the second one, I found it can measure performance, but it has no way to validate accuracy against eager mode, which the first benchmark tool can do. Is there any support for this?
Looking forward to your feedback.