
[torchbench] The official benchmark for performance and accuracy check #7040

shenh10 opened this issue May 9, 2024 · 7 comments
shenh10 commented May 9, 2024

❓ Questions and Help

Hi, I found two codebases available for testing torchbench with pytorch/xla:

  1. The official one provided by PyTorch: https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo
  2. Another one provided by the pytorch/xla team: https://github.com/pytorch/xla/tree/master/benchmarks

However, for the first codebase, it seems that the dynamo + openxla backend support does not actually trigger XLA compilation. Is it no longer maintained?

As for the second one, I found that it can measure performance, but it has no way to validate accuracy against eager mode, which the first benchmark tool can do. Is there any support for this?

Looking forward to your feedback.

JackCaoG (Collaborator) commented May 9, 2024

@zpcore Can you provide more details on how to run torchbench with pytorch/xla?

zpcore (Collaborator) commented May 9, 2024

Here is the configuration script we use to run torchbench on TPU/GPU: https://github.com/GoogleCloudPlatform/ml-auto-solutions/blob/master/dags/pytorch_xla/configs/pytorchxla_torchbench_config.py.

For example, when targeting TPU, get_torchbench_tpu_config() is the main entry function; it constructs all the commands, including installing dependencies and torchbench models, running torchbench, and uploading results to a GCS bucket (which you may not need).

GPU is similar, except that all commands run inside our torch_xla GPU release Docker image.

zpcore (Collaborator) commented May 9, 2024

> However, for the first codebase, it seems that the dynamo + openxla backend support does not actually trigger XLA compilation. Is it no longer maintained?

I don't think native torchbench supports the openxla backend. We need to move the model to XLA devices, which is handled in https://github.com/pytorch/xla/tree/master/benchmarks.

> As for the second one, I found that it can measure performance, but it has no way to validate accuracy against eager mode, which the first benchmark tool can do. Is there any support for this?

We don't have plans to add an accuracy metric at this time.

shenh10 (Author) commented May 12, 2024

> Here is the configuration script we use to run torchbench on TPU/GPU: https://github.com/GoogleCloudPlatform/ml-auto-solutions/blob/master/dags/pytorch_xla/configs/pytorchxla_torchbench_config.py.
>
> For example, when targeting TPU, get_torchbench_tpu_config() is the main entry function; it constructs all the commands, including installing dependencies and torchbench models, running torchbench, and uploading results to a GCS bucket (which you may not need).
>
> GPU is similar, except that all commands run inside our torch_xla GPU release Docker image.

Thank you for your reply. I am not using Google Cloud. It seems this mainly uses the runner in xla/benchmark/experimental_runner.py. I know how to use it, but unfortunately it does not seem to support an accuracy check. I would like to confirm with you that the benchmark under pytorch/benchmarks/dynamo/ (mainly torchbench.py and common.py) is not maintained by your group, as it appears to be only partially, not correctly, supported.

shenh10 (Author) commented May 12, 2024

> > Here is the configuration script we use to run torchbench on TPU/GPU: https://github.com/GoogleCloudPlatform/ml-auto-solutions/blob/master/dags/pytorch_xla/configs/pytorchxla_torchbench_config.py.
> > For example, when targeting TPU, get_torchbench_tpu_config() is the main entry function; it constructs all the commands, including installing dependencies and torchbench models, running torchbench, and uploading results to a GCS bucket (which you may not need).
> > GPU is similar, except that all commands run inside our torch_xla GPU release Docker image.
>
> Thank you for your reply. I am not using Google Cloud. It seems this mainly uses the runner in xla/benchmark/experimental_runner.py. I know how to use it, but it does not seem to support an accuracy check.

Okay, I think the accuracy check is quite important. FYI, I modified torchbench under pytorch/benchmarks/dynamo to use its accuracy check method for correctness verification (I moved both the model and example_inputs to the XLA device; a sketch of the change is at the end of this comment). Below is the correctness verification I did for the remaining models, after excluding examples that still could not run with pytorch/xla within torchbench. This may be a helpful reference for investigating correctness issues with pytorch/xla:

Environment: NVIDIA A100 80G with CUDA 12.1
Configuration: Default example batch size
PyTorch version: torch 2.3.0-rc12 (compiled from source), pytorch/xla 2.3.0-rc12 (compiled from source)

Experimental control groups: 1. dynamo-openxla vs. eager; 2. dynamo-inductor vs. eager
(Tolerance refers to the tol value at https://github.com/pytorch/pytorch/blob/037615b989b37b1bf5eff0c031055fc8d1fbe5ae/torch/_dynamo/utils.py#L1303.)
Testing command example:

./benchmarks/dynamo/torchbench.py --device=cuda --iterations-per-run=1 --output=torchbench_training_fp32_xla.csv --output-directory=./reports_only --trace-on-xla --backend=openxla --accuracy --train --iterations=10 --xla-tolerance 0.1 --only=dcgan --float32
[image: table of accuracy check results for the three control groups]

The table lists the three control groups, where:

  1. red indicates accuracy check failure;
  2. green indicates accuracy check pass;
  3. yellow indicates that the two eager runs themselves do not agree, so those cases can be ignored.

Let us denote:

Exp1. dynamo-openxla tolerance=0.01
Exp2. dynamo-openxla tolerance=0.1
Exp3. dynamo-inductor tolerance=0.01

Comparing Exp1 and Exp3, it can be observed that under the same tolerance, dynamo-inductor stays very close to eager in terms of accuracy, while dynamo-openxla shows a significant accuracy difference.

Contrasting Exp1 and Exp2, when the accuracy tolerance threshold is relaxed to a relatively high level, some cases can pass. However, the cases that still fail under Exp2 are likely due to compilation bugs, such as the issue mentioned in #7042, which affects most hf_xxx models and leads to incorrect calculations.
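For reference, the modification amounts to something like the sketch below. This is a minimal illustration, not the exact patch; `model`, `example_inputs`, and the tolerance values are placeholders, and the real harness in benchmarks/dynamo/common.py uses its own comparison logic (driven by the tol referenced above) rather than a bare assert_close.

```python
# Minimal sketch (not the exact patch): move the model and example_inputs to
# the XLA device, run eager as the reference and dynamo+openxla as the test,
# then compare outputs under the chosen tolerance.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = model.to(device)  # placeholder: any torchbench model instance
example_inputs = tuple(
    x.to(device) if isinstance(x, torch.Tensor) else x for x in example_inputs
)

ref = model(*example_inputs)                         # eager reference on XLA
compiled_model = torch.compile(model, backend="openxla")
res = compiled_model(*example_inputs)                # dynamo + openxla

# Placeholder tolerance; the real harness derives rtol/atol from --xla-tolerance.
torch.testing.assert_close(ref, res, rtol=0.01, atol=0.01)
```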

zpcore (Collaborator) commented May 14, 2024

Hi @shenh10 , thanks for checking the accuracy.

@JackCaoG, do you know what could be causing the accuracy difference with the openxla backend? They are using the native torchbench script for the experiment.

JackCaoG (Collaborator) commented:
There are a couple of possibilities:

  1. Something is wrong with PyTorch/XLA's dynamo implementation.
  2. XLA:GPU performs some optimization that has implications for accuracy.

I think the easiest way to check is to use LazyTensor to run the model (pretty much just drop torch.compile and add a mark_step after loss.backward) and compare its gradients with the native GPU runs. If the issue is in our dynamo, we can try to figure out why. If the issue is in XLA:GPU, we will likely need to figure out which optimization pass is causing the accuracy difference.
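A minimal sketch of that check, assuming placeholder `model`, `loss_fn`, `inputs`, and `target` objects (not a full harness):

```python
# Sketch: one LazyTensor training step (no torch.compile), then collect the
# gradients for comparison against the same step run in native CUDA eager mode.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = model.to(device)                      # placeholder model
inputs, target = inputs.to(device), target.to(device)

loss = loss_fn(model(inputs), target)
loss.backward()
xm.mark_step()                                # materialize the lazy graph

xla_grads = {
    name: p.grad.detach().cpu()
    for name, p in model.named_parameters()
    if p.grad is not None
}

# Run the identical step with the model on "cuda" to get cuda_grads, then:
# for name in xla_grads:
#     torch.testing.assert_close(xla_grads[name], cuda_grads[name], rtol=1e-3, atol=1e-3)
```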

BTW, we don't really recommend users run torch.compile for training right now (we only use dynamo for inference); all of our training runs use LazyTensor.
