Torch Compile is slightly slower than eager mode. #98441
So, in the example from #98102, I noticed your trainer script was including compile time as part of the overall train samples per second. Is this using the same trainer? This won't tell you whether the compiled code is faster or not. (Of course, it may not be good ROI to spend the time compiling if you're doing a very short run, but that's a separate question.)
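To make the point concrete, here is a minimal sketch of why a one-time compilation cost can make a short compiled run look slower in overall samples/second even when its steady-state step time is lower. All numbers are illustrative assumptions, not measurements from this issue.

```python
# Illustrative only: every number below is a made-up assumption chosen to
# show how a one-time compile cost skews throughput over a short run.
steps = 1000                  # matches --max_train_samples 1000 with batch size 1
samples_per_step = 1
eager_step_s = 0.10           # hypothetical per-step time in eager mode
compiled_step_s = 0.08        # hypothetical per-step time once compiled (20% faster)
compile_overhead_s = 60.0     # hypothetical one-time compilation cost

eager_total = steps * eager_step_s
compiled_total = compile_overhead_s + steps * compiled_step_s

# Reported throughput when compile time is folded into the total wall time:
print(f"eager    : {steps * samples_per_step / eager_total:.2f} samples/s")
print(f"compiled : {steps * samples_per_step / compiled_total:.2f} samples/s")
```

With these assumptions the compiled run reports lower samples/second despite a faster steady-state step, which is exactly why steady-state step time should be measured separately from warmup.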
@ezyang "avg of second half" is the average step length over the second half of the steps (the last 500 steps in this case); that's what we use to measure the perf.
OK, noob question, how do I get the "avg of second half" stats? Is this? Or am I supposed to go to wandb or something?
Change the transformers trainer.py; you can either patch the installed package or change the source and build from it.
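The measurement described above can be sketched as a small standalone helper. This is not the actual transformers trainer.py code; the function name and the dummy step are hypothetical, and in practice `train_step` would run one forward/backward pass of the eager or compiled model.

```python
import time

def avg_second_half_step_time(train_step, num_steps):
    """Time each step, then average only the second half of the timings,
    so one-time early costs (compilation, allocator warmup) are excluded."""
    durations = []
    for _ in range(num_steps):
        start = time.perf_counter()
        train_step()  # in a real trainer: forward, backward, optimizer step
        durations.append(time.perf_counter() - start)
    second_half = durations[len(durations) // 2:]
    return sum(second_half) / len(second_half)

# Example with a dummy step standing in for a real training step.
print(f"avg of second half: {avg_second_half_step_time(lambda: None, 10):.6f} s")
```

Comparing this steady-state average between the eager and compiled runs answers whether the compiled code is actually faster, independent of compile overhead.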
Deberta is known to have lots of graph breaks; that should hopefully be fixed by #98158.
🐛 Describe the bug
When running some models with PyTorch, I have noticed that torch.compile mode is slightly slower than eager mode.
It may or may not be related to this issue: #98102
One example is microsoft/deberta-v3-base.
To reproduce:
Go to the folder transformers/examples/pytorch/language-modeling/ and run:
eager mode:
python run_mlm.py --model_name_or_path microsoft/deberta-v3-base --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --do_train --do_eval --overwrite_output_dir --output_dir ./outputs/ --seed 1137 --fp16 --report_to none --max_train_samples 1000
torch.compile:
python run_mlm.py --model_name_or_path microsoft/deberta-v3-base --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --do_train --do_eval --overwrite_output_dir --output_dir ./outputs/ --seed 1137 --fp16 --report_to none --max_train_samples 1000 --torch_compile
Results:
Ran on a single Tesla V100 16GB GPU.
Versions
[conda] numpy 1.24.1 pypi_0 pypi
[conda] pytorch-triton 2.1.0+46672772b4 pypi_0 pypi
[conda] torch 2.1.0.dev20230404+cu117 pypi_0 pypi
[conda] torch-ort 1.14.0 pypi_0 pypi
[conda] torchaudio 2.0.0.dev20230313+cu117 pypi_0 pypi
[conda] torchvision 0.15.0.dev20230313+cu117 pypi_0 pypi
[conda] triton 2.0.0 pypi_0 pypi
cc @ezyang @soumith @msaroufim @wconstab @ngimel @bdhirsh