PyTorch 2.1.0 Performance regression from PyTorch 2.0.1 #117081
Comments
tentatively marking hi-pri for the regression
@sirutBuasai Are you seeing the issues with the latest PT2.2 RC or the nightlies as well?
@sirutBuasai are you using
@bdhirsh Can confirm that without
@chauhang Regression still exists in nightly build when using
Training output:
This does not sound like a regression, but rather feature work:
Well, the point of filling empty is that it can be directly used by a user, and in that case we don't necessarily know if they will properly initialize it. But I think having a "trust me" mode for "I promise not to directly use uninitialized memory" is very reasonable.
Thank you for your responses. We've root caused and notified the customers. Closing issue.
Hmm, I wonder if next step would be to update https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html doc to advertise this option and advise that initializing memory is expensive (also we probably should use faster
🐛 Describe the bug
Hi, we have found a performance regression in PyTorch 2.1.0 relative to PyTorch 2.0.1 on the AWS g5.2xlarge instance type. Below are the results we observed from running an example training script. I have also run the script on PyTorch 2.1.2, which shows that the regression still exists.
PyTorch 2.0.1 + CUDA 11.8:
average step time: 0.11788356158540853
PyTorch 2.1.0 + CUDA 11.8:
average step time: 0.1284193184877639
PyTorch 2.1.0 + CUDA 12.1:
average step time: 0.12725605948790183
PyTorch 2.1.2 + CUDA 11.8:
average step time: 0.12841543558533886
PyTorch 2.1.2 + CUDA 12.1:
average step time: 0.12724469848789585
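As a quick check on the figures above, the relative slowdown can be computed directly from the reported step times. This is only a sketch: the dictionary keys and the helper name `slowdown_percent` are made up for illustration, and the numbers are copied verbatim from this report.

```python
# Average step times exactly as reported in this issue.
step_times = {
    "2.0.1+cu118": 0.11788356158540853,
    "2.1.0+cu118": 0.1284193184877639,
    "2.1.0+cu121": 0.12725605948790183,
    "2.1.2+cu118": 0.12841543558533886,
    "2.1.2+cu121": 0.12724469848789585,
}

BASELINE = "2.0.1+cu118"  # hypothetical label for the PyTorch 2.0.1 run


def slowdown_percent(times, baseline=BASELINE):
    """Return {config: percent increase in step time over the baseline}."""
    base = times[baseline]
    return {
        name: (t / base - 1.0) * 100.0
        for name, t in times.items()
        if name != baseline
    }


slowdowns = slowdown_percent(step_times)
for name, pct in slowdowns.items():
    print(f"{name}: {pct:+.1f}% step time vs {BASELINE}")
```

With these numbers the increase in step time comes out at roughly 8 to 9 percent for every 2.1.x configuration, regardless of CUDA version.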
We suspect that the regression may be related to the aten::fill_ kernel, which appears in the trace files for PyTorch 2.1.* but not for PyTorch 2.0.1. We observed a 10% performance drop for this training script, but our customer has reported a 30% performance drop.
I have attached the training script, trace files, conda environment template, and steps to reproduce the results. The gist is located here, and the trace files can be downloaded as pt2.1-regression.zip.
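For anyone comparing the traces, one way to check how much time goes to `aten::fill_` is to sum event durations per operator name. The sketch below assumes the attached traces are Chrome-trace JSON as exported by the PyTorch profiler (a top-level `traceEvents` list whose complete events have `ph == "X"` and a `dur` in microseconds). The helper names and the synthetic trace are illustrative only, not part of the attached script.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Load a Chrome-trace JSON file (path is supplied by the caller)."""
    with open(path) as f:
        return json.load(f)


def op_time_by_name(trace, prefix="aten::"):
    """Sum call count and duration (microseconds) per operator name."""
    totals = defaultdict(lambda: {"count": 0, "dur_us": 0.0})
    for event in trace.get("traceEvents", []):
        name = event.get("name", "")
        # "X" marks a complete event that carries a "dur" field.
        if event.get("ph") == "X" and name.startswith(prefix):
            totals[name]["count"] += 1
            totals[name]["dur_us"] += float(event.get("dur", 0))
    return dict(totals)


# Synthetic stand-in for a real trace, used here so the sketch runs as-is.
example_trace = {
    "traceEvents": [
        {"ph": "X", "name": "aten::empty", "ts": 0, "dur": 5},
        {"ph": "X", "name": "aten::fill_", "ts": 5, "dur": 40},
        {"ph": "X", "name": "aten::fill_", "ts": 50, "dur": 35},
        {"ph": "M", "name": "process_name"},  # metadata event, ignored
    ]
}

summary = op_time_by_name(example_trace)
fill = summary.get("aten::fill_", {"count": 0, "dur_us": 0.0})
print(f"aten::fill_: {fill['count']} calls, {fill['dur_us']:.0f} us total")
```

Running `op_time_by_name(load_trace(...))` on a 2.0.1 trace and a 2.1.x trace and comparing the `aten::fill_` entries would show whether the extra fills account for the gap. Note that profiler events can nest, so these totals are inclusive rather than exclusive times.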
Steps to reproduce:
conda env create -f pt201-cu118.yml and conda activate pt201-cu118
python train.py
Versions
cc @ezyang @gchanan @zou3519 @kadeng @ptrblck