[ROCm] enable faster_load_save for Fused_SGD #125456
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125456
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit c4eff75 with merge base b08072f.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@pytorchbot rebase |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Rebase failed due to Command
Raised by https://github.com/pytorch/pytorch/actions/runs/8934168228 |
FYI as of yesterday |
@crcrpar could you take a look at this PR? |
@petrex my approval is conditional on the CI fully passing. Looks like you'll need to manually rebase. |
@pytorchbot rebase |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Successfully rebased |
e14cd11 to c4eff75
@pytorchbot merge |
Merge failed. Reason: Approvers from one of the following sets are needed:
|
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Reopen due to rebase error. Fixes pytorch#117599

The reported hang test `test_cuda.py::TestCuda::test_grad_scaling_autocast_fused_optimizers` is passing with this PR.

HSA Async copy / host wait on completion signal is resolved in MultiTensorApply.cuh

```
:4:command.cpp :347 : 8881368803196 us: [pid:1268211 tid:0x7f5af80d7180] Command (InternalMarker) enqueued: 0xc4e2070
:4:rocvirtual.cpp :556 : 8881368803201 us: [pid:1268211 tid:0x7f5af80d7180] Host wait on completion_signal=0x7f5967df3e00
:3:rocvirtual.hpp :66 : 8881368803209 us: [pid:1268211 tid:0x7f5af80d7180] Host active wait for Signal = (0x7f5967df3e00) for -1 ns
```

Pull Request resolved: pytorch#125456
Approved by: https://github.com/jeffdaily, https://github.com/eqy, https://github.com/janeyx99
Reopen due to rebase error. Fixes #117599
The reported hang test `test_cuda.py::TestCuda::test_grad_scaling_autocast_fused_optimizers` is passing with this PR.
The HSA async copy / host wait on completion signal issue is resolved in MultiTensorApply.cuh.
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang
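
When triaging a hang like the one reported here, it can help to pull the "host wait" entries out of a ROCm runtime log and group them by signal address. The following is a minimal sketch that parses the three log lines quoted in the commit message. The line layout encoded in `LINE_RE` is an assumption inferred only from those three lines, not a documented format, and the names `LogRecord`, `parse_lines`, and `host_waits` are hypothetical helpers introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed layout, inferred only from the three lines quoted in the commit message:
#   :<level>:<file> :<line> : <timestamp> us: [pid:<pid> tid:<tid>] <message>
LINE_RE = re.compile(
    r"^:(?P<level>\d+):(?P<file>\S+)\s*:(?P<line>\d+)\s*:\s*(?P<ts_us>\d+) us:\s*"
    r"\[pid:(?P<pid>\d+)\s+tid:(?P<tid>0x[0-9a-fA-F]+)\]\s*(?P<msg>.*)$"
)

# Matches the two ways a signal address appears in the quoted messages.
SIGNAL_RE = re.compile(r"(?:completion_signal=|Signal = \()(?P<sig>0x[0-9a-fA-F]+)")


@dataclass
class LogRecord:
    level: int
    source: str
    line: int
    ts_us: int
    pid: int
    tid: str
    message: str
    signal: Optional[str]  # signal address if the message mentions one


def parse_lines(lines: Iterable[str]) -> List[LogRecord]:
    """Parse log lines; lines that do not match the assumed layout are skipped."""
    records = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue
        sig = SIGNAL_RE.search(m["msg"])
        records.append(
            LogRecord(
                level=int(m["level"]),
                source=m["file"],
                line=int(m["line"]),
                ts_us=int(m["ts_us"]),
                pid=int(m["pid"]),
                tid=m["tid"],
                message=m["msg"],
                signal=sig["sig"] if sig else None,
            )
        )
    return records


def host_waits(records: Iterable[LogRecord]) -> List[LogRecord]:
    """Keep only the records whose message references a signal address."""
    return [r for r in records if r.signal is not None]


# The three lines quoted in the commit message.
SAMPLE = [
    ":4:command.cpp :347 : 8881368803196 us: [pid:1268211 tid:0x7f5af80d7180] Command (InternalMarker) enqueued: 0xc4e2070",
    ":4:rocvirtual.cpp :556 : 8881368803201 us: [pid:1268211 tid:0x7f5af80d7180] Host wait on completion_signal=0x7f5967df3e00",
    ":3:rocvirtual.hpp :66 : 8881368803209 us: [pid:1268211 tid:0x7f5af80d7180] Host active wait for Signal = (0x7f5967df3e00) for -1 ns",
]

if __name__ == "__main__":
    for rec in host_waits(parse_lines(SAMPLE)):
        print(rec.ts_us, rec.source, rec.signal)
```

On the quoted lines, both wait entries reference the same signal address, which matches the description of the host blocking on a single completion signal. A real log may contain other line shapes that this sketch silently skips.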