DISABLED test_2d_fsdp_tp_ac_compile (__main__.TestDTensorCompileE2E) #113781
Comments
Hello there! From the DISABLED prefix in this issue title, it looks like you are attempting to disable a test in PyTorch CI. The information I have parsed is below:
Within ~15 minutes, To modify the platforms list, please include a line in the issue body, like below. The default action will disable the test for all platforms if no platforms list is specified.
We currently support the following platforms: asan, dynamo, inductor, linux, mac, macos, rocm, slow, win, windows.
Reopening this issue because the test started failing on multigpu after https://hud.pytorch.org/pytorch/pytorch/commit/3e49621f3b4652b8e7782aa8dafb28f9d985598b. The error looks relevant. cc @awgu
@huydhn I am working on a fix!
This is a forward fix for #113781. We lazily compute the hash so that we do not try to compute the hash on `SymInt`s (for the stride) during Dynamo tracing.

Tested via:
```
python test/distributed/_tensor/test_dtensor_compile.py -k test_2d_fsdp_tp_ac_compile
```

Pull Request resolved: #114322
Approved by: https://github.com/wanchaol
ghstack dependencies: #113919, #113924, #114134, #113925, #113930, #114141, #113915, #114140
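For readers unfamiliar with the pattern, the lazy hashing described in the forward fix can be sketched roughly as below. This is only an illustration, not the code from #114322: `LazyHashSpec` and `FakeSymbolic` are hypothetical names, and `FakeSymbolic` is an assumed stand-in for a symbolic stride entry that must not be hashed. The point is that constructing the object never touches the hash of its fields; the hash is computed and cached only when something actually asks for it.

```python
from typing import Optional, Tuple


class FakeSymbolic:
    """Hypothetical placeholder for a symbolic value (assumption, not SymInt)."""

    # Setting __hash__ to None makes instances unhashable.
    __hash__ = None


class LazyHashSpec:
    """Hypothetical spec object; not PyTorch's actual class."""

    def __init__(self, shape: Tuple, stride: Tuple) -> None:
        self.shape = shape
        self.stride = stride
        # The hash is deliberately not computed here, so construction
        # succeeds even if the stride holds unhashable symbolic entries.
        self._hash: Optional[int] = None

    def __hash__(self) -> int:
        # Compute on first use, then reuse the cached value.
        if self._hash is None:
            self._hash = hash((self.shape, self.stride))
        return self._hash

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, LazyHashSpec):
            return NotImplemented
        return self.shape == other.shape and self.stride == other.stride


# Construction works with a symbolic stride because nothing is hashed yet.
traced = LazyHashSpec((2, 3), (FakeSymbolic(), 1))

# With concrete integers the hash is computed once, on demand.
concrete = LazyHashSpec((2, 3), (3, 1))
cache = {concrete: "cached result"}
```

One trade-off of this sketch is that the cached hash goes stale if `shape` or `stride` is reassigned after the first `hash()` call, so a real implementation would need to reset the cache on mutation or treat the fields as immutable.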
Closing this as it should be fixed now.
Platforms: linux
Broken on multigpu
To reenable on your PR, put `Fixes #<this issue number>` in the PR body and add the `ciflow/periodic` tag to trigger multigpu.

Probably caused by #113547 or something in its stack. @wanchaol, do you mind providing a forward fix?
First known bad: https://hud.pytorch.org/pytorch/pytorch/commit/93372455a73043332c16a71cb9dccdf3e0412a57
Last known good: https://hud.pytorch.org/pytorch/pytorch/commit/a1e3c501652101e8b37baac62216db7ca22c9923
Ex. https://github.com/pytorch/pytorch/actions/runs/6863856295/job/18665805628
This test was disabled because it is failing on main branch (recent examples).
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519