pytorch_windows_vs2019_py36_cuda10.1_test1 started to fail frequently, which doesn't look like a regression specific to a particular PR #49558
Comments
The failure often reproduces if one runs InvalidGradients and DeepReentrant back to back:
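A minimal sketch of how such a back-to-back run could be launched, assuming test_api.exe is a googletest binary that accepts the standard `--gtest_filter` and `--gtest_repeat` flags. The helper name `build_gtest_command` is hypothetical, and the wildcard patterns are used so that no test suite name has to be guessed:

```python
from typing import List, Sequence


def build_gtest_command(exe: str, name_patterns: Sequence[str], repeat: int = 1) -> List[str]:
    """Build an argv list that runs only the tests matching the given patterns.

    Hypothetical helper for illustration; it is not part of PyTorch.
    """
    # googletest separates positive filter patterns with ':'. The '*' wildcards
    # match any suite prefix, so the suite names need not be known here.
    gtest_filter = ":".join(f"*{p}*" for p in name_patterns)
    cmd = [exe, f"--gtest_filter={gtest_filter}"]
    if repeat != 1:
        # --gtest_repeat reruns the selected tests, which helps with a flaky failure.
        cmd.append(f"--gtest_repeat={repeat}")
    return cmd


cmd = build_gtest_command("test_api.exe", ["InvalidGradients", "DeepReentrant"], repeat=20)
print(" ".join(cmd))
# The list can be handed to subprocess.run(cmd, cwd=...) on a machine with the build.
```

Note that googletest runs the matching tests in their registration order, so the order of the patterns in the filter does not decide which test runs first.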
DeepReentrant always fails in WinDbg:
If the stack is to be believed, the related logic is somewhere around here: pytorch/torch/csrc/autograd/engine.cpp, line 946 in d20483a
@mszhanyi Anything CUDA-related is an unlikely culprit, as the same pattern of failures happens on CPU; see https://app.circleci.com/pipelines/github/pytorch/pytorch/252793/workflows/21c2af00-79d0-46b4-b7a7-85b9a6463c66/jobs/9719246 for example.
I'm trying to debug this, but I am having trouble with basic things like running test_api.exe (it fails with a missing c10.dll on the command line and seems to do nothing in WinDbg). Is there a doc on how to get this running?
@swolchok, just copy it along with the OpenMP library into the torch/lib folder and run it from there.
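The copy step above can be sketched as a small script. This is only an illustration: the helper name `stage_next_to_dlls` and the example paths in the comments are assumptions, not something taken from the PyTorch build. The approach presumably works because Windows looks for DLLs such as c10.dll in the executable's own directory first.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_next_to_dlls(lib_dir: Path, files: Iterable[Path]) -> List[Path]:
    """Copy each file into lib_dir and return the destination paths.

    Hypothetical helper for illustration; it is not part of PyTorch.
    """
    lib_dir = Path(lib_dir)
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"{lib_dir} is not a directory")
    staged = []
    for src in files:
        dst = lib_dir / Path(src).name
        shutil.copy2(src, dst)  # copies the file contents and metadata
        staged.append(dst)
    return staged


# Example call (paths are assumptions; adjust them to the local checkout):
# stage_next_to_dlls(
#     Path("torch/lib"),
#     [Path("build/bin/test_api.exe"), Path("path/to/openmp_runtime.dll")],
# )
# Afterwards, run test_api.exe from inside torch/lib.
```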
This is likely due to #49359: #49359 (comment)
The linked explanation accounts for the DeepReentrant failure, but what about the other test failures? No idea what's going on there.
@swolchok What do you mean by "other" tests here? If you are talking about the failed tests except
Irrelevant, as we stopped building 10.1 a while back. Closing.
See https://app.circleci.com/pipelines/github/pytorch/pytorch/252588/workflows/8cc3f8b0-41c4-4ca6-a91c-bd68f912ee4b/jobs/9708630 for example
cc @ezyang @gchanan @zou3519 @bdhirsh @seemethere @malfet @walterddr @pytorch/pytorch-dev-infra @peterjc123 @maxluk @nbcsm @guyang3532 @gunandrose4u @mszhanyi @skyline75489