FAQ: Add note about recovering from OOM #35214

Closed
peterbell10 wants to merge 1 commit

peterbell10
Collaborator

Closes #18853

This adds a FAQ note documenting the workaround for the OOM recovery issues reported in #18853.
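For readers without the linked issue at hand, here is a minimal sketch of what such a workaround can look like. It assumes the workaround is the one described in the PyTorch FAQ: do the recovery work outside the `except` clause, because the in-flight exception keeps the failing stack frame (and the tensors it owns) alive. `FakeTensor`, `run_model`, `run_with_fallback`, and the `limit` parameter are hypothetical stand-ins so the example runs without a GPU or `torch`. The weak-reference checks rely on CPython's reference counting.

```python
import weakref


class FakeTensor:
    """Hypothetical stand-in for a large GPU tensor (no torch required)."""


live = []  # weak references to every FakeTensor that has been "allocated"


def run_model(batch_size, limit=4):
    # Hypothetical model step: "allocate" a buffer, then fail if the batch is too big.
    buf = FakeTensor()
    live.append(weakref.ref(buf))
    if batch_size > limit:
        raise RuntimeError("CUDA out of memory (simulated)")
    return batch_size


def run_with_fallback(batch_size):
    oom = False
    try:
        processed = run_model(batch_size)
    except RuntimeError:
        # Only record the failure here. The exception's traceback still
        # references run_model's frame, so its buffer cannot be freed yet.
        oom = True
        held_in_handler = live[-1]() is not None

    if oom:
        # The except clause has ended, so the exception and the frame it
        # referenced are gone and the failed buffer has been released.
        released_after = live[-1]() is None
        processed = sum(run_model(1) for _ in range(batch_size))
        return processed, held_in_handler, released_after
    return processed, None, None


print(run_with_fallback(8))
print(run_with_fallback(2))
```

In real code the recovery branch would retry with smaller batches on the actual model. The point of the sketch is only that the retry happens after the `except` block has exited.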

dr-ci bot commented Mar 23, 2020

💊 CircleCI build failures summary and remediations

As of commit 6637049 (more details on the Dr. CI page):


  • 5/5 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_xenial_py3_6_clang7_build (1/1)

Step: "Build" (full log | pattern match details) <confirmed not flaky by 2 failures>

Mar 23 16:53:06 torch_xla/csrc/aten_xla_type_default.cpp:11446:8: error: no matching member function for call to 'impl_unboxedOnlyKernel'
Mar 23 16:53:06       ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Mar 23 16:53:06 /opt/conda/lib/python3.6/site-packages/torch/include/ATen/core/op_registration/op_registration.h:294:74: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'kernel_func' 
Mar 23 16:53:06     std::enable_if_t<guts::is_function_type<FuncType>::value, Options&&> impl_unboxedOnlyKernel(DispatchKey dispatch_key) && { 
Mar 23 16:53:06                                                                          ^ 
Mar 23 16:53:06 torch_xla/csrc/aten_xla_type_default.cpp:11185:8: error: no matching member function for call to 'impl_unboxedOnlyKernel' 
Mar 23 16:53:06       .impl_unboxedOnlyKernel<at::Tensor &(at::Tensor &, double, double, at::Generator *), &AtenXlaType::normal_>(at::DispatchKey::XLATensorId) 
Mar 23 16:53:06       ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Mar 23 16:53:06 /opt/conda/lib/python3.6/site-packages/torch/include/ATen/core/op_registration/op_registration.h:294:74: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'kernel_func' 
Mar 23 16:53:06     std::enable_if_t<guts::is_function_type<FuncType>::value, Options&&> impl_unboxedOnlyKernel(DispatchKey dispatch_key) && { 
Mar 23 16:53:06                                                                          ^ 
Mar 23 16:53:06 torch_xla/csrc/aten_xla_type_default.cpp:11446:8: error: no matching member function for call to 'impl_unboxedOnlyKernel' 
Mar 23 16:53:06       .impl_unboxedOnlyKernel<at::Tensor(const at::Tensor &, const at::Tensor &, at::Scalar, at::Scalar, bool, at::Generator *), &AtenXlaType::rrelu_with_noise>(at::DispatchKey::XLATensorId) 
Mar 23 16:53:06       ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Mar 23 16:53:06 /opt/conda/lib/python3.6/site-packages/torch/include/ATen/core/op_registration/op_registration.h:294:74: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'kernel_func' 
Mar 23 16:53:06     std::enable_if_t<guts::is_function_type<FuncType>::value, Options&&> impl_unboxedOnlyKernel(DispatchKey dispatch_key) && { 
Mar 23 16:53:06                                                                          ^ 
ackages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/include/THC -I/opt/conda/include/python3.6m -c torch_xla/csrc/batch_norm.cpp -o build/temp.linux-x86_64-3.6/torch_xla/csrc/batch_norm.o -std=c++14 -Wno-sign-compare -Wno-deprecated-declarations -Wno-return-type -Wno-macro-redefined -Wno-return-std-move -DNDEBUG -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_XLAC -D_GLIBCXX_USE_CXX11_ABI=1 
Mar 23 16:53:13 11 errors generated. 
torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/include/THC -I/opt/conda/include/python3.6m -c torch_xla/csrc/layout_manager.cpp -o build/temp.linux-x86_64-3.6/torch_xla/csrc/layout_manager.o -std=c++14 -Wno-sign-compare -Wno-deprecated-declarations -Wno-return-type -Wno-macro-redefined -Wno-return-std-move -DNDEBUG -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_XLAC -D_GLIBCXX_USE_CXX11_ABI=1 
on3.6/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/include/THC -I/opt/conda/include/python3.6m -c torch_xla/csrc/view.cpp -o build/temp.linux-x86_64-3.6/torch_xla/csrc/view.o -std=c++14 -Wno-sign-compare -Wno-deprecated-declarations -Wno-return-type -Wno-macro-redefined -Wno-return-std-move -DNDEBUG -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_XLAC -D_GLIBCXX_USE_CXX11_ABI=1 
te-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/include/THC -I/opt/conda/include/python3.6m -c torch_xla/csrc/nll_loss.cpp -o build/temp.linux-x86_64-3.6/torch_xla/csrc/nll_loss.o -std=c++14 -Wno-sign-compare -Wno-deprecated-declarations -Wno-return-type -Wno-macro-redefined -Wno-return-std-move -DNDEBUG -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_XLAC -D_GLIBCXX_USE_CXX11_ABI=1 

4 failures not recognized by patterns:

Job | Step | Status
CircleCI binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build | Checkout pytorch/builder repo |
CircleCI binary_linux_manywheel_2_7mu_cpu_devtoolset7_build | Checkout pytorch/builder repo |
CircleCI caffe2_onnx_py2_gcc5_ubuntu16_04_build | Build |
CircleCI caffe2_onnx_main_py3_6_clang7_ubuntu16_04_build | Build |

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 2 times.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ezyang merged this pull request in bd0ef78.

Development

Successfully merging this pull request may close these issues.

Inconsistent recovery from CUDA OOMs