
Build failure in PyTorch Windows CI doesn't terminate early enough #4990

Closed
ezyang opened this issue Feb 1, 2018 · 4 comments · Fixed by #8277


ezyang commented Feb 1, 2018

In this log: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-build/1241/consoleFull

The true failure is:

C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-build\aten\src\ATen\native\cudnn\Conv.cpp(1207): error C2398: Element '2': conversion from 'int64_t' to '::size_t' requires a narrowing conversion

However, there are many many pages of scrollback after this, and eventually the thing that terminates the build is:

ninja: error: 'torch/lib/tmp_install/share/ATen/Declarations.yaml', needed by 'torch/csrc/autograd/generated/Functions.cpp', missing and no known rule to make it
Command '['C:\\Jenkins\\workspace\\pytorch-builds\\pytorch-win-ws2016-cuda9-cudnn7-py3-build\\Miniconda3\\lib\\site-packages\\ninja\\data\\bin\\ninja', '-f', 'build\\build.global.ninja']' returned non-zero exit status 1.
-- Building with NumPy bindings
-- Detected cuDNN at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64\cudnn.lib, C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0\include
-- Detected CUDA at C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0
-- Not using NCCL
-- Building without distributed package
Build step 'Execute shell' marked build as failure
Finished: FAILURE

This is not very user friendly: the real error is buried far above the message that finally kills the build.


peterjc123 commented Feb 2, 2018

Ninja has an option that controls this: -k N, which means keep going until N jobs have failed. But k is already set to 1 here, so there are two possible explanations. The first is that the compilation error may not be counted as a build failure. The second is that the error might be raised at the project level, so the build doesn't stop until the whole project has completed. We need to look into this further.

@andrewssobral

Same problem for me... =/

running build_ext
-- Building with NumPy bindings
-- Detected cuDNN at C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0\lib/x64\cudnn.lib, C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0\include
-- Detected CUDA at C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0
-- Not using NCCL
-- Building without distributed package
ninja: error: 'torch/lib/tmp_install/share/ATen/Declarations.yaml', needed by 'torch/csrc/autograd/generated/Functions.cpp', missing and no known rule to make it
Command '['C:\\Anaconda2\\envs\\pytorch\\lib\\site-packages\\ninja\\data\\bin\\ninja', '-f', 'build\\build.global.ninja']' returned non-zero exit status 1.

(pytorch) E:\GitHub\pytorch>

I followed https://gist.github.com/peterjc123/a4ac6ce4b0ed4b1b497334baaeb595e3#file-build-ps1-L52 with the following parameters:

set CMAKE_GENERATOR=Visual Studio 15 2017 Win64
set PYTHON_VERSION=3.6
set DISTUTILS_USE_SDK=1

peterjc123 added a commit to peterjc123/pytorch that referenced this issue Feb 11, 2018

peterjc123 commented Feb 11, 2018

PR #5175 partially resolves this. However, it is currently not possible to stop a build process within a project.


ezyang commented Apr 25, 2018

This happened again for me.

If the last line says linking failed because foo.obj was not available, check whether compilation of that object failed earlier in the log.
