@SimeonEhrig
Contributor

The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. The integration tests are also updated to handle a CUDA SDK under a non-standard location.

Additional diagnostic

To track down the bug, I also added some diagnostic facilities for the PTX compiler.

  1. Error messages are now prefixed in CUDA mode, which makes it easier to tell which of the two compiler pipelines is causing the error.

example

- example before:
error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
error: cannot find libdevice for sm_20. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.

- example after:
cling: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
cling-ptx: error: cannot find libdevice for sm_20. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
cling-ptx: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
  2. The class IncrementalCUDADeviceCompiler is now available through reflection via the gCling object.
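
The prefix selection from item 1 can be pictured with a small standalone sketch. It does not use the clang or cling APIs. `CompilerOptions`, `diagnosticPrefix` and `formatDiagnostic` are hypothetical names chosen for illustration; only the `CUDADevice` flag and the two prefix strings come from this PR.

```cpp
#include <string>

// Hypothetical options struct; only the CUDADevice flag mirrors the PR.
struct CompilerOptions {
  bool CUDADevice = false;  // true for the PTX (device) compiler pipeline
};

// Choose the prefix the same way the changed call site does:
// "cling-ptx" for the device pipeline, "cling" otherwise.
std::string diagnosticPrefix(const CompilerOptions& opts) {
  return opts.CUDADevice ? "cling-ptx" : "cling";
}

// Prepend "<prefix>: " to an already formatted diagnostic line.
std::string formatDiagnostic(const CompilerOptions& opts,
                             const std::string& message) {
  return diagnosticPrefix(opts) + ": " + message;
}
```

With this sketch, a host-side message comes out as `cling: error: ...` and a device-side message as `cling-ptx: error: ...`, matching the "example after" lines above.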

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac1015/cxx17, windows10/cxx14
How to customize builds

@Axel-Naumann
Member

Axel-Naumann commented Oct 30, 2020

@SimeonEhrig great, thanks! Could you check out https://github.com/root-project/roottest.git and modify the expected diagnostics (*.ref files in the corresponding directories)? You can create a PR for roottest, and if you use the same branch name as for this PR, this PR will pick it up when testing! Happy to explain offline if this is too complex / convoluted :-)

@SimeonEhrig
Contributor Author

> @SimeonEhrig great, thanks! Could you check out https://github.com/root-project/roottest.git and modify the expected diagnostics (*.ref files in the corresponding directories)? You can create a PR for roottest, and if you use the same branch name as for this PR, this PR will pick it up when testing! Happy to explain offline if this is too complex / convoluted :-)

I think I understand what you want. Unfortunately, I have no idea what is causing the errors. The tests have nothing to do with my changes, and I cannot reproduce the errors on my system. I suspect the test scripts have problems catching the expected errors.

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1015/cxx17, windows10/cxx14
How to customize builds

@phsft-bot

Build failed on mac1015/cxx17.
Running on macitois17.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2020-12-04T09:00:41.423Z] CMake Error at cmake/modules/SearchInstalledSoftware.cmake:1616 (message):
  • [2020-12-04T09:00:41.424Z] CMake Error at /Volumes/HD2/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1077 (message):

  DiagnosticOptions& DiagOpts = InvocationPtr->getDiagnosticOpts();
  llvm::IntrusiveRefCntPtr<DiagnosticsEngine> Diags =
-   SetupDiagnostics(DiagOpts);
+   SetupDiagnostics(DiagOpts, COpts.CUDADevice ? "cling-ptx" : "cling");
Member

I bet that this change is causing the test failures, because the output now differs from what the tests expect. I am okay with showing cling as part of the diagnostic even for ROOT, but we will need to update the .ref files for the test suite. Can you take care of that, @SimeonEhrig, or shall I?

Contributor Author

You are right. I thought I had implemented it in such a way that it does not affect the output when CUDA mode is disabled, to avoid this kind of problem with the CI, but that was wrong.

@Axel-Naumann
Could you please change the .ref files? I don't have time to get involved at the moment. Please also link the changes in this issue. Then I may understand how it works and can do it myself next time.

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-01-26T18:37:05.247Z] ghprbPullLongDescription=The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. The integration tests are also updated to handle a CUDA SDK under a non-standard location.\r\n\r\n# Additional diagnostic\r\n\r\nTo find the bug, I also add some diagnostic functions for the PTX compiler.\r\n\r\n1. Now error messages are prefixed in CUDA mode to better decide which of the two compiler pipelines is causing the error.\r\n\r\nexample\r\n\r\n- example before:\r\nerror: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.\r\nerror: cannot find libdevice for sm_20. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.\r\nerror: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.\r\n\r\n- example after:\r\ncling: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.\r\ncling-ptx: error: cannot find libdevice for sm_20. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.\r\ncling-ptx: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.\r\n\r\n\r\n2. Now, the class cudaIncrementalDeviceCompiler available through reflection via the gCling object.
  • [2021-01-26T19:15:54.928Z] ghprbPullLongDescription=The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. [...]

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-02-04T17:31:23.761Z] ghprbPullLongDescription=The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. [...]
  • [2021-02-04T18:11:39.761Z] ghprbPullLongDescription=The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. [...]

@Axel-Naumann
Member

The windows "error" is spurious, caused by the error log parser matching the commit log.

- add the prefix "cling" (normal interpreter error) or
"cling-ptx" (ptx interpreter -> just in CUDA mode) to every
interpreter error message

- example before:
error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
error: cannot find libdevice for sm_20. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.

- example after:
cling: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
cling-ptx: error: cannot find libdevice for sm_20. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
cling-ptx: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
- make public that it is accessible via gCling object during the Cling
runtime.
- the custom path passed via `--cuda-path` is now set correctly in the ptx
compiler, allowing the use of CUDA SDKs that are not installed in the
default location
…ult location

- To enable the CUDA tests, lit detects `libcudart.so` in
`LD_LIBRARY_PATH`. Now lit also passes the CUDA SDK root that contains
`libcudart.so` to cling as a parameter (`--cuda-path`) in the tests.
- Pass through the environment variable `CUDA_VISIBLE_DEVICES`.
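
The lookup described in this commit message could look roughly like the sketch below. The real logic lives in the lit configuration and is not shown in this conversation, so this is only an illustration in C++. The function name `findCudaRoot`, the injectable `exists` predicate, and the assumed layout `<sdk-root>/<libdir>/libcudart.so` are all assumptions.

```cpp
#include <filesystem>
#include <functional>
#include <optional>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: scan a colon-separated LD_LIBRARY_PATH value for a
// directory containing libcudart.so and return that directory's parent,
// assuming the layout <sdk-root>/<libdir>/libcudart.so.
// `exists` is injectable so the logic can be exercised without a real SDK.
std::optional<std::string> findCudaRoot(
    const std::string& ldLibraryPath,
    const std::function<bool(const fs::path&)>& exists =
        [](const fs::path& p) { return fs::exists(p); }) {
  std::istringstream entries(ldLibraryPath);
  std::string dir;
  while (std::getline(entries, dir, ':')) {
    // Drop trailing slashes so parent_path() yields the SDK root.
    while (dir.size() > 1 && dir.back() == '/') dir.pop_back();
    if (dir.empty()) continue;
    const fs::path libDir(dir);
    if (exists(libDir / "libcudart.so"))
      return libDir.parent_path().string();
  }
  return std::nullopt;  // no libcudart.so found: CUDA tests stay disabled
}
```

If a root is found, the test driver would append it to the cling command line as the value of `--cuda-path`. If nothing is found, the CUDA tests would simply not be enabled.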
@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-02-05T10:28:28.693Z] ghprbPullLongDescription=The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. [...]
  • [2021-02-05T11:10:55.739Z] ghprbPullLongDescription=The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. [...]

Member

@Axel-Naumann Axel-Naumann left a comment

That inline won't work, I meant to suggest to actually inline :-) See proposed changes.

Comment on lines 749 to 752
IncrementalCUDADeviceCompiler* Interpreter::getCUDACompiler() const {
  return m_CUDACompiler.get();
}

Member

Suggested change
IncrementalCUDADeviceCompiler* Interpreter::getCUDACompiler() const {
  return m_CUDACompiler.get();
}
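
One reading of "actually inline" is to move the accessor body into the class definition in the header, instead of adding the `inline` keyword to the out-of-line definition in the source file. The proposed changes themselves are not visible in this conversation. A minimal sketch of that reading, with an empty stand-in class and a hypothetical constructor flag that are not the real cling declarations:

```cpp
#include <memory>

// Empty stand-in for cling's device compiler class (illustration only).
class IncrementalCUDADeviceCompiler {};

class Interpreter {
  std::unique_ptr<IncrementalCUDADeviceCompiler> m_CUDACompiler;

public:
  // Hypothetical constructor: create the device compiler only in CUDA mode.
  explicit Interpreter(bool cudaMode) {
    if (cudaMode)
      m_CUDACompiler = std::make_unique<IncrementalCUDADeviceCompiler>();
  }

  // Defined inside the class body, so the function is implicitly inline and
  // its definition is visible to every translation unit that includes the
  // header. Returns nullptr when CUDA mode is off.
  IncrementalCUDADeviceCompiler* getCUDACompiler() const {
    return m_CUDACompiler.get();
  }
};
```

By contrast, a function marked `inline` but defined only in one source file has no definition in the other translation units that call it, which typically ends in an unresolved symbol at link time.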

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-02-05T15:51:10.498Z] ghprbPullLongDescription=The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. [...]
  • [2021-02-05T16:35:12.117Z] ghprbPullLongDescription=The cling argument --cuda-path is necessary if the CUDA SDK is not installed under /usr/local/cuda, e.g. on HPC systems. [...]

Failing tests:

@SimeonEhrig
Contributor Author

@Axel-Naumann I applied your suggestions. I think the CI is fine as well. The macOS 11 and CentOS 8 failures do not look related to my changes, and the Windows 10 job has the log parser problem.

@Axel-Naumann
Member

Great, please go ahead and hit "rebase and merge"!

@SimeonEhrig SimeonEhrig merged commit 962e763 into root-project:master Feb 10, 2021
@SimeonEhrig
Contributor Author

> Great, please go ahead and hit "rebase and merge"!

done 😁
