
handle Clang-CUDA #2075

Merged: 8 commits merged into microsoft:main from fsb4000:fix1949 on Jan 6, 2022
Conversation

@fsb4000 (Contributor) commented Jul 24, 2021

Fixes #1949
Fixes DevCom-1572308 / VSO-1441025 / AB#1441025

Clang defines only __CUDACC__; it doesn't define __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, or __CUDACC_VER_BUILD__ at all.
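For illustration, the version check can be made tolerant of this by comparing the version macros only when they are actually defined. A minimal sketch of the idea (not the actual yvals_core.h diff; the version threshold and message below are placeholders):

#ifdef __CUDACC__
#ifdef __CUDACC_VER_MAJOR__ // NVCC defines this; Clang-CUDA does not
#if __CUDACC_VER_MAJOR__ < 10 // placeholder threshold
#error A newer CUDA toolkit is required.
#endif
#endif // Clang-CUDA: no version macros, so skip the comparison
#endif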

Actually, it can't pass our CUDA test: https://github.com/microsoft/STL/blob/main/tests/std/tests/GH_000639_nvcc_include_all/test.compile.pass.cpp

Because Clang adds its own include directory: C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\Llvm\x64\lib\clang\12.0.0\include\cuda_wrappers

[image]

And they don't protect themselves from macroized "new":

// All STL headers should protect themselves from macroized new.
#pragma push_macro("new")
#undef new
#define new WILL NOT COMPILE
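To spell out the pattern being tested: the test poisons "new" before including the STL headers and restores it afterwards, so any header that uses "new" without its own push/undef/pop breaks the build. A rough sketch of the idea (not the verbatim test code):

#pragma push_macro("new") // save the current definition of "new", if any
#undef new
#define new WILL NOT COMPILE // poison: unprotected uses of "new" now fail to compile

#include <vector> // every STL header is expected to push/undef/pop "new" internally

#pragma pop_macro("new") // restore the saved definition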

But Clang-CUDA itself seems to be working:

#include <iostream>

__global__ void axpy(float a, float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}

int main() {
  const int kDataLen = 4;

  float a = 2.0f;
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};
  float host_y[kDataLen];

  // Copy input data to device.
  float* device_x;
  float* device_y;
  cudaMalloc(&device_x, kDataLen * sizeof(float));
  cudaMalloc(&device_y, kDataLen * sizeof(float));
  cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
             cudaMemcpyHostToDevice);

  // Launch the kernel.
  axpy<<<1, kDataLen>>>(a, device_x, device_y);

  // Copy output data to host.
  cudaDeviceSynchronize();
  cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
             cudaMemcpyDeviceToHost);

  // Print the results.
  for (int i = 0; i < kDataLen; ++i) {
    std::cout << "y[" << i << "] = " << host_y[i] << "\n";
  }

  std::cout << "_MSVC_STL_VERSION " << _MSVC_STL_VERSION  << std::endl;

  cudaDeviceReset();
}
clang++ test.cu -o test.exe --cuda-gpu-arch=sm_35 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64"  -lcudart -Wall -Wextra

[image]

@fsb4000 fsb4000 requested a review from a team as a code owner July 24, 2021 09:02
@StephanTLavavej StephanTLavavej added the bug Something isn't working label Jul 27, 2021
@StephanTLavavej StephanTLavavej added this to Initial Review in Code Reviews via automation Jul 27, 2021
@StephanTLavavej StephanTLavavej self-assigned this Aug 4, 2021
@fodinabor commented

The fix works-for-me ™️
Should we add the new macro guards to the internal Clang headers to enable the STL test suite to move this forward?

@fsb4000 (Contributor, Author) commented Sep 15, 2021

We can't add anything to Clang headers. But I could add something like this to __msvc_all_public_headers.hpp:

#if !(defined( __CUDACC__ ) && defined(__clang__))
// All STL headers should protect themselves from macroized new.
#pragma push_macro("new")
#undef new
#define new WILL NOT COMPILE
#endif // !(defined( __CUDACC__ ) && defined(__clang__))

And enable the CUDA test.

I will do that after #2204 is merged.
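(Presumably the matching restore at the end of __msvc_all_public_headers.hpp would need the same condition; a sketch, assuming the guard above is adopted:)

#if !(defined(__CUDACC__) && defined(__clang__))
#pragma pop_macro("new")
#endif // !(defined(__CUDACC__) && defined(__clang__))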

@fodinabor commented Sep 15, 2021

Looks like a fine workaround, but I actually wondered whether it would make sense for me to create a PR for Clang to push/pop the new macro in Clang's new header :)
(It won't be in a released Clang version before early 2022 then, though.)

@mojca commented Nov 25, 2021

Any chance to move this forward?
(I just accidentally reported a duplicate bug.)
The cited PR #2204 has been merged.

I didn't run the tests, so I didn't have any issues with __msvc_all_public_headers.hpp, but I need to keep adding -D_ALLOW_COMPILER_AND_STL_VERSION_MISMATCH to compiler flags to work around the problem.
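(For example, with the compile command from the original report, the workaround amounts to adding that define; the paths below are just the ones quoted above:)

clang++ test.cu -o test.exe -D_ALLOW_COMPILER_AND_STL_VERSION_MISMATCH --cuda-gpu-arch=sm_35 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64" -lcudart -Wall -Wextra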

@Artem-B commented Nov 30, 2021

We can't add anything to clang headers.

We certainly can. It may not help released versions of Clang, but if a reasonable change needs to be made to Clang's headers to make CUDA compilation on Windows work better going forward, I'm open to the idea.

@fsb4000 (Contributor, Author) commented Dec 7, 2021

We can't add anything to clang headers.

We certainly can. It may not help released versions of Clang, but if a reasonable change needs to be made to Clang's headers to make CUDA compilation on Windows work better going forward, I'm open to the idea.

Stephan added the PR to "Nice to have": https://github.com/microsoft/STL/projects/6#card-74026742

And about the "new" macro: as far as I understand, MFC redefines new as a macro, so for MFC and the STL to work together, the STL must remove that macro and then restore it.
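(For reference, MFC's debug builds do roughly the following; shown only to illustrate why the STL push/pops the macro, details vary by MFC version:)

// MFC defines a debug allocation macro, roughly:
#define DEBUG_NEW new(THIS_FILE, __LINE__)
// and MFC-generated source files then macroize the keyword:
#define new DEBUG_NEW
// which is why STL headers wrap their own uses of "new" in
// #pragma push_macro("new") / #undef new / #pragma pop_macro("new").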

But Clang doesn't work with MFC at the moment. https://developercommunity.visualstudio.com/t/ClangCL-and-MFC-Undefined-Behaviour/1541392

And maybe zero people use CUDA + MFC...

So I'm not sure changes to the libc++ headers are worth distracting developers from their current problems.

@Artem-B commented Dec 7, 2021

And about the "new" macro: as far as I understand, MFC redefines new as a macro, so for MFC and the STL to work together, the STL must remove that macro and then restore it.

Adding the guard to cuda_wrappers when we're compiling on Windows should be doable.
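(A sketch of what such a guard in Clang's cuda_wrappers/new could look like; this is an assumption about a possible patch, not the actual upstream change:)

// cuda_wrappers/new (sketch): neutralize a macroized "new" on MSVC targets
#ifdef _MSC_VER
#pragma push_macro("new")
#undef new
#endif

// ... existing wrapper contents: include the real <new>, device operator new/delete ...

#ifdef _MSC_VER
#pragma pop_macro("new")
#endif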

But Clang doesn't work with MFC at the moment. https://developercommunity.visualstudio.com/t/ClangCL-and-MFC-Undefined-Behaviour/1541392
And maybe zero people use CUDA + MFC

My bet is that the vast majority of CUDA code compiled with Clang on Windows will initially be TensorFlow, possibly followed by other ML toolkits. Those do care about a subset of the standard C++ library working, but not so much about MFC (unless Microsoft's STL depends on it). As your example demonstrated, things do work in general. Support for the standard C++ library on the GPU is rather limited anyway: <cmath>, <complex>, <new>, and whatever happens to be constexpr is pretty much the extent of it on the Linux side. As long as roughly the same set works with MSFT's STL, I'd consider it good enough.

Here are the tests I'm running on LLVM's CUDA test bots: https://github.com/llvm/llvm-test-suite/tree/main/External/CUDA
Getting those to compile and work would get you most of the way to a 'quite usable in practice' state.

[Resolved review thread on stl/inc/yvals_core.h (outdated)]
@StephanTLavavej (Member) commented

My apologies for taking so long to review this. I've pushed a comment whitespace change (plus a conflict-free merge with main since a test run is going to happen anyways). I performed an initial audit of the codebase for other Clang-CUDA interactions and filed #2430 to track that, but that doesn't need to block this focused PR from being merged.

@StephanTLavavej StephanTLavavej removed their assignment Dec 18, 2021
@StephanTLavavej StephanTLavavej moved this from Initial Review to Final Review in Code Reviews Dec 18, 2021
@CaseyCarter CaseyCarter moved this from Final Review to Ready To Merge in Code Reviews Dec 18, 2021
@CaseyCarter CaseyCarter self-assigned this Jan 6, 2022
(Commit message fragment:) ...which admits both clang-CUDA and `cl /showIncludes /D__CUDACC__` into the STL.
@CaseyCarter (Member) commented

I'm going to add this to the next batch of changes to merge - please notify me if any further commits are pushed.

@stl: I made a change after you approved. Normally, I would wait for reapproval to merge, but since this addresses a P0 bug I'm going to merge now and we can iron out any issues after you return from vacation.

@StephanTLavavej (Member) commented

@CaseyCarter LGTM. (BTW, I am @StephanTLavavej here.)

@CaseyCarter CaseyCarter merged commit 16251e3 into microsoft:main Jan 6, 2022
Code Reviews automation moved this from Ready To Merge to Done Jan 6, 2022
@CaseyCarter CaseyCarter removed their assignment Jan 6, 2022
@CaseyCarter (Member) commented

Thanks for ensuring that Clang could'a worked with CUDA!

@fsb4000 fsb4000 deleted the fix1949 branch January 6, 2022 07:00
nilsfriess added a commit to nilsfriess/OpenSYCL that referenced this pull request Mar 17, 2023
When compiling the tests with the CUDA backend, we're getting errors
that have been fixed for Visual Studio 2022 here:
microsoft/STL#2075
illuhad pushed a commit to AdaptiveCpp/AdaptiveCpp that referenced this pull request Apr 27, 2023
* Rewrite github action file for windows

- Use windows-2019 image
- Use prebuilt LLVM 15
- Build boost in CI

* Add full path to clang

In the Github runner, another LLVM is installed in C:\Program Files,
we want to use the prebuilt one

* Remove continue-on-error

* Prepend path to prebuilt LLVM to PATH

* Comment out code that does not compile with LLVM13+

TODO: Check if this is still necessary and how to make it work with
LLVM13+

* Link against LLVMSupport on Windows

Required for symbols from LLVM CommandLine library

* Build in Release mode to match build type of LLVM

* Add step to build CPU tests

* Add missing cd command

* Use correct path to hipsycl-config.cmake

* Add missing quotes

* Run CPU tests

* Fix path

* Fix typo

* Use cmd instead of powershell

* Add path to install_dir/bin to PATH to make dlls findable

* Add missing backslash

* Add steps to install CUDA

* Fix typo in CUDA version

* Add step to build tests with CUDA backend

* Build CUDA tests in own directory

* Use different gencode for CUDA tests

* Fix typo in CUDA gencode

* Remove 10.2 from tested CUDA versions

* Only test CUDA 11.0 for now

* Use correct version when caching CUDA

* Switch to Windows Server 2022 as OS

When compiling the tests with the CUDA backend, we're getting errors
that have been fixed for Visual Studio 2022 here:
microsoft/STL#2075

* Use different CUDA gencode again

* Switch from hipSYCL_DIR and HIPSYCL_TARGETS to OpenSYCL_DIR and OPENSYCL_TARGETS

* Fix path to OpenSYCL.config

* Add missing quotes

* Remove step to build CUDA tests

The prebuilt LLVM was not built with CUDA enabled

* Add steps to build with prebuilt clang 11

* Enable calling `setDeviceMangleContext` for LLVM 11 and 12

In fact, this is not available in LLVM 11 (only when patched with
https://reviews.llvm.org/D69322) but the prebuilt LLVM in CI is
exactly a patched LLVM 11.

* Remove clang11 from tested clang versions
Labels: bug (Something isn't working)

Linked issue (closed by merging this pull request): <yvals_core.h>: CUDA version check doesn't handle Clang-CUDA

6 participants