Skip to content

Conversation

@asudarsa
Copy link
Contributor

@asudarsa asudarsa commented Apr 27, 2024

Following are changes in this PR:

  1. Generate new SYCL device library files using new offload model
  2. Pass these new files to clang-linker-wrapper
  3. Avoid use of these files in old offload model
  4. Remove support for bundled objects in clang-linker-wrapper (This can be added back in a cleaner way if there is need for backward compatibility).

Thanks

@asudarsa asudarsa requested review from a team as code owners April 27, 2024 01:21
@asudarsa asudarsa requested a review from againull April 27, 2024 01:21
@asudarsa asudarsa marked this pull request as draft April 27, 2024 01:22
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All recent changes related to unbundling of device objects have been removed. Once we turn on the new offloading model by default, all library developers are expected to recompile their files using the new compiler. This proposal will be discussed with the 'SYCL upstreaming' team and we will add backward compatibility if required in a later PR.

@asudarsa
Copy link
Contributor Author

Test will be added in upcoming commit. Also existing tests might need to be fixed.

@asudarsa
Copy link
Contributor Author

This PR can be closed if #13524 goes in soon. Thanks

@asudarsa asudarsa temporarily deployed to WindowsCILock May 2, 2024 01:02 — with GitHub Actions Inactive
@asudarsa asudarsa temporarily deployed to WindowsCILock May 2, 2024 01:34 — with GitHub Actions Inactive
@maksimsab maksimsab self-requested a review May 2, 2024 13:41
@asudarsa asudarsa marked this pull request as ready for review May 2, 2024 13:51
asudarsa added 6 commits May 2, 2024 08:08
…s using new offload model

Following are changes in this PR:
1. Generate new SYCL device library files using new offload model
2. Pass these new files to clang-linker-wrapper
3. Avoid use of these files in old offload model
4. Remove support for bundled objects in clang-linker-wrapper

Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
… targets

Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
@asudarsa asudarsa force-pushed the build_sycl_device_libs_using_new_offload_model branch from fb0599b to ff0fd35 Compare May 2, 2024 15:20
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
break;
}

// Backend/Assemble actions are not used for the SYCL device side
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change (7883-7887) provided by @mdtoguchi. Thanks Mike. For SYCL offloading, we want -c compilation to NOT invoke backend tools. For new offloading model, we see that the backend tools are being called for -c compilation. This change prevents the backend tools from being invoked for -c compilation.

@asudarsa
Copy link
Contributor Author

asudarsa commented May 2, 2024

merge issues have been resolved. Most of the merge issues were caused when merging #13604
@mdtoguchi, can you please take a look and verify if the merging has happened correctly?

Thanks

@asudarsa
Copy link
Contributor Author

asudarsa commented May 2, 2024

@maksimsab

This is now ready for review.

Thanks

@asudarsa asudarsa temporarily deployed to WindowsCILock May 2, 2024 15:36 — with GitHub Actions Inactive
@asudarsa asudarsa temporarily deployed to WindowsCILock May 2, 2024 16:37 — with GitHub Actions Inactive
// RUN: | FileCheck -check-prefix WRAPPER_OPTIONS %s
// WRAPPER_OPTIONS: clang-linker-wrapper{{.*}} "--triple=spir64"
// WRAPPER_OPTIONS-SAME: "-sycl-device-libraries=libsycl-crt.bc,libsycl-complex.bc,libsycl-complex-fp64.bc,libsycl-cmath.bc,libsycl-cmath-fp64.bc,libsycl-imf.bc,libsycl-imf-fp64.bc,libsycl-imf-bf16.bc,libsycl-itt-user-wrappers.bc,libsycl-itt-compiler-wrappers.bc,libsycl-itt-stubs.bc"
// WRAPPER_OPTIONS-SAME: "-sycl-device-libraries=libsycl-crt.new.o,libsycl-complex.new.o,libsycl-complex-fp64.new.o,libsycl-cmath.new.o,libsycl-cmath-fp64.new.o,libsycl-imf.new.o,libsycl-imf-fp64.new.o,libsycl-imf-bf16.new.o,libsycl-itt-user-wrappers.new.o,libsycl-itt-compiler-wrappers.new.o,libsycl-itt-stubs.new.o"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me to understand this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I have a similar question. Arvind had mentioned supporting raw BC files in the linker wrapper was difficult so we were using fat objects, but in this test we are giving it raw BC files. Is that just a test thing and it wouldn't have actually worked with raw BC files and always required fat objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the older version with .bc files was a recent change introduced by @mdtoguchi when he added change to replace bundled fat devicelib object with .bc files. .bc files can be easily consumed in the old offloading model. However, .bc files cannot be consumed by the current flow in clang-linker-wrapper. It requires packaged fat object files. All .new.o files are devicelib bc files packaged and thus can be consumed easily by the clang-linker-wrapper.

if (IsNVPTX)
if (IsNVPTX) {
LibPostfix = ".cubin";
NewLibPostfix = ".new.cubin";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we miss tests with ".new.cubin".

Copy link
Contributor Author

@asudarsa asudarsa May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.cubin are intermediate filenames that are created in the old offloading driver flow. So, we cannot add tests for this. I will discuss with relevant experts on why we need this renaming.
Thanks

static bool IsSYCLDeviceLibObj(std::string ObjFilePath, bool isMSVCEnv) {
StringRef ObjFileName = llvm::sys::path::filename(ObjFilePath);
StringRef ObjSuffix = isMSVCEnv ? ".obj" : ".o";
StringRef NewObjSuffix = isMSVCEnv ? ".new.obj" : ".new.o";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we miss tests with ".new.obj".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me add them. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new test in upcoming commit. Thanks

for (auto &File : UnbundledDeviceLibFiles)
const llvm::Triple Triple(Args.getLastArgValue(OPT_triple_EQ));
SmallVector<std::string, 16> ExtractedDeviceLibFiles;
for (auto &File : DeviceLibFiles) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move this loop in a dedicated function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we would want to do that? I do not see any benefits as we do not need this functionality elsewhere....Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale was that the function name linkDevice suggests that some linking is happening here while this particular loop is performing some extraction rather than linking. In order to understand that a reader will have to make a cognitive effort to read the actual code.

Anyway, I don't insist that it should be a blocker.

Comment on lines +61 to +62
"-nocudalib"
"--cuda-gpu-arch=sm_50")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify what this change is about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. For SYCL offloading, we generate .bc files for -c compilation for all JIT and AOT targets. For -c compilation with nvptx targets, there is a call to getTargetDeviceOptions(..) at some stage and this relies on an arch being present, thought this arch information is not used in the .bc file generation. We just set it to a dummy arch. This is done in our existing compilation flow as well.

This is some text from our doc:
"The CUDA backend should work on Windows or Linux operating systems with any GPU with compute capability (SM version) sm_50 or above. The default SM version for the NVIDIA CUDA backend is sm_50."

Thanks

@sarnex
Copy link
Contributor

sarnex commented May 3, 2024

Sorry, I didn't review yet but your previous comment says this PR is unneeded if #13524 goes in first which it did, so do you mind explaining what this PR is for given that #13524 is merged? Thanks a lot.

@asudarsa
Copy link
Contributor Author

asudarsa commented May 3, 2024

Sorry, I didn't review yet but your previous comment says this PR is unneeded if #13524 goes in first which it did, so do you mind explaining what this PR is for given that #13524 is merged? Thanks a lot.

Hi @sarnex

That's a good question. I had discussed this with Mike. Let me record it here. PR #13524 added .bc versions of the device library files. Current version of the clang-linker-wrapper is set up to consume 'packaged' device IR and it requires extra work to support unpackaged raw .bc files. Hence, PR #13524 does not help here.
I should have updated my comment accordingly, sorry and thanks for catching this.
Sincerely

@asudarsa
Copy link
Contributor Author

asudarsa commented May 3, 2024

This PR can be closed if #13524 goes in soon. Thanks

Please ignore comment. Reason: #13579 (comment)

Thanks.

Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libdevice/cmake/modules/SYCLLibdevice.cmake
Looks good to me.

@sarnex
Copy link
Contributor

sarnex commented May 3, 2024

@asudarsa Got it, thanks. I will try to review this on Monday.

@asudarsa asudarsa requested a review from sarnex May 3, 2024 19:36
Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mostly, just a few questions

DESTINATION ${install_dest_lib}
COMPONENT libsycldevice)

set(devicelib-obj-file-new-offload ${obj_new_offload_binary_dir}/${obj_filename}.${new-offload-lib-suffix})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand we're generating fat objects because the offload packager isn't set up to take in raw BC files, but probably that is simplier in the long term. so I'm okay with generating fat objects for now but we should have an internal tracker to enhance clang-linker-wrapper to accept raw BC inputs and then we can simplify/remove some stuff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking about this. In my experience, I feel that it's a lot cleaner and in sync with community code to handle packaged files instead of bc files. I suppose there are pros and cons, This is something we can discuss further.

// RUN: | FileCheck -check-prefix WRAPPER_OPTIONS %s
// WRAPPER_OPTIONS: clang-linker-wrapper{{.*}} "--triple=spir64"
// WRAPPER_OPTIONS-SAME: "-sycl-device-libraries=libsycl-crt.bc,libsycl-complex.bc,libsycl-complex-fp64.bc,libsycl-cmath.bc,libsycl-cmath-fp64.bc,libsycl-imf.bc,libsycl-imf-fp64.bc,libsycl-imf-bf16.bc,libsycl-itt-user-wrappers.bc,libsycl-itt-compiler-wrappers.bc,libsycl-itt-stubs.bc"
// WRAPPER_OPTIONS-SAME: "-sycl-device-libraries=libsycl-crt.new.o,libsycl-complex.new.o,libsycl-complex-fp64.new.o,libsycl-cmath.new.o,libsycl-cmath-fp64.new.o,libsycl-imf.new.o,libsycl-imf-fp64.new.o,libsycl-imf-bf16.new.o,libsycl-itt-user-wrappers.new.o,libsycl-itt-compiler-wrappers.new.o,libsycl-itt-stubs.new.o"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I have a similar question. Arvind had mentioned supporting raw BC files in the linker wrapper was difficult so we were using fat objects, but in this test we are giving it raw BC files. Is that just a test thing and it wouldn't have actually worked with raw BC files and always required fat objects?

for (auto &File : InputFiles)
CmdArgs.push_back(File);
for (auto &File : InputFiles) {
auto IRFile = sycl::convertSPIRVToIR(File, Args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this line. Don't we need to check if it's a SPIR-V file? Or can we only hit this with SPV files and not with fat objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convertSPIRVToIR will handle cases where the file is not a SPIR-V file. it will return the original file if it's not a SPIR-V file.
However, I do think it should be possible to check here if the image type is SPIR-V. i will add a TODO to address this soon. Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say we rename the function then, the name sounds like it only accepts SPV and always returns IR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a supporting comment? :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment and a TODO to check for SPIR-V image type and exit early on top of function definition.
Thanks

Signed-off-by: Sudarsanam <arvind.sudarsanam@intel.com>
@asudarsa asudarsa requested review from maksimsab, mdtoguchi and sarnex May 6, 2024 18:27
Signed-off-by: Sudarsanam <arvind.sudarsanam@intel.com>
@asudarsa asudarsa temporarily deployed to WindowsCILock May 6, 2024 19:27 — with GitHub Actions Inactive
@asudarsa asudarsa temporarily deployed to WindowsCILock May 6, 2024 20:46 — with GitHub Actions Inactive
@hdelan
Copy link
Contributor

hdelan commented May 7, 2024

Compilation fails on this branch using:

$ clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_60 test.cpp --offload-new-driver 
clang++: /home/hugh/llvm/clang/lib/Driver/ToolChains/Cuda.cpp:951: virtual void clang::driver::toolchains::CudaToolChain::addClangTargetOptions(const llvm::opt::ArgList&, llvm::opt::ArgStringList&, clang::driver::Action::OffloadKind) const: Assertion `!GpuArch.empty() && "Must have an explicit GPU arch."' failed.
PLEASE submit a bug report to https://github.com/intel/llvm/issues and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_60 test.cpp --offload-new-driver
1.	Compilation construction
2.	Building compilation jobs
3.	Building compilation jobs
4.	Building compilation jobs
5.	Building compilation jobs
6.	Building compilation jobs
7.	Building compilation jobs
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang++   0x00005a943bf4c570 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 240
1  clang++   0x00005a943bf4997f llvm::sys::RunSignalHandlers() + 47
2  clang++   0x00005a943bf49ad5
3  libc.so.6 0x0000759597a42520
4  libc.so.6 0x0000759597a969fc pthread_kill + 300
5  libc.so.6 0x0000759597a42476 raise + 22
6  libc.so.6 0x0000759597a287f3 abort + 211
7  libc.so.6 0x0000759597a2871b
8  libc.so.6 0x0000759597a39e96
9  clang++   0x00005a943cc8d114
10 clang++   0x00005a943cc3c43c
11 clang++   0x00005a943cb78507 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 8167
12 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
13 clang++   0x00005a943cb7bad1
14 clang++   0x00005a943cb9dc34 clang::driver::OffloadAction::doOnEachDeviceDependence(llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)> const&) const + 100
15 clang++   0x00005a943cb766e7 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 455
16 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
17 clang++   0x00005a943cb77804 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 4836
18 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
19 clang++   0x00005a943cb7b5c0
20 clang++   0x00005a943cb9dc34 clang::driver::OffloadAction::doOnEachDeviceDependence(llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)> const&) const + 100
21 clang++   0x00005a943cb776ab clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 4491
22 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
23 clang++   0x00005a943cb77804 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 4836
24 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
25 clang++   0x00005a943cb7c343 clang::driver::Driver::BuildJobs(clang::driver::Compilation&) const + 1699
26 clang++   0x00005a943cb996c3 clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) + 6243
27 clang++   0x00005a943a8f6489 clang_main(int, char**, llvm::ToolContext const&) + 3673
28 clang++   0x00005a943a907f3b main + 107
29 libc.so.6 0x0000759597a29d90
30 libc.so.6 0x0000759597a29e40 __libc_start_main + 128
31 clang++   0x00005a943a8f2ae5 _start + 37
[1]    901760 IOT instruction (core dumped)  clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_60 test.cpp --offload-new-driver

@asudarsa
Copy link
Contributor Author

asudarsa commented May 7, 2024

Compilation fails on this branch using:

$ clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_60 test.cpp --offload-new-driver 
clang++: /home/hugh/llvm/clang/lib/Driver/ToolChains/Cuda.cpp:951: virtual void clang::driver::toolchains::CudaToolChain::addClangTargetOptions(const llvm::opt::ArgList&, llvm::opt::ArgStringList&, clang::driver::Action::OffloadKind) const: Assertion `!GpuArch.empty() && "Must have an explicit GPU arch."' failed.
PLEASE submit a bug report to https://github.com/intel/llvm/issues and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_60 test.cpp --offload-new-driver
1.	Compilation construction
2.	Building compilation jobs
3.	Building compilation jobs
4.	Building compilation jobs
5.	Building compilation jobs
6.	Building compilation jobs
7.	Building compilation jobs
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang++   0x00005a943bf4c570 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 240
1  clang++   0x00005a943bf4997f llvm::sys::RunSignalHandlers() + 47
2  clang++   0x00005a943bf49ad5
3  libc.so.6 0x0000759597a42520
4  libc.so.6 0x0000759597a969fc pthread_kill + 300
5  libc.so.6 0x0000759597a42476 raise + 22
6  libc.so.6 0x0000759597a287f3 abort + 211
7  libc.so.6 0x0000759597a2871b
8  libc.so.6 0x0000759597a39e96
9  clang++   0x00005a943cc8d114
10 clang++   0x00005a943cc3c43c
11 clang++   0x00005a943cb78507 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 8167
12 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
13 clang++   0x00005a943cb7bad1
14 clang++   0x00005a943cb9dc34 clang::driver::OffloadAction::doOnEachDeviceDependence(llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)> const&) const + 100
15 clang++   0x00005a943cb766e7 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 455
16 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
17 clang++   0x00005a943cb77804 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 4836
18 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
19 clang++   0x00005a943cb7b5c0
20 clang++   0x00005a943cb9dc34 clang::driver::OffloadAction::doOnEachDeviceDependence(llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)> const&) const + 100
21 clang++   0x00005a943cb776ab clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 4491
22 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
23 clang++   0x00005a943cb77804 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 4836
24 clang++   0x00005a943cb7b219 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::SmallVector<clang::driver::InputInfo, 4u>, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const, llvm::SmallVector<clang::driver::InputInfo, 4u>>>>&, clang::driver::Action::OffloadKind) const + 457
25 clang++   0x00005a943cb7c343 clang::driver::Driver::BuildJobs(clang::driver::Compilation&) const + 1699
26 clang++   0x00005a943cb996c3 clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) + 6243
27 clang++   0x00005a943a8f6489 clang_main(int, char**, llvm::ToolContext const&) + 3673
28 clang++   0x00005a943a907f3b main + 107
29 libc.so.6 0x0000759597a29d90
30 libc.so.6 0x0000759597a29e40 __libc_start_main + 128
31 clang++   0x00005a943a8f2ae5 _start + 37
[1]    901760 IOT instruction (core dumped)  clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_60 test.cpp --offload-new-driver

Hi @hdelan

thanks so much for taking a look. This is a known issue. We currently support only spir-v JIT targets for the new offload model. AOT support for Intel and non- Intel hardware will be added in a subsequent step. In the meanwhile, I can try to add a more graceful way to exit for such cases. In a separate PR hopefully.

thanks

Copy link
Contributor

@maksimsab maksimsab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some nits.
Apart from that, it looks good.

}
if (!CompatibleBinaryFound)
WithColor::warning(errs(), LinkerExecutable)
<< "Compatible SYCL device library binary not found\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit.
According to LLVM style I would propose the following format:
not found SYCL device library binary compatible with @Triple, where @Triple is the function argument.

// LLVM-SPIRV is not called in dry-run
// CHK-CMDS-NEXT: offload-wrapper: input: [[LLVMSPIRVOUT:.*]].table, output: [[WRAPPEROUT:.*]].bc
// CHK-CMDS-NEXT: "{{.*}}llc.exe" -filetype=obj -o [[LLCOUT:.*]].o [[WRAPPEROUT]].bc
// CHK-CMDS-NEXT: "{{.*}}/ld" -- HOST_LINKER_FLAGS -dynamic-linker HOST_DYN_LIB -o a.out [[LLCOUT]].o HOST_LIB_PATH HOST_STAT_LIB {{.*}}test-sycl.o
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit.
It seems for me that there is no ld linker present in the Windows environment. There is different linker specific to windows - link.exe. It is not relevant to the current PR.

for (auto &File : UnbundledDeviceLibFiles)
const llvm::Triple Triple(Args.getLastArgValue(OPT_triple_EQ));
SmallVector<std::string, 16> ExtractedDeviceLibFiles;
for (auto &File : DeviceLibFiles) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale was that the function name linkDevice suggests that some linking is happening here while this particular loop is performing some extraction rather than linking. In order to understand that a reader will have to make a cognitive effort to read the actual code.

Anyway, I don't insist that it should be a blocker.

@asudarsa
Copy link
Contributor Author

asudarsa commented May 7, 2024

Hi @maksimsab

Thanks for the review.
I will add these minor changes in an upcoming PR.

@asudarsa
Copy link
Contributor Author

asudarsa commented May 7, 2024

Hi @intel/llvm-gatekeepers

This is ready for merge. Can you please take a look?

Thanks

@steffenlarsen steffenlarsen merged commit ece73ad into intel:sycl May 7, 2024
steffenlarsen pushed a commit that referenced this pull request May 8, 2024
…build option (#13692)

This bug was introduced in #13579
Unfortunately CI testing decided not to run cuda testing for the guilty
PR and the PR was merged. This PR resolves the issue.

Thanks

Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants