Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Generalize local accessor to shared mem pass #5149

Merged
merged 8 commits into from Jan 20, 2022

Conversation

jchlanda
Copy link
Contributor

Now, that it lives in SYCLLowerIR it can be easily shared between AMDGCN and NVPTX backends.

This requires the same alignment fix as for Cuda, see: #5113

Fixes #5013

@jchlanda
Copy link
Contributor Author

Looks like the pass is a bit too eager and runs on all amdgcn amdhsa kernels, moving it to WIP, till I update it to be only triggered on SYCL kernels.

@jchlanda jchlanda marked this pull request as draft December 15, 2021 12:20
@jchlanda jchlanda changed the title [SYCL] Generalize local accessor to shared mem pass [WIP] [SYCL] Generalize local accessor to shared mem pass Dec 15, 2021
@jchlanda jchlanda changed the title [WIP] [SYCL] Generalize local accessor to shared mem pass [SYCL] Generalize local accessor to shared mem pass Dec 17, 2021
@jchlanda jchlanda marked this pull request as ready for review December 17, 2021 08:43
@jchlanda
Copy link
Contributor Author

I decided to hide the pass behind sycl-enable-local-accessor option, as there is no way for AMDGCN backend to distinguish if the kernel comes from SYCL. The same is true for NVPTX, but the code gets away with using triple's getOS() == Tripel::CUDA.

@jchlanda jchlanda requested review from a team as code owners January 3, 2022 18:57
AlexeySachkov
AlexeySachkov previously approved these changes Jan 10, 2022
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pass changes LGTM, a couple of minor comments about tests

…-triple.ll

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
AlexeySachkov
AlexeySachkov previously approved these changes Jan 12, 2022
@bader
Copy link
Contributor

bader commented Jan 13, 2022

@intel/dpcpp-clang-driver-reviewers, @intel/llvm-reviewers-runtime, ping.

alexbatashev
alexbatashev previously approved these changes Jan 14, 2022
Copy link
Contributor

@alexbatashev alexbatashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime changes LGTM

@bader
Copy link
Contributor

bader commented Jan 18, 2022

@intel/dpcpp-clang-driver-reviewers, ping.

Copy link
Contributor

@mdtoguchi mdtoguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for Driver

Copy link
Contributor

@alexbatashev alexbatashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime changes lgtm

@bader bader merged commit 58508ba into intel:sycl Jan 20, 2022
@@ -5787,6 +5787,11 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-treat-scalable-fixed-error-as-warning");
}

// Enable local accessor to shared memory pass for SYCL.
if (isa<BackendJobAction>(JA) && IsSYCL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdtoguchi, shouldn't the condition check IsSYCLOffloadDevice instead of IsSYCL?
abi/user_mangling.cpp and regression/fsycl-save-temps.cpp from check-sycl suite fails on my system.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. If this is to be only set for device, then this should be using IsSYCLOffloadDevice, as IsSYCL is for all compilations (host and device) when SYCL device offloading is enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get following error:

clang (LLVM option parsing): Unknown command line argument '-sycl-enable-local-accessor'. Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--enable-local-reassign'?

Honestly, I don't really understand why the options is not visible for FE in host mode, but using IsSYCLOffloadDevice fixed the problem and I don't think running the pass enabled by -sycl-enable-local-accessor is needed in the host mode.
Another mystery is why this issue is not exposed by CI system. I suppose it some how related to the difference in cmake configuration - I don't build NVPTX and AMDGPU targets, which I suppose link the library with "unknown" option.

We definitely need to do more investigation on this issue.

@jchlanda, FYI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional problem: -sycl-enable-local-accessor is only being set when we do some kind of code generation step. As the device compilation does not do this, -sycl-enable-local-accessor is never set for device. It is only being emitted for host compilations as that goes through the assembling step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at a high level, -sycl-enable-local-accessor does only emit for the code generation step with the nvptx64 target. The steps are not combined for nvptx64 allowing the option to only be emitted for device there. Kind of a round-about way to restrict the option, but it can leak out to host if -S is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, @jchlanda is away this week.
@AerialMantis, can someone else to take a look into this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, I'll have someone else take a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a PR up to fix this #5408
I believe the CI did not catch this as it builds for both cuda and hip then runs for each backend, correct me if I am wrong. If you build for CUDA or HIP then -sycl-enable-local-accessor is then usable.
I think that -sycl-enable-local-accessor is not available when not building for CUDA/HIP because the pass is initialised within the llvm NVPTX and AMDGPU backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost glad that this bug surfaced @bader , @mdtoguchi . TBH, -sycl-enable-local-accessor is a workaround, that I never liked. The problem I had was that there wasn't a way to tell that a kernel was compiled from SYCL. Simply relying on the calling convention (CallingConv::AMDGPU_KERNEL) is not enough, as there are multiple paths that use it (OpenCL, OpenMP, SYCL). I was wondering if it would be better to follow NVIDIA here and use metadata nodes to denote kernels (https://github.com/intel/llvm/blob/HEAD/clang/lib/CodeGen/TargetInfo.cpp#L7242). This would work for all the passes that we only want to run on SYCL kernels (for instance we'd like to generalize https://github.com/intel/llvm/blob/sycl/llvm/lib/Target/NVPTX/SYCL/GlobalOffset.cpp would benefit from it).

bader pushed a commit that referenced this pull request Jan 28, 2022
This PR resolves an issue raised in #5149.
It changes the the check from `IsSYCL` to `IsSYCLOffloadDevice` and limits its usage to nvptx and amdgcn.
A new test is added, to prevent regression.
pvchupin pushed a commit that referenced this pull request May 5, 2022
The purpose of this patch is to generalize SYCL global offset pass and enable it for AMDGPU.

* enable global offset in AMD's HIP
* decorate SYCL kernel with dedicated MDNode: This removes the need for command line options added by the SYCL driver, discussed here: [SYCL] Generalize local accessor to shared mem pass #5149 (comment)
* extract common helpers for local accessor and global offset passes
* generalize the pass
* introduce builtin_amdgcn_implicit_offset and enable the pass for ADMGPU
* implement spirv_GlobalOffset_[x,y,z]
* update the docs

The main deviation from the NVPTX is the need for supporting address spaces. For AMD kernel arguments reside in constant address space, which for the case with offset forces a copy to private AS, in order to keep the call-graph interface coherent (we can't allocate const address space for the case without offset).

Corresponding test-suit PR: intel/llvm-test-suite#941
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SYCL][HIP] sssp results mismatch
8 participants