Skip to content

[NATIVECPU] Emit Native CPU properties #19429

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 46 commits into
base: sycl
Choose a base branch
from

Conversation

uwedolinsky
Copy link
Contributor

@uwedolinsky uwedolinsky commented Jul 14, 2025

Extends clang-offload-wrapper and SYCLOffloadWrapper(clang-linker-wrapper), enabling adding kernel properties that are specific only to Native CPU. Adds a compiler pass that checks whether a kernel comes from a sycl::nd-range and adds a Native CPU - only property for it.

This new "kernel property infrastructure" will be extended in the future to encode other kernel capabilities. For example the applied vector width which determines how many workgroups could be executed in one kernel invocation. This could help making the kernel launches more efficient. Another property could encode whether the kernel supports peeling - without peeling the kernel would have less branches and the NativeCPU adapter could schedule peeling invocations (of the scalar kernel) in separate threads which might benefit some performance scenario.

This PR replaces and extends #16152 which still used the old UR api/repo.

This PR also adds testing to ensure the new clang-linker-wrapper integration produces the expected IR on SYCL code.

Update: Since #19550 got merged which removed the unsafe optimization this PR was also trying to make safe, the scope of this PR has now been reduced to emitting properties. The optimization (processing more work items per kernel invocation) removed by #19550 will be added back in a separate PR.

@uwedolinsky uwedolinsky requested review from a team as code owners July 14, 2025 12:38
Comment on lines 1225 to 1226
if (LangOpts.SYCLIsNativeCPU)
llvm::sycl::utils::addSYCLNativeCPUEarlyPasses(MPM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed internally, ideally we'd get rid of both LangOpts.SYCLIsNativeCPU and SYCLNativeCPUBackend and check what target we're building for, but this is blocked on #19344, without that we don't have enough information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #19344 is now merged I've updated the PR and removed the usage of LangOpts.SYCLIsNativeCPU.

@@ -149,7 +149,8 @@ UR_APIEXPORT ur_result_t UR_APICALL urEnqueueKernelLaunch(
bool isLocalSizeOne =
ndr.LocalSize[0] == 1 && ndr.LocalSize[1] == 1 && ndr.LocalSize[2] == 1;
if (isLocalSizeOne && ndr.GlobalSize[0] > numParallelThreads &&
!kernel->hasLocalArgs()) {
!kernel->hasLocalArgs() && !hKernel->isNDRangeKernel()) {
// TODO: Check if !kernel->hasLocalArgs() is needed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You made the logic say that if it has local args, it's automatically necessary an NDRangeKernel. I'm not sure yet whether that's 100% accurate, but if it is, hasLocalArgs() does become redundant.

Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes in BackendUtil.cpp look good to me.

@uwedolinsky uwedolinsky changed the title [NATIVECPU] Emit Native CPU properties (correctness) [NATIVECPU] Emit Native CPU properties Aug 7, 2025
Copy link
Contributor

@maksimsab maksimsab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just noticed that the functionality is duplicated amongst ClangOffloadWrapper.cpp and SYCLOffloadWrapper.cpp. Is that possible to make only one function and use that function in both files?

@uwedolinsky
Copy link
Contributor Author

I've just noticed that the functionality is duplicated amongst ClangOffloadWrapper.cpp and SYCLOffloadWrapper.cpp. Is that possible to make only one function and use that function in both files?

Both wrappers now call the same function for NativeCPU. More (non-NativeCPU) functions could probably be shared between the wrappers if need be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants