Skip to content

Conversation

@bader
Copy link
Contributor

@bader bader commented Nov 7, 2025

Some compile time properties work as a replacement for kernel
attributes. For example, work_group_size semantics must be identical to
sycl::reqd_work_group_size kernel attribute. The problem is kernel
attributes are lowered to LLVM metadata by Clang, but work_group_size
represented as an LLVM attribute.
CompileTimePropertiesPass converts attribute to canonical metadata
representation, but does it late in the opimization pipeline.
This patch moves CompileTimePropertiesPass to the beginning of the
optimization pipeline to keep canonical representation for SYCL kernel
attributes information passes via compile-time properties.

Some compile time properties work as a replacement for kernel
attributes. For example, work_group_size semantics must be identical to
sycl::reqd_work_group_size kernel attribute. The problem is kernel
attributes are lowered to LLVM metadata by Clang, but work_group_size
represented as an LLVM attribute.
CompileTimePropertiesPass converts attribute to canonical metadata
representation, but does it late in the opimization pipeline.
This patch moves CompileTimePropertiesPass to the beginning of the
optimization pipeline to keep canonical representation for SYCL kernel
attributes information passes via compile-time properties.
@bader bader requested review from a team as code owners November 7, 2025 23:23
@bader bader requested a review from steffenlarsen November 7, 2025 23:24
@steffenlarsen
Copy link
Contributor

Discussed shortly offline. It looks like some of the metadata gets lost if added early. An option could be to run the pass twice and separate the transformations into early and late transformations. The kernel properties, i.e. the ones that have OpenCL parallels (for example sycl-work-group-size -> reqd_work_group_size) could be done early while the rest could be done in the slot we apply it today.

@elizabethandrews
Copy link
Contributor

Out of curiosity how/why is metadata lost if pass is earlier?

@steffenlarsen
Copy link
Contributor

Out of curiosity how/why is metadata lost if pass is earlier?

Seemingly it affects the cache control properties the worst. Looks to me like the !spirv.Decorations metadata on the loads and stores are lost if it is added early, which is probably not very surprising. Kernel functions are unlikely to transform drastically, but pointer loads and stores are likely the target of a lot of transformations, and since metadata isn't guaranteed to be preserved it is likely lost along some of those.

@bader
Copy link
Contributor Author

bader commented Nov 20, 2025

@aratajew, can we change the cache control LLVM representation from instruction metadata to something that can survive LLVM optimizations (e.g. llvm.ptr.annotation)?

Today, SYCL compiler emits llvm.ptr.annotation intrinsic annotating pointer with cache control hints, which works well for that purpose, but we have to maintain another LLVM pass to convert the intrinsic to SPIR-V metadata with the restriction to run this pass as close to SPIR-V CodeGen (or LLVM-SPIRV-Translator) as possible. I would prefer to have a single representation in LLVM for cache control hints respected by the LLVM optimizations.

SYCL compiler choice of llvm.ptr.annotation intrinsic might not be the best solution due to semantic definition (https://llvm.org/docs/LangRef.html#llvm-ptr-annotation-intrinsic):

Semantics:
This intrinsic allows annotation of a pointer to an integer with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; transformations preserve annotations on a best-effort basis but are allowed to replace the intrinsic with its first argument without breaking semantics and the intrinsic is completely dropped during instruction selection.

So far, all standard LLVM passes seem to do a good job with preserving annotations. Better alternatives are welcome.

@bader
Copy link
Contributor Author

bader commented Nov 21, 2025

An option could be to run the pass twice and separate the transformations into early and late transformations.

@steffenlarsen, I implemented this suggestion in the 2c8d041.

Copy link
Contributor

@maarquitos14 maarquitos14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, otherwise LGTM.

@aratajew
Copy link
Contributor

@aratajew, can we change the cache control LLVM representation from instruction metadata to something that can survive LLVM optimizations (e.g. llvm.ptr.annotation)?

Do you generate cache control metadata attached to the pointer used by load/store instruction, or to the load/store instruction itself? The Khronos SPIRV-LLVM Translator initially supported only the former approach, which was indeed very prone to being optimized out. However, when the Triton Compiler faced this issue, the solution was this change: KhronosGroup/SPIRV-LLVM-Translator#2587. This change allows cache control metadata to be generated directly on a load/store instruction. The SPIRV-LLVM Translator then automatically generates a dummy GEP and reattaches the metadata to it, ensuring that proper SPIR-V can be generated.

class CompileTimePropertiesPass
: public PassInfoMixin<CompileTimePropertiesPass> {
public:
CompileTimePropertiesPass(bool ConvertCacheControls = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; I think I would have preferred to call it something like EarlyRun or EarlyPass, so if other features are added that care whether it's transformed early or late, the bool still represents their needs.

@bader
Copy link
Contributor Author

bader commented Nov 21, 2025

Do you generate cache control metadata attached to the pointer used by load/store instruction, or to the load/store instruction itself?

We generate cache control metadata attached to the instruction emitting the pointer argument for load/store instruction.

The Khronos SPIRV-LLVM Translator initially supported only the former approach, which was indeed very prone to being optimized out. However, when the Triton Compiler faced this issue, the solution was this change: KhronosGroup/SPIRV-LLVM-Translator#2587. This change allows cache control metadata to be generated directly on a load/store instruction. The SPIRV-LLVM Translator then automatically generates a dummy GEP and reattaches the metadata to it, ensuring that proper SPIR-V can be generated.

@aratajew, thanks for the hint! Let me try it out.

@steffenlarsen, @maarquitos14, @elizabethandrews, for the reviews. I really don't like the current solution with running the pass twice, so I'm going to try attaching the metadata to load/store instructions. If it doesn't work, I'll address your comments for the current patch.

Ideally, I would like Clang's IRGen to emit SPIR-V metadata for cache controls and avoid using CompileTimeProperties pass for that.

@bader bader requested a review from a team as a code owner November 22, 2025 00:25
@bader bader requested a review from sergey-semenov November 22, 2025 00:25

// CHECK: spir_kernel{{.*}}cache_control_read_hint_func
// CHECK: {{.*}}addrspacecast ptr addrspace(1){{.*}}!spirv.Decorations [[RHINT:.*]]
// CHECK: store float 5.500000e+01, ptr addrspace(1) %{{.*}}, !spirv.Decorations ![[RHINT:[0-9]+]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: There is a bug in the test, which I reported here: #20718.
RHINT must be applied to load instructions.

@bader
Copy link
Contributor Author

bader commented Nov 22, 2025

@aratajew, thanks again! New approach works like a charm. The metadata attached to the load/store instructions survives optimization. The least for the test we have in our pre-commit.

@steffenlarsen, I'm not sure if cache control hints feature is covered well. I updated the test checking LLVM IR, but it would be nice to check that hints are applied correctly at SPIR-V level as well. Do we have such tests?

@steffenlarsen
Copy link
Contributor

@steffenlarsen, I'm not sure if cache control hints feature is covered well. I updated the test checking LLVM IR, but it would be nice to check that hints are applied correctly at SPIR-V level as well. Do we have such tests?

https://github.com/KhronosGroup/SPIRV-LLVM-Translator/tree/2f2a95e686e72ec77e6d0dfbf22413cf46c0e338/test/extensions/INTEL/SPV_INTEL_cache_controls has some tests related to the SPIR-V code generation. Is this what you had in mind?

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the changes here, we may need to make sure that the SPIR-V translator is ready to make the proper conversions from the load/store instructions rather than the GEPs. Based on the testing in test/extensions/INTEL/SPV_INTEL_cache_controls it doesn't look like we have testing for such a case.

Tag @MrSidims & @maarquitos14

Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>
@bader
Copy link
Contributor Author

bader commented Nov 24, 2025

@steffenlarsen, I'm not sure if cache control hints feature is covered well. I updated the test checking LLVM IR, but it would be nice to check that hints are applied correctly at SPIR-V level as well. Do we have such tests?

https://github.com/KhronosGroup/SPIRV-LLVM-Translator/tree/2f2a95e686e72ec77e6d0dfbf22413cf46c0e338/test/extensions/INTEL/SPV_INTEL_cache_controls has some tests related to the SPIR-V code generation. Is this what you had in mind?

This is a bare minimum. In addition to these, it might be worth adding a test checking the SPIR-V emitted from SYCL sources. What do you think?

@steffenlarsen
Copy link
Contributor

This is a bare minimum. In addition to these, it might be worth adding a test checking the SPIR-V emitted from SYCL sources. What do you think?

Off the top of my head, it's not something we usually do. Typically we would check the resulting LLVM-IR, then we could have LLVM SPIR-V translator tests that check that uses the LLVM-IR output from the SYCL tests as the input of their tests. I personally like that structure as it separates the responsibilities of the tooling.

@bader
Copy link
Contributor Author

bader commented Nov 24, 2025

This is a bare minimum. In addition to these, it might be worth adding a test checking the SPIR-V emitted from SYCL sources. What do you think?

Off the top of my head, it's not something we usually do. Typically we would check the resulting LLVM-IR, then we could have LLVM SPIR-V translator tests that check that uses the LLVM-IR output from the SYCL tests as the input of their tests. I personally like that structure as it separates the responsibilities of the tooling.

I'm okay with that. If there are no objections, this patch should be ready for merge.
As you mentioned in this comment, SPIR-V translator part is not covered. I rely on @maarquitos14 and/or @MrSidims to add the translator part when they back to work.

I hope integration is validated by the end-to-end tests.

@bader
Copy link
Contributor Author

bader commented Nov 25, 2025

The pre-commit failure is not related to the patch and tracked by #20750.

@bader bader merged commit 02d8168 into intel:sycl Nov 25, 2025
28 of 29 checks passed
@bader bader deleted the compile-time-properties branch November 25, 2025 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants