[SYCL] Split lightweight detail headers out of umbrella paths #21762
Open
koparasy wants to merge 1 commit into intel:sycl
koparasy added a commit to koparasy/llvm that referenced this pull request on Apr 14, 2026
Move Builder out of helpers.hpp into detail/builder.hpp:
- Extract the Builder class and declptr helper into a focused header
that declares only the forward types it actually needs (item, group,
h_item, id, nd_item, range). Device-side SPIR-V built-in access
is kept self-contained via spirv_vars.hpp.
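Why forward declarations are enough for builder.hpp: a builder of this kind is typically a friend factory for types whose constructors are private, and a factory function can be declared while its return type is still incomplete. The sketch below is a simplified, hypothetical illustration (the `demo` namespace, the single-linear-id `item`, and the `createItem` signature are assumptions, not the DPC++ API); `declptr` is modeled as a helper that returns a typed null pointer for use in unevaluated contexts.

```cpp
#include <cstddef>
#include <type_traits>

namespace demo {

// Forward declaration: enough to declare the factory below.
template <int Dims> class item;

namespace detail {

// Typed null pointer for unevaluated contexts such as decltype(*declptr<T>()).
template <typename T> T *declptr() { return static_cast<T *>(nullptr); }

// Factory that is the only code allowed to call item's private constructor.
class Builder {
public:
  // Declared with an incomplete return type; defined after item is complete.
  template <int Dims>
  static constexpr item<Dims> createItem(std::size_t linear_id);
};

} // namespace detail

// Full definition; in a split layout this would live in item's own header.
template <int Dims> class item {
public:
  constexpr std::size_t get_linear_id() const { return linear_id_; }

private:
  constexpr explicit item(std::size_t linear_id) : linear_id_(linear_id) {}
  friend class detail::Builder;
  std::size_t linear_id_;
};

template <int Dims>
constexpr item<Dims> detail::Builder::createItem(std::size_t linear_id) {
  return item<Dims>(linear_id);
}

// declptr never constructs a T, so its result type can be inspected freely.
static_assert(std::is_same_v<decltype(detail::declptr<item<1>>()), item<1> *>,
              "declptr<T>() yields T*");

} // namespace demo
```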
Move SPIR-V fence helpers out of helpers.hpp into
detail/spirv_memory_semantics.hpp:
- getSPIRVMemorySemanticsMask (memory_order and fence_space overloads)
now lives in a header that only pulls spirv_types.hpp, access/access.hpp,
and memory_enums.hpp.
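A minimal sketch of what the two overloads compute, assuming simplified stand-in enums for memory_order and fence_space. The mask bits are the Memory Semantics values from the SPIR-V specification; the function name `memorySemanticsMask` and the choice to pair every fence space with SequentiallyConsistent ordering are assumptions for illustration, not the exact DPC++ code. The point is that the logic needs nothing beyond enum definitions, which is why the new header can stay this small.

```cpp
#include <cstdint>

// Simplified stand-ins for sycl::memory_order and access::fence_space.
enum class memory_order { relaxed, acquire, release, acq_rel, seq_cst };
enum class fence_space { global_space, local_space, global_and_local };

// Memory Semantics bits as defined by the SPIR-V specification.
namespace sem {
constexpr std::uint32_t None = 0x0;
constexpr std::uint32_t Acquire = 0x2;
constexpr std::uint32_t Release = 0x4;
constexpr std::uint32_t AcquireRelease = 0x8;
constexpr std::uint32_t SequentiallyConsistent = 0x10;
constexpr std::uint32_t WorkgroupMemory = 0x100;
constexpr std::uint32_t CrossWorkgroupMemory = 0x200;
} // namespace sem

// Overload 1: ordering bits only, derived from a memory_order.
constexpr std::uint32_t memorySemanticsMask(memory_order order) {
  switch (order) {
  case memory_order::acquire: return sem::Acquire;
  case memory_order::release: return sem::Release;
  case memory_order::acq_rel: return sem::AcquireRelease;
  case memory_order::seq_cst: return sem::SequentiallyConsistent;
  case memory_order::relaxed: break;
  }
  return sem::None;
}

// Overload 2: sequentially consistent ordering plus the storage classes the
// fence covers (global -> CrossWorkgroupMemory, local -> WorkgroupMemory).
constexpr std::uint32_t memorySemanticsMask(fence_space space) {
  std::uint32_t mask = sem::SequentiallyConsistent;
  if (space == fence_space::global_space ||
      space == fence_space::global_and_local)
    mask |= sem::CrossWorkgroupMemory;
  if (space == fence_space::local_space ||
      space == fence_space::global_and_local)
    mask |= sem::WorkgroupMemory;
  return mask;
}
```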
Make detail/helpers.hpp a thin forwarder:
- Include builder.hpp + spirv_memory_semantics.hpp.
- Retain get_or_store<T> and is_power_of_two in-place.
- Drop the forwarding class/enum declarations now in builder.hpp.
- All existing #include <sycl/detail/helpers.hpp> sites continue to
work without modification.
Split sycl/sub_group.hpp into focused detail headers:
- detail/sub_group_core.hpp: sub_group struct with lightweight query
API (get_local_id, get_group_id, leader, etc.) fully inline, plus
forward declarations for the deprecated load/store and barrier
members. Includes only spirv_vars.hpp, access/access.hpp, the narrow
fwd/multi_ptr.hpp forward header, id.hpp, range.hpp, and
memory_enums.hpp. No spirv_ops.hpp, no bit_cast, no generic_type_traits.
- detail/sub_group_extra.hpp: out-of-line definitions of the deprecated
barrier() and barrier(fence_space) methods. Includes spirv_ops.hpp
and spirv_memory_semantics.hpp.
- detail/sub_group_load_store.hpp: the detail::sub_group namespace
block-load/store helpers and the out-of-line definitions for the
deprecated load/store member templates.
- detail/sub_group.hpp: internal aggregator (core + extra + load_store)
for SYCL runtime and extension headers that need the full type.
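The core/load_store split rests on an ordinary C++ property: a member template can be declared inside the class and defined out of line later, and code that never instantiates it never needs the definition or the headers behind it. The single-file sketch below uses a hypothetical `sub_group_like` type (not the real `sycl::sub_group`); the two commented sections stand in for sub_group_core.hpp and sub_group_load_store.hpp.

```cpp
#include <cstddef>

// ---- "core" part: cheap, always included ----
struct sub_group_like {
  // Lightweight queries are defined inline in the class.
  constexpr std::size_t get_local_id() const { return local_id; }
  constexpr bool leader() const { return local_id == 0; }

  // Deprecated, heavier member template: declared here, defined elsewhere.
  template <typename T> constexpr T load(const T *src) const;

  std::size_t local_id = 0;
};

// ---- "load/store" part: included only by code that needs it ----
// Translation units that never call load() do not need this definition,
// nor whatever headers a real implementation of it would depend on.
template <typename T>
constexpr T sub_group_like::load(const T *src) const {
  return src[local_id];
}
```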
Make sycl/sub_group.hpp a thin aggregator:
- Include detail/sub_group_core.hpp + detail/sub_group_extra.hpp +
detail/sub_group_load_store.hpp + nd_item.hpp.
- Keep the out-of-line nd_item::get_sub_group() definition here.
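Keeping `nd_item::get_sub_group()` out of line works because a member function may be declared with an incomplete return type and defined once that type is complete. The sketch below uses hypothetical `nd_item_like` / `sub_group_like` types and an assumed fixed sub-group size of 16; it only illustrates the declaration/definition ordering, not how DPC++ actually computes sub-group ids.

```cpp
#include <cstddef>

struct sub_group_like; // forward declaration only

struct nd_item_like {
  std::size_t local_linear_id;
  // Declaration with an incomplete return type is fine; no body yet.
  constexpr sub_group_like get_sub_group() const;
};

// Would normally come from the lightweight core header.
struct sub_group_like {
  std::size_t local_id;
};

// Would normally live in the aggregating header, after both types are
// complete. The fixed sub-group size of 16 is an assumption for the demo.
constexpr sub_group_like nd_item_like::get_sub_group() const {
  return sub_group_like{local_linear_id % 16};
}
```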
Narrow ext/oneapi/free_function_queries.hpp:
- Include detail/builder.hpp and detail/sub_group_core.hpp directly
instead of the heavier group.hpp + sub_group.hpp umbrella includes,
avoiding the load/store machinery for a header whose only sub_group
use is constructing a default sub_group().
Update include_deps tests to reflect the new include graph.
No ABI or API changes: all deprecated sub_group load/store and barrier
members are preserved with the same signatures. Existing includes of
sycl/sub_group.hpp, detail/helpers.hpp, and free_function_queries.hpp
continue to work unchanged.
Compile-time impact on ext/oneapi/free_function_queries.hpp
-----------------------------------------------------------
Measured with clang -ftime-trace on a device-only SYCL compilation.
Transitive SYCL headers: 36 -> 32; stdlib headers: 17 -> 10.
Headers removed from the include closure of free_function_queries.hpp:
sycl/sub_group.hpp (replaced by sub_group_core.hpp)
sycl/__spirv/spirv_ops.hpp (was pulled by sub_group load/store)
sycl/detail/generic_type_traits.hpp (was pulled by SelectBlockT)
sycl/detail/address_space_cast.hpp (was pulled by dynamic_address_cast)
sycl/bit_cast.hpp (was pulled by block read/write casting)
+ 7 stdlib headers (utility, limits, initializer_list, and friends)
driven out by the above.
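The header counts above are sizes of a transitive include closure. As a hedged illustration of what that measures, this sketch computes the closure over a toy include graph with an iterative depth-first walk; the graph edges and file names are hypothetical and only mimic the before/after shape of this change, not the real SYCL include graph.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Adjacency list: header -> headers it #includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Returns every header reachable from `root` (the transitive include closure),
// excluding `root` itself. Uses an explicit stack instead of recursion.
std::set<std::string> includeClosure(const IncludeGraph &graph,
                                     const std::string &root) {
  std::set<std::string> seen;
  std::vector<std::string> pending{root};
  while (!pending.empty()) {
    const std::string header = pending.back();
    pending.pop_back();
    const auto it = graph.find(header);
    if (it == graph.end())
      continue; // leaf header: no further includes
    for (const std::string &dep : it->second)
      if (seen.insert(dep).second) // visit each header only once
        pending.push_back(dep);
  }
  seen.erase(root); // an include cycle back to root should not count it
  return seen;
}

// Toy graphs with hypothetical file names, shaped like a before/after narrowing.
const IncludeGraph toyBefore = {
    {"queries.hpp", {"umbrella.hpp"}},
    {"umbrella.hpp", {"core.hpp", "load_store.hpp"}},
    {"load_store.hpp", {"spirv_ops.hpp", "bit_cast.hpp"}},
};
const IncludeGraph toyAfter = {
    {"queries.hpp", {"core.hpp"}},
};
```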
Per-header isolated compile time (device-only, spir64):
Header                                 Before    After (#21762)   Delta
free_function_queries.hpp (whole)      109 ms    71 ms            -38 ms (-35%)
sycl/sub_group.hpp                     107 ms    n/a              eliminated
sycl/__spirv/spirv_ops.hpp              45 ms    n/a              eliminated
sycl/detail/generic_type_traits.hpp     50 ms    n/a              eliminated
sycl/detail/sub_group_core.hpp          n/a      52 ms            new (replaces sub_group.hpp)
The layering also sets up a clean future deletion path: if/when the
deprecated load/store API is removed, the work reduces to deleting
sub_group_load_store.hpp, sub_group_extra.hpp, and ~18 declaration
lines in sub_group_core.hpp, with no surgery on a 671-line monolith.
koparasy (author) commented:
@rolandschulz, @sergey-semenov This is part of ongoing work to reduce SYCL header compile-time overhead, especially for JIT / kernel-heavy use cases. I aimed for a balance between improving dependency isolation and keeping the change reviewable. Happy to iterate further if we think additional splitting is worthwhile.
Branch updated: 7b32e9c to ce7422f
sergey-semenov approved these changes on Apr 16, 2026
Split general-purpose utility umbrellas into narrow internal headers so users that only need one helper stop paying for unrelated machinery. These changes save around 30 ms when building `sycl/ext/oneapi/free_function_queries.hpp` by removing 17 header includes (and their transitive dependencies) that were not actually needed to build `group.hpp`.

* detail/assert.hpp: extracted the __SYCL_ASSERT macro from common.hpp into its own minimal header; common.hpp now includes it.
* detail/loop.hpp: extracted detail::loop / loop_impl from helpers.hpp into a standalone header; retargeted accessor.hpp, group_algorithm.hpp, detail/builtins/builtins.hpp, and source/builtins/host_helper_macros.hpp to include the narrow helper directly.
* detail/nd_loop.hpp: extracted NDLoop, NDLoopIterateImpl, and InitializedVal from common.hpp; rewired cg_types.hpp and group.hpp to include nd_loop.hpp rather than the heavier common.hpp.
* detail/device_info_types.hpp: moved uuid_type / luid_type out of the broad type_traits.hpp into a dedicated device-info header; included that header from info/info_desc.hpp and relaxed the runtime check in device_impl.hpp to size + trivially-copyable requirements so the move stays source-compatible.
* group.hpp: replaced common.hpp / generic_type_traits.hpp / type_traits.hpp / item.hpp with the new narrow headers; added a private convertToOpenCLGroupAsyncCopyPtr helper that inlines the OpenCL pointer-conversion logic without pulling in the full generic conversion machinery.
* detail/async_work_group_copy_ptr.hpp: new narrow header providing async_copy_elem_type<T> and convertToOpenCLGroupAsyncCopyPtr. Its dependencies are access/access.hpp, fwd/half.hpp, fwd/multi_ptr.hpp, `<stdint.h>`, `<cstddef>`, and `<type_traits>`, all of which are already required by any async_work_group_copy caller, so no transitive cost is added. Uses std::make_signed_t / std::make_unsigned_t instead of hand-rolled fixed-width alias chains.
* detail/type_traits/bool_traits.hpp: new narrow header providing is_scalar_bool, is_vector_bool, is_bool, and change_base_type_t. It depends only on vec_marray_traits.hpp + `<type_traits>`, so it does not pull in the heavier type_traits.hpp chain. type_traits.hpp includes it and removes its own duplicate definitions, so existing callers are unaffected.
* group.hpp: removed the inline group_async_copy_opencl_type family and convertToOpenCLGroupAsyncCopyPtr; included the new header instead. Dropped the now-unnecessary fwd/half.hpp, `<cstddef>`, and bfloat16 forward declaration (all moved into the new header).
* nd_item.hpp: replaced `#include <sycl/detail/generic_type_traits.hpp>` (which pulled in aliases.hpp, bit_cast.hpp, and limits) with the new narrow header; replaced the ConvertToOpenCLType_t + DestT(ptr.get()) pattern with convertToOpenCLGroupAsyncCopyPtr(ptr) at all four call sites.
* test/include_deps/deps_known.sh: added a sed rule for the unified-runtime/ subdirectory so ur_api.h and ur_api_funcs.def are stripped to bare filenames rather than emitted as absolute build paths.
* test/include_deps/*.cpp: regenerated all golden files to reflect the updated include graphs and the deps_known.sh fix.
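The detail::loop helper moved into detail/loop.hpp is a compile-time unrolled loop, which is why it can live in a header with almost no dependencies. The sketch below assumes it has roughly this shape (an `index_sequence` expanded through a fold expression, passing each index as `std::integral_constant`); the `sketch` namespace and exact signatures are assumptions, not a copy of the DPC++ header. It requires C++17.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

namespace sketch {

// Expands to f(integral_constant<size_t, 0>{}), ..., f(integral_constant<size_t, N-1>{})
// with a comma fold; the void cast guards against an overloaded operator,.
template <typename F, std::size_t... Is>
constexpr void loop_impl(std::index_sequence<Is...>, F &&f) {
  (static_cast<void>(f(std::integral_constant<std::size_t, Is>{})), ...);
}

// Calls f once per index in [0, N); each index is a distinct compile-time constant.
template <std::size_t N, typename F>
constexpr void loop(F &&f) {
  loop_impl(std::make_index_sequence<N>{}, std::forward<F>(f));
}

} // namespace sketch
```

Because each `i` passed to the callback is an `integral_constant`, `decltype(i)::value` is a constant expression inside the body and can be used as a template argument, for example `std::get<decltype(i)::value>(tuple)`.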
Branch updated: 576669c to 1a2f48a
koparasy (author) commented:
Hi @intel/dpcpp-esimd-reviewers, gentle ping on this PR. This is part of the ongoing SYCL header compile-time reduction work. @rolandschulz is away AFAIK. I would appreciate any feedback when you have time.
This PR breaks up several general-purpose “umbrella” headers into smaller, focused internal headers. The goal is to avoid pulling in unrelated machinery when only a subset of utilities is needed.
Motivation
Many helpers live in broad headers (common.hpp, helpers.hpp, type_traits.hpp) that pull in dependencies needed by only a subset of the utilities they provide.
This increases frontend compile time, especially in JIT / kernel-language use cases.
This change reduces compile time by ~30 ms when building free_function_queries.hpp.
Key Changes