[SYCL] Split lightweight detail headers out of umbrella paths#21762

Open
koparasy wants to merge 1 commit intointel:syclfrom
koparasy:compile-time/split-lightweight-detail-headers

Conversation

@koparasy
Contributor

@koparasy koparasy commented Apr 14, 2026

This PR breaks up several general-purpose “umbrella” headers into smaller, focused internal headers. The goal is to avoid pulling in unrelated machinery when only a subset of utilities is needed.

Motivation

Many helpers live in broad headers (common.hpp, helpers.hpp, type_traits.hpp) that pull in dependencies needed by only a subset of the helpers they provide.
This increases frontend compile time, especially in JIT / kernel-language use cases.
This change reduces compile time by ~30ms when building free_function_queries.hpp.

Key Changes

  • Extracted narrow headers
    • detail/assert.hpp → __SYCL_ASSERT
    • detail/loop.hpp → loop helpers from helpers.hpp
    • detail/nd_loop.hpp → ND loop utilities from common.hpp
    • detail/device_info_types.hpp → uuid_type, luid_type
    • detail/type_traits/bool_traits.hpp → bool-related traits
  • Async copy cleanup
    • Introduced detail/async_work_group_copy_ptr.hpp
    • Centralizes async_copy_elem_type and pointer conversion logic
    • Removes duplicated OpenCL conversion helpers
  • Header dependency tightening
    • Replaced broad includes in group.hpp and nd_item.hpp with narrow headers
    • Removed redundant forward declarations and unused includes
  • Tests
    • Updated include-dependency tests
    • Fixed path normalization in deps_known.sh

@koparasy koparasy requested review from a team and rolandschulz as code owners April 14, 2026 13:33
@koparasy koparasy requested a review from sergey-semenov April 14, 2026 13:33
koparasy added a commit to koparasy/llvm that referenced this pull request Apr 14, 2026
Move Builder out of helpers.hpp into detail/builder.hpp:
- Extract the Builder class and declptr helper into a focused header
  that declares only the forward types it actually needs (item, group,
  h_item, id, nd_item, range). Device-side SPIR-V built-in access
  is kept self-contained via spirv_vars.hpp.

Move SPIR-V fence helpers out of helpers.hpp into
detail/spirv_memory_semantics.hpp:
- getSPIRVMemorySemanticsMask (memory_order and fence_space overloads)
  now lives in a header that only pulls spirv_types.hpp, access/access.hpp,
  and memory_enums.hpp.

Make detail/helpers.hpp a thin forwarder:
- Include builder.hpp + spirv_memory_semantics.hpp.
- Retain get_or_store<T> and is_power_of_two in-place.
- Drop the forwarding class/enum declarations now in builder.hpp.
- All existing #include <sycl/detail/helpers.hpp> sites continue to
  work without modification.

Split sycl/sub_group.hpp into focused detail headers:
- detail/sub_group_core.hpp: sub_group struct with lightweight query
  API (get_local_id, get_group_id, leader, etc.) fully inline, plus
  forward declarations for the deprecated load/store and barrier
  members. Includes only spirv_vars.hpp, access/access.hpp, the narrow
  fwd/multi_ptr.hpp forward header, id.hpp, range.hpp, and
  memory_enums.hpp. No spirv_ops.hpp, no bit_cast, no generic_type_traits.
- detail/sub_group_extra.hpp: out-of-line definitions of the deprecated
  barrier() and barrier(fence_space) methods. Includes spirv_ops.hpp
  and spirv_memory_semantics.hpp.
- detail/sub_group_load_store.hpp: the detail::sub_group namespace
  block-load/store helpers and the out-of-line definitions for the
  deprecated load/store member templates.
- detail/sub_group.hpp: internal aggregator (core + extra + load_store)
  for SYCL runtime and extension headers that need the full type.

Make sycl/sub_group.hpp a thin aggregator:
- Include detail/sub_group_core.hpp + detail/sub_group_extra.hpp +
  detail/sub_group_load_store.hpp + nd_item.hpp.
- Keep the out-of-line nd_item::get_sub_group() definition here.

Narrow ext/oneapi/free_function_queries.hpp:
- Include detail/builder.hpp and detail/sub_group_core.hpp directly
  instead of the heavier group.hpp + sub_group.hpp umbrella includes,
  avoiding the load/store machinery for a header whose only sub_group
  use is constructing a default sub_group().

Update include_deps tests to reflect the new include graph.

No ABI or API changes: all deprecated sub_group load/store and barrier
members are preserved with the same signatures. The existing
free_function_queries.hpp remains a public entry point.

Compile-time impact on ext/oneapi/free_function_queries.hpp
-----------------------------------------------------------
Measured with clang -ftime-trace on a device-only SYCL compilation.
Transitive SYCL headers: 36 -> 32; stdlib headers: 17 -> 10.

Headers removed from the include closure of free_function_queries.hpp:
  sycl/sub_group.hpp          (replaced by sub_group_core.hpp)
  sycl/__spirv/spirv_ops.hpp  (was pulled by sub_group load/store)
  sycl/detail/generic_type_traits.hpp  (was pulled by SelectBlockT)
  sycl/detail/address_space_cast.hpp   (was pulled by dynamic_address_cast)
  sycl/bit_cast.hpp           (was pulled by block read/write casting)
  + 7 stdlib headers (utility, limits, initializer_list, and friends)
    driven out by the above.

Per-header isolated compile time (device-only, spir64):

  Header                                   PR: intel#21762      DEV       Delta
  free_function_queries.hpp (whole)            109 ms     71 ms    -38 ms (-35%)
  sycl/sub_group.hpp                           107 ms     n/a       eliminated
  sycl/__spirv/spirv_ops.hpp                    45 ms     n/a       eliminated
  sycl/detail/generic_type_traits.hpp           50 ms     n/a       eliminated
  sycl/detail/sub_group_core.hpp                  n/a     52 ms     new (replaces sub_group.hpp)

The layering also sets up a clean future deletion path: if/when the
deprecated load/store API is removed, the work reduces to deleting
sub_group_load_store.hpp, sub_group_extra.hpp, and ~18 declaration
lines in sub_group_core.hpp — no surgery on a 671-line monolith.
@koparasy
Contributor Author

@rolandschulz, @sergey-semenov This is part of ongoing work to reduce SYCL header compile-time overhead, especially for JIT / kernel-heavy use cases.

I aimed for a balance between improving dependency isolation and keeping the change reviewable. Happy to iterate further if we think additional splitting is worthwhile.

@koparasy koparasy changed the title SYCL: split lightweight detail headers out of umbrella paths [SYCL] split lightweight detail headers out of umbrella paths Apr 16, 2026
@koparasy koparasy force-pushed the compile-time/split-lightweight-detail-headers branch from 7b32e9c to ce7422f Compare April 16, 2026 14:03
@sergey-semenov sergey-semenov changed the title [SYCL] split lightweight detail headers out of umbrella paths [SYCL] Split lightweight detail headers out of umbrella paths Apr 16, 2026
Split general-purpose utility umbrellas into narrow internal headers
so users that only need one helper stop paying for unrelated machinery.
These changes save around 30 ms when building `sycl/ext/oneapi/free_function_queries.hpp`.
The saving comes from removing 17 header includes (and their transitive dependencies)
that were not actually needed to build `group.hpp`.

* detail/assert.hpp: extracted __SYCL_ASSERT macro from common.hpp
  into its own minimal header; common.hpp now includes it.

* detail/loop.hpp: extracted detail::loop / loop_impl from helpers.hpp
  into a standalone header; retargeted accessor.hpp, group_algorithm.hpp,
  detail/builtins/builtins.hpp, and source/builtins/host_helper_macros.hpp
  to include the narrow helper directly.

* detail/nd_loop.hpp: extracted NDLoop, NDLoopIterateImpl, and
  InitializedVal from common.hpp; rewired cg_types.hpp and group.hpp
  to include nd_loop.hpp rather than the heavier common.hpp.

* detail/device_info_types.hpp: moved uuid_type / luid_type out of the
  broad type_traits.hpp into a dedicated device-info header; included
  that header from info/info_desc.hpp and relaxed the runtime check in
  device_impl.hpp to size + trivially-copyable requirements so the move
  stays source-compatible.

* group.hpp: replaced common.hpp / generic_type_traits.hpp /
  type_traits.hpp / item.hpp with the new narrow headers; added a
  private convertToOpenCLGroupAsyncCopyPtr helper that inlines the
  OpenCL pointer-conversion logic without pulling in the full generic
  conversion machinery.

* detail/async_work_group_copy_ptr.hpp: new narrow header providing
  async_copy_elem_type<T> and convertToOpenCLGroupAsyncCopyPtr.
  Dependencies are access/access.hpp, fwd/half.hpp, fwd/multi_ptr.hpp,
  <stdint.h>, <cstddef>, <type_traits> — all already required by
  any async_work_group_copy caller, so zero transitive cost is added.
  Uses std::make_signed_t / std::make_unsigned_t instead of hand-rolled
  fixed-width alias chains.

* detail/type_traits/bool_traits.hpp: new narrow header providing
  is_scalar_bool, is_vector_bool, is_bool, change_base_type_t.
  Depends only on vec_marray_traits.hpp + <type_traits>, so it does
  not pull in the heavier type_traits.hpp chain.  type_traits.hpp
  includes it and removes its own duplicate definitions, so existing
  callers are unaffected.

* group.hpp: remove inline group_async_copy_opencl_type family and
  convertToOpenCLGroupAsyncCopyPtr; include the new header instead.
  Drop now-unnecessary fwd/half.hpp, <cstddef>, and bfloat16 forward
  declaration (all moved into the new header).

* nd_item.hpp: replace #include <generic_type_traits.hpp> (which pulled
  in aliases.hpp, bit_cast.hpp, limits) with the new narrow header;
  replace ConvertToOpenCLType_t + DestT(ptr.get()) pattern with
  convertToOpenCLGroupAsyncCopyPtr(ptr) at all four call sites.

* test/include_deps/deps_known.sh: add sed rule for the
  unified-runtime/ subdirectory so ur_api.h and ur_api_funcs.def are
  stripped to bare filenames rather than emitting absolute build paths.

* test/include_deps/*.cpp: regenerated all golden files to reflect the
  updated include graphs and the deps_known.sh fix.
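
One way to picture the relaxed device_info_types.hpp check (size plus trivially-copyable instead of an exact type match) is the sketch below. The `uuid_type` alias shape, `backend_uuid`, `bytewise_compatible_v`, and `copy_bytes` are all assumptions made for illustration; the actual check in device_impl.hpp may look different.

```cpp
#include <array>
#include <cstring>
#include <type_traits>

namespace sketch {

// Assumed shape of the moved alias: a 16-byte array.
using uuid_type = std::array<unsigned char, 16>;

// Hypothetical backend-side representation of the same 16 bytes.
struct backend_uuid {
  unsigned char bytes[16];
};

// Relaxed requirement: the two types need not be identical, only the
// same size and both safe to copy byte by byte.
template <class A, class B>
inline constexpr bool bytewise_compatible_v =
    sizeof(A) == sizeof(B) && std::is_trivially_copyable_v<A> &&
    std::is_trivially_copyable_v<B>;

// Copies the object representation of src into a new To value.
template <class To, class From> To copy_bytes(const From &src) {
  static_assert(bytewise_compatible_v<To, From>,
                "types must match in size and be trivially copyable");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

} // namespace sketch
```

With a check of this kind, moving the alias to a different header does not break callers as long as the size and copyability of the type stay the same.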
@koparasy koparasy force-pushed the compile-time/split-lightweight-detail-headers branch from 576669c to 1a2f48a Compare April 16, 2026 20:33
@koparasy
Contributor Author

Hi @intel/dpcpp-esimd-reviewers, gentle ping on this PR. This is part of the ongoing SYCL header compile-time reduction work. @rolandschulz is away AFAIK. I would appreciate any feedback when you have time.

Contributor

@sarnex sarnex left a comment

invoke_simd lgtm
