aelovikov-intel
Contributor

The change improves "host" compilation times for cases with multiple kernels.

…antiations

The change improves "host" compilation times for cases with multiple
kernels.
@uditagarwal97
Contributor

The change improves "host" compilation times for cases with multiple kernels.

Out of curiosity, do you happen to have performance improvement numbers? If so, please share them.

@aelovikov-intel
Contributor Author

The change improves "host" compilation times for cases with multiple kernels.

Out of curiosity, do you happen to have performance improvement numbers? If so, please share them.

14.4s -> 8.6s for $ time clang++ -fsycl a.cpp -o /dev/null on something like

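  #include <sycl/sycl.hpp> // assumed include; not shown in the original snippet

  int main() {
    sycl::queue q; // assumed; `q` was left undeclared in the original snippet
    int *p;        // left uninitialized as in the original; only compile time is measured
    // Two nested compile-time loops instantiate 2 * 200 = 400 distinct kernels.
    sycl::detail::loop<2>([&](auto outer_idx) {
      sycl::detail::loop<200>([&](auto idx) {
        auto krn = [=]() {
          *p = 42;
        };
        auto s = [&](sycl::handler &cgh) {
          // sycl::detail::CheckDeviceCopyable<decltype(krn)>();
          static_assert(std::is_invocable_r_v<void, decltype(krn)>);
          static_assert(!std::is_invocable_r_v<void, decltype(krn), sycl::handler>);
          static_assert(!std::is_invocable_r_v<void, decltype(krn), sycl::kernel_handler>);
          // krn();
          cgh.single_task(krn);
        };
        (void)sycl::detail::type_erased_cgfo_ty{s};
        static_assert(std::is_invocable_r_v<void, decltype(s), sycl::handler &>);
        q.submit(s);
      });
    });
  }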

@bader
Contributor

bader commented Mar 25, 2025

14.4s -> 8.6s for $ time clang++ -fsycl a.cpp -o /dev/null on something like

@aelovikov-intel, are you using a debug build of the compiler? All these numbers seem to be too high.

@aelovikov-intel
Contributor Author

14.4s -> 8.6s for $ time clang++ -fsycl a.cpp -o /dev/null on something like

@aelovikov-intel, are you using a debug build of the compiler? All these numbers seem to be too high.

It's 400 kernels, see compile-time loops.
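
To make the kernel count concrete, here is a minimal standard C++17 sketch of the mechanism, without SYCL. `ct_loop` is a hypothetical stand-in for `sycl::detail::loop` (the real implementation may differ), and `count_distinct_kernel_types` is an illustrative helper, not part of DPC++. The point is only that each iteration passes a different `std::integral_constant` type, so the generic lambda body is instantiated again and the `krn` closure inside it gets a new type every time.

```cpp
#include <cstddef>
#include <set>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical stand-in for sycl::detail::loop: invokes f once per index,
// passing std::integral_constant<std::size_t, I> so that every call has a
// distinct argument type.
template <typename F, std::size_t... Is>
void ct_loop_impl(F &&f, std::index_sequence<Is...>) {
  (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t N, typename F>
void ct_loop(F &&f) {
  ct_loop_impl(std::forward<F>(f), std::make_index_sequence<N>{});
}

// Illustrative helper: counts how many distinct closure types the nested
// loops produce for `krn`.
std::size_t count_distinct_kernel_types() {
  std::set<std::type_index> seen;
  ct_loop<2>([&](auto outer_idx) {
    ct_loop<200>([&](auto idx) {
      // Declared inside a generic lambda, so each instantiation of the
      // enclosing body yields a separate closure type.
      auto krn = [=]() { return outer_idx.value * 200 + idx.value; };
      seen.insert(std::type_index(typeid(krn)));
    });
  });
  return seen.size();
}
```

Calling `count_distinct_kernel_types()` should report 2 * 200 = 400 distinct types. In the SYCL snippet, each such type presumably triggers its own set of `single_task` and submission-related template instantiations on the host, which would explain why per-kernel host cost adds up.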

@bader
Contributor

bader commented Mar 26, 2025

14.4s -> 8.6s for $ time clang++ -fsycl a.cpp -o /dev/null on something like

@aelovikov-intel, are you using a debug build of the compiler? All these numbers seem to be too high.

It's 400 kernels, see compile-time loops.

@aelovikov-intel, would you mind checking if this change improves the compile time of SYCL-CTS, please? SYCL-CTS compilation time with the DPC++ compiler on the GitHub runner exceeds the limit. I wonder if this change helps to fix this problem.

aelovikov-intel merged commit 42990a6 into intel:sycl on Mar 26, 2025
44 of 46 checks passed
aelovikov-intel deleted the host-kernel-compile-time branch on March 26, 2025 17:40
@aelovikov-intel
Contributor Author

14.4s -> 8.6s for $ time clang++ -fsycl a.cpp -o /dev/null on something like

@aelovikov-intel, are you using a debug build of the compiler? All these numbers seem to be too high.

It's 400 kernels, see compile-time loops.

@aelovikov-intel, would you mind checking if this change improves the compile time of SYCL-CTS, please? SYCL-CTS compilation time with the DPC++ compiler on the GitHub runner exceeds the limit. I wonder if this change helps to fix this problem.

Surprisingly, it might. 13m -> 10.5m on SPR+PVC, where a few tests are bottlenecks. On a less powerful system it might be even more impactful.

That said, I'm not sure how stable/reproducible the gain is.

@bader
Contributor

bader commented Mar 26, 2025

14.4s -> 8.6s for $ time clang++ -fsycl a.cpp -o /dev/null on something like

@aelovikov-intel, are you using a debug build of the compiler? All these numbers seem to be too high.

It's 400 kernels, see compile-time loops.

@aelovikov-intel, would you mind checking if this change improves the compile time of SYCL-CTS, please? SYCL-CTS compilation time with the DPC++ compiler on the GitHub runner exceeds the limit. I wonder if this change helps to fix this problem.

Surprisingly, it might. 13m -> 10.5m on SPR+PVC, where a few tests are bottlenecks. On a less powerful system it might be even more impactful.

That said, I'm not sure how stable/reproducible the gain is.

Thanks! I expect tests checking math built-ins and vector operations to benefit from this change. These tests auto-generate a lot of kernels.

3 participants