@lslusarczyk
Contributor

Optimized the common path where the kernel is already built and exists in the cache. Use the BuildStatus already provided by compare_exchange_strong instead of taking the lock, calling wait, and reading the status again.
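
For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the actual code from this PR. The names `CacheEntry`, `BuildState`, and `getOrBuild` are hypothetical, and an `int` stands in for the built kernel. It relies on the fact that when `compare_exchange_strong` fails, it writes the currently stored value into its `expected` argument, so the "already built" case can return without touching the mutex or condition variable.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical build states; the real cache may use different names/values.
enum class BuildState { Initial, InProgress, Done };

// Hypothetical cache entry; `Value` stands in for the built kernel.
struct CacheEntry {
  std::atomic<BuildState> State{BuildState::Initial};
  std::mutex Mtx;
  std::condition_variable CV;
  int Value = 0;
};

template <typename BuildFn>
int getOrBuild(CacheEntry &E, BuildFn Build) {
  BuildState Expected = BuildState::Initial;
  if (E.State.compare_exchange_strong(Expected, BuildState::InProgress)) {
    // We won the race, so we are responsible for building.
    E.Value = Build();
    {
      // Publish the result under the lock so waiters cannot miss the wakeup.
      std::lock_guard<std::mutex> Lock(E.Mtx);
      E.State.store(BuildState::Done);
    }
    E.CV.notify_all();
    return E.Value;
  }

  // The exchange failed, so `Expected` now holds the state that was observed.
  if (Expected == BuildState::Done) {
    // Common path: already built. No lock, no wait, no second read.
    return E.Value;
  }

  // Slow path: another thread is still building, so wait for it to finish.
  std::unique_lock<std::mutex> Lock(E.Mtx);
  E.CV.wait(Lock, [&] { return E.State.load() == BuildState::Done; });
  return E.Value;
}
```

The sketch uses the default sequentially consistent ordering and omits build-failure handling for brevity; a real cache would also need to cover a failed build.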

Performance impact

Visible in most benchmarks.
Examples

Instructions decreased from 159.8k to 158.2k against a UR baseline of 133.7k, so the overhead over UR fell from 26.1k to 24.5k, that is by 6.1%; see the rightmost dots at:
SubmitKernel out of order with completion using events long kernel, CPU count(1)

Instructions decreased from 133.4k to 132.3k against a UR baseline of 119.2k, so the overhead over UR fell from 14.2k to 13.1k, that is by 7.7%; see the rightmost dots at:
SubmitKernel in order, CPU count(4)

An example of time overhead over L0 (note that time has high variance, so this result is less certain):
Time overhead over L0 before: 18.0% (SYCL 17.65us, L0 14.96us); after: 15.7% (SYCL 17.13us, L0 14.80us); reduced by ~2.2 percentage points
SubmitKernel in order with completion using events(2)

@lslusarczyk lslusarczyk requested a review from a team as a code owner November 28, 2025 12:20
@lslusarczyk lslusarczyk requested a review from vinser52 November 28, 2025 12:20
Contributor

@steffenlarsen steffenlarsen left a comment

LGTM!

@lslusarczyk
Contributor Author

@intel/llvm-gatekeepers please merge.
The failing CI checks also fail upstream.

@sergey-semenov sergey-semenov merged commit e15c474 into intel:sycl Nov 28, 2025
79 of 89 checks passed
4 participants