[SYCL] avoid lock and wait in KernelProgramCache::getOrBuild #20780
Optimized the common path where the kernel is already built and exists in the cache. Use the BuildStatus already returned by compare_exchange_strong instead of taking the lock, calling wait, and reading the status again.
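The idea can be sketched as follows. This is a minimal illustration, not the actual KernelProgramCache code: `BuildState`, `BuildResult`, `waitUntilBuilt`, and this simplified `getOrBuild` are hypothetical names. It relies only on the standard behavior of `std::atomic::compare_exchange_strong`, which on failure writes the currently stored value into its `expected` argument, so the caller already knows the state without a second read.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified stand-ins for the cache entry types.
enum class BuildState { Initial, InProgress, Done };

struct BuildResult {
  std::atomic<BuildState> State{BuildState::Initial};
  std::mutex Mtx;
  std::condition_variable Cv;
  int Value = 0; // stands in for the built kernel/program handle

  // Slow path: block until the building thread publishes a result.
  void waitUntilBuilt() {
    std::unique_lock<std::mutex> Lock(Mtx);
    Cv.wait(Lock, [this] { return State.load() != BuildState::InProgress; });
  }
};

template <typename BuildFn>
int getOrBuild(BuildResult &R, BuildFn Build) {
  BuildState Expected = BuildState::Initial;
  if (R.State.compare_exchange_strong(Expected, BuildState::InProgress)) {
    // This thread won the race and is responsible for building.
    int V = Build();
    {
      std::lock_guard<std::mutex> Lock(R.Mtx);
      R.Value = V;
      R.State.store(BuildState::Done);
    }
    R.Cv.notify_all();
    return V;
  }

  // The exchange failed, so Expected now holds the state that was observed.
  if (Expected == BuildState::Done)
    return R.Value; // fast path: no lock, no wait, no second load

  // Another thread is still building: fall back to lock + wait.
  R.waitUntilBuilt();
  return R.Value;
}
```

With default (sequentially consistent) ordering, observing `Done` through the failed exchange also makes the earlier write to `Value` visible, which is why the fast path can return without taking the mutex in this sketch.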
Performance impact
Visible in most benchmarks.
Examples
Instruction count decreased from 159.8k to 158.2k against a UR baseline of 133.7k, so the SYCL overhead over UR shrank by 6.1%; see the rightmost dots at:

Instruction count decreased from 133.4k to 132.3k against a UR baseline of 119.2k, so the SYCL overhead over UR shrank by 7.7%; see the rightmost dots at:

An example of time overhead over L0 (note that time has high variance, so this result is less certain):

Time overhead over L0 before: 18.0% (SYCL 17.65us, L0 14.96us); after: 15.7% (SYCL 17.13us, L0 14.80us), a reduction of about 2.2 percentage points.