[SYCL] Add runtime support for split device binaries #759
TODO: add a test once the driver supports split device binaries
Reviewable status: 0 of 9 files reviewed, 13 unresolved discussions (waiting on @AGindinson, @Fznamznon, @kbobrovs, @sergey-semenov, and @smaslov-intel)
sycl/doc/SYCLPluginInterface.md, line 101 at r1 (raw file):
The idea is that the SYCL runtime should do as many arch-neutral tasks as possible. So, for each architecture, the SYCL runtime could maintain a mapping from kernel name to the set of device binaries that implement this kernel. Those binaries are then offered via this interface to the plugin covering this architecture, which selects the best fit. If there is just one such binary (the common case), no call to piextDeviceSelectBinary is needed.
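The lookup described in this comment can be sketched as follows. This is only an illustration under stated assumptions: `DeviceBinary`, `SelectFn`, and `KernelBinaryIndex` are hypothetical names, and `SelectFn` merely stands in for the role the comment assigns to piextDeviceSelectBinary; its real signature is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a device binary image (not the runtime's type).
struct DeviceBinary {
  std::string Id;
};

// Hypothetical stand-in for the plugin's selection hook. It receives the
// candidate binaries and returns the index of the one it prefers.
using SelectFn =
    std::function<std::size_t(const std::vector<const DeviceBinary *> &)>;

// Arch-neutral index kept by the runtime: kernel name -> binaries that
// contain an implementation of that kernel.
class KernelBinaryIndex {
public:
  void add(const std::string &KernelName, const DeviceBinary *Bin) {
    Map[KernelName].push_back(Bin);
  }

  // Returns nullptr when no binary implements the kernel.
  const DeviceBinary *resolve(const std::string &KernelName,
                              const SelectFn &Select) const {
    auto It = Map.find(KernelName);
    if (It == Map.end() || It->second.empty())
      return nullptr;
    const auto &Candidates = It->second;
    // Common case: a single candidate, so the plugin is never consulted.
    if (Candidates.size() == 1)
      return Candidates.front();
    // Several candidates: let the plugin pick the best fit.
    std::size_t Idx = Select(Candidates);
    return Idx < Candidates.size() ? Candidates[Idx] : nullptr;
  }

private:
  std::map<std::string, std::vector<const DeviceBinary *>> Map;
};
```

The point of the split is that the name lookup stays in the runtime, while only the choice between several candidate binaries is delegated to the plugin.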
Agreed.
sycl/include/CL/sycl/detail/pi.h, line 429 at r1 (raw file):
Previously, AGindinson (Artem Gindinson) wrote…
Other declarations around this one seem to have their own formatting rules (apparently, "to look nicer"). I don't know if it matters much; just a note.
Right, let's keep each argument on its own line
@sergey-semenov, could you please rebase your changes?
9e82ee6 to 8c9fb36
8c9fb36 to 0ec9559
0ec9559 to 4045eaf
4045eaf to 7708c90
Not final review.
ce3d8df to d46a5ce
OSModuleHandle participation in kernel resolution remains the open question.
d46a5ce to b3ee21d
Should we? #860 (comment)
// OpenCL 2.1 and greater require clCreateProgramWithIL
if (pi::useBackend(pi::SYCL_BE_PI_OPENCL) &&
    C.get_platform().get_info<info::platform::version>() >= "2.1")
Is this a lexicographical comparison? Why is it considered reliable here? Maybe add a comment (later is OK).
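To make the concern concrete, here is a minimal sketch of a numeric version check. `parseVersion` and `versionAtLeast` are hypothetical helpers, not part of the SYCL runtime. The sketch assumes the string is either a bare "major.minor" or carries the "OpenCL " prefix that the OpenCL specification prescribes for CL_PLATFORM_VERSION; whether `info::platform::version` returns that exact form on a given backend is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper: extracts {major, minor} from a version string.
// An optional leading "OpenCL " prefix is skipped. Returns {0, 0} when
// no "major.minor" pair can be parsed.
inline std::pair<int, int> parseVersion(const std::string &Version) {
  const std::string Prefix = "OpenCL ";
  const char *Str = Version.c_str();
  if (Version.compare(0, Prefix.size(), Prefix) == 0)
    Str += Prefix.size();
  int Major = 0, Minor = 0;
  if (std::sscanf(Str, "%d.%d", &Major, &Minor) != 2)
    return {0, 0};
  return {Major, Minor};
}

// Hypothetical helper: numeric "at least" check on the parsed version.
inline bool versionAtLeast(const std::string &Version, int Major, int Minor) {
  return parseVersion(Version) >= std::make_pair(Major, Minor);
}
```

A plain string comparison orders character by character, so "10.0" sorts before "2.1", and any string starting with "OpenCL" sorts after "2.1" regardless of the version that follows. Parsing the numbers avoids both effects.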
/// map from a set of kernels to the vector of images containing them and
/// coming from the module.
/// Access must be guarded by the \ref Sync::getGlobalLock()
std::map<OSModuleHandle, KernelToImgsMap> m_DeviceImages;
It seems the ordering of keys is not important here, so unordered_map, with average O(1) lookup, can be used instead of map, with O(log(n)) lookup. The same applies to all other maps here. This can be addressed in a separate PR.
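A rough sketch of what the suggested swap looks like at the call site. The aliases below are assumptions made for illustration: `OSModuleHandle` is modelled as an integer handle and `KernelToImgsMap` is reduced to a placeholder struct, so this is not the runtime's real declaration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: the module handle is an integer-like value, so std::hash
// works on it out of the box.
using OSModuleHandle = std::intptr_t;

// Placeholder for the per-module kernel-to-images table.
struct KernelToImgsMap {
  std::vector<std::string> ImageIds;
};

// Lookup written against the common find()/end() interface, so it
// compiles unchanged for both std::map and std::unordered_map.
template <typename MapT>
const KernelToImgsMap *findModule(const MapT &Modules, OSModuleHandle Handle) {
  auto It = Modules.find(Handle);
  return It == Modules.end() ? nullptr : &It->second;
}

using OrderedImages = std::map<OSModuleHandle, KernelToImgsMap>;
using HashedImages = std::unordered_map<OSModuleHandle, KernelToImgsMap>;
```

Code that only inserts and looks up by key is unaffected by the swap. Code that iterates and relies on key order would change behavior, which is the property to audit before switching.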
This patch enables usage of multiple split device binaries per OS module (executable or shared object file). The required binary is chosen based on the entry table (filled by clang-offload-wrapper) that lists all kernels contained within. Signed-off-by: Sergey Semenov <sergey.semenov@intel.com>
Co-authored-by: Sergey Semenov <sergey.semenov@intel.com> Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com> Signed-off-by: Sergey Semenov <sergey.semenov@intel.com>
499edeb to bf5e663
LGTM. Please address the few remaining comments shortly after.
These tests are fixed by #5359
This patch enables usage of multiple split device binaries per OS
module (executable or shared object file). The required binary is
chosen based on the entry table (filled by clang-offload-wrapper)
that lists all kernels contained within.
Signed-off-by: Sergey Semenov <sergey.semenov@intel.com>