New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement cl_khr_command_buffer_multi_device #1437
Conversation
The formatting scripts had gotten out of sync with each other recently so this also brings those back in line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, but can you run the formatter (at least long lines pop to eye) and add a bit of comments to the overall functioning of the current remap()? Also perhaps squash some of the commits together if it's not too much trouble (I accidentally commented some obsolete issues related to multiple devices, I think, when going through the commits).
lib/CL/clGetDeviceInfo.c
Outdated
|
||
case CL_DEVICE_COMMAND_BUFFER_NUM_SYNC_DEVICES_KHR: | ||
if (device->ops->get_device_info_ext != NULL | ||
&& device->ops->get_device_info_ext(device, param_name, param_value_size,param_value, param_value_size_ret) == CL_SUCCESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line looks long. Did you run the formatter?
lib/CL/clRemapCommandBufferKHR.c
Outdated
@@ -0,0 +1,124 @@ | |||
/* OpenCL runtime library: clRemapCommandBufferKHR() | |||
|
|||
Copyright (c) 2022 Jan Solanti / Tampere University |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this really done in 2022?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's when the initial (private) commits are from. I'll update it to a year range.
lib/CL/clRemapCommandBufferKHR.c
Outdated
|
||
|
||
#define CMD_NODE_COPY_ARRAY(item_t, count, name) \ | ||
do { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indent of this block is wrong.
lib/CL/clRemapCommandBufferKHR.c
Outdated
POCL_GOTO_ERROR_COND ((new_cmd->name == NULL), CL_OUT_OF_HOST_MEMORY); \ | ||
memcpy(new_cmd->name, cmd->name, sizeof (item_t) * count); \ | ||
}while(0) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment documenting the overall functionality of the current remap would be useful here.
lib/CL/pocl_util.c
Outdated
@@ -1228,6 +1228,18 @@ pocl_cmdbuf_choose_recording_queue (cl_command_buffer_khr command_buffer, | |||
return CL_SUCCESS; | |||
} | |||
|
|||
cl_command_buffer_properties_khr | |||
pocl_cmdbuf_get_property (cl_command_buffer_khr command_buffer, | |||
cl_command_buffer_properties_khr name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format?
for (unsigned i = 0; i < extensions_size / sizeof(cl_name_version); ++i) | ||
{ | ||
cl_name_version *e = extensions+i; | ||
if (strcmp(e->name, "cl_khr_command_buffer_multi_device") == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format doesn't look correct.
lib/CL/clRemapCommandBufferKHR.c
Outdated
@@ -59,23 +59,29 @@ POname (clRemapCommandBufferKHR) ( | |||
cl_command_buffer_properties_khr universal_sync = pocl_cmdbuf_get_property(command_buffer, CL_COMMAND_BUFFER_UNIVERSAL_SYNC_KHR); | |||
POCL_GOTO_ERROR_COND((universal_sync == 0 && num_queues > 1), CL_INCOMPATIBLE_COMMAND_QUEUE_KHR); | |||
|
|||
POname (clCreateCommandBufferKHR) (num_queues, queues, command_buffer->properties, &errcode); | |||
new_cmdbuf = POname (clCreateCommandBufferKHR) (num_queues, queues, command_buffer->properties, &errcode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format? Long line.
@@ -303,7 +303,7 @@ main (int _argc, char **_argv) | |||
CHECK_CL_ERROR (clReleaseEvent (command_buf_event)); | |||
} | |||
|
|||
/* Remap from 2 to 1 queues & run */ | |||
/* Remap from N to 1 queues & run */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test the opposite case as well? That is, the one that can enable runtime-driven task mapping/perf. portability via pushing all work to a default CPU queue and then remapping to utilize more devices. That likely is a NOP for now. Also it would be good to test a case where the remap is done to queue/devices which are not in the original schedule/buffer. This models the disappearing devices case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current automatic remapping logic is very simple and can't make use of a larger queue list (and is very likely to degrade performance if the number of devices changes.) This is definitely something that should be added in the future, though.
lib/CL/clCreateCommandBufferKHR.c
Outdated
@@ -96,6 +104,8 @@ POname (clCreateCommandBufferKHR) ( | |||
} | |||
} | |||
|
|||
POCL_GOTO_ERROR_COND ((num_queues > 1 && !pocl_cmdbuf_can_queues_sync(num_queues, queues, universal_sync_enabled)), CL_INCOMPATIBLE_COMMAND_QUEUE_KHR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long lines here as well.
lib/CL/pocl_util.c
Outdated
cl_device_id dev = queues[i]->device; | ||
|
||
cl_uint num_sync_devices; | ||
POCL_RETURN_ERROR_COND ((POname(clGetDeviceInfo)(queues[i]->device, CL_DEVICE_COMMAND_BUFFER_NUM_SYNC_DEVICES_KHR, sizeof (cl_uint), &num_sync_devices, NULL)), CL_INCOMPATIBLE_COMMAND_QUEUE_KHR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too.
Can you describe this in the release notes and rebase? |
ff712de
to
59016cb
Compare
Rebased, squashed & formatted. Should be easier to review now. I'm seeing a number of asan warnings though (locally as well), still working through those. |
@jansol: a release note is still missing. |
Multi-device command buffers | ||
============================ | ||
|
||
Initial support for `cl_khr_command_buffer_multi_device` has been added, i.e. it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the CPU devices, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not really anything driver-specific, just like in the original command buffer code. It just hasn't really been tested with other drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps still worth mentioning what are the expectations in terms of perf. improvements (none still, right?) as it "caches" mainly the error checking?
3c79fc1
to
c19ce9a
Compare
cmdbuf: handle profiling queries for multidev buffers cmdbuf: add new platform & device queries multidev command buffers & remap clRemapCommandBufferKHR: leave new command buffer in same state as the original advertise cl_khr_command_buffer_multi_device clEnqueueCommandBufferKHR: fix helper function name command buffers: add multi device test cmdbuf: make clRemapCommandBufferKHR actually available cmdbuf: various bugfixes cmdbuf: more correct cmdbuf flag handling tests/cmdbuf: add automatic remapping flag to clRemapCommandBufferKHR lib: update command buffer code to conform to latest specs Rework *ndrangekernel & cmdbuf remapping multidev-cmdbuf: Use correct queues to run cmds tests: fix memory leaks in multidev cmdbuf test tests: add missing include
ASAN was making a lot of noise about these
The arm bots are broken again, pulling this in regardless. |
With command buffers being implemented "host-side", this is a fairly straightforward thing to support.