Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cl_khr_command_buffer_multi_device #1437

Merged
merged 8 commits into from Apr 18, 2024

Conversation

jansol
Copy link
Contributor

@jansol jansol commented Mar 11, 2024

With command buffers being implemented "host-side", this is a fairly straightforward thing to support.

@jansol
Copy link
Contributor Author

jansol commented Mar 11, 2024

The formatting scripts had gotten out of sync with each other recently so this also brings those back in line.

Copy link
Member

@pjaaskel pjaaskel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, but can you run the formatter (at least long lines pop to eye) and add a bit of comments to the overall functioning of the current remap()? Also perhaps squash some of the commits together if it's not too much trouble (I accidentally commented some obsolete issues related to multiple devices, I think, when going through the commits).


case CL_DEVICE_COMMAND_BUFFER_NUM_SYNC_DEVICES_KHR:
if (device->ops->get_device_info_ext != NULL
&& device->ops->get_device_info_ext(device, param_name, param_value_size,param_value, param_value_size_ret) == CL_SUCCESS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks long. Did you run the formatter?

@@ -0,0 +1,124 @@
/* OpenCL runtime library: clRemapCommandBufferKHR()

Copyright (c) 2022 Jan Solanti / Tampere University
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this really done in 2022?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's when the initial (private) commits are from. I'll update it to a year range.



#define CMD_NODE_COPY_ARRAY(item_t, count, name) \
do { \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indent of this block is wrong.

POCL_GOTO_ERROR_COND ((new_cmd->name == NULL), CL_OUT_OF_HOST_MEMORY); \
memcpy(new_cmd->name, cmd->name, sizeof (item_t) * count); \
}while(0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment documenting the overall functionality of the current remap would be useful here.

@@ -1228,6 +1228,18 @@ pocl_cmdbuf_choose_recording_queue (cl_command_buffer_khr command_buffer,
return CL_SUCCESS;
}

cl_command_buffer_properties_khr
pocl_cmdbuf_get_property (cl_command_buffer_khr command_buffer,
cl_command_buffer_properties_khr name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format?

for (unsigned i = 0; i < extensions_size / sizeof(cl_name_version); ++i)
{
cl_name_version *e = extensions+i;
if (strcmp(e->name, "cl_khr_command_buffer_multi_device") == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format doesn't look correct.

@@ -59,23 +59,29 @@ POname (clRemapCommandBufferKHR) (
cl_command_buffer_properties_khr universal_sync = pocl_cmdbuf_get_property(command_buffer, CL_COMMAND_BUFFER_UNIVERSAL_SYNC_KHR);
POCL_GOTO_ERROR_COND((universal_sync == 0 && num_queues > 1), CL_INCOMPATIBLE_COMMAND_QUEUE_KHR);

POname (clCreateCommandBufferKHR) (num_queues, queues, command_buffer->properties, &errcode);
new_cmdbuf = POname (clCreateCommandBufferKHR) (num_queues, queues, command_buffer->properties, &errcode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format? Long line.

@@ -303,7 +303,7 @@ main (int _argc, char **_argv)
CHECK_CL_ERROR (clReleaseEvent (command_buf_event));
}

/* Remap from 2 to 1 queues & run */
/* Remap from N to 1 queues & run */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test the opposite case as well? That is, the one that can enable runtime-driven task mapping/perf. portability via pushing all work to a default CPU queue and then remapping to utilize more devices. That likely is a NOP for now. Also it would be good to test a case where the remap is done to queue/devices which are not in the original schedule/buffer. This models the disappearing devices case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current automatic remapping logic is very simple and can't make use of a larger queue list (and is very likely to degrade performance if the number of devices changes.) This is definitely something that should be added in the future, though.

@@ -96,6 +104,8 @@ POname (clCreateCommandBufferKHR) (
}
}

POCL_GOTO_ERROR_COND ((num_queues > 1 && !pocl_cmdbuf_can_queues_sync(num_queues, queues, universal_sync_enabled)), CL_INCOMPATIBLE_COMMAND_QUEUE_KHR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long lines here as well.

cl_device_id dev = queues[i]->device;

cl_uint num_sync_devices;
POCL_RETURN_ERROR_COND ((POname(clGetDeviceInfo)(queues[i]->device, CL_DEVICE_COMMAND_BUFFER_NUM_SYNC_DEVICES_KHR, sizeof (cl_uint), &num_sync_devices, NULL)), CL_INCOMPATIBLE_COMMAND_QUEUE_KHR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

@pjaaskel
Copy link
Member

Can you describe this in the release notes and rebase?

@jansol jansol force-pushed the multidev_cmdbuf branch 3 times, most recently from ff712de to 59016cb Compare April 4, 2024 15:11
@jansol
Copy link
Contributor Author

jansol commented Apr 4, 2024

Rebased, squashed & formatted. Should be easier to review now. I'm seeing a number of asan warnings though (locally as well), still working through those.

@pjaaskel
Copy link
Member

pjaaskel commented Apr 8, 2024

Can you describe this in the release notes and rebase?

@jansol: a release note is still missing.

Multi-device command buffers
============================

Initial support for `cl_khr_command_buffer_multi_device` has been added, i.e. it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the CPU devices, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not really anything driver-specific, just like in the original command buffer code. It just hasn't really been tested with other drivers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps still worth mentioning what are the expectations in terms of perf. improvements (none still, right?) as it "caches" mainly the error checking?

@jansol jansol force-pushed the multidev_cmdbuf branch 2 times, most recently from 3c79fc1 to c19ce9a Compare April 12, 2024 15:03
cmdbuf: handle profiling queries for multidev buffers

cmdbuf: add new platform & device queries

multidev command buffers & remap

clRemapCommandBufferKHR: leave new command buffer in same state as the original

advertise cl_khr_command_buffer_multi_device

clEnqueueCommandBufferKHR: fix helper function name

command buffers: add multi device test

cmdbuf: make clRemapCommandBufferKHR actually available

cmdbuf: various bugfixes

cmdbuf: more correct cmdbuf flag handling

tests/cmdbuf: add automatic remapping flag to clRemapCommandBufferKHR

lib: update command buffer code to conform to latest specs

Rework *ndrangekernel & cmdbuf remapping

multidev-cmdbuf: Use correct queues to run cmds

tests: fix memory leaks in multidev cmdbuf test

tests: add missing include
ASAN was making a lot of noise about these
@pjaaskel
Copy link
Member

The arm bots are broken again, pulling this in regardless.

@pjaaskel pjaaskel merged commit 293add4 into pocl:main Apr 18, 2024
34 checks passed
@jansol jansol deleted the multidev_cmdbuf branch April 18, 2024 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants