Feature request: Replay specific extracted enqueued OpenCL kernel(s) (ranges) #290

Novermars · 2023-01-05T14:05:41Z

Introduction

For some scenarios where you don't have access to a program's sources, it can be very convenient to be able to replay part of the program's execution and compare it between platforms/devices.

It is already possible to determine where there is divergence between two runs of the same program on different platforms/devices, by extracting the buffers after each kernel launch via the CLI and comparing them (e.g. via a hash).
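As a rough illustration of that comparison step, here is a minimal sketch that hashes every dumped file from two runs and reports the first mismatch. The directory layout and the helper names `hash_dump_dir` and `first_divergence` are assumptions made for this example, not something the CLI provides; it also assumes the dump file names sort in enqueue order.

```python
import hashlib
from pathlib import Path


def hash_dump_dir(dump_dir):
    """Map each file's path (relative to dump_dir) to its SHA-256 digest."""
    root = Path(dump_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def first_divergence(dir_a, dir_b):
    """Return the first relative path (in sorted order) whose contents differ
    or that exists in only one run, or None if both dumps match.

    Assumption: file names sort in enqueue order, e.g. zero-padded numbers.
    """
    a, b = hash_dump_dir(dir_a), hash_dump_dir(dir_b)
    for name in sorted(a.keys() | b.keys()):
        if a.get(name) != b.get(name):
            return name
    return None
```

Calling `first_divergence("dump_gpu_a", "dump_gpu_b")` (both directory names hypothetical) would then point at the first buffer worth inspecting in isolation.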

The next step would be to specifically look at this kernel and to run it in isolation to further debug any issues (such as code-gen), by comparing the results of running the kernel on different platforms/devices.

Currently, this is a manual process, from extracting the individual parts (kernel source, input buffers, input arguments etc) to writing a small C, C++ or Python program to be compiled and executed.

It should be possible to fully automate this process, since all the information is already extracted by the CLI and just has to be put together properly.

Loose requirements

Be able to specify an enqueued kernel number for which a standalone program (C or C++ source file + CMakeLists.txt, or Python script) with the input buffers and arguments should be generated
Possibly extend to a contiguous range of kernels
All the files should be put into a separate folder for convenient sharing between different systems

Potential design choices

C/C++/Python for the replay-program
Generate replay program sources directly in the CLI, or just generate some meta-data and have a Python program do this

Visible change to the CLI

Add a control which allows the user to specify an enqueued kernel number for which a replay program should be generated

I'm more than willing to do the work for this one and to make a pull request, but if this would be a 10 min job for you then I won't stop you :)

bashbaug · 2023-01-06T06:31:25Z

Hello! I think this would be an awesome enhancement and this is something I've been interested in for a long time - see slide 26 from my IWOCL 2018 presentation.

It'd be great to have your help getting this implemented. I'd be happy to help out where I can, but it's definitely not a 10 minute job :)

A few thoughts and suggestions:

The trickiest part is going to be tracking the different kernel argument values because this is one of the few things that is not queryable in OpenCL. We'll need to track the values of each of the kernel arguments, at least up to the kernel enqueue(s) we want to capture and replay. We could use the existing "enqueue number" to identify which kernel to capture, or we could use some other criteria.
It'd be nice if this feature did not rely on kernel argument reflection via clGetKernelArgInfo because this is only valid for kernels compiled from source, but requiring kernel argument reflection would be fine for an initial implementation.
I'd suggest handling OpenCL buffers and pass-by-value ("scalar") kernel arguments first, since this is likely sufficient for many of the interesting cases. Next, handle USM/SVM, though do note that some USM/SVM scenarios will not work properly because the USM/SVM addresses will be different during playback. Finally, handle any other types of kernel arguments, e.g. images, samplers, pipes, etc, and these only if needed since they're likely to be the most tricky.
There's some infrastructure work we'll need to do regardless, but it'd still be good to nail down how the replay program might look sooner rather than later.
I'd be happy to review and merge smaller, incremental PRs, rather than one mega PR, so no need to wait until everything is "finished".
I'm on the KhronosDevs slack if that'd be more convenient for coordination.
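The argument-tracking idea from the first point above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `KernelArgTracker` and its hook names are hypothetical, the hooks stand in for wherever the intercept layer sees `clSetKernelArg` and `clEnqueueNDRangeKernel`, and the counter here counts kernel enqueues only, which may not match the existing "enqueue number".

```python
class KernelArgTracker:
    """Remember the latest value set for each kernel argument and snapshot
    the values when the enqueue selected for capture is reached."""

    def __init__(self, capture_enqueue):
        self.capture_enqueue = capture_enqueue  # which enqueue to capture (1-based)
        self._args = {}                         # kernel handle -> {arg index: bytes}
        self._enqueue_counter = 0
        self.captured = None                    # filled in at the selected enqueue

    def on_set_kernel_arg(self, kernel, index, value):
        # Argument values cannot be queried back from OpenCL, so keep a copy
        # of the raw bytes every time one is set.
        self._args.setdefault(kernel, {})[index] = bytes(value)

    def on_enqueue_kernel(self, kernel):
        self._enqueue_counter += 1
        if self._enqueue_counter == self.capture_enqueue:
            # Copy the dict so later argument changes do not alter the capture.
            self.captured = dict(self._args.get(kernel, {}))
        return self._enqueue_counter
```

Keeping raw bytes per argument index works without `clGetKernelArgInfo`, but the bytes alone do not say whether an argument is a buffer handle or a scalar, so a real implementation would need to record that separately.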

Don't hesitate to reach out if you have any additional questions - Thanks!

Novermars · 2023-01-06T11:42:41Z

Great! I'll be on vacation next week, but afterwards I'll give it a go. I agree with your points, let's do this step by step and the easy parts (i.e. just buffers) first.

> The trickiest part is going to be tracking the different kernel argument values because this is one of the few things that is not queryable in OpenCL.

What do you mean by this? It is already possible to get the buffers and scalar arguments in binary form, so what would prevent us from using these values directly? Edit: Aah, I think you mean for the USM/SVM case, then I see the problem indeed.

> There's some infrastructure work we'll need to do regardless, but it'd still be good to nail down how the replay program might look sooner rather than later.

After thinking about it a bit more, I think going the Python route (via pyopencl) would be the most user-friendly. At least for my use case (possibly testing on many different machines), it's much more likely and convenient to have or install Python than to install a whole C or C++ compiler toolchain (especially on Windows).
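To make the discussion concrete, a pyopencl-based replay script might look roughly like the sketch below. Everything about the capture layout is an assumption for illustration: a folder holding `kernel.cl`, one raw binary file per argument, and a `meta.json` with the hypothetical keys `kernel_name`, `global_size`, `local_size`, and `args`. It covers only buffers and pass-by-value arguments.

```python
import json
from pathlib import Path


def load_capture(capture_dir):
    """Read the hypothetical meta.json, the kernel source, and the raw bytes
    of every argument from a capture folder."""
    root = Path(capture_dir)
    meta = json.loads((root / "meta.json").read_text())
    args = [(a["kind"], (root / a["file"]).read_bytes()) for a in meta["args"]]
    return meta, (root / "kernel.cl").read_text(), args


def replay(capture_dir):
    """Rebuild and run the captured kernel; return each buffer's bytes afterwards."""
    # Imported here so load_capture can be used without an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    meta, source, args = load_capture(capture_dir)
    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    kernel = cl.Kernel(cl.Program(ctx, source).build(), meta["kernel_name"])

    mf = cl.mem_flags
    buffers = {}
    for index, (kind, data) in enumerate(args):
        if kind == "buffer":
            # Writable host copy, also used as the read-back destination.
            host = np.frombuffer(data, dtype=np.uint8).copy()
            dev = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host)
            buffers[index] = (host, dev)
            kernel.set_arg(index, dev)
        else:
            # Scalars are passed as the raw bytes captured from the original run.
            kernel.set_arg(index, data)

    local = tuple(meta["local_size"]) if meta.get("local_size") else None
    cl.enqueue_nd_range_kernel(queue, kernel, tuple(meta["global_size"]), local)

    results = {}
    for index, (host, dev) in buffers.items():
        cl.enqueue_copy(queue, host, dev)  # read the buffer back into host memory
        results[index] = host.tobytes()
    queue.finish()
    return results
```

The returned bytes could then be hashed and compared across machines, so only Python, pyopencl, and the capture folder need to be present on each system.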

bashbaug · 2023-04-05T05:19:02Z

I just merged #294 🎉.

Is there more we should track in this issue or should we close it? Thanks!

Novermars mentioned this issue Feb 22, 2023

Add functionality to extract specified kernels from app so that they can be replayed independently #294

Merged

Novermars closed this as completed Apr 5, 2023
