Feature request: Replay specific extracted enqueued OpenCL kernel(s) (ranges) #290

Closed
Novermars opened this issue Jan 5, 2023 · 3 comments

Comments

@Novermars
Contributor

Introduction

For some scenarios where you don't have access to a program's sources, it can be very convenient to be able to replay part of the program's execution and compare it between platforms/devices.

It is already possible to determine where there is divergence between two runs of the same program on different platforms/devices, by extracting the buffers after each kernel launch via the CLI and comparing them (e.g. via a hash).
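The comparison described above can be sketched in a few lines of Python. This is only an illustration: the directory layout (one dump directory per run, with identically named files for matching buffers) and the function names are assumptions, not the tool's actual output format.

```python
import hashlib
from pathlib import Path


def hash_dumps(dump_dir):
    """Map each file's path (relative to dump_dir) to its SHA-256 hex digest."""
    root = Path(dump_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_divergence(dir_a, dir_b):
    """Return the sorted names of dumps that differ or exist in only one run."""
    a, b = hash_dumps(dir_a), hash_dumps(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

If the dump file names happen to sort in enqueue order (an assumption), the first entry returned points at the earliest diverging kernel.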

The next step would be to specifically look at this kernel and to run it in isolation to further debug any issues (such as code-gen), by comparing the results of running the kernel on different platforms/devices.

Currently, this is a manual process: extracting the individual parts (kernel source, input buffers, input arguments, etc.) and then writing a small C, C++ or Python program to be compiled and executed.

It should be possible to fully automate this process, since all the information is already extracted by the CLI and just has to be put together properly.

Loose requirements

  • Be able to specify an enqueued kernel number for which a standalone program (C or C++ source file + CMakeLists.txt, or Python script) with the input buffers and arguments should be generated
  • Possibly extend to a continuous range of kernels
  • All the files should be put into a separate folder for convenient sharing between different systems

Potential design choices

  • C/C++/Python for the replay-program
  • Generate replay program sources directly in the CLI, or just generate some meta-data and have a Python program do this
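To make the metadata option concrete, here is one possible shape for such a description, written and read back as JSON. Every field name and the `manifest.json` file name are hypothetical; the issue does not define a format.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class KernelArg:
    index: int
    kind: str  # "buffer" or "scalar" (hypothetical labels)
    file: str  # binary dump of the value, relative to the capture folder


@dataclass
class CapturedEnqueue:
    enqueue_number: int
    kernel_name: str
    source_file: str
    build_options: str
    global_work_size: list
    local_work_size: Optional[list]
    args: list = field(default_factory=list)


def write_manifest(folder, capture):
    """Write the capture description into folder/manifest.json."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "manifest.json"
    path.write_text(json.dumps(asdict(capture), indent=2))
    return path


def read_manifest(folder):
    """Load folder/manifest.json back into a CapturedEnqueue."""
    raw = json.loads((Path(folder) / "manifest.json").read_text())
    raw["args"] = [KernelArg(**arg) for arg in raw["args"]]
    return CapturedEnqueue(**raw)
```

Keeping the description separate from the replay logic would let the CLI stay small while a script in any language consumes the folder.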

Visible change to the CLI

  • Add a control which allows the user to specify an enqueued kernel number for which a replay program should be generated

I'm more than willing to do the work for this one and to make a pull request, but if this would be a 10 min job for you then I won't stop you :)

@bashbaug
Contributor

bashbaug commented Jan 6, 2023

Hello! I think this would be an awesome enhancement and this is something I've been interested in for a long time - see slide 26 from my IWOCL 2018 presentation.

It'd be great to have your help getting this implemented. I'd be happy to help out where I can, but it's definitely not a 10 minute job :)

A few thoughts and suggestions:

  1. The trickiest part is going to be tracking the different kernel argument values because this is one of the few things that is not queryable in OpenCL. We'll need to track the values of each of the kernel arguments, at least up to the kernel enqueue(s) we want to capture and replay. We could use the existing "enqueue number" to identify which kernel to capture, or we could use some other criteria.
  2. It'd be nice if this feature did not rely on kernel argument reflection via clGetKernelArgInfo because this is only valid for kernels compiled from source, but requiring kernel argument reflection would be fine for an initial implementation.
  3. I'd suggest handling OpenCL buffers and pass-by-value ("scalar") kernel arguments first, since this is likely sufficient for many of the interesting cases. Next, handle USM/SVM, though do note that some USM/SVM scenarios will not work properly because the USM/SVM addresses will be different during playback. Finally, handle any other types of kernel arguments, e.g. images, samplers, pipes, etc, and these only if needed since they're likely to be the most tricky.
  4. There's some infrastructure work we'll need to do regardless, but it'd still be good to nail down how the replay program might look sooner rather than later.
  5. I'd be happy to review and merge smaller, incremental PRs, rather than one mega PR, so no need to wait until everything is "finished".
  6. I'm on the KhronosDevs slack if that'd be more convenient for coordination.
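The argument tracking from point 1 could be modeled roughly as follows. This is a Python model of the bookkeeping only, not the tool's implementation: the class, its method names, the zero-based enqueue counter, and the idea of calling it from intercepted `clSetKernelArg` and kernel enqueue calls are all assumptions made for illustration.

```python
class KernelArgTracker:
    """Remember the latest value set for each kernel argument and snapshot
    those values when an enqueue number in the selected range is reached."""

    def __init__(self, first_enqueue, last_enqueue=None):
        self.first = first_enqueue
        self.last = first_enqueue if last_enqueue is None else last_enqueue
        self.enqueue_counter = 0  # assumed to start at zero
        self.current_args = {}    # kernel handle -> {arg index: raw bytes}
        self.captures = {}        # enqueue number -> {arg index: raw bytes}

    def on_set_kernel_arg(self, kernel, index, value):
        # Argument values cannot be queried back later, so record them now.
        self.current_args.setdefault(kernel, {})[index] = bytes(value)

    def on_enqueue_kernel(self, kernel):
        number = self.enqueue_counter
        self.enqueue_counter += 1
        if self.first <= number <= self.last:
            # Copy the dict so later clSetKernelArg calls do not alter the capture.
            # For buffer arguments the recorded bytes are only the handle; the
            # buffer contents would still have to be read at this point.
            self.captures[number] = dict(self.current_args.get(kernel, {}))
        return number
```

Because arguments persist across enqueues until they are set again, the tracker has to observe every set call from the start of the program, not only the ones near the kernel of interest.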

Don't hesitate to reach out if you have any additional questions - Thanks!

@Novermars
Contributor Author

Novermars commented Jan 6, 2023

Great! I'll be on vacation next week, but afterwards I'll give it a go. I agree with your points, let's do this step by step and the easy parts (i.e. just buffers) first.

> The trickiest part is going to be tracking the different kernel argument values because this is one of the few things that is not queryable in OpenCL.

What do you mean by this? It is already possible to get the buffers and scalar arguments in binary form, so what would prevent us from using these values directly? Edit: Aah, I think you mean for the USM/SVM case, then I see the problem indeed.

> There's some infrastructure work we'll need to do regardless, but it'd still be good to nail down how the replay program might look sooner rather than later.

After thinking about it a bit more, I think going the Python route (via pyopencl) would be the most user-friendly. At least for my use case (possibly testing on many different machines), it's much more likely/convenient to have/install Python than to install a whole C or C++ compiler toolchain (especially on Windows).
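A replay script along these lines might look like the sketch below. It assumes `pyopencl` and `numpy` are installed and that the kernel source, work sizes, and raw argument bytes have already been extracted. `replay_kernel`, `differing_args`, and the `("buffer", bytes)` / `("scalar", bytes)` argument convention are illustrative names, not an existing API, and only buffers and by-value arguments are handled.

```python
def replay_kernel(source, kernel_name, args, global_size, local_size=None,
                  build_options=""):
    """Run one kernel and return {arg index: bytes} for every buffer argument.

    args is a list of ("buffer", bytes) or ("scalar", bytes) pairs in
    argument order (a made-up convention for this sketch).
    """
    # Imported here so the helper below stays usable without an OpenCL setup.
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()  # device choice can be preset via PYOPENCL_CTX
    queue = cl.CommandQueue(ctx)
    program = cl.Program(ctx, source).build(options=build_options)
    kernel = cl.Kernel(program, kernel_name)

    buffers = {}
    for index, (kind, data) in enumerate(args):
        if kind == "buffer":
            host = np.frombuffer(data, dtype=np.uint8).copy()
            flags = cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR
            buffers[index] = (cl.Buffer(ctx, flags, hostbuf=host), host.size)
            kernel.set_arg(index, buffers[index][0])
        elif kind == "scalar":
            # Raw bytes of the by-value argument, passed via the buffer interface.
            kernel.set_arg(index, bytes(data))
        else:
            raise ValueError(f"unsupported argument kind: {kind}")

    cl.enqueue_nd_range_kernel(queue, kernel, tuple(global_size),
                               tuple(local_size) if local_size else None)

    results = {}
    for index, (buf, size) in buffers.items():
        out = np.empty(size, dtype=np.uint8)
        cl.enqueue_copy(queue, out, buf)  # blocking by default
        results[index] = out.tobytes()
    return results


def differing_args(results_a, results_b):
    """Indices of buffer arguments whose contents differ between two replays."""
    return sorted(i for i in results_a.keys() | results_b.keys()
                  if results_a.get(i) != results_b.get(i))
```

Running the same inputs on two machines and passing both result dictionaries to `differing_args` shows which output buffers disagree, with nothing to install beyond Python and the two packages.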

@bashbaug
Contributor

bashbaug commented Apr 5, 2023

I just merged #294 🎉.

Is there more we should track in this issue or should we close it? Thanks!
