Feature request: Replay specific extracted enqueued OpenCL kernel(s) (ranges) #290
Comments
Hello! I think this would be an awesome enhancement and this is something I've been interested in for a long time - see slide 26 from my IWOCL 2018 presentation. It'd be great to have your help getting this implemented. I'd be happy to help out where I can, but it's definitely not a 10 minute job :) A few thoughts and suggestions:
Don't hesitate to reach out if you have any additional questions - Thanks!
Great! I'll be on vacation next week, but afterwards I'll give it a go. I agree with your points, let's do this step by step and start with the easy parts (i.e. just buffers).
What do you mean by this? It is already possible to get the buffers and scalar arguments in binary form, so what would prevent us from using these values directly? Edit: Aah, I think you mean for the USM/SVM case, then I see the problem indeed.
After thinking about it a bit more, I think going the Python route (via pyopencl) would be the most user-friendly. At least for my use case (possibly testing on many different machines), it's much more likely and convenient to have or install Python than to install a whole C or C++ compiler toolchain (especially on Windows).
I just merged #294 🎉. Is there more we should track in this issue or should we close it? Thanks!
Introduction
For some scenarios where you don't have access to a program's sources, it can be very convenient to be able to replay part of the program's execution and compare it between platforms/devices.
It is already possible to determine where there is divergence between two runs of the same program on different platforms/devices, by extracting the buffers after each kernel launch via the CLI and comparing them (e.g. via a hash).
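As a rough illustration of the hash comparison described above, here is a minimal sketch. It assumes each run's buffers were dumped into its own directory and that corresponding files share the same name across runs; the function names and directory layout are hypothetical and not something the CLI itself provides.

```python
import hashlib
from pathlib import Path


def hash_buffers(dump_dir):
    """Map each dumped file name to the SHA-256 digest of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(dump_dir).iterdir())
        if path.is_file()
    }


def diverging_buffers(dump_a, dump_b):
    """Return sorted names of files that differ or exist in only one dump."""
    hashes_a = hash_buffers(dump_a)
    hashes_b = hash_buffers(dump_b)
    names = sorted(set(hashes_a) | set(hashes_b))
    return [name for name in names if hashes_a.get(name) != hashes_b.get(name)]
```

If the dump file names encode the enqueue order, the first entry of the returned list points at the earliest kernel launch whose results diverge. Note that a byte-exact hash flags any floating-point difference, even a harmless one.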
The next step would be to specifically look at this kernel and to run it in isolation to further debug any issues (such as code-gen), by comparing the results of running the kernel on different platforms/devices.
Currently, this is a manual process, from extracting the individual parts (kernel source, input buffers, input arguments, etc.) to writing a small C, C++ or Python program to be compiled and executed.
It should be possible to fully automate this process, since all the information is already extracted by the CLI and just has to be put together properly.
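To make the idea concrete, here is a minimal sketch of what such a generated replay program could look like in Python with pyopencl. It is only an illustration under several assumptions: the dump directory contains a `kernel.cl` source file and one `arg_<index>_<kind>.bin` file per kernel argument, where `<kind>` is `buffer` or `scalar`. This file layout and the names `load_args` and `replay` are hypothetical and do not reflect the CLI's actual output. Only plain buffers and scalar arguments are handled; images, USM and SVM are left out.

```python
from pathlib import Path


def load_args(dump_dir):
    """Read files named arg_<index>_<kind>.bin (hypothetical layout).

    Returns a list of (index, kind, raw_bytes) sorted by argument index.
    """
    args = []
    for path in Path(dump_dir).glob("arg_*_*.bin"):
        _, index, kind = path.stem.split("_")
        args.append((int(index), kind, path.read_bytes()))
    return sorted(args)


def replay(dump_dir, kernel_name, global_size, local_size=None):
    """Build kernel.cl, run the kernel once, return buffer contents by index."""
    # Imported lazily so load_args stays usable without an OpenCL install.
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    source = (Path(dump_dir) / "kernel.cl").read_text()
    kernel = cl.Kernel(cl.Program(ctx, source).build(), kernel_name)

    flags = cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR
    device_buffers = {}
    for index, kind, raw in load_args(dump_dir):
        if kind == "buffer":
            host = np.frombuffer(raw, dtype=np.uint8).copy()
            buf = cl.Buffer(ctx, flags, hostbuf=host)
            device_buffers[index] = (buf, len(raw))
            kernel.set_arg(index, buf)
        else:
            kernel.set_arg(index, raw)  # scalar bytes are passed through as-is

    cl.enqueue_nd_range_kernel(queue, kernel, global_size, local_size)

    results = {}
    for index, (buf, size) in device_buffers.items():
        out = np.empty(size, dtype=np.uint8)
        cl.enqueue_copy(queue, out, buf)  # blocking read back to the host
        results[index] = out.tobytes()
    return results
```

A call such as `replay("dump/enqueue_0042", "my_kernel", (1024,))` (path and kernel name made up here) would return the post-execution contents of every buffer argument, which can then be hashed and compared across platforms or devices.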
Loose requirements
Potential design choices
Visible change to the CLI
I'm more than willing to do the work for this one and to make a pull request, but if this would be a 10 min job for you then I won't stop you :)