Add functionality to extract specified kernels from app so that they can be replayed independently #294
…eplayed independently
…of(void*) Python script handles the duplicate arguments
… dump them when dumping replayable kernels
Adds a script that automatically captures a specified kernel and then validates it by comparing the replayed results to the dumped results
Fixed typo (Bump->Dump)
Now works on Windows and Linux
Getting the binaries from the cl_program (or from the cl_kernel with an Intel extension) has proved to be very unreliable. This version instead saves the device binaries that the application passes to clCreateProgramWithBinary(). The Python script then uses these binaries to build the program.
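The validation step mentioned in the commits above (comparing replayed results to the dumped results) can be sketched roughly as follows. This is only an illustration, not the code from this PR: the directory layout, the `*.bin` file pattern, and the function name `find_mismatches` are all assumptions.

```python
from pathlib import Path
from typing import List


def find_mismatches(dump_dir: Path, replay_dir: Path, pattern: str = "*.bin") -> List[str]:
    """Return names of dumped buffer files that are missing from, or differ
    byte-for-byte in, the replay output directory.

    Hypothetical layout: one binary file per buffer, same file name in both
    directories.
    """
    mismatches = []
    for dumped in sorted(dump_dir.glob(pattern)):
        replayed = replay_dir / dumped.name
        # A missing file counts as a mismatch, as does any differing byte.
        if not replayed.is_file() or dumped.read_bytes() != replayed.read_bytes():
            mismatches.append(dumped.name)
    return mismatches
```

An empty return value would mean that the replay reproduced every dumped buffer exactly. A byte-for-byte comparison is the strictest possible check; a tolerance-based comparison may be more appropriate for floating-point outputs.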
I just started having a look at this and in general it looks really great - thank you for your contribution! I'll try to have a more thorough review in the next couple days. Thanks again!
Will only dump the first time it is encountered
@bashbaug I just added functionality to dump a kernel by name instead of by an enqueue number, which should be useful if you're just interested in replaying the kernel rather than one specific enqueue of it. You can also now specify how often you want to run a kernel with the run.py script, for better profiling :)
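The selection behavior described here (match by enqueue number or by kernel name, and dump a named kernel only the first time it is encountered) could look roughly like the sketch below. The real logic lives in the layer itself; the class and field names here are hypothetical and only illustrate the decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptureSelector:
    """Decides whether a given kernel enqueue should be dumped (hypothetical helper)."""
    kernel_name: Optional[str] = None     # capture by name ...
    enqueue_number: Optional[int] = None  # ... or by enqueue counter
    _name_dumped: bool = False            # a named kernel is dumped only once

    def should_dump(self, name: str, enqueue: int) -> bool:
        if self.enqueue_number is not None:
            # An enqueue number identifies exactly one enqueue.
            return enqueue == self.enqueue_number
        if self.kernel_name is not None and name == self.kernel_name:
            if self._name_dumped:
                return False
            self._name_dumped = True
            return True
        return False
```

For example, `CaptureSelector(kernel_name="saxpy")` would select the first enqueue of a kernel named `saxpy` and ignore later ones.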
Also added functionality for when buffers alias each other
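One plausible way to handle aliasing during replay is to group the kernel arguments that refer to the same buffer, so that the replay creates that buffer once and binds it to every argument in the group. The sketch below assumes that each argument can be mapped to an opaque buffer handle; the function name and the data layout are illustrative and not taken from this PR.

```python
from collections import defaultdict
from typing import Dict, List


def group_aliased_args(arg_buffers: Dict[int, int]) -> List[List[int]]:
    """Group kernel argument indices that share the same buffer handle.

    arg_buffers maps argument index -> opaque buffer handle (hypothetical).
    Groups are returned in order of their lowest argument index.
    """
    by_handle = defaultdict(list)
    for index in sorted(arg_buffers):
        by_handle[arg_buffers[index]].append(index)
    return list(by_handle.values())
```

Any group with more than one index marks arguments that alias each other and should share a single buffer in the replay.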
Thanks again, this all looks really good even though I have a few comments.
Let's see if we can fix some of the simpler issues before merging. It's fine not to fix all of them right now. The sooner we can merge these changes the sooner people can start using it and finding bugs and the sooner we can fix them. 😄
My biggest priority is keeping things fast and robust when the replay controls are not enabled, if that helps.
I had a few warnings when I built on Linux as well, mostly for -Wparentheses. Might be worth checking the CI build logs (when they finally complete, things have been running slow recently...). Example:
I just had a look at the warnings on my Linux machine, and I can indeed see them there! Consider them fixed. I'll push some changes this evening (Germany time) and will do the rest tomorrow, taking your feedback on some questions into consideration :)
These changes are not final yet and incomplete
For some reason, I now run into segfaults for certain apps since I've switched to […]. I'll revert back to manual memory management to see if that fixes the problem, then go back to the modern variants one by one while checking against my test suite.
Okay, this should work again. I have no clue why, but it seems like one of the […]
@bashbaug It should be ready for merging now, assuming C++17 is okay for you :)
Awesome! I'll do a few final spot checks and aim to merge this by EOD. Thank you!
(Partially) Implements #290
Description of Changes
First iteration of functionality to extract specified kernels from an app so that they can be replayed independently. Currently WIP due to some bugs and missing features.
Things that have to be done/clarified before it can be merged
run.py currently has to be copied manually to the folder that holds the other replay files; this should happen automatically.
Currently only clCreateProgramWithSource is handled; building from SPIR-V is tbd.
Testing Done
Tried with a few basic OpenCL examples on Ubuntu 20.04, partially opening this PR so that I can test more complex programs.