Enable launch latency simulations #3721
# internal cache variables do not show up in GUIs
set(KOKKOS_ENABLE_TOOLS_FENCE ON CACHE INTERNAL "...")
# ... later ...
if(NOT KOKKOS_ENABLE_TOOLS_FENCE)
target_compile_definitions(kokkoscore PRIVATE KOKKOS_DISABLE_TOOLS_FENCE)
endif() |
(w.r.t. hiding the option so that users do not abuse it) |
Yeah, you can still grep it from cache, though |
Right, so maybe the option should be called |
Or |
Looks fine, and the name is both sufficiently boring and implies that it shouldn't be in production code.
Users will abuse it, but you could have it print something like "warning, I told you not to enable this." Then they will complain because it's printing on every process of a million-process MPI run, but at least you can ask them to read it and inflict a tiny bit of shame on their cold hearts. |
Status: Awaiting further review/merge |
I have a question about this PR. Presumably the fences are there for a good reason. If you remove them, is there anything that will ensure that the launch latency is large enough for the result to be correct? |
AFAICT, the fences are only there for the tools to provide meaningful timings but are not actually necessary for programs to work correctly. |
@Rombur, those fences only exist when a tool is loaded (this makes timing much easier, no need to worry about timing asynchronous launches). The plan is to write a tool that sleeps for some number of microseconds. The problem is that if the tool sleeps for a few microseconds and also introduces a fence, it's way too expensive and doesn't model the real behavior. Edit: what @masterleinad said ;) |
I would also go with a CMake option. Other than that, this is fine with me.
Thanks for the explanation |
Do we have an issue to solve our "Tools fence" problem properly? If yes can you link to it, if no can you create it? @DavidPoliakoff
@crtrott, I have the shell of an idea. My hope is to have a real user of Graphs, and use that as a vehicle to prototype what such a system should look like. Right now my understanding of the ideas we have around asynchrony is a little too vague; I'd hate to design a V2 that doesn't handle our real use cases and then have to go to a V3 |
Just open an issue and we can start discussions there, collect requirements. |
So this is a weird one. There's a lot of interest in how badly launch latency impacts our performance. The "well, duh" answer is to have a tool that sleeps for some microseconds before a kernel. Problem is that a begin_parallel_for event means a fence, which perturbs the measurement.
So doing this needs an option to remove said fences. I think this option should not be documented, as users will abuse it. But it would be helpful if we could build Kokkos in a way that enables this functionality (with a requirement to just specify it through CMAKE_CXX_FLAGS so that nothing shows up in our ccmake). This is a PR where if people say "no, we shouldn't add this in" I completely understand, but I wanted to have the discussion.
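For readers following along, here is a rough sketch of what the sleeping tool described above might look like. It is not the tool from this PR, just an illustration under a few assumptions: the `kokkosp_begin_parallel_for` callback signature follows the Kokkos Tools interface, while the `LAUNCH_LATENCY_US` environment variable, the `read_delay` helper, and the 10 microsecond default are all made up for this example.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <thread>

namespace {
// Hypothetical knob: LAUNCH_LATENCY_US is an invented environment variable,
// not something Kokkos or Kokkos Tools defines. Defaults to 10 microseconds.
std::chrono::microseconds read_delay() {
  const char* raw = std::getenv("LAUNCH_LATENCY_US");
  long us = raw ? std::strtol(raw, nullptr, 10) : 10;
  if (us < 0) us = 0;  // ignore negative values
  return std::chrono::microseconds(us);
}

// Read once when the tool library is loaded.
const std::chrono::microseconds g_delay = read_delay();
}  // namespace

// Callback invoked before each parallel_for when a tool is loaded.
// Sleeping here adds artificial launch latency on the host side.
extern "C" void kokkosp_begin_parallel_for(const char* /*name*/,
                                           const std::uint32_t /*devID*/,
                                           std::uint64_t* /*kID*/) {
  std::this_thread::sleep_for(g_delay);
}
```

With the tools fence still in place, each launch would pay for both the sleep and the fence, which is why the measurement only makes sense in a build where that fence is disabled.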