Enable launch latency simulations #3721

DavidPoliakoff · 2021-01-12T19:09:21Z

So this is a weird one. There's a lot of interest in how badly launch latency impacts our performance. The "well, duh" answer is to have a tool that sleeps for some microseconds before each kernel launch. The problem is that, with a tool loaded, every begin_parallel_for event also triggers a fence, which perturbs the measurement.
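A minimal sketch of such a tool, assuming the Kokkos Tools callback `kokkosp_begin_parallel_for(const char* name, const uint32_t devID, uint64_t* kID)` exported with C linkage from a tool shared library. The environment variable `SIMULATED_LAUNCH_LATENCY_US` and the helper `parse_delay_us` are hypothetical names chosen for this illustration. As written, the tool would still pay for the tools fence, which is what motivates this PR.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <thread>

// Hypothetical helper: parse a delay in microseconds from an environment
// variable value; returns 0 when the value is unset or not a number.
inline std::uint64_t parse_delay_us(const char* value) {
  if (value == nullptr) return 0;
  char* end = nullptr;
  const unsigned long long parsed = std::strtoull(value, &end, 10);
  return (end == value) ? 0 : static_cast<std::uint64_t>(parsed);
}

// Read once, on the first kernel launch.
// SIMULATED_LAUNCH_LATENCY_US is a hypothetical variable name.
static std::uint64_t simulated_latency_us() {
  static const std::uint64_t delay =
      parse_delay_us(std::getenv("SIMULATED_LAUNCH_LATENCY_US"));
  return delay;
}

// Kokkos Tools callback invoked before each parallel_for launch.
// Sleeping here delays the host before the kernel is issued, which is the
// extra launch latency being simulated.
extern "C" void kokkosp_begin_parallel_for(const char* /*name*/,
                                           const std::uint32_t /*devID*/,
                                           std::uint64_t* kID) {
  // Not thread-safe; assumes kernels are launched from one host thread.
  static std::uint64_t next_id = 0;
  *kID = next_id++;
  std::this_thread::sleep_for(
      std::chrono::microseconds(simulated_latency_us()));
}
```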

So doing this needs an option to remove said fences. I think this option should not be documented, as users will abuse it. But it would be helpful if we could build Kokkos in a way that enables this functionality (requiring the define to be passed through CMAKE_CXX_FLAGS, so that nothing shows up in our ccmake). This is a PR where, if people say "no, we shouldn't add this in," I completely understand, but I wanted to have the discussion.

jrmadsen · 2021-01-12T21:29:32Z

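# internal cache variables do not show up in GUIs such as ccmake
# CACHE INTERNAL implies FORCE, so only set the default when it is not already defined
if(NOT DEFINED KOKKOS_ENABLE_TOOLS_FENCE)
    set(KOKKOS_ENABLE_TOOLS_FENCE ON CACHE INTERNAL "...")
endif()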

# ... later ...
if(NOT KOKKOS_ENABLE_TOOLS_FENCE)
    target_compile_definitions(kokkoscore PRIVATE KOKKOS_DISABLE_TOOLS_FENCE)
endif()
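A toy illustration of how the `KOKKOS_DISABLE_TOOLS_FENCE` definition from the CMake snippet could be consumed on the C++ side. Every function and variable here (`tools_fence`, `invoke_tool_callback`, `begin_parallel_for_event`, the counters) is a hypothetical stand-in, not Kokkos source; the point is only that the fence is compiled out while the tool callback still fires.

```cpp
// Hypothetical stand-ins for library internals; counters make the effect observable.
static int g_fence_count = 0;
static int g_callback_count = 0;
static bool g_tool_loaded = true;  // pretend a profiling tool is attached

static void tools_fence() { ++g_fence_count; }  // a real fence would wait on the device
static void invoke_tool_callback() { ++g_callback_count; }  // e.g. the tool's begin hook

// Called before each kernel launch.
static void begin_parallel_for_event() {
  if (!g_tool_loaded) return;  // no tool: no fence and no callback
#ifndef KOKKOS_DISABLE_TOOLS_FENCE
  // Default build: synchronize so the tool's timers bracket only this kernel.
  tools_fence();
#endif
  invoke_tool_callback();
}
```

Compiling the same code with `-DKOKKOS_DISABLE_TOOLS_FENCE` leaves `g_fence_count` at 0 while the callback count still increments.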

jrmadsen · 2021-01-12T21:31:19Z

(w.r.t. hiding the option so that users do not abuse it)

DavidPoliakoff · 2021-01-12T21:35:56Z

Yeah, you can still grep for it in the cache, though

jrmadsen · 2021-01-12T22:24:12Z

Right, so maybe the option should be called Kokkos_ENABLE_NOTHING_TO_SEE_HERE

DavidPoliakoff · 2021-01-12T22:38:36Z

Or KOKKOS_IMPL_SIMULATE_LAUNCH_LATENCY; that sounds boring

nliber

Looks fine, and the name is both sufficiently boring and implies that it shouldn't be in production code.

mhoemmen · 2021-01-13T03:16:51Z

Users will abuse it, but you could have it print something like "warning, I told you not to enable this." Then they will complain because it's printing on every process of a million-process MPI run, but at least you can ask them to read it and inflict a tiny bit of shame on their cold hearts.
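A sketch of the suggested warning, assuming it is emitted from C++ in a build that defines the `KOKKOS_IMPL_SIMULATE_LAUNCH_LATENCY` name proposed above; the function name and message text are hypothetical. The atomic flag limits it to one message per process, which, as noted, still means one message per MPI rank.

```cpp
#include <atomic>
#include <iostream>
#include <ostream>

// Prints the warning at most once per process (hypothetical helper).
// Returns true only for the call that actually printed.
bool warn_launch_latency_simulation(std::ostream& os = std::cerr) {
  static std::atomic<bool> already_warned{false};
  if (already_warned.exchange(true)) return false;
  os << "Kokkos WARNING: built with KOKKOS_IMPL_SIMULATE_LAUNCH_LATENCY; "
        "tools fences are disabled. Do not use this build in production.\n";
  return true;
}
```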

DavidPoliakoff · 2021-01-13T18:11:57Z

Status: Awaiting further review/merge

Rombur · 2021-01-13T18:57:32Z

I have a question about this PR. Presumably the fences are there for a good reason. If you remove them, is there anything that will ensure that the launch latency is large enough for the result to be correct?

masterleinad · 2021-01-13T19:29:08Z

> I have a question about this PR. Presumably the fences are there for a good reason. If you remove them, is there anything that will ensure that the launch latency is large enough for the result to be correct?

AFAICT, the fences are only there for the tools to provide meaningful timings but are not actually necessary for programs to work correctly.

DavidPoliakoff · 2021-01-13T19:29:14Z

@Rombur, those fences only exist when a tool is loaded (they make timing much easier, since there is no need to worry about timing asynchronous launches). The plan is to write a tool that sleeps for some number of microseconds. The problem is that if that tool both sleeps and introduces a fence, each launch becomes far too expensive and no longer models the real behavior.
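A toy timeline model of why sleeping plus fencing misrepresents launch latency. It assumes the host sleeps `delay` time units before issuing each of `n` kernels, the device runs kernels back to back, and launches are otherwise fully asynchronous (no queue-depth limits or other real-world effects); the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Toy model, not a measurement: returns the time at which the last kernel
// finishes. With `fence`, the host blocks until each kernel completes.
std::int64_t simulated_total_time(int n, std::int64_t delay,
                                  std::int64_t kernel, bool fence) {
  std::int64_t host = 0;    // time at which the host issues the next launch
  std::int64_t device = 0;  // time at which the device becomes idle
  for (int i = 0; i < n; ++i) {
    host += delay;  // simulated launch latency (the tool's sleep)
    const std::int64_t start = std::max(host, device);
    device = start + kernel;  // kernel occupies the device
    if (fence) host = device;  // fence: host waits for the kernel to finish
  }
  return device;
}
```

For n = 3, delay = 5 and kernel = 10, the fenced run takes 45 units because every delay is exposed, while the asynchronous run takes 35 units because only the first delay is exposed and the later sleeps overlap kernel execution.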

edit: what @masterleinad said ;)

masterleinad

I would also go with a CMake option. Other than that, this is fine with me.

Rombur · 2021-01-13T19:39:15Z

Thanks for the explanation

crtrott

Do we have an issue to solve our "Tools fence" problem properly? If yes can you link to it, if no can you create it? @DavidPoliakoff

DavidPoliakoff · 2021-01-13T19:48:11Z

@crtrott, I have the shell of an idea. My hope is to find a real user of Graphs and use that as a vehicle to prototype what such a system should look like. Right now my understanding of the ideas we have around asynchrony is a little too vague, and I'd hate to design a V2 that doesn't handle our real use cases and then have to go to a V3.

crtrott · 2021-01-13T19:50:01Z

Just open an issue and we can start discussions there, collect requirements.

DavidPoliakoff · 2021-01-13T20:30:28Z

@crtrott, done: #3723

Commit 5bdf605: Enable launch latency simulations

nliber approved these changes Jan 13, 2021

masterleinad reviewed Jan 13, 2021

Rombur approved these changes Jan 13, 2021

crtrott approved these changes Jan 13, 2021

dalg24 merged commit 72ee18b into kokkos:develop Jan 13, 2021
