Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable launch latency simulations #3721

Merged

Conversation

DavidPoliakoff
Copy link
Contributor

So this is a weird one. There's a lot of interest in how badly launch latency impacts our performance. The "well, duh" answer is to have a tool that sleeps for some microseconds before a kernel. Problem is that a begin_parallel_for event means a fence, which perturbs the measurement.

So doing this needs an option to remove said fences. I think this option should not be documented, as users will abuse it. But it would be helpful if we could build Kokkos in a way that enables this functionality (with a requirement to just specify the CMAKE_CXX_FLAG so that nothing shows up in our ccmake). This is a PR where if people say "no, we shouldn't add this in" I completely understand, but I wanted to have the discussion.

@jrmadsen
Copy link
Contributor

# internal cache variables do not show up in GUIs
set(KOKKOS_ENABLE_TOOLS_FENCE ON CACHE INTERNAL "...")

# ... later ...
if(NOT KOKKOS_ENABLE_TOOLS_FENCE)
    target_compile_definitions(kokkoscore PRIVATE KOKKOS_DISABLE_TOOLS_FENCE)
endif()

@jrmadsen
Copy link
Contributor

(w.r.t. to hiding the option so that users do not abuse it)

@DavidPoliakoff
Copy link
Contributor Author

Yeah, you can still grep it from cache, though

@jrmadsen
Copy link
Contributor

Right, so maybe the option should be called Kokkos_ENABLE_NOTHING_TO_SEE_HERE

@DavidPoliakoff
Copy link
Contributor Author

Or KOKKOS_IMPL_SIMULATE_LAUNCH_LATENCY, that sounds boring

Copy link
Contributor

@nliber nliber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, and the name is both sufficiently boring and implies that it shouldn't be in production code.

@mhoemmen
Copy link
Contributor

Users will abuse it, but you could have it print something like "warning, I told you not to enable this." Then they will complain because it's printing on every process of a million-process MPI run, but at least you can ask them to read it and inflict a tiny bit of shame on their cold hearts.

@DavidPoliakoff
Copy link
Contributor Author

Status: Awaiting further review/merge

@Rombur
Copy link
Member

Rombur commented Jan 13, 2021

I have a question about this PR. Presumably the fences are there for a good reason. If you remove them, is there anything that will ensure that the launch latency is large enough for the result be correct.

@masterleinad
Copy link
Contributor

I have a question about this PR. Presumably the fences are there for a good reason. If you remove them, is there anything that will ensure that the launch latency is large enough for the result be correct.

AFAICT, the fences are only there for the tools to provide meaningful timings but are not actually necessary for programs to work correctly.

@DavidPoliakoff
Copy link
Contributor Author

DavidPoliakoff commented Jan 13, 2021

@Rombur , those fences only exist when a tool is loaded (this makes timing much easier, no need to worry about timing asynchronous launches). The plan is to write a tool that sleeps for some number of microseconds. Problem being, if that tool sleeps for microseconds and introduces a fence, it's way too expensive, and doesn't model the real behavior

edit: what @masterleinad said ;)

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also go with a CMake option. Other than that, this is fine with me.

@Rombur
Copy link
Member

Rombur commented Jan 13, 2021

Thanks for the explanation

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue to solve our "Tools fence" problem properly? If yes can you link to it, if no can you create it? @DavidPoliakoff

@DavidPoliakoff
Copy link
Contributor Author

@crtrott , I have the shell of an idea. My hope is to have a real user of Graphs, and use that as a vehicle to prototype what such a system should look like. Right now my understanding of the ideas we have around asynchrony are a little too vague, I'd hate to design a V2 that doesn't handle our real use cases and have to then go to a V3

@crtrott
Copy link
Member

crtrott commented Jan 13, 2021

Just open an issue and we can start discussions there, collect requirements.

@dalg24 dalg24 merged commit 72ee18b into kokkos:develop Jan 13, 2021
@DavidPoliakoff
Copy link
Contributor Author

@crtrott, done: #3723

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants