Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict queries of event state #750

Merged
merged 2 commits into from
Jun 16, 2022
Merged

Conversation

illuhad
Copy link
Collaborator

@illuhad illuhad commented Jun 15, 2022

In order to limit size of the task graph that is processed by the runtime, it is important to not consider operations that have already completed.
Previously, these queries happened at multiple places using dag_node::is_complete(), which will invoke backend queries (e.g. hipEventQuery) if the event is not complete. Once such a backend query returns that the event has completed, this information is cached and backend queries are no longer invoked.

These backend queries can negatively impact submission latencies.

This PR restricts invoking dag_node::is_complete() to a single point in the beginning submission process. All other places just query the cache using dag_node::is_known_complete(). Because the is_complete() invocation at the beginning will update the cache, this should have very little impact on the optimizations that the runtime can do, while potentially significantly reducing the amount of backend event queries.

Using SYCL-Bench's sequential DAG task throughput benchmark, I see a roughly 20% higher task throughput with this PR.

@al42and @pszi1ard - this might be interesting for you.

@sbalint98
Copy link
Collaborator

For me, the end-to-end performance seems to be around ~5% faster on small GROMACS benchmarks. Unfortunately, measurements are slightly noisy.

@illuhad
Copy link
Collaborator Author

illuhad commented Jun 16, 2022

@sbalint98 Thanks for the feedback - I am not done yet, this just a really low-hanging fruit ;) Hopefully we will get much more than 5% eventually.

@illuhad illuhad merged commit 7102e5e into develop Jun 16, 2022
@illuhad illuhad deleted the feature/restrict-event-queries branch June 16, 2022 15:24
@pszi1ard
Copy link

@illuhad thanks for the heads up.
@sbalint98 thanks for testing this, for reference can you please describe the use-case where you saw the 5%?

@sbalint98
Copy link
Collaborator

I noticed the performance improvement on the adh_dodec benchmark (109k atoms) example from here: https://github.com/jychang48/benchmark-gromacs with thread MPI and 28 omp threads. The test system had a MI100. As said the results were quite noisy but I was able to attain higher performance with this change previous best was 121.63 ns/day new best became 127 ns/day. Let me know if I can provide further information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants