In order to limit the size of the task graph that is processed by the runtime, it is important not to consider operations that have already completed. Previously, these queries happened at multiple places using `dag_node::is_complete()`, which invokes backend queries (e.g. `hipEventQuery`) if the node is not already known to be complete. Once such a backend query reports that the event has completed, this information is cached and no further backend queries are invoked. These backend queries can negatively impact submission latency.
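To illustrate the caching behavior described above, here is a minimal sketch. It is not the runtime's actual implementation: `backend_event` and its `query()` method are hypothetical stand-ins for a backend event and an expensive call such as `hipEventQuery`, and the query counter exists only to make the cost visible. `is_complete()` may call into the backend and caches a positive result, while `is_known_complete()` (the cache-only query this PR relies on) never does.

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Hypothetical stand-in for a backend event. query() models an expensive
// backend call such as hipEventQuery and counts how often it is invoked.
// The counter is not thread-safe; it is for illustration only.
class backend_event {
public:
  explicit backend_event(std::function<bool()> poll) : poll_{std::move(poll)} {}

  bool query() const {
    ++num_queries_;
    return poll_();
  }
  int num_queries() const { return num_queries_; }

private:
  std::function<bool()> poll_;
  mutable int num_queries_ = 0;
};

class dag_node {
public:
  explicit dag_node(backend_event evt) : evt_{std::move(evt)} {}

  // May invoke a backend query. Once the backend reports completion,
  // the result is cached and later calls return without querying.
  bool is_complete() const {
    if (known_complete_.load(std::memory_order_acquire))
      return true;
    if (evt_.query()) {
      known_complete_.store(true, std::memory_order_release);
      return true;
    }
    return false;
  }

  // Only reads the cache; never calls into the backend.
  bool is_known_complete() const {
    return known_complete_.load(std::memory_order_acquire);
  }

  const backend_event& event() const { return evt_; }

private:
  backend_event evt_;
  mutable std::atomic<bool> known_complete_{false};
};
```

Because only a positive result is cached, every `is_complete()` call on a node that is still running pays for a backend query, which is why calling it from many places adds up.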
This PR restricts invoking `dag_node::is_complete()` to a single point at the beginning of the submission process. All other places just query the cache using `dag_node::is_known_complete()`. Because the `is_complete()` invocation at the beginning updates the cache, this should have very little impact on the optimizations the runtime can perform, while potentially reducing the number of backend event queries significantly.

Using SYCL-Bench's sequential DAG task throughput benchmark, I see roughly 20% higher task throughput with this PR.
@al42and @pszi1ard - this might be interesting for you.