[2/n] Critical Path analysis for GPU events and CPU->GPU and GPU->CPU dependencies #68
LGTM. Even though it is an experimental feature, it would be nice to have critical path analysis in the documentation on Read The Docs. It would make it easier for users to find the feature and learn about it.
def _add_gpu_cpu_sync_edge(self, gpu_node: CPNode, runtime_eid: int) -> None:
    """Add an edge between gpu_node and the runtime event on CPU"""
    _, end_node = self.get_nodes_for_event(runtime_eid)
    logger.info(f"Adding a sync edge between nodes {gpu_node} -> {end_node}")
Should this be logged at debug level instead of info?
Please change the print statements on lines 396 and 397 to use the logger.
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #68      +/-   ##
==========================================
+ Coverage   90.78%   90.94%   +0.15%
==========================================
  Files          30       30
  Lines        2441     2506      +65
==========================================
+ Hits         2216     2279      +63
- Misses        225      227       +2
@briancoutinho has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@briancoutinho merged this pull request in 7780150.
What does this PR do?
This is a follow-up to #67 that adds nodes for GPU kernels to the critical path analysis graph.
In addition, it adds the kernel launch delay and synchronization events.
Details
We update the graph with the following synchronization dependencies:
a. Context/Device sync: the last kernel on each stream is a dependency of the Context Sync call on the CPU.
b. Stream Sync: the last kernel on the specific stream being synced is a dependency of the Stream Sync runtime call on the CPU.
Event sync is handled in the next PR.
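The two rules above can be illustrated with a small standalone sketch. This is not the code in this PR: `Kernel`, `last_kernel_per_stream`, and `sync_edges` are hypothetical names, and the sketch assumes, as a simplification, that a kernel counts toward a sync call if it finished no later than the end timestamp of that call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Kernel:
    """Hypothetical, simplified stand-in for a GPU kernel trace event."""
    name: str
    stream: int
    end_ts: int  # kernel end timestamp, arbitrary units


def last_kernel_per_stream(kernels: List[Kernel], sync_ts: int) -> Dict[int, Kernel]:
    """For each stream, return the last kernel that finished by sync_ts."""
    last: Dict[int, Kernel] = {}
    for k in kernels:
        if k.end_ts > sync_ts:
            continue  # assumption: kernels ending after the sync are unrelated
        if k.stream not in last or k.end_ts > last[k.stream].end_ts:
            last[k.stream] = k
    return last


def sync_edges(
    kernels: List[Kernel],
    sync_name: str,
    sync_ts: int,
    stream: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Return (kernel, sync call) dependency edges.

    stream=None models a context/device sync (one edge per stream);
    an integer models a stream sync (an edge from that stream only).
    """
    last = last_kernel_per_stream(kernels, sync_ts)
    streams = sorted(last) if stream is None else [s for s in last if s == stream]
    return [(last[s].name, sync_name) for s in streams]


kernels = [
    Kernel("add_0", stream=7, end_ts=100),
    Kernel("add_1", stream=7, end_ts=180),
    Kernel("add_2", stream=20, end_ts=150),
]
context_edges = sync_edges(kernels, "context_sync", sync_ts=200)
stream_edges = sync_edges(kernels, "stream_sync", sync_ts=200, stream=20)
```

With two active streams, the context sync receives two incoming edges, while the stream sync receives one edge from the stream being synced.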
Test/Examples
Running this on a simple add test, we can see the edges for kernel launch and kernel-to-kernel delay.
The example below shows a Context Sync event (pink, on the right) with two incoming edges, one from the last kernel on each CUDA stream.
The final critical path shifts from the CPU to the GPU at the end of the iteration; highlighted events are on the critical path.
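To see why a sync edge can move the critical path onto the GPU, here is a minimal sketch that treats the critical path as the longest weighted path through a dependency DAG. It is an illustration under assumed node names and durations, not the implementation in this PR; `critical_path` is a hypothetical helper built on the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Tuple


def critical_path(edges: Dict[Tuple[str, str], int]) -> Tuple[int, List[str]]:
    """Return (length, nodes) of the longest weighted path in a DAG."""
    # Map each node to its list of predecessors.
    preds: Dict[str, List[str]] = {}
    for u, v in edges:
        preds.setdefault(u, [])
        preds.setdefault(v, []).append(u)

    dist: Dict[str, int] = {}
    best_pred: Dict[str, Optional[str]] = {}
    end: Optional[str] = None
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        best = max(
            preds[node],
            key=lambda p: dist[p] + edges[(p, node)],
            default=None,
        )
        best_pred[node] = best
        dist[node] = 0 if best is None else dist[best] + edges[(best, node)]
        if end is None or dist[node] >= dist[end]:
            end = node  # on ties, prefer the later node

    # Walk back from the end node to recover the path.
    path: List[str] = []
    cur = end
    while cur is not None:
        path.append(cur)
        cur = best_pred[cur]
    return dist[end], path[::-1]


# Made-up durations: short CPU work, a long kernel, then a sync on the CPU.
edges = {
    ("cpu_launch", "cpu_sync_start"): 5,   # CPU work between launch and sync
    ("cpu_launch", "kernel_start"): 2,     # kernel launch delay
    ("kernel_start", "kernel_end"): 20,    # kernel duration
    ("kernel_end", "cpu_sync_end"): 0,     # GPU -> CPU sync edge
    ("cpu_sync_start", "cpu_sync_end"): 1, # CPU-side cost of the sync call
}
length, path = critical_path(edges)
```

In this toy graph the kernel outlasts the CPU work, so the longest path runs through the kernel and enters the sync call via the sync edge.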