New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1/n] Critical path analysis CPU operator events #67
Conversation
… the notobook paths
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #67 +/- ##
==========================================
+ Coverage 90.50% 90.78% +0.27%
==========================================
Files 29 30 +1
Lines 2160 2441 +281
==========================================
+ Hits 1955 2216 +261
- Misses 205 225 +20
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comments.
hta/trace_analysis.py
Outdated
Args: | ||
rank (int): rank to generate the time series for. | ||
critical_path_graph: Critical Path Graph object generated previously | ||
output_dir (str): Output director to store overlaid trace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit typo director
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/director/directory
hta/trace_analysis.py
Outdated
For example, you can use this to limit the analysis to one iteration | ||
by passing annotation='ProfilerStep500'. | ||
by passing annotation='ProfilerStep500'. See notes for how to pick the iteraiton. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick need to fix typo
@briancoutinho In critical_path_analysis.ipynb: Section title has a typo - "More details on the CPGrah object" |
@briancoutinho has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@briancoutinho has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@briancoutinho merged this pull request in 8084627. |
…dencies (#68) Summary: ## What does this PR do? This is a follow up to #67 that adds the nodes in the critical path analysis graph for GPU kernels and in addition we also add the kernel launch delay and synchronization events. ## Details We update the graph with the following 1. GPU kernels have two nods - begin, end and an edge connecting them. 2. Kernel launch delay - if the CUDA runtime launched a kernel when there was no outstanding kernel on the same stream, we add an edge for the delay in the critical path with the weight = launch delay. 3. Kernel kernel delay - the else case of 2, we add an implicit dependency on the last CUDA kernel executed on the stream. 4. Synchronization - using CUDA sync events we track a. Context/Device sync= the last kernel on all streams are a dependency for the Context Sync call on CPU. b. Stream Sync= the last kernel on the specific stream being synced is a dependency for the Stream Sync runtime call on CPU. Event sync is handled in the next PR. ## Test/Examples Running this on simple add test. We can see the edges for kernel launch and kernel-kernel delay. ![Screenshot 2023-10-05 at 11 04 36 AM](https://github.com/facebookresearch/HolisticTraceAnalysis/assets/6922212/85171347-ff50-4c23-9e60-f5d3f6f3fefc) The example below shows Context Sync event (pink on the right), there are two edges coming in, one each from the last kernel on each CUDA stream. ![Screenshot 2023-10-05 at 11 05 16 AM](https://github.com/facebookresearch/HolisticTraceAnalysis/assets/6922212/78f9a33f-769c-43b2-bd37-982d12b28bd1) The final critical path shifts from CPU to GPU in the end of the iteration - highlighted events are on critical path. ![Screenshot 2023-10-05 at 11 06 27 AM](https://github.com/facebookresearch/HolisticTraceAnalysis/assets/6922212/9da54aec-9eb1-4244-99b4-b03a92e51d2b) ## Before submitting - [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements) - [x] N/A - [x] Did you write any new necessary tests? - [ ] N/A - [ ] Did you make sure to update the docs? - [x] N/A - [] Did you update the [changelog](https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/CHANGELOG.md)? - [x] N/A Pull Request resolved: #68 Reviewed By: anupambhatnagar Differential Revision: D49978363 Pulled By: briancoutinho fbshipit-source-id: 6ce218fb72c3f1b382a175198169b698b3de7018
What does this PR do?
Our initial implementation of Critical Path analysis does not consider dependency between PyTorch operators, this requires combining the trace with Chakra Execution Traces and processing Tensor information.
The key idea here is to simplify the dependency analysis between PyTorch operators
Implementation
In a nutshell, computing the critical path involves 1) constructing a weighted DAG connecting all the operations, 2) finding the longest path in this DAG. The challenging part is constructing the DAG here.
Nodes: The Nodes in the critical path graph represent points in time. Each operator/kernel thus has two nodes viz. a begin and an end node. In case of nested operators we also link the nodes in the order they appear in the callstack.
Edges in this DAG can be one of two types
Before submitting
Note that this feature is not fully implemented, documented and change log updates will come in later commits.