Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1/n] Critical path analysis CPU operator events #67

Closed
wants to merge 6 commits into from

Conversation

briancoutinho
Copy link
Contributor

What does this PR do?

Our initial implementation of Critical Path analysis does not consider dependency between PyTorch operators, this requires combining the trace with Chakra Execution Traces and processing Tensor information.

The key idea here is to simplify the dependency analysis between PyTorch operators

  1. We assume that all CPU operators are dependent serially on the last operator that ran on a CPU thread. This frees us from attempting to correlate with Execution Traces.
  2. The operator dependency part can be added back later.

Implementation

In a nutshell, computing the critical path involves 1) constructing a weighted DAG connecting all the operations, 2) finding the longest path in this DAG. The challenging part is constructing the DAG here.

Nodes: The Nodes in the critical path graph represent points in time. Each operator/kernel thus has two nodes viz. a begin and an end node. In case of nested operators we also link the nodes in the order they appear in the callstack.

Edges in this DAG can be one of two types

  1. Timing edges (weight = time): include durations for the operators/kernels as well as delays to launch operators between CPU and GPU.
  2. Dependency edges (weight = 0): do not have a time component but show a dependency between operations themselves. This includes data dependencies and synchronization between CPU and GPU.

Before submitting

Note that this feature is not fully implemented, documented and change log updates will come in later commits.

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you update the changelog?
    • N/A

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 18, 2023
@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2023

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (3fe0cd2) 90.50% compared to head (964987d) 90.78%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #67      +/-   ##
==========================================
+ Coverage   90.50%   90.78%   +0.27%     
==========================================
  Files          29       30       +1     
  Lines        2160     2441     +281     
==========================================
+ Hits         1955     2216     +261     
- Misses        205      225      +20     
Files Coverage Δ
hta/analyzers/cuda_kernel_analysis.py 90.97% <100.00%> (ø)
hta/common/call_stack.py 90.00% <100.00%> (+0.65%) ⬆️
hta/common/trace.py 88.54% <100.00%> (+1.16%) ⬆️
hta/trace_analysis.py 96.59% <100.00%> (+0.20%) ⬆️
tests/test_trace_analysis.py 99.54% <98.27%> (-0.46%) ⬇️
hta/analyzers/critical_path_analysis.py 89.23% <89.23%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@anupambhatnagar anupambhatnagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comments.

hta/common/call_stack.py Outdated Show resolved Hide resolved
hta/common/call_stack.py Outdated Show resolved Hide resolved
hta/common/trace.py Outdated Show resolved Hide resolved
hta/trace_analysis.py Outdated Show resolved Hide resolved
hta/trace_analysis.py Show resolved Hide resolved
hta/trace_analysis.py Outdated Show resolved Hide resolved
hta/trace_analysis.py Outdated Show resolved Hide resolved
hta/trace_analysis.py Outdated Show resolved Hide resolved
Args:
rank (int): rank to generate the time series for.
critical_path_graph: Critical Path Graph object generated previously
output_dir (str): Output director to store overlaid trace.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit typo director

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/director/directory

hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/trace_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Outdated Show resolved Hide resolved
hta/analyzers/critical_path_analysis.py Show resolved Hide resolved
For example, you can use this to limit the analysis to one iteration
by passing annotation='ProfilerStep500'.
by passing annotation='ProfilerStep500'. See notes for how to pick the iteraiton.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick need to fix typo

@anupambhatnagar
Copy link
Contributor

@briancoutinho
A couple of tests are failing. Please fix them.

In critical_path_analysis.ipynb: Section title has a typo - "More details on the CPGrah object"

@facebook-github-bot
Copy link
Contributor

@briancoutinho has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@briancoutinho has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@briancoutinho merged this pull request in 8084627.

facebook-github-bot pushed a commit that referenced this pull request Oct 6, 2023
…dencies (#68)

Summary:
## What does this PR do?
This is a follow up to #67 that adds the nodes in the critical path analysis graph for GPU kernels
and in addition we also add the kernel launch delay and synchronization events.

## Details
We update the graph with the following
1. GPU kernels have two nods - begin, end and an edge connecting them.
2. Kernel launch delay - if the CUDA runtime launched a kernel when there was no outstanding kernel on the same stream, we add an edge for the delay in the critical path with the weight = launch delay.
3. Kernel kernel delay - the else case of 2, we add an implicit dependency on the last CUDA kernel executed on the stream.
4. Synchronization - using CUDA sync events we track
a. Context/Device sync= the last kernel on all streams are a dependency for the Context Sync call on CPU.
b. Stream Sync= the last kernel on the specific stream being synced is a dependency for the Stream Sync runtime call on CPU.

Event sync is handled in the next PR.

## Test/Examples

Running this on simple add test.
We can see the edges for kernel launch and kernel-kernel delay.
![Screenshot 2023-10-05 at 11 04 36 AM](https://github.com/facebookresearch/HolisticTraceAnalysis/assets/6922212/85171347-ff50-4c23-9e60-f5d3f6f3fefc)

The example below shows Context Sync event (pink on the right), there are two edges coming in, one each from the last kernel on each CUDA stream.

![Screenshot 2023-10-05 at 11 05 16 AM](https://github.com/facebookresearch/HolisticTraceAnalysis/assets/6922212/78f9a33f-769c-43b2-bd37-982d12b28bd1)

The final critical path shifts from CPU to GPU in the end of the iteration - highlighted events are on critical path.

![Screenshot 2023-10-05 at 11 06 27 AM](https://github.com/facebookresearch/HolisticTraceAnalysis/assets/6922212/9da54aec-9eb1-4244-99b4-b03a92e51d2b)

## Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  - [x] N/A
- [x] Did you write any new necessary tests?
  - [ ] N/A
- [ ] Did you make sure to update the docs?
  - [x] N/A
- [] Did you update the [changelog](https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/CHANGELOG.md)?
  - [x] N/A

Pull Request resolved: #68

Reviewed By: anupambhatnagar

Differential Revision: D49978363

Pulled By: briancoutinho

fbshipit-source-id: 6ce218fb72c3f1b382a175198169b698b3de7018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants