[3/N] Add critical path analysis GPU->GPU sync dependencies #69
@briancoutinho has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
🚢
.sort_values(by="ts", axis=0)
)

def previous_launch(ts: int, pid: int, tid: int) -> Optional[int]:
Rename to _previous_launch?
def previous_launch(ts: int, pid: int, tid: int) -> Optional[int]:
"""Find the previous CUDA launch on same pid and tid"""
df = runtime_calls.query(f"pid == {pid} and tid == {tid}")
lowerneighbours = df[df["ts"] < ts]["ts"]
nit: rename lowerneighbours to lower_neighbors
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #69      +/-   ##
==========================================
+ Coverage   90.94%   91.06%   +0.11%
==========================================
  Files          30       30
  Lines        2507     2562      +55
==========================================
+ Hits         2280     2333      +53
- Misses        227      229       +2
☔ View full report in Codecov by Sentry.
@briancoutinho merged this pull request in 66e3c54.
What does this PR do?
This final PR adds the logic to infer inter-stream (GPU->GPU) synchronization dependencies.
How does it work?
The core of the logic is inferring the synchronization that CUDA events introduce between streams.
_get_cuda_event_record_df()
(Happy to discuss in person - this was convoluted)
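To illustrate the general idea of event-based synchronization (not the PR's actual implementation), here is a rough sketch under simplifying assumptions: a `cudaEventRecord` on one stream captures the last kernel launched there, and a later `cudaStreamWaitEvent` on another stream makes the next kernel launched on that stream depend on the captured kernel. The `RuntimeCall` record and the `infer_sync_edges` function are hypothetical names introduced only for this example, and all calls are assumed to be totally ordered by timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RuntimeCall:
    """Hypothetical, simplified view of one CUDA runtime call in a trace."""
    ts: int
    name: str  # "cudaLaunchKernel", "cudaEventRecord" or "cudaStreamWaitEvent"
    stream: int
    event: Optional[int] = None   # event id, for record/wait calls
    kernel: Optional[str] = None  # kernel name, for launch calls


def infer_sync_edges(calls: List[RuntimeCall]) -> List[Tuple[str, str]]:
    """Return (source_kernel, dependent_kernel) pairs implied by CUDA events."""
    last_kernel_on_stream: Dict[int, str] = {}
    event_source: Dict[int, str] = {}        # event id -> kernel it waits on
    pending_waits: Dict[int, List[str]] = {}  # stream -> kernels to wait for
    edges: List[Tuple[str, str]] = []

    for call in sorted(calls, key=lambda c: c.ts):
        if call.name == "cudaLaunchKernel":
            # The first kernel launched after a wait inherits the dependency.
            for src in pending_waits.pop(call.stream, []):
                edges.append((src, call.kernel))
            last_kernel_on_stream[call.stream] = call.kernel
        elif call.name == "cudaEventRecord":
            src = last_kernel_on_stream.get(call.stream)
            if src is not None:
                event_source[call.event] = src
            else:
                # Re-recording on an idle stream clears any older capture.
                event_source.pop(call.event, None)
        elif call.name == "cudaStreamWaitEvent":
            src = event_source.get(call.event)
            if src is not None:
                pending_waits.setdefault(call.stream, []).append(src)
    return edges


example_calls = [
    RuntimeCall(ts=1, name="cudaLaunchKernel", stream=7, kernel="A"),
    RuntimeCall(ts=2, name="cudaEventRecord", stream=7, event=1),
    RuntimeCall(ts=3, name="cudaStreamWaitEvent", stream=20, event=1),
    RuntimeCall(ts=4, name="cudaLaunchKernel", stream=20, kernel="B"),
    RuntimeCall(ts=5, name="cudaLaunchKernel", stream=20, kernel="C"),
]
example_edges = infer_sync_edges(example_calls)
```

In this example only kernel `B` picks up the cross-stream dependency on `A`; `C` follows `B` on the same stream, so its ordering is already implied by stream order.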
Other changes
get_runtime_launch_events_query(). This is used in the kernel analyzer as well.
Test:
Below is an example of inter-stream GPU sync; see the arrow between events.