Execution Trace Correlation Support #58

Open
briancoutinho opened this issue Jul 17, 2023 · 0 comments
Assignees
Labels
feature request (New feature request), good first issue (Good for newcomers)

Comments

@briancoutinho
Contributor

briancoutinho commented Jul 17, 2023

🚀 Motivation and context

Chakra Execution Traces is an open and interoperable graph-based representation of AI/ML workloads focused on enabling and accelerating AI SW/HW co-design. Chakra execution traces (ETs) represent key operations (compute, memory, and communication), data and control dependencies, timing, and resource constraints. Additionally, Chakra includes a complementary set of tools and capabilities to enable the collection, analysis, generation, and adoption of Chakra ETs by a broad range of simulators, emulators, and replay tools.

Correlating an Execution Trace (ET) with PyTorch timeline traces (Kineto) will produce an enriched trace data structure containing:

  1. Detailed operator input/output tensor information (from ET).
  2. Dependency edges between operators and modules (from ET).
  3. Timeline (start, duration) information for PyTorch framework operations as well as GPU kernels (from Kineto).

This unlocks work such as critical path analysis, estimating efficiency improvements as part of anti-pattern detection, and richer operator input/output details.

Description

We can start by correlating the Execution Trace and the Kineto trace for a single rank.
There are two possible cases for correlation:

  1. The ET and Kineto trace overlap, i.e. they were collected together. This can be handled easily using the record function ID ('rf_id') field.
  2. The ET and Kineto trace are from different times. To correlate these we need a tree correlation algorithm. A possible implementation of this already exists in param #PR79
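
To make the two cases concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: `ETNode` and `KinetoEvent` are hypothetical, simplified stand-ins (not the param or Kineto data structures), and the tree matcher is a naive lockstep walk by operator name, not the algorithm in the param PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ETNode:
    """Hypothetical, simplified stand-in for an Execution Trace node."""
    id: int
    name: str
    rf_id: Optional[int] = None  # record function ID, if recorded
    children: List["ETNode"] = field(default_factory=list)


@dataclass
class KinetoEvent:
    """Hypothetical, simplified stand-in for a Kineto operator event."""
    name: str
    ts: int   # start time
    dur: int  # duration
    rf_id: Optional[int] = None
    children: List["KinetoEvent"] = field(default_factory=list)


def correlate_by_rf_id(
    et_nodes: List[ETNode], kineto_events: List[KinetoEvent]
) -> Dict[int, KinetoEvent]:
    """Case 1: map ET node id -> Kineto event with the same rf_id."""
    by_rf_id = {ev.rf_id: ev for ev in kineto_events if ev.rf_id is not None}
    return {
        node.id: by_rf_id[node.rf_id]
        for node in et_nodes
        if node.rf_id is not None and node.rf_id in by_rf_id
    }


def correlate_by_tree(
    et_root: ETNode,
    kineto_root: KinetoEvent,
    out: Optional[Dict[int, KinetoEvent]] = None,
) -> Dict[int, KinetoEvent]:
    """Case 2 (naive assumption): walk both call trees in lockstep and
    pair nodes whose names match; a mismatch prunes that subtree."""
    if out is None:
        out = {}
    if et_root.name != kineto_root.name:
        return out
    out[et_root.id] = kineto_root
    for et_child, k_child in zip(et_root.children, kineto_root.children):
        correlate_by_tree(et_child, k_child, out)
    return out
```

Either function returns a mapping from ET node id to the matched timeline event, so start and duration can be attached to the ET node while keeping its tensor and dependency information. A real tree correlation would need to tolerate extra or missing nodes on either side, which this sketch does not attempt.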

Setup

We propose adding param as a third-party dependency for this project. This will bring in the Execution Trace parsing data structures, among other things.

Alternatives

Additional context

No response

briancoutinho added the feature request and good first issue labels Jul 17, 2023
briancoutinho self-assigned this Jul 17, 2023
facebook-github-bot pushed a commit that referenced this issue Jul 26, 2023
Summary:
## What does this PR do?
Add the ability to read and correlate execution traces, as explained in #58

## Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  - [ ] N/A
- [x] Did you write any new necessary tests?
  - [ ] N/A
- [ ] Did you make sure to update the docs?
  - [x] N/A Feature as a whole is not yet ready, so we can wait till some of the foundational blocks are done
- [ ] Did you update the [changelog](https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/CHANGELOG.md)?
  - [ ] N/A

Pull Request resolved: #57

Reviewed By: anupambhatnagar

Differential Revision: D47805905

Pulled By: briancoutinho

fbshipit-source-id: 291bc0ea891a7ab15c9627a6f867497c29fdf466