What is the expected way to collect a PyTorch execution trace? #3

Closed
chenqianfzh opened this issue Oct 19, 2023 · 1 comment

Comments

@chenqianfzh

Could you show how to collect the PyTorch execution trace that Chakra takes as input and converts?

I tried collecting the trace with the default trace handler, torch.profiler.tensorboard_trace_handler, and with torch.jit.trace(). The output of both attempts is very different from what pytorch2chakra_converter expects.

Thanks.

@TaekyungHeo
Contributor

Hi @chenqianfzh,

Thank you for reaching out. Chakra does include a converter from PyTorch execution traces to Chakra execution traces (et_converter), but until now it was incomplete.

To address this, changes have been made across three repositories: PARAM, Chakra, and ASTRA-sim:

  • In PARAM, the trace_link.py tool merges PyTorch execution traces (which cover CPU operators) with Kineto traces (which cover GPU operators) into a single unified execution trace file (View changes here).
  • In Chakra, the PyTorch ET to Chakra ET converter has been substantially extended so that it handles both GPU and CPU operations and tells them apart (View changes here).
  • In ASTRA-sim, the simulator has been updated to better distinguish between CPU and GPU operations (View changes here).

We will merge these changes once we confirm they work as expected.

As next steps, you'll need to collect both the PyTorch execution trace and the Kineto trace while the model runs. Once you have both, use trace_link.py to merge the PyTorch ET with the Kineto trace. The merged trace can then be fed into the converter to produce a Chakra execution trace that the simulator can consume. The attached figure illustrates the flow:

[Figure: Screenshot 2023-10-23 at 9 14 07 AM]
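
For the collection step, here is a minimal sketch of how both traces might be recorded in one run. It assumes a PyTorch 2.x build in which torch.profiler exposes ExecutionTraceObserver. The helper names collect_traces and demo, the output file names, and the toy model are hypothetical and are not taken from examples.tgz. Whether trace_link.py needs additional profiler options is not covered here.

```python
import os
import tempfile


def collect_traces(run_step, num_steps, et_path, kineto_path):
    """Run `run_step` `num_steps` times while recording both traces.

    et_path     -> PyTorch execution trace (JSON) from ExecutionTraceObserver
    kineto_path -> Kineto trace (Chrome trace JSON) from torch.profiler
    """
    # Imported lazily so this file can be loaded on machines without PyTorch.
    import torch
    from torch.profiler import ExecutionTraceObserver, ProfilerActivity, profile

    # Always profile CPU; add CUDA only when a GPU is actually present.
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)

    # The observer writes the PyTorch execution trace to et_path.
    et = ExecutionTraceObserver()
    et.register_callback(et_path)
    et.start()
    try:
        # The profiler records the Kineto trace over the same steps.
        with profile(activities=activities) as prof:
            for _ in range(num_steps):
                run_step()
        prof.export_chrome_trace(kineto_path)
    finally:
        et.stop()
        et.unregister_callback()  # finalizes the execution trace file


def demo():
    """Trace a few steps of a toy model (assumption: any model works here)."""
    import torch

    model = torch.nn.Linear(16, 4)
    x = torch.randn(8, 16)

    def step():
        model.zero_grad()
        model(x).sum().backward()

    out_dir = tempfile.mkdtemp()
    et_path = os.path.join(out_dir, "pytorch_et.json")
    kineto_path = os.path.join(out_dir, "kineto.json")
    collect_traces(step, 3, et_path, kineto_path)
    return et_path, kineto_path
```

Calling demo() writes the two files into a temporary directory and returns their paths. Both traces are recorded over the same iterations, on the assumption that the merge step needs them to describe the same execution.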

Lastly, Saeed from HP Labs has shared some files that demonstrate how to collect PyTorch execution traces and Kineto traces. I recommend checking them out: examples.tgz

@morphine00 morphine00 transferred this issue from mlcommons/chakra-old Dec 11, 2023