what is the expected way to collect pytorch execution trace? #3

chenqianfzh · 2023-10-19T22:40:46Z

Could you show how to collect the PyTorch execution trace output that Chakra takes and converts?

I tried to collect the trace with the default trace handler, torch.profiler.tensorboard_trace_handler, and with torch.jit.trace(). The outputs from both attempts are very different from what pytorch2chakra_converter expects.

Thanks.

TaekyungHeo · 2023-10-23T13:25:08Z

Hi @chenqianfzh,

Thank you for reaching out. While there is a converter from PyTorch execution traces to Chakra execution traces in Chakra (et_converter), it was previously incomplete.

To address this, changes have been made across three repositories: PARAM, Chakra, and ASTRA-sim:

In PARAM, the trace_link.py tool merges PyTorch execution traces (covering CPU operators) with Kineto traces (covering GPU operators) into a single unified execution trace file (View changes here).
In Chakra, the PyTorch ET to Chakra ET converter has been significantly enhanced so that it can link and differentiate GPU and CPU operations (View changes here).
In ASTRA-sim, the simulator has been updated to better distinguish between CPU and GPU operations (View changes here).

We will merge these changes once we confirm they work as expected.

As next steps, you'll need to collect both the PyTorch execution trace and the Kineto trace while the model runs. Once both are gathered, use trace_link.py to combine the PyTorch ET with the Kineto trace. The merged trace can then be fed into the converter to produce a simulation-compatible Chakra execution trace. Please refer to the attached figure for clarification:

Lastly, Saeed from HP Labs has shared some files that demonstrate how to collect PyTorch execution traces and Kineto traces. I recommend checking them out: examples.tgz

srinivas212 closed this as completed Nov 1, 2023

morphine00 transferred this issue from mlcommons/chakra-old Dec 11, 2023
