# Execution Trace Demo

## About

PyTorch Execution Trace is a reference implementation of the open standard defined by the MLCommons Chakra project.
The Execution Trace aims to collect semantic information about a PyTorch model's execution.

It is essentially a graph whose nodes represent either Operators or Tensors:
1. Operator nodes contain information of input/output Tensors, input/output shapes, and parent-child relationships.
1. Tensor nodes include shape, type and storage information.
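
To make this structure concrete, here is a minimal sketch of such a graph in plain Python. The class names, fields and IDs are hypothetical and do not mirror the actual Chakra/ET JSON schema. They only show how an operator node points to its parent and to its input/output tensor nodes, and how tensor nodes carry shape, type and storage information.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TensorNode:
    tensor_id: int
    shape: list[int]
    dtype: str
    storage_id: int  # tensors sharing a storage_id alias the same memory


@dataclass
class OperatorNode:
    node_id: int
    name: str
    parent_id: Optional[int]  # enclosing operator/annotation; None for the root
    inputs: list[int] = field(default_factory=list)   # IDs of consumed tensors
    outputs: list[int] = field(default_factory=list)  # IDs of produced tensors


# Hypothetical toy graph: two random tensors added together under one annotation.
tensors = {
    10: TensorNode(10, [256, 256], "float", storage_id=100),
    11: TensorNode(11, [256, 256], "float", storage_id=101),
    12: TensorNode(12, [256, 256], "float", storage_id=102),
}
ops = {
    1: OperatorNode(1, "[param|cuda]", parent_id=None),
    2: OperatorNode(2, "aten::rand", parent_id=1, outputs=[10]),
    3: OperatorNode(3, "aten::rand", parent_id=1, outputs=[11]),
    4: OperatorNode(4, "aten::add", parent_id=1, inputs=[10, 11], outputs=[12]),
}


def children(parent_id: int) -> list[str]:
    """Names of the operators whose parent is `parent_id`."""
    return [op.name for op in ops.values() if op.parent_id == parent_id]


def input_shapes(node_id: int) -> list[list[int]]:
    """Shapes of an operator's inputs, resolved through the tensor nodes."""
    return [tensors[t].shape for t in ops[node_id].inputs]


print(children(1))      # ['aten::rand', 'aten::rand', 'aten::add']
print(input_shapes(4))  # [[256, 256], [256, 256]]
```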

By combining the Execution Trace with the timing information in the PyTorch trace, we can develop sophisticated critical-path and anti-pattern analyses.
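
As an illustration of that kind of analysis, the sketch below computes a critical path (the longest-duration chain of dependent operators) over a toy graph. It assumes durations come from the PyTorch trace and dependencies come from the ET (for example, an operator depending on whichever operator produced its input tensor). The operator names, numbers and the `critical_path` helper are hypothetical; this is a generic longest-path computation on a DAG, not HTA's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical per-operator durations in microseconds (e.g. from the PyTorch trace).
durations: Dict[str, int] = {"load": 5, "matmul": 20, "add": 3, "relu": 2, "write": 4}
# Hypothetical dependencies: each op lists the ops whose outputs it consumes (e.g. from the ET).
deps: Dict[str, List[str]] = {
    "matmul": ["load"],
    "add": ["load"],
    "relu": ["matmul", "add"],
    "write": ["relu"],
}


def critical_path(
    durations: Dict[str, int], deps: Dict[str, List[str]]
) -> Tuple[int, List[str]]:
    """Return (total duration, ops) of the longest chain in an acyclic dependency graph."""
    memo: Dict[str, Tuple[int, List[str]]] = {}

    def longest_ending_at(op: str) -> Tuple[int, List[str]]:
        if op not in memo:
            # Best chain among the predecessors, or an empty chain if there are none.
            best = max(
                (longest_ending_at(p) for p in deps.get(op, [])),
                key=lambda chain: chain[0],
                default=(0, []),
            )
            memo[op] = (best[0] + durations[op], best[1] + [op])
        return memo[op]

    return max((longest_ending_at(op) for op in durations), key=lambda chain: chain[0])


print(critical_path(durations, deps))  # (31, ['load', 'matmul', 'relu', 'write'])
```

Memoizing the longest chain ending at each operator keeps the computation linear in the number of edges, which matters once the graph has thousands of operators.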

### Python API to collect Execution Trace
The `ExecutionTraceObserver` collects Execution Traces. Each process can have a single `ExecutionTraceObserver` instance. 
1. One can register a callback that saves the Execution Trace to a file by calling `register_callback(output_file_path)`. Note that `output_file_path` should be unique for each process/rank.
1. Once an `ExecutionTraceObserver` is created, its `start()` and `stop()` methods control when event data is recorded.
1. Deleting the observer or calling `unregister_callback()` unlinks it, after which it no longer incurs any overhead.
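
Because each process/rank needs its own output file, it helps to derive the path from the rank. The sketch below reads the `RANK` environment variable that `torchrun` sets for each worker and falls back to 0 for single-process runs; the helper name `et_output_path` and the file-naming scheme are hypothetical.

```python
import os


def et_output_path(out_dir: str, prefix: str = "et") -> str:
    """Build a per-rank Execution Trace file path so ranks never overwrite each other."""
    # torchrun exports RANK for every worker; default to 0 when it is not set.
    rank = int(os.environ.get("RANK", "0"))
    return os.path.join(out_dir, f"{prefix}_rank{rank}.json")
```

The result can then be passed to `register_callback(...)` on the observer.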

In the following example, we create an ET observer object explicitly. This allows us to control when to start or stop capturing the execution trace. 

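```python
import tempfile

import torch
from torch.profiler import ExecutionTraceObserver, record_function

use_cuda = torch.cuda.is_available()


def payload(use_cuda: bool = False) -> None:
    # Placeholder workload for this example; substitute your own model step.
    device = "cuda" if use_cuda else "cpu"
    x = torch.rand(256, 256, device=device)
    y = torch.rand(256, 256, device=device)
    _ = x + y


# Create a temp file to save execution trace data.
fp = tempfile.NamedTemporaryFile('w+t', suffix='.json', delete=False)
fp.close()

et = ExecutionTraceObserver()
et.register_callback(fp.name)

for idx in range(10):
    # Toggle capture at the top of the iteration, outside any record_function() scope.
    if idx == 3:
        et.start()
    elif idx == 5:
        et.stop()
    with record_function(f"## LOOP {idx} ##"):
        payload(use_cuda=use_cuda)

# Query the output path while the callback is still registered, then unlink the observer.
assert fp.name == et.get_output_file_path()
et.unregister_callback()
```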

It is good practice to always start or stop capturing at the beginning of an iteration. This lets objects from the previous iteration (for example, `record_function()` scopes) go out of scope, so that their exit callbacks are called.
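
One way to keep that discipline in a longer training loop is to put the start/stop decision in a small helper that is called first thing in every iteration. The sketch below is hypothetical (`IterationWindow` and `RecordingObserver` are not part of PyTorch). It only relies on the observer having `start()` and `stop()` methods, so a real `ExecutionTraceObserver` could be passed in place of the stand-in used here to show the call order.

```python
class IterationWindow:
    """Start an observer at the top of iteration `start_iter`; stop it at the top of `stop_iter`."""

    def __init__(self, observer, start_iter: int, stop_iter: int) -> None:
        if not 0 <= start_iter < stop_iter:
            raise ValueError("expected 0 <= start_iter < stop_iter")
        self.observer = observer
        self.start_iter = start_iter
        self.stop_iter = stop_iter

    def step(self, idx: int) -> None:
        # Call before entering any per-iteration record_function() scope,
        # so that scopes from the previous iteration have already exited.
        if idx == self.start_iter:
            self.observer.start()
        elif idx == self.stop_iter:
            self.observer.stop()


class RecordingObserver:
    """Stand-in exposing start()/stop(); it only records when each call happens."""

    def __init__(self) -> None:
        self.calls = []

    def start(self) -> None:
        self.calls.append("start")

    def stop(self) -> None:
        self.calls.append("stop")


obs = RecordingObserver()
window = IterationWindow(obs, start_iter=3, stop_iter=5)
captured = []
for idx in range(10):
    window.step(idx)
    if obs.calls and obs.calls[-1] == "start":
        captured.append(idx)  # this iteration runs while capture is on
    # ... run one iteration of the workload here ...

print(obs.calls, captured)  # ['start', 'stop'] [3, 4]
```

With these arguments the window matches the example above: iterations 3 and 4 are captured.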


### Correlating Execution Trace and Kineto Trace

We can correlate semantic information in the Execution Trace with the PyTorch/Kineto trace. Collecting a PyTorch trace is covered in the official PyTorch recipe [here](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html).

There are two cases to consider:
1. ET and PyTorch trace are collected at the same time. In this case, the tensor information is most accurate, but ET collection could add overhead to the collected trace.
1. ET and PyTorch trace are collected over different time intervals. The two traces can still be correlated, but this assumes similar behavior from iteration to iteration.

Currently, the `execution_trace` module only supports case (1) above, i.e., both traces are collected simultaneously. Example traces can be found in `tests/data/execution_trace/`.

```python
from hta.trace_analysis import TraceAnalysis
from hta.common import execution_trace
```

## Load Kineto traces and Execution Trace

__<font color='red'>
Note: To run the notebook, ensure that `trace_prefix` below points to your local clone of the HolisticTraceAnalysis repo, so that `trace_dir` resolves to its `tests/data/execution_trace/` directory.
</font>__

```python
trace_prefix = "~/HolisticTraceAnalysis"
trace_dir = f"{trace_prefix}/tests/data/execution_trace/"
analyzer = TraceAnalysis(trace_dir=trace_dir)
```

```
2023-07-26 10:41:32,146 - hta - trace.py:L404 - INFO - /Users/bcoutinho/Work/hta/HolisticTraceAnalysis2/tests/data/execution_trace
2023-07-26 10:41:32,150 - hta - trace_file.py:L61 - ERROR - If the trace file does not have the rank specified in it, then add the following snippet key to the json files to use HTA; "distributedInfo": {"rank": 0}. If there are multiple traces files, then each file should have a unique rank value.
2023-07-26 10:41:32,150 - hta - trace_file.py:L94 - INFO - Rank to trace file map:
{0: '/Users/bcoutinho/Work/hta/HolisticTraceAnalysis2/tests/data/execution_trace/benchmark_simple_add_trace.json.gz'}
2023-07-26 10:41:32,151 - hta - trace.py:L550 - INFO - ranks=[0]
2023-07-26 10:41:32,152 - hta - trace.py:L132 - INFO - Parsed /Users/bcoutinho/Work/hta/HolisticTraceAnalysis2/tests/data/execution_trace/benchmark_simple_add_trace.json.gz time = 0.00 seconds
```


```python
et = execution_trace.load_execution_trace(trace_dir + "benchmark_simple_add_et.json.gz")
```

```
2023-07-26 10:41:32,177 - hta - execution_trace.py:L45 - INFO - Parsed Execution Trace file ~/Work/hta/HolisticTraceAnalysis2/tests/data/execution_trace/benchmark_simple_add_et.json.gz, time = 0.00 seconds
```


## Correlate Execution Trace and Kineto/PyTorch Profiler Trace

This section correlates the Execution Trace object and the HTA Trace object loaded above.
The resulting trace dataframe contains an `et_node` column that lists the unique node ID in the Execution Trace.
Since the Execution Trace mainly records `user_annotation` and `cpu_op` events, only these events will have an `et_node` annotation.
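
Conceptually, correlation attaches an ET node ID to each matching trace event. The pandas sketch below shows one way to express that with a lookup table. It assumes both traces share a record-function ID (the correlation log output below reports `rf_ids` for both); the `rf_id` column, the `et_rf_to_node` mapping and the toy rows are hypothetical stand-ins, not HTA's internals.

```python
import pandas as pd

# Toy trace events; `rf_id` is a hypothetical record-function ID column (-1 = none).
toy_df = pd.DataFrame(
    {
        "cat": ["user_annotation", "cpu_op", "cpu_op", "kernel"],
        "name": ["[param|cuda]", "aten::rand", "aten::empty", "add_kernel"],
        "rf_id": [1, 2, 3, -1],
    }
)

# Hypothetical lookup built from the ET: record-function ID -> ET node ID.
et_rf_to_node = {1: 3, 2: 4, 3: 5}

# Only user_annotation and cpu_op events are expected to have ET counterparts.
mask = toy_df["cat"].isin(["user_annotation", "cpu_op"])
# Index alignment leaves NaN for rows outside the mask or without a match.
toy_df["et_node"] = toy_df.loc[mask, "rf_id"].map(et_rf_to_node)
```

Rows without a counterpart end up as NaN, which also makes the column a float column.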

```python
execution_trace.correlate_execution_trace?
```

```
Signature:
execution_trace.correlate_execution_trace(
    trace: hta.common.trace.Trace,
    rank: int,
    et: param.python.tools.execution_graph.ExecutionGraph,
) -> None
Docstring:
Correlate the trace from a specific rank with Execution Trace object.

Args:
    trace (Trace): Trace object loaded using `TraceAnalysis(trace_dir=trace_dir)`
                    or other method.
    rank (int): Rank to correlate with.
    et (ExecutionGraph): An Execution Trace object to correlate with.

Returns:
    None

Outcome is the trace dataframe for s
```

```python
execution_trace.correlate_execution_trace(analyzer.t, 0, et)
```

```
2023-07-26 10:41:32,218 - root - execution_trace.py:L79 - INFO - Trace and ET have overlap = True
2023-07-26 10:41:32,219 - root - execution_trace.py:L80 - INFO - Trace rf_ids (1, 83),ET rf_ids (1, 36)
2023-07-26 10:41:32,219 - hta - execution_trace.py:L124 - INFO - Supported event type ('cat') symbols = [10, 19]
```
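
The log above reports the record-function ID (rf_id) range of each trace and whether the two overlap. A plausible reading is that overlapping ranges indicate the traces were collected together, as in case (1). The check itself can be sketched as a closed-interval overlap test; the function name is hypothetical, and we have not verified that HTA computes the overlap exactly this way.

```python
from typing import Tuple


def rf_id_ranges_overlap(trace_range: Tuple[int, int], et_range: Tuple[int, int]) -> bool:
    """True if two closed [min, max] rf_id ranges share at least one ID."""
    (t_lo, t_hi), (e_lo, e_hi) = trace_range, et_range
    # Two closed intervals overlap iff each one starts no later than the other ends.
    return t_lo <= e_hi and e_lo <= t_hi


print(rf_id_ranges_overlap((1, 83), (1, 36)))     # True, matching the log above
print(rf_id_ranges_overlap((1, 83), (120, 150)))  # False
```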


```python
# Make a copy so that we can modify it and add symbols for readability
trace_df = analyzer.t.get_trace(0).copy()
analyzer.t.symbol_table.add_symbols_to_trace_df(trace_df, col='cat')
analyzer.t.symbol_table.add_symbols_to_trace_df(trace_df, col='name')
```

## Use ET node information
Now that `et_node` is known, we can use it to index into the ET and correlate vital information from it.
One use case is to get the input and output shape and type information of each operator.
The `add_et_column(...)` function can be used to achieve that.
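
Conceptually, adding an ET column amounts to looking up an attribute of the ET node referenced by each row's `et_node`. The sketch below mimics that with a plain dictionary of node attributes. The `add_node_attribute` helper and the `et_nodes` dictionary are hypothetical; the real `add_et_column(...)` works from the loaded Execution Trace object instead.

```python
import pandas as pd

# Hypothetical ET node attributes keyed by ET node ID.
et_nodes = {
    4: {"name": "aten::rand", "output_shapes": [[256, 256]]},
    8: {"name": "aten::uniform_", "output_shapes": [[256, 256]]},
}

toy_df = pd.DataFrame(
    {"name": ["aten::rand", "aten::uniform_", "add_kernel"], "et_node": [4.0, 8.0, float("nan")]}
)


def add_node_attribute(df: pd.DataFrame, nodes: dict, column: str) -> None:
    """Copy `column` from each row's ET node into df; rows without a node get None."""

    def lookup(node_id):
        if pd.isna(node_id):
            return None
        node = nodes.get(int(node_id))  # et_node is stored as a float
        return None if node is None else node.get(column)

    df[column] = df["et_node"].map(lookup)


add_node_attribute(toy_df, et_nodes, "output_shapes")
```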

```python
execution_trace.add_et_column?
```

```
Signature:
execution_trace.add_et_column(
    trace_df: pandas.core.frame.DataFrame,
    et: param.python.tools.execution_graph.ExecutionGraph,
    column: str,
) -> None
Docstring:
Add columns from Execution Trace nodes into the trace dataframe. Please
run this after running correlate_execution_trace(...).
Args:
    trace_df (pd.DataFrame): Dataframe for trace from one rank. Please
                             run correlate_execution_trace() on the trace dataframe
                             first so that the `et_node` is populated..
```

```python
execution_trace.add_et_column(trace_df, et, 'et_node_name')
execution_trace.add_et_column(trace_df, et, 'op_schema')
execution_trace.add_et_column(trace_df, et, 'input_shapes')
execution_trace.add_et_column(trace_df, et, 'input_types')
execution_trace.add_et_column(trace_df, et, 'output_shapes')
execution_trace.add_et_column(trace_df, et, 'output_types')
```

```python
trace_df.head()
```

| | index | cat | name | pid | tid | ts | dur | memory_bw_gbps | Trace iteration | stream | ... | correlation | index_correlation | iteration | et_node | et_node_name | op_schema | input_shapes | input_types | output_shapes | output_types |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | user_annotation | [param\|cuda] | 563677 | 563677 | 0 | 19814157 | -1 | -1 | -1 | ... | -1 | -1 | -1 | 3.0 | [param\|cuda] | | [] | [] | [] | [] |
| 1 | 1 | cpu_op | aten::rand | 563677 | 563677 | 2006 | 19583658 | -1 | -1 | -1 | ... | -1 | -1 | -1 | 4.0 | aten::rand | aten::rand(SymInt[] size, *, ScalarType? dtype... | [[[], []], [], [], [], []] | [GenericList[Int,Int], Int, None, Device, Bool] | [[256, 256]] | [Tensor(float)] |
| 2 | 2 | cpu_op | aten::empty | 563677 | 563677 | 2047 | 19583280 | -1 | -1 | -1 | ... | -1 | -1 | -1 | 5.0 | aten::empty | aten::empty.memory_format(SymInt[] size, *, Sc... | [[[], []], [], [], [], [], []] | [GenericList[Int,Int], Int, None, Device, Bool... | [[256, 256]] | [Tensor(float)] |
| 3 | 3 | cpu_op | aten::uniform_ | 563677 | 563677 | 19585454 | 189 | -1 | -1 | -1 | ... | -1 | -1 | -1 | 8.0 | aten::uniform_ | aten::uniform_(Tensor(a!) self, float from=0.,... | [[256, 256], [], [], []] | [Tensor(float), Double, Double, None] | [[256, 256]] | [Tensor(float)] |
| 4 | 4 | cpu_op | aten::rand | 563677 | 563677 | 19585799 | 94 | -1 | -1 | -1 | ... | -1 | -1 | -1 | 9.0 | aten::rand | aten::rand(SymInt[] size, *, ScalarType? dtype... | [[[], []], [], [], [], []] | [GenericList[Int,Int], Int, None, Device, Bool] | [[256, 256]] | [Tensor(float)] |
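
With shapes and types available as columns, per-operator sizes follow directly. The sketch below computes the total size in bytes of an operator's tensor arguments. It assumes `Tensor(float)` means float32 (4 bytes per element) and that the shape/type columns hold Python lists rather than strings; `tensor_bytes` and `BYTES_PER_ELEMENT` are hypothetical helpers.

```python
from math import prod
from typing import Sequence

# Assumed element sizes in bytes, keyed by the type strings shown in the table above.
# Only "Tensor(float)" appears in this trace; extend the mapping for other dtypes.
BYTES_PER_ELEMENT = {"Tensor(float)": 4}


def tensor_bytes(shapes: Sequence[Sequence[int]], types: Sequence[str]) -> int:
    """Sum of numel * element size over arguments whose type is in BYTES_PER_ELEMENT."""
    total = 0
    for shape, type_name in zip(shapes, types):
        elem_size = BYTES_PER_ELEMENT.get(type_name)
        if elem_size is not None:  # scalars, None and unmapped types are skipped
            total += prod(shape) * elem_size  # prod([]) == 1, i.e. a 0-d tensor
    return total


# Inputs of the aten::uniform_ row: one 256x256 float tensor plus scalar arguments.
print(tensor_bytes([[256, 256], [], [], []], ["Tensor(float)", "Double", "Double", "None"]))  # 262144
```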
