# Memory analysis from traces

When using the PyTorch profiler, traces that include memory profiling information can be generated with:

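```python
import torch

with torch.profiler.profile(
    # ... other profiler arguments
    profile_memory=True,  # record tensor allocations and deallocations
    with_stack=True,      # record source information (file and line number) for the ops
) as prof:
    ...  # run the workload to profile here
```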

This notebook shows how the `TraceAnalysis` object can be used to understand memory usage in PyTorch.

A profile generated with these options is available at `HolisticTraceAnalysis/tests/data/torchtitan_h100/`.

__<font color='red'>
Note: To run the notebook, make sure the path to the HolisticTraceAnalysis repo is set correctly in the `trace_prefix` variable below.
</font>__

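```python
import os

# Path to the HolisticTraceAnalysis repo; expanduser resolves the "~"
trace_prefix = os.path.expanduser("~/HolisticTraceAnalysis")
# trace_prefix = ".."  # adjust to the right prefix for your machine
```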

## Loading the profile and displaying a simple memory timeline

The timeline below plots the total memory allocated and total memory reserved as reported by the events.

```python
from hta.trace_analysis import TraceAnalysis

analyzer = TraceAnalysis(trace_dir=f"{trace_prefix}/tests/data/torchtitan_h100/")
memory_events = analyzer.get_memory_timeline()
```

```python
memory_events.head()
```
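Conceptually, a memory timeline is a running total of allocation deltas. The sketch below rebuilds one with pandas from a made-up table of memory events. The column names `ts` and `bytes_delta` mirror columns used later in this notebook, but the data is invented, the sign convention (positive for allocations, negative for deallocations) is an assumption, and this is not how HTA computes its timeline internally.

```python
import pandas as pd

# Hypothetical memory events: positive bytes_delta = allocation,
# negative bytes_delta = deallocation (an assumption for this sketch).
events = pd.DataFrame(
    {
        "ts": [0, 10, 20, 30, 40],  # event timestamps
        "bytes_delta": [4_000, 2_000, -4_000, 1_000, -2_000],
    }
)

# Sort by time, then accumulate the deltas to get bytes allocated after each event.
timeline = events.sort_values("ts").assign(
    total_allocated=lambda df: df["bytes_delta"].cumsum()
)
peak_bytes = timeline["total_allocated"].max()
print(timeline)
print("peak allocated bytes:", peak_bytes)
```

The peak of this running sum is the quantity a memory spike in the HTA timeline corresponds to.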

## Categorising memory usage

A more detailed analysis of the memory timeline can be done by cross-referencing the instantaneous allocation and deallocation events with the other events in the trace.

This lets us associate memory events with a `stack_name` and a `stack_type`.

```python
categorised_memory_timelines, memory_events = analyzer.get_memory_timeline_per_category()
```
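The idea of cross-referencing can be illustrated with `pandas.merge_asof`: for each memory event, take the most recent operator that started at or before it, and keep the match only if the event falls inside that operator's duration. This is a simplified sketch with made-up `ops` and `mem` tables and an assumed non-overlapping operator layout. It is not HTA's implementation, which also has to handle nested operators and stack information.

```python
import pandas as pd

# Hypothetical operator events: start timestamp, duration and name.
ops = pd.DataFrame(
    {
        "ts": [0, 100, 250],
        "dur": [80, 120, 50],
        "name": ["aten::empty", "aten::matmul", "aten::add"],
    }
)

# Hypothetical instantaneous memory events.
mem = pd.DataFrame({"ts": [5, 150, 240, 260], "bytes_delta": [512, 4096, -512, 1024]})

# merge_asof requires both frames to be sorted on the join key.
matched = pd.merge_asof(
    mem.sort_values("ts"),
    ops.sort_values("ts").rename(columns={"ts": "op_ts"}),
    left_on="ts",
    right_on="op_ts",
    direction="backward",  # most recent op starting at or before the event
)

# Discard matches where the event happens after the op has already ended.
inside = matched["ts"] <= matched["op_ts"] + matched["dur"]
matched["op_name"] = matched["name"].where(inside, "unattributed")
print(matched[["ts", "bytes_delta", "op_name"]])
```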

## Using categorised data for further analysis

The data processed from the trace is returned as a dictionary of DataFrames, keyed by the device associated with the memory events.

Note the `stack_ids`, `stack_name`, `alloc_or_dealloc_id` and `stack_type` columns.

```python
print(memory_events.keys())
memory_events[0].head()
```
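Because the result is a plain dictionary keyed by device, ordinary dictionary and pandas operations apply to it. As an illustration, the sketch below builds a similarly shaped dictionary from a made-up flat table with a hypothetical `device` column, then computes the peak allocation per device. The column names and values are assumptions for the example.

```python
import pandas as pd

# Hypothetical flat table of memory events from two devices.
flat = pd.DataFrame(
    {
        "device": [0, 0, 1, 0, 1],
        "ts": [0, 10, 15, 20, 30],
        "bytes_delta": [100, 300, 50, -200, 25],
    }
)

# One DataFrame per device, keyed by device index (the same shape as memory_events above).
per_device = {dev: df.reset_index(drop=True) for dev, df in flat.groupby("device")}

# Peak bytes allocated on each device, from the running sum of deltas.
peaks = {
    dev: int(df.sort_values("ts")["bytes_delta"].cumsum().max())
    for dev, df in per_device.items()
}
print(sorted(per_device.keys()))
print(peaks)
```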

This data can also be used to make custom plots, for example with Plotly:

```python
import pandas as pd

# Use Plotly as the backend for DataFrame.plot()
pd.options.plotting.backend = "plotly"
print(categorised_memory_timelines[0].columns)
# Drop the bookkeeping columns and plot the remaining columns against the `ms` column
non_category_columns = ["ts", "bytes_delta", "stack_name", "is_plotting_event_only", "category"]
categorised_memory_timelines[0].drop(non_category_columns, axis=1).set_index("ms").plot()
```
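A per-category timeline, with one column of cumulative bytes per category, can be reproduced with a pivot followed by a cumulative sum. The sketch below does this on made-up data; the `category` labels are hypothetical, and the result only approximates the shape of `categorised_memory_timelines`, whose exact columns are printed above.

```python
import pandas as pd

events = pd.DataFrame(
    {
        "ts": [0, 1, 2, 3, 4],
        "category": ["weights", "activations", "activations", "weights", "activations"],
        "bytes_delta": [1_000, 500, 700, 200, -500],
    }
)

# One column per category holding that category's delta at each timestamp
# (0 where the category has no event), then a running sum down each column.
per_category = (
    events.pivot_table(index="ts", columns="category", values="bytes_delta", aggfunc="sum")
    .fillna(0)
    .cumsum()
)
per_category["total"] = per_category.sum(axis=1)
print(per_category)
```

Stacking the category columns this way makes it easy to see which category drives the total at any point in time.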

## Using custom classifiers and visualisations

The profile above shows a large memory spike. This tooling can help you
identify which kernels are allocating that memory.

The first step is to define a classification function: a plain Python function that takes a single
row of the trace DataFrame and returns a category label.

```python
def classify_big_allocs(row):
    # Bucket each memory event by the size of its bytes_delta (in bytes)
    if row["bytes_delta"] > 5e9:
        return "very_big_alloc (>5GB)"
    elif row["bytes_delta"] > 1e9:
        return "big_alloc (>1GB)"
    elif row["bytes_delta"] < 1e6:
        return "small_alloc (<1MB)"
    return "normal_alloc"


alloc_sizes, _ = analyzer.get_memory_timeline_per_category(classification_func=classify_big_allocs)
```
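Since a classification function like `classify_big_allocs` simply maps a row to a label, it can be checked on a small, made-up DataFrame with `DataFrame.apply(axis=1)` before running it on a real trace. Taking the largest rows by `bytes_delta` is also a quick way to list the biggest allocations and the stacks they came from. The data and stack names below are invented for illustration, and the function is repeated so the block runs on its own.

```python
import pandas as pd

def classify_big_allocs(row):
    # Same thresholds as above, in bytes.
    if row["bytes_delta"] > 5e9:
        return "very_big_alloc (>5GB)"
    elif row["bytes_delta"] > 1e9:
        return "big_alloc (>1GB)"
    elif row["bytes_delta"] < 1e6:
        return "small_alloc (<1MB)"
    return "normal_alloc"

# Made-up memory events with hypothetical stack names.
events = pd.DataFrame(
    {
        "bytes_delta": [6e9, 2e9, 5e5, 3e7],
        "stack_name": ["fwd;attn;matmul", "fwd;mlp;linear", "opt;step", "bwd;attn"],
    }
)

# Apply the classifier to each row to get a per-event category label.
events["category"] = events.apply(classify_big_allocs, axis=1)
print(events["category"].value_counts())

# The two largest allocations and where they came from.
biggest = events.nlargest(2, "bytes_delta")[["bytes_delta", "stack_name", "category"]]
print(biggest)
```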

The plot below melts the timeline data into long format so that the `stack_name` variable appears in the tooltip when hovering over the graph.

```python
import pandas as pd
import plotly.express as px

timelines: pd.DataFrame = alloc_sizes[0]
# Reshape to long format: one row per (timestamp, column) pair, keeping
# ms, stack_name and is_plotting_event_only as identifier columns.
df_melted = (
    timelines.set_index("ts")
    .sort_index()
    .drop(["category", "bytes_delta"], axis="columns")
    .melt(id_vars=["ms", "stack_name", "is_plotting_event_only"],
          var_name="column", value_name="value", ignore_index=False)
    .dropna()
)

def reformat_stack_name(stack_name):
    # Keep only the last entry of the ';'-separated stack name
    return stack_name.split(";")[-1]

df_melted["stack_name"] = df_melted.stack_name.apply(reformat_stack_name)
# Create a Plotly Express figure with stack_name in the hover tooltip
fig = px.line(df_melted, x=df_melted.index, y="value", color="column", hover_data=["stack_name"])
fig.update_layout(title="Hover over this figure to see the stack names on the line")
fig.show()
```
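The wide-to-long reshaping used for this plot can be seen on a tiny, made-up table: `melt` keeps the identifier columns, turns every other column into a (`column`, `value`) pair, and `ignore_index=False` preserves the timestamp index used as the x-axis. The column names `alloc_a` and `alloc_b` are hypothetical.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "ts": [0, 1],
        "stack_name": ["root;fwd;matmul", "root;bwd;add"],
        "alloc_a": [10.0, None],
        "alloc_b": [5.0, 7.0],
    }
).set_index("ts")

long = wide.melt(
    id_vars=["stack_name"], var_name="column", value_name="value", ignore_index=False
).dropna()  # drop timestamps where a column has no value

# Keep only the last entry of the ';'-separated stack, as reformat_stack_name does.
long["stack_name"] = long["stack_name"].str.split(";").str[-1]
print(long)
```

Each surviving row becomes one point on a line in the Plotly figure, with `stack_name` carried along for the tooltip.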