# Parsing TensorBoard Data

https://github.com/j3soon/tbparse/tree/master  
https://tbparse.readthedocs.io/en/latest/


## Imports

In [1]:
import tempfile

from torch.utils.tensorboard import SummaryWriter
from tbparse import SummaryReader

## Constants & Preparations

In [2]:
# Used to create sample event logs
N_RUNS = 2
N_EVENTS = 3

## Preparing Sample Event Logs

To illustrate the use of `tbparse`, we create some sample event logs, following the [tbparse documentation](https://tbparse.readthedocs.io/en/latest/pages/parsing-scalars.html#preparing-sample-event-logs).

In [3]:
# Prepare tmpdirs to store event files
tmpdirs = {'torch': tempfile.TemporaryDirectory()}
LOG_DIR = tmpdirs['torch'].name

Next, we simulate two independent training runs.

In [4]:
for i in range(N_RUNS):
    writer = SummaryWriter(f"{LOG_DIR}/run_{i}")
    # We store 2 tags, each with 3 events
    for j in range(N_EVENTS):
        writer.add_scalar('y=2x+C', j * 2 + i, j)
        writer.add_scalar('y=3x+C', j * 3 + i, j)
    writer.close()
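
As a sanity check for the parsing steps later on, the values written by the loop above can be enumerated without touching TensorBoard at all. This is only a sketch: the `SLOPES` and `expected` names are introduced here for illustration, and the constant `C` is taken to be the run index `i`, as in the loop.

```python
# Enumerate the scalar values that the logging loop writes, using plain Python.
N_RUNS = 2
N_EVENTS = 3
SLOPES = {"y=2x+C": 2, "y=3x+C": 3}  # hypothetical helper: tag -> slope

expected = {}
for i in range(N_RUNS):
    for tag, slope in SLOPES.items():
        # The step j plays the role of x, the run index i plays the role of C
        expected[(f"run_{i}", tag)] = [float(slope * j + i) for j in range(N_EVENTS)]

for key, values in expected.items():
    print(key, values)
```

Each list should match the `value` column that `tbparse` reports for the corresponding run and tag.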

## Parsing Scalars

The `SummaryReader` class can read:
- a single event file
- all event files under a subdirectory of the log directory (for example, one run)
- all event files under the log directory


### Load Log Directory

https://tbparse.readthedocs.io/en/latest/pages/parsing-scalars.html#load-log-directory

Each run is stored in its own subdirectory of the log directory. To tell the runs apart, we pass `extra_columns={'dir_name'}` to `SummaryReader`, which adds a `dir_name` column to the resulting DataFrame:

In [5]:
reader = SummaryReader(LOG_DIR, extra_columns={'dir_name'})  # long format
df = reader.scalars
print(df)

    step     tag  value dir_name
0      0  y=2x+C    0.0    run_0
1      1  y=2x+C    2.0    run_0
2      2  y=2x+C    4.0    run_0
3      0  y=3x+C    0.0    run_0
4      1  y=3x+C    3.0    run_0
5      2  y=3x+C    6.0    run_0
6      0  y=2x+C    1.0    run_1
7      1  y=2x+C    3.0    run_1
8      2  y=2x+C    5.0    run_1
9      0  y=3x+C    1.0    run_1
10     1  y=3x+C    4.0    run_1
11     2  y=3x+C    7.0    run_1


By default, the events are returned in **long format**, with one row per logged event. To get them in **wide format**, with one column per tag, we pass `pivot=True` to the `SummaryReader` class:

In [6]:
reader = SummaryReader(LOG_DIR, pivot=True, extra_columns={'dir_name'})  # wide format
df = reader.scalars
print(df)

   step  y=2x+C  y=3x+C dir_name
0     0     0.0     0.0    run_0
1     1     2.0     3.0    run_0
2     2     4.0     6.0    run_0
3     0     1.0     1.0    run_1
4     1     3.0     4.0    run_1
5     2     5.0     7.0    run_1
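
The relationship between the two layouts can be reproduced with plain pandas. The sketch below is not how `tbparse` implements `pivot=True`; it rebuilds the long-format table by hand, assuming the column names shown in the outputs above, and pivots it with `pivot_table`.

```python
import pandas as pd

# Rebuild the long-format table by hand (same columns as the long-format output)
rows = [
    {"step": j, "tag": tag, "value": float(slope * j + i), "dir_name": f"run_{i}"}
    for i in range(2)
    for tag, slope in (("y=2x+C", 2), ("y=3x+C", 3))
    for j in range(3)
]
long_df = pd.DataFrame(rows)

# One row per (run, step), one column per tag
wide_df = (
    long_df.pivot_table(index=["dir_name", "step"], columns="tag", values="value")
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover "tag" label on the column axis

print(wide_df)
```

The long format is usually easier to filter or group by `tag`, while the wide format is convenient when several tags are compared at the same step.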
