# Timestamps

RADICAL-Analytics (RA) enables event-based analyses in which the timestamps recorded in a RADICAL-Cybertools (RCT) session are studied as time series instead of durations. These analyses are low-level and are mostly useful to 'visualize' how execution unfolds in one or more components of the stack.

<div class="alert alert-warning">
    
__Warning:__ Sessions with 100,000+ tasks and resources may generate traces with 1M+ events. Depending on the amount of available memory, plotting that number of timestamps with RA may not be feasible.

</div>

## Prologue

Load all the Python modules needed to profile and plot a RADICAL-EnsembleToolkit (EnTK) session.

In [None]:
import tarfile

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker

import radical.utils as ru
import radical.pilot as rp
import radical.analytics as ra

from radical.pilot import states as rps

Load the RADICAL Matplotlib style to obtain visually consistent and publishable-quality plots.

In [None]:
plt.style.use(ra.get_mplstyle('radical_mpl'))

Usually, it is useful to record the stack used for the analysis. 

<div class="alert alert-info">
    
__Note:__ The stack used for the analysis might be different from the stack used to create the session being analyzed. Usually, the two stacks must have the same major release number in order to be compatible.

</div>

In [None]:
! radical-stack

## Event Model

Each RCT component has a well-defined event model:

* [RADICAL-Pilot (RP) event model](https://github.com/radical-cybertools/radical.pilot/blob/devel/docs/source/events.md)
* [RADICAL-EnsembleToolkit (EnTK) event model](https://radicalentk.readthedocs.io/en/latest/dev_docs/uml.html#dev-docs-events)

<div class="alert alert-info">
    
__Note:__ RA does not support RADICAL-SAGA.

</div>

Each event belongs to an entity and is timestamped within a component. The succession of the same event over time constitutes a time series. For example, in RP the event `schedule_ok` belongs to a `task` and is timestamped by `AgentSchedulingComponent`. The time series of that event indicates the rate at which tasks are scheduled by RP.
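To make "the rate at which tasks are scheduled" concrete, the sketch below bins a list of timestamps (in seconds) and returns the number of events per second in each bin. It is plain Python, independent of RA; `event_rate` and the sample timestamps are hypothetical.

```python
def event_rate(timestamps, bin_width=1.0):
    """Return the number of events per second in consecutive bins.

    timestamps: iterable of event times in seconds (any order).
    bin_width:  width of each bin in seconds.
    """
    ts = sorted(timestamps)
    if not ts:
        return []
    t0 = ts[0]
    nbins = int((ts[-1] - t0) // bin_width) + 1
    counts = [0] * nbins
    for t in ts:
        # Index of the bin that contains t, counting from the first event.
        counts[int((t - t0) // bin_width)] += 1
    # Convert counts into a rate (events per second).
    return [c / bin_width for c in counts]


# Hypothetical schedule_ok timestamps of five tasks.
print(event_rate([0.0, 0.2, 0.5, 1.1, 2.9]))  # [3.0, 1.0, 1.0]
```

A rate computed this way summarizes the same information that the scatter plots in this notebook show event by event.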

## Timestamps analysis

We use RA to derive the time series of one or more events of interest. We then plot each time series individually or together on the same plot. When plotting the time series of multiple events together, they must all be ordered in the same way. Typically, we sort the entities by the timestamp of their first event.
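A minimal sketch of that ordering step, assuming the timestamps of each entity are available as a plain dict of event name to timestamp (a hypothetical layout, not RA's data model): entities are sorted by their first event and every series is emitted in that same order, so the i-th value of each series belongs to the same entity.

```python
def align_series(entities, events):
    """Order entities by their first event and return aligned time series.

    entities: dict mapping an entity uid to {event_name: timestamp}.
    events:   list of event names; the first one defines the ordering.
    Returns (ordered_uids, {event_name: [timestamps in that order]}).
    """
    first = events[0]
    order = sorted(entities, key=lambda uid: entities[uid][first])
    series = {ev: [entities[uid][ev] for uid in order] for ev in events}
    return order, series


# Hypothetical timestamps (seconds) of three tasks.
sample = {
    'task.000002': {'AGENT_SCHEDULING': 3.0, 'schedule_ok': 3.4},
    'task.000000': {'AGENT_SCHEDULING': 1.0, 'schedule_ok': 1.2},
    'task.000001': {'AGENT_SCHEDULING': 2.0, 'schedule_ok': 9.0},
}
order, series = align_series(sample, ['AGENT_SCHEDULING', 'schedule_ok'])
print(order)                  # ['task.000000', 'task.000001', 'task.000002']
print(series['schedule_ok'])  # [1.2, 9.0, 3.4]
```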

RA user workflow for a timestamps analysis:

1. Go to the [RADICAL-Pilot (RP) event model](https://github.com/radical-cybertools/radical.pilot/blob/devel/docs/source/events.md), the [RP state model](https://github.com/radical-cybertools/radical.pilot/wiki/State-Model-Evolution) or the [RADICAL-EnsembleToolkit (EnTK) event model](https://radicalentk.readthedocs.io/en/latest/dev_docs/uml.html#dev-docs-events) and derive the list of events of interest.
2. Convert events and states into RP/RA dict notation.

E.g., a scheduling event and state in RP:

* [schedule_ok: search for task resources succeeded (uid: task)](https://github.com/radical-cybertools/radical.pilot/blob/devel/docs/source/events.md#agentschedulingcomponent-component)
* [AGENT_SCHEDULING - picked up by agent scheduler, attempts to assign cores for execution](https://github.com/radical-cybertools/radical.pilot/wiki/State-Model-Evolution)

In [None]:
event = {ru.EVENT: 'schedule_ok', ru.STATE: None}
state = {ru.EVENT: 'state', ru.STATE: rps.AGENT_SCHEDULING}

3. Filter an RCT session for the entity to which the selected event/state belongs.
4. Use `ra.entity.timestamps()` with the defined event/state to derive the time series for that event/state.
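Steps 2 to 4 rely on matching an event/state dict against the events recorded for each entity. The sketch below imitates that matching in plain Python, assuming that a `None` value acts as a wildcard (which is how `ru.STATE: None` is used above). The string keys `'event'`, `'state'` and `'time'` and the sample records are hypothetical stand-ins for the `ru.*` keys; this is an illustration, not RA's implementation.

```python
EVENT, STATE, TIME = 'event', 'state', 'time'  # hypothetical stand-in keys


def matches(pattern, record):
    """True if every non-None value in pattern equals the record's value."""
    return all(val is None or record.get(key) == val
               for key, val in pattern.items())


def timestamps(records, pattern):
    """Sorted times of the records that match pattern."""
    return sorted(r[TIME] for r in records if matches(pattern, r))


# Hypothetical events recorded for one task.
records = [
    {TIME: 5.0, EVENT: 'state',       STATE: 'AGENT_SCHEDULING'},
    {TIME: 5.3, EVENT: 'schedule_ok', STATE: None},
    {TIME: 5.4, EVENT: 'state',       STATE: 'AGENT_EXECUTING'},
]
print(timestamps(records, {EVENT: 'schedule_ok', STATE: None}))          # [5.3]
print(timestamps(records, {EVENT: 'state', STATE: 'AGENT_SCHEDULING'}))  # [5.0]
print(timestamps(records, {EVENT: 'state', STATE: None}))                # [5.0, 5.4]
```

The last call shows why the state must be specified for `state` events: with a wildcard, every state transition of the entity matches.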

### Session

Name and location of the session we profile.

In [None]:
sid = 're.session.login1.lei.018775.0005'
sdir = 'sessions/'

Unbzip and untar sessions.

In [None]:
sp = sdir + sid + '.tar.bz2'

# The context manager closes the archive even if extraction fails.
with tarfile.open(sp, mode='r:bz2') as tar:
    tar.extractall(path=sdir)
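Re-running the extraction cell unpacks the archive again every time. A minimal sketch of an idempotent alternative, assuming (as the next cell does with `sdir+sid`) that the archive unpacks into a directory named after the session ID. `extract_session` is a hypothetical helper, not part of RA; the `filter='data'` argument is used only on Python versions whose `tarfile` module provides extraction filters.

```python
import os
import tarfile


def extract_session(sid, sdir):
    """Extract <sdir>/<sid>.tar.bz2 into sdir unless <sdir>/<sid> exists.

    Returns True if the archive was extracted, False if it was skipped.
    """
    target = os.path.join(sdir, sid)
    if os.path.isdir(target):
        return False                     # already extracted: nothing to do
    archive = os.path.join(sdir, sid + '.tar.bz2')
    with tarfile.open(archive, mode='r:bz2') as tar:
        if hasattr(tarfile, 'data_filter'):
            # Python versions with extraction filters: refuse unsafe members.
            tar.extractall(path=sdir, filter='data')
        else:
            tar.extractall(path=sdir)
    return True
```

In the notebook, `extract_session(sid, sdir)` would take the place of the extraction cell.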

Create a ``ra.Session`` object for the session. We are not going to use EnTK-specific traces so we are going to load only the RP traces contained in the EnTK session. Thus, we pass the ``'radical.pilot'`` session type to ``ra.Session``.

<div class="alert alert-warning">
    
__Warning:__ We already know we will want to derive information about pilot(s) and tasks. Thus, we keep in memory two session objects, each filtered for one of those two entity types. This might be too expensive with large sessions, depending on the amount of available memory.

</div>
    
<div class="alert alert-info">
    
__Note:__ We save the output of ``ra.Session`` in ``capt`` to avoid polluting the notebook.

</div>

In [None]:
%%capture capt

sp = sdir+sid

session = ra.Session(sp, 'radical.pilot')
pilots  = session.filter(etype='pilot', inplace=False)
tasks   = session.filter(etype='task' , inplace=False)

We usually want to collect some information about the sessions we are going to analyze. That information is used for bookkeeping while performing the analysis (especially when handling multiple sessions) and to add meaningful titles to (sub)plots.

In [None]:
sinfo = {}

sinfo.update({
    'cores_node': session.get(etype='pilot')[0].cfg['resource_details']['rm_info']['cores_per_node'],
    'pid'       : pilots.list('uid'),
    'ntask'     : len(tasks.get())
})

sinfo.update({
    'ncores'    : session.get(uid=sinfo['pid'])[0].description['cores'],
    'ngpus'     : pilots.get(uid=sinfo['pid'])[0].description['gpus']
})

sinfo.update({
    'nnodes'    : int(sinfo['ncores']/sinfo['cores_node'])
})
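`int(sinfo['ncores']/sinfo['cores_node'])` truncates: if `cores` is not a multiple of `cores_per_node`, the partially used node is not counted. When that matters, a ceiling division is a minimal alternative. `nodes_used` is a hypothetical helper and the core counts below are made up.

```python
def nodes_used(ncores, cores_per_node):
    """Number of nodes needed for ncores, counting a partially used node."""
    # Integer ceiling division: -(-a // b) == ceil(a / b) for b > 0.
    return -(-ncores // cores_per_node)


print(nodes_used(84, 42))  # 2
print(nodes_used(85, 42))  # 3
print(int(85 / 42))        # 2: truncation drops the partial node
```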

Use `ra.session.get()` on the filtered session object that contains only task entities. Then use `ra.entity.timestamps()` to derive the time series for each event/state of interest. We put the time series into a pandas DataFrame to make plotting easier.

In [None]:
tseries = {'AGENT_SCHEDULING': [], 
           'schedule_ok': []}

for task in tasks.get():
    ts_state = task.timestamps(event=state)[0]
    ts_event = task.timestamps(event=event)[0]
    tseries['AGENT_SCHEDULING'].append(ts_state)
    tseries['schedule_ok'].append(ts_event)

time_series = pd.DataFrame.from_dict(tseries)
time_series.head()
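`task.timestamps(event=...)[0]` raises `IndexError` for a task that never recorded the event (for example, a task that failed before being scheduled). A minimal sketch of a guard, assuming `timestamps()` returns a possibly empty list of floats: take the earliest timestamp, or `NaN`, which pandas treats as a missing value. `first_or_nan` is a hypothetical helper.

```python
import math


def first_or_nan(ts):
    """Earliest timestamp in ts, or NaN if ts is empty."""
    # min() equals ts[0] when the list is already in chronological order.
    return min(ts) if ts else math.nan


print(first_or_nan([12.5, 12.7]))  # 12.5
print(first_or_nan([]))            # nan
```

In the loop above, it would be used as `tseries['schedule_ok'].append(first_or_nan(task.timestamps(event=event)))`.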

Time series are usually plotted as line plots but, in our case, we want to plot only the events, as the line connecting them might be a misleading artifact. We therefore use a scatter plot in which the X axis is the task number. This somewhat 'stretches' the meaning of a scatter plot, as we do not use it to represent a correlation.

<div class="alert alert-info">
    
__Note:__ We need to zero the Y axis because timestamps are measured from the first timestamp of the session, i.e., from the moment the application started executing, while the events/states we chose happen much later in the execution. Here we use as zero the earliest timestamp of the event/state that has to happen first, based on our knowledge of the <a href='https://github.com/radical-cybertools/radical.pilot/wiki/Architecture'>RP architecture</a>. Alternatively, we could find the minimum among all the timestamps in the dataframe and use it as zero.

</div>

<div class="alert alert-info">

__Note:__ Once we have found the zero point in time (`zero`), we subtract it from the whole time series; pandas makes that a single operation. We also add 1 to the index we use for the X axis, so that tasks are counted from 1.

</div>
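The two zeroing strategies described in the notes can be compared on toy data. A minimal sketch in plain Python, with the hypothetical helper `zero_series` and made-up series `A` and `B`: zeroing on a reference series can produce negative values when another series starts earlier, while zeroing on the global minimum cannot.

```python
def zero_series(columns, reference=None):
    """Shift every series so that time 0 is the chosen zero point.

    columns:   dict mapping a series name to a list of timestamps.
    reference: name of the series whose minimum is used as zero; if None,
               use the minimum over all series.
    """
    if reference is None:
        zero = min(min(ts) for ts in columns.values())
    else:
        zero = min(columns[reference])
    return {name: [t - zero for t in ts] for name, ts in columns.items()}


cols = {'A': [10.0, 12.0], 'B': [9.0, 15.0]}
print(zero_series(cols, 'A'))  # B starts at -1.0
print(zero_series(cols))       # every value is >= 0
# X positions that count tasks from 1, as the plotting cells do.
print(list(range(1, len(cols['A']) + 1)))  # [1, 2]
```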

In [None]:
fig, ax = plt.subplots(figsize=(ra.get_plotsize(212)))

# Find the min timestamp of the first event/state timeseries and use it to zero
# the Y axes.
zero = time_series['AGENT_SCHEDULING'].min()

ax.scatter(time_series['AGENT_SCHEDULING'].index + 1, 
           time_series['AGENT_SCHEDULING'] - zero, 
           marker = '.', 
           label = ra.to_latex('AGENT_SCHEDULING'))
ax.scatter(time_series['schedule_ok'].index + 1, 
           time_series['schedule_ok'] - zero, 
           marker = '.', 
           label = ra.to_latex('schedule_ok'))

ax.legend(ncol=1, loc='upper left', bbox_to_anchor=(0,1.25))
ax.set_xlabel('Number of Tasks')
ax.set_ylabel('Time (s)')

The plot above shows that all the tasks arrive at the scheduler together (dark blue, `AGENT_SCHEDULING` state). That is expected as tasks are transferred in bulk from RP Client's Task Manager to RP Agent's Staging In component.

The plot also shows that the 40 tasks are executed in two "generations". The first 20 tasks can be scheduled immediately (light blue, `schedule_ok`), while the second 20 tasks are scheduled one by one, as soon as one of the first 20 tasks finishes executing.
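The "generations" can also be counted instead of read off the plot. A heuristic sketch: sort the `schedule_ok` timestamps and start a new group whenever two consecutive timestamps are more than `gap` seconds apart. `split_generations` and the sample data are hypothetical, and the threshold is an assumption that has to be tuned to the task runtime: when tasks terminate at different times, the second generation may be scheduled over a long interval and split into several groups.

```python
def split_generations(timestamps, gap):
    """Group sorted timestamps into runs separated by more than `gap` seconds."""
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= gap:
            groups[-1].append(t)   # close to the previous event: same group
        else:
            groups.append([t])     # large jump: start a new group
    return groups


# Hypothetical schedule_ok times: three tasks at once, two much later.
gens = split_generations([0.1, 0.2, 0.3, 50.0, 50.5], gap=10.0)
print([len(g) for g in gens])  # [3, 2]
```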

Adding execution events to our timestamps analysis should fill the gap between the two "generations" of 20 tasks. It should also confirm the measurements obtained with the duration analysis of the time that the launch method used by the RP Executor component takes to launch a task.

We add the relevant events/states to the `time_series` dataframe:

In [None]:
executor = {
    'AGENT_EXECUTING': {ru.EVENT: 'state', ru.STATE: rps.AGENT_EXECUTING},
    'task_exec_start': {ru.EVENT: 'task_exec_start', ru.STATE: None},
    'task_exec_stop' : {ru.EVENT: 'task_exec_stop' , ru.STATE: None}
}

# Use a dedicated loop variable so that the `event` dict defined above
# is not overwritten.
for name, ev in executor.items():
    time_series[name] = [task.timestamps(event=ev)[0] for task in tasks.get()]

time_series.head()

We now plot the timeseries alongside the previous ones:

In [None]:
fig, ax = plt.subplots(figsize=(ra.get_plotsize(212)))

zero = time_series['AGENT_SCHEDULING'].min()

for ts in time_series.columns:    
    ax.scatter(time_series[ts].index + 1, 
               time_series[ts] - zero, 
               marker = '.', 
               label = ra.to_latex(ts))

ax.legend(ncol=2, loc='upper left', bbox_to_anchor=(-0.25,1.5))
ax.set_xlabel('Number of Tasks')
ax.set_ylabel('Time (s)')

`schedule_ok` is not visible in the plot because it is overlapped by `AGENT_EXECUTING` that happens right after each task has been scheduled. That confirms that the communication between the RP Scheduler and Executor components does not add significant overhead to the execution.

We can also see the time taken by the launch method (JSRUN on ORNL Summit in this case) to launch a task once it is handed over to the RP Executor component: the distance between the dark orange dots (`AGENT_EXECUTING` state) and the light orange dots (`task_exec_start` event). The overall task execution time (TTX) is consistent with what we measured with the [duration analyses](duration.ipynb).

The plot also shows the time taken to execute each task: the distance on the Y axis between the light orange dots and the dark green dots. In this case too, that is consistent with what we measured with the [duration analyses](duration.ipynb).
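The distances described above can be computed per task instead of estimated visually. With the `time_series` dataframe they are column differences, e.g. `time_series['task_exec_start'] - time_series['AGENT_EXECUTING']`; the same difference between `AGENT_EXECUTING` and `schedule_ok` quantifies the Scheduler-to-Executor handoff discussed above. The sketch below shows the arithmetic in plain Python on hypothetical numbers. `per_task_gaps` is an illustrative helper and assumes both lists are ordered the same way, one entry per task.

```python
from statistics import mean


def per_task_gaps(start, stop):
    """Element-wise stop - start for two aligned lists of timestamps."""
    if len(start) != len(stop):
        raise ValueError('time series must have the same length')
    return [b - a for a, b in zip(start, stop)]


# Hypothetical aligned timestamps (seconds) for two tasks.
agent_executing = [1.0, 2.0]
exec_start      = [1.5, 3.0]
exec_stop       = [11.5, 13.0]

launch = per_task_gaps(agent_executing, exec_start)  # launch-method delay
run    = per_task_gaps(exec_start, exec_stop)        # task execution time
print(launch, mean(launch))  # [0.5, 1.0] 0.75
print(run)                   # [10.0, 10.0]
```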

With a large number of tasks, we can slice the timestamps to plot one or more subsets.

In [None]:
fig, ax = plt.subplots(figsize=(ra.get_plotsize(212)))

# Slice the time series to plot only a subset of tasks (tasks 17 to 24).
# Assign the slice to a new variable so that the full dataframe is kept.
subset = time_series.reset_index(drop=True).iloc[16:24]

zero = subset['AGENT_SCHEDULING'].min()

for ts in subset.columns:
    ax.scatter(subset[ts].index + 1,
               subset[ts] - zero,
               marker = '.',
               label = ra.to_latex(ts))

ax.legend(ncol=2, loc='upper left', bbox_to_anchor=(-0.25,1.5))
ax.set_xlabel('Number of Tasks')
ax.set_ylabel('Time (s)')
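Slicing by position selects tasks by their order. An alternative is to select the tasks whose event falls within a time window, which keeps the original task numbering. With pandas that is a boolean mask such as `time_series[time_series['schedule_ok'].between(t0, t1)]`. The plain-Python sketch below shows the same idea on hypothetical rows; `select_window`, `t0` and `t1` are illustrative names.

```python
def select_window(rows, column, t0, t1):
    """Return (task_number, row) pairs whose `column` timestamp is in [t0, t1].

    rows: list of dicts, one per task, already in plotting order.
    Task numbers start from 1, like the X axis of the plots.
    """
    return [(i + 1, row) for i, row in enumerate(rows)
            if t0 <= row[column] <= t1]


# Hypothetical schedule_ok timestamps (seconds) of four tasks.
rows = [{'schedule_ok': 0.2}, {'schedule_ok': 0.3},
        {'schedule_ok': 50.0}, {'schedule_ok': 50.5}]
sel = select_window(rows, 'schedule_ok', 40.0, 60.0)
print([n for n, _ in sel])  # [3, 4]
```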