## Running RAJAPerf 

In this section we run RAJAPerf, explore some of the properties of the kernels that belong to a single group,  
demonstrate several techniques for displaying the timing hierarchy, and create basic bar graphs that compare  
timings between the GCC and Clang compilers.

[Back to Table of Contents](./00-intro-and-contents.ipynb)

## Basic help text
Running **raja-perf.exe --help** shows the options that control a performance run.  
In this section we mainly vary the number of reps (repetitions of each kernel);  
in the next section we shift focus to capturing runs  
with varying problem sizes.

Note that several cells in this section are set up as bash command-line cells;  
you can also open a terminal session from the Jupyter launcher  
and run the same commands there.

In [None]:
%%bash
$HOME/code/RAJAPerf/build_gcc/bin/raja-perf.exe --help


## RAJAPerf dryrun
First, let's run RAJAPerf with **--dryrun** and look at the run properties,  
paying particular attention to the default problem size and number of reps.  
We also restrict the run to a single group of kernels, in this case  
the Algorithm group.

To see the full list of kernels, edit the command and remove the **--kernels Algorithm** argument.

In [None]:
%%bash
$HOME/code/RAJAPerf/build_gcc/bin/raja-perf.exe --dryrun --kernels Algorithm

## RAJAPerf run with one rep on group Algorithm
Next, let's actually generate output. We pass the argument **--checkrun 1**  
so that each kernel in the Algorithm group runs only one rep, which keeps the  
run short and produces output quickly.

We also specify the **-sp** flag to show progress as each kernel is timed.

Note that RAJAPerf also runs warmup kernels, which are excluded from all timings.

In [None]:
%%bash
$HOME/code/RAJAPerf/build_gcc/bin/raja-perf.exe -od $HOME/data/default_problem_size/gcc --checkrun 1 --kernels Algorithm -sp
ls $HOME/data/default_problem_size/gcc/*.cali

## Inspect Timing Hierarchy using Caliper's cali-query tool
Next, let's inspect the files created by running RAJAPerf. In addition to the .txt and .csv files,  
RAJAPerf also writes a set of Caliper data files (.cali).

We'll show several techniques for displaying the Caliper trees (the timing hierarchy).

The first technique uses Caliper's own tool, cali-query, run with **-T** (or equivalently **--tree**) to display the tree.  
We'll focus on the timings generated by running the RAJA variant with the  
sequential execution policy, stored in **RAJA_Seq.cali**.

Since we installed RAJAPerf using Spack, we activate the Spack environment containing the  
install and load the installed Caliper version, which puts cali-query on our PATH.

In [None]:
%%bash
eval `spack env activate --sh  --dir /home/jovyan/spack_env`
eval `$HOME/spack/bin/spack load --sh caliper@master%gcc@10.4.0`
which cali-query
cali-query -T $HOME/data/default_problem_size/gcc/RAJA_Seq.cali

## Inspect timing hierarchy using Caliper's Python module
Next, we inspect the .cali file with Caliper's own CaliperReader Python module, using a short Python script.  

You can add a couple of lines to view the metadata keys captured by Caliper/Adiak:  
```py
for g in r.globals:  
    print(g)  
```
You can also display a metadata value stored in the dictionary **r.globals**.  
For example, print the OpenMP max threads value recorded at runtime:  
`print('OMP Max Threads: ' + r.globals['omp_max_threads'])`  
or the variant represented in this file:  
`print('Variant: ' + r.globals['variant'])`
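As a small extension of those one-liners, the sketch below prints several metadata values at once and reports keys that were not recorded instead of raising a `KeyError`. It accepts any plain dictionary, so `r.globals` from the CaliperReader cell can be passed in directly. The helper name `summarize_globals` and the sample dictionary are hypothetical, and joining list-valued entries is only a precaution, not documented Caliper behavior.

```python
def summarize_globals(globals_dict, keys=('variant', 'compiler', 'omp_max_threads')):
    """Return 'key: value' lines for selected run metadata (hypothetical helper).

    Keys that are absent are reported as '<not recorded>' rather than raising.
    """
    lines = []
    for key in keys:
        value = globals_dict.get(key, '<not recorded>')
        # Precaution: if a metadata value is a list, join it into one string
        if isinstance(value, list):
            value = ', '.join(str(v) for v in value)
        lines.append(f"{key}: {value}")
    return lines

# Hypothetical stand-in for r.globals; 'compiler' is deliberately missing
sample_globals = {'variant': 'RAJA_Seq', 'omp_max_threads': '4'}
for line in summarize_globals(sample_globals):
    print(line)
```

In the notebook, `summarize_globals(r.globals)` would print the same three keys for the file that was read.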

In [None]:

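import os
import caliperreader as cr

# Directory holding the .cali files written by the GCC build of RAJAPerf
DATA_DIR = os.getenv('HOME')+"/data/default_problem_size/gcc"
os.chdir(DATA_DIR)
r = cr.CaliperReader()
r.read("RAJA_Seq.cali")
# Caliper metric name for the inclusive time of each region
metric = 'avg#inclusive#sum#time.duration'
for rec in r.records:
    path = rec['path'] if 'path' in rec else 'UNKNOWN'
    time = rec[metric] if metric in rec else '0'
    # Skip records that are not attached to a region path
    if 'UNKNOWN' not in path:
        # Nested regions are stored as a list; join them into "a/b/c"
        if isinstance(path, list):
            path = "/".join(path)
        print("{0}: {1}".format(path, time))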
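The loop above prints flat "a/b/c" paths. To view the same records as an indented hierarchy, closer in spirit to `cali-query -T`, the loop can be factored into a function. This is a sketch, not Caliper's own tree renderer: the helper `render_tree` and the hand-made sample records (with made-up times) are hypothetical, and it assumes each record is a dict with a `path` entry (a list of nested region names, or a single name) as in the loop above. It sorts by path so every parent prints before its children, which means siblings appear alphabetically rather than in execution order.

```python
METRIC = 'avg#inclusive#sum#time.duration'

def render_tree(records, metric=METRIC):
    """Return one indented line per region path (hypothetical helper)."""
    rows = []
    for rec in records:
        path = rec.get('path')
        if path is None:
            continue  # records without a region path are not part of the tree
        if not isinstance(path, list):
            path = [path]  # a top-level region may be stored as a plain string
        rows.append((path, rec.get(metric, 0)))
    lines = []
    # Lexicographic sort on the path list puts each parent before its children
    for path, value in sorted(rows, key=lambda row: row[0]):
        indent = '  ' * (len(path) - 1)
        lines.append(f"{indent}{path[-1]}: {value}")
    return lines

# Hand-made records shaped like CaliperReader output (times are made up)
sample = [
    {'path': ['RAJAPerf', 'Algorithm', 'Algorithm_SORT'], METRIC: 0.5},
    {'path': 'RAJAPerf', METRIC: 1.0},
    {'path': ['RAJAPerf', 'Algorithm', 'Algorithm_SCAN'], METRIC: 0.25},
    {'path': ['RAJAPerf', 'Algorithm'], METRIC: 0.75},
    {'some.other.attribute': 1},
]
print("\n".join(render_tree(sample)))
```

With the reader from the cell above, `print("\n".join(render_tree(r.records)))` would render the RAJA_Seq.cali regions the same way.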

## Using Hatchet to inspect Caliper trees
Finally, we'll inspect the Caliper trees using the Hatchet Python module, which is already installed.

In [None]:
import os
import hatchet as ht

DATA_DIR = os.getenv('HOME')+"/data/default_problem_size/gcc"
os.chdir(DATA_DIR)
# Build a Hatchet GraphFrame from the Caliper file and print its call tree
gf1 = ht.GraphFrame.from_caliperreader("RAJA_Seq.cali")
print(gf1.tree())

## Run RAJAPerf full pass for GCC and Clang
Let's run the GCC and Clang builds of RAJAPerf, one pass over all kernels each (still with **--checkrun 1**);  
we'll make a comparison plot in the next cell.  

We save the output for each compiler in a separate directory using the **-od** (output directory) flag.

In [None]:
%%bash
$HOME/code/RAJAPerf/build_gcc/bin/raja-perf.exe -od $HOME/data/default_problem_size/gcc --checkrun 1
$HOME/code/RAJAPerf/build_clang/bin/raja-perf.exe -od $HOME/data/default_problem_size/clang --checkrun 1 
echo "All Done Running GCC and Clang RAJAPerf!"

## Generate bar graphs that compare the variants across the two compilers
We build a pandas MultiIndex from the compilers and variants, create a pandas DataFrame  
with that index, and plot it using the DataFrame's built-in bar plot method.  
The multi-index looks like this:
```py
MultiIndex([('clang++-9.0.1',   'Base_OpenMP'),
            ('clang++-9.0.1',      'Base_Seq'),
            ('clang++-9.0.1', 'Lambda_OpenMP'),
            ('clang++-9.0.1',    'Lambda_Seq'),
            ('clang++-9.0.1',   'RAJA_OpenMP'),
            ('clang++-9.0.1',      'RAJA_Seq'),
            (   'g++-10.4.0',   'Base_OpenMP'),
            (   'g++-10.4.0',      'Base_Seq'),
            (   'g++-10.4.0', 'Lambda_OpenMP'),
            (   'g++-10.4.0',    'Lambda_Seq'),
            (   'g++-10.4.0',   'RAJA_OpenMP'),
            (   'g++-10.4.0',      'RAJA_Seq')],
           names=['compiler', 'variant'])
```
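This only works because we sort the list of files before processing it: the rows of `data` are assigned to the  
multi-index by position, and the sorted file order (compiler directory, then variant file) happens to match the sorted multi-index order.  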

Note also that we build these data structures with Hatchet's reader method, **GraphFrame.from_caliperreader**.

In [None]:
import os, glob
import hatchet as ht
import pandas as pd

# Parent directory with one sub-directory of .cali files per compiler (clang/, gcc/)
DATA_DIR = os.getenv('HOME')+"/data/default_problem_size/"

data = []
# Sorting groups the files by compiler directory, then by variant file name
allfiles = sorted(glob.glob(glob.escape(DATA_DIR) + "*/*.cali"))
metric = 'avg#inclusive#sum#time.duration'
for f in allfiles:
    gf = ht.GraphFrame.from_caliperreader(f)
    # Run metadata recorded in each .cali file
    compiler = gf.metadata['compiler']
    variant = gf.metadata['variant']
    # Inclusive time of the root node: the total recorded time for this variant
    root_node = gf.graph.roots[0]
    value = gf.dataframe.loc[root_node, metric]
    data.append((compiler, variant, value))
compilers = sorted({d[0] for d in data})
variants = sorted({d[1] for d in data})
# Rows of `data` are matched to this index by position, so its order must
# match the order in which the sorted files were read
idx = pd.MultiIndex.from_product([compilers, variants], names=['compiler', 'variant'])
df = pd.DataFrame(data, index=idx, columns=['compiler', 'variant', 'value'])
# One group of bars per variant, one bar per compiler
df.unstack(level=0)['value'].plot(ylabel='time(seconds)', kind='bar')
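Because the cell above assigns rows to the multi-index by position, a missing file or a directory that sorts differently from its compiler name would silently mislabel the bars. One more robust alternative, sketched here with pandas' `DataFrame.pivot`, is to let each `(compiler, variant, value)` tuple carry its own labels so row order no longer matters. The function name `timing_table` and the timing values below are hypothetical placeholders, not RAJAPerf results.

```python
import pandas as pd

def timing_table(data):
    """Pivot (compiler, variant, seconds) tuples into a variant x compiler table.

    Each value is placed by its own labels, so the order of `data` does not
    need to match a separately built MultiIndex (hypothetical helper).
    """
    df = pd.DataFrame(data, columns=['compiler', 'variant', 'value'])
    return df.pivot(index='variant', columns='compiler', values='value')

# Hypothetical timings, deliberately listed out of order
data = [
    ('g++-10.4.0', 'RAJA_Seq', 2.0),
    ('clang++-9.0.1', 'Base_Seq', 1.5),
    ('clang++-9.0.1', 'RAJA_Seq', 2.5),
    ('g++-10.4.0', 'Base_Seq', 1.0),
]
table = timing_table(data)
print(table)
```

In the notebook, `timing_table(data).plot(ylabel='time(seconds)', kind='bar')` would draw grouped bars like the cell above. `pivot` raises a `ValueError` if the same (compiler, variant) pair appears twice, which also flags accidentally duplicated files.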