## Report FIO results for EBS Benchmark on gp2 and gp3

Steps to run this report:

Option A - load the data into the playbook:

- Download this playbook
- Download the data from [Google Drive](https://drive.google.com/drive/folders/1ADvccAAjdluoB0cJwENCNAZ_Evc7YN6f?usp=sharing)
- Make sure the extracted data is available at `/results` inside the Jupyter container, e.g.:
```bash
tar xf results-b3_loop1.tar.xz -C ./results
podman run -v ${PWD}/results:/results:Z <jupyter_container_args>
```
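
The extraction step can also be done from Python, for example from inside the notebook. This is a minimal sketch, assuming the archive name used in the example above; `extract_results` is a hypothetical helper and not part of the playbook:

```python
import os
import tarfile


def extract_results(archive_path, dest_dir="./results"):
    """Unpack a results archive (.tar.xz) into dest_dir and list its top-level entries."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r:xz" makes the expected compression explicit (same data as `tar xf`).
    with tarfile.open(archive_path, "r:xz") as tar:
        tar.extractall(path=dest_dir)
    return sorted(os.listdir(dest_dir))


# Usage (assumes the archive was downloaded to the current directory):
# extract_results("results-b3_loop1.tar.xz", "./results")
```

The destination directory is the one that gets mounted as `/results` in the container.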

Option B - load the exported playbook, which already includes the data for each run. It is available inside each results archive at:

```bash
$ JOB_GROUP=b3_loop1
$ results-${JOB_GROUP}.tar.xz/byGroup-${JOB_GROUP}/parser/${JOB_GROUP}.ipynb

$ ls -sh byGroup-*/parser/*.ipynb
4.2M byGroup-b3_loop10/parser/b3_loop10.ipynb  2.7M byGroup-b3_loop1/parser/b3_loop1.ipynb  3.8M byGroup-b4_loop1/parser/b4_loop1.ipynb  3.9M byGroup-b4_loop5/parser/b4_loop5.ipynb
```

### Description

The detailed description is available in the document [SPLAT-253 - AWS gp3 study case for IPI](https://docs.google.com/document/d/1r_WjugwBZyp508DAv_3PHWlYKhk5TG8DtSq8ZBc80mo/edit#heading=h.8csikri78lve)

Scenario (clusters):
- c1: OCP cluster with 1x gp2
- c2: OCP cluster with 2x gp2 (etcd isolated)
- c3: OCP cluster with 1x gp3
- c4: OCP cluster with 2x gp3 (etcd isolated)

This report aggregates the data collected from the FIO tests, which exercised all control plane disks in the layouts described above.

The script (work in progress) that creates "battery 2" and collects the data is defined [here](https://github.com/mtulio/openshift-cluster-benchmark-lab/blob/init/run-test.sh#L250-L271)

References:
 - [FIO doc](https://fio.readthedocs.io/en/latest/fio_doc.html)
 - This report (notebook): reports/fio-ebs_gp3-b2.ipynb
 - This report (markdown/exported): docs/examples/fio-ebs_gp3-b2.md

In [None]:
# install dependencies
! pip install pandas matplotlib natsort pyyaml

In [None]:
import os
import json

import pandas as pd
from IPython.display import display
import matplotlib.pyplot as plt

import tarfile
from pprint import pprint

from natsort import natsorted
from datetime import datetime

In [None]:
task_name_map = {
    "fio_ebs_initialize": "0_fio_ebs_initialize",
    "fio_psync_randwrite": "1_fio_psync_randwrite",
    "fio_libaio_read": "2_fio_libaio_read",
    "fio_libaio_write": "3_fio_libaio_write",
    "fio_libaio_rw": "X_fio_libaio_rw",
    "fio_libaio_randread": "4_fio_libaio_randread",
    "fio_libaio_randwrite": "5_fio_libaio_randwrite",
    "fio_libaio_randrw": "6_fio_libaio_randrw",
    "fio_sync_read": "2_fio_sync_read",
    "fio_sync_write": "1_fio_sync_write",
    "fio_sync_write_alias": "4_fio_sync_write_alias",
    "fio_sync_rw": "3_fio_sync_rw",
}

In [None]:
# Globals

#os.environ['JOB_GROUP'] = "b3_loop10"
#os.environ['JOB_GROUP'] = "b4_loop5"
job_group=(f"{os.getenv('JOB_GROUP', 'b3_loop1')}")

# specific for the test env, b3 will test only RW
if job_group.startswith("b3"):
    job_group_operations="rw"
else:
    job_group_operations="all" # read,write,sync

results_path=(f"/results/byGroup-{job_group}")
parser_path = (f"{results_path}/parser")

In [None]:
html_output_path = (f"{parser_path}/{job_group}.html")
from datetime import timezone
now = datetime.now(timezone.utc)  # UTC, matching the "UTC" label in the report header
html_output = ("""
<!DOCTYPE html>
<html>
<head>
<style>
table, td, th {
  border: 1px solid black;
}

table {
  width: 100%;
  border-collapse: collapse;
}
</style>
</head>
<body>
""")
html_output += ("<h2> FIO Benchmark Report </h2>")
html_output += ("Generated at: " + now.strftime("%Y-%m-%d, %H:%M:%S") + " UTC <br>")
html_output += (f"""
<br>- Job Group (Report Name): {job_group}
<br>- Job Group Operations   : {job_group_operations}
<br>- Results base path      : {results_path}
""")

In [None]:
results_path

In [None]:
chars = [' ', '(', ')']
def output_add_table(html_output, title="", desc="", data=""):
    html_output += (f"<br><h4>{title}</h4>")
    #link = title
    #for c in chars:
    #    link = link.replace(c, '-')
    #html_output += (f"""<p><a href='#{link}'><h4>{title}</h4></a></p>""")
    html_output += (f"{desc}")
    html_output += (f"<br>{data}")
    return html_output

In [None]:
def lookup_result_files(base_path, results=None, start_str="", contains_str="", extension="", ignore_str=None):
    """
    Generic lookup of result files in base_path based on filter criteria.
    """
    # avoid sharing one mutable default list across calls
    if results is None:
        results = []

    for res in os.listdir(base_path):
        # check prefix (uses the start_str argument, not a global)
        if not res.startswith(start_str):
            #print(f"01: {res}")
            continue

        # check extension
        if not res.endswith(extension):
            #print(f"02: {res}")
            continue

        # check filter
        if contains_str not in res:
            #print(f"03: {res}")
            continue

        # ignore strings
        if (ignore_str is not None) and (ignore_str in res):
            #print(f"04: {res}")
            continue

        results.append(res)
    return results

In [None]:
# Custom node alias builder. To get shorter columns =)
ocp_default_subnets = [
    {
        "azName": "us-east-1a-public",
        "azId": "use1-az4",
        "cidr": "10.0.0.0/20",
        "cidr_3o_start": 0,
        "cidr_3o_end": 15
    },
    {
        "azName": "us-east-1b-public",
        "azId": "use1-az6",
        "cidr": "10.0.16.0/20",
        "cidr_3o_start": 16,
        "cidr_3o_end": 31
    },
    {
        "azName": "us-east-1c-public",
        "azId": "use1-az2",
        "cidr": "10.0.32.0/20",
        "cidr_3o_start": 32,
        "cidr_3o_end": 47
    },
    {
        "azName": "us-east-1a-private",
        "azId": "use1-az4",
        "cidr": "10.0.128.0/20",
        "cidr_3o_start": 128,
        "cidr_3o_end": 143
    },
    {
        "azName": "us-east-1b-private",
        "azId": "use1-az6",
        "cidr": "10.0.144.0/20",
        "cidr_3o_start": 144,
        "cidr_3o_end": 159
    },
    {
        "azName": "us-east-1c-private",
        "azId": "use1-az2",
        "cidr": "10.0.160.0/20",
        "cidr_3o_start": 160,
        "cidr_3o_end": 175
    }
]

def locate_azId_by_hostname(hostname):
    """
    Infer the AZ ID from the hostname. By default, OCP deploys to AzName=a first, and so on,
    using standard CIDRs, so the AZ is easy to discover in us-east-1 for a standard IPI install.
    """
    hostname_ip = (hostname.split('ip-')[1].split('.ec2.internal')[0])
    hostname_ip3o = int((hostname_ip.split('-')[2]))
    hostname_netIp = (f"{int((hostname_ip.split('-')[2]))}-{int((hostname_ip.split('-')[3]))}")
    for net in ocp_default_subnets:
        if (hostname_ip3o >= net['cidr_3o_start']) and (hostname_ip3o <= net['cidr_3o_end']):
            return (net['azId'], net['azName'], hostname_netIp)

    return ('AzNotFound', 'NA', hostname_netIp)


def find_node_alias_by_hostname(hostname="", add_prefix="", add_suffix="", fmt="azId"):
    # assuming all AWS node hostnames start with 'ip-...'
    if hostname.startswith('ip-'):
        azId, azName, netIp = locate_azId_by_hostname(hostname)
        if fmt == "azId_ipNet": #> '{region_id}-{az_id}_{ip3o}-{ip4o}'
            return (f"{add_prefix}{azId}_{netIp}{add_suffix}")
        elif fmt == "azIdShort_ipNet":  #> '{az_id}_{ip3o}-{ip4o}'
            return (f"{add_prefix}{azId.split('-')[1]}_{netIp}{add_suffix}")
        elif fmt == "ipNet_azIdShort":  #> '{ip3o}-{ip4o}_{az_id}'
            return (f"{add_prefix}{netIp}_{azId.split('-')[1]}{add_suffix}")
        else: # "azId" #> '{region_id}-{az_id}'
            return (f"{add_prefix}{azId}{add_suffix}")
    
    # default: no transformation
    return hostname
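
As an alternative to comparing the third octet against hard-coded ranges, the same lookup can be sketched with the standard `ipaddress` module, which tests CIDR membership directly. This is not the method used by this notebook; `SUBNETS` and `az_from_hostname` are hypothetical names, and the table is a subset of the private subnets listed above:

```python
import ipaddress

# Hypothetical subset of the subnet table above, keyed by CIDR.
SUBNETS = {
    "10.0.128.0/20": ("use1-az4", "us-east-1a-private"),
    "10.0.144.0/20": ("use1-az6", "us-east-1b-private"),
    "10.0.160.0/20": ("use1-az2", "us-east-1c-private"),
}


def az_from_hostname(hostname):
    """Map 'ip-A-B-C-D.ec2.internal' to (azId, azName) by CIDR membership."""
    host = hostname.split(".")[0]
    if not host.startswith("ip-"):
        return ("AzNotFound", "NA")
    # 'ip-10-0-142-138' -> '10.0.142.138'
    addr = ipaddress.ip_address(host[3:].replace("-", "."))
    for cidr, (az_id, az_name) in SUBNETS.items():
        if addr in ipaddress.ip_network(cidr):
            return (az_id, az_name)
    return ("AzNotFound", "NA")


print(az_from_hostname("ip-10-0-142-138.ec2.internal"))
```

Deriving the range from the CIDR avoids keeping `cidr_3o_start` and `cidr_3o_end` in sync with `cidr` by hand.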

In [None]:
class Node(object):
    def __init__(self, name, cluster):
        self.node_name=name
        self.node_alias=""
        self.cluster=cluster
        self.cluster_full=""

        self.metrics=[]

    def add_metric(self, **kwargs):
        #print(f"Adding metric [{kwargs['metric']}]")
        self.metrics.append({
            "job_name": kwargs["job_name"],
            "job_group": kwargs["job_group"],
            "task_name": task_name_map[kwargs["task_name"]],
            "task_group": kwargs["task_group"],
            "task_execId": kwargs["execId"],
            "timestamp": kwargs["timestamp"],
            "metric": kwargs["metric"],
            "value": kwargs["value"],
        })


class Nodes(object):
    def __init__(self):
        self.nodes={}
    
    def add_node(self, node, cluster):
        # Register each node only once; later calls for the same node are no-ops.
        if node in self.nodes:
            return
        self.nodes[node] = Node(node, cluster)
        print(f"Node [{node}] added")
        #self.nodes[node].node_alias = find_node_alias_by_hostname(hostname=node, add_prefix=f"{cluster}_", fmt="azIdShort_ipNet")
        self.nodes[node].node_alias = find_node_alias_by_hostname(hostname=node, add_prefix=f"{cluster}_")

    def get_node(self, node):
        # Raises KeyError when the node was never added.
        return self.nodes[node]

In [None]:
def parser_results_fio_runtime(node, data_path, job_info):
    """
    FIO runtime log parser. See below some examples of data.
    sample of header line:
    #cluster=c1gp2x1> Running task [fio_psync_randwrite] on node [ip-10-0-142-138.ec2.internal], registering on log file ./.local/results/byGroup-b3_loop1/fio_stdout-c1-ip-10-0-142-138.ec2.internal.txt
    
    sample of metric line:
    [0] <=> ip-10-0-142-138 <=> Thu Sep  9 13:51:16 UTC 2021 <=>  13:51:16 up 32 min,  0 users,  load average: 1.27, 0.83, 1.21 
    [1] <=> ip-10-0-142-138 <=> Thu Sep  9 14:03:16 UTC 2021 <=>  14:03:16 up 44 min,  0 users,  load average: 0.87, 2.34, 2.93 
    """

    job_name, job_group = job_info
    task_group = "fio_runtime"
    with open(data_path) as f:
        current_task = ""
        for line in f:
            # parse line : [...] Running task [fio_psync_randwrite] [...],
            if 'Running task [' in line:
                current_task = line.split('Running task [')[1].split(']')[0]
                if node.cluster_full == "":
                    node.cluster_full = line.split('#cluster=')[1].split('>')[0]
                continue
            if line.startswith('['):
                # extract jobId, time and Load1
                jobId = line.split(' <=> ')[0].replace('[','').replace(']','')
                load1 = line.split(' <=> ')[3].split('load average: ')[1].split(',')[0]
                ts = line.split(' <=> ')[2]
                node.add_metric(job_name=job_name,
                                job_group=job_group,
                                task_name=current_task,
                                task_group=task_group,
                                execId=jobId,
                                timestamp=ts,
                                metric='load1',
                                value=load1)
                continue
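
The metric-line format handled by the runtime parser can also be matched with a single regular expression. This is a minimal sketch using the sample line from the docstring above; `METRIC_LINE` and `parse_metric_line` are hypothetical names that do not exist in the notebook:

```python
import re

# Matches: [<execId>] <=> <host> <=> <date> <=> <uptime output with load averages>
METRIC_LINE = re.compile(
    r"^\[(?P<exec_id>\d+)\] <=> (?P<host>\S+) <=> (?P<ts>.+?) <=> .*"
    r"load average: (?P<load1>[\d.]+),"
)


def parse_metric_line(line):
    """Return (exec_id, host, timestamp, load1) or None when the line does not match."""
    m = METRIC_LINE.match(line)
    if m is None:
        return None
    return (m["exec_id"], m["host"], m["ts"], float(m["load1"]))


sample = ("[0] <=> ip-10-0-142-138 <=> Thu Sep  9 13:51:16 UTC 2021 <=>  "
          "13:51:16 up 32 min,  0 users,  load average: 1.27, 0.83, 1.21 ")
print(parse_metric_line(sample))
```

Unlike the chained `split()` calls, a non-matching line returns `None` instead of raising `IndexError`.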

## FIO Payload (Sample)

Sample payload used to build the metric parser function.

> Open sample FIO result `reports/sample-fio_psync_randwrite.json` from task `fio_psync_randwrite`
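
To make the shape of that payload concrete, here is a minimal sketch with a hand-written, heavily trimmed payload. The key names follow the ones read by the parser in this notebook; the numeric values are made up for illustration and are not benchmark results. `ns_to_ms` is a hypothetical helper.

```python
import json

# Hypothetical, trimmed FIO JSON payload; the values are illustrative only.
sample_payload = json.loads("""
{
  "timestamp": 1631195476,
  "global options": {"bs": "4k", "ioengine": "psync", "rw": "randwrite"},
  "jobs": [{
    "jobname": "fio_io_0",
    "write": {
      "iops": 1500.0,
      "clat_ns": {"percentile": {"99.000000": 2500000, "99.900000": 7500000}}
    }
  }]
}
""")


def ns_to_ms(value_ns):
    """FIO reports latencies in nanoseconds; this report uses milliseconds."""
    return float(value_ns) / 1e6


job = sample_payload["jobs"][0]
# The execution id is the suffix of the job name ('fio_io_<id>').
exec_id = job["jobname"].split("fio_io_")[1]
p99_ms = ns_to_ms(job["write"]["clat_ns"]["percentile"]["99.000000"])
print(exec_id, p99_ms)
```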

In [None]:
def parser_results_fio_tasks(node, data_path, job_info):
    """
    FIO payload parser.
    Walk through fio result dir and load JSON files with FIO results,
    returning only desired metrics for each test.
    """
    job_name, job_group, task_name = job_info
    task_group = "fio_tasks"

    for root, dirs, files in os.walk(data_path):
        for file in files:
            if file.endswith(".json"):
                fpath=os.path.join(root, file)
                with open(fpath, 'r') as f:
                    res_payload=json.loads(f.read())

                    # Extract jobId from the job name (the latest naming standard is 'fio_io_<id>');
                    # an IndexError here means the job name does not follow that standard
                    jobId = res_payload['jobs'][0]['jobname'].split('fio_io_')[1]
                        
                    #print(job_group, job_name, task_name, task_group, jobId)
                    #pprint(res_payload)

                    ts = res_payload['timestamp']
                    metrics_collection = {
                        "global": {
                            "read_ios": res_payload['disk_util'][0]['read_ios'],
                            "write_ios": res_payload['disk_util'][0]['write_ios'],
                            "bs": res_payload['global options']['bs'],
                            "ioengine": res_payload['global options']['ioengine'],
                            "numjobs": res_payload['global options']['numjobs'],
                            "runtime": res_payload['global options']['runtime'],
                            "rw": res_payload['global options']['rw'],
                            "size": res_payload['global options']['size'],
                            "jobname": res_payload['jobs'][0]['jobname'],
                        },
                        "values": {
                            "elapsed": res_payload['jobs'][0]['elapsed'],
                            "latency_ms": res_payload['jobs'][0]['latency_ms'],
                            "read_bw": res_payload['jobs'][0]['read']['bw'],
                            "read_iops": res_payload['jobs'][0]['read']['iops'],
                            "read_total_ios": res_payload['jobs'][0]['read']['total_ios'],
                            "read_lat_ms_min": (float(res_payload['jobs'][0]['read']['lat_ns']['min'])/1e+6),
                            "read_lat_ms_max": (float(res_payload['jobs'][0]['read']['lat_ns']['max'])/1e+6),
                            "read_lat_ms_mean": (float(res_payload['jobs'][0]['read']['lat_ns']['mean'])/1e+6),
                            "read_clat_ms_p99": (float(res_payload['jobs'][0]['read']['clat_ns']['percentile']['99.000000'])/1e+6),
                            "read_clat_ms_p99.9": (float(res_payload['jobs'][0]['read']['clat_ns']['percentile']['99.900000'])/1e+6),
                            "read_clat_ms_p99.99": (float(res_payload['jobs'][0]['read']['clat_ns']['percentile']['99.990000'])/1e+6),
                            "read_clat_ms_stddev": (float(res_payload['jobs'][0]['read']['clat_ns']['stddev'])/1e+6),
                            "write_bw": res_payload['jobs'][0]['write']['bw'],
                            "write_iops": res_payload['jobs'][0]['write']['iops'],
                            "write_total_ios": res_payload['jobs'][0]['write']['total_ios'],
                            "write_lat_ms_min": (float(res_payload['jobs'][0]['write']['lat_ns']['min'])/1e+6),
                            "write_lat_ms_max": (float(res_payload['jobs'][0]['write']['lat_ns']['max'])/1e+6),
                            "write_lat_ms_mean": (float(res_payload['jobs'][0]['write']['lat_ns']['mean'])/1e+6),
                            "write_clat_ms_p99": (float(res_payload['jobs'][0]['write']['clat_ns']['percentile']['99.000000'])/1e+6),
                            "write_clat_ms_p99.9": (float(res_payload['jobs'][0]['write']['clat_ns']['percentile']['99.900000'])/1e+6),
                            "write_clat_ms_p99.99": (float(res_payload['jobs'][0]['write']['clat_ns']['percentile']['99.990000'])/1e+6),
                            "write_clat_ms_stddev": (float(res_payload['jobs'][0]['write']['clat_ns']['stddev'])/1e+6),
                            "sync_total_ios": res_payload['jobs'][0]['sync']['total_ios'],
                            "sync_lat_ms_min": (float(res_payload['jobs'][0]['sync']['lat_ns']['min'])/1e+6),
                            "sync_lat_ms_max": (float(res_payload['jobs'][0]['sync']['lat_ns']['max'])/1e+6),
                            "sync_lat_ms_mean": (float(res_payload['jobs'][0]['sync']['lat_ns']['mean'])/1e+6),
                            "sync_lat_ms_p99": (float(res_payload['jobs'][0]['sync']['lat_ns']['percentile']['99.000000'])/1e+6),
                            "sync_lat_ms_p99.9": (float(res_payload['jobs'][0]['sync']['lat_ns']['percentile']['99.900000'])/1e+6),
                            "sync_lat_ms_p99.99": (float(res_payload['jobs'][0]['sync']['lat_ns']['percentile']['99.990000'])/1e+6),
                            "sync_lat_ms_stddev": (float(res_payload['jobs'][0]['sync']['lat_ns']['stddev'])/1e+6),
                            "cpu_sys": res_payload['jobs'][0]['sys_cpu'],
                            "cpu_usr": res_payload['jobs'][0]['usr_cpu'],
                            "cpu_ctx": res_payload['jobs'][0]['ctx']
                        }
                    }
                    node.add_metric(job_name=job_name,
                                job_group=job_group,
                                task_name=task_name,
                                task_group=task_group,
                                execId=jobId,
                                timestamp=ts,
                                metric='collection',
                                value=metrics_collection)

In [None]:
def aggregate_metric_collection(data, metric_name, is_collection=True):
    """
    Filter desired {metric_name}, extract the jobs (rows) for each cluster (columns),
    and return the data frame.
    JobId | {cluster1}  | [...clusterN |]
    #id   | metricValue | [...metricValue |]
    """
    data_metric = {}
    
    def insert_metric(nid, jid, val):
        # create the job row on first use, then set the value for this node column
        job = data_metric.setdefault(jid, {"job_Id": jid})
        job[nid] = val

    for n in data.nodes.keys():
        node = data.nodes[n]
        for metric in node.metrics:

            job_id = (f"{metric['task_name']}#{metric['task_execId']}")
            
            # get simple metric value (metric != 'collection')
            if not(is_collection) or (metric['metric'] != "collection"):
                if metric['metric'] == metric_name:
                    insert_metric(node.node_alias, job_id, metric['value'])
                continue

            #print(metric['metric'])
            #print(node.node_alias, metric_name)
            #print(metric['value']['values'])
            insert_metric(node.node_alias, job_id, metric['value']['values'][metric_name])
            #jid[node.node_alias] = metric['value']['values'][metric_name]

    data_pd = []
    for dk in natsorted(data_metric.keys()):
        data_pd.append(data_metric[dk])

    # create data frame and force job_id as first column
    df = pd.read_json(json.dumps(data_pd))
    #columns = 
    #print(columns)
    return df.reindex(['job_Id'] + natsorted(df.columns.drop('job_Id')), axis=1)
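
The pivot performed by the function above (one row per job, one column per node) can be illustrated without pandas. This is a minimal sketch with hypothetical values; `pivot_metrics` is an illustrative name, and plain `sorted()` stands in for `natsorted()`:

```python
def pivot_metrics(records):
    """Pivot (node_alias, job_id, value) tuples into one dict (row) per job_id."""
    rows = {}
    for node_alias, job_id, value in records:
        # Create the row on first sight of job_id, then fill the node column.
        rows.setdefault(job_id, {"job_Id": job_id})[node_alias] = value
    # Plain sorted() stands in for natsorted() to keep the sketch stdlib-only.
    return [rows[job_id] for job_id in sorted(rows)]


# Hypothetical values, for illustration only.
records = [
    ("c1_use1-az4", "1_fio_psync_randwrite#0", 1.9),
    ("c3_use1-az4", "1_fio_psync_randwrite#0", 1.1),
    ("c1_use1-az4", "1_fio_psync_randwrite#1", 2.0),
]
for row in pivot_metrics(records):
    print(row)
```

A job that is missing on one node simply lacks that key, which becomes `NaN` once the rows are loaded into a data frame.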

In [None]:
def _df_style_high(val, value_yellow=None, value_red=None, value_greenS=None, value_greenH=None, invert=False):
    "Data frame styling / cell formatting"
    color_map = {
        "green_soft": "#DAF7A6",
        "green_hard": "#02FC11",
        "red_hard": "#FC5A5A",
        "yellow_hard": "#E6ED02",
    }
    color = None

    # ignore 0 values
    if (invert) and (val == 0.0):
        return color
    
    # yellow (high)
    if ((value_yellow != None) and not(invert)) and (val >=  value_yellow):
        color = color_map["yellow_hard"]
    if ((value_yellow != None) and (invert)) and (val <=  value_yellow):
        color = color_map["yellow_hard"]
    
    # red (very high)
    if ((value_red != None) and not(invert))  and (val >=  value_red):
        color = color_map["red_hard"]
    if ((value_red != None) and (invert)) and (val <=  value_red):
        color = color_map["red_hard"]

    # soft green (low)
    if ((value_greenS != None) and not(invert))  and (val <=  value_greenS):
        color = color_map["green_soft"]
    if ((value_greenS != None) and (invert)) and (val >=  value_greenS):
        color = color_map["green_soft"]

    # hard green (very low)
    if ((value_greenH != None) and not(invert))  and (val <=  value_greenH):
        color = color_map["green_hard"]
    if ((value_greenH != None) and (invert)) and (val >=  value_greenH):
        color = color_map["green_hard"]
        
    # default color
    if color == None:
        return color
   
    #return f"color: {color}"
    return f"background-color: {color}"
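
The core of the styling rule is that later checks override earlier ones, so red wins over yellow when both thresholds are met. This is a reduced sketch of that idea for the non-inverted case only; `threshold_color` is a hypothetical simplification, not a replacement for the function above:

```python
def threshold_color(val, yellow=None, red=None):
    """Return a CSS background rule for values at or above the thresholds, else None."""
    color = None
    if yellow is not None and val >= yellow:
        color = "#E6ED02"
    # Checked last so that red overrides yellow when both thresholds are met.
    if red is not None and val >= red:
        color = "#FC5A5A"
    return None if color is None else f"background-color: {color}"


for value in (1.2, 5.5, 7.0):
    print(value, threshold_color(value, yellow=5.4, red=6))
```

The thresholds 5.4 and 6 are the ones this report uses for mean latency in milliseconds.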

## Discover and load results for 'fio' tests

In [None]:
# Globals
#battery_id = "b2"
filter_results_by_battery=""

nodes = {}

# FIO runtime data: custom stdout collected while the FIO jobs were running
fio_runtime = {}

# FIO Runtime log parser
result_fio_runtime_files = []

# Nodes entity
nodes = Nodes()

In [None]:
result_fio_runtime_files = lookup_result_files(results_path,
                                                results=result_fio_runtime_files,
                                                start_str=filter_results_by_battery,
                                                contains_str="fio_stdout",
                                                extension=".txt"
                                               )
#len(result_fio_runtime_files)
html_output += (f"""
<br>- Total of FIO runtime logs processed: {len(result_fio_runtime_files)}
<br>- FIO runtime logs processed: {result_fio_runtime_files}
""")

In [None]:
result_fio_runtime_files

In [None]:
# Build metrics from FIO Runtime (stdout parser)
for res in result_fio_runtime_files:
    task_name = f"{res.split('-')[0]}"
    job_name = f"{res.split('-')[1]}"
    node_name = f"{res.split(job_name+'-')[1].split('.txt')[0]}"

    nodes.add_node(node_name, job_name)

    parser_results_fio_runtime(nodes.get_node(node_name), f"{results_path}/{res}", job_info=(job_name, job_group))

In [None]:
#nodes.nodes['ip-10-0-173-135.ec2.internal'].node_alias

In [None]:
#nodes.nodes['ip-10-0-166-6.ec2.internal'].metrics

In [None]:
# FIO raw payload: files are saved in the format: {battery_id}_{cluster_id}-fio-{hostname}.tar.gz ;
# TODO: unpack them; currently it should be done manually
results_dirs_fio = []
results_dirs_fio = lookup_result_files(results_path,
                                        results=results_dirs_fio,
                                        start_str="fio_",
                                        contains_str="fio_",
                                        extension="tar.gz",
                                        ignore_str=".txt"
                                       )
len(results_dirs_fio)
html_output += (f"""
<br>- Total of FIO task results' files processed: {len(results_dirs_fio)}
<br>- FIO task results' files processed: {results_dirs_fio}
""")

In [None]:
# FIO payload reader:
# 1. extract FIO results tarball (saved by task)
# 2. lookup all JSON files for each result task
# 3. parse FIO payload: extract only desired metrics to be used in this report
for res in results_dirs_fio:

    task_name = f"{res.split('-')[0]}"
    job_name = f"{res.split('-')[1]}"
    node_name = f"{res.split(job_name+'-')[1].split('.tar.gz')[0]}"
    
    nodes.add_node(node_name, job_name)

    # create parser result dir and unpack it
    dest_path_res = f"{parser_path}/{res.split('.tar.gz')[0]}"
    
    # depends on: mkdir .local/results/byGroup-b3_loop10/parser && chmod o+rw .local/results/byGroup-b3_loop10/parser
    os.makedirs(dest_path_res, exist_ok=True)
    
    try:
        if res.endswith('tar.gz'):
            tar = tarfile.open(f"{results_path}/{res}")
            tar.extractall(path=dest_path_res)
            tar.close()
    except Exception:
        # when the file is not found or is corrupted, add an empty metric
        nodes.get_node(node_name).add_metric(
            job_name=job_name,
            job_group=job_group,
            task_name=task_name,
            task_group="fio_tasks",
            execId='',
            timestamp='',
            metric='empty',
            value=''
        )
        print(job_group, job_name, task_name, "fio_tasks", node_name)
        print(f"ERR, dataset not found or corrupted [{res}]; Empty metric added")
        continue

    # parser    
    parser_results_fio_tasks(nodes.get_node(node_name), f"{dest_path_res}", job_info=(job_name, job_group, task_name))
    #break

In [None]:
nodes.nodes[node_name].metrics

In [None]:
#results_fio

## Results for 'fio'

As described, the tests ran on 4 clusters with two disk layouts (single disk and etcd isolated), using gp2 and gp3 volumes. All volumes have the same capacity; the gp3 volumes use the standard values for IOPS and throughput.

- Total of FIO consecutive tests: 50
- Max IOPS across all jobs: ~1.5k to 2k IOPS
- Max (baseline) IOPS for gp2 device: 384 (3 IOPS/GiB, capacity=128GiB, throughput*=128 MiB/s)
- Max IOPS for gp3 device: 3000 (capacity=128GiB, throughput=125 MiB/s)

\*[Important note from AWS doc](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html): 
*"The throughput limit is between 128 MiB/s and 250 MiB/s, depending on the volume size. Volumes smaller than or equal to 170 GiB deliver a maximum throughput of 128 MiB/s. Volumes larger than 170 GiB but smaller than 334 GiB deliver a maximum throughput of 250 MiB/s if burst credits are available. Volumes larger than or equal to 334 GiB deliver 250 MiB/s regardless of burst credits. gp2 volumes that were created before December 3, 2018 and that have not been modified since creation might not reach full performance unless you modify the volume."*
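
The size tiers in that note can be turned into a small calculation. This is a minimal sketch: the throughput tiers come from the quoted note, while the gp2 baseline rule (3 IOPS per GiB, floor of 100, cap of 16,000) comes from the same AWS documentation page and is not stated elsewhere in this report. Both function names are hypothetical.

```python
def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, with a floor of 100 and a cap of 16,000 IOPS."""
    return max(100, min(3 * size_gib, 16000))


def gp2_max_throughput_mibs(size_gib):
    """Throughput limit by size tier, as stated in the AWS note quoted above."""
    if size_gib <= 170:
        return 128
    # Between 170 and 334 GiB the 250 MiB/s limit depends on burst credits.
    return 250


print(gp2_baseline_iops(128), gp2_max_throughput_mibs(128))
```

For the 128 GiB volumes used here, this puts the gp2 baseline well below the 3000 IOPS that gp3 provides at the same size.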

____

In [None]:
# Load manifest data
import yaml

node_data = []
for nm in list(nodes.nodes.keys()):
    node = nodes.nodes[nm]

    # output of : oc get infrastructures -o yaml > {res_path}/infrastructures.yaml
    with open(f"{results_path}/cluster-manifests-{node.cluster_full}/infrastructures.yaml", "r") as stream:
        try:
            yd = yaml.safe_load(stream)
            platform = yd['items'][0]['status']['platform']
            infraName = yd['items'][0]['status']['infrastructureName']
        except yaml.YAMLError as exc:
            print(exc)

    # output of : oc get machines -n openshift-machine-api -o yaml > {res_path}/machines.yaml
    with open(f"{results_path}/cluster-manifests-{node.cluster_full}/machines.yaml", "r") as stream:
        try:
            yd = yaml.safe_load(stream)
            for m in yd['items']:
                if m['status']['nodeRef']['name'] == nm:
                    mName = m['metadata']['name']
                    mType = m['spec']['providerSpec']['value']['instanceType']
                    vType = m['spec']['providerSpec']['value']['blockDevices'][0]['ebs']['volumeType']
                    vSize = m['spec']['providerSpec']['value']['blockDevices'][0]['ebs']['volumeSize']
                    azName = m['spec']['providerSpec']['value']['placement']['availabilityZone']
        except yaml.YAMLError as exc:
            print(exc)

    # output of : oc get clusterversion -o yaml > {res_path}/clusterversion.yaml
    with open(f"{results_path}/cluster-manifests-{node.cluster_full}/clusterversion.yaml", "r") as stream:
        try:
            yd = yaml.safe_load(stream)
            cversion = yd['items'][0]['status']['desired']['version']
        except yaml.YAMLError as exc:
            print(exc)
            
    node_data.append({
        "hostname": node.node_name,
        "machineName": mName,
        "node_alias": node.node_alias,
        "AZ": azName,
        "job": node.cluster,
        "cluAlias": node.cluster,
        "cluName": node.cluster_full,
        "infraName": infraName,
        "cluVersion": cversion,
        "platform": platform,
        "vmType": mType,
        "volType": vType,
        "volSzGB": vSize,
                
    })
df = pd.read_json(json.dumps(node_data))
display(df.sort_values(by=['node_alias']))
html_output = output_add_table(html_output,
                               title=f"Node Inventory",
                               desc="Job inventory of infrastructure that hosted the tests",
                               data=df.sort_values(by=['node_alias']).style.render())

## More filters

In [None]:
# Dataframe custom filters
df_filter_az2 = ["c1_use1-az2", "c2_use1-az2", "c3_use1-az2", "c4_use1-az2"]
df_filter_az4 = ["c1_use1-az4", "c2_use1-az4", "c3_use1-az4", "c4_use1-az4"]
df_filter_az6 = ["c1_use1-az6", "c2_use1-az6", "c3_use1-az6", "c4_use1-az6"]
df_filter_gp2 = ["c1_use1-az2", "c1_use1-az4", "c1_use1-az6", "c2_use1-az2", "c2_use1-az4", "c2_use1-az6",]
df_filter_gp3 = ["c3_use1-az2", "c3_use1-az4", "c3_use1-az6", "c4_use1-az2", "c4_use1-az4", "c4_use1-az6",]
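
The hard-coded filter lists above follow a regular `{cluster}_{azId}` pattern, so they could also be derived. This is a minimal sketch, assuming the cluster aliases c1 to c4 and the three AZ IDs used in this report; `build_filters` and its inputs are hypothetical names:

```python
from itertools import product

# Assumptions: cluster aliases c1..c4 and the three AZ IDs used in this report.
clusters_by_volume = {"gp2": ["c1", "c2"], "gp3": ["c3", "c4"]}
az_ids = ["use1-az2", "use1-az4", "use1-az6"]


def build_filters(clusters_by_volume, az_ids):
    """Return {filter_name: [column aliases]} in the '{cluster}_{azId}' format."""
    all_clusters = [c for group in clusters_by_volume.values() for c in group]
    filters = {}
    # One filter per AZ, covering every cluster (key is the short id, e.g. 'az2').
    for az in az_ids:
        filters[az.split("-")[1]] = [f"{c}_{az}" for c in all_clusters]
    # One filter per volume type, covering every AZ.
    for vol, clusters in clusters_by_volume.items():
        filters[vol] = [f"{c}_{az}" for c, az in product(clusters, az_ids)]
    return filters


filters = build_filters(clusters_by_volume, az_ids)
print(filters["az2"])
print(filters["gp2"])
```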

In [None]:
def df_drop_zero_rows(df):
    """
    Replace NaN with 0 in a data frame (df),
    and remove **rows** that contain **only** 0.
    Return the changed data frame.
    """
    df.fillna(0, inplace=True)
    # A row is dropped only when every metric column (all but job_Id) is 0.
    only_zeros = (df.drop(columns='job_Id') == 0).all(axis=1)
    deleteIndexes = df.index[only_zeros]

    print(f"Detected [{len(deleteIndexes)}] rows to be deleted: {list(deleteIndexes)}. Dropping them...\n")
    df.drop(deleteIndexes, inplace=True)
    return df

In [None]:
def build_oper_for_metric(mp, sync="lat"):
    delimiter="_"
    if mp == "":
        delimiter=""
    d = {
        "write": f"{mp}{delimiter}",
        "read": f"{mp}{delimiter}"
    }
    if job_group_operations == "rw":
        return d

    d['sync']=f"{sync}{delimiter}"
    return d
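
The result cells below combine this mapping with a measurement suffix to build the metric key, for example `write_clat_ms_p99`. This is a minimal sketch of that naming scheme; `metric_names` is a hypothetical helper that ignores the empty-prefix case and always includes `sync`:

```python
def metric_names(prefix, measurement, operations=("write", "read", "sync"), sync_prefix="lat"):
    """Compose '{op}_{prefix}_ms_{measurement}' keys; sync always uses sync_prefix."""
    names = []
    for op in operations:
        # FIO reports only 'lat' for sync, so it does not follow the read/write prefix.
        p = sync_prefix if op == "sync" else prefix
        names.append(f"{op}_{p}_ms_{measurement}")
    return names


print(metric_names("clat", "p99"))
```

The generated names match the keys stored under `values` by the FIO payload parser.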

## Results (Latency)

In [None]:
# Job specific
metric_prefix="lat"
oper_lat_prefix = build_oper_for_metric(metric_prefix)

for op in oper_lat_prefix.keys():
    # measurement metric
    mm="mean"
    metric=f"{op}_{oper_lat_prefix[op]}ms_{mm}"
    title=f"metric ({metric}) by Node(all)"

    df = df_drop_zero_rows(aggregate_metric_collection(nodes, f"{metric}"))

    display(df.plot(title=f"{op} {mm} by Node (all)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az2).plot(title=f"{op} {mm} by Node (az2)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az4).plot(title=f"{op} {mm} by Node (az4)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az6).plot(title=f"{op} {mm} by Node (az6)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_gp2).plot(title=f"{op} {mm} by Node (gp2)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_gp3).plot(title=f"{op} {mm} by Node (gp3)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    
    print(f"\n>> {title}")
    dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'), value_yellow=5.4, value_red=6).set_table_attributes('border="1"')
    display(dfs)
    html_output = output_add_table(html_output,
                                   title=f"{title}",
                                   desc="",
                                   data=dfs.render())

In [None]:
metric_prefix="lat"
oper_lat_prefix = build_oper_for_metric(metric_prefix)

for op in oper_lat_prefix.keys():
    mm="max"
    metric=f"{op}_{oper_lat_prefix[op]}ms_{mm}"
    title=f"metric ({metric}) by Node(all)"

    df = df_drop_zero_rows(aggregate_metric_collection(nodes, f"{metric}"))

    print(f">> {title}")
    dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                            value_yellow=20, value_red=50).set_table_attributes('border="1"')
    display(dfs)
    html_output = output_add_table(html_output,
                                   title=f"{title}",
                                   desc="",
                                   data=dfs.render())

## Results (Percentile)

In [None]:
metric_prefix="clat"
oper_lat_prefix = build_oper_for_metric(metric_prefix, sync="lat")
print(oper_lat_prefix)
for op in oper_lat_prefix.keys():
    mm="p99"
    metric=f"{op}_{oper_lat_prefix[op]}ms_{mm}"
    title=f"metric ({metric}) by Node(all)"

    df = df_drop_zero_rows(aggregate_metric_collection(nodes, f"{metric}"))

    print(f">> {title}")
    th_yellow=5
    th_red=10.0

    dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                            value_yellow=th_yellow, value_red=th_red).set_table_attributes('border="1"')
    display(dfs)
    display(df.plot(title=f"{op} {mm} by Node (all)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az2).plot(title=f"{op} {mm} by Node (az2)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az4).plot(title=f"{op} {mm} by Node (az4)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az6).plot(title=f"{op} {mm} by Node (az6)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_gp2).plot(title=f"{op} {mm} by Node (gp2)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_gp3).plot(title=f"{op} {mm} by Node (gp3)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    
    html_output = output_add_table(html_output,
                                   title=f"{title}",
                                   desc="",
                                   data=dfs.render())

In [None]:
metric_prefix="clat"
oper_lat_prefix = build_oper_for_metric(metric_prefix, sync="lat")

for op in oper_lat_prefix.keys():
    mm="p99.9"
    metric=f"{op}_{oper_lat_prefix[op]}ms_{mm}"
    title=f"metric ({metric}) by Node(all)"

    df = df_drop_zero_rows(aggregate_metric_collection(nodes, f"{metric}"))

    display(df.plot(title=f"{op} {mm} by Node (all)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az2).plot(title=f"{op} {mm} by Node (az2)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az4).plot(title=f"{op} {mm} by Node (az4)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az6).plot(title=f"{op} {mm} by Node (az6)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_gp2).plot(title=f"{op} {mm} by Node (gp2)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_gp3).plot(title=f"{op} {mm} by Node (gp3)", xlabel="job_Id", ylabel="Time (ms)", fontsize="10", figsize=(25,5)))
    
    print(f">> {title}")
    dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'), value_yellow=10, value_red=20.0).set_table_attributes('border="1"')
    display(dfs)
    html_output = output_add_table(html_output,
                                   title=f"{title}",
                                   desc="",
                                   data=dfs.render())
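
The six plot calls in the cell above differ only in the group label and the column filter, so they could be driven from a single mapping. The block below is a minimal sketch of that idea, not the notebook's actual code: `plot_groups` and `example_groups` are hypothetical names, and the node names are made up (the real `df_filter_*` lists are defined earlier in this notebook).

```python
def plot_groups(op, mm, groups):
    """Yield one (title, columns) pair per plot.

    columns is None for the group that should plot every column.
    """
    for name, columns in groups.items():
        yield f"{op} {mm} by Node ({name})", columns


# Hypothetical groups; in the notebook these would be the df_filter_* lists.
example_groups = {
    "all": None,
    "gp2": ["c1-master-0", "c2-master-0"],
    "gp3": ["c3-master-0", "c4-master-0"],
}

specs = list(plot_groups("write", "p99", example_groups))
for title, columns in specs:
    print(title, columns)

# In the notebook, each pair would drive one plot, for example:
# for title, cols in plot_groups(op, mm, groups):
#     view = df if cols is None else df.filter(items=cols)
#     display(view.plot(title=title, xlabel="job_Id", ylabel="Time (ms)",
#                       fontsize="10", figsize=(25,5)))
```

Keeping the titles and filters in one place would also make it harder for a copied line to keep the wrong axis label.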

In [None]:
metric_prefix="clat"
oper_lat_prefix = build_oper_for_metric(metric_prefix, sync="lat")

for op in oper_lat_prefix.keys():
    mm="stddev"
    metric=f"{op}_{oper_lat_prefix[op]}ms_{mm}"
    title=f"metric ({metric}) by Node(all)"

    df = df_drop_zero_rows(aggregate_metric_collection(nodes, f"{metric}"))

    print(f">> {title}")
    dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                            value_yellow=1, value_red=2).set_table_attributes('border="1"')
    display(dfs)
    html_output = output_add_table(html_output,
                                   title=f"{title}",
                                   desc="",
                                   data=dfs.render())

## Results (totals)

In [None]:
metric_prefix=""
# there's no sync for IOPS metric
oper_lat_prefix = build_oper_for_metric(metric_prefix, sync="")

for op in oper_lat_prefix.keys():
    # there's no sync for IOPS metric
    if op == "sync":
        continue
    
    metric=f"{op}_{oper_lat_prefix[op]}iops"
    title=f"metric ({metric}) by Node(all)"

    df = df_drop_zero_rows(aggregate_metric_collection(nodes, f"{metric}"))

    display(df.plot(title=f"{op} IOPS by Node (all)", xlabel="job_Id", ylabel="IOPS", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az2).plot(title=f"{op} IOPS by Node (az2)", xlabel="job_Id", ylabel="IOPS", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az4).plot(title=f"{op} IOPS by Node (az4)", xlabel="job_Id", ylabel="IOPS", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_az6).plot(title=f"{op} IOPS by Node (az6)", xlabel="job_Id", ylabel="IOPS", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_gp2).plot(title=f"{op} IOPS by Node (gp2)", xlabel="job_Id", ylabel="IOPS", fontsize="10", figsize=(25,5)))
    display(df.filter(items=df_filter_gp3).plot(title=f"{op} IOPS by Node (gp3)", xlabel="job_Id", ylabel="IOPS", fontsize="10", figsize=(25,5)))
    
    print(f">> {title}")
    dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                            value_yellow=2000, value_red=1000,
                            value_greenH=2950, value_greenS=2900,
                            invert=True).set_table_attributes('border="1"')
    display(dfs)
    html_output = output_add_table(html_output,
                                   title=f"{title}",
                                   desc="",
                                   data=dfs.render())
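
The IOPS table above passes `invert=True` because higher values are better there, so the red threshold (1000) sits below the yellow one (2000). The sketch below shows one way such threshold colouring could work. It is an illustration only: `threshold_color` is a hypothetical helper, the real `_df_style_high` is defined earlier in this notebook and may behave differently, and reading `greenS`/`greenH` as "soft" and "hard" green limits is an assumption.

```python
def threshold_color(value, value_yellow, value_red,
                    value_greenS=None, value_greenH=None, invert=False):
    """Return a CSS background declaration for one cell (hypothetical helper)."""
    # With invert=True higher values are better, so negate both sides of each
    # comparison: after negation, "worse" always means "larger".
    sign = -1 if invert else 1
    v = sign * value
    if v >= sign * value_red:
        return "background-color: red"
    if v >= sign * value_yellow:
        return "background-color: yellow"
    # Assumption: greenH is the stricter ("hard") limit, greenS the looser one.
    if value_greenH is not None and v <= sign * value_greenH:
        return "background-color: green"
    if value_greenS is not None and v <= sign * value_greenS:
        return "background-color: lightgreen"
    return ""


# Thresholds copied from the IOPS cell above.
iops = dict(value_yellow=2000, value_red=1000,
            value_greenS=2900, value_greenH=2950, invert=True)
for sample in (3000, 2920, 2500, 1500, 900):
    print(sample, threshold_color(sample, **iops))
```

With `invert=False` the same function covers the latency and CPU tables, where lower values are better and the red threshold is the highest one.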

In [None]:
metric_prefix=""
# there's no sync for TOTAL IOs metric
oper_lat_prefix = build_oper_for_metric(metric_prefix, sync="")

for op in oper_lat_prefix.keys():
    # there's no sync for TOTAL IOs metric
    if op == "sync":
        continue

    metric=f"{op}_{oper_lat_prefix[op]}total_ios"
    title=f"metric ({metric}) by Node(all)"

    df = df_drop_zero_rows(aggregate_metric_collection(nodes, f"{metric}"))

    print(f">> {title}")
    dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                            value_yellow=200000, value_red=100000,
                            value_greenS=500000, value_greenH=530000,
                            invert=True).set_table_attributes('border="1"')
    display(dfs)
    html_output = output_add_table(html_output,
                                   title=f"{title}",
                                   desc="",
                                   data=dfs.render())

In [None]:
metric_prefix=""
# there's no sync for BW metric
oper_lat_prefix = build_oper_for_metric(metric_prefix, sync="")

for op in oper_lat_prefix.keys():
    # there's no sync for BW metric
    if op == "sync":
        continue

    metric=f"{op}_{oper_lat_prefix[op]}bw"
    title=f"metric ({metric}) by Node(all)"

    df = df_drop_zero_rows(aggregate_metric_collection(nodes, f"{metric}"))

    print(f">> {title}")
    dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                            value_yellow=20000, value_red=10000, value_greenS=40000,
                            value_greenH=47500, invert=True).set_table_attributes('border="1"')
    display(dfs)
    html_output = output_add_table(html_output,
                                   title=f"{title}",
                                   desc="",
                                   data=dfs.render())

In [None]:
metric="cpu_ctx"
title=f"metric ({metric}) by Node(all)"

df = aggregate_metric_collection(nodes, f"{metric}")

print(f">> {title}")
dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                        value_yellow=530000, value_red=537500).set_table_attributes('border="1"')
display(dfs)
html_output = output_add_table(html_output,
                               title=f"{title}",
                               desc="",
                               data=dfs.render())

In [None]:
metric="cpu_sys"
title=f"metric ({metric}) by Node(all)"

df = aggregate_metric_collection(nodes, f"{metric}")
df_columns = df.columns.drop('job_Id')

print(f">> {title}")
dfs = df.style.applymap(_df_style_high, subset=df_columns,
                        value_yellow=0.25, value_red=0.50, value_greenS=0.1,
                        value_greenH=0.04).set_table_attributes('border="1"')
display(dfs)
html_output = output_add_table(html_output,
                               title=f"{title}",
                               desc="",
                               data=dfs.render())

In [None]:
metric="cpu_usr"
title=f"metric ({metric}) by Node(all)"

df = aggregate_metric_collection(nodes, f"{metric}")

print(f">> {title}")
dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                        value_yellow=0.1, value_red=0.2, value_greenS=0.05,
                        value_greenH=0.02).set_table_attributes('border="1"')
display(dfs)
html_output = output_add_table(html_output,
                               title=f"{title}",
                               desc="",
                               data=dfs.render())

In [None]:
metric="load1"
title=f"metric ({metric}) by Node(all)"

df = aggregate_metric_collection(nodes, f"{metric}", is_collection=False)

print(f">> {title}")

dfs = df.style.applymap(_df_style_high, subset=df.columns.drop('job_Id'),
                        value_yellow=2, value_red=4, value_greenS=1,
                        value_greenH=0.5).set_table_attributes('border="1"')
display(dfs)
html_output = output_add_table(html_output,
                               title=f"{title}",
                               desc="",
                               data=dfs.render())

In [None]:
# Save to HTML (note: nbconvert truncates the tables)
html_output += """
</body>
</html>
"""

with open(html_output_path, 'w') as f:
    f.write(html_output)