# Polybench 3.2 PMEM Analysis

This notebook is my latest attempt at analyzing the behavior of Polybench 3.2 on Intel Optane DC Persistent Memory.

Several factors affect this analysis:

* If the memory is a single module or striped
* The size of the dataset
* The specific test being executed
* The compilation options for Polybench itself.  The options are:
    * **POLYBENCH_TIME** causes the benchmark to emit an elapsed time measurement.
    * **POLYBENCH_NO_FLUSH_CACHE** disables the flush cache operation before each run; I have
      not used this option.
    * **POLYBENCH_LINUX_FIFO_SCHEDULER** causes the code to use the Linux real-time
      FIFO scheduler at the maximum priority; essentially, this pins the test to the core.
    * **POLYBENCH_CACHE_SIZE_KB** the size, in KB, of the cache that is flushed prior to the runs.
      The default is 33MB, which is enough for our purposes (since we have a 32MB cache).
    * **POLYBENCH_STACK_ARRAYS** uses stack space instead of `malloc`; we do not use this option
      since we are comparing to persistent memory using the `malloc` interface.
    * **POLYBENCH_DUMP_ARRAYS** appears to be a debugging aid; it is not useful for my
      benchmarking.
    * **POLYBENCH_CYCLE_ACCURATE_TIMER** rather than using the realtime clock, which provides elapsed
      time, this uses the tick counter (`rdtsc`) instruction.  I have collected data from
      both approaches.  While I have attempted to run with variable performance disabled,
      I have also read information that suggests this doesn't work for the Xeon Scalable CPUs.
      Thus, I expect clock ticks to be a more accurate comparison measure.
    * **POLYBENCH_PAPI** is a specific type of timing API that I have not investigated or used.
    * **{MINI|SMALL|STANDARD|LARGE|EXTRALARGE}_DATASET** exactly one is chosen to pick a dataset size
      for the given test. Note that the largest option requires 160GB of memory for two of the
      tests and thus the PMEM region **must** accommodate that.
    * **POLYBENCH_USE_C99_PROTO** uses C99 function prototypes; I've not used this option and I'm
      not sure what the default is (C89?).
    * **POLYBENCH_USE_SCALAR_LAB** changes the way loops function; I have not used this option.
 
**Note**: The README says that using the FIFO scheduler requires linking with an extra
library, but testing indicates that the extra link step is a no-op: based upon a sampling of the
binaries, those built with and without it appear to be identical.
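
One way to check that two builds really are byte-for-byte identical is to compare their sizes and digests. The following is a minimal sketch, not the procedure used for the sampling described in the note; the helper names `file_digest` and `binaries_identical` are my own and hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as fd:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: fd.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def binaries_identical(path_a, path_b):
    """True when both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False
    return file_digest(a) == file_digest(b)
```

Calling `binaries_identical` on a test built with the extra library and the same test built without it (the paths depend on the local build layout) would return `True` if the link step has no effect on the output.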
 
I have created multiple **Makefile**s for the various options, along with scripts that _build_ the specific Makefile before running the given tests.  The output is then emitted into separate directories:

* Log files for each type of test:
    * DRAM
    * PMEM7 (striped)
    * PMEM1 (single)
    * Make (output from the Make command)
    
This script is thus set up to extract the information from the logs, filter it, and generate hopefully
useful output.  Since there may be errors, I'll try to deal with those and report them as well.  As of
14 June 2019 the run scripts do _not_ invoke AEPWatch, but this analysis should still filter out AEPWatch
output so that it can handle (future) runs where that data is present.

 

In [1]:
import os, sys, re
from operator import truediv
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as ticker
from matplotlib2tikz import save as tikz_save
import math
import pandas as pd
import seaborn as sns


In [3]:
# Results directories start with pb-results and then a date stamp
input_dir = '..'
results_dirs = [x for x in os.listdir(input_dir) if x.startswith('pb-results')]
print(results_dirs)

['pb-results-2019_06_13__23_33_00', 'pb-results-2019_06_13__23_59_38', 'pb-results-2019_06_14__00_51_01', 'pb-results-2019_06_14__10_18_31', 'pb-results-2019_06_14__11_13_03', 'pb-results-2019_06_14__11_40_20']
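
The date stamp in each directory name can be turned into a `datetime`, which makes it easy to order runs chronologically. This is a small sketch that assumes every directory follows the `pb-results-YYYY_MM_DD__HH_MM_SS` pattern seen in the listing above; the function name `results_dir_timestamp` is hypothetical.

```python
from datetime import datetime

# Format inferred from names such as 'pb-results-2019_06_13__23_33_00'
STAMP_FORMAT = '%Y_%m_%d__%H_%M_%S'


def results_dir_timestamp(name, prefix='pb-results-'):
    """Parse the date stamp that follows the prefix in a results directory name."""
    if not name.startswith(prefix):
        raise ValueError('not a results directory: {}'.format(name))
    return datetime.strptime(name[len(prefix):], STAMP_FORMAT)


example_dirs = ['pb-results-2019_06_14__10_18_31', 'pb-results-2019_06_13__23_33_00']
ordered = sorted(example_dirs, key=results_dir_timestamp)
print(ordered)
```

Because the stamp is zero-padded and ordered from year to second, a plain string sort gives the same order; parsing it is mainly useful for computing the time between runs.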


In [7]:
results_logs={}
for rd in results_dirs:
    results_logs[rd] = {}
    logs=[x for x in os.listdir('{}/{}'.format(input_dir, rd)) if x.endswith('.log')]
    results_logs[rd]['dram'] = [x for x in logs if 'dram' in x]
    results_logs[rd]['pmem1'] = [x for x in logs if 'pmem1' in x]
    results_logs[rd]['pmem7'] = [x for x in logs if 'pmem7' in x]
    results_logs[rd]['make'] = [x for x in logs if 'make' in x]
    # validate
    # each run directory must contain exactly one log of each type
    for log in results_logs[rd]: assert len(results_logs[rd][log]) == 1
    print(results_logs[rd])


{'dram': ['pb-dram-intelsdp1044-2019_06_13__23_33_00.log'], 'pmem1': ['pb-pmem1-intelsdp1044-2019_06_13__23_33_00.log'], 'pmem7': ['pb-pmem7-intelsdp1044-2019_06_13__23_33_00.log'], 'make': ['pb-make-intelsdp1044-2019_06_13__23_33_00.log']}
{'dram': ['pb-dram-intelsdp1044-2019_06_13__23_59_38.log'], 'pmem1': ['pb-pmem1-intelsdp1044-2019_06_13__23_59_38.log'], 'pmem7': ['pb-pmem7-intelsdp1044-2019_06_13__23_59_38.log'], 'make': ['pb-make-intelsdp1044-2019_06_13__23_59_38.log']}
{'dram': ['pb-dram-intelsdp1044-2019_06_14__00_51_01.log'], 'pmem1': ['pb-pmem1-intelsdp1044-2019_06_14__00_51_01.log'], 'pmem7': ['pb-pmem7-intelsdp1044-2019_06_14__00_51_01.log'], 'make': ['pb-make-intelsdp1044-2019_06_14__00_51_01.log']}
{'dram': ['pb-dram-intelsdp1044-2019_06_14__10_18_31.log'], 'pmem1': ['pb-pmem1-intelsdp1044-2019_06_14__10_18_31.log'], 'pmem7': ['pb-pmem7-intelsdp1044-2019_06_14__10_18_31.log'], 'make': ['pb-make-intelsdp1044-2019_06_14__10_18_31.log']}
{'dram': ['pb-dram-intelsdp1044-2019

In [125]:
class PolybenchResults:
    
    polybench_dataset_types = (
        'MINI',
        'SMALL',
        'STANDARD',
        'LARGE',
        'EXTRALARGE'
    )
    
    polybench_option_types = (
        'TIME',
        'FLUSH_CACHE',
        'LINUX_FIFO_SCHEDULER',
        'CACHE_SIZE_KB',
        'STACK_ARRAYS',
        'DUMP_ARRAYS',
        'CYCLE_ACCURATE_TIMER',
        'PAPI',
        'DATASET',
        'USE_C99_PROTO',
        'USE_SCALAR_LAB'
    )
    
    polybench_log_types = (
        'dram',
        'pmem1',
        'pmem7',
        'make'
    )
    
    def __init__(self, results_dir):
        self.polybench_options = {}
        for ot in self.polybench_option_types:
            self.polybench_options[ot] = False
        self.set_polybench_dataset('STANDARD')
        self.set_polybench_cache_size_kb()
        self.results_dir = results_dir
        self.logs = [x for x in os.listdir(results_dir) if x.endswith('.log')]
        self.log_results = {}
        self._load_results()
        self._process_options()
        self.data = {}
        for lt in self.polybench_log_types:
            if lt == 'make': continue
            assert lt not in self.data # we shouldn't be overwriting existing data
            self.data[lt] = self._process_data(lt)
        
    def _load_results(self):
        for log in self.logs:
            for lt in self.polybench_log_types:
                if lt in log:
                    with open('{}/{}'.format(self.results_dir, log), 'r') as fd:
                        self.log_results[lt] = fd.readlines()
    
    def _process_options(self):
        assert 'make' in self.log_results
        for pbo in self.polybench_option_types:
            if pbo in self.log_results['make'][0]:
                if pbo == 'DATASET': self._process_dataset_option()
                else: self.polybench_options[pbo] = True
    
    def _process_dataset_option(self):
        for pdt in self.polybench_dataset_types:
            if '-D{}'.format(pdt) in self.log_results['make'][0]:
                self.set_polybench_dataset(pdt)
                
    
    def _process_data(self, type):
        data = []
        assert type in self.polybench_log_types
        index = 0
        tn = None
        alloc = 0
        time = 0
        error = False
        while index < len(self.log_results[type]):
            line = self.log_results[type][index].strip()
            # make sure we start with a clean slate
            assert tn is None
            assert alloc == 0
            assert time == 0
            if '_time' in line: # test name
                index = index + 1
                tn = line.strip().split('/')[-1][:-5]
            line = self.log_results[type][index].strip()         
            while 'posix_memalign' in line: # allocation
                if 'cannot' in line: # error condition
                    data.append((tn, -1, -1))
                    tn = None
                    alloc = 0
                    time = 0
                    index = index + 1
                    error = True
                    break
                pm = line.strip().split(' ')[-1]
                try:
                    alloc = alloc + int(pm)
                except ValueError as e:
                    # report the offending line before moving on to the next one
                    print('line {}: unknown content {}, exception {}'.format(
                           index + 1, line, e))
                index = index + 1
                line = self.log_results[type][index].strip()
            if error:
                error = False
                continue
            # at this point I should either have a floating point number
            # or an integer
            try:
                if '.' in line: time = float(line)
                else: time = int(line)
            except Exception as e:
                print('{} - {}, line {}: expected a number, got {}, exception {}'.format(
                      self.results_dir, type, 
                      index+1, line, e))
            index = index + 1
            # record the completed (test name, bytes allocated, time) tuple
            data.append((tn, alloc, time))
            tn = None
            alloc = 0
            time = 0
        return data
        
        
    def set_polybench_option(self, option_type, option):
        assert option_type in self.polybench_option_types
        self.polybench_options[option_type] = option
    
    
    
    def set_polybench_time(self, option=False):
        self.set_polybench_option('TIME', option)

        
    def set_polybench_flush_cache(self, option=False):      
        self.set_polybench_option('FLUSH_CACHE', option)
        
        
    def set_polybench_linux_fifo_scheduler(self, option=False):
        self.set_polybench_option('LINUX_FIFO_SCHEDULER', option)
        
        
    def set_polybench_cache_size_kb(self, option=33):
        self.set_polybench_option('CACHE_SIZE_KB', option)
        
        
    def set_polybench_stack_arrays(self, option=False):
        self.set_polybench_option('STACK_ARRAYS', option)
        
        
    def set_polybench_dump_arrays(self, option=False):
        self.set_polybench_option('DUMP_ARRAYS', option)
        
        
    def set_polybench_cycle_accurate_timer(self, option=False):
        self.set_polybench_option('CYCLE_ACCURATE_TIMER', option)
    
    
    def set_polybench_papi(self, option=False):
        self.set_polybench_option('PAPI', option)
    
    
    def set_polybench_dataset(self, option='STANDARD'):
        assert option in self.polybench_dataset_types
        self.set_polybench_option('DATASET', option)
    
    
    def set_polybench_use_c99_proto(self, option=False):
        self.set_polybench_option('USE_C99_PROTO', option)
    
    
    def set_polybench_use_scalar_lab(self, option=False):
        self.set_polybench_option('USE_SCALAR_LAB', option)
        
    def get_polybench_options(self):
        return self.polybench_options
    
    def dump_polybench_options(self):
        for pbo in self.polybench_options: print('{}: {}'.format(pbo, self.polybench_options[pbo]))
            
    def dump_polybench_data(self, data_type):
        assert data_type in self.polybench_log_types
        for test, alloc, time in self.data[data_type]:
            if isinstance(time, float):
                print('{}: ({},{:4.2f})'.format(test, alloc, time))
            else: print('{}: ({},{})'.format(test, alloc, time))
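
The record structure that `_process_data` expects (a test-name line containing `_time`, zero or more `posix_memalign` lines whose last field is a byte count, then one numeric measurement) can also be walked with a single loop. The sketch below is an alternative illustration, not the class's implementation; `parse_records` is a hypothetical name, and the sample lines are made up to match those three rules rather than copied from a real log.

```python
def parse_records(lines):
    """Yield (test_name, total_alloc, time) tuples from an iterable of log lines."""
    name, alloc = None, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if '_time' in line:
            # test name: last path component with the '_time' suffix removed
            name, alloc = line.split('/')[-1][:-5], 0
        elif 'posix_memalign' in line:
            if 'cannot' in line:
                # allocation failure: record the test with sentinel values
                yield (name, -1, -1)
                name, alloc = None, 0
            else:
                alloc += int(line.split(' ')[-1])
        elif name is not None:
            # measurement: float for elapsed seconds, int for clock ticks
            value = float(line) if '.' in line else int(line)
            yield (name, alloc, value)
            name, alloc = None, 0


# Illustrative input only; the real log layout may differ.
sample = [
    'bin/2mm_time',
    'posix_memalign 16384',
    'posix_memalign 24576',
    '157228',
    'bin/atax_time',
    'posix_memalign: cannot allocate memory',
]
print(list(parse_records(sample)))
```

Iterating over the lines directly avoids the manual index bookkeeping, so a log that ends in the middle of a record cannot run past the end of the list. A measurement line that is not numeric raises `ValueError` here instead of being reported and skipped.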



In [128]:
pb_results = {}
for rd in results_dirs:
    pb_results[rd] = PolybenchResults('{}/{}'.format(input_dir, rd))
    print(rd)
    print('dram:')
    pb_results[rd].dump_polybench_data('dram')
    print('pmem1:')
    pb_results[rd].dump_polybench_data('pmem1')
    print('pmem7:')
    pb_results[rd].dump_polybench_data('pmem7')

pb-results-2019_06_13__23_33_00
dram:
2mm: (40960,157228)
3mm: (57344,196192)
atax: (8960,3280)
bicg: (9216,5048)
cholesky: (8448,24742)
doitgen: (16800,8220)
gemm: (24576,69398)
gemver: (10240,5398)
gesummv: (17152,4998)
mvt: (9216,3842)
symm: (24576,38554)
syr2k: (24576,158942)
syrk: (16384,69190)
trisolv: (8704,1316)
trmm: (16384,28062)
durbin: (17408,3102)
dynprog: (139264,197108)
gramschmidt: (24576,64928)
lu: (8192,9948)
ludcmp: (9504,21152)
correlation: (16896,34540)
covariance: (16640,31122)
floyd-warshall: (8192,64768)
reg_detect: (1072,1776)
adi: (24576,38480)
fdtd-2d: (24592,5972)
fdtd-apml: (1186416,1250736)
jacobi-1d-imper: (8000,1414)
jacobi-2d-imper: (16384,4216)
seidel-2d: (8192,47122)
pmem1:
2mm: (40960,148550)
3mm: (57344,242156)
atax: (8960,5380)
bicg: (9216,26470)
cholesky: (8448,22294)
doitgen: (16800,52184)
gemm: (24576,86270)
gemver: (10240,5208)
gesummv: (17152,20594)
mvt: (9216,3968)
symm: (24576,54244)
syr2k: (24576,147478)
syrk: (16384,64750)
trisolv: (8704,1
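
With the data in `(test, alloc, time)` tuples, a natural first comparison is the ratio of PMEM time to DRAM time per test. The following is a sketch under the assumption that both lists come from the same run (same timer and dataset size); the function name `slowdown` is hypothetical, and the three sample rows are copied from the output above.

```python
def slowdown(baseline, candidate):
    """Map test name -> candidate_time / baseline_time, skipping failed runs.

    Failed runs are recorded with a time of -1, so only positive times are used.
    """
    base = {name: t for name, _alloc, t in baseline if t > 0}
    ratios = {}
    for name, _alloc, t in candidate:
        if t > 0 and name in base:
            ratios[name] = t / base[name]
    return ratios


# A few rows taken from the 2019_06_13__23_33_00 output above
dram = [('2mm', 40960, 157228), ('atax', 8960, 3280), ('bicg', 9216, 5048)]
pmem1 = [('2mm', 40960, 148550), ('atax', 8960, 5380), ('bicg', 9216, 26470)]

for name, ratio in sorted(slowdown(dram, pmem1).items(), key=lambda kv: kv[1]):
    print('{}: {:.2f}x'.format(name, ratio))
```

A ratio above 1 means the PMEM run took longer than the DRAM run for that test. Since the ratio is unitless, it works the same way for elapsed-time and clock-tick measurements, as long as the two are not mixed.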

In [None]:
        
        
        
# The make log tells us what options were used
def load_make_info(input_log):
    with open(input_log, 'r') as fd:
        for line in fd.readlines():
            if 'gcc' not in line: continue
            # we have a gcc line, which will include the parameters used
            print(line.split(' '))
            break

for rd in results_logs:
    load_make_info('{}/{}/{}'.format(input_dir, rd, results_logs[rd]['make'][0]))
    break # testing - just call the first one
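
Rather than printing the split `gcc` line, the `-D` options can be pulled out directly. This is a minimal sketch; `parse_defines` is a hypothetical helper, and the sample command line is invented for illustration (the actual flags and file names in the make log may differ).

```python
import re

# Match -DNAME or -DNAME=VALUE at the start of a whitespace-separated token
DEFINE_RE = re.compile(r'(?<!\S)-D(\w+)(?:=(\S+))?')


def parse_defines(gcc_line):
    """Return {macro: value} for each -D option; value is True when none is given."""
    return {name: (value if value else True)
            for name, value in DEFINE_RE.findall(gcc_line)}


# Illustrative command line only, not copied from a real make log
sample = ('gcc -O3 -I utilities -DPOLYBENCH_TIME -DLARGE_DATASET '
          '-DPOLYBENCH_CACHE_SIZE_KB=32770 2mm.c -o 2mm_time')
for macro, value in parse_defines(sample).items():
    print('{}: {}'.format(macro, value))
```

Keeping the macro names whole avoids the substring matching used in `_process_options`, where `FLUSH_CACHE` also matches `POLYBENCH_NO_FLUSH_CACHE`.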