In [2]:
import csv
from datetime import datetime
import statistics
import tempfile
import sys
import os

# the following lines are only needed if the oclude codebase
# lives in the same directory as this script
sys.path = sys.path[1:]
# '' (the current directory) is not always on sys.path,
# so only remove it if it is actually there
if '' in sys.path:
    sys.path.remove('')

In [3]:
# run this cell AFTER running the cell above

from oclude import profile_opencl_kernel
from openclio import argsIOrole

def write_message(msg):
    sys.stderr.write('<<< OCLUDIFY INFO >>> ' + msg + '\n')

## OpenCL kernel execution time measurement with `oclude`

1. This script runs the `oclude` profiler (which must be installed) on a list of OpenCL kernels and measures their execution times. It expects **3 files**:
   - a CSV file with the list of kernels to run (see **2.** below)
   - a CSV file with the list of configurations to use for each kernel (see **3.** below)
   - a simple text file with the list of OpenCL devices to run the kernels on (see **4.** below)

2. These OpenCL kernels are expected to be stored in a CSV file with the following required columns:
   - `benchmark`: the name of the kernel
   - `src`: the actual source code of the kernel

   No other column is taken into account.
   The name of this CSV is stored in the constant `KERNELS_FILE` in the **GLOBAL VARIABLES SECTION** (see below).

3. The script expects one more CSV file with its name stored in the `CONFIG_FILE` global variable with the following columns:
   - `benchmark`: the name of the kernel (which must match the name of one of the kernels in the `KERNELS_FILE`)
   - `config`: the configuration to use for the kernel (see **5.** below)
   - `gsizes`: a space-separated list of 3 integers that defines the global sizes to use for the kernel. The format is: `<starting value (inclusive)> <ending value (inclusive)> <step>`. Note that each global size corresponds to a different run. If left empty, the default global sizes will be used (see constant `DEFAULT_GSIZES` in the **GLOBAL VARIABLES SECTION**).
   - `samples`: the number of samples to take for each run of the kernel i.e., the number of times the kernel will be executed. If left empty, the default number of samples will be used (see constant `DEFAULT_SAMPLES` in the **GLOBAL VARIABLES SECTION**).
   - `devices`: a space-separated list of OpenCL devices to run the kernel on. The device names must match the names of the devices in the `DEVICES_FILE` (see **4.** below). If left empty, all devices in the `DEVICES_FILE` will be used.
   - `timeouts`: a space-separated list of timeout values (in seconds) to use for the kernel. If left empty, the default timeouts will be used (see constant `DEFAULT_TIMEOUTS` in the **GLOBAL VARIABLES SECTION**). `oclude` can hang when there is a problem with the inputs or with the value of the gsize (which is not at all unusual), in which case the script has to kill the kernel's execution; the timeout values determine when to do so. If more than one value is specified (as in the default), each run is attempted once per timeout value, in order, before the script gives up on it. As soon as an attempt finishes within its timeout, the remaining timeout values are ignored and the script moves on to the next run.

   ***IMPORTANT NOTE***: the script will only deal with the kernels listed in the `CONFIG_FILE`, regardless of how many kernels are listed in the `KERNELS_FILE`. This is because the `CONFIG_FILE` is expected to be manually edited by the user to specify the kernels to run and their respective configurations.

4. The script expects a simple text file with the list of OpenCL devices to run the kernels on.
   - Each line in the file describes a different device to use. All the experiments will be run on each device.
   - Each line must be a space-separated list of 3 values with the following format: `<custom device name> <platform id> <device id>`.
   - The custom device name is used to identify the device in the output files (e.g. CPU, GPU1, GPU2, etc.). Must be unique and **cannot contain spaces**.
   - The platform id and device id are the values returned by the OpenCL API when querying the available devices.
   - Example:
     Let's say that we run the command `clinfo -l` and get the following output:
     ```
     Platform #0: Intel(R) OpenCL
      `-- Device #0: Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz
     Platform #1: NVIDIA CUDA
      +-- Device #0: GeForce GTX 1060 6GB
      `-- Device #1: GeForce GTX 1060 6GB
     ```
     Then, if we want to use all 3 devices, the file should contain the following lines:
     ```
     CPU 0 0
     GPU1 1 0
     GPU2 1 1
     ```
   - The name of this file is stored in the constant `DEVICES_FILE` in the **GLOBAL VARIABLES SECTION**.

5. How to configure the experiments per kernel:
   Because the execution of OpenCL kernels with random inputs is quite difficult to automate, the script expects the user to manually include/exclude kernels. This is done by specifying the configuration to use for each kernel in the `CONFIG_FILE`, under the `config` column. The configuration is a string. The following configurations are supported:
   - empty: if all kernels have an empty config value, the script will run all of them.
   - `<` and `>`: the script will run all kernels between the first `<` and the first `>` that follows it (both inclusive). All other kernels, along with their config values, will be ignored, and so will any further `<` or `>` markers. If no `>` is found at or after the first `<`, no range applies. The (other) config values of kernels inside the range will be taken into account. These config options can be combined with others (e.g. `<!`).
   - `X`: the script will skip this kernel.
   - `!`: if there is at least one kernel with a `!` config value, the script will run this/these kernel(s) only. If range configs are used (`<` and `>`), the `!` config values will be ignored, unless there is at least one kernel with a `!` config value inside the specified range. In that case, the script will run only the kernels with a `!` config value inside the specified range. It is ignored if combined with the `X` config value (the kernel will be skipped regardless).
   - Note that, in the provided configuration file, there are 61 kernels marked to be skipped (i.e., with an `X` in the `config` column). These are kernels that cannot be compiled/run by `oclude`. They were just present in the provided `KERNELS_FILE`. Feel free to completely remove these rows from the `CONFIG_FILE` if you like.

6. The script will generate a single CSV results file. The name of the file is stored in the constant `RESULTS_FILE` in the **GLOBAL VARIABLES SECTION**. If left empty, the name will be `results_<timestamp>.csv`, where the timestamp is the current date and time in the format specified by the constant `TIMESTAMP_FORMAT` in the **GLOBAL VARIABLES SECTION**.
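
The selection rules in **5.** can be condensed into one small function. The following is a minimal sketch rather than the notebook's own code: `select_kernels` is a hypothetical helper that takes the values of the `config` column as a list of strings and returns the 1-based row numbers of the kernels that would be run.

```python
def select_kernels(configs):
    """Return the 1-based rows that would run, given the `config` column values."""
    # 1. Range: the first '<', then the first '>' at or after it (both inclusive).
    #    Without a complete '<' ... '>' pair, every row stays in play.
    start, end = 1, len(configs)
    first_open = next((i for i, c in enumerate(configs, 1) if '<' in c), None)
    if first_open is not None:
        first_close = next(
            (i for i, c in enumerate(configs, 1) if i >= first_open and '>' in c),
            None,
        )
        if first_close is not None:
            start, end = first_open, first_close
    rows = [(i, c) for i, c in enumerate(configs, 1) if start <= i <= end]

    # 2. A '!' inside the range restricts the run to the marked rows.
    if any('!' in c for _, c in rows):
        rows = [(i, c) for i, c in rows if '!' in c]

    # 3. 'X' always wins: skipped rows are dropped last.
    return [i for i, c in rows if 'X' not in c]


print(select_kernels(['', '<', 'X', '!', '>', '!']))  # [4]
print(select_kernels(['!', '<', '', '>']))            # [2, 3, 4]
```

In the second call the `!` on row 1 lies outside the range, so it has no effect and the whole range runs.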

### GLOBAL VARIABLES SECTION

Feel free to edit these default values as needed. Remember that the 3 `DEFAULT_*` values can be overridden for each kernel individually by editing the `CONFIG_FILE`.

In [3]:
KERNELS_FILE = 'cgo17-amd.csv'
CONFIG_FILE = 'benchmarks_config.csv'
DEVICES_FILE = 'devices.txt'

DEFAULT_GSIZES = [100, 1000, 100]
DEFAULT_SAMPLES = 100
DEFAULT_TIMEOUTS = [30, 60, 120]

RESULTS_FILE = ''
TIMESTAMP_FORMAT = '%Y-%m-%d_%H-%M-%S'
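
How a row of the `CONFIG_FILE` interacts with these defaults can be sketched as follows. `expand_gsizes` and `resolve_overrides` are hypothetical helper names used only for illustration; they are not part of `oclude` or of this notebook. The first one expands a `gsizes` value in the `<start> <end> <step>` format (end inclusive, as described in **3.**), and the second one falls back to the default samples and timeouts when the corresponding cell is empty.

```python
def expand_gsizes(spec):
    """Expand '<start> <end> <step>' (end inclusive) into a list of global sizes."""
    start, end, step = (int(x) for x in spec.split())
    # range() excludes its stop value, hence the + 1
    return list(range(start, end + 1, step))


def resolve_overrides(row, default_samples=100, default_timeouts=(30, 60, 120)):
    """Return (samples, timeouts) for one config row, using defaults for empty cells."""
    samples = int(row['samples']) if row.get('samples') else default_samples
    if row.get('timeouts'):
        timeouts = [int(x) for x in row['timeouts'].split()]
    else:
        timeouts = list(default_timeouts)
    return samples, timeouts


print(expand_gsizes('64 256 64'))                           # [64, 128, 192, 256]
print(resolve_overrides({'samples': '', 'timeouts': ''}))   # (100, [30, 60, 120])
print(resolve_overrides({'samples': '5', 'timeouts': '10 20'}))  # (5, [10, 20])
```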

### The main profiling loop

There should be no need to edit the following cell.

In [4]:
oclude_args = dict(
    file=None,
    kernel='A',
    gsize=None, lsize=None,
    platform_id=None, device_id=None,
    instcounts=False,
    timeit=True,
    samples=None,
    timeout=None,
    ignore_cache=True,
)

# read the kernels file and create a dict <kernel name> -> <kernel source>
# (newline='' lets the csv module handle the line breaks embedded in `src`)
sources = {}
with open(KERNELS_FILE, 'r', newline='') as f:
    kernelsfile = csv.DictReader(f)
    for row in kernelsfile:
        sources[row['benchmark']] = row['src']

# read the devices file and create a list [(<custom device name>, <platform id>, <device id>), ...]
devices = []
with open(DEVICES_FILE, 'r') as f:
    for line in f:
        line = line.strip()
        if line:
            data = line.split()
            devices.append((data[0], int(data[1]), int(data[2])))

# read the config file one first time to see if there is a range of kernels to run
# if there is, we will only read the config file again for the kernels in the range
# if there is no range, we will read the config file again for all kernels
kernels_range = (1, -1)
with open(CONFIG_FILE, 'r') as f:
    configfile = csv.DictReader(f)
    config = [row['config'] for row in configfile]
    kernels_range = (1, len(config))
found_start, found_end = (-1, -1)
for i, c in enumerate(config, 1):
    if '<' in c and found_start == -1:
        found_start = i
    if '>' in c and found_start != -1:
        found_end = i
        break

if found_end != -1:
    kernels_range = (found_start, found_end)

if kernels_range != (1, len(config)):
    write_message(
        'Range specified. Running kernels {} to {}.'
        .format(*kernels_range)
    )
else:
    write_message('No range specified. Running all kernels.')

# read the config file again, this time only for the kernels in the range
with open(CONFIG_FILE, 'r') as f:
    configfile = csv.DictReader(f)
    kernels = []
    for i, row in enumerate(configfile, 1):
        if i < kernels_range[0] or i > kernels_range[1]:
            continue
        kernels.append(row)

# check whether any kernel in the specified range has a '!' config value;
# if so, we will only run the kernels with a '!' config value
for kernel in kernels:
    if '!' in kernel['config']:
        write_message(
            'Found kernel(s) with a "!" config value. Running only this/these kernel(s).'
        )
        # filter out the kernels that don't have a '!' config value
        kernels = [k for k in kernels if '!' in k['config']]
        break

# run the kernels
results = []
n_kernels = len(kernels)
for i, kernel in enumerate(kernels, 1):
    if 'X' in kernel['config']:
        write_message(
            'Skipping kernel `{}` ({}/{})'
            .format(kernel['benchmark'], i, n_kernels)
        )
        continue

    write_message(
        'Running kernel `{}` ({}/{})'
        .format(kernel['benchmark'], i, n_kernels)
    )

    # set the kernel source using a temporary file with tempfile
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
        f.write(sources[kernel['benchmark']])
        oclude_args['file'] = f.name

    # set the samples
    oclude_args['samples'] = int(
        kernel['samples']
    ) if kernel['samples'] else DEFAULT_SAMPLES

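    # set the gsize range: <start (inclusive)> <end (inclusive)> <step>
    gsize_range = [
        int(x) for x in kernel['gsizes'].split()
    ] if kernel['gsizes'] else list(DEFAULT_GSIZES)
    # range() excludes its stop value, so bump the end to make it inclusive
    # (this now applies to DEFAULT_GSIZES too, which is copied, not mutated)
    gsize_range[1] += 1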

    # set the devices
    kernel_devices = devices
    if kernel['devices']:
        kernel_devices = []
        for device in kernel['devices'].split():
            found_device = False
            for d in devices:
                if device == d[0]:
                    kernel_devices.append(d)
                    found_device = True
                    break
            if not found_device:
                kernel_devices.append((device, -1, -1))

    # set the timeouts
    timeouts_list = [
        int(x) for x in kernel['timeouts'].split()
    ] if kernel['timeouts'] else DEFAULT_TIMEOUTS

    for gsize in range(*gsize_range):
        oclude_args['gsize'] = gsize

        for device, pid, did in kernel_devices:
            if pid == -1:
                write_message(
                    'Unknown device `{}`. Skipping.'.format(device)
                )
                continue

            oclude_args['platform_id'], oclude_args['device_id'] = pid, did
            write_message(
                'Running kernel `{}` on device `{}` with gsize {} and {} samples.'
                .format(kernel['benchmark'], device, gsize, oclude_args['samples'])
            )

            success = False
            for timeout in timeouts_list:
                oclude_args['timeout'] = timeout
                write_message('Timeout set to {} seconds.'.format(timeout))

                # run the kernel
                try:
                    res = profile_opencl_kernel(**oclude_args)
                except TimeoutError as e:
                    write_message('Timeout error: {}'.format(e))
                    continue

                success = True
                result = dict(
                    kernel=kernel['benchmark'],
                    gsize=gsize,
                    device=device,
                    samples=oclude_args['samples'],
                )

                # compute time statistics
                hostcode_times = []
                device_times = []
                transfer_times = []
                for measurement in res['results']:
                    hostcode_times.append(
                        measurement['timeit']['hostcode']
                    )
                    device_times.append(measurement['timeit']['device'])
                    transfer_times.append(
                        measurement['timeit']['transfer']
                    )

                result['hostcode_mean'] = statistics.mean(hostcode_times)
                result['hostcode_median'] = statistics.median(
                    hostcode_times
                )
                result['hostcode_var'] = statistics.pvariance(
                    hostcode_times
                )

                result['device_mean'] = statistics.mean(device_times)
                result['device_median'] = statistics.median(device_times)
                result['device_var'] = statistics.pvariance(device_times)

                result['transfer_mean'] = statistics.mean(transfer_times)
                result['transfer_median'] = statistics.median(
                    transfer_times
                )
                result['transfer_var'] = statistics.pvariance(
                    transfer_times
                )

                # compute input/output bytes
                result['input_bytes'] = 0
                result['output_bytes'] = 0
                args_io_role = argsIOrole(
                    'A', sources[kernel['benchmark']], filename=oclude_args['file']
                )

                for arg, role in args_io_role.items():
                    argtype = arg.split(' %')[0]
                    vf = gsize if argtype.endswith('*') else 1
                    vecsize = 1
                    if argtype.startswith('<'):
                        vecsize = int(argtype.split('<')[1].split('x')[0])

                    n_bytes = 0
                    if argtype.startswith('i') or 'x i' in argtype:
                        n_bytes = int(
                            argtype
                            .split('i')[1]
                            .split('*')[0]
                            .split('>')[0]
                        ) // 8
                    elif 'float' in argtype:
                        n_bytes = 4
                    elif 'double' in argtype:
                        n_bytes = 8

                    if 'input' in role:
                        result['input_bytes'] += (n_bytes * vf * vecsize)
                    if 'output' in role:
                        result['output_bytes'] += (n_bytes * vf * vecsize)

                results.append(result)

                break

            if not success:
                write_message(
                    'All timeouts failed. Skipping current kernel run configuration.'
                )

    # delete the temporary file
    os.remove(oclude_args['file'])

# write the results to a csv file
results_filename = RESULTS_FILE if RESULTS_FILE else 'results_{}.csv'.format(
    datetime.now().strftime(TIMESTAMP_FORMAT)
)
with open(results_filename, 'w', newline='') as f:
    writer = csv.DictWriter(
        f,
        fieldnames=[
            'kernel',
            'gsize',
            'device',
            'input_bytes',
            'output_bytes',
            'samples',
            'device_mean',
            'device_median',
            'device_var',
            'hostcode_mean',
            'hostcode_median',
            'hostcode_var',
            'transfer_mean',
            'transfer_median',
            'transfer_var',
        ],
    )
    writer.writeheader()
    writer.writerows(results)

<<< OCLUDIFY INFO >>> No range specified. Running all kernels.
<<< OCLUDIFY INFO >>> Running kernel `amd-app-sdk-3.0-BinomialOption-binomial_options` (1/256)
<<< OCLUDIFY INFO >>> Running kernel `amd-app-sdk-3.0-BinomialOption-binomial_options` on device `CPU` with gsize 100 and 100 samples.
<<< OCLUDIFY INFO >>> Timeout set to 30 seconds.
[oclude] INFO: Ignoring cache
[oclude] Running kernel 'A' from file /tmp/tmp47wd7ffb
[hostcode] Using the following device:
[hostcode] Platform:	NVIDIA CUDA
[hostcode] Device:	NVIDIA GeForce RTX 2060
[hostcode] Version:	OpenCL 3.0 CUDA
[hostcode] Kernel name: A
[hostcode] Kernel arg 1: a (int, private)
[hostcode] Kernel arg 2: b (float4*, global)
[hostcode] Kernel arg 3: c (float4*, global)
[hostcode] Kernel arg 4: d (float4*, local)
[hostcode] Kernel arg 5: e (float4*, local)
[hostcode] About to execute kernel with Global NDRange = 100
[hostcode] Number of executions (a.k.a. samples) to perform: 100
100%|██████████| 100/100 [00:00<00:00, 656.24 kern

KeyboardInterrupt: 
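
The `Timeout set to ... seconds.` messages in the log above come from the retry scheme described for the `timeouts` column. Stripped of the bookkeeping, the idea can be sketched like this. Both `run_with_timeouts` and `fake_profile` are hypothetical names: the latter is a stand-in that only imitates a profiler call raising `TimeoutError`, so the example runs without `oclude` or an OpenCL device.

```python
def run_with_timeouts(run, timeouts):
    """Call run(timeout) once per timeout value; return the first result, or None."""
    for timeout in timeouts:
        try:
            # the first attempt that finishes in time ends the loop
            return run(timeout)
        except TimeoutError:
            # this attempt was killed: retry with the next (usually larger) timeout
            continue
    # every timeout value was exhausted
    return None


def fake_profile(timeout):
    # Hypothetical stand-in for the profiler: pretends the kernel needs 45 seconds
    if timeout < 45:
        raise TimeoutError('no result after {} seconds'.format(timeout))
    return 'finished within {} seconds'.format(timeout)


print(run_with_timeouts(fake_profile, [30, 60, 120]))  # finished within 60 seconds
print(run_with_timeouts(fake_profile, [10, 20]))       # None
```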

### Results preview

In [None]:
import pandas as pd

results_df = pd.read_csv(results_filename)
results_df.head()

NameError: name 'results_filename' is not defined
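
The preview cell relies on the `results_filename` variable, which only exists once the profiling cell has run to completion in the same session. To preview the file of an earlier, completed run instead, the newest results file can be looked up on disk. This is a minimal sketch that assumes `RESULTS_FILE` was left empty, so the files follow the `results_<timestamp>.csv` naming scheme; `latest_results_file` is a hypothetical helper, not part of the notebook.

```python
import glob
import os


def latest_results_file(directory='.'):
    """Return the newest 'results_<timestamp>.csv' in `directory`, or None."""
    candidates = glob.glob(os.path.join(directory, 'results_*.csv'))
    # '%Y-%m-%d_%H-%M-%S' timestamps sort chronologically as plain strings,
    # so the lexicographically largest name is the most recent run
    return max(candidates) if candidates else None


print(latest_results_file())
```

If the function returns a path, it can be passed to `pd.read_csv` in place of `results_filename`.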

### Further questions?

Feel free to contact me at `sot.niarchos [AT] gmail.com`.