In [1]:
from __future__ import print_function

Import the Python modules needed to load the data (gzip and yaml), process it (numpy and pandas), and plot it (toyplot).

In [2]:
import gzip
import yaml
import numpy
import pandas
import toyplot.pdf

print("yaml version:    ", yaml.__version__)
print("numpy version:   ", numpy.__version__)
print("pandas version:  ", pandas.__version__)
print("toyplot version: ", toyplot.__version__)

yaml version:     3.12
numpy version:    1.14.0
pandas version:   0.22.0
toyplot version:  0.16.0


## Ingest Data

Load the data, which is stored in a gzipped YAML file. (The file actually holds several runs concatenated into a single YAML document whose top level is a sequence.)

In [3]:
filename = 'miniGraphics-skybridge-vn-scaling.yaml.gz'
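# safe_load avoids the Loader argument that newer PyYAML releases require for
# yaml.load; the context manager closes the file once parsing is done.
with gzip.open(filename) as yaml_file:
    yaml_data = yaml.safe_load(yaml_file)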
data = pandas.DataFrame(yaml_data)
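
As a small illustration of the file layout described above, the following sketch writes a toy gzipped YAML file whose top level is a sequence of runs and reads it back. The records, the file name `toy-runs.yaml.gz`, and the use of `yaml.safe_dump`/`yaml.safe_load` are assumptions made for the example; the real file was written by the benchmark program and contains many more fields.

```python
import gzip
import os
import tempfile

import yaml

# Hypothetical records standing in for two benchmark runs.
runs = [
    {'num-processes': 64, 'total-seconds': 0.5},
    {'num-processes': 128, 'total-seconds': 0.4},
]

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'toy-runs.yaml.gz')
    # Write the runs as one YAML document with a top-level sequence.
    with gzip.open(path, 'wt') as stream:
        yaml.safe_dump(runs, stream)
    # Read the document back; the result is a list with one dict per run.
    with gzip.open(path, 'rt') as stream:
        loaded = yaml.safe_load(stream)

print(loaded)
```

A list of dictionaries like `loaded` is what `pandas.DataFrame` receives in the cell above, giving one row per run.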

The YAML data is hierarchical. Passing the parsed YAML straight to a DataFrame simply embeds the nested dictionaries and lists in DataFrame cells. Fix that by expanding the contents of those columns into new rows and columns.

In [4]:
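def expand_single_column(original_data, column_to_expand):
    """Expand a column of nested records into one row per nested record."""
    sub_tables = []
    for index in original_data.index:
        # Each cell of the nested column becomes its own small table.
        sub_table = pandas.DataFrame(original_data[column_to_expand][index])
        # Repeat the parent row's other values on every expanded row.
        for column in original_data.columns:
            if column != column_to_expand:
                sub_table[column] = numpy.full(sub_table.index.shape,
                                               original_data[column][index],
                                               dtype=original_data[column].dtype)
        sub_tables.append(sub_table)
    # Concatenate once at the end; DataFrame.append was removed in pandas 2.0.
    return pandas.concat(sub_tables, ignore_index=True)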

def flatten_table(original_data):
    flat_data = original_data
    for column_name in original_data.columns:
        if isinstance(flat_data[column_name][0], list):
            flat_data = expand_single_column(flat_data, column_name)
    return flat_data

In [5]:
data = flatten_table(data)
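
To make the effect of the flattening concrete, here is a minimal sketch on a toy table. The column names `trials`, `trial-num`, and `total-seconds` in the toy records, and the helper `expand_nested_column`, are made up for illustration; the helper is a compact variant of the same idea built on `pandas.concat`, not the function used in this notebook.

```python
import pandas

# Hypothetical nested records: each run holds a list of per-trial dicts.
records = [
    {'num-processes': 64,
     'trials': [{'trial-num': 0, 'total-seconds': 0.5},
                {'trial-num': 1, 'total-seconds': 0.7}]},
    {'num-processes': 128,
     'trials': [{'trial-num': 0, 'total-seconds': 0.4}]},
]
nested = pandas.DataFrame(records)


def expand_nested_column(table, column_to_expand):
    """Return one row per nested record, repeating the parent row's values."""
    pieces = []
    for index in table.index:
        sub_table = pandas.DataFrame(table.loc[index, column_to_expand])
        for column in table.columns:
            if column != column_to_expand:
                # A scalar assignment is broadcast to every expanded row.
                sub_table[column] = table.loc[index, column]
        pieces.append(sub_table)
    return pandas.concat(pieces, ignore_index=True)


flat = expand_nested_column(nested, 'trials')
print(flat)
```

The two runs become three rows, one per trial, and the nested `trials` column is replaced by ordinary `trial-num` and `total-seconds` columns.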

Add a column that gives a human-readable name to each image resolution.

In [6]:
image_height_names = {
    500: 'Desktop Window',
    1080: 'HDTV',
    4320: '8K UHD',
}

data['image-size'] = data['image-height'].map(image_height_names)

Rename the algorithms from the identifiers the program wrote out to the strings used in the paper. Note that there are some extras in the data that we are ignoring.

In [7]:
algorithm_names = {
    '2-3-SwapBase': '2-3 Swap',
    'BinarySwapFold': 'Naive',
    'BinarySwapTelescoping': 'Telescoping',
    'BinarySwapRemainder': 'Remainder',
    'IceTBase': 'IceT'
}

data['composite-algorithm'] = data['composite-algorithm'].map(algorithm_names)
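
The "extras" are ignored as a side effect of how `Series.map` treats a dictionary: any value without a matching key becomes NaN, which is why `nan` shows up among the algorithm names in the summary below. A minimal sketch, where `SomeOtherAlgorithm` is a made-up identifier standing in for one of the extras:

```python
import pandas

# A subset of the renaming dictionary used above.
names = {'2-3-SwapBase': '2-3 Swap', 'IceTBase': 'IceT'}

# 'SomeOtherAlgorithm' is hypothetical; it has no entry in the dictionary.
raw = pandas.Series(['2-3-SwapBase', 'SomeOtherAlgorithm', 'IceTBase'])

renamed = raw.map(names)
print(renamed.tolist())

# Rows for unmapped algorithms can be dropped explicitly if desired.
kept = renamed.dropna()
print(kept.tolist())
```

In this notebook the NaN rows are never dropped explicitly; only the renamed algorithms are selected when plotting.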

Print a summary of the table data. There are multiple ways that Jupyter and pandas will report a summary of a table, but I find this method the most effective. It prints out every column. Then for all columns with a "small" number of unique values, it gives those values. This latter information really helps identify the proper way to group values.

In [8]:
import IPython.display

data_description = ''

for column_name in data.columns:
    data_description = data_description + '**' + column_name + '** '
    unique_values = data[column_name].unique()
    if (len(unique_values) < 10):
        for value in unique_values:
            data_description = data_description + str(value) + ' '
    elif (numpy.issubdtype(unique_values.dtype, numpy.number)):
        data_description = (
            data_description +
            str(numpy.nanmin(unique_values)) + ' &ndash; ' +
            str(numpy.nanmax(unique_values)) + ' '
        )
    elif not pandas.isnull(unique_values).any():
        data_description = (
            data_description +
            str(numpy.min(unique_values)) + ' &ndash; ' +
            str(numpy.max(unique_values)) + ' '
        )
    data_description = data_description + ' \n'
    
IPython.display.display(IPython.display.Markdown(data_description))

**color-buffer-format** byte  
**composite-algorithm** Naive Remainder Telescoping nan 2-3 Swap IceT  
**composite-seconds** 0.00227687 &ndash; 28.7058  
**compress-seconds**  
**construct-tree-seconds** 9.1633e-05 &ndash; 0.061385  
**depth-buffer-format** float  
**gather-seconds** 0.00095674 &ndash; 4.52609  
**geometry** box  
**geometry-distribution** duplicate  
**geometry-overlap** -0.05  
**icet-copy-result-seconds**  
**image-compression** True False  
**image-height** 500 1080 4320  
**image-width** 500 1920 7680  
**k**  
**max-image-split** nan 1000000.0  
**num-processes** 64 &ndash; 8192  
**num-triangles** 768 &ndash; 98304  
**paint-seconds** 0.00020071 &ndash; 0.170633  
**painter** simple  
**partial-composite-seconds** 0.00037141 &ndash; 27.0723  
**phi-rotation** -178.406 &ndash; 136.756  
**random-seed** 17627  
**rendering-order-dependent** False  
**start-time** 2018-04-03T12:58:47.000000000 &ndash; 2018-04-18T10:04:49.000000000  
**theta-rotation** -177.945 &ndash; 179.418  
**total-seconds** 0.00280725 &ndash; 28.7062  
**trial-num** 0 &ndash; 19  
**uncompress-seconds**  
**zoom** 1.5  
**image-size** Desktop Window HDTV 8K UHD  


## Plot Data

For the 2-3 swap algorithm, we plot the time taken by the "partial composite", broken into the time to build the compositing tree and the time for everything else.

In [9]:
data['transfer-blend-seconds'] = data['partial-composite-seconds'] - data['construct-tree-seconds']

In [10]:
averages = data.pivot_table(
    values=['transfer-blend-seconds', 'construct-tree-seconds'],
    index='num-processes',
    columns=[
        'image-size',
        'composite-algorithm',
    ],
    aggfunc='mean',
)

averages

Mean times in seconds; every column shown is for composite-algorithm 2-3 Swap.

| num-processes | construct-tree-seconds (8K UHD) | construct-tree-seconds (Desktop Window) | construct-tree-seconds (HDTV) | transfer-blend-seconds (8K UHD) | transfer-blend-seconds (Desktop Window) | transfer-blend-seconds (HDTV) |
|---|---|---|---|---|---|---|
| 64 | 0.000125 | 0.000101 | 0.000108 | 0.385656 | 0.002477 | 0.021402 |
| 96 | 0.000148 | 0.000131 | 0.000134 | 0.386075 | 0.002551 | 0.020028 |
| 128 | 0.000227 | 0.000197 | 0.00022 | 0.376058 | 0.002407 | 0.020491 |
| 144 | 0.000251 | 0.000215 | 0.000237 | 0.385088 | 0.002495 | 0.020583 |
| 192 | 0.000275 | 0.000268 | 0.000274 | 0.37705 | 0.002509 | 0.019934 |
| 256 | 0.000424 | 0.000402 | 0.000432 | 0.373612 | 0.002496 | 0.020438 |
| 288 | 0.000457 | 0.000453 | 0.00047 | 0.380985 | 0.002609 | 0.020247 |
| 384 | 0.000569 | 0.000574 | 0.000572 | 0.371444 | 0.002506 | 0.019615 |
| 432 | 0.000678 | 0.000686 | 0.000669 | 0.378975 | 0.002633 | 0.020586 |
| 512 | 0.000927 | 0.000919 | 0.000903 | 0.373236 | 0.002493 | 0.020675 |
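
The pivot table has three-level column labels (value name, image size, algorithm), and a single series is picked out by indexing with all three. The sketch below shows this on a toy table; the numbers are invented so that the means are easy to check by hand.

```python
import pandas

# Toy measurements: two trials at each of two process counts.
toy = pandas.DataFrame({
    'num-processes': [64, 64, 128, 128],
    'image-size': ['HDTV'] * 4,
    'composite-algorithm': ['2-3 Swap'] * 4,
    'construct-tree-seconds': [0.1, 0.3, 0.2, 0.4],
    'transfer-blend-seconds': [1.0, 3.0, 2.0, 4.0],
})

toy_averages = toy.pivot_table(
    values=['transfer-blend-seconds', 'construct-tree-seconds'],
    index='num-processes',
    columns=['image-size', 'composite-algorithm'],
    aggfunc='mean',
)

# Index with (value name, image size, algorithm) to get one Series
# indexed by num-processes.
column = toy_averages['transfer-blend-seconds', 'HDTV', '2-3 Swap']
print(column)
```

Each entry of `column` is the mean over the trials that share a process count, which is the form the plotting code expects.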


Create a filled chart showing the times to construct the composite tree and the rest of the composite.

In [11]:
image_size = 'HDTV'
algorithm = '2-3 Swap'

canvas = toyplot.Canvas('4.5in', '3in')

axes = canvas.cartesian(
    xlabel='Number of Processes',
    ylabel='Partial Composite Time (seconds)',
    xscale='log',
    bounds=(45,-15,15,-50),
)

#axes.x.ticks.locator = toyplot.locator.Log(base=2, format='{:.0f}')
axes.x.ticks.locator = toyplot.locator.Explicit(
    locations=[64, 128, 256, 512, 1024, 2048, 4096, 8192],
)
axes.y.domain.min = 0

x = averages.index
y = numpy.column_stack((
    numpy.zeros(numpy.shape(x)),
    numpy.array(averages['transfer-blend-seconds', image_size, algorithm]),
    numpy.array(averages['construct-tree-seconds', image_size, algorithm] +
                averages['transfer-blend-seconds', image_size, algorithm]),
))

axes.fill(
    x,
    y,
    color=toyplot.color.Palette()[0],
    opacity=[1.0, 0.6],
)

axes.text(
    1300,
    0.04,
    'Constructing Composite Tree',
    color=toyplot.color.Palette()[0],
    #opacity=0.6,
)

axes.text(
    1300,
    0.01,
    'Remaining Compositing (Transfer/Blending)',
    color='white',
)

<toyplot.mark.Text at 0x315d6f60>
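
The `y` array passed to `axes.fill` above holds cumulative boundaries rather than the individual times: a zero baseline, the transfer/blend time, and the transfer/blend time plus the tree construction time. The sketch below builds the same kind of array with numpy alone, using the first three HDTV rows of the averages table (64, 96, and 128 processes). The interpretation that three boundary columns give two filled regions is inferred from the two opacity values in the call above.

```python
import numpy

# HDTV averages for 64, 96, and 128 processes, copied from the table above.
transfer_blend = numpy.array([0.021402, 0.020028, 0.020491])
construct_tree = numpy.array([0.000108, 0.000134, 0.000220])

# One row per process count, one column per boundary.
boundaries = numpy.column_stack((
    numpy.zeros(transfer_blend.shape),
    transfer_blend,
    transfer_blend + construct_tree,
))

print(boundaries.shape)

# The gap between adjacent boundaries recovers each component's time.
print(numpy.diff(boundaries, axis=1))
```

Because the boundaries accumulate, the top edge of the chart reads directly as the total partial composite time.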

In [12]:
toyplot.pdf.render(canvas, '2-3-swap-overhead.pdf')