In [1]:
from __future__ import print_function

Load in the Python modules necessary to load the data (gzip and yaml), process the data (numpy and pandas), and plot the data (toyplot).

In [2]:
import gzip
import yaml
import numpy
import pandas
import toyplot.pdf

print("yaml version:    ", yaml.__version__)
print("numpy version:   ", numpy.__version__)
print("pandas version:  ", pandas.__version__)
print("toyplot version: ", toyplot.__version__)

yaml version:     3.12
numpy version:    1.13.3
pandas version:   0.20.3
toyplot version:  0.16.0


## Ingest Data

Load the data, which was written out as a YAML file. (Actually, several runs have been concatenated into one YAML file whose top level is a sequence.)

In [3]:
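filename = 'miniGraphics-skybridge-vn-no-compress.yaml.gz'
# The context manager closes the file when done. safe_load is enough for
# plain data and avoids constructing arbitrary Python objects.
with gzip.open(filename) as yaml_file:
    yaml_data = yaml.safe_load(yaml_file)
data = pandas.DataFrame(yaml_data)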

The YAML data is hierarchical. Converting it directly to a DataFrame just embeds the nested dictionaries and lists in DataFrame columns. Fix that by expanding the contents of these columns into new columns.

In [4]:
def expand_single_column(original_data, column_to_expand):
    """Expand a column of lists into one row per list entry.

    The values of all other columns are repeated for each new row.
    """
    sub_tables = []
    for index in original_data.index:
        sub_table = pandas.DataFrame(original_data[column_to_expand][index])
        for column in original_data.columns:
            if column != column_to_expand:
                sub_table[column] = numpy.full(sub_table.index.shape,
                                               original_data[column][index],
                                               dtype=original_data[column].dtype)
        sub_tables.append(sub_table)
    # DataFrame.append was removed in pandas 2.0; one concat is also faster.
    return pandas.concat(sub_tables, ignore_index=True)

def flatten_table(original_data):
    flat_data = original_data
    for column_name in original_data.columns:
        if isinstance(flat_data[column_name][0], list):
            flat_data = expand_single_column(flat_data, column_name)
    return flat_data

In [5]:
data = flatten_table(data)
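
To make the effect of the flattening concrete, here is a minimal sketch on a made-up table. The column name `trials` and all values are hypothetical (the real nested column names come from the YAML file). The sketch builds the rows directly rather than calling the functions above, but the result has the same shape: one row per nested entry, with the run-level values repeated.

```python
import pandas

# Hypothetical miniature of the nested layout: each run carries a list of trials.
runs = pandas.DataFrame([
    {'num-processes': 128, 'trials': [{'trial-num': 0, 'seconds': 0.5},
                                       {'trial-num': 1, 'seconds': 0.7}]},
    {'num-processes': 256, 'trials': [{'trial-num': 0, 'seconds': 0.9}]},
])

# Emit one row per trial, copying the run-level value alongside it.
rows = []
for _, run in runs.iterrows():
    for trial in run['trials']:
        row = dict(trial)
        row['num-processes'] = run['num-processes']
        rows.append(row)

flat = pandas.DataFrame(rows)
print(flat)
```

The two runs become three rows, and `num-processes` is repeated for both trials of the first run.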

Add a column that gives a human-readable name to each image resolution. Heights with no entry in the dictionary (288 in this data set) are mapped to NaN.

In [6]:
image_height_names = {
    500: 'Desktop Window',
    1080: 'HDTV',
    4320: '8K UHD',
}

data['image-size'] = data['image-height'].map(image_height_names)
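
As a quick check on a mapping like this, the following sketch lists which heights did not receive a name. The series `heights` is a made-up stand-in for `data['image-height']`; the dictionary is the one defined above.

```python
import pandas

# Stand-in for data['image-height'] (values chosen for illustration).
heights = pandas.Series([288, 1080, 4320, 1080])
names = {500: 'Desktop Window', 1080: 'HDTV', 4320: '8K UHD'}

# Series.map with a dict yields NaN wherever the key is missing.
mapped = heights.map(names)
print(mapped.tolist())

# Collect the distinct heights that were left unnamed.
unmapped = sorted(heights[mapped.isnull()].unique())
print(unmapped)
```

Any height reported here falls out of later groupings on `image-size`, because pandas drops NaN keys by default when grouping and pivoting.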

Rename the algorithms from the identifiers the program wrote out to the strings used in the paper. Note that there are some extras in the data that we are ignoring.

In [7]:
algorithm_names = {
    '2-3-SwapBase': '2-3 Swap',
    'BinarySwapFold': 'Naive',
    'BinarySwapTelescoping': 'Telescoping',
    'BinarySwapRemainder': 'Remainder',
    'IceTBase': 'IceT'
}

data['composite-algorithm'] = data['composite-algorithm'].map(algorithm_names)

Print a summary of the table data. Jupyter and pandas offer several ways to summarize a table, but I find this method the most effective. It lists every column. For each column with a small number of unique values (fewer than 10), it prints those values; for the other columns it prints the minimum and maximum where they can be computed. This information really helps identify the proper way to group values.

In [8]:
import IPython.display

data_description = ''

for column_name in data.columns:
    data_description = data_description + '**' + column_name + '**: '
    unique_values = data[column_name].unique()
    if (len(unique_values) < 10):
        for value in unique_values:
            data_description = data_description + str(value) + ' '
    elif (numpy.issubdtype(unique_values.dtype, numpy.number)):
        data_description = (
            data_description +
            str(numpy.nanmin(unique_values)) + ' &ndash; ' +
            str(numpy.nanmax(unique_values)) + ' '
        )
    elif not pandas.isnull(unique_values).any():
        data_description = (
            data_description +
            str(numpy.min(unique_values)) + ' &ndash; ' +
            str(numpy.max(unique_values)) + ' '
        )
    data_description = data_description + ' \n'
    
IPython.display.display(IPython.display.Markdown(data_description))

**color-buffer-format**: byte  
**composite-algorithm**: Naive Remainder Telescoping 2-3 Swap IceT  
**composite-seconds**: 0.00118646 &ndash; 4.17276  
**construct-tree-seconds**: 0.000183397 &ndash; 0.0731399  
**depth-buffer-format**: float  
**gather-seconds**: 0.00068552 &ndash; 2.67005  
**geometry**: box  
**geometry-distribution**: duplicate  
**geometry-overlap**: -0.05  
**icet-copy-result-seconds**: 1.052e-06 &ndash; 0.00239137  
**image-compression**: False  
**image-height**: 288 1080 4320  
**image-width**: 512 1920 7680  
**num-processes**: 128 &ndash; 8192  
**num-triangles**: 1536 &ndash; 98304  
**paint-seconds**: 0.000118637 &ndash; 0.141582  
**painter**: simple  
**partial-composite-seconds**: 0.000291541 &ndash; 2.07523  
**phi-rotation**: -178.406 &ndash; 136.756  
**random-seed**: 17627  
**rendering-order-dependent**: False  
**start-time**: 2018-05-24T17:07:04.000000000 &ndash; 2018-05-29T07:30:55.000000000  
**theta-rotation**: -177.945 &ndash; 179.418  
**total-seconds**: 0.00144766 &ndash; 4.31646  
**trial-num**: 0 &ndash; 19  
**zoom**: 1.5  
**image-size**: nan HDTV 8K UHD  


## Plot Data

We are plotting the time it takes to do a "partial composite": the time to blend all the pixels while leaving the resulting pixels distributed across the processes.

The first thing we want to do is to average the time it took over all trials. This is easily done with a pivot table.

In [9]:
average_partial_composite = data.pivot_table(
    values='partial-composite-seconds',
    index='num-processes',
    columns=[
        'image-size',
        'composite-algorithm',
    ],
    aggfunc='mean',
)

print(average_partial_composite.index)
print(average_partial_composite.columns)

Int64Index([ 128,  160,  176,  192,  224,  256,  272,  320,  352,  400,  448,
             512,  560,  640,  720,  800,  912, 1024, 1136, 1280, 1440, 1616,
            1824, 2048, 2288, 2576, 2896, 3248, 3648, 4096, 4592, 5152, 5792,
            6496, 7296, 8192],
           dtype='int64', name=u'num-processes')
MultiIndex(levels=[[u'8K UHD', u'HDTV'], [u'2-3 Swap', u'IceT', u'Naive', u'Remainder', u'Telescoping']],
           labels=[[0, 0, 0, 0, 0, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]],
           names=[u'image-size', u'composite-algorithm'])
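
To illustrate what the pivot table computes, here is a small sketch on invented numbers. The column names match the real table, but the timings are made up so that the means are easy to verify by hand.

```python
import pandas

# Two trials per (process count, algorithm) pair; timings are invented.
samples = pandas.DataFrame({
    'num-processes': [128, 128, 128, 128, 256, 256, 256, 256],
    'image-size': ['HDTV'] * 8,
    'composite-algorithm': ['Naive', 'Naive', 'Remainder', 'Remainder'] * 2,
    'partial-composite-seconds': [0.25, 0.75, 1.0, 2.0, 0.5, 1.5, 3.0, 1.0],
})

table = samples.pivot_table(
    values='partial-composite-seconds',
    index='num-processes',
    columns=['image-size', 'composite-algorithm'],
    aggfunc='mean',
)

# Indexing with both column levels pulls out one series of averages.
naive = table['HDTV', 'Naive']
remainder = table['HDTV', 'Remainder']
print(naive)
print(remainder)
```

Each series is indexed by `num-processes`, which is the form the plotting cells below use for the average line.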


Make a grouping structure of the data so we can pull out the actual data values for each trial.

In [10]:
grouped_data = data.groupby(['image-size', 'composite-algorithm'])
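
The `groups` attribute of the grouping maps each key tuple to the index labels of its rows, and those labels can be passed to `loc` to recover the individual trials. A minimal sketch with invented values:

```python
import pandas

# Invented samples; the column names match the real table.
samples = pandas.DataFrame({
    'image-size': ['HDTV', '8K UHD', 'HDTV', '8K UHD'],
    'composite-algorithm': ['Naive', 'Naive', 'Naive', 'Remainder'],
    'partial-composite-seconds': [0.1, 0.9, 0.2, 1.1],
})

grouped = samples.groupby(['image-size', 'composite-algorithm'])

# Row labels belonging to one (image size, algorithm) combination.
labels = grouped.groups[('HDTV', 'Naive')]
trials = samples.loc[labels]
print(trials)
```

`grouped.get_group(('HDTV', 'Naive'))` returns the same rows in one call; the two-step form is shown because it is the one the plotting cells use.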

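Plot the HDTV data. For each algorithm, draw a line through the averages and put a dot at every recorded trial.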

In [11]:
image_size = 'HDTV'
algorithms = ['2-3 Swap', 'Naive', 'Remainder']

canvas = toyplot.Canvas('4.5in', '3in')

axes = canvas.cartesian(
    xlabel='Number of Processes',
    ylabel='Partial Composite Time (seconds)',
    xscale='log',
    bounds=(45,-15,15,-50),
)

#axes.x.ticks.locator = toyplot.locator.Log(base=2, format='{:.0f}')
axes.x.ticks.locator = toyplot.locator.Explicit(
    locations=[128, 256, 512, 1024, 2048, 4096, 8192],
)
axes.y.domain.min = 0

for algorithm in algorithms:
    # Plot line following average
    data_series = average_partial_composite[image_size, algorithm]
    x = data_series.index
    y = numpy.array(data_series)
    axes.plot(x, y)
    
    # Some adjustments for correct label placement
    if algorithm == 'Remainder':
        vert_shift = '-10%'
    elif algorithm == 'Naive':
        vert_shift = '80%'
    else:
        vert_shift = '0'
    
    # Put label at end of series
    axes.text(
        x[-1],
        y[-1],
        algorithm + '&#160;',
        style={
            'text-anchor':'start',
            '-toyplot-anchor-shift':'5px',
            'baseline-shift': vert_shift,
        },
    )
    
    # Put a dot at every recorded sample.
    scatter_data = data.loc[grouped_data.groups[(image_size, algorithm)]]
    x = scatter_data['num-processes']
    y = scatter_data['partial-composite-seconds']
    axes.scatterplot(x, y, opacity=1)

In [12]:
toyplot.pdf.render(canvas, 'no-compress-hdtv.pdf')

In [13]:
image_size = '8K UHD'
algorithms = ['2-3 Swap', 'Naive', 'Remainder']

canvas = toyplot.Canvas('4.5in', '3in')

axes = canvas.cartesian(
    xlabel='Number of Processes',
    ylabel='Partial Composite Time (seconds)',
    xscale='log',
    bounds=(45,-15,15,-50),
)

#axes.x.ticks.locator = toyplot.locator.Log(base=2, format='{:.0f}')
axes.x.ticks.locator = toyplot.locator.Explicit(
    locations=[128, 256, 512, 1024, 2048, 4096, 8192],
)
axes.y.domain.min = 0

for algorithm in algorithms:
    # Plot line following average
    data_series = average_partial_composite[image_size, algorithm]
    x = data_series.index
    y = numpy.array(data_series)
    axes.plot(x, y)
    
    # Some adjustments for correct label placement
    if algorithm == 'Remainder':
        vert_shift = '-10%'
    elif algorithm == 'Naive':
        vert_shift = '80%'
    elif algorithm == '2-3 Swap':
        vert_shift = '100%'
    else:
        vert_shift = '0'
    
    # Put label at end of series
    axes.text(
        x[-1],
        y[-1],
        algorithm + '&#160;',
        style={
            'text-anchor':'start',
            '-toyplot-anchor-shift':'5px',
            'baseline-shift': vert_shift,
        },
    )
    
    # Put a dot at every recorded sample.
    scatter_data = data.loc[grouped_data.groups[(image_size, algorithm)]]
    x = scatter_data['num-processes']
    y = scatter_data['partial-composite-seconds']
    axes.scatterplot(x, y, opacity=1)

In [14]:
toyplot.pdf.render(canvas, 'no-compress-8k.pdf')