# Average throughput test

Use this notebook to test an alternate approach to calculating the event throughput: use the average time to process an event rather than the total number of events divided by the total time. This measurement should be more robust against fluctuations caused by cores sitting idle at the end of the event processing.
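
As a rough illustration of why the averaged measure should be less sensitive to the tail of the job, here is a minimal sketch using made-up per-thread times. The values and the names `thread_times` and `n_events` are hypothetical and do not come from the job results used in this notebook.

```python
import numpy as np

# Hypothetical per-thread event-loop times in seconds (illustrative only).
# Three threads finish together; one runs 20 s longer while the others sit idle.
thread_times = np.array([100.0, 100.0, 100.0, 120.0])
n_events = 400

# Total-time approach: the job lasts as long as its slowest thread.
total_time_throughput = n_events / thread_times.max()

# Average-time approach: divide by the mean thread time instead.
avg_time_throughput = n_events / thread_times.mean()

print(total_time_throughput, avg_time_throughput)
```

In this toy case the straggling thread pulls the total-time throughput down more than it pulls the averaged one, which is the effect the notebook is trying to reduce.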

## Gather the data

Let's start with some job results.

In [1]:
import numpy as np

In [27]:
from utils.prep import parse_job_results, load_job_results
from utils.timing import get_throughput, get_evloop_start_time

In [3]:
ls

AvgThruputTest.ipynb              results_aibuild_1mu_2016_07_22/
G4HiveAna_1muon.ipynb             results_aibuild_tt_2016_05_21/
G4HiveAna_1muon_knl.ipynb         results_aibuild_tt_2016_05_25/
G4HiveAna_ttbar.ipynb             results_aibuild_tt_2016_05_31/
README.md                         results_cori_1mu_2016_05_17/
TestPickle.ipynb                  results_endeavour_1mu/
ThroughputTable.ipynb             results_endeavour_1mu_2016_08_06/
Untitled.ipynb                    results_endeavour_1mu_2016_08_14/
parseResults.py*                  test.pickle
results_aibuild_1mu_2016_05_19/   utils/


In [4]:
results_file = 'results_aibuild_1mu_2016_07_22/results.pickle'
job_results = load_job_results(results_file)

In [5]:
job_results

[<utils.prep.JobResult at 0x10dc79f98>,
 <utils.prep.JobResult at 0x10e846908>,
 <utils.prep.JobResult at 0x10e846ba8>,
 <utils.prep.JobResult at 0x10e846e48>,
 <utils.prep.JobResult at 0x10e855128>,
 <utils.prep.JobResult at 0x10e8554a8>,
 <utils.prep.JobResult at 0x10e855828>,
 <utils.prep.JobResult at 0x10e855ba8>,
 <utils.prep.JobResult at 0x10e855f28>,
 <utils.prep.JobResult at 0x10e8582e8>,
 <utils.prep.JobResult at 0x10e858668>,
 <utils.prep.JobResult at 0x10e8589e8>,
 <utils.prep.JobResult at 0x10e858d68>,
 <utils.prep.JobResult at 0x111f4b128>,
 <utils.prep.JobResult at 0x111f4b4a8>,
 <utils.prep.JobResult at 0x111f4b828>,
 <utils.prep.JobResult at 0x111f4bba8>,
 <utils.prep.JobResult at 0x111f4bf28>,
 <utils.prep.JobResult at 0x111f4e2e8>,
 <utils.prep.JobResult at 0x111f4e668>,
 <utils.prep.JobResult at 0x111f4e9e8>]

## Quick test with single thread

Can I calculate the throughput on a single thread? Compare the result to my existing function.

In [6]:
j0 = job_results[0]
print(j0.nThread, j0.nEvent)
print(j0.timeline_results.dtype)

1 1000
[('starts', '<i8'), ('ends', '<i8'), ('algs', '<U15'), ('tids', '<i4'), ('slots', '<i4'), ('events', '<i4')]


In [7]:
starts_events = j0.timeline_results[['starts', 'events']]
print(starts_events[:10])

[(1469166623831742318, 0) (1469166623832310076, 0) (1469166623865992343, 0)
 (1469166628370060150, 0) (1469166628567813934, 0) (1469166628567847419, 0)
 (1469166628570491966, 1) (1469166628570551335, 1) (1469166628570739322, 1)
 (1469166628928540478, 1)]


In [8]:
starts_events

array([(1469166623831742318, 0), (1469166623832310076, 0),
       (1469166623865992343, 0), ..., (1469167346610054353, 999),
       (1469167346611805687, 999), (1469167346611830146, 999)], 
      dtype=[('starts', '<i8'), ('events', '<i4')])

## Prepare the data

We need to put the timeline results into a form that makes this calculation easy. Events are processed within a slot. A slot should only process one event at a time, but it can execute algorithms for that event on concurrent threads. Still, I think I could loop through a slot and identify when each event begins. The time between successive event starts is the event period, and the event rate is the reciprocal of the average period.
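
The per-slot idea described above could be sketched roughly as follows. This is not code from `utils`; the helper names `get_slot_event_periods` and `get_period_throughput` are hypothetical, the toy timeline is synthetic, and summing the per-slot rates to get a job-level rate is an assumption about how the slots should be combined.

```python
import numpy as np

def get_slot_event_periods(timeline, slot):
    """Return the times (in seconds) between successive event starts in one slot."""
    sel = timeline[timeline['slots'] == slot]
    events = np.unique(sel['events'])
    # An event begins at the earliest algorithm start recorded for it
    first_starts = np.array([sel['starts'][sel['events'] == ev].min()
                             for ev in events])
    first_starts.sort()
    # Timestamps are in nanoseconds, as in the timeline results
    return np.diff(first_starts) * 1e-9

def get_period_throughput(timeline):
    """Sum the per-slot event rates, each the reciprocal of the mean period."""
    rate = 0.0
    for slot in np.unique(timeline['slots']):
        periods = get_slot_event_periods(timeline, slot)
        if len(periods) > 0:
            rate += 1.0 / periods.mean()
    return rate

# Tiny synthetic timeline: slot 0 starts an event every 1 s, slot 1 every 2 s
s = 10**9
toy = np.array(
    [(0, 0, 0), (s // 2, 0, 0), (1 * s, 0, 2), (2 * s, 0, 4),
     (0, 1, 1), (2 * s, 1, 3), (4 * s, 1, 5)],
    dtype=[('starts', '<i8'), ('slots', '<i4'), ('events', '<i4')])

print(get_period_throughput(toy))
```

For the toy data the two slots contribute about 1 and 0.5 events per second. The per-event loop is written for clarity rather than speed, so it may be slow on timelines with many thousands of events.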

In [9]:
# Try to get the start times and event numbers of some timeline results
j = job_results[3]
print(j.nThread, j.nEvent)

4 4000


In [10]:
slots = j.timeline_results['slots']

In [11]:
np.unique(slots)

array([0, 1, 2, 3], dtype=int32)

In [12]:
slots == 3

array([False, False, False, ...,  True, False, False], dtype=bool)

In [13]:
def get_timeline_slot_idxs(job, slot):
    return job.timeline_results['slots'] == slot

def get_all_timeline_slot_idxs(job):
    slots = np.unique(job.timeline_results['slots'])
    return [get_timeline_slot_idxs(job, slot) for slot in slots]

In [14]:
x = get_all_timeline_slot_idxs(j)

In [15]:
x

[array([ True,  True,  True, ..., False,  True, False], dtype=bool),
 array([False, False, False, ..., False, False,  True], dtype=bool),
 array([False, False, False, ..., False, False, False], dtype=bool),
 array([False, False, False, ...,  True, False, False], dtype=bool)]

## Simpler calculation by thread
Let's try something simpler. Calculate the average throughput per thread by dividing the total number of events by the sum of the individual thread times, then scale this up by the number of threads to get an average throughput.
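
Written out, with notation introduced here for illustration only ($N_\mathrm{evt}$ for the number of events, $N_\mathrm{thr}$ for the number of threads, and $t_i$ for the duration of thread $i$), the calculation amounts to dividing the event count by the mean thread time $\bar{t}$:

$$
R_\mathrm{avg} = \frac{N_\mathrm{evt}}{\sum_{i=1}^{N_\mathrm{thr}} t_i}\, N_\mathrm{thr} = \frac{N_\mathrm{evt}}{\bar{t}},
\qquad
\bar{t} = \frac{1}{N_\mathrm{thr}} \sum_{i=1}^{N_\mathrm{thr}} t_i
$$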

In [26]:
def get_timeline_duration(timeline, start_time=None):
    """Calculate the total duration (in seconds) of a set of timeline results"""
    if start_time is None:
        start_time = timeline['starts'].min()
    end_time = timeline['ends'].max()
    return (end_time - start_time)*1e-9

def get_timelines_by_tid(timeline):
    """Get dictionary of (tid, timeline)"""
    tids = timeline['tids']
    unique_tids = np.unique(tids)
    return dict((tid, timeline[tids == tid]) for tid in unique_tids)

In [20]:
timelines_by_tid = get_timelines_by_tid(j.timeline_results)
time_by_tid = dict((tid, get_timeline_duration(timeline))
                   for (tid, timeline) in timelines_by_tid.items())
print(time_by_tid)

{-1423886592: 820.13058350300003, -1419688192: 804.7894327140001, -1415489792: 804.79589456100007, -1411291392: 804.690985329}


In [21]:
avg_thruput = j.nEvent / sum(time_by_tid.values()) * j.nThread

In [25]:
print(avg_thruput, get_throughput(j))

4.94681111992 4.87727197344


OK, so that works, but let's try to implement it another way.

In [28]:
def get_avg_thread_time(job):
    """Get the thread-averaged event loop time"""
    tids = job.timeline_results['tids']
    unique_tids = np.unique(tids)
    start = get_evloop_start_time(job)
    ends = job.timeline_results['ends']
    time = 0
    for tid in unique_tids:
        thread_ends = ends[tids == tid]
        thread_end = thread_ends[-1]
        time += (thread_end - start)*1e-9
    return time / job.nThread

def get_avg_throughput(job):
    return job.nEvent / get_avg_thread_time(job)

In [29]:
print(avg_thruput, get_throughput(j), get_avg_throughput(j))

4.94681111992 4.87727197344 4.94681094052
