In [1]:
%pylab inline
%matplotlib notebook

Populating the interactive namespace from numpy and matplotlib


<hr/>
# Prob2 : Performance results
<hr/>

This notebook assumes that you have run your code for processor counts $[1,2,4,8]$ and have files `trap_01.out`, `trap_02.out` and so on. 

First, we create a Panel of data collecting everything in the files you just created. 

In [2]:
import numpy, pandas

file_prefix = 'trap_'

pdata = {}
nprocs = [1,2,4,8, 16]
for p in nprocs:
    fname = file_prefix + '{:02d}'.format(p) + '.out'
    try:
        # Whitespace-separated columns: N, solution, error, time
        df = pandas.read_table(fname,names=['N','soln','err','t'],sep=r'\s+')
    except OSError:
        print("File '{:s}' not found.".format(fname))
    else:
        tname = 'p' + '{:02d}'.format(p)       
        pdata[tname] = df
        
    
panel = pandas.Panel(pdata)

File 'trap_16.out' not found.
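`pandas.Panel` was removed in pandas 0.25, so the cell above only runs on older versions. The following is a minimal sketch of one way to hold the same collection in a newer pandas, assuming the per-processor frames are stacked with `pandas.concat`. The helper `fake_run` and its values are made-up stand-ins for the `trap_*.out` files, and the names `runs`, `by_label`, `across_procs` and `timings` are illustrative only.

```python
import numpy as np
import pandas as pd

def fake_run(p, n_rows=4):
    """Synthetic stand-in for one trap_XX.out file (NOT real results)."""
    N = 1024 * 2 ** np.arange(n_rows)
    return pd.DataFrame({'N': N, 'soln': 1.0, 'err': 1.0 / N**2, 't': 1e-6 * N / p})

# Same dictionary layout as pdata in the notebook: one DataFrame per label
pdata = {'p{:02d}'.format(p): fake_run(p) for p in [1, 2, 4, 8]}

# Stack into one table with a two-level row index (processor label, row number)
runs = pd.concat(pdata, names=['procs', 'row'])

by_label = runs.loc['p02']                   # plays the role of panel['p02']
across_procs = runs.xs(2, level='row')       # plays the role of panel.major_xs(2).transpose()
timings = runs['t'].unstack(level='procs')   # plays the role of panel.minor_xs('t')
```

Each of the three lookups returns an ordinary `DataFrame`, so the plotting calls used with a Panel slice should carry over with little change.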


We can index panels like dictionaries, using the processor label as the key. 

In [3]:
panel['p02']

| | N | soln | err | t |
|---|---|---|---|---|
| 0 | 1024.0 | 1.872592 | 4.6778e-07 | 0.0421 |
| 1 | 2048.0 | 1.872593 | 1.1695e-07 | 0.0395 |
| 2 | 4096.0 | 1.872593 | 2.9236e-08 | 0.0379 |
| 3 | 8192.0 | 1.872593 | 7.3091e-09 | 0.0373 |
| 4 | 16384.0 | 1.872593 | 1.8273e-09 | 0.0382 |
| 5 | 32768.0 | 1.872593 | 4.5681e-10 | 0.0377 |
| 6 | 65536.0 | 1.872593 | 1.1419e-10 | 0.0374 |
| 7 | 131072.0 | 1.872593 | 2.8554e-11 | 0.0399 |
| 8 | 262144.0 | 1.872593 | 7.1374e-12 | 0.0463 |
| 9 | 524288.0 | 1.872593 | 1.7852e-12 | 0.0576 |


We can also look at different slices of the data.  For example, suppose we want to check our results across all processor counts for N=16384.  This corresponds to index value 4 (see above).  The N axis is the major axis, so we can look across the "cubed" panel data using 

    panel.major_xs(4)    # Corresponds to N = 16384
    
The cell below uses index 18 instead, which corresponds to N = 268435456, and shows that the error stays below $10^{-11}$ for every processor count. Note that we transpose the data so that header labels are across the top.

In [5]:
panel.major_xs(18).transpose()    # Choose layers of N values

| | N | soln | err | t |
|---|---|---|---|---|
| p01 | 268435456.0 | 1.872593 | 2.8644e-13 | 20.2 |
| p02 | 268435456.0 | 1.872593 | 7.975e-12 | 10.7 |
| p04 | 268435456.0 | 1.872593 | 1.7639e-12 | 5.65 |
| p08 | 268435456.0 | 1.872593 | 6.1329e-13 | 1.2 |


We could also slice along the minor axis to see the timing results across all processors for our range of N values.

In [6]:
panel.minor_xs('t') 

| | p01 | p02 | p04 | p08 |
|---|---|---|---|---|
| 0 | 0.0297 | 0.0421 | 0.0484 | 0.0636 |
| 1 | 0.0315 | 0.0395 | 0.0478 | 0.0644 |
| 2 | 0.0283 | 0.0379 | 0.0501 | 0.0637 |
| 3 | 0.0308 | 0.0373 | 0.0472 | 0.0635 |
| 4 | 0.0309 | 0.0382 | 0.0481 | 0.0686 |
| 5 | 0.0303 | 0.0377 | 0.0468 | 0.0642 |
| 6 | 0.0329 | 0.0374 | 0.0509 | 0.0645 |
| 7 | 0.037 | 0.0399 | 0.0492 | 0.0646 |
| 8 | 0.0474 | 0.0463 | 0.0515 | 0.065 |
| 9 | 0.0674 | 0.0576 | 0.0562 | 0.066 |


## Plot timing results

Using the Panel, we can easily plot all of the timing results in a single plot.  

In [7]:
df_timing = panel.minor_xs('t') 
cols = ['N',*df_timing.columns]


df_timing['N'] = panel['p01']['N'].astype('int')
df_timing = df_timing[cols]
df_timing.plot(x='N',logx=True,logy=True,style='.-',markersize=10)

title("Timing results",fontsize=18);
xlabel("N",fontsize=16)
ylabel("Time (s)",fontsize=16)


<hr/>

## Strong scaling

If an algorithm scales well, we expect that adding more processors to a problem of fixed size will speed up the calculation.  If a code were "embarrassingly parallel", we would expect two processors to take half as much time as one processor, four processors to take a quarter of the time, and so on.  We call this type of scaling "strong" scaling.  

For strong scaling, we compare timings for a fixed value of $N$.   We will choose one of the larger values to see better results.

In [8]:
idx = 18    # Choose N corresponding to index=18
N = int(panel['p01']['N'][idx])

procs = array([1,2,4,8])

df_strong = panel.major_xs(idx).transpose()    
df_strong['p'] = procs
df_strong[['p','soln','err','t']].style.set_caption("N = {:d}".format(N))

| | p | soln | err | t |
|---|---|---|---|---|
| p01 | 1 | 1.87259 | 2.8644e-13 | 20.2 |
| p02 | 2 | 1.87259 | 7.975e-12 | 10.7 |
| p04 | 4 | 1.87259 | 1.7639e-12 | 5.65 |
| p08 | 8 | 1.87259 | 6.1329e-13 | 1.2 |


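Plot the strong scaling results together with a best-fit line to estimate the speed-up.  On a log-log plot, ideal strong scaling appears as a line of slope $-1$.  The fit below uses only the first three processor counts (`procs[:-1]`).  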

In [9]:
df_strong.plot(x='p',y='t',logx=True,logy=True,style='.-',markersize=15)

# Plot best-fit line (the fit excludes the largest processor count)
t_strong = array(df_strong['t'])
c = polyfit(log(procs[:-1]),log(t_strong[:-1]),1)
loglog(procs,exp(polyval(c,log(procs))),'r--')

# The slope belongs to the fitted line, so label it there
legend(['Time','Best fit (slope={:6.2f})'.format(c[0])])
title('Speed-up',fontsize=18);
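As a rough cross-check of the fitted slope, the speed-up can also be computed directly from the timings. The sketch below assumes the four timings from the $N = 268435456$ table above, typed in by hand; the names `t`, `speedup` and `slope` are illustrative and are not part of the notebook's namespace.

```python
import numpy as np

procs = np.array([1, 2, 4, 8])
t = np.array([20.2, 10.7, 5.65, 1.2])   # timings copied from the N = 268435456 table

# Speed-up S(p) = T(1) / T(p); ideal strong scaling gives S(p) = p
speedup = t[0] / t

# Slope of log(t) against log(p) for the first three points; ideal value is -1
slope = np.polyfit(np.log(procs[:-1]), np.log(t[:-1]), 1)[0]

for p, s in zip(procs, speedup):
    print("p = {:d}: speed-up = {:.2f}".format(p, s))
print("fitted slope = {:.2f}".format(slope))
```

A fitted slope close to $-1$ indicates near-ideal strong scaling over the points included in the fit.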


## Weak scaling

If an algorithm scales well, we expect to be able to solve bigger problems by adding more processors.  For example, if we double the size of the problem, and double the number of processors, we expect the code to take the same time as the original problem.  This sort of scaling is called "weak scaling".  

In [12]:
figure()
clf()
df_weak = panel.minor_xs('t')
df_weak


| | p01 | p02 | p04 | p08 |
|---|---|---|---|---|
| 0 | 0.0297 | 0.0421 | 0.0484 | 0.0636 |
| 1 | 0.0315 | 0.0395 | 0.0478 | 0.0644 |
| 2 | 0.0283 | 0.0379 | 0.0501 | 0.0637 |
| 3 | 0.0308 | 0.0373 | 0.0472 | 0.0635 |
| 4 | 0.0309 | 0.0382 | 0.0481 | 0.0686 |
| 5 | 0.0303 | 0.0377 | 0.0468 | 0.0642 |
| 6 | 0.0329 | 0.0374 | 0.0509 | 0.0645 |
| 7 | 0.037 | 0.0399 | 0.0492 | 0.0646 |
| 8 | 0.0474 | 0.0463 | 0.0515 | 0.065 |
| 9 | 0.0674 | 0.0576 | 0.0562 | 0.066 |


In [14]:
idx = 13     # Starting 'N' index;  shift by one each time the processor count doubles

# N doubles from one row to the next, so stepping down one row per column keeps N/p fixed
t_weak = array([df_weak[c][idx+i] for i,c in enumerate(df_weak.columns)])

semilogx(procs,t_weak,'.-',markersize=15)
semilogx(procs,[t_weak[0]]*4,'k--')
title('Weak scaling', fontsize=18)
xlabel('Cores')
ylabel("Time (s)")
legend(['Time (s)','Perfect scaling'])
show()
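The diagonal selection used for weak scaling can be illustrated on a small table. This is a sketch only: it assumes rows 6 to 9 of the timing table shown earlier, typed in by hand, and the names `df_t`, `start`, `t_diag` and `weak_eff` are illustrative. Because $N$ doubles from one row to the next, moving down one row while moving right one column keeps $N/p$ constant.

```python
import numpy as np
import pandas as pd

# Rows 6-9 of the timing table (N doubles from one row to the next)
df_t = pd.DataFrame(
    {'p01': [0.0329, 0.0370, 0.0474, 0.0674],
     'p02': [0.0374, 0.0399, 0.0463, 0.0576],
     'p04': [0.0509, 0.0492, 0.0515, 0.0562],
     'p08': [0.0645, 0.0646, 0.0650, 0.0660]},
    index=[6, 7, 8, 9])

start = 6   # row used for the single-processor run

# Walk down the diagonal: row start+i for the i-th processor count
t_diag = np.array([df_t[c].loc[start + i] for i, c in enumerate(df_t.columns)])

# Weak-scaling efficiency: single-processor time over the time at the same N/p
weak_eff = t_diag[0] / t_diag
```

At these small values of $N$ the timings barely change with $N$, so this example mainly shows the indexing; larger rows give a more meaningful weak scaling curve.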

## Efficiency

When we add more processors, we expect some overhead associated with more communication.  This is captured somewhat in the weak scaling results, but what is not shown is how quickly the efficiency drops off.  

Efficiency plots can often highlight poor scaling results that are not obvious from strong scaling results.

In [19]:
figure()
clf()

# Efficiency
E = t_strong[0]/(procs*t_strong)*100

semilogx(procs,E,'.-',markersize=15)
semilogx(procs,[100]*4,'k--',linewidth=2)

xlabel('Cores',fontsize=16)
ylabel('Efficiency (%)',fontsize=16)
title("Efficiency (%)");
legend(['Efficiency (%)', 'Perfect efficiency'])
xlim([1/sqrt(2), 2**4.5])
ylim([10,300])
grid()
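The quantity plotted above can be written in terms of the measured wall-clock times. Here $T(p)$ denotes the time on $p$ processors for fixed $N$ (the array `t_strong` in the code); the symbols $T$, $S$ and $E$ are notation introduced for this note only.

$$
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}\times 100\% = \frac{T(1)}{p\,T(p)}\times 100\%
$$

With the timings in the strong scaling table, $E(2) = 20.2/(2\times 10.7)\approx 94\%$ and $E(4) = 20.2/(4\times 5.65)\approx 89\%$, while $E(8) = 20.2/(8\times 1.2)\approx 210\%$. A value above $100\%$ means the 8-processor run was faster than ideal scaling predicts, so that timing may be worth re-checking.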
