# OpenMP Parallel Performance and Scalability Analysis of the $n$-body Problem
---

Import all the required libraries.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from IPython.display import display, HTML

## System Info
---

In [None]:
base_path = 'data/'

# Pass the separator by keyword; recent pandas versions reject it as a positional argument
info = pd.read_csv(base_path + 'system_info.txt', sep=':', names=["Parameter", "Description"])
display(HTML(info.to_html(justify='left',bold_rows=True ,index=False)))

## Obtain the data
---

Serial and parallel execution times can be obtained by running the script *run_tests.sh*. For the OpenMP version, the number of processors, $N$, is passed as an additional argument.

> $ ./run_tests simpar

> $ ./run_tests simpar-omp N

The results from each run are stored in a csv file, e.g. ***parallel_cpu4_results.csv*** when the tests were run for $N=4$.

Read the data from the csv files and create a DataFrame object for each file. Since the serial results and the parallel results for each number of processors are stored in separate files, we concatenate the data from all the parallel results.

In [None]:
serial = pd.read_csv(base_path + 'serial_cpu1_results.csv')
serial.insert(0, column='Test Number', value=serial.index)

num_procs = 8
parallel_data = []

for i in range(1, num_procs+1):
    filename = 'parallel_cpu' + str(i) + '_results.csv'

    try:
        parallel_data.append(pd.read_csv(base_path + filename))
    except FileNotFoundError:
        # Only a missing file is expected here; other errors should not be hidden
        print('File {0} not found. Try running the test script for {1} threads'.format(filename, i))

parallel = pd.concat(parallel_data, ignore_index=False)
parallel.insert(0, column='Test Number', value=parallel.index)

## Display data
---

### Serial algorithm results

In [None]:
display(HTML(serial.to_html(index=False)))

### OpenMP algorithm results

In [None]:
display(HTML(parallel.sort_values(by=['Test Number','Processors']).to_html(index=False)))

## Calculate performance metrics
---

Performance results for the faculty-provided tests, with varying numbers of particles, $n$, and simulation steps, $t$. The metrics used to assess scalability are the elapsed time, the speedup, and the efficiency.

In [None]:
# Set indexes for data frames
serial.set_index(['Test Number'], inplace=True)
parallel.set_index(['Test Number','Processors'], inplace=True)

### Speedup

The principal measure of parallel performance is the speedup, $S_N$, defined as the ratio of the time to execute the computational workload $W$ on a single processor, $T_1$, to the time on $N$ processors, $T_N$.
    
\begin{equation}
    S_N = \frac{T_1}{T_N}
    \quad\text{where}\quad
    T_N = \left(f + \frac{1-f}{N}\right)T_1
\end{equation}

$f$ represents the fraction of the code that cannot be parallelized. The remaining fraction, $1 - f$, is parallelizable. In the best case, where the parallelizable part scales linearly with the number of processors, its runtime reduces to $\frac{(1-f)}{N}T_1$, while the serial part still takes $fT_1$.
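
Substituting $T_N$ into the definition of $S_N$ gives $S_N = 1 / (f + (1-f)/N)$, which approaches $1/f$ as $N$ grows. The sketch below evaluates this prediction for a few processor counts. The helper name `amdahl_speedup` and the value $f = 0.1$ are assumptions chosen for illustration only; they are not taken from the measured results.

```python
def amdahl_speedup(f, n):
    """Predicted speedup S_N = 1 / (f + (1 - f) / N) for a serial fraction f."""
    return 1.0 / (f + (1.0 - f) / n)


# Hypothetical serial fraction, used only to illustrate the formula
f = 0.1

for n in (1, 2, 4, 8):
    print('N={0}: predicted speedup {1:.2f}'.format(n, amdahl_speedup(f, n)))

# Upper bound on the speedup as N grows without limit
print('Limit for large N: {0:.1f}'.format(1.0 / f))
```

Even with only 10% serial code, the predicted speedup on 8 processors stays well below 8, which is a useful reference when reading the measured curves.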

In [None]:
parallel['Speedup'] = (serial['Real time'] / parallel['Real time'])
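
The division above relies on pandas matching each parallel row to the serial time of the same test. As a minimal sketch of what that alignment amounts to, the example below performs the lookup explicitly on made-up timings. The names `serial_t`, `parallel_t`, and `baseline` are hypothetical, and the numbers are not measured results.

```python
import pandas as pd

# Made-up timings for two tests, for illustration only
serial_t = pd.Series(
    [10.0, 20.0],
    index=pd.Index([0, 1], name='Test Number'),
    name='Real time')

parallel_t = pd.Series(
    [10.0, 5.0, 20.0, 8.0],
    index=pd.MultiIndex.from_tuples(
        [(0, 1), (0, 2), (1, 1), (1, 2)],
        names=['Test Number', 'Processors']),
    name='Real time')

# Look up the serial time of the test each parallel row belongs to
tests = parallel_t.index.get_level_values('Test Number')
baseline = serial_t.reindex(tests).to_numpy()

# Element-wise ratio T_1 / T_N, keeping the (test, processors) index
speedup = pd.Series(baseline / parallel_t.to_numpy(),
                    index=parallel_t.index, name='Speedup')
print(speedup)
```

Spelling out the lookup makes it easy to check that every row is divided by the serial time of its own test rather than by a value from another test.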

In [None]:
# Reset indexes in order to calculate the following metrics
serial.reset_index(inplace=True)
parallel.reset_index(inplace=True)

### Efficiency

The parallelization efficiency, $\epsilon_N$, is the ratio of the measured speedup to the ideal speedup, which equals $N$ for perfectly linear scaling. It measures how effectively the processors are used to execute a given parallel algorithm. Any degradation in performance due to parallelization overhead results in $\epsilon_N$ being less than one.

\begin{equation}
    \epsilon_N = \frac{S_N}{N}
\end{equation}

In [None]:
parallel['Efficiency'] = parallel['Speedup'] / parallel['Processors']
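
The speedup model $T_N = (f + \frac{1-f}{N})T_1$ can also be inverted to estimate the serial fraction from a measured speedup: $f = (1/S_N - 1/N) / (1 - 1/N)$, often called the Karp-Flatt metric. The sketch below is an optional addition rather than part of the original analysis; the helper name `serial_fraction` and the sample values are assumptions for illustration.

```python
def serial_fraction(speedup, n):
    """Estimate the serial fraction f from a measured speedup on n processors."""
    if n <= 1:
        raise ValueError('the estimate is only defined for n > 1')
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)


# Hypothetical measurement: a speedup of 4 on 8 processors
f_est = serial_fraction(4.0, 8)
print('Estimated serial fraction: {0:.3f}'.format(f_est))
```

If the estimate grows with $N$ for the same test, the lost efficiency is more likely due to parallel overhead (synchronization, scheduling) than to inherently serial code.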

## Data analysis
---

In [None]:
# Keep only the columns needed for the analysis, indexed by test and processor count
columns = ['Test Number', 'Particles', 'Steps', 'Processors', 'Real time', 'Speedup', 'Efficiency']
temp = parallel[columns].set_index(['Test Number', 'Processors']).sort_index()

display(HTML(temp.to_html(index=True)))

In [None]:
line_labels = []
lines = []

# Create matplotlib figure
fig, ax = plt.subplots(1,2, figsize=(10,4))
plt.tight_layout()
fig.subplots_adjust(wspace=0.18)

# Setup subplots
ax[0].grid()
ax[1].grid()
ax[0].set_xlabel('Processors')
ax[0].set_ylabel('Speedup')
ax[1].set_xlabel('Processors')
ax[1].set_ylabel('Efficiency')

# For each test in range
for i in range(2,6):
    line_labels.append('{0}: n={1}, t={2}'.format(i, int(temp.loc[(i,1)]['Particles']), int(temp.loc[(i,1)]['Steps'])))
    ax[0].plot(temp.loc[i]['Speedup'], marker=".")
    l, = ax[1].plot(temp.loc[i]['Efficiency'], marker=".")
    lines.append(l)

# Reference lines: ideal linear speedup and ideal efficiency of 1
ax[0].plot([1, num_procs], [1, num_procs], linestyle=':', color='black', alpha=0.4)
ax[1].plot([1, num_procs], [1, 1], linestyle=':', color='black', alpha=0.4)

plt.subplots_adjust(right=0.8)

fig.legend(tuple(lines), tuple(line_labels), loc="center right", borderaxespad=0.1, title="# of test", shadow=True)
plt.show()