#    APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools

This notebook provides a complete example of reproducible experimentation in the field of life sciences. The objective is to determine the best reconstruction parameters for a simulated Positron Emission Tomography (PET) scanner system. For that purpose, the source code of the two required programs is distributed with this notebook. The first one, "reconstructor", is an implementation based on the OPLM algorithm for image reconstruction of PET systems. The second, "evaluateImage", takes a reconstructed image as input, compares it with the expected one, and returns a set of image quality parameters. The input data is a simulated acquisition of a PET system formed by 3 rings of 20 detector modules each. Because of its size, the simulated data is provided via a public S3 bucket (https://s3.amazonaws.com/grycap/datasets/apricot/reconstruction/data.txz).

# Experimentation

TO DO

## Setting up the system

For this experiment we need to perform a multiparametric analysis to obtain the best image reconstruction parameters, so the first step is to deploy the infrastructure. The chosen infrastructure is a batch cluster with 5 worker nodes running Ubuntu 18.04 LTS images. Each node has 2 GB of RAM and 20 GB of disk space. We also recommend assigning at least two CPUs to the front-end node for this experiment. To reproduce the experiment, deploy this infrastructure using the APRICOT deploy extension and one of the supported cloud providers. Once the cluster has been configured, you can continue with the experimentation.

Store the cluster and user names

In [None]:
clusterName = "reconstruction"

In [None]:
username = "ubuntu"

We need to load the Python module with the APRICOT magics to manage our clusters. This step can be skipped if the module is loaded by default in each notebook.

In [None]:
%reload_ext apricot_magic

Once the infrastructure has been deployed and configured, check it using the %apricot_ls magic.

In [None]:
%apricot_ls

If the infrastructure deployment fails, we can get the output log with the following instruction:

In [None]:
%apricot_log $clusterName

## Preparing data and programs

Now we must provide the data and programs needed for our analysis:

1- Raw simulated data

2- Comparison program code

3- Reconstruction program code


All required source code can be uploaded easily from our computer using the "%apricot_upload" instruction. However, the input data with the raw simulated scanner detections is stored externally because of its size. This data is stored in a public AWS S3 bucket, so it can be downloaded using "curl". First, check that the "input" folder is in the current directory:

In [None]:
%%bash
ls

Next, upload the required source and data files.

In [None]:
%apricot_upload $clusterName input /home/$username

Download the input data using "curl". This step may take a few minutes.

In [None]:
%apricot exec $clusterName curl https://s3.amazonaws.com/grycap/datasets/apricot/reconstruction/data.txz --output /home/$username/data.txz

Finally, extract the input data.

In [None]:
%apricot exec $clusterName cd /home/$username && tar -xvf data.txz

Check that all required files are in the "input" folder.

In [None]:
%apricot exec $clusterName ls /home/$username/input

Now we need to compile the source code. All the necessary compilers and cmake tools should have been installed when the cluster was configured.

Compile the reconstruction and comparison programs.

In [None]:
%apricot exec $clusterName cd /home/$username/input/reconstructor_code && bash install.sh && cp reconstructor ../

In [None]:
%apricot exec $clusterName cd /home/$username/input && g++ -o evaluateImage evaluateImage.cpp -O2

Check that the executables have been created (evaluateImage and reconstructor).

In [None]:
%apricot exec $clusterName ls /home/$username/input/

Now we have all the necessary files on the cluster. The cluster has been configured automatically with NFS, so the '/home' directory is shared by all workers and the front-end.

We also need a folder to store the results. Create a folder named 'results'.

In [None]:
%apricot exec $clusterName mkdir /home/$username/results

## Executing jobs

Now it is time to execute the experiment. For simplicity, this multiparametric study varies only three parameters, but it can be extended to any number of parameter ranges. First of all, we need to specify the interval and step size of each parameter. To reduce computation time, set larger step sizes.

In [None]:
minNvox_xy = 20
maxNvox_xy = 250
stepNvox_xy = 50

minNvox_z = 20
maxNvox_z = 250
stepNvox_z = 100

nChunksMin = 5
nChunksMax = 5
chunkStep = 1
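
Before submitting, it can help to estimate how many jobs these ranges generate. The sketch below enumerates the grid in plain Python. It assumes each range runs from its minimum up to and including its maximum in increments of the step; how `%apricot_runMP` treats the upper bound is not stated in this notebook, so treat the count as an estimate. The helper names `inclusive_range` and `parameter_grid` are hypothetical.

```python
from itertools import product

def inclusive_range(start, stop, step):
    """Values start, start+step, ... not exceeding stop (assumed convention)."""
    return list(range(start, stop + 1, step))

def parameter_grid(xy, z, chunks):
    """Cartesian product of three (min, max, step) ranges."""
    return list(product(inclusive_range(*xy),
                        inclusive_range(*z),
                        inclusive_range(*chunks)))

# Same values as the cell above: (min, max, step) for x-y voxels, z voxels, chunks.
grid = parameter_grid((20, 250, 50), (20, 250, 100), (5, 5, 1))
print(len(grid), "combinations")
print("first:", grid[0], "last:", grid[-1])
```

Under the stated assumption, larger step sizes shrink the grid multiplicatively, which is why they reduce computation time so quickly.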

Now, use the %apricot_genMPid magic to obtain an identifier for the specified ranges. We will use this identifier to keep the results of each run separate, so that the experiment can be repeated without losing the results of previous runs.

In [None]:
ID = %apricot_genMPid $minNvox_xy $maxNvox_xy $stepNvox_xy $minNvox_z $maxNvox_z $stepNvox_z $nChunksMin $nChunksMax $chunkStep

In [None]:
print(ID)

Create a specific folder for this run, named after the ID obtained above.

In [None]:
%apricot exec $clusterName mkdir /home/$username/results/$ID

In [None]:
%apricot exec $clusterName ls /home/$username/results

Launch the jobs using "%apricot_runMP" and the local script "script.sh". The jobs may take several minutes to start, until all workers have been configured.

In [None]:
%apricot_runMP $clusterName script.sh /home/$username $minNvox_xy $maxNvox_xy $stepNvox_xy $minNvox_z $maxNvox_z $stepNvox_z $nChunksMin $nChunksMax $chunkStep
            

Check whether all jobs have finished.

In [None]:
%apricot exec $clusterName squeue
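
The queue is empty when `squeue` prints only its header line. As a rough illustration, the helper below counts the job lines in text copied from the `squeue` output. It assumes the default Slurm layout, with one header line starting with `JOBID` followed by one line per job. The name `count_queued_jobs` and the sample text (partition, node and job names included) are hypothetical and were not produced by this experiment.

```python
def count_queued_jobs(squeue_output):
    """Count job lines in squeue text, skipping the header and blank lines."""
    lines = [ln for ln in squeue_output.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is the column header.
    if lines and lines[0].split()[0] == "JOBID":
        lines = lines[1:]
    return len(lines)

# Made-up sample, only to show the expected shape of the input.
sample = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
   12     debug script.s ubuntu  R 0:42 1 wn1
   13     debug script.s ubuntu PD 0:00 1 (Resources)
"""
print(count_queued_jobs(sample))
```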

When no jobs remain in the queue, execute the post-processing script.

In [None]:
%apricot exec $clusterName cd /home/$username/ && bash input/getResults.sh $ID

## Getting results

Next, download the file with the results of our multiparametric study.

In [None]:
resultsFilename = "results-" + ID + ".dat"
%apricot_download $clusterName /home/$username/$resultsFilename .

In [None]:
%%bash

ls

Read results file data

In [None]:
fileIn = open(resultsFilename,"r")

data = fileIn.read()

# Extract data lines
data = data.strip().split('\n')
# Remove header line
data.pop(0)

# Extract input data
XYnvox = []
Znvox = []
chunks = []
userTimeMin = []
userTimeSec = []
systemTimeMin = []
systemTimeSec = []
RMSE = []
PSNR = []
NRMSD = []
NMAD = []

for line in data:
    line = " ".join(line.split())
    words = line.strip().split(' ')
    XYnvox.append(float(words[0]))
    Znvox.append(float(words[1]))
    chunks.append(float(words[2]))
    userTimeMin.append(float(words[3]))
    userTimeSec.append(float(words[4]))
    systemTimeMin.append(float(words[5]))
    systemTimeSec.append(float(words[6]))
    RMSE.append(float(words[7]))
    PSNR.append(float(words[8]))
    NRMSD.append(float(words[9]))
    NMAD.append(float(words[10]))
    
    
fileIn.close()
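
The same parsing can be written more compactly by naming the columns once. The sketch below assumes the layout used above: one header line, then eleven whitespace-separated numeric columns in the order shown. The function name `parse_results`, the column labels and the sample values are illustrative only; they are not output from a real run. Unlike the cell above, it skips incomplete lines instead of raising an error.

```python
COLUMNS = ["XYnvox", "Znvox", "chunks",
           "userTimeMin", "userTimeSec", "systemTimeMin", "systemTimeSec",
           "RMSE", "PSNR", "NRMSD", "NMAD"]

def parse_results(text):
    """Return a dict mapping each column name to a list of floats."""
    rows = text.strip().split("\n")[1:]      # drop the header line
    table = {name: [] for name in COLUMNS}
    for row in rows:
        words = row.split()                  # split() collapses repeated spaces
        if len(words) < len(COLUMNS):
            continue                         # skip incomplete lines
        for name, word in zip(COLUMNS, words):
            table[name].append(float(word))
    return table

# Made-up sample with the assumed layout.
sample = """header
20 20 5 0 12.5 0 0.8 3.10 25.0 0.40 0.30
70 20 5 1 2.0 0 1.1 2.40 27.5 0.31 0.22
"""
table = parse_results(sample)
print(table["XYnvox"], table["RMSE"])
```

Keeping the column order in a single list means a change in the results file layout only needs to be reflected in one place.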

Import numpy and the plotting libraries. If they are not installed on your system, execute the following line first.

In [None]:
%%bash
python3 -m pip install --user numpy scipy matplotlib

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

userTime = np.add(np.multiply(userTimeMin,60.0),userTimeSec)
systTime = np.add(np.multiply(systemTimeMin,60.0),systemTimeSec)
totalTime = userTime + systTime
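
The conversion above turns each minutes/seconds pair into seconds and then adds user and system time. A plain-Python sketch of the same arithmetic for a single job, with a hypothetical helper name and made-up inputs:

```python
def total_cpu_seconds(user_min, user_sec, sys_min, sys_sec):
    """Combine the user and system minute/second pairs into total seconds."""
    return 60.0 * user_min + user_sec + 60.0 * sys_min + sys_sec

# 1 min 2.0 s of user time plus 0 min 1.5 s of system time.
print(total_cpu_seconds(1, 2.0, 0, 1.5))
```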

Plot the results

In [None]:

nChunks = 5.0
subXY = []
subZ = []
subTimes = []
subRMSE = []
subPSNR = []
subNRMSD = []
subNMAD = []

subRMSE_zoom = []
subXY_zoom = []
subZ_zoom = []
nVoxCut = 60

for i in range(len(XYnvox)):
    if nChunks == chunks[i]:
        subXY.append(XYnvox[i])
        subZ.append(Znvox[i])
        subTimes.append(totalTime[i])
        subRMSE.append(RMSE[i])
        subPSNR.append(PSNR[i])
        subNRMSD.append(NRMSD[i])
        subNMAD.append(NMAD[i])
        if XYnvox[i] > nVoxCut and Znvox[i] > nVoxCut:
            subXY_zoom.append(XYnvox[i])
            subZ_zoom.append(Znvox[i])
            subRMSE_zoom.append(RMSE[i])
        
Axpad = 280        
fig = plt.figure()
#plt.rcParams["figure.figsize"] = [100,100]
#plt.rcParams.update({'font.size': 128})

ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel('nº voxels x-y')
ax.set_ylabel('nº voxels z')
ax.set_zlabel('time (s)')

ax.xaxis.labelpad = Axpad
ax.yaxis.labelpad = Axpad
ax.zaxis.labelpad = Axpad


ax.plot_trisurf(subXY,subZ,subTimes)
plt.show()

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel('nº voxels x-y')
ax.set_ylabel('nº voxels z')
ax.set_zlabel('rmse')

ax.xaxis.labelpad = Axpad
ax.yaxis.labelpad = Axpad
ax.zaxis.labelpad = Axpad

ax.plot_trisurf(subXY,subZ,subRMSE)
plt.show()


fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel('nº voxels x-y')
ax.set_ylabel('nº voxels z')
ax.set_zlabel('rmse')

ax.xaxis.labelpad = Axpad
ax.yaxis.labelpad = Axpad
ax.zaxis.labelpad = Axpad

ax.plot_trisurf(subXY_zoom,subZ_zoom,subRMSE_zoom)
plt.show()


Now, we can repeat the experiment with different parameter ranges or plot different results.
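
Since the aim is to choose reconstruction parameters, one simple way to summarize a run is to pick the grid point with the lowest RMSE. The sketch below does that over parallel lists such as `subXY`, `subZ` and `subRMSE`. Using RMSE alone as the criterion is an assumption made for illustration; in practice it would be weighed against the other quality metrics and the execution time. The helper name `best_by_rmse` and the sample values are hypothetical.

```python
def best_by_rmse(xy, z, rmse):
    """Return (xy, z, rmse) for the entry with the smallest RMSE."""
    if not rmse:
        raise ValueError("no results to compare")
    i = min(range(len(rmse)), key=rmse.__getitem__)
    return xy[i], z[i], rmse[i]

# Made-up values, only to show the call.
xy_demo = [20.0, 70.0, 120.0]
z_demo = [20.0, 20.0, 120.0]
rmse_demo = [3.1, 2.4, 2.9]
print(best_by_rmse(xy_demo, z_demo, rmse_demo))
```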

## Delete infrastructure

Once the experimentation is finished, destroy the cluster.

In [None]:
%apricot destroy $clusterName

# Conclusions and future work

TODO

# References

TODO