# High-performance computing in Python


This chapter introduces you to high-performance computing (HPC) in Python, making use of the [Imperial College HPC](https://wiki.imperial.ac.uk/display/HPC/Introduction).

Note that there are a number of ways in which you can develop HPC implementations for your code locally (on your own computer). I will not cover these, but here is a list:

* IPython [parallel](https://ipython.org/ipython-doc/3/parallel/)
* Multi-threading, using the [threading](https://docs.python.org/3/library/threading.html) package.
* Using multiple processors with the [multiprocessing](https://docs.python.org/2/library/multiprocessing.html) package  

The difference between threading and multiprocessing is that threads share the same memory space, while processes have separate memory spaces. This makes it harder to share information between processes with multiprocessing, but it is still a useful approach for quick-and-dirty parallelization. When better communication between processes is required, more sophisticated solutions such as MPI and OpenMP may be needed. The MPI (Message Passing Interface) standard/protocol can be used in Python to parallelize your code over multiple processors through the [mpi4py](http://mpi4py.scipy.org/docs/usrman/index.html) package. You can also parallelize scipy/numpy array loops with [Cython and OpenMP](http://www.perrygeo.com/parallelizing-numpy-array-loops-with-cython-and-mpi.html).
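To make the shared-memory point concrete, here is a minimal sketch using only the standard-library `threading` module. The function names `worker` and `run_threads` are hypothetical, chosen for illustration. All threads write into the same dictionary, which works only because they share one memory space; with `multiprocessing`, each process would work on its own copy, and results would have to be sent back explicitly (for example through a `multiprocessing.Queue`).

```python
import threading


def worker(task_id, results, lock):
    """Square task_id and store it in the dictionary shared by all threads."""
    value = task_id ** 2
    with lock:  # guard the shared dictionary against simultaneous writes
        results[task_id] = value


def run_threads(n_tasks):
    """Start one thread per task and return the shared results dictionary."""
    results = {}  # a single object, visible to every thread
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(i, results, lock))
        for i in range(n_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return results


if __name__ == "__main__":
    print(run_threads(4))
```

Note that for CPU-bound pure-Python work, threads in the standard CPython interpreter do not usually run in parallel, so this sketch illustrates memory sharing rather than a speed-up.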


## HPC on the ICL clusters

The following instructions also apply, with suitable modifications, to R scripts.

### Preparing scripts for running on the HPC

The job script you submit needs three things: a shebang line (telling the system which shell to run, usually bash), resource requests for PBS (such as walltime, number of processors, and memory) given with `#PBS` directives, and the command that runs your Python script. The bash script could look something [like this](../silbiocomp/Practicals/Code//PythonHPC.sh).
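As a rough illustration of these ingredients, the following Python sketch assembles the text of such a job script. The helper `make_pbs_script`, the default resource values, and the file name `MyHPCScript.py` as used here are all hypothetical. The `-l walltime=...` and `-l select=...:ncpus=...:mem=...` request forms follow PBS Professional syntax and should be checked against the cluster's own documentation before use. A real job script would typically also load the software modules it needs.

```python
def make_pbs_script(python_script, walltime="00:10:00", ncpus=1, mem="1gb"):
    """Return the text of a bash job script for PBS (illustrative only)."""
    lines = [
        "#!/bin/bash",                                # shebang: run with bash
        f"#PBS -l walltime={walltime}",               # maximum run time (hh:mm:ss)
        f"#PBS -l select=1:ncpus={ncpus}:mem={mem}",  # one node, with cores and memory
        f"python3 {python_script}",                   # the Python script to run
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(make_pbs_script("MyHPCScript.py"))
```

Saving the returned text to a file such as `script.sh` gives something that can be submitted with `qsub`.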


Alternatively, you can move all output files one by one to avoid exceeding the memory allocation (`*.p` indicates that you used `pickle` to dump results):

```bash
for f in *.p; do
	echo "Processing $f..."
	# Quote the variables so that file names containing spaces are handled correctly
	mv "$f" "$WORK/TestPyHPC/output/"
done
```
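If you would rather do this from Python, a roughly equivalent sketch using only the standard library is given here. The function name `move_pickles` is hypothetical, and the destination mirrors the `$WORK/TestPyHPC/output/` directory used in the shell loop. Unlike the shell loop, this version creates the destination directory if it does not exist.

```python
import os
import shutil
from pathlib import Path


def move_pickles(source_dir, dest_dir):
    """Move every *.p file from source_dir to dest_dir, one file at a time."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    moved = []
    for path in sorted(Path(source_dir).glob("*.p")):
        print(f"Processing {path.name}...")
        shutil.move(str(path), str(dest / path.name))
        moved.append(path.name)
    return moved


if __name__ == "__main__":
    # Assumes the WORK environment variable is set, as on the HPC server.
    work = os.environ.get("WORK")
    if work is not None:
        move_pickles(".", os.path.join(work, "TestPyHPC", "output"))
```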

NOTE: Most of the cx1 nodes have multiple cores, so there's no fixed memory assigned to each core. If you use more memory than you requested in your `#PBS` directive, your job is likely to be terminated. If you request more memory than is available, the job will remain queued until sufficient memory is free for the job to run.

PBS also allows you to submit jobs using a Python (instead of shell) script. Look up the qsub manual (`man qsub`) in the HPC terminal, or visit [this gist](https://gist.github.com/nobias/5b2373258e595e5242d5).

Your HPC-enabled Python code could look like the script in `Practicals/Code/MyHPCScript.py`.
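Since the script itself is not reproduced here, the following is a minimal sketch of what an HPC-ready Python script might contain; it is not the actual contents of `MyHPCScript.py`. It assumes that results are dumped with `pickle` into a `*.p` file and that the output location is built from the `HOME` environment variable. The function names, the toy computation, and the file name `result.p` are hypothetical.

```python
import os
import pickle


def simulate(n):
    """Toy computation standing in for the real workload: sum of squares below n."""
    return sum(i ** 2 for i in range(n))


def run_and_save(n, out_dir):
    """Run the computation and pickle the result into out_dir/result.p."""
    os.makedirs(out_dir, exist_ok=True)  # make sure the output directory exists
    result = simulate(n)
    out_path = os.path.join(out_dir, "result.p")
    with open(out_path, "wb") as handle:
        pickle.dump(result, handle)
    return out_path


if __name__ == "__main__":
    # Build an absolute output path from the environment rather than relying
    # on the directory the job happens to start in (an assumption of this sketch).
    home = os.environ.get("HOME", os.getcwd())
    print(run_and_save(1000, os.path.join(home, "TestPyHPC", "output")))
```

Using absolute paths derived from environment variables means the script knows where to write its output regardless of the working directory the scheduler gives it.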


### Copying scripts from your computer to the HPC server
 
Use secure copy (`scp`) to copy the bash script file to your `$HOME` on the HPC server, following the structure `scp source host:destination`, e.g.:

```bash
$ scp script.sh user@login.cx1.hpc.ic.ac.uk:/home/user/whatever/script.sh
```

### Running scripts on the HPC

Open a secure shell (ssh):

```bash
$ ssh user@login.cx1.hpc.ic.ac.uk
```

Check for available modules:

```bash
$ module avail
```

Your job then needs to be queued using `qsub` (PBS):

```bash
$ qsub -j eo script.sh
```

where `-j eo` is an option to join both output and error into one file (with `eo`, the merged stream is written to the error file). Without this option, running the script produces a job output file (anything that is printed in the shell terminal, e.g. by `echo`) and an error file (anything written to standard error, such as error messages), named `{scriptname}.o{jobid}` and `{scriptname}.e{jobid}` respectively.

The `qstat` command provides information on submitted jobs (which queue they are in (short, medium, long), their status, etc.) as well as information on all available queues (with the `-q` and `-Q` options).

## Readings & Resources

The IC library gives you access to several e-books and paper books on UNIX, some specific to Ubuntu. Browse or search and find a good intro book.

* The ICL HPC wiki is a very useful resource: [https://wiki.imperial.ac.uk/display/HPC/Command+line](https://wiki.imperial.ac.uk/display/HPC/Command+line)





