High Performance Computing {#chap:HPC}
==========================

This chapter introduces you to HPC in Python using the Imperial
College HPC
(<https://wiki.imperial.ac.uk/display/HPC/Introduction>).

Local parallel processing
-------------------------

Note that there are a number of ways in which you can develop HPC
implementations for your code locally (on your own computer). I will not
cover these, but here is a list of particularly useful approaches/tools:

-   IPython `parallel`:
    <https://ipython.org/ipython-doc/3/parallel/>

-   Multi-threading, using the `threading` package:
    <https://docs.python.org/3/library/threading.html>

-   Using multiple processors with the `multiprocessing`
    package: <https://docs.python.org/2/library/multiprocessing.html>

The difference between threading and multiprocessing is that threads
share the same memory space, while processes have separate memory
spaces. This makes it harder to share information between processes
with multiprocessing, but it is still a useful approach for quick and
dirty parallelization. When better communication between processes is
required, more sophisticated solutions such as MPI and OpenMP may be
needed. The MPI (Message Passing Interface) standard/protocol can be
used in Python to parallelize your code over multiple processors through
the `mpi4py` package:
<http://mpi4py.scipy.org/docs/usrman/index.html>. You can also
parallelize numpy array loops with `cython` and OpenMP:
<http://www.perrygeo.com/parallelizing-numpy-array-loops-with-cython-and-mpi.html>.
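As an illustration of the quick and dirty approach, here is a minimal
sketch using `multiprocessing.Pool`. The `simulate` function and its
workload are hypothetical stand-ins for your own CPU-bound task, and the
number of workers is an arbitrary choice:

```python
from multiprocessing import Pool


def simulate(seed):
    """Hypothetical CPU-bound task: a deterministic sum that depends on seed."""
    total = 0
    for i in range(1, 1001):
        total += (seed * i) % 7
    return total


def run_parallel(seeds, n_workers=2):
    """Apply simulate() to every seed, spreading the calls over worker processes."""
    with Pool(processes=n_workers) as pool:
        # map() splits the inputs across the workers and returns results in order
        return pool.map(simulate, seeds)


if __name__ == "__main__":
    # The guard stops worker processes from re-running this block on import
    print(run_parallel(range(8)))
```

Because each worker process has its own memory space, results have to be
returned from the function (as here) rather than written to a shared
variable.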

Running Python scripts on IC HPC
--------------------------------

*These instructions also apply, with suitable modifications, to R
scripts.*

### Preparing the scripts for running on the HPC

The script you submit needs a shebang line (telling the system which
shell to run it with, usually bash), `#PBS` directives that request
resources from PBS (such as walltime, number of processors, and memory),
and a command that runs your Python script. The bash script could
look something like this:

Or, to avoid exceeding the memory allocation, you can move the output
files one by one (the `.p` extension indicates that you used
`pickle` to dump the results):

```bash
# Move each pickled result file to the output directory, one at a time
for f in *.p; do
    echo "Processing $f..."
    mv "$f" "$WORK/TestPyHPC/output/"
done
```

NOTE: Most of the cx1 nodes have multiple cores, so there’s no fixed
memory assigned to each core. If you use more memory than you requested
in your `#PBS` directive, your job is likely to be
terminated. If you request more memory than is available, the job will
remain queued until sufficient memory is free for the job to run.
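To choose a sensible memory request, it can help to measure how much
memory your script needs in a small test run first. The following is a
minimal sketch using the standard-library `resource` module (Unix only);
`peak_memory_mb` is a hypothetical helper name, not part of any HPC
tooling, and the list allocation is just a placeholder workload:

```python
import resource
import sys


def peak_memory_mb():
    """Return this process's peak resident memory so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    if sys.platform == "darwin":
        return peak / (1024 ** 2)
    return peak / 1024


if __name__ == "__main__":
    data = [0.0] * 1_000_000  # placeholder for your real workload
    print(f"Peak memory: {peak_memory_mb():.1f} MB")
```

The reported figure covers only the current process, so leave some
headroom when turning it into a memory request.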

Your HPC-enabled Python code could look like this:

Note the lines in this Python code where you read the environment
variables, so that the script knows the working directory and where to
write output files.
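As a minimal sketch of such a script, assuming the
`$WORK/TestPyHPC/output` directory used in the bash loop above, the
environment can be read through `os.environ`. The function names, the
`results.p` file name, and the placeholder computation are all
hypothetical:

```python
import os
import pickle


def get_output_dir(env=os.environ):
    """Build the output directory path from the WORK environment variable.

    Falls back to the current directory when WORK is not set,
    for example when testing on your own computer.
    """
    work = env.get("WORK", os.getcwd())
    return os.path.join(work, "TestPyHPC", "output")


def run_simulation(n):
    """Hypothetical placeholder for the real computation."""
    return [i ** 2 for i in range(n)]


def main():
    out_dir = get_output_dir()
    os.makedirs(out_dir, exist_ok=True)  # create the directory if needed
    results = run_simulation(10)
    out_file = os.path.join(out_dir, "results.p")
    with open(out_file, "wb") as f:
        pickle.dump(results, f)  # dump results as a pickle (.p) file
    print("Results written to", out_file)


if __name__ == "__main__":
    main()
```

Reading the path from the environment rather than hard-coding it lets
the same script run unchanged locally and on the HPC.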

### Copying scripts from your computer to the HPC server

Then, secure copy the bash script file to `$HOME` on the HPC server,
following the `$ scp source host:destination` structure, e.g.:

```bash
$ scp script.sh user@login.cx1.hpc.ic.ac.uk:/home/user/whatever/script.sh
```

### Running the scripts

Open a secure shell (ssh):

```bash
$ ssh user@login.cx1.hpc.ic.ac.uk
```

where `user` is your ICL username. You will then be prompted
to enter your (ICL) password. Once on the HPC server, check for
available modules:

```bash
$ module avail
```

Your job then needs to be queued using `qsub` (PBS):

```bash
$ qsub -j eo script.sh
```

where `-j eo` is an option to join both output and error into
one file. Without this option, running the script produces a job output
file (anything that is printed in the shell terminal, e.g. by `echo`)
and an error file (related to whether the script was successful or not),
named `{scriptname}.o{jobid}` and `{scriptname}.e{jobid}`.

The `qstat` command provides information on submitted jobs
(which queue (short, medium, long), status, etc.) as well as
information on all available queues (`-q`, `-Q`).

### Using a Python script to submit jobs

PBS also allows you to submit jobs using a Python (instead of shell)
script. Look up the `qsub` manual (`man qsub`) in
the HPC terminal, or visit
<https://gist.github.com/nobias/5b2373258e595e5242d5>.

For example, the Python job script named `MyHPCPy.py` for a job named
“HelloJob” prints “Hello”:

```python
#!/usr/bin/python
# Request 1 node with 3 CPUs and 1 GB of memory
#PBS -l select=1:ncpus=3:mem=1gb
# Name the job
#PBS -N HelloJob
print("Hello")
```

To run a Python job script, you would do the same as for a bash job
script above:

```bash
$ qsub MyHPCPy.py
```

Readings & Resources
--------------------

-   The IC library gives you access to several e- and paper books on
    UNIX, some specific to Ubuntu. Browse or search and find a good
    intro book.

-   The ICL HPC wiki is a very useful resource:
    <https://wiki.imperial.ac.uk/display/HPC/Command+line>