
# High Performance Computing <span class="tocSkip"></span>

<h1>Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Local-parallel-processing" data-toc-modified-id="Local-parallel-processing-1">Local parallel processing</a></span><ul class="toc-item"><li><span><a href="#Threading-vs.-multiprocessing" data-toc-modified-id="Threading-vs.-multiprocessing-1.1">Threading vs. multiprocessing</a></span></li></ul></li><li><span><a href="#Running-python-scripts-on-IC-HPC" data-toc-modified-id="Running-python-scripts-on-IC-HPC-2">Running python scripts on IC HPC</a></span><ul class="toc-item"><li><span><a href="#Preparing-the-scripts-for-running-on-the-HPC" data-toc-modified-id="Preparing-the-scripts-for-running-on-the-HPC-2.1">Preparing the scripts for running on the HPC</a></span></li><li><span><a href="#Copying-scripts-from-your-computer-to-the-HPC-server" data-toc-modified-id="Copying-scripts-from-your-computer-to-the-HPC-server-2.2">Copying scripts from your computer to the HPC server</a></span></li><li><span><a href="#Running-the-scripts" data-toc-modified-id="Running-the-scripts-2.3">Running the scripts</a></span></li><li><span><a href="#Using-a-python-script-to-submit-jobs" data-toc-modified-id="Using-a-python-script-to-submit-jobs-2.4">Using a python script to submit jobs</a></span></li></ul></li><li><span><a href="#Readings-&amp;-Resources" data-toc-modified-id="Readings-&amp;-Resources-3">Readings &amp; Resources</a></span></li></ul></div>

This chapter introduces you to HPC in Python, including use of the [Imperial College HPC](https://wiki.imperial.ac.uk/display/HPC/High+Performance+Computing).

## Local parallel processing

Note that there are a number of ways in which you can develop HPC implementations for your code locally (on your own computer). We will not cover these, but here is a list of particularly useful approaches/tools:

* [IPython `parallel`](https://ipython.org/ipython-doc/3/parallel)

* Multi-threading, using the [`threading` package](https://docs.python.org/3/library/threading.html)

* Using multiple processors with the [`multiprocessing` package](https://docs.python.org/2/library/multiprocessing.html)

### Threading vs. multiprocessing

The difference between threading and multiprocessing is that threads share the same memory allocation, while processes have separate memory allocations. This makes it harder to share information between processes with multiprocessing, but it is still a useful approach for quick and dirty parallelization. When better communication between processes is required, more sophisticated solutions such as MPI and OpenMP may be needed. The MPI (Message Passing Interface) standard/protocol can be used in Python to parallelize your code over multiple processors through the [`mpi4py` package](http://mpi4py.scipy.org/docs/usrman/index.html). You can also parallelize numpy array loops with [`cython` and OpenMP](http://www.perrygeo.com/parallelizing-numpy-array-loops-with-cython-and-mpi.html).
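The memory difference between threads and processes can be illustrated with a small sketch. It assumes Python 3 and uses only the standard library; the names `shared`, `record`, `run_in_threads`, and `square` are made up for this illustration. Threads append directly to a list in the parent's memory, whereas worker processes have their own memory, so their results must be sent back explicitly (here `Pool.map` does this by returning the values).

```python
import multiprocessing as mp
import threading

shared = []  # module-level list living in this process's memory


def record(value):
    """Append a value to the module-level list."""
    shared.append(value)


def run_in_threads(values):
    """Run record() in one thread per value; all threads see `shared`."""
    threads = [threading.Thread(target=record, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared)


def square(x):
    """Toy workload for the worker processes."""
    return x * x


if __name__ == "__main__":
    # Threads wrote straight into the parent's list
    print(run_in_threads([1, 2, 3]))  # prints [1, 2, 3]

    # Worker processes cannot write into the parent's memory;
    # Pool.map collects their return values and sends them back instead
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]
```

The `if __name__ == "__main__":` guard matters for multiprocessing: on platforms that start workers by re-importing the script, it stops each worker from launching its own pool.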

## Running python scripts on IC HPC

*These instructions also apply, with suitable modifications, for R scripts.*

### Preparing the scripts for running on the HPC

The script you will run needs a shebang line (telling the system which shell to use, usually bash), must request resources from PBS (such as walltime, number of processors, and memory, using `#PBS` directives), and must tell PBS which Python script to run. The bash script could look something like this:

```bash
#!/bin/bash

#lines declaring parameters to request from HPC:

## tell the batch manager to limit the walltime for the job to given hh:mm:ss
#PBS -l walltime=06:30:00 

## tell the batch manager to use 1 node with 1 cpu (total 1*1 cpus) and 4000mb of memory per node
#PBS -l select=1:ncpus=1:mem=4000mb
## *NOTE: serial jobs do not require a number of cpus*

## Name your job (optional, but can be convenient)
#PBS -N Py_test_1

## Send an email when the job aborts (a), begins (b), or ends (e)
#PBS -m abe
## See man qsub for more on the a, b, e options

## Specify email address (multiple addresses can be set; look up man qsub)
#PBS -M your.email@imperial.ac.uk

## Load Python. The default is 2.7.3; change the version number below if
## needed (Python 3 is supported)
module load python/2.7.3

# general tools
module load intel-suite
## Intel math kernel must be loaded at run time for compiling etc. 

echo "Python is about to run"

python "$WORK/TestPyHPC/MyHPCScript.py"
## tells the batch manager to execute MyHPCScript.py in 
## TestPyHPC using python

# Move the output files (result*) to the output directory
echo "Moving output files"
mv result* "$WORK/TestPyHPC/output/"

echo "Python has finished running"

```

Alternatively, you can move the files one by one to avoid exceeding the memory allocation (the `.p` extension indicates that the results were dumped with `pickle`):

```bash
for f in *.p; do
    echo "Processing $f..."
    mv "$f" "$WORK/TestPyHPC/output/"
done
```

Note that to use python 3, you will need [Anaconda](https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/support/applications/python). 

NOTE: Most of the cx1 nodes have multiple cores, so there is no fixed amount of memory assigned to each core. If your job uses more memory than you requested in your `#PBS` directive, it is likely to be terminated. If you request more memory than is currently available, the job will remain queued until sufficient memory is free for it to run.
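One way to choose a sensible `mem=` value is to measure the peak memory of a small test run before submitting the full job. The following is a minimal sketch, assuming a Linux or macOS machine (the standard-library `resource` module is Unix-only); `peak_memory_mb` is a hypothetical helper name and the list comprehension is only a stand-in for your real workload.

```python
import resource
import sys


def peak_memory_mb():
    """Return the peak resident memory of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / float(divisor)


if __name__ == "__main__":
    data = [float(x) for x in range(10 ** 6)]  # stand-in for the real workload
    print("Peak memory: {:.0f} MB".format(peak_memory_mb()))
```

Requesting somewhat more than the measured peak leaves headroom for larger inputs without asking for so much that the job waits in the queue.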

Your HPC-enabled Python code could look like this:

```python
# -*- coding: utf-8 -*-
"""
Created on Wed Nov 02 16:20:48 2017

@author: Samraat Pawar

"""
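import os  # to get environment variables

home = os.getenv("HOME")  # your home directory on the HPC

# PBS sets PBS_ARRAY_INDEX for array jobs; fall back to 1 if it is not set
i = int(os.getenv("PBS_ARRAY_INDEX", "1"))

#### Functions block start ####
def do_simulation(index):
    """Placeholder simulation: replace the body with your own code."""
    results = [index, index ** 2]
    return results
#### Functions block end ####

results = do_simulation(i)

# Save the results as one CSV row; the MyProject directory must already exist
out_path = os.path.join(home, "MyProject", "results_HPC.csv")
with open(out_path, "w") as f:
    f.write(",".join(str(r) for r in results) + "\n")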
```
Note the lines in this Python code that query the environment (with `os.getenv`) so that the script knows its working directory and where to write output files.
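`PBS_ARRAY_INDEX` is set by PBS for each sub-job of an array job (requested in PBS Pro with the `-J` option, for example `#PBS -J 1-6`; check `man qsub` on your system). A common pattern is to use the index to pick one parameter combination per sub-job. A minimal sketch, in which the parameter names and values (`temperatures`, `replicates`) and the helper `params_for_index` are hypothetical:

```python
import itertools
import os

# Hypothetical parameter grid: 3 temperatures x 2 replicates = 6 combinations
temperatures = [10, 20, 30]
replicates = [1, 2]
grid = list(itertools.product(temperatures, replicates))


def params_for_index(index):
    """Map a 1-based array index to one (temperature, replicate) pair."""
    if not 1 <= index <= len(grid):
        raise ValueError("index {} is outside 1..{}".format(index, len(grid)))
    return grid[index - 1]


if __name__ == "__main__":
    # Default to 1 so the script also runs outside PBS, e.g. when testing locally
    i = int(os.getenv("PBS_ARRAY_INDEX", "1"))
    temperature, replicate = params_for_index(i)
    print("Sub-job", i, "uses temperature", temperature, "replicate", replicate)
```

Including the index in each output file name keeps the sub-jobs from overwriting one another's results.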

### Copying scripts from your computer to the HPC server

Next, copy the bash script to `$HOME` on the HPC server using secure copy (`scp`):

`scp source host:destination` 

For example,

`scp script.sh user@login.cx1.hpc.ic.ac.uk:/home/user/whatever/script.sh`

### Running the scripts

Open a secure shell (ssh):

`ssh user@login.cx1.hpc.ic.ac.uk`

where `user` is your ICL username. You will then be prompted to enter your (ICL) password. Once on the HPC server, check for available modules:

`module avail`

Your job then needs to be queued using `qsub` (PBS):

`qsub -j eo script.sh`

where `-j eo` is an option to join both output and error into one file. Without `-j`, running the script produces two files: a job output file (anything that is printed in the shell terminal, e.g. by `echo`) and an error file (related to whether the script was successful or not), named `{scriptname}.o{jobid}` and `{scriptname}.e{jobid}`.

The `qstat` command provides information on the submitted job (its queue, such as short, medium, or long; its status; and so on) as well as information on all available queues (`-q`, `-Q`).

### Using a python script to submit jobs

PBS also allows you to submit jobs using a Python (instead of shell) script. Look up the `qsub` manual (`man qsub`) in the HPC terminal, or [see this](https://gist.github.com/nobias/5b2373258e595e5242d5).

For example, the Python job script named "MyHPCPy.py" for a job named "HelloJob" prints "Hello":

```python
#!/usr/bin/python
#PBS -l select=1:ncpus=3:mem=1gb
#PBS -N HelloJob
print("Hello")
```

To run a Python job script you would do the same as for a bash job script above:

`qsub MyHPCPy.py`
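A related approach is to generate bash job scripts from Python and hand them to `qsub`, which is convenient when many similar jobs differ only in their resources or command. The following is a minimal sketch under stated assumptions: `qsub` must be on the `PATH` (i.e. you run this on the HPC login node), and `make_job_script` and `submit` are hypothetical helper names, not part of PBS.

```python
import subprocess
import tempfile


def make_job_script(name, walltime, ncpus, mem, command):
    """Return the text of a bash job script with #PBS directives."""
    lines = [
        "#!/bin/bash",
        "#PBS -N {}".format(name),
        "#PBS -l walltime={}".format(walltime),
        "#PBS -l select=1:ncpus={}:mem={}".format(ncpus, mem),
        command,
    ]
    return "\n".join(lines) + "\n"


def submit(script_text):
    """Write the script to a temporary file and pass it to qsub.

    Returns whatever qsub prints (normally the job identifier).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
    output = subprocess.check_output(["qsub", f.name])
    return output.decode().strip()


if __name__ == "__main__":
    script = make_job_script("Py_test_1", "00:30:00", 1, "1gb",
                             'python "$WORK/TestPyHPC/MyHPCScript.py"')
    print(script)
    # print(submit(script))  # uncomment on the HPC, where qsub is available
```

Printing the generated script before submitting it makes it easy to check the directives by eye.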

## Readings & Resources

* The [ICL HPC wiki](https://wiki.imperial.ac.uk/display/HPC/Command+line) is a very useful resource.