## tldr

In brief (see the figure below), we will be working on a jupyter notebook server for the duration of this course.
We will access this notebook server through our local web browser. The server will be running inside a singularity container on a biowulf compute node.
The singularity container will be run from a persistent tmux session.
In order to access the jupyter notebook server from our local browser we will need to use ssh to forward the appropriate port from the compute node back to our local computer.

![](./course_setup_overview.png)

## Course setup overview

In order to work remotely with python we will use a jupyter notebook server on the NIH HPC systems. This allows us to keep track of the commands we enter in a notebook. The notebook server also allows us to edit text, run terminals, and conveniently upload and download files with a graphical interface. In addition, we could easily switch this setup to work with R, Julia, or [many other](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) programming languages using the same interface.

In order to provide a consistent environment for all students we will run the jupyter notebook server from within a [singularity](http://singularity.lbl.gov) container. Container technologies (Docker and singularity) allow us to isolate a software stack from the host operating system. This means that, using the same container, we can rerun our analysis on most operating systems and computers. There are limits to this portability, which you will learn about during the course, but overall containers are a very effective means of rapidly sharing a distinct environment in which to run an analysis.

A setup that works for everyone taking the course (NIH employees as well as those with guest accounts from outside) is to run the aforementioned singularity container on a [biowulf](https://hpc.nih.gov/systems/) compute node. This makes use of the superb resources available from the NIH HPC and avoids the problems of attempting to work reproducibly across many personal computers with varying degrees of installation privileges.

As we are working remotely we can be vulnerable to internet connectivity issues. To reduce the effect of drops in connectivity we will use [tmux](http://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/) with its session management, which allows us to maintain a persistent shell session on the compute node independent of an internet connection. Without this software, we would lose our work each time our internet connectivity dropped. Once the tmux session is set up, we will only need to reconnect to it if we need to change something with our launched singularity container/jupyter notebook server (hopefully this won't happen). If connectivity does drop, we simply need to reopen a connection (with the appropriate port forwarding to our compute node).

The following setup will work through the steps required for obtaining persistent access to a computing node in the biowulf cluster and running our jupyter notebook from the singularity container.

## Course setup (with all the details)

In order to run the jupyter notebook we must:
+ Set up ssh keys
+ Set up a persistent tmux session on the biowulf headnode
+ Request the resources of a compute node
+ Run the jupyter notebook server in a singularity container
+ Set up an ssh tunnel with port forwarding to the compute node

### Setting up ssh keys to work on the NIH cluster.

**NOTE:** Windows users will need to use putty to connect remotely to the hpc systems as described [here](https://docs.google.com/document/d/11Byl0wZ5FSqaj3lhMPlDmwaFUP-xQ8Cm8EgNEaBjmgw/edit#).

The setup for this course requires a number of hops using ssh in order for us to access the biowulf compute node. For unix-like OSes we can streamline this process to avoid typing in our password for each jump we make. We will achieve this by using ssh-agent to manage an encryption key. This process can be roughly broken down into the following steps:

+ Generating a keypair
+ Transferring the public key to .ssh on the target system.
+ Use ssh-agent to manage the private key.
+ Use a config file to configure our ssh behaviour

#### Generating a keypair

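If you have not used ssh from your computer before you will have some extra steps to carry out. The first of these is to create the ~/.ssh directory (the -p option makes this a no-op if the directory already exists):

In [None]:
mkdir -p ~/.ssh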

In order to create an ssh encryption key pair we type:

In [None]:
cd ~/.ssh
ssh-keygen -f nih_ssh_key

Follow the prompts and be sure to create a passphrase.

#### Transferring the public key to .ssh on the target system.  

Before we transfer our key we must check that the .ssh directory exists for our hpc account.
In the commands below you will need to change 'USER' to the username you have on the NIH hpc systems.

In [None]:
ssh USER@helix.nih.gov

When connected to helix, check if the directory exists by typing the following (ls will report "No such file or directory" if the directory does not exist):

In [None]:
ls ~/.ssh

If the directory does not exist you must make it:

In [None]:
mkdir ~/.ssh

Now that we know we have a .ssh directory for our account, we can log out of helix (type "exit") and, from ~/.ssh on our local machine, transfer our public key across to helix:

In [None]:
scp nih_ssh_key.pub USER@helix.nih.gov:~/.ssh/

Append the key to the authorized_keys file on the target system (once again, if authorized_keys does not exist you can create it first with "touch authorized_keys"):

In [None]:
ssh USER@helix.nih.gov
    cd .ssh
    cat nih_ssh_key.pub >> authorized_keys
    chmod go-rwx authorized_keys
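
The append step above adds a duplicate line every time it is repeated, and sshd can ignore authorized_keys when the file or its directory is writable by other users. The following is a minimal sketch of an append that is safe to repeat and sets restrictive permissions. The function name install_pubkey, the placeholder key line, and the throwaway directory are assumptions made for illustration; they are not part of the course material.

```shell
# install_pubkey is a hypothetical helper, not an existing tool.
# Usage: install_pubkey PUBLIC_KEY_FILE SSH_DIRECTORY
install_pubkey() {
    pubkey_file=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"                      # owner-only access to the directory
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"      # owner-only read/write on the file
    # -x: match whole lines, -F: fixed strings, -f: read the pattern from a file
    if ! grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
}

# Demonstration in a throwaway directory with a placeholder (not real) key line.
demo_dir=$(mktemp -d)
echo "ssh-ed25519 AAAAexampleonly demo@localhost" > "$demo_dir/demo_key.pub"
install_pubkey "$demo_dir/demo_key.pub" "$demo_dir/dot_ssh"
install_pubkey "$demo_dir/demo_key.pub" "$demo_dir/dot_ssh"   # second call adds nothing
wc -l < "$demo_dir/dot_ssh/authorized_keys"
```

On helix the equivalent call would be "install_pubkey ~/.ssh/nih_ssh_key.pub ~/.ssh", assuming the function has been defined in that shell first.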

#### Use ssh-agent to manage the private key.

Back on our local system we can use ssh-agent for management of our private key. Type the following command:

In [None]:
ssh-add nih_ssh_key

To check it worked:

In [None]:
ssh-add -l 
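
If "ssh-add -l" does not list the key, its exit status usually tells us why: 0 when keys are listed, 1 when the agent has no keys, and 2 when no agent can be contacted. The sketch below turns that status into a message. The helper name describe_agent_status is an assumption for illustration, and the suggested remedies are common practice rather than course-specific instructions.

```shell
# describe_agent_status is a hypothetical helper that translates the
# exit status of "ssh-add -l" into a readable message.
describe_agent_status() {
    case "$1" in
        0) echo "agent reachable, at least one key loaded" ;;
        1) echo "agent reachable, no keys loaded (run: ssh-add nih_ssh_key)" ;;
        2) echo "no agent reachable (start one with: eval \"\$(ssh-agent -s)\")" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Only query the agent if ssh-add is installed on this machine.
if command -v ssh-add >/dev/null 2>&1; then
    ssh-add -l >/dev/null 2>&1
    describe_agent_status "$?"
fi
```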

#### Use a config file to configure our ssh behaviour

We can edit the file ~/.ssh/config (create one if it doesn’t already exist) in order to configure how we ssh to different hosts (helix, felix, biowulf), which can save lots of time. Forwarding X11 and the ssh agent is useful, so we will put these options in our config file along with an entry for helix:

In [None]:
ForwardX11 yes
ForwardAgent yes

Host helix
    HostName helix.nih.gov
    User USER
    IdentityFile ~/.ssh/nih_ssh_key

### Setting up a persistent tmux session on the biowulf headnode

First ssh to helix, and from there, ssh to the biowulf head node:

In [None]:
ssh helix
    ssh biowulf

Now that we are on the biowulf head node we should set up a persistent tmux session.
A basic overview of tmux commands can be found [here](https://tmuxcheatsheet.com).
We need to first load tmux before we can use it:

In [None]:
module load tmux
tmux new -s jupyter_server

Now that we are running a tmux session everything we do will be persistent 
i.e. we can reconnect to it even if we close/lose our connection to biowulf.
We will now request resources from the cluster for our analysis...
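
To leave the session running in the background, detach from it with Ctrl-b followed by d. When coming back later, the right command depends on whether the session still exists. The sketch below prints the matching command rather than running it, so nothing is launched unexpectedly. The function name tmux_command_for is an assumption for illustration.

```shell
SESSION="jupyter_server"

# tmux_command_for is a hypothetical helper. It prints the tmux command
# that fits the current state: attach when the named session already
# exists, otherwise create it. It does not start tmux itself.
tmux_command_for() {
    if tmux has-session -t "$1" 2>/dev/null; then
        echo "tmux attach -t $1"
    else
        echo "tmux new -s $1"
    fi
}

tmux_command_for "$SESSION"
```

On the biowulf head node this would be run after "module load tmux", and the printed command can then be typed or pasted as-is.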

### Request the resources of a compute node

The command below requests 2 cpus and 10GB of RAM for 2000 minutes:

In [None]:
sinteractive --mem=10g --cpus-per-task=2 -t 2000
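
The value given to -t is a number of minutes. Slurm, the scheduler behind sinteractive, also accepts the days-hours:minutes:seconds notation, which is easier to read for long allocations. The sketch below converts between the two; the helper name minutes_to_slurm is an assumption for illustration, and it is assumed that sinteractive passes the time value through to Slurm unchanged.

```shell
# minutes_to_slurm is a hypothetical helper: it converts a whole number
# of minutes into Slurm's days-hours:minutes:seconds notation.
minutes_to_slurm() {
    total=$1
    days=$((total / 1440))             # 1440 minutes in a day
    hours=$(((total % 1440) / 60))     # whole hours left after the days
    mins=$((total % 60))               # minutes left after the hours
    printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}

minutes_to_slurm 2000   # prints 1-09:20:00
```

So the request above lasts 1 day, 9 hours and 20 minutes.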

This command will provide a terminal on an interactive node on biowulf. Take note of your nodename in the command prompt. For example:

[username@cn3092 ~]$

### Run the jupyter notebook server in a singularity container.

We can start the server by typing the following command, specifying an available port. The port to use has been added next to your name on https://github.com/nih-fmrif/nimh_repro_wrkshpAug2017/blob/master/participants.txt :

In [None]:
module load singularity
singularity exec /data/classes/RepNeurSci/images/nih-workshop-2017-latest.img jupyter notebook --no-browser --port=[put your port number here]

### Set up an ssh tunnel with port forwarding to the compute node

With the server running, and having taken note of the port number it is running on,
we now need to create a connection that allows the notebook server to communicate with our own computer.
Without closing the old terminal window, we type the following into a new terminal window on our local machine
(you’ll need to use the node name and port number relevant to you):

In [None]:
scp helix:/data/DSST/scripts/connect_with_port .
bash connect_with_port [your node name] [your port number]

It’ll look like: bash connect_with_port cn3092 9090
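
The contents of connect_with_port are not reproduced here. As a rough illustration of what a tunnel of this kind involves, the sketch below builds a chained ssh command that forwards the same port at every hop (local machine to helix, helix to biowulf, biowulf to the compute node). This is an assumption about the general technique, not a copy of the course script; the function name build_tunnel_cmd is hypothetical, and the function only prints the command.

```shell
# build_tunnel_cmd is a hypothetical helper; it only prints a command.
# Usage: build_tunnel_cmd NODE_NAME PORT
build_tunnel_cmd() {
    node=$1
    port=$2
    # Reject anything that is not a plain number.
    case "$port" in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    # Ports below 1024 are privileged; 65535 is the highest valid port.
    if [ "$port" -lt 1024 ] || [ "$port" -gt 65535 ]; then
        echo "port must be between 1024 and 65535" >&2
        return 1
    fi
    fwd="-L $port:localhost:$port"
    # Each hop forwards its own localhost:PORT to localhost:PORT on the next host.
    echo "ssh -t $fwd helix ssh -t $fwd biowulf ssh $fwd $node"
}

build_tunnel_cmd cn3092 9090
```

With the example values this prints "ssh -t -L 9090:localhost:9090 helix ssh -t -L 9090:localhost:9090 biowulf ssh -L 9090:localhost:9090 cn3092". The host names helix and biowulf rely on the ssh config entry and the hop order used earlier in this setup.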

### Connect to the jupyter notebook session from our local machine

Finally, go back to the original terminal where we started the server. 
Copy the link displayed and paste it into a local web browser.
Save this link because this is what you will need to connect to the server later too.

## Subsequent reconnection to the jupyter server

Assuming all of the steps detailed in the course setup have been followed, the jupyter notebook server will be accessible
for as long as the compute node resources are allocated to you.
If you need to connect to the server after being disconnected:

a) Reconnect to the node with the ssh port forwarding, as described in the section on setting up an ssh tunnel to the compute node.

b) You can refresh the localhost web-page you had open connecting to the server,
or select "Kernel > Reconnect" from the notebook menu.

Note: If you have closed your browser, you will have to paste the link you saved when first connecting to the jupyter notebook session into your browser.