### HPC Notebook Assignment

This notebook is meant to be run on the HPC cluster to show that you have successfully set up SSH tunneling for jupyter notebooks by submitting a job script that starts a notebook server, and that you have installed a working conda environment. Read the instructions below to learn how to run a notebook server on the remote cluster.

The instructions in this notebook consist mostly of bash commands, which are commented out because you are meant to enter them in a separate terminal. The one exception is a cell of Python code at the very end, which you will execute after you have successfully opened this notebook on the remote cluster.

### Required software
The following package must be installed on the remote computer (HPC cluster). 

In [30]:
# conda install notebook 

### Fork and clone the 7-ssh-pipelines repo into your PDSB/ directory on remote

In [5]:
# <fork the repo>
# mkdir PDSB/
# cd PDSB/
# git clone https://github.com/<gh-username>/7-ssh-pipelines
# cd ~

### SSH Tunneling with jupyter
So far we have only run jupyter notebooks locally on our own machines. However, you may remember that in class I have referred to jupyter as a **notebook server**. The word server here means the same thing as it does when we talk about a server that hosts websites: something that can send and receive information over the internet, with the results rendered in a browser. Jupyter notebooks are a visual representation of the outputs produced as information is sent back and forth to the server. Because jupyter notebook servers can exchange information over the internet, we can run a server remotely, such as on the HPC cluster, and still interact with the notebook as if it were running on our local machine by connecting to it through a browser. In this notebook we will walk through how to set this up.

### Set up a password
The first thing we want to do after installing jupyter is to set up a password so that we can securely connect to it over the internet. To do this, connect to your remote cluster using `ssh` and run the command `jupyter notebook password`. Then enter your password; jupyter stores a hashed version of it and uses that to authenticate you when you connect.

In [31]:
# jupyter notebook password

### jupyter server arguments
When we start a jupyter notebook server locally we usually just type `jupyter-notebook` at the command line. To start a remote server we need to add a few arguments that tell the server where to listen and how to start:

+ The argument `--ip` and the IP address of the remote host we are running the server from.
+ The argument `--port` and the port that we are sending information over (this can be any unused port number between 8000 and 9999).
+ The argument `--no-browser` to tell it not to try to open a browser on the remote terminal. 

Below is an example of the command we would use to start a remote notebook server. In this case we put the command `hostname -i` inside of `$( )`, which is bash command substitution: the output of that command is inserted into the command line in its place. The result will be the IP address of whichever node the command is run on. **We don't want to run this command yet**, however, since we are still on the head node. Instead, we will write a slurm script to submit this command to run on a compute node.

In [32]:
# jupyter-notebook --ip=$(hostname -i) --port=8888 --no-browser
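
The three arguments listed above can also be assembled programmatically. The Python sketch below is only an illustration of how the pieces fit together: the helper name `notebook_command` is hypothetical, the range check simply mirrors the 8000 to 9999 range suggested in this notebook, and the IP address in the usage lines is a made-up stand-in for the output of `hostname -i`.

```python
def notebook_command(ip, port):
    """Build the jupyter-notebook argument list for a remote server.

    Hypothetical helper for illustration; not part of jupyter itself.
    """
    # Mirror the port range suggested in this notebook.
    if not 8000 <= port <= 9999:
        raise ValueError("port should be between 8000 and 9999")
    return [
        "jupyter-notebook",
        "--no-browser",     # do not try to open a browser on the remote node
        f"--ip={ip}",       # address of the node running the server
        f"--port={port}",   # port the server listens on
    ]


# Example values only: 10.0.0.5 stands in for the output of `hostname -i`.
cmd = notebook_command("10.0.0.5", 8888)
print(" ".join(cmd))
```

Building the command as a list keeps each argument separate, which is the form that `subprocess.run` expects if you ever want to launch it from Python.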

## The job submission script
Below is the slurm submission script `jupyter-edu.sbatch` that I have placed in our shared scratch space. It has the following commands: 

SBATCH commands:
+ The walltime for the job is 1 hour
+ The log file (output) will be saved in `outputs/`
+ The output file will be named `slurm-<jobid>-<jobname>.out`
+ It will request one core

Code: 
+ changes into `$HOME` and sets the variable `XDG_RUNTIME_DIR` to "" (this is to fix a known bug of jupyter & slurm)
+ stores variables for a random port number and the job node IP address
+ prints a statement with `echo` to the output file with instructions for connecting to the notebook server once it is started
+ starts jupyter running on that specific port and IP address


------------------------------------------

```bash
#!/bin/bash
#SBATCH --account=edu
#SBATCH --time=1:00:00
#SBATCH --job-name=notebook
#SBATCH --workdir=outputs
#SBATCH --output=slurm-%j-%x.out
#SBATCH -c 1

## cd home and clear the XDG runtime directory variable
cd "$HOME"
export XDG_RUNTIME_DIR=""

## get random port and current IP
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)

## print tunneling instructions
echo -e "
   Copy/Paste this in your local terminal to ssh tunnel with remote
   ----------------------------------------------------------------
   ssh -N -L $ipnport:$ipnip:$ipnport user@host                    
   ------------------------------------------------------------------

   Then open a browser on your local machine to the following address
   ------------------------------------------------------------------
   localhost:$ipnport  (prefix w/ http://, or https:// if SSL is configured)
   ------------------------------------------------------------------
   "

## start the notebook (no whitespace after the '\' marks).
jupyter-notebook --no-browser \
                 --ip=$ipnip \
                 --port=$ipnport
```
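
The `shuf` line in the script picks a port at random but does not check whether another process on the node is already using it. As a rough illustration of the same idea with a check added, the Python sketch below draws a port from the 8000 to 9999 range and tests it by trying to bind a socket. The function names `port_is_free` and `pick_port` are hypothetical helpers and are not part of the course script.

```python
import random
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to (host, port) right now.

    The empty host means "all interfaces", so an existing listener on any
    address of this machine makes the bind fail.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(low=8000, high=9999, attempts=20):
    """Draw random ports in [low, high] until one is free."""
    for _ in range(attempts):
        port = random.randint(low, high)
        if port_is_free(port):
            return port
    raise RuntimeError("no free port found in the requested range")


port = pick_port()
print(port)
```

The check is only a snapshot: another process could still claim the port between the test and the moment the server starts, so treat it as a way to reduce collisions rather than to rule them out.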

## Submit the job script
Next use `sbatch` to submit the job script on the remote cluster. Then follow the instructions in the output file to connect to jupyter from the browser on your local machine. This is demonstrated in the GIF below, though it is a bit hard to see. Here is what is happening:

1. Call `sbatch` on **remote** to submit the script from `/rigel/edu/w4050/files/jupyter-edu.sbatch`. 
2. Use `cat` on **remote** to read the output from `./outputs/slurm-<jobid>-notebook.out`
3. Use `ssh` on **local** to run the command from the output file, which forwards the notebook server's port on the remote cluster to your local machine so that it is available in your browser. The last argument to the `ssh` command is the login for `habanero`, which is simply `habanero` since we set up our config file earlier.
4. Go to any browser on your **local** machine and open `localhost:<port>`, using the port number from the output file you read in step 2.

![../Lecture/ssh-habanero7.gif](../Lecture/ssh-habanero7.gif)
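
If the output file is long, the tunneling command can also be pulled out of it with a few lines of Python. This is a minimal sketch, assuming the output file contains a line in the `ssh -N -L <port>:<ip>:<port> <login>` form printed by the job script. The function name `parse_tunnel_command` is hypothetical, and the port and IP address in the sample text are made-up values.

```python
import re

# Matches lines of the form: ssh -N -L <port>:<node-ip>:<port> <login>
TUNNEL_RE = re.compile(r"ssh -N -L (\d+):([\w.\-]+):(\d+) (\S+)")


def parse_tunnel_command(log_text):
    """Return the tunnel details found in the job's output text, or None."""
    match = TUNNEL_RE.search(log_text)
    if match is None:
        return None
    local_port, node_ip, remote_port, login = match.groups()
    return {
        "command": match.group(0),          # the full ssh command to run on local
        "local_port": int(local_port),
        "node_ip": node_ip,
        "remote_port": int(remote_port),
        "login": login,
        "url": f"localhost:{local_port}",   # address to open in the browser
    }


# Made-up sample of what the output file might contain.
sample = """
   Copy/Paste this in your local terminal to ssh tunnel with remote
   ----------------------------------------------------------------
   ssh -N -L 8421:10.0.0.5:8421 user@host
   ----------------------------------------------------------------
"""
info = parse_tunnel_command(sample)
print(info["command"])
print(info["url"])
```

Remember to replace the `user@host` placeholder with your own login (here, `habanero`) before running the command on your local machine.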

### Assignment

1. Follow the instructions above to open a jupyter notebook in your home directory on remote through an SSH tunnel.
2. `cd` into your forked repository of `7-ssh-pipelines`.
3. Copy this notebook into the `Assignment/` directory as `Assignment/<gh-username>-7.3.ipynb`.
4. In jupyter, open the new notebook copy.
5. Go all the way to the bottom of the notebook (the cell below) and execute the cell to prove that you successfully connected to this notebook on habanero.
6. Save the notebook. Use `git` to add, commit, and push the notebook to GitHub and make a pull request. You should add only the new Assignment file and no other files, as shown below.


The code below demonstrates how to submit the assignment:

-------------------------------------

```bash
## copy notebook to Assignment dir/
cp Notebooks/nb-7.3-tunneling.ipynb Assignment/<gh-username>-7.3.ipynb

## run the notebook
## ...

## then add, commit, and push your Assignment notebook
git add Assignment/<gh-username>-7.3.ipynb
git commit -m "added notebook assignment from habanero"
git push origin master
```

### Run this final code before submitting the notebook to show you completed it.
And don't forget to save.

In [1]:
import socket
import os

print("hostname: {}".format(socket.gethostname()))
print("location: {}".format(os.path.realpath("./")))

hostname: node219
location: /rigel/home/arp2195/PDSB/7-remote-subprocess/Assignment


### Some final information about stopping the notebook server
As long as the slurm job is running, your notebook server will remain active (up to the 1 hour walltime requested in the script). You can connect to it, disconnect from it, and reconnect as often as you like by starting or stopping the `ssh` tunneling command (i.e., the `ssh -N -L ...` command running on local). To stop the server on the remote, find the job ID with `squeue -u <UNI>` and cancel the job early with `scancel <jobid>`.
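
When reconnecting, it can be handy to check from your local machine whether the tunnel is currently up. The sketch below is one possible way to do that in Python, by testing whether anything accepts TCP connections on the forwarded local port. The function name `tunnel_is_open` is hypothetical, and port 8888 in the usage line is only an example; use the port from your own output file.

```python
import socket


def tunnel_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


# Example port only; substitute the port printed in your output file.
print(tunnel_is_open(8888))
```

A `True` result only means that the local end of the tunnel is listening. It does not by itself confirm that the slurm job and the notebook server behind it are still running, so `squeue -u <UNI>` on the remote remains the authoritative check.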