# Getting Started with HPCs and GitHub

The following notebook will outline how to:
- connect to the Bridges High Performance Computer at the Pittsburgh Supercomputing Center
- use GitHub and Git to transfer files from local machine to HPC
- load modules and install packages
- run a job on Bridges (batch or OnDemand)


### Connecting to Bridges

The first thing you need to connect to Bridges is an account that gives you access.  Most of you will have created an XSEDE user account at this point, which requires two-factor authentication (2FA) to log in to systems through the [Single Sign On (SSO) Hub](https://portal.xsede.org/web/xup/single-sign-on-hub).  For more information on XSEDE 2FA go [here](https://portal.xsede.org/web/xup/single-sign-on-hub).  To avoid having to keep an additional device or login method on hand, you can set up a password specific to Bridges [here](https://apr.psc.edu/autopwdreset/autopwdreset.html).

Once you have your PSC credentials you will be able to SSH into Bridges from any SSH client.  SSH (Secure Shell) is an encrypted way to access remote systems, so your data is safe in transit.  For more info on this go [here](https://www.psc.edu/about-using-ssh).

Windows users will need an SSH client application.  There are many options available, but for brevity I will offer two.  The first is to download [PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html); the second, a Linux virtual machine, uses the terminal instructions further down.  Once you download and install PuTTY, you should see this screen when you open it.

![PuTTY Login Screen](/images/putty_default_login.png)

All you need to do is enter _username@bridges.psc.edu_ (with your own username) in the Host Name box and then click __Open__.  If the connection succeeds, you should see a screen like this.

![PuTTY Fingerprint](/images/putty_rsa_accept.png)

Click Yes.  You should then be prompted for your newly minted PSC password.

![PuTTY Password](/images/putty_password.png)

Finally, you will see a login screen and land in your `$HOME` directory on Bridges.

There are two file spaces that you will regularly use on Bridges.  `$HOME` should be used for your GitHub repos, which will contain all of the scripts needed to run your analyses.  `$SCRATCH` should be used to store your data, as it has more space allocated and the compute nodes can access information from this pylon5 storage much faster.  To switch between directories use the `cd` (change directory) command; for example, to change from my `$HOME` to my `$SCRATCH` directory I would type `cd $SCRATCH`.

On Linux and Mac computers, or in a virtual machine running your favorite Linux distro, you do not need any additional software to SSH because the default terminal has built-in support.  Open a terminal and type `ssh username@bridges.psc.edu`.  You will be prompted to save the fingerprint, so type `yes`.  Then enter your PSC password.  I previously saved my fingerprint key, but your screen will look similar to this.
<br/>
<br/>
<br/>
<br/>
![Terminal Login](/images/terminal_login.png)
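
If you log in from a terminal often, an SSH host alias can save some typing.  The following is a minimal sketch to run on your local machine (not on Bridges); the alias name `bridges` is my own choice and `username` is a placeholder for your PSC username.

```sh
# Run on your LOCAL machine. Appends a host alias to your SSH client config.
# "bridges" is an arbitrary alias and "username" is a placeholder: use your own.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host bridges
    HostName bridges.psc.edu
    User username
EOF
chmod 600 ~/.ssh/config   # keep the file private to your account
```

After this, `ssh bridges` should behave like `ssh username@bridges.psc.edu`.  Running the snippet twice appends the entry twice, so run it only once.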


### Git and GitHub

If you haven't used Git or other version control software when working on code, or really any type of collaborative project, you should start right away.  Git is the command-line tool that records every change within a project, and GitHub is the website that hosts Git repositories.  Think of GitHub as your cloud backup for your project files.  There are a lot of great tutorials to get started with Git, but two I have found to be very helpful are [Happy Git](http://happygitwithr.com/) and [Software Carpentry](http://swcarpentry.github.io/git-novice/).  To get started we are going to fork (make our own copy of) the repo this notebook is in and put it in our Bridges `$HOME` directory.  Navigate to the [top page](https://github.com/dhbrand/reu_2018) of the reu_2018 repo.  Click the fork button
<br/>
<br/>
<br/>
![Fork](/images/github_fork.png) 

<br/>
<br/>
<br/>
and a screen will pop up that looks like this
<br/>
<br/>
<br/>
![Forking](/images/github_forking.png)

<br/>
<br/>
<br/>
Now you have a repo that you own and control. Click the green __clone or download__ button followed by the copy to clipboard button.
<br/>
<br/>
<br/>
![Clone and Copy](/images/github_clone.png)

<br/>
<br/>
<br/>
Head back over to the terminal window where you should be logged into your `$HOME` directory.  To download the __reu_2018__ repo, type `git clone` followed by a space, paste your repo URL with `Ctrl+V` (or `Cmd+V` on a Mac), and press Enter.
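
Once the clone exists on Bridges, the usual way to move script changes from your local machine to the HPC is to commit and push locally, then pull on Bridges.  The sketch below rehearses that round trip entirely on one machine: a bare repository in a temporary folder stands in for your GitHub fork, and the folder names `laptop` and `bridges`, the file `analysis.py`, and the demo identity are all made up for illustration.

```sh
set -e
demo="${TMPDIR:-/tmp}/git_transfer_demo"   # hypothetical scratch location
rm -rf "$demo" && mkdir -p "$demo"

# A bare repository plays the role of your fork on GitHub.
git init --quiet --bare "$demo/fork.git"

# "laptop" plays the role of your local machine: commit a script and push it.
git clone --quiet "$demo/fork.git" "$demo/laptop"
cd "$demo/laptop"
echo 'print("hello from my laptop")' > analysis.py
git add analysis.py
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Add analysis script"
git push --quiet origin HEAD

# "bridges" plays the role of your $HOME on Bridges: clone once ...
git clone --quiet "$demo/fork.git" "$demo/bridges"

# ... then, after each new push from the laptop, a pull brings the changes over.
cd "$demo/laptop"
echo 'print("second change")' >> analysis.py
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -am "Update analysis script"
git push --quiet origin HEAD

cd "$demo/bridges"
git pull --quiet --ff-only
cat analysis.py
```

On the real systems the clone on Bridges takes the URL you copied from GitHub and the pushes travel over the network to GitHub; the commands are otherwise the same.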

### Bridges Modules

Bridges, like many other HPC clusters, provides modules: preinstalled software such as Python and R, bundled with default packages, to get you up and running.  To see a list of the modules currently loaded, run the following command:

```sh
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => module list
Currently Loaded Modulefiles:
  1) psc_path/1.1    2) slurm/default   3) intel/17.4
________________________________________________________________________________
```

Modules like Python and R will have several different versions available (e.g. python/2.7 or python/3.5).  To find out which versions are available run:

```sh
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => module avail python

------------------------------------------ /opt/modulefiles -------------------------------------------
python/2.7.11_gcc              python/intel_2.7.13            python3/3.4.2
python/2.7.11_gcc_np1.11       python/intel_2.7.14            python3/3.5.2_gcc_mkl(default)
python/2.7.14_gcc5_np1.13      python2/2.7.11_gcc(default)    python3/intel_3.5.2
python/2.7.14_gcc_np1.13       python2/2.7.11_gcc_np1.11      python3/intel_3.6.2
python/2.7.14_icc_np1.13       python2/2.7.14_gcc5_np1.13     python3/intel_3.6.3
python/3.4.2                   python2/intel_2.7.12
python/intel_2.7.12            python2/intel_2.7.13
________________________________________________________________________________
```

You'll notice a default listed for python2 and python3.  If you're not sure which version to pick, you can load one of the defaults and check to make sure the module was loaded:


```sh
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => module load python3
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => module list
Currently Loaded Modulefiles:
  1) psc_path/1.1            3) intel/17.4
  2) slurm/default           4) python3/3.5.2_gcc_mkl
________________________________________________________________________________
```

We can do the same thing for R:

```sh
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => module avail R

------------------------------------------ /opt/modulefiles -------------------------------------------
R/3.2.3-mkl R/3.3.1-mkl R/3.3.3-mkl R/3.4.1-mkl
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => module load R
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => module list
Currently Loaded Modulefiles:
  1) psc_path/1.1            3) intel/17.4              5) R/3.4.1-mkl
  2) slurm/default           4) python3/3.5.2_gcc_mkl
________________________________________________________________________________
```

To load a specific version just name it in the load command:

```sh
________________________________________________________________________________
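| ~ @ br018 (dhbrand)
| => module load python/2.7.11_gcc
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => module list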
Currently Loaded Modulefiles:
  1) psc_path/1.1            3) intel/17.4              5) R/3.4.1-mkl
  2) slurm/default           4) python3/3.5.2_gcc_mkl   6) python/2.7.11_gcc
________________________________________________________________________________
```

### Install Packages

Once you have loaded your Python or R module, check which default packages came with it.

### Python

For Python you'll use _pip_.

```sh
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => pip3 list | column
Package            Version    	numpy              1.12.0b1
------------------ -----------	pamela             0.2.1
absl-py            0.2.1      	pandocfilters      1.4.2
affine             2.2.0      	parso              0.1.1
anvio              2.2.2      	pexpect            4.4.0
astor              0.6.2      	pickleshare        0.7.4
bleach             2.1.2      	pip                10.0.1
bottle             0.12.13    	prompt-toolkit     1.0.15
CherryPy           8.9.1      	protobuf           3.5.2.post1
click              6.7        	psutil             5.2.0
click-plugins      1.0.3      	ptyprocess         0.5.2
cligj              0.4.0      	Pygments           2.2.0
Cython             0.25.2     	pyparsing          2.2.0
decorator          4.0.6      	pysam              0.9.1.4
Django             1.10.6     	python-dateutil    2.6.1
entrypoints        0.2.3      	pyzmq              17.0.0
ete3               3.0.0b35   	qtconsole          4.3.1
gast               0.2.0      	rasterio           0.36.0
GDAL               1.11.0     	requests           2.13.0
grpcio             1.12.0     	scikit-learn       0.18.1
h5py               2.6.0      	scipy              0.18.1
html5lib           1.0.1      	Send2Trash         1.5.0
ipykernel          4.8.1      	setuptools         20.10.1
ipython            6.2.1      	simplegeneric      0.8.1
ipython-genutils   0.1.0      	six                1.10.0
ipywidgets         7.1.2      	snuggs             1.4.1
jedi               0.11.1     	SQLAlchemy         1.0.11
Jinja2             2.8        	tensorboard        1.8.0
jsonschema         2.6.0      	tensorflow-gpu     1.8.0
jupyter            1.0.0      	termcolor          1.1.0
jupyter-client     5.2.2      	terminado          0.8.1
jupyter-console    5.2.0      	testpath           0.3.1
jupyter-core       4.4.0      	tornado            4.3
Markdown           2.6.11     	traitlets          4.3.2
MarkupSafe         0.23       	virtualenv         15.1.0
mistune            0.8.3      	wcwidth            0.1.7
nbconvert          5.3.1      	webencodings       0.5.1
nbformat           4.4.0      	Werkzeug           0.14.1
nose               1.3.7      	wheel              0.29.0
notebook           5.4.0      	widgetsnbextension 3.1.4
________________________________________________________________________________
```

To install a package you'll want to use the `--user` option to install into your personal library.

```sh
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => pip3 install --user --progress-bar pretty tensorflow
Collecting tensorflow
  Downloading https://files.pythonhosted.org/packages/6d/dc/464f59597a5a8282585238e6e3a7bb3770c3c1f1dc8ee72bd5be257178ec/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl (49.1MB)
    100% ◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉◉ 49.1MB 174kB/s
Requirement already satisfied: protobuf>=3.4.0 in ./.local/lib/python3.5/site-packages (from tensorflow) (3.5.2.post1)
Requirement already satisfied: grpcio>=1.8.6 in ./.local/lib/python3.5/site-packages (from tensorflow) (1.12.0)
Requirement already satisfied: astor>=0.6.0 in ./.local/lib/python3.5/site-packages (from tensorflow) (0.6.2)
Requirement already satisfied: wheel>=0.26 in /opt/packages/python/Python-3.5.2-icc-mkl/lib/python3.5/site-packages (from tensorflow) (0.29.0)
Requirement already satisfied: gast>=0.2.0 in ./.local/lib/python3.5/site-packages (from tensorflow) (0.2.0)
Requirement already satisfied: termcolor>=1.1.0 in ./.local/lib/python3.5/site-packages (from tensorflow) (1.1.0)
Requirement already satisfied: six>=1.10.0 in /opt/packages/python/Python-3.5.2-icc-mkl/lib/python3.5/site-packages (from tensorflow) (1.10.0)
Collecting numpy>=1.13.3 (from tensorflow)
  Using cached https://files.pythonhosted.org/packages/7b/61/11b05cc37ccdaabad89f04dbdc2a02905cf6de6f9b05816dba843beed328/numpy-1.14.3-cp35-cp35m-manylinux1_x86_64.whl
Requirement already satisfied: absl-py>=0.1.6 in ./.local/lib/python3.5/site-packages (from tensorflow) (0.2.1)
Requirement already satisfied: tensorboard<1.9.0,>=1.8.0 in ./.local/lib/python3.5/site-packages (from tensorflow) (1.8.0)
Requirement already satisfied: setuptools in /opt/packages/python/Python-3.5.2-icc-mkl/lib/python3.5/site-packages (from protobuf>=3.4.0->tensorflow) (20.10.1)
Collecting html5lib==0.9999999 (from tensorboard<1.9.0,>=1.8.0->tensorflow)
Requirement already satisfied: markdown>=2.6.8 in ./.local/lib/python3.5/site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow) (2.6.11)
Requirement already satisfied: werkzeug>=0.11.10 in ./.local/lib/python3.5/site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow) (0.14.1)
Collecting bleach==1.5.0 (from tensorboard<1.9.0,>=1.8.0->tensorflow)
  Using cached https://files.pythonhosted.org/packages/33/70/86c5fec937ea4964184d4d6c4f0b9551564f821e1c3575907639036d9b90/bleach-1.5.0-py2.py3-none-any.whl
Installing collected packages: numpy, tensorflow, html5lib, bleach
Successfully installed bleach-1.5.0 html5lib-0.9999999 numpy-1.14.3 tensorflow-1.8.0
________________________________________________________________________________
```

You might notice I'm using `pip3`, which installs packages for the __python3__ module I have loaded. Use `pip` for __python2__.  Now you have TensorFlow installed, which is great for our test job later in the tutorial.  We need to make sure we add our user-installed packages to our `$PYTHONPATH`.  Let's start by running:

```bash
________________________________________________________________________________
| ~ @ br006 (dhbrand)
| => pip show tensorflow
Name: tensorflow
Version: 1.8.0
Summary: TensorFlow helps the tensors flow
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /home/dhbrand/.local/lib/python3.5/site-packages
Requires: absl-py, termcolor, numpy, protobuf, tensorboard, astor, wheel, six, gast, grpcio
Required-by:
________________________________________________________________________________
```

We want to add the path in the `Location` field to our `$PYTHONPATH`.  Run the following command, replacing my location with your own after __${PYTHONPATH}:__.

```bash
# Single quotes outside keep ${PYTHONPATH} unexpanded, so it is evaluated at each login
echo 'export PYTHONPATH="${PYTHONPATH}:/home/dhbrand/.local/lib/python3.5/site-packages"' >> ~/.bashrc
```
This will add a permanent path to your library for this build of Python.  If you use a different version, remember to check that its libraries are defined in your `$PYTHONPATH`.  The new entry takes effect the next time you log in or after you run `source ~/.bashrc`.  You can inspect the current value by running:

```bash
________________________________________________________________________________
| ~ @ br006 (dhbrand)
| => echo $PYTHONPATH
/opt/packages/python/Python-3.5.2-icc-mkl/lib/python3.5/site-packages:/opt/intel/advisor_2017.1.3.510716/pythonapi
________________________________________________________________________________
```
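
Appending to `~/.bashrc` more than once leaves duplicate entries in `$PYTHONPATH`.  One way to avoid that is a small helper that adds a directory only when it is missing.  This is a sketch rather than anything Bridges provides: the function name `add_to_pythonpath` is my own, and the path reuses the layout of the `Location` value shown above, so substitute yours.

```sh
# Append a directory to PYTHONPATH only if it is not already listed.
# add_to_pythonpath is a hypothetical helper name, not a Bridges command.
add_to_pythonpath() {
    new_dir="$1"
    case ":${PYTHONPATH}:" in
        *":${new_dir}:"*) ;;   # already present: leave PYTHONPATH unchanged
        *) PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${new_dir}" ;;
    esac
    export PYTHONPATH
}

# Example call; replace the path with the Location reported by `pip show`.
add_to_pythonpath "$HOME/.local/lib/python3.5/site-packages"
echo "$PYTHONPATH"
```

Placing the function and the call in `~/.bashrc` keeps the path available at every login without growing it each time the file is sourced.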

### R

For R packages you'll actually want to start R from the command line:

```r
________________________________________________________________________________
| ~ @ br018 (dhbrand)
| => R

R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
```

For brevity you might want just the packages that do not ship with R itself, along with their versions, since _installed.packages()_ returns a lot of output.

```r
> ip <- as.data.frame(installed.packages()[,c(1,3:4)])
> rownames(ip) <- NULL
> ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
> print(ip, row.names=FALSE)
              Package     Version
              acepack       1.4.1
             annotate      1.56.1
        AnnotationDbi      1.40.0
                  aod         1.3
           assertthat       0.2.0
            backports       1.1.1
            base64enc       0.1-3
                   BH    1.65.0-1
                bindr         0.1
```

You can always search for a package directly to see if it's installed; if it is, the call returns the path of the library it was installed into.

```r
> find.package("dplyr")
[1] "/opt/packages/R/3.4.1-mkl/lib64/R/library/dplyr"
> find.package("PReMiuM")
[1] "/home/dhbrand/R/x86_64-pc-linux-gnu-library/3.4/PReMiuM"
```



### Submitting jobs

The most flexible way to submit a job is using a batch script which tells the SLURM job scheduler exactly which resources you need.  You can also use the [OnDemand](https://ondemand.bridges.psc.edu/pun/sys/dashboard) portal for quick tests. Most likely you will need to run extensive jobs for machine learning and deep learning techniques.  This is what a sample batch job script looks like.

```sh
#!/bin/bash
#SBATCH -N 2
#SBATCH -p GPU
#SBATCH --ntasks-per-node 28
#SBATCH -t 5:00:00
#SBATCH --gres=gpu:p100:2

#echo commands to stdout
set -x

#move to working directory
cd $SCRATCH

#copy the input file from your pylon2 space 
#  to the working directory
cp $PROJECT/input.data .

#run GPU program
./mygpu

#copy output file to persistent storage
cp output.data $PROJECT
```

The __SBATCH__ options at the top tell the job scheduler how many nodes you need (`-N`), which partition (queue) you want to run on (`-p`), and how long you want the job to run (`-t`).  For a full list of options, see the [Running Jobs](https://www.psc.edu/bridges/user-guide/running-jobs) section of the [Bridges User Guide](https://www.psc.edu/bridges/user-guide).

Once you have written your job script, submit it with the _sbatch_ command.

```sh
sbatch my_job.sh
```

### Sample Job using TensorFlow

Since we already installed TensorFlow, let's try to submit a job using a batch script.  In the repo you downloaded there is a bash script and a Python script to run a simple TensorFlow neural net.
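
The repo's own scripts are the ones to use.  Purely as an illustration of how the pieces above fit together, here is a sketch of what such a batch script could look like.  The file names `tf_job.sh` and `tf_test.py` are hypothetical placeholders (not the names in the repo), and the resource requests simply reuse options from the sample script above with a single node and a shorter time limit.

```sh
# Write a hypothetical job script. tf_job.sh and tf_test.py are placeholder
# names chosen for this sketch; use the scripts that ship with the repo.
cat > tf_job.sh <<'EOF'
#!/bin/bash
#SBATCH -N 1
#SBATCH -p GPU
#SBATCH -t 00:30:00
#SBATCH --gres=gpu:p100:2

# echo commands to stdout
set -x

# load the same Python module used when installing packages
module load python3

# move to the cloned repo and run the (placeholder) Python script
cd "$HOME/reu_2018"
python3 tf_test.py
EOF

# Check the script for shell syntax errors before submitting it.
bash -n tf_job.sh && echo "tf_job.sh: syntax OK"

# Submit only where SLURM is available, i.e. on a Bridges login node.
if command -v sbatch >/dev/null 2>&1; then
    sbatch tf_job.sh
else
    echo "sbatch not found: submit from a Bridges login node"
fi
```

On Bridges, `squeue -u $USER` lists your queued and running jobs, and by default SLURM writes the job's output to `slurm-<jobid>.out` in the directory you submitted from.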