# Setting up the computer environment

Notes and tips on software configuration and management on the Yale HPC cluster.

## Accessing the Yale HPC cluster

Using AnyConnect to establish the connection:

* VPN: access.yale.edu
* username: dc2325
* MFA: push (after this, accept the request in the Duo Mobile app)

Using Linux terminal command:
```
sudo openconnect -u dc2325 access.yale.edu
```
Then type your password at the first password prompt and `push` at the second password prompt. After this, accept the request in the Duo Mobile app.

Before you can log in, you must provide your computer's public SSH key to the server. To do so, log in at https://secure.its.yale.edu/cas/login, then provide the key at http://gold.hpc.yale.internal/cgi-bin/sshkeys.py

To log in from the terminal:

```
ssh dc2325@farnam.hpc.yale.edu
```
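
To avoid retyping the full address, an SSH host alias can be defined on the local machine. The following is a minimal sketch, assuming OpenSSH; the alias `farnam` and the helper name `add_ssh_alias` are hypothetical choices and not part of the cluster documentation.

```shell
# Append a host alias to an SSH config file unless it is already present.
# The alias name "farnam" and this helper are illustrative assumptions.
add_ssh_alias() {
    config_file="$1"
    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    if ! grep -q '^Host farnam$' "$config_file"; then
        cat >> "$config_file" <<'EOF'
Host farnam
    HostName farnam.hpc.yale.edu
    User dc2325
EOF
    fi
}

# Usage (commented out so that pasting the block changes nothing):
# add_ssh_alias ~/.ssh/config
# ssh farnam
```

Calling the helper twice leaves a single entry, so it is safe to keep in a setup script.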

## Loading and listing modules in your environment on the cluster

```
$ module avail # For a list of modules available to use
$ module list # Displays all of the module files that are currently loaded in your environment
$ module avail python # To look for specific modules
$ module spider # Displays a description of all available modules
$ module load <name> # to load pre-installed software
$ module unload <name> # to unload
```

## Copying files/directories from and to the cluster

To copy from the cluster to your local machine, run `scp` in your local terminal. The trailing `.` copies the files into the current directory:

```
scp dc2325@farnam.hpc.yale.edu:/home/dc2325/results/pleiotropy/2020-04_bolt/BMI/*.snp_stats.bgen.gz . 
scp dc2325@farnam.hpc.yale.edu:/home/dc2325/scratch60/INT-BMI/*.stats.gz . 
```

From your local machine to the cluster:

```
scp Test_INT-BMI.txt.gz dc2325@farnam.hpc.yale.edu:/home/dc2325/results/pleiotropy/2020-04_bolt/INT-BMI/
```


# Installing software in your $HOME directory

## BOLT-LMM installation

For local installs, add these lines to your `~/.bash_profile`:
```
# local installs
export MY_PREFIX=~/software
export PATH=$MY_PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$MY_PREFIX/lib:$LD_LIBRARY_PATH
```
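
After reloading the profile, it can be useful to confirm that the new directories are actually on the search paths. A minimal sketch; the helper name `path_contains` is hypothetical.

```shell
# Return success if directory $1 is one of the entries of the colon-separated
# list $2 (defaults to $PATH). The helper name is an illustrative assumption.
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, assuming MY_PREFIX was exported as above:
# path_contains "$MY_PREFIX/bin" && echo "bin directory is on PATH"
# path_contains "$MY_PREFIX/lib" "$LD_LIBRARY_PATH" && echo "lib directory is on LD_LIBRARY_PATH"
```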

Then install the package:

```
mkdir -p ~/software/bin ~/software/lib && cd ~/software && \
wget https://data.broadinstitute.org/alkesgroup/BOLT-LMM/downloads/BOLT-LMM_v2.3.4.tar.gz && \
tar -zxvf BOLT-LMM_v2.3.4.tar.gz && \
rm -f BOLT-LMM_v2.3.4.tar.gz && \
cp BOLT-LMM_v2.3.4/bolt ~/software/bin/ && \
cp BOLT-LMM_v2.3.4/lib/* ~/software/lib/
```

## SAIGE installation

#### Creating a conda environment

As per SAIGE tutorial

```
conda create -n RSAIGE r-essentials r-base=3.6.1 python=2.7
conda activate RSAIGE
conda install -c anaconda cmake
conda install -c conda-forge gettext lapack r-matrix
conda install -c r r-rcpp  r-rcpparmadillo r-data.table r-bh
conda install -c conda-forge r-spatest r-rcppeigen r-devtools  r-skat r-rcppparallel r-optparse boost openblas
pip3 install cget click
conda env export > environment-RSAIGE.yml
```

To work around errors in the SAIGE installation (see https://github.com/weizhouUMICH/SAIGE/issues/118), the environment can be created as follows instead:

```
conda create -n RSAIGE r-essentials r-base=3.6.1 python=2.7
conda activate RSAIGE
conda install -c anaconda cmake boost zlib
conda install -c conda-forge gettext lapack r-matrix 
conda install -c conda-forge r-spatest r-rcppeigen r-devtools r-skat
conda install -c conda-forge r-rcpp  r-rcpparmadillo r-data.table r-bh
conda install -c conda-forge r-rcppparallel r-optparse
pip install cget click
```


#### Activate conda environment

```
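conda activate RSAIGE
# Derive the environment prefix from the location of its python executable
FLAGPATH=$(which python | sed 's|/bin/python$||')
export LDFLAGS="-L${FLAGPATH}/lib"
export CPPFLAGS="-I${FLAGPATH}/include"
# Or set the paths explicitly (these two lines override the two above)
export LDFLAGS="-L/gpfs/ysm/project/dewan/dc2325/conda_envs/RSAIGE/lib"
export CPPFLAGS="-I/gpfs/ysm/project/dewan/dc2325/conda_envs/RSAIGE/include"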
```
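
The `sed` expression strips the trailing `/bin/python` from the interpreter path to recover the environment prefix. A minimal sketch of the same derivation as a reusable function; the function name `env_prefix` is hypothetical, and the usage lines assume that conda exports `CONDA_PREFIX` when an environment is activated.

```shell
# Derive an environment prefix from the path of its python executable,
# e.g. /path/to/env/bin/python -> /path/to/env.
env_prefix() {
    printf '%s\n' "$1" | sed 's|/bin/python$||'
}

# Usage inside an activated environment (prefers CONDA_PREFIX when it is set):
# prefix="${CONDA_PREFIX:-$(env_prefix "$(command -v python)")}"
# export LDFLAGS="-L${prefix}/lib"
# export CPPFLAGS="-I${prefix}/include"
```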

#### Install required R libraries

For this part I had to install MetaSKAT using the `remotes` package; otherwise the installation failed with an error.
```
install.packages("remotes")
remotes::install_github("lin-lab/MetaSKAT")
```
#### Install SAIGE

Method 2: this method did not work for me, so I proceeded to the next one.

```
devtools::install_github("weizhouUMICH/SAIGE")
```

Method 3

```
src_branch=master
repo_src_url=https://github.com/weizhouUMICH/SAIGE
git clone --depth 1 -b $src_branch $repo_src_url
R CMD INSTALL SAIGE
```

#### SAIGE on Yale cluster

To install SAIGE on the HPC cluster, first load the necessary modules to create a specific environment.

Search for and load modules:

* `module avail` for a list of all available modules

* `module avail R` to see a list of all available R modules on Yale's HPC

	Select R-3.6.1 version if available by typing `module load R-3.6.1`

* `module avail gcc`

	Select gcc >= 5.4.0: `module load gcc-5.4.1`

* `module avail cmake`

	Select cmake 3.14.1: `module load cmake-3.14.1`

* `module avail cget`

	Select the latest version of cget: `module load cget`

* Install R packages using the `install_packages.R` script


Install SAIGE R package
	
```
R 
devtools::install_github("weizhouUMICH/SAIGE")
```


Fixing a problem with the conda template: `conda activate` cannot be executed from a bash script (see https://github.com/conda/conda/issues/7980).

Adding these lines to `.bash_profile` apparently fixed the issue:

```
export -f conda
export -f __conda_activate
export -f __conda_reactivate
export -f __conda_hashr
```
Then run `source ~/.bash_profile`.
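
Another approach often used for non-interactive scripts is to source conda's shell setup file before calling `conda activate`. A minimal sketch, assuming the standard conda layout in which that file lives at `etc/profile.d/conda.sh` under the installation directory; the helper name `init_conda` is hypothetical.

```shell
# Source conda's shell setup script from a given installation directory so
# that `conda activate` works in a non-interactive script.
# Assumes the standard layout <base>/etc/profile.d/conda.sh.
init_conda() {
    hook="$1/etc/profile.d/conda.sh"
    if [ -f "$hook" ]; then
        . "$hook"
    else
        echo "conda hook not found: $hook" >&2
        return 1
    fi
}

# Usage in a batch script:
# init_conda "$(conda info --base)" && conda activate RSAIGE
```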

## SoS installation and configuration

The first step is to download the Miniconda3 installer to your local directory, then run it with `bash` to install.

```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
By default, lines that add miniconda to the `$PATH` are written to your `.bashrc` file; you can then modify them as needed.

Then install sos, sos-pbs and sos-notebook as the minimum requirements

```
conda install sos sos-pbs sos-notebook jupyterlab-sos sos-papermill -c conda-forge
```

## SoS update to get the latest improvements

For development versions

```
pip install git+https://github.com/vatlab/sos -U

```

For released versions (once the changes have been released):
```
pip install sos -U
```

If the update does not bring in all of the new features, reinstall:
```
pip uninstall sos
pip install sos -U
```

## Test your code before running it

To check whether the code in your notebook runs:

```
sos dryrun notebook.ipynb -q localhost
```


## SoS commands to create scripts from available notebooks



## Creating and switching environments with conda

Creating a conda environment with Python 2.7, activating it, and deactivating it again:

```
conda create --name py2 python=2.7
conda activate py2
conda deactivate
```

## R installation using conda

Installing R with conda allows you to manage your own packages. Refer to https://docs.ycrc.yale.edu/clusters-at-yale/guides/r/ for more information.

```
conda install -c r r-base
conda install -c r r=3.6
```
To install R packages using conda 

```
conda install -c r package_name
```

## QCTOOL version 2 usage

If installed from source, it requires zlib to be installed, and compilation needs to be done with Python 2:

```
cd ~/software && \
hg clone -r beta https://gavinband@bitbucket.org/gavinband/qctool && cd qctool && \
./waf-1.5.18 configure --prefix=$MY_PREFIX && ./waf-1.5.18
```

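If loaded from the HPC cluster, just load the module:

```
module load <name> # the QCTOOL module name as listed by `module avail`
# Example: minimum and maximum of the third column of ukb_mfi_chr1_v3.txt
awk '(NR==1){Min=$3;Max=$3};(NR>=3){if(Min>$3) Min=$3;if(Max<$3) Max=$3} END {printf "The Min is %d ,Max is %d",Min,Max}' ukb_mfi_chr1_v3.txt
```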
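
The `awk` expression tracks the smallest and largest value seen in the third column. A minimal sketch of the same idea on a small made-up input, so the logic can be checked before running it on a large file; the sample rows and the function name `col3_range` are invented for illustration, and every row after the first is compared.

```shell
# Print the minimum and maximum of the third whitespace-separated column.
# The first row initialises both values; later rows update them.
col3_range() {
    awk 'NR==1 {min=$3; max=$3}
         NR>1  {if ($3 < min) min=$3; if ($3 > max) max=$3}
         END   {printf "min=%d max=%d\n", min, max}' "$@"
}

# Made-up three-row input, not real data.
printf 'a rs1 500\nb rs2 120\nc rs3 900\n' | col3_range
```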


## SLURM commands on Yale's cluster

Submit a job script

```
$ sbatch <script>
```
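
The script passed to `sbatch` is a regular shell script whose `#SBATCH` comment lines carry the resource requests. A minimal sketch; the job name, resource values, and output file pattern are placeholder assumptions to adapt, and `general` is the partition name that appears elsewhere in these notes.

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=general
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --output=%x-%j.out

# SLURM_JOB_ID is set by SLURM inside a job; the fallback lets the script
# also run outside the scheduler for a quick test.
job_id="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_id} started"
```

In the output pattern, `%x` expands to the job name and `%j` to the job ID.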

List queued and running jobs

```
$ squeue -u$USER
```

Cancel a queued job or kill a running job

```
$ scancel <job_id>
```

Cancel all your jobs (running and pending)

```
$ scancel -u$USER
```

Check status of individual job (including failed or completed)

```
$ sacct -j <job_id>
```

To see all pending jobs sorted by priority (jobs with higher priority at the top)

```
squeue --sort=-p -t PD -p general
```

To see files that will be deleted from scratch60 (they purge every 30 days)

```
cat /gpfs/ysm/scratch60/todelete/${UID}
```

To see the jobs you recently submitted to SLURM

```
sacct
sacct -S <start-date> -u <user-name>
```

## Starting a jupyter notebook server on Yale's cluster

1. Start a jupyter notebook job
2. Start a ssh tunnel
3. Use local browser to connect

### Submit jupyter-notebook server as a batch job

For more documentation see: https://docs.ycrc.yale.edu/clusters-at-yale/guides/jupyter/

1. Start the server: `sbatch jupyter-tunnel.sh`

2. Check that your job was submitted and is running with `squeue -u$USER`. The state `R` means the job is running and `PD` means it is pending (you will have to wait for your job to start running).

3. Start the tunnel: open the log file `jupyter-notebook-[jobid].log`, which contains the information on how to connect. It is located in the directory you submitted the script from.

4. On a Mac or Linux machine, you can start the tunnel with an `ssh` command. Check the output of the job you started to get the specific information you need.

Example: `ssh -N -L 8511:c14n03:8511 dc2325@farnam.hpc.yale.edu`

5. Browse the notebook: open a browser on your local machine and enter the address `http://localhost:port`, where `port` is the local port of the tunnel (8511 in the example).

In my case **localhost=127.0.0.1**
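
The tunnel command in step 4 follows a fixed pattern: the same port on both ends, the compute node in the middle, and the login node as the SSH target. A minimal sketch that assembles it from the values in the log file; the helper name `tunnel_cmd` is hypothetical.

```shell
# Print the SSH tunnel command for a given port and compute node.
# -N: do not run a remote command; -L: forward the local port to node:port.
tunnel_cmd() {
    port="$1"
    node="$2"
    login="${3:-dc2325@farnam.hpc.yale.edu}"
    printf 'ssh -N -L %s:%s:%s %s\n' "$port" "$node" "$port" "$login"
}

# Usage: print the command, then run it once it looks right.
# tunnel_cmd 8511 c14n03
```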

**TIP:** 
* The address Jupyter creates by default (the one with the name of a compute node) will not work outside the cluster's network. 
* The notebook automatically generates a token that lets you authenticate when you connect. It appears at the end of the URL that Jupyter prints, for example `http://c14n06:9230/?token=ad0775eaff315e6f1d98b13ef10b919bc6b9ef7d0605cc20`; copy the text after `token=`.
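
The token can also be cut out of the URL with shell parameter expansion instead of copying it by hand. A minimal sketch using the example URL from these notes; the variable names are illustrative.

```shell
# Strip everything up to and including "token=" from the URL printed by Jupyter.
url='http://c14n06:9230/?token=ad0775eaff315e6f1d98b13ef10b919bc6b9ef7d0605cc20'
token="${url##*token=}"
echo "$token"
```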








## Accessing an interactive node

Interactive jobs can be used for testing and troubleshooting code. By requesting an interactive job, you will be allocated resources and logged onto the node in a shell.

```
srun --pty -p interactive --mem-per-cpu=48G --cpus-per-task=1 --time=1-00:00:00 bash
```

## Copying results to shared folder

After completing the association analysis runs, copy the results to the project's shared folder.

#### Path

`/SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data/INT-BMI`

* The combined association analyses for the imputed data in .snp_stats.gz

`cp ukb_imp_allchr_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM.snp_stats.gz /SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data`

* The combined association analyses for the hard-called genotypes in .stats.gz

`cp ukb_imp_allchr_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM.stats.gz /SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data`

* The gzipped .stderr files in .stats.stderr.gz

`cp ukb_imp_allchr_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM.stats.stderr.gz /SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data`

* The gzipped .stdout files in .stdout.gz

`cp ukb_imp_allchr_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM.stdout.gz /SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data`
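
Because the four files share a common prefix and differ only in their suffix, the copies can be done in one loop. A minimal sketch; the helper name `copy_results` is hypothetical and the suffix list simply mirrors the four commands in this section.

```shell
# Copy the four BOLT-LMM result files that share a common prefix into a
# destination directory, stopping at the first missing file.
copy_results() {
    prefix="$1"
    dest="$2"
    for suffix in snp_stats.gz stats.gz stats.stderr.gz stdout.gz; do
        if [ ! -f "${prefix}.${suffix}" ]; then
            echo "missing: ${prefix}.${suffix}" >&2
            return 1
        fi
        cp "${prefix}.${suffix}" "$dest/" || return 1
    done
}

# Usage:
# copy_results ukb_imp_allchr_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM \
#     /SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data
```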

