### Running scripts on the O2 cluster
Running scripts on the O2 cluster requires setting up the "LSEnv" conda environment on your O2 account.  
The first part is making sure conda is installed and activated whenever you start a session on O2.

First, log in (replace `nl82` with your own HMS ID):  
`ssh nl82@o2.hms.harvard.edu`  

This lands you on a login node:  
`(base) nl82@login05:~$ `  

The O2 wiki has a set of instructions for installing conda on your account:  
https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1594263516/Conda+on+O2  
You can largely just follow those directions and you should be good.  

I would start an interactive compute session first, because conda installs can be quite computationally intensive, and if you run them on the login node, it will kick you off.  
`srun -p interactive -t 12:00:00 --mem=16G --pty bash`  
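If you script your setup steps, a small guard can stop the script before it runs a heavy conda install on a login node. This is a minimal sketch: the function name `require_compute_node` is hypothetical, and it assumes login nodes have hostnames starting with `login` (like `login05` above) while compute nodes do not (like `compute-a-16-167`).

```bash
#!/bin/bash
# Hypothetical guard: refuse to continue on an O2 login node.
# Assumes login hostnames start with "login" (e.g. login05 or
# login06.o2.rc.hms.harvard.edu) and compute hostnames do not.
require_compute_node() {
    # Use the hostname passed as $1, or the current host by default.
    local host="${1:-$(hostname)}"
    case "$host" in
        login*)
            echo "On login node '$host'. Start a compute session first, e.g.:" >&2
            echo "  srun -p interactive -t 12:00:00 --mem=16G --pty bash" >&2
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}
```

In a setup script you would call `require_compute_node || exit 1` before the install commands.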

Once you have an interactive session, type `module avail`.  
There are two conda versions available:  
```
conda2/4.2.13            (E)  
miniconda3/4.10.3          (E)  
```
As of this writing, both miniconda3/4.10.3 and conda2/4.2.13 are still considered "experimental", hence the `(E)` tag.  
Both modules should be fine.  
I installed conda on my O2 account with the following:  
```
module load conda2/4.2.13
conda init bash
```
Once that is set up, close the terminal and log back in to O2. The `(base)` conda environment should then be activated automatically every time you log in.
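To check whether `conda init bash` actually took effect, you can look for the block it writes into your shell startup file, which is wrapped in `# >>> conda initialize >>>` and `# <<< conda initialize <<<` marker lines. A minimal sketch, assuming the block went into `~/.bashrc` (the usual target for `conda init bash`); `conda_init_done` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical check: did `conda init bash` write its setup block into an rc file?
# conda init wraps its code between "# >>> conda initialize >>>" and
# "# <<< conda initialize <<<" lines; we only look for the opening marker.
conda_init_done() {
    local rc="${1:-$HOME/.bashrc}"   # assumption: ~/.bashrc is the file conda edited
    [ -f "$rc" ] && grep -q '^# >>> conda initialize >>>' "$rc"
}
```

For example, `conda_init_done && echo "conda init is in ~/.bashrc"`.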

### About the (base) conda environment on o2
As far as I can tell, the base environment from the conda2/4.2.13 module on O2 is not the typical base environment you might have on your laptop.  
One difference is that the conda on O2 already has samtools, a tool that a lot of people use, installed in the base environment.  
If you check the samtools path and version:  
```
(base) nl82@compute-a-16-167:~$ which samtools
/n/app/conda2/bin/samtools

(base) nl82@compute-a-16-167:~$ samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.
```
You can see that the base conda environment already has samtools v1.9. This is good to be aware of because it can lead to some confusing discrepancies.  
For example, a Python script that calls samtools will run fine in the (base) conda environment on O2, but it won't work in the (base) conda environment on your laptop unless you have installed samtools there yourself.  
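To see which copy of a tool a script will actually pick up, print where the shell resolves each command. This sketch uses the shell builtin `command -v`; the helper name `report_tools` is hypothetical. Running it both on O2 and on your laptop makes discrepancies like the samtools one easy to spot.

```bash
#!/bin/bash
# Hypothetical helper: print where each command resolves on the current PATH,
# e.g. to tell whether samtools comes from O2's conda2 base (/n/app/conda2/bin).
# Returns non-zero if any command is missing.
report_tools() {
    local tool path status=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "$tool -> $path"
        else
            echo "$tool -> NOT FOUND"
            status=1
        fi
    done
    return "$status"
}
```

In the (base) environment on O2, `report_tools samtools` should report `/n/app/conda2/bin/samtools`, matching the `which samtools` output above.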

### Installing STAR aligner
There are a couple of options for using STAR on O2.  

#### First Option
If you check the available modules, you can see that O2 already has a few versions of STAR aligner, for example (`(D)` marks the default version):  
```
star/2.7.9a                (D)
```
When you start a session on O2, STAR is not loaded by default, so if you try to call `STAR`:  
```
(base) nl82@compute-a-16-167:~$ STAR
bash: STAR: command not found
```
It will tell you that the command is not found.  

You can load the STAR module like so (note: you have to load the gcc module as well), and check what is loaded with `module list`:
```
module load gcc/6.2.0
module load star/2.7.9a
module list
# Currently Loaded Modules:
#   1) gcc/6.2.0   2) star/2.7.9a
```
And if you run STAR: 
```
(base) nl82@compute-a-16-167:/n/data1/hms/sysbio/yin/Ninning$ STAR
Usage: STAR  [options]... --genomeDir /path/to/genome/index/   --readFilesIn R1.fq R2.fq
Spliced Transcripts Alignment to a Reference (c) Alexander Dobin, 2009-2020

STAR version=2.7.9a
STAR compilation time,server,dir=2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
For more details see:
<https://github.com/alexdobin/STAR>
<https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>

To list all parameters, run STAR --help
```
Then it will recognize the command.
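If you use this first option inside scripts, you can load the modules only when `STAR` is not already available. A minimal sketch; `ensure_star_module` is a hypothetical name, it relies on the `module` command that O2 provides, and the module versions are the ones listed above.

```bash
#!/bin/bash
# Hypothetical snippet for a script: load gcc and STAR modules only if
# STAR is not already on PATH.
ensure_star_module() {
    if command -v STAR >/dev/null 2>&1; then
        return 0                       # STAR already available, nothing to do
    fi
    # `module` is provided by the cluster's module system; it may be absent elsewhere.
    if ! type module >/dev/null 2>&1; then
        echo "STAR not found and no 'module' command available" >&2
        return 1
    fi
    module load gcc/6.2.0 star/2.7.9a  # gcc first, as noted above
}
```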

#### Second (preferred?) option
Setting up STAR the first way requires loading the modules every time you start a session on O2. Alternatively, you can install STAR into your own folder and have your Python scripts call it directly when they run, which is what we do for Light-Seq.  
To do this, follow the Linux install instructions on the STAR aligner GitHub page, https://github.com/alexdobin/STAR (O2 runs Linux kernel 3.10.0-1160.45.1.el7.x86_64).  

First, copy the STAR source folder into your account. You can either upload the files through FileZilla,  
or, while on O2, type:  
```
wget https://github.com/alexdobin/STAR/archive/2.7.10a.tar.gz
tar -xzf 2.7.10a.tar.gz
cd STAR-2.7.10a
(base) nl82@compute-a-16-167:/n/data1/hms/sysbio/yin/Ninning/STAR-2.7.10a$ 
```
Then go to the `source` directory, which contains `STAR.cpp` and the `Makefile` used to compile STAR in your account.  
Compiling still requires a C++ compiler (gcc, based on my old notes), so make sure you load a gcc module before you compile.  
The default gcc on O2 when you first log in is `gcc (GCC) 4.8.5`. When I last tried to compile STAR with that version, it printed an endless stream of warnings and then stalled, even though after logging back in the build seemed to have worked. To save yourself trouble down the line, I would absolutely load the recommended gcc module (`gcc/6.2.0`, as below) before compiling.
```
cd source
module load gcc/6.2.0
make STAR
```
Once it's compiled, call STAR from your bash terminal by its path, `./STAR`, instead of with just `STAR`, because the `source` directory is not on your `PATH`. I explain this further in another notebook.  
At this point you can call the STAR aligner:  
```
(base) nl82@compute-e-16-229:/n/data1/hms/sysbio/yin/Ninning/STAR-2.7.10a/source$ ./STAR
Usage: STAR  [options]... --genomeDir /path/to/genome/index/   --readFilesIn R1.fq R2.fq
Spliced Transcripts Alignment to a Reference (c) Alexander Dobin, 2009-2020

STAR version=2.7.10a
STAR compilation time,server,dir=2022-04-27T18:00:41-0400 login06.o2.rc.hms.harvard.edu:/n/data1/hms/sysbio/yin/Ninning/STAR-2.7.10a/source
For more details see:
<https://github.com/alexdobin/STAR>
<https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>

To list all parameters, run STAR --help
```
And it should be ready to use.  
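`./STAR` is needed because, for a bare command name, the shell only searches the directories listed in `PATH`, and the `source` directory is not one of them. If you would rather type plain `STAR`, one option (not what the Light-Seq scripts rely on) is to prepend that directory to `PATH`, for example from `~/.bashrc`. A minimal sketch; `path_prepend` is a hypothetical helper, and the example path is an assumption, so point it at wherever you built STAR.

```bash
#!/bin/bash
# Hypothetical helper: prepend a directory to PATH (only once) so executables
# in it can be called by bare name, e.g. `STAR` instead of `./STAR`.
path_prepend() {
    local dir="$1"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    case ":$PATH:" in
        *":$dir:"*) ;;                 # already on PATH, do nothing
        *) PATH="$dir:$PATH" ;;
    esac
    export PATH
}

# Example (assumed location, adjust to your own build):
# path_prepend "$HOME/STAR-2.7.10a/source"
```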

Final note: the STAR executable needs execute permission, e.g. `-rwxrwxr-x 1 nl82 yin 3192704 Apr 27 18:03 STAR`.  
Sometimes it doesn't have it. I haven't run into this random issue recently on O2, but it's always good to check with `ls -l`.  
If the execute permission is missing, enable it with `chmod u+x STAR`.
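The check and the fix can be combined into one step. A minimal sketch; `ensure_executable` is a hypothetical helper that only adds `u+x` when the file is not already executable.

```bash
#!/bin/bash
# Hypothetical helper: make sure a file exists and is executable,
# adding owner execute permission (u+x) only when it is missing.
ensure_executable() {
    local f="$1"
    [ -e "$f" ] || { echo "missing: $f" >&2; return 1; }
    if [ ! -x "$f" ]; then
        chmod u+x "$f" && echo "added u+x to $f"
    fi
    [ -x "$f" ]                        # final status: executable or not
}
```

For example, `ensure_executable STAR-2.7.10a/source/STAR`.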

### Setting up the LSEnv on o2
Once the base conda environment and STAR are installed, setting up the "LSEnv" should not differ much from the instructions currently written in the Light-Seq GitHub repository.  
For example, here I create a second environment, `LSEnv2` (yours will just be `LSEnv`):  
```
(base) nl82@compute-e-16-229:/n/data1/hms/sysbio/yin/Ninning$ conda create --name LSEnv2 python=3.7.5 pandas matplotlib seaborn PyTables Biopython scikit-image
(base) nl82@compute-e-16-229:/n/data1/hms/sysbio/yin/Ninning$ conda activate LSEnv2

(LSEnv2) nl82@compute-e-16-229:/n/data1/hms/sysbio/yin/Ninning$ python --version
Python 3.7.5
```
At this point, with your LSEnv activated and the Python version checked to make sure it installed properly, you can proceed with the rest of the install.  
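One quick way to confirm the packages installed properly is to try importing each one. Note that some import names differ from the conda package names: PyTables is imported as `tables`, Biopython as `Bio`, and scikit-image as `skimage`. A minimal sketch; `check_imports` is a hypothetical helper that uses `python` from the active environment unless you set `PYTHON`.

```bash
#!/bin/bash
# Hypothetical check: try importing each Python module and report failures.
# Uses the interpreter in $PYTHON, defaulting to `python` from the active env.
check_imports() {
    local py="${PYTHON:-python}" mod
    local -a missing=()
    for mod in "$@"; do
        "$py" -c "import $mod" >/dev/null 2>&1 || missing+=("$mod")
    done
    if [ "${#missing[@]}" -gt 0 ]; then
        echo "failed to import: ${missing[*]}" >&2
        return 1
    fi
    echo "all imports OK ($("$py" --version 2>&1))"
}

# Import names for the LSEnv packages:
# check_imports pandas matplotlib seaborn tables Bio skimage
```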

### Submitting a batch job on o2
Typically you submit a batch job, leave it running, and come back once the computation is done.  
The o2 website has instructions on how to submit a batch job.  

Essentially you would just create a `.sh` file, for example `job.sh`, with the following:  
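```
#!/bin/bash
# Resources: 12 cores, 12 hours, "short" partition, 64 GB of memory
#SBATCH -c 12
#SBATCH -t 0-12:00
#SBATCH -p short
#SBATCH --mem=64G
# Log files: Slurm replaces %j with the job ID
#SBATCH -o 220415map_%j.out
#SBATCH -e 220415map_%j.err

# Load conda, activate the environment, then run the script
module load conda2/4.2.13
source activate LSEnv
python3 MapToMouseUniqueOnly.py
```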
And submit by typing:  
```
sbatch job.sh
```
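`sbatch` prints the job ID when you submit; with its `--parsable` flag it prints only the ID (optionally followed by `;cluster`), which makes it easy to capture and then check with `squeue` or `sacct`. A minimal sketch; `submit_job` is a hypothetical wrapper name.

```bash
#!/bin/bash
# Hypothetical wrapper: submit a job script and print only its job ID.
# `sbatch --parsable` prints "jobid" or "jobid;cluster"; keep the part before ";".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    echo "${out%%;*}"
}

# Example usage on O2:
# jobid=$(submit_job job.sh)
# squeue -j "$jobid"                                      # while queued/running
# sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS   # after it finishes
```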

In this file, we set the number of cores (12), the time limit, the partition, the memory, etc., as well as the log file names, e.g. `220415map_%j.out`. The `220415` prefix is the date of the run (yymmdd), written into the file by hand, and Slurm replaces `%j` with the job ID. You can also save the logs to another folder, like `bashOut/220415map_%j.out`, to keep them organized; that folder must already exist, because Slurm will not create it.  
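Since `220415` is hard-coded, you could instead stamp the logs with the submission date and create the log folder in the same step. Options given on the `sbatch` command line override the matching `#SBATCH` lines in the script, so `-o` and `-e` can be passed there. A minimal sketch; the helper names, the `bashOut` default, and the `map` suffix are assumptions based on the example above.

```bash
#!/bin/bash
# Hypothetical helpers: date-stamped log names (yymmdd, like "220415") and a
# submit wrapper that creates the log folder first, since Slurm will not.
log_prefix() {
    local dir="$1" name="$2"
    echo "$dir/$(date +%y%m%d)${name}"
}

submit_dated() {
    local script="$1" dir="${2:-bashOut}" name="${3:-map}" prefix
    mkdir -p "$dir" || return 1
    prefix=$(log_prefix "$dir" "$name")
    # Command-line -o/-e override the #SBATCH -o/-e lines inside the script.
    sbatch -o "${prefix}_%j.out" -e "${prefix}_%j.err" "$script"
}

# Example: submit_dated job.sh bashOut map
```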
For the commands, we still go through the motions of loading the conda module (in case it's not loaded automatically), activating the LSEnv, and then running the script with Python.  
Here we use the `python3` command just to make sure we are using the correct Python version.  
Inside LSEnv, `python` and `python3` should point to the same Python version, since that's the only version installed in "LSEnv".  
```
(LSEnv) nl82@compute-e-16-229:/n/data1/hms/sysbio/yin/Ninning$ python3 --version
Python 3.7.5
(LSEnv) nl82@compute-e-16-229:/n/data1/hms/sysbio/yin/Ninning$ python --version
Python 3.7.5
```
However, if you previously changed where `python` points on your O2 account (for example, by editing `PATH` in your `~/.bashrc`), double-check that calling `python` or `python3` actually runs the interpreter in LSEnv.
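A job script can also check this itself before running anything. A minimal sketch, assuming conda activation sets `CONDA_PREFIX` to the environment's directory (current conda versions do; if yours does not, compare against the path shown by `conda env list`); `check_env_python` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical guard: confirm that `python3` (or another command) resolves
# inside the active conda environment rather than somewhere else on PATH.
# Assumes activation sets CONDA_PREFIX to the environment directory.
check_env_python() {
    local cmd="${1:-python3}" path
    path=$(command -v "$cmd") || { echo "$cmd not found" >&2; return 1; }
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no conda environment is active" >&2
        return 1
    fi
    if [ "$path" != "$CONDA_PREFIX/bin/$cmd" ]; then
        echo "$cmd is $path, not $CONDA_PREFIX/bin/$cmd" >&2
        return 1
    fi
    echo "$cmd -> $path"
}
```

With the function defined in (or sourced into) `job.sh`, a line such as `check_env_python python3 || exit 1` right after `source activate LSEnv` would stop the job early if the wrong interpreter is found.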