# <span style="color:blue"><u>Protocol</u></span>

## <span style="font-family: Arial; font-size: 18px; color:blue">1. Connecting to Linux VM using SSH key</span>

i. Connected to the Linux VM by specifying its IP address and the path to the SSH key.

In [None]:
ssh ubuntu@10.00.0.00 -i "C:\Users\YourUsername\Documents\sshkey_vm.txt"

## <span style="font-family: Arial; font-size: 18px; color:blue">2. Installing miniconda on Linux VM</span>

i. Created a directory named miniconda3 in the home directory of the VM, then installed and activated Miniconda.

In [None]:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
source ~/miniconda3/bin/activate

## <span style="font-family: Arial; font-size: 18px; color:blue">3. Setting up conda & bio environment</span>

i. Added the conda-forge and bioconda channels in the base environment and set the channel priority to strict.

In [None]:
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --set channel_priority strict

ii. Created an environment named ‘bio’ with Python 3.9 and activated it.

In [None]:
conda create --name bio python=3.9
conda activate bio

## <span style="font-family: Arial; font-size: 18px; color:blue">4. Installing packages</span>

i. Installed fastqc in the ‘bio’ environment and the core R package (r-base) from the conda-forge channel, then started R.

In [None]:
conda install fastqc
conda install -c conda-forge r-base
R

ii. Installed the IRkernel package from within R.

In [None]:
> install.packages('IRkernel')

iii. Selected CRAN mirror 1: 0-Cloud [https], then quit R and saved the workspace image.

In [None]:
> q()
Save workspace image? [y/n/c]: y

iv. Installed Jupyter in the ‘bio’ environment and registered the R kernel so that R can be used from Jupyter notebooks.

In [None]:
conda install -c conda-forge jupyter
R -e "IRkernel::installspec()"

v. Configured and started the Jupyter server.

In [None]:
jupyter server --generate-config
jupyter server password
jupyter lab --no-browser --ip "*"

## <span style="font-family: Arial; font-size: 18px; color:blue">5. Connecting to Jupyter Server via tmux session</span>

i. Started a new connection to the Linux VM and created a new tmux session named “jserver”.

In [None]:
tmux new -s jserver

ii. Started the Jupyter server inside the session, then detached from it by pressing Ctrl + b followed by d.

In [None]:
jupyter lab --no-browser --ip "localhost"
[detached (from session jserver)]
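
Once detached, the session keeps running on the VM. The following is a minimal sketch for checking whether the session is still alive before reattaching. The helper name `check_session` is hypothetical; it relies only on the standard `tmux has-session` and `tmux attach` subcommands.

```shell
# check_session NAME: report whether a tmux session with that name exists.
# (check_session is a hypothetical helper, not part of the original protocol.)
check_session() {
    session="$1"
    if command -v tmux >/dev/null 2>&1 && tmux has-session -t "$session" 2>/dev/null; then
        echo "session '$session' is running; reattach with: tmux attach -t $session"
    else
        echo "no tmux session named '$session' was found"
    fi
}

check_session jserver
```

If the session is reported as running, `tmux attach -t jserver` returns to the Jupyter server log.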

iii. Established an SSH tunnel for the Jupyter server and opened it in a browser at http://localhost:8888/

In [None]:
ssh -i "C:\Users\YourUsername\Documents\sshkey_vm.txt" -L 8888:localhost:8888 ubuntu@10.00.0.00
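
Retyping the key path and the port forward for every connection is error-prone. As a sketch, the same settings could be stored in an OpenSSH client configuration file. The host alias `biovm` and the file name `ssh_config.example` are assumptions made for illustration; the address, user and key file are the placeholders used in this protocol, and the forward-slash form of the key path is also an assumption that may need adjusting for your platform.

```shell
# Write an example OpenSSH client config. "biovm" and "ssh_config.example" are
# hypothetical names; the remaining values are this protocol's placeholders.
cat > ssh_config.example <<'EOF'
Host biovm
    HostName 10.00.0.00
    User ubuntu
    IdentityFile C:/Users/YourUsername/Documents/sshkey_vm.txt
    LocalForward 8888 localhost:8888
EOF

# Print the file for review before using it.
cat ssh_config.example
```

With this file, `ssh -F ssh_config.example biovm` would open the connection and the tunnel in one step.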

## <span style="font-family: Arial; font-size: 18px; color:blue">6. Setting up project directories & downloading files</span>

i. Created the directory structure for the project under the parent directory bioproject.

In [None]:
mkdir -p bioproject/{rawdata,reference,fastqc,filtered,alignment,counts,scripts,deseq2}
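
The brace expansion in the command above is a bash feature and is not available in every shell. A portable sketch of the same step, with a quick check that all eight subdirectories exist, might look like this (the variable names `project_root` and `n_dirs` are assumptions):

```shell
# Create the project tree with a plain loop instead of brace expansion.
project_root="bioproject"   # hypothetical variable; same directory name as above
for d in rawdata reference fastqc filtered alignment counts scripts deseq2; do
    mkdir -p "$project_root/$d"
done

# Count the entries to confirm the layout.
n_dirs=$(ls -1 "$project_root" | wc -l | tr -d ' ')
echo "$project_root contains $n_dirs subdirectories"
```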

ii. Downloaded the reference genome (genome.fa), the Illumina adapter file (illumina_adapter.fa) and the annotation file (annotation.gtf) into the reference directory.

In [None]:
wget <URL>/genome.fa -O reference/genome.fa
wget <URL>/illumina_adapter.fa -O reference/illumina_adapter.fa
wget <URL>/annotation.gtf -O reference/annotation.gtf

iii. Extracted the .tar archive of FASTQ files into the rawdata directory.

In [None]:
tar -xvf rawdata/fastq_files.tar -C rawdata

iv. Decompressed the .fastq.gz files.

In [None]:
gunzip rawdata/*.fastq.gz
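
After decompression it can be useful to confirm that each file is intact. A minimal sketch, assuming the usual four-lines-per-record FASTQ layout: the helper `count_reads` (a hypothetical name) prints the number of records and fails when the line count is not a multiple of four. The demo file below is synthetic and only illustrates the call.

```shell
# count_reads FILE: print the number of FASTQ records, assuming 4 lines per record.
# (count_reads is a hypothetical helper, not part of the original protocol.)
count_reads() {
    lines=$(wc -l < "$1" | tr -d ' ')
    if [ $((lines % 4)) -ne 0 ]; then
        echo "warning: $1 has $lines lines, not a multiple of 4" >&2
        return 1
    fi
    echo $((lines / 4))
}

# Synthetic two-record file, used only to demonstrate the helper.
fastq_demo=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$fastq_demo/sample_0.fastq"
count_reads "$fastq_demo/sample_0.fastq"   # prints 2
```

On the real data the same helper could be run in a loop over `rawdata/*.fastq`.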

## <span style="font-family: Arial; font-size: 18px; color:blue">7. Installing bioconda packages & generating results</span>

i. Activated the conda environment again and installed fastqc from the bioconda channel.

In [None]:
conda activate bio
conda install -c bioconda fastqc

ii. Set the channel priority to flexible and installed the package cutadapt.

In [None]:
conda config --set channel_priority flexible
conda install -c bioconda cutadapt

iii. Installed the packages star, subread (which provides featureCounts) and bioconductor-deseq2.

In [None]:
conda install -c bioconda star
conda install -c bioconda subread
conda install -c bioconda bioconductor-deseq2

iv. Changed the channel priority back to strict.

In [None]:
conda config --set channel_priority strict

v. Created the file workflow.smk to define the pipeline steps as Snakemake rules.
First, ran the rule run_fastqc for quality control of the raw data, which generated the fastqc.zip and fastqc.html files.

In [None]:
snakemake --snakefile workflow.smk --cores 4 run_fastqc
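
The contents of workflow.smk are not reproduced in this protocol. Purely as an illustration of what a rule such as run_fastqc could look like, the sketch below writes a minimal Snakefile to a separate file. Everything in it is an assumption: the file name `workflow_sketch.smk`, the input file names `rawdata/sample_0.fastq` to `rawdata/sample_5.fastq`, and the choice to write reports into `fastqc/`.

```shell
# Write a hypothetical, minimal Snakefile. This is NOT the original workflow.smk;
# sample names, paths and the output location are assumptions for illustration.
cat > workflow_sketch.smk <<'EOF'
SAMPLES = ["sample_0", "sample_1", "sample_2", "sample_3", "sample_4", "sample_5"]

rule run_fastqc:
    input:
        expand("rawdata/{sample}.fastq", sample=SAMPLES)
    output:
        expand("fastqc/{sample}_fastqc.html", sample=SAMPLES),
        expand("fastqc/{sample}_fastqc.zip", sample=SAMPLES)
    shell:
        "fastqc {input} --outdir fastqc"
EOF

# Print the sketch for review.
cat workflow_sketch.smk
```

With the input files in place, `snakemake --snakefile workflow_sketch.smk --cores 1 -n run_fastqc` would perform a dry run that lists the planned job without executing it.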

vi. Ran the rule run_cutadapt with filter parameters for quality filtering of the raw reads, which generated the filtered FASTQ files.

In [None]:
snakemake --snakefile workflow.smk --cores 4 run_cutadapt

vii. Ran the rule run_starindex to generate genome indexes under alignment/STAR_index.

In [None]:
snakemake --snakefile workflow.smk --cores 4 run_starindex

viii. Installed the samtools package to view the SAM output files.

In [None]:
conda install -c bioconda samtools

ix. Ran the rule run_starmapping with a latency wait of 30 s to map the reads to the genome and generate SAM and BAM files.

In [None]:
snakemake --snakefile workflow.smk --cores 4 run_starmapping --latency-wait 30

x. Ran the rule featurecounts to obtain read counts at the gene level. Trimmed the original featureCounts output so that the output file contains only the columns ‘Geneid’, ‘sample_0’, ‘sample_1’, ‘sample_2’, ‘sample_3’, ‘sample_4’ and ‘sample_5’.

In [None]:
snakemake --snakefile workflow.smk --cores 4 featurecounts
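
How the featureCounts table was trimmed is not shown in this protocol. One possible sketch uses awk: a raw featureCounts table starts with a `#` comment line, followed by a header whose first six columns are Geneid, Chr, Start, End, Strand and Length, with one count column per BAM file after that. The helper name `trim_counts` and the demo file are hypothetical, and the renaming assumes the BAM columns are already in sample order.

```shell
# trim_counts FILE: keep Geneid plus the count columns (7 onward) and rename
# the count columns to sample_0, sample_1, ... (hypothetical helper).
trim_counts() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                      # skip the program/command comment line
         !header_done {                     # first remaining line is the header
             printf "Geneid"
             for (i = 7; i <= NF; i++) printf "%ssample_%d", OFS, i - 7
             printf "\n"
             header_done = 1
             next
         }
         {
             printf "%s", $1
             for (i = 7; i <= NF; i++) printf "%s%s", OFS, $i
             printf "\n"
         }' "$1"
}

# Synthetic one-gene table with six samples, used only to demonstrate the helper.
counts_demo=$(mktemp -d)
{
    printf '# Program:featureCounts; Command: (synthetic example)\n'
    printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\ta.bam\tb.bam\tc.bam\td.bam\te.bam\tf.bam\n'
    printf 'geneA\tchr1\t100\t900\t+\t801\t12\t7\t0\t33\t5\t18\n'
} > "$counts_demo/raw_counts.txt"
trim_counts "$counts_demo/raw_counts.txt"
```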

xi. Created an R script that uses the DESeq2 library to detect differentially expressed genes and writes them to deseq2_up.txt (condition A) and deseq2_down.txt (condition B).
Ran the rule deseq2 to execute the R script.

In [None]:
snakemake --snakefile workflow.smk --cores 4 deseq2