# Example ALOHA Depth Profile Analysis
This Jupyter Notebook walks through the analysis of [dataset name], following the steps detailed in the ProSynTaxDB README.

## Setting up the Workflow
### Installing the ProSynTaxDB Workflow
Note: run the commands below in a Linux terminal rather than in a Jupyter Notebook.

```bash
%%bash

# show current directory
pwd

# create project directory (-p: no error if it already exists)
mkdir -p Classification_ALOHA

# change into project directory
cd Classification_ALOHA

# clone the workflow into the project directory
git clone https://github.com/jamesm224/ProSynTaxDB-workflow.git

# check that the repository was cloned
ls
```

```
/home/nvo
Cloning into 'ProSynTaxDB-workflow'...
Updating files: 100% (54/54), done.
ProSynTaxDB-workflow
```


### Installing ProSynTaxDB
Download the associated database files from the [Zenodo repository](https://zenodo.org/records/14889681?preview=1&token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6IjEwM2VjMmJlLTU2NzEtNDEyNC1hZTQwLWY0NDFkNzUwMTU4OSIsImRhdGEiOnt9LCJyYW5kb20iOiI4NjkwMTllMGQ4MWYyYTU1MzBkMDYzYWU3MmYwOTNhNSJ9.9Nedfc8bI5MZ4Mio_TaWmq26RYLHCf2mSdXpupnHUFoDb9CuAKTdL7cb88SeiSA1bW0Ft-XYe1YlmkVtijWQbg) (DOI 10.5281/zenodo.14889681).

Note: run the commands below in a Linux terminal rather than in a Jupyter Notebook.

```bash
%%bash
# create directory for database files
mkdir -p Classification_ALOHA/ProSynTaxDB-files

# download the database files from the Zenodo record above and move them into this directory

# check that the files were downloaded
ls Classification_ALOHA/ProSynTaxDB-files
```

```
CyCOG6.dmnd
ProSynTaxDB_file.fmi
ProSynTaxDB_names.dmp
ProSynTaxDB_nodes.dmp
```
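
Before pointing the workflow at these files, it can help to confirm that all four are present and non-empty (an interrupted download can leave a zero-byte file). The sketch below is a hypothetical helper, `check_db_files`, which is not part of ProSynTaxDB; it assumes the file names match the listing above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ProSynTaxDB): verify that the four
# database files listed above exist in a directory and are non-empty.
check_db_files() {
    local dir="$1"
    local missing=0
    local f
    for f in CyCOG6.dmnd ProSynTaxDB_file.fmi ProSynTaxDB_names.dmp ProSynTaxDB_nodes.dmp; do
        # -s is true only if the file exists and its size is greater than zero
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    # 0 when every file is present, 1 otherwise
    return "$missing"
}
```

For example, `check_db_files Classification_ALOHA/ProSynTaxDB-files && echo "all database files present"`.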


### Installing Dependencies
Install Mamba and Snakemake by following the instructions in the workflow README.

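```bash
%%bash

# check that mamba was installed properly
mamba --version

# check that Snakemake was installed properly
# (`mamba activate` requires a shell initialized with `mamba init`; in a
#  non-interactive shell, `conda run -n snakemake snakemake --version` also works)
mamba activate snakemake
echo Snakemake version:
snakemake --version
mamba deactivate
```

```
mamba 1.4.2
conda 23.3.1
Snakemake version:
7.32.4
```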
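
If you script these checks instead of reading the version output by eye, a small helper can report which commands are missing from `PATH`. `require_cmds` below is hypothetical (not part of the workflow), and it only checks that each command can be found, not which version is installed. Run it inside the activated `snakemake` environment, since that is where `snakemake` is on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report any of the named commands that are not on PATH.
# Returns 0 if every command is found, 1 otherwise.
require_cmds() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        # `command -v` prints the command's location and succeeds if it is found
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found on PATH: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, after `mamba activate snakemake`, run `require_cmds mamba snakemake`.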


### Edit Workflow Specifications
1. Edit the experiment configuration file `inputs/config.yaml`.
    - Note: in this example, the `scratch directory` is created inside `ProSynTaxDB-workflow`, but we recommend placing it on non-backed-up project storage such as `/nobackup`.
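
To follow this recommendation, you could edit the `scratch directory` entry with `sed`. The sketch below is a hypothetical helper, `set_scratch_dir`, which is not part of the workflow; it assumes the key sits at the start of a line exactly as `scratch directory:`, as in the config shown in this notebook.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point the `scratch directory` entry of a workflow
# config file at a new location (e.g. non-backed-up storage).
set_scratch_dir() {
    local config="$1"
    local new_dir="$2"
    # Replace the whole line that starts with "scratch directory:".
    # "|" is the sed delimiter, so slashes in the path need no escaping
    # (paths containing "|" or "&" would need escaping).
    # -i.bak edits the file in place and keeps a backup as "$config.bak".
    sed -i.bak "s|^scratch directory:.*|scratch directory: ${new_dir}|" "$config"
}
```

For example, `set_scratch_dir ProSynTaxDB-workflow/inputs/config.yaml "/nobackup/$USER/ALOHA_scratch"`, where the target path is a placeholder for your cluster's non-backed-up storage.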

```bash
%%bash

cat /nfs/home/nvo/Classification_ALOHA/ProSynTaxDB-workflow/inputs/config.yaml
```

```yaml
experiment_name: read_classification

input: 
  sample table: inputs/samples.tsv
  adapter_file: inputs/all_illumina_adapters.fa
  cycog_file: inputs/cycog_len.tsv
  nodes_file: /nfs/home/nvo/Classification_ALOHA/ProSynTaxDB-files/ProSynTaxDB_nodes.dmp
  names_file: /nfs/home/nvo/Classification_ALOHA/ProSynTaxDB-files/ProSynTaxDB_names.dmp
  fmi_file: /nfs/home/nvo/Classification_ALOHA/ProSynTaxDB-files/ProSynTaxDB_file.fmi
  diamond_file: /nfs/home/nvo/Classification_ALOHA/ProSynTaxDB-files/CyCOG6.dmnd


classification_summary:
  # list of genus to extract read count for (remaining genus will be summed into "other_genus")
  genus_list: ['Synechococcus', 'Prochlorococcus', 'unclassified']

scratch directory: scratch
results directory: results
```
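
A quick way to catch typos in these paths before launching the workflow is to check that every `*_file` entry (plus `sample table`) points to something that exists. The hypothetical `check_config_paths` helper below is not part of the workflow and uses a simple `sed` pattern rather than a YAML parser, so it assumes the one-entry-per-line `key: value` layout shown above. It also assumes that relative paths such as `inputs/cycog_len.tsv` are resolved from the workflow directory passed as the second argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that each input path listed in a workflow config
# exists. Only handles simple indented "key: value" lines (no quoting, no
# inline comments); relative paths are resolved from workflow_dir.
check_config_paths() {
    local config="$1"
    local workflow_dir="$2"
    local status=0
    local path full
    # `read -r` trims leading/trailing whitespace from each extracted value
    while read -r path; do
        [ -z "$path" ] && continue
        # absolute paths are used as-is; relative ones are joined to workflow_dir
        case "$path" in
            /*) full="$path" ;;
            *)  full="$workflow_dir/$path" ;;
        esac
        if [ ! -e "$full" ]; then
            echo "not found: $full" >&2
            status=1
        fi
    done < <(sed -nE 's/^[[:space:]]+([a-z_]+_file|sample table):[[:space:]]*(.*)$/\2/p' "$config")
    return "$status"
}
```

For example, from the `Classification_ALOHA` directory run `check_config_paths ProSynTaxDB-workflow/inputs/config.yaml ProSynTaxDB-workflow`. Until `inputs/samples.tsv` has been created in the next step, it will be reported as missing.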


2. Create an `inputs/samples.tsv` file containing metadata for your samples.
    - In this example, we analyze Station ALOHA data from Mende et al. 2017.
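
Once `inputs/samples.tsv` exists, a quick structural check can catch spaces typed in place of tabs or rows with missing fields. The required columns are defined in the workflow README, so the hypothetical `check_tsv_shape` helper below makes no assumptions about column names. It only checks that the header has more than one tab-separated field and that every other line has the same number of fields as the header.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the shape of a tab-separated sample table.
# It does not know which columns ProSynTaxDB expects; it only compares field
# counts against the header line. Returns 0 if the shape looks consistent.
check_tsv_shape() {
    local tsv="$1"
    awk -F'\t' '
        # the header line fixes the expected number of fields
        NR == 1 {
            n = NF
            if (n < 2) { print "header has fewer than 2 tab-separated fields"; bad = 1 }
            next
        }
        # every later line (including blank ones) must match the header width
        NF != n {
            printf("line %d: %d fields, expected %d\n", NR, NF, n)
            bad = 1
        }
        END { exit bad }
    ' "$tsv"
}
```

For example, `check_tsv_shape ProSynTaxDB-workflow/inputs/samples.tsv` prints one message per problem line and returns a non-zero status if any were found.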

%%bash 

# create folder for raw reads
mkdir -p Classification_ALOHA/raw_reads

# download SRA files