## General pipeline for demultiplexing paired-end reads, deblurring, and taxonomy assignment

#### Before you begin

a) log in to gardner
   
b) load required modules  

    module load gcc/6.2.0
    module load python/3.5.3
    module load qiime2
   
c) set working directory, e.g.

    cd /group/gilbert-lab/Lutz/Cadaver/Alex/
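
Before starting, it can help to confirm that the input files the steps below expect are actually present in the working directory. The following is a minimal sketch: `missing_inputs` is a hypothetical helper, not part of QIIME or Deblur, and the paths in the example call are simply the file names used later in this pipeline.

```shell
# Hypothetical helper (not part of QIIME): print every path that does not exist.
# Returns 0 if all paths exist, 1 if at least one is missing.
missing_inputs() {
    mi_status=0
    for mi_path in "$@"; do
        if [ ! -e "$mi_path" ]; then
            echo "missing: $mi_path"
            mi_status=1
        fi
    done
    return "$mi_status"
}

# Example call (paths taken from the steps below; adjust to your project):
# missing_inputs raw_data/mapfile_metadata.txt \
#     raw_data/Undetermined_S0_L001_R1_001.fastq \
#     raw_data/Undetermined_S0_L001_R2_001.fastq \
#     raw_data/barcodes.fastq
```

If the function prints nothing, every listed file was found.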

#### Begin analysis

#### 1) Validate mapping file

    validate_mapping_file.py -m raw_data/mapfile_metadata.txt -o raw_data/validate_mappingfile

If the mapping file contains errors, you will receive a warning message. To view the errors, open the .log file in the raw_data/validate_mappingfile directory. Correct the mapping file, then re-run validate_mapping_file.py.

#### 2) Join reads and barcodes; demultiplex

    # Join reads and barcodes
    mkdir raw_data/joined
    scripts/ea-utils/bin/fastq-join raw_data/Undetermined_S0_L001_R1_001.fastq raw_data/Undetermined_S0_L001_R2_001.fastq -o raw_data/joined/out.%.fastq > raw_data/joined/out.stats.txt

    # Write the barcodes that correspond to the joined reads
    scripts/fastq-barcode.pl raw_data/barcodes.fastq raw_data/joined/out.join.fastq > raw_data/joined/out.barcodes.fastq

    # Demultiplex reads
    mkdir raw_data/demultiplexed
    split_libraries_fastq.py -i raw_data/joined/out.join.fastq -b raw_data/joined/out.barcodes.fastq -m raw_data/mapfile_metadata.txt -o raw_data/demultiplexed/cadaver_demux_seqs --barcode_type=12 --max_barcode_errors=0 --store_demultiplexed_fastq

Download FastQC to your local machine (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), then open raw_data/demultiplexed/cadaver_demux_seqs/seqs.fastq in FastQC to choose the trim length used by Deblur in step 3.
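
As a quick command-line complement to FastQC, the read-length distribution of the demultiplexed FASTQ can be tabulated directly, which is useful when picking the trim length later passed to `deblur workflow` with `-t`. This is only a sketch: `fastq_length_counts` is a hypothetical helper, it assumes the standard four-line FASTQ record layout, and it does not replace FastQC's per-base quality plots.

```shell
# Hypothetical helper: print "length<TAB>count" for each read length in a FASTQ file,
# sorted by length. Assumes four lines per record, with the sequence on line 2.
fastq_length_counts() {
    awk 'NR % 4 == 2 { counts[length($0)]++ }
         END { for (len in counts) printf "%d\t%d\n", len, counts[len] }' "$1" | sort -n
}

# Example call (path from the demultiplexing step above):
# fastq_length_counts raw_data/demultiplexed/cadaver_demux_seqs/seqs.fastq
```

A trim length that is longer than most reads would leave few sequences for Deblur to work with, so the table gives a first sanity check on the chosen value.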


#### 3) Identify sub-OTUs (aka Exact Sequence Variants, or ESVs) using Deblur

#### Input file:
Demultiplexed FASTA file (here, raw_data/demultiplexed/cadaver_demux_seqs/seqs.fna)

#### Output files:
    1) reference-hit.biom
    2) reference-hit.seqs.fa
    3) reference-non-hit.biom
    4) reference-non-hit.seqs.fa
    5) all.biom (contains both 1 and 3)
    6) all.seqs.fa (contains both 2 and 4)

The remaining steps use only the reference-hit outputs (1 and 2).

    # Run Deblur, trimming reads to 150 nt (-t 150)
    deblur workflow --seqs-fp raw_data/demultiplexed/cadaver_demux_seqs/seqs.fna --output-dir deblur_results -t 150
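
Once the workflow has finished, counting the sequences in the output FASTA files shows how many sub-OTUs matched the reference compared with the total. The sketch below uses a hypothetical helper, `count_fasta_seqs`, which simply counts header lines; the file names are the Deblur outputs listed above.

```shell
# Hypothetical helper: count the records in a FASTA file by counting '>' header lines.
count_fasta_seqs() {
    grep -c '^>' "$1"
}

# Example call (output names as listed above):
# for f in deblur_results/reference-hit.seqs.fa deblur_results/all.seqs.fa; do
#     printf '%s\t%s\n' "$f" "$(count_fasta_seqs "$f")"
# done
```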

NOTE: The following QIIME 1 scripts need the older Python module, so load it first (I prefer to do this in a new terminal window).

    module load gcc/6.2.0
    module load python/2.7.13

#### 4) Align sequences (using greengenes reference)

    align_seqs.py -i deblur_results/reference-hit.seqs.fa -t /group/gilbert-lab/Lutz/Cadaver/Alex/gg_13_8_otus/rep_set_aligned/85_otus.pynast.fasta -o aligned

#### 5) Make phylogeny

    mkdir final_biom_files

    # align_seqs.py names its output after the input file: <input name>_aligned.fasta
    make_phylogeny.py -i aligned/reference-hit.seqs_aligned.fasta -o final_biom_files/rep_phylo.tre

#### 6) Assign taxonomy

    assign_taxonomy.py -i deblur_results/reference-hit.seqs.fa -r deblur_results/gg_13_8_otus/rep_set/97_otus.fasta -t deblur_results/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt -o deblur_results/taxon_assignment/
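
Before adding the taxonomy to the BIOM table, it can be worth checking how many sequences received no assignment. The sketch below rests on two assumptions that should be verified against your own output: that the assignments file is tab-separated with the taxonomy string in column 2, and that unassigned sequences carry the label `Unassigned`. `count_unassigned` is a hypothetical helper; the label can be overridden with a second argument.

```shell
# Hypothetical helper: count rows whose taxonomy column (column 2, tab-separated)
# exactly equals a given label. The default label "Unassigned" is an assumption.
count_unassigned() {
    cu_label=${2:-Unassigned}
    awk -F '\t' -v label="$cu_label" '$2 == label { n++ } END { print n + 0 }' "$1"
}

# Example call (file name as used in step 7):
# count_unassigned deblur_results/taxon_assignment/reference-hit.seqs_tax_assignments.txt
```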

#### 7) biom - add metadata

    # Write to final_biom_files, the directory created in step 5
    biom add-metadata --sc-separated taxonomy --observation-header OTUID,taxonomy --observation-metadata-fp deblur_results/taxon_assignment/reference-hit.seqs_tax_assignments.txt -i deblur_results/reference-hit.biom -o final_biom_files/cadaver_deblur.biom

##### The end