# Obitools installation/setup
Obitools3 was installed from git in a virtual environment, both on our computational cluster and on my computer. Use the following command to activate the virtual environment.

In [7]:
source ./obitools3/obi3-env/bin/activate

(obi3-env) 

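A quick way to confirm the activation worked before running any `obi` command is to look at `VIRTUAL_ENV`, which the activate script sets to the environment directory. This is a minimal sketch; the helper name `in_obi_env` is hypothetical and not part of Obitools3.

```shell
# Succeeds only when the active virtual environment is the obi3-env directory.
in_obi_env() {
    case "${VIRTUAL_ENV:-}" in
        */obi3-env) return 0 ;;
        *) return 1 ;;
    esac
}

if in_obi_env; then
    echo "obi3-env is active"
else
    echo "obi3-env is not active; run: source ./obitools3/obi3-env/bin/activate"
fi
```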

# Data processing
## Import data
Demultiplexed data was imported from paired-end fastq files and ngsfilter files (one per sample per barcoding marker) using the import.sh script on the computational cluster.

## Sequence processing
Pair the reads, remove unaligned reads, remove PCR artifacts and assign reads to samples using the obi_process.sh script on the computational cluster.
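The obi_process.sh script itself is not reproduced here. As a rough sketch of what the pairing, alignment-filtering, and sample-assignment steps can look like, the following prints three commands modelled on the standard Obitools3 tutorial workflow. The DMS name and every view name (`reads1`, `reads2`, `ngsfile`, `aligned_reads`, and so on) are assumptions, and the helper `process_commands` is hypothetical; the actual script may differ.

```shell
# Print (dry run) the read-processing commands for one DMS.
# $1 = DMS path (hypothetical; the real script's names are not shown in this notebook)
process_commands() {
    dms=$1
    # Pair forward and reverse reads.
    echo "obi alignpairedend -R $dms/reads2 $dms/reads1 $dms/aligned_reads"
    # Keep only read pairs that were actually aligned.
    echo "obi grep -a mode:alignment $dms/aligned_reads $dms/good_sequences"
    # Assign reads to samples using the imported ngsfilter view.
    echo "obi ngsfilter -t $dms/ngsfile -u $dms/unidentified_sequences $dms/good_sequences $dms/identified_sequences"
}

process_commands sample_dms
```

Printing the commands first makes it easy to check the view names; once they look right, the output can be piped to `sh` inside the activated environment.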

## Clean the data
Annotate sequence reads with sample names, concatenate all samples into one database, remove low-count sequences, and remove PCR/sequencing errors using the obi_clean.sh script on the computational cluster.
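The obi_clean.sh script is not shown either. The sketch below covers only the last two steps (low-count filtering and PCR/sequencing error removal) and prints the commands rather than running them. The count threshold of 10, the ratio of 0.05, the `MERGED_sample` tag, and the input view name `annotated_sequences` are all assumptions borrowed from the Obitools3 tutorial, not values taken from the script; only the `cleaned_sequences` output name matches the view used later in this notebook.

```shell
# Print (dry run) the filtering commands for one DMS.
# $1 = DMS path, $2 = minimum COUNT to keep, $3 = abundance ratio for obi clean
# All three values are illustrative assumptions.
clean_commands() {
    dms=$1
    min_count=$2
    ratio=$3
    # Drop sequences observed fewer than min_count times.
    echo "obi grep -p \"sequence['COUNT']>=$min_count\" $dms/annotated_sequences $dms/denoised_sequences"
    # Flag and keep head sequences, removing likely PCR/sequencing errors.
    echo "obi clean -s MERGED_sample -r $ratio -H $dms/denoised_sequences $dms/cleaned_sequences"
}

clean_commands combined_invert_samples_dms 10 0.05
```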

Download the DMSs from the cluster.

# Assigning sequences to the reference
## Build reference database
Download the EMBL database (minus environmental and human sequences) and the NCBI taxonomy to your computer.

In [None]:
wget -nH --cut-dirs=5 -A rel_std_\*.dat.gz -R rel_std_hum_\*.dat.gz,rel_std_env_\*.dat.gz -m ftp://ftp.ebi.ac.uk/pub/databases/embl/release/std/
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz

Import the sequences and taxonomy into the DMSs on your computer.

In [None]:
obi import --embl /Users/elizabethmallott/EMBL /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/embl_refs

In [None]:
obi import --taxdump /Users/elizabethmallott/taxdump.tar.gz /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax
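Because the same long absolute paths recur in every command from here on, it can help to keep them in shell variables. The variable names below (`HOME_DIR`, `DIET_DIR`, `TAX_DMS`) are my own shorthand, not something Obitools3 requires; the sketch only prints the two import commands so the expanded paths can be compared with the ones above.

```shell
# Base directories copied from the import commands above.
HOME_DIR=/Users/elizabethmallott
DIET_DIR=$HOME_DIR/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data
TAX_DMS=$DIET_DIR/taxonomy

# Print the two import commands with the variables expanded (dry run).
echo "obi import --embl $HOME_DIR/EMBL $TAX_DMS/embl_refs"
echo "obi import --taxdump $HOME_DIR/taxdump.tar.gz $TAX_DMS/taxonomy/ncbi_tax"
```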

Use ecoPCR to extract the sequences of interest from the database. Run it once for each primer set/barcode and write the results to sequence DMSs. For the invertebrate database, ecoPCR was restricted to sequences within Arthropoda to reduce the size of the reference database.

In [None]:
obi ecopcr -e 0 -l 220 -L 240 --restrict-to-taxid 6656 -F GGATGAACWGTNTAYCCNCC -R ATTHARATTTCGRTCWGTTA --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/embl_refs /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs
obi import --taxdump /Users/elizabethmallott/taxdump.tar.gz /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/taxonomy/ncbi_tax
obi ecopcr -e 3 -l 25 -L 200 -F GGGCAATCCTGAGCCAA -R CCATTGAGTCTCTGCACCTATC --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/embl_refs /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_refs
obi ecopcr -e 3 -l 100 -L 200 -F TAGAACAGGCTCCTCTAG -R TTAGATACCCCACTATGC --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/embl_refs /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_refs
obi ecopcr -e 3 -l 200 -L 300 -F GCCTGTTTACCAAAAACATCAC -R CTCCATAGGGTCTTCTCGTCTT -r 7742 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/embl_refs /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_refs
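The four ecoPCR runs differ only in a handful of parameters, so they can be kept in one small table. The sketch below prints one command per marker from such a table; the values are copied from the commands above, while the function name `ecopcr_commands` and the `-` placeholder for "no taxid restriction" are my own conventions. It does not include the extra taxonomy import into taxonomy_invert_small, which still has to be run separately.

```shell
DIET_DIR=/Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data
TAX_DMS=$DIET_DIR/taxonomy

# Print one ecopcr command per marker (dry run).
# Table columns: output view, max errors, min length, max length,
#                forward primer, reverse primer, taxid ("-" = no restriction)
ecopcr_commands() {
    while read -r out err min max fwd rev taxid; do
        restrict=""
        if [ "$taxid" != "-" ]; then
            restrict=" --restrict-to-taxid $taxid"
        fi
        echo "obi ecopcr -e $err -l $min -L $max$restrict -F $fwd -R $rev --taxonomy $TAX_DMS/taxonomy/ncbi_tax $TAX_DMS/embl_refs $DIET_DIR/$out"
    done <<'EOF'
taxonomy_invert_small/081720_coi_refs 0 220 240 GGATGAACWGTNTAYCCNCC ATTHARATTTCGRTCWGTTA 6656
taxonomy/081720_trnl_refs 3 25 200 GGGCAATCCTGAGCCAA CCATTGAGTCTCTGCACCTATC -
taxonomy/081720_12S_refs 3 100 200 TAGAACAGGCTCCTCTAG TTAGATACCCCACTATGC -
taxonomy/081720_16S_refs 3 200 300 GCCTGTTTACCAAAAACATCAC CTCCATAGGGTCTTCTCGTCTT 7742
EOF
}

ecopcr_commands
```

Keeping the primers and length limits in one place makes it harder for a copy-paste slip to change one marker's settings unnoticed.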

## Clean the reference databases
Remove sequences that lack a description at the species, genus, AND family levels.

In [None]:
obi grep --require-rank=species --require-rank=genus --require-rank=family --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs_clean
obi grep --require-rank=species --require-rank=genus --require-rank=family --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_refs /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_refs_clean
obi grep --require-rank=species --require-rank=genus --require-rank=family --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_refs /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_refs_clean
obi grep --require-rank=species --require-rank=genus --require-rank=family --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_refs /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_refs_clean

Dereplicate sequences in the reference database and make sure they are identified at the family level.

In [None]:
obi uniq --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs_clean /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs_uniq
obi uniq --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_refs_clean /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_refs_uniq
obi uniq --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_refs_clean /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_refs_uniq
obi uniq --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_refs_clean /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_refs_uniq
obi grep --require-rank=family --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs_uniq /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs_uniq_clean
obi grep --require-rank=family --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_refs_uniq /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_refs_uniq_clean
obi grep --require-rank=family --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_refs_uniq /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_refs_uniq_clean
obi grep --require-rank=family --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_refs_uniq /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_refs_uniq_clean
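The cleaning steps form the same three-command chain for every marker: rank filter, dereplication, then a second family-level filter. A small helper can generate the chain from the DMS directory and the ecoPCR view name. This is a sketch that only prints the commands; `ref_clean_commands` is a hypothetical helper name, and the per-marker ordering differs from the step-by-step ordering above, which should not matter because the markers are independent.

```shell
DIET_DIR=/Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data

# Print the grep -> uniq -> grep chain for one reference view (dry run).
# $1 = DMS directory under DIET_DIR, $2 = view name written by ecopcr
ref_clean_commands() {
    dms=$DIET_DIR/$1
    refs=$dms/$2
    tax=$dms/taxonomy/ncbi_tax
    echo "obi grep --require-rank=species --require-rank=genus --require-rank=family --taxonomy $tax $refs ${refs}_clean"
    echo "obi uniq --taxonomy $tax ${refs}_clean ${refs}_uniq"
    echo "obi grep --require-rank=family --taxonomy $tax ${refs}_uniq ${refs}_uniq_clean"
}

ref_clean_commands taxonomy_invert_small 081720_coi_refs
for marker in trnl 12S 16S; do
    ref_clean_commands taxonomy "081720_${marker}_refs"
done
```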

Build an obitools-specific reference database to make the next step more efficient.

In [None]:
obi build_ref_db -t 0.85 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs_uniq_clean /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs_db_80
obi build_ref_db -t 0.80 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_refs_uniq_clean /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_db_80
obi build_ref_db -t 0.80 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_refs_uniq_clean /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_db_80
obi build_ref_db -t 0.80 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_refs_uniq_clean /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_db_80
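The `-t` value chosen here constrains the next step: as I understand the Obitools3 documentation, ecotag should be run with an identity threshold equal to or higher than the one the reference database was built with. Treat that as an assumption to verify against the documentation. The sketch below is a small guard for that rule; `thresholds_compatible` is a hypothetical helper, and the thresholds passed to it are the ones used in this notebook.

```shell
# Succeeds when the ecotag threshold ($2) is >= the build_ref_db threshold ($1).
# awk is used because the shell's own arithmetic is integer-only.
thresholds_compatible() {
    awk -v db="$1" -v tag="$2" 'BEGIN { exit !(tag + 0 >= db + 0) }'
}

# Thresholds used in this notebook: build_ref_db -t, then ecotag -m.
thresholds_compatible 0.85 0.85 && echo "coi: ecotag threshold is compatible"
thresholds_compatible 0.80 0.80 && echo "trnl, 12S, 16S: ecotag threshold is compatible"
```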

## Assign sequences to taxa
Use the ecotag command to assign each experimental sequence to a taxon.

In [None]:
obi ecotag -m 0.85 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/taxonomy/ncbi_tax -R /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy_invert_small/081720_coi_refs_db_80 /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_invert_samples_dms/cleaned_sequences /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_invert_samples_dms/assigned_sequences
obi ecotag -m 0.80 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax -R /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_trnl_db_80 /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_plant_samples_dms/cleaned_sequences /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_plant_samples_dms/assigned_sequences
obi ecotag -m 0.80 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax -R /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_12S_db_80 /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms/cleaned_sequences /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms/assigned_sequences
obi ecotag -m 0.80 --taxonomy /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/taxonomy/ncbi_tax -R /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/taxonomy/081720_16S_db_80 /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms/cleaned_sequences /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms/assigned_sequences
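Each marker's reference database pairs with one sample DMS (COI with invertebrates, trnL with plants, 12S and 16S with the two vertebrate sets). Writing that pairing down as a table makes the mapping explicit. The sketch prints the four ecotag commands from the table; the values come from the commands above, and the function name `ecotag_commands` is hypothetical.

```shell
DIET_DIR=/Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data

# Print one ecotag command per marker (dry run).
# Table columns: taxonomy DMS directory, reference db view, -m threshold, sample DMS
ecotag_commands() {
    while read -r taxdms db min samples; do
        echo "obi ecotag -m $min --taxonomy $DIET_DIR/$taxdms/taxonomy/ncbi_tax -R $DIET_DIR/$taxdms/$db $DIET_DIR/$samples/cleaned_sequences $DIET_DIR/$samples/assigned_sequences"
    done <<'EOF'
taxonomy_invert_small 081720_coi_refs_db_80 0.85 combined_invert_samples_dms
taxonomy 081720_trnl_db_80 0.80 combined_plant_samples_dms
taxonomy 081720_12S_db_80 0.80 combined_oldvert_samples_dms
taxonomy 081720_16S_db_80 0.80 combined_newvert_samples_dms
EOF
}

ecotag_commands
```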

# Double check results and export
Get basic output stats on the results, align the sequences, and create a visual representation of the analysis history.

In [None]:
obi stats -c SCIENTIFIC_NAME /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_invert_samples_dms/assigned_sequences
obi align -t 0.95 /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_invert_samples_dms/assigned_sequences /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_invert_samples_dms/aligned_assigned_sequences
obi history -d /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_invert_samples_dms > /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/invert.dot
dot -Tpng /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/invert.dot -o /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/invert.png

obi stats -c SCIENTIFIC_NAME /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_plant_samples_dms/assigned_sequences
obi align -t 0.95 /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_plant_samples_dms/assigned_sequences /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_plant_samples_dms/aligned_assigned_sequences
obi history -d /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_plant_samples_dms/aligned_assigned_sequences > /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/plant.dot
dot -Tpng /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/plant.dot -o /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/plant.png

obi stats -c SCIENTIFIC_NAME /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms/assigned_sequences
obi align -t 0.95 /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms/assigned_sequences /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms/aligned_assigned_sequences
obi history -d /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms > /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms.dot
dot -Tpng /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms.dot -o /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms.png

obi stats -c SCIENTIFIC_NAME /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms/assigned_sequences
obi align -t 0.95 /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms/assigned_sequences /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms/aligned_assigned_sequences
obi history -d /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms > /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms.dot
dot -Tpng /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms.dot -o /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms.png
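The four blocks above repeat the same stats, align, history, and dot sequence, so they can be generated from the dataset name alone. In this sketch the helper `check_commands` is hypothetical, and it names every graph file after the short dataset name (as in invert.dot), which differs from the longer file names used above for the two vertebrate sets. It also always passes the whole DMS to `obi history -d`.

```shell
DIET_DIR=/Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data

# Print the four checking commands for one dataset (dry run).
# $1 = short dataset name: invert, plant, oldvert, or newvert
check_commands() {
    name=$1
    dms=$DIET_DIR/combined_${name}_samples_dms
    echo "obi stats -c SCIENTIFIC_NAME $dms/assigned_sequences"
    echo "obi align -t 0.95 $dms/assigned_sequences $dms/aligned_assigned_sequences"
    echo "obi history -d $dms > $DIET_DIR/$name.dot"
    echo "dot -Tpng $DIET_DIR/$name.dot -o $DIET_DIR/$name.png"
}

for name in invert plant oldvert newvert; do
    check_commands "$name"
done
```

Once the printed commands look right, a single dataset can be run with `check_commands invert | sh` from inside the activated environment.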

Export the results as a table in order to run downstream analyses.

In [None]:
obi export --tab-output --header /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_invert_samples_dms/assigned_sequences > /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/invert_results.tsv
obi export --tab-output --header /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_plant_samples_dms/assigned_sequences > /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/plant_results.tsv
obi export --tab-output --header /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_oldvert_samples_dms/assigned_sequences > /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/oldvert_results.tsv
obi export --tab-output --header /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/combined_newvert_samples_dms/assigned_sequences > /Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data/newvert_results.tsv
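Before moving on to downstream analyses, it is worth checking that each exported table is non-empty and has a consistent shape. The sketch below reports the number of data rows and columns of a tab-separated file with a header line. `tsv_shape` is a hypothetical helper, and it makes no assumption about which columns the export contains.

```shell
DIET_DIR=/Users/elizabethmallott/Dropbox/Projects/gut_microbiome/Caatinga_marmosets/diet_data

# Print "<data rows> <columns>" for a tab-separated file with one header line.
tsv_shape() {
    awk -F '\t' 'NR == 1 { cols = NF } END { print NR - 1, cols }' "$1"
}

# Report the shape of each exported table that exists on this machine.
for name in invert plant oldvert newvert; do
    f=$DIET_DIR/${name}_results.tsv
    if [ -f "$f" ]; then
        echo "$name: $(tsv_shape "$f")"
    fi
done
```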