To do:
* [x] Filtering
    * [x] Filter rep seqs
    * [x] Filter tables
* [x] Build Phylogenetic Tree
    * [x] Build tree for PICRUSt & Downstream Analysis
    * [x] Build Tree (with genomes)
* [x] Alpha rarefaction
* [ ] Calculate and explore diversity metrics
* [x] Assign taxonomy

In [2]:
import os
import pandas as pd

from qiime2 import Visualization

In [2]:
#carry out a multiple sequence alignment using MAFFT
#(-p creates parent directories and does not fail if they already exist)
! mkdir -p ../data/qiime2/filtered/tree

! qiime alignment mafft \
  --i-sequences ../data/qiime2/rep-seqs-deblur.qza \
  --o-alignment ../data/qiime2/filtered/tree/aligned-rep-seqs.qza

#mask (or filter) the alignment to remove positions that are highly variable.
#These positions are generally considered to add noise to the resulting phylogenetic tree.
! qiime alignment mask \
  --i-alignment ../data/qiime2/filtered/tree/aligned-rep-seqs.qza \
  --o-masked-alignment ../data/qiime2/filtered/tree/masked-aligned-rep-seqs.qza

#create the tree using the FastTree program
! qiime phylogeny fasttree \
  --i-alignment ../data/qiime2/filtered/tree/masked-aligned-rep-seqs.qza \
  --o-tree ../data/qiime2/filtered/tree/unrooted-tree.qza

#root the tree at the midpoint of the longest tip-to-tip distance
! qiime phylogeny midpoint-root \
  --i-tree ../data/qiime2/filtered/tree/unrooted-tree.qza \
  --o-rooted-tree ../data/qiime2/filtered/tree/rooted-tree.qza

Saved FeatureData[AlignedSequence] to: ../data/qiime2/filtered/tree/aligned-rep-seqs.qza
Saved FeatureData[AlignedSequence] to: ../data/qiime2/filtered/tree/masked-aligned-rep-seqs.qza
Saved Phylogeny[Unrooted] to: ../data/qiime2/filtered/tree/unrooted-tree.qza
Saved Phylogeny[Rooted] to: ../data/qiime2/filtered/tree/rooted-tree.qza


In [7]:
Visualization.load('../data/qiime2/rep-seqs-deblur.qzv')
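
The masking step in the tree-building cell drops alignment columns that carry little phylogenetic signal. As a rough illustration only, the sketch below filters columns by gap frequency and by how dominant the most common residue is. The function name `mask_alignment`, the threshold values, the toy sequences, and the majority-frequency conservation score are all assumptions made for this example; `qiime alignment mask` scores conservation differently, so this is not its implementation.

```python
from collections import Counter

def mask_alignment(seqs, max_gap_frequency=0.5, min_conservation=0.4):
    """Keep only columns that pass both thresholds (illustrative sketch)."""
    n = len(seqs)
    keep = []
    for col in range(len(seqs[0])):
        column = [s[col] for s in seqs]
        gaps = column.count('-')
        residues = [c for c in column if c != '-']
        # Drop columns that are mostly gaps (or entirely gaps).
        if gaps / n > max_gap_frequency or not residues:
            continue
        # Simplified conservation: share of the most common residue.
        top = Counter(residues).most_common(1)[0][1]
        if top / len(residues) >= min_conservation:
            keep.append(col)
    return [''.join(s[c] for c in keep) for s in seqs]

# Hypothetical aligned sequences, all the same length.
aligned = ['ACG-TA', 'ACGATC', 'AC--TG', 'ATG-TT']
masked = mask_alignment(aligned)
print(masked)
```

In this toy input the fourth column is removed for being mostly gaps and the last column for having no dominant residue.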

# Alpha Rarefaction

In [6]:
Visualization.load('../data/qiime2/table-deblur.qzv')

In [8]:
! qiime diversity alpha-rarefaction \
  --i-table ../data/qiime2/table-deblur.qza \
  --i-phylogeny ../data/qiime2/filtered/tree/rooted-tree.qza \
  --p-max-depth 13593 \
  --m-metadata-file ../data/metadata/metadata-selection.tsv \
  --o-visualization ../data/qiime2/filtered/alpha-rarefaction.qzv

Saved Visualization to: ../data/qiime2/filtered/alpha-rarefaction.qzv


In [9]:
Visualization.load('../data/qiime2/filtered/alpha-rarefaction.qzv')
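
Alpha rarefaction subsamples each sample to a series of depths and recomputes the diversity metric at each one. The sketch below shows a single subsampling step without replacement for one sample. The helper `rarefy`, the feature names, and the counts are made up for illustration and do not reproduce QIIME 2's implementation.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a dict of feature counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    # Expand counts into one entry per read, then draw without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(reads, depth)
    rarefied = {}
    for feature in drawn:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied

# Hypothetical sample with 100 reads spread over four features.
sample = {'asv_a': 50, 'asv_b': 30, 'asv_c': 15, 'asv_d': 5}
for depth in (10, 50, 100):
    sub = rarefy(sample, depth)
    print(depth, len(sub), sub)
```

Samples with fewer reads than the requested depth cannot be subsampled, which is why they drop out of the analysis at that depth.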

# Calculate and explore diversity metrics

In [10]:
! rm -rf ../data/qiime2/filtered/core-metrics-results
! qiime diversity core-metrics-phylogenetic \
  --i-phylogeny ../data/qiime2/filtered/tree/rooted-tree.qza \
  --i-table ../data/qiime2/table-deblur.qza \
  --p-sampling-depth 13593 \
  --m-metadata-file ../data/metadata/metadata-selection.tsv \
  --output-dir ../data/qiime2/filtered/core-metrics-results

Saved FeatureTable[Frequency] to: ../data/qiime2/filtered/core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: ../data/qiime2/filtered/core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: ../data/qiime2/filtered/core-metrics-results/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: ../data/qiime2/filtered/core-metrics-results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: ../data/qiime2/filtered/core-metrics-results/evenness_vector.qza
Saved DistanceMatrix to: ../data/qiime2/filtered/core-metrics-results/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: ../data/qiime2/filtered/core-metrics-results/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: ../data/qiime2/filtered/core-metrics-results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: ../data/qiime2/filtered/core-metrics-results/bray_curtis_distance_matr

In [19]:
#Visualization.load('../data/qiime2/filtered/core-metrics-results/unweighted_unifrac_emperor.qzv')
Visualization.load('../data/qiime2/filtered/core-metrics-results/weighted_unifrac_emperor.qzv')
#Visualization.load('../data/qiime2/filtered/core-metrics-results/jaccard_emperor.qzv')
#Visualization.load('../data/qiime2/filtered/core-metrics-results/bray_curtis_emperor.qzv')
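
Several of the non-phylogenetic metrics saved by the core-metrics cell can be written down in a few lines, which helps when reading the resulting plots. The sketch below computes Shannon entropy, Pielou evenness, Bray-Curtis dissimilarity, and Jaccard distance on made-up count vectors. The function names and the use of log base 2 are assumptions for this example; Faith PD and the UniFrac distances also need the rooted tree and are not sketched here.

```python
import math

def shannon(counts):
    """Shannon entropy in bits (log base 2) over nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Shannon entropy divided by its maximum for the observed richness."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return float('nan')  # undefined for a single observed feature
    return shannon(counts) / math.log2(richness)

def bray_curtis(u, v):
    """Abundance-based dissimilarity: sum|u - v| / sum(u + v)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))

def jaccard(u, v):
    """Presence/absence distance: 1 - shared / union."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)

# Hypothetical rarefied counts for two samples over the same four features.
s1 = [10, 10, 10, 10]
s2 = [40, 0, 0, 0]
print(shannon(s1), pielou_evenness(s1))
print(bray_curtis(s1, s2), jaccard(s1, s2))
```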

In [11]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity ../data/qiime2/filtered/core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file ../data/metadata/metadata-selection.tsv \
  --o-visualization ../data/qiime2/filtered/core-metrics-results/faith-pd-group-significance.qzv

! qiime diversity alpha-group-significance \
  --i-alpha-diversity ../data/qiime2/filtered/core-metrics-results/evenness_vector.qza \
  --m-metadata-file ../data/metadata/metadata-selection.tsv \
  --o-visualization ../data/qiime2/filtered/core-metrics-results/evenness-group-significance.qzv

! qiime diversity alpha-group-significance \
  --i-alpha-diversity ../data/qiime2/filtered/core-metrics-results/shannon_vector.qza \
  --m-metadata-file ../data/metadata/metadata-selection.tsv \
  --o-visualization ../data/qiime2/filtered/core-metrics-results/shannon_group-significance.qzv

Saved Visualization to: ../data/qiime2/filtered/core-metrics-results/faith-pd-group-significance.qzv
Saved Visualization to: ../data/qiime2/filtered/core-metrics-results/evenness-group-significance.qzv
Saved Visualization to: ../data/qiime2/filtered/core-metrics-results/shannon_group-significance.qzv


In [15]:
#Visualization.load('../data/qiime2/filtered/core-metrics-results/faith-pd-group-significance.qzv')
#Visualization.load('../data/qiime2/filtered/core-metrics-results/evenness-group-significance.qzv')
Visualization.load('../data/qiime2/filtered/core-metrics-results/shannon_group-significance.qzv')
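
The group-significance visualizations compare alpha diversity between metadata groups with a rank-based test. As a minimal sketch, assuming the comparison is a Kruskal-Wallis test, the H statistic can be computed as below. The helper `kruskal_h` and the Shannon values are invented for illustration; only the group labels P5, P8, and P9 echo the sample names in this dataset.

```python
from itertools import chain

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    values = sorted(chain.from_iterable(groups))
    n = len(values)
    ranks = {}
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        # Tied values share the average of their 1-based ranks (i+1 .. j).
        ranks[values[i]] = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(ranks[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction

# Hypothetical Shannon values, five replicates per group.
p5 = [3.1, 3.4, 2.9, 3.3, 3.0]
p8 = [4.2, 4.0, 4.5, 4.1, 4.4]
p9 = [3.8, 3.6, 4.3, 3.9, 3.7]
print(kruskal_h(p5, p8, p9))
```

The p-value would then come from a chi-square distribution with one fewer degrees of freedom than the number of groups, for example via `scipy.stats.chi2.sf` if SciPy is available.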

# Assign Taxonomy

In [16]:
! qiime feature-classifier classify-sklearn \
    --p-n-jobs 8 \
    --i-classifier ../data/classifier/gg_13_8_otus/classifier.qza \
    --i-reads ../data/qiime2/rep-seqs-deblur.qza \
    --o-classification ../data/qiime2/filtered/taxonomy.qza \
    --verbose

Saved FeatureData[Taxonomy] to: ../data/qiime2/filtered/taxonomy.qza


In [17]:
! qiime metadata tabulate \
  --m-input-file ../data/qiime2/filtered/taxonomy.qza \
  --o-visualization ../data/qiime2/filtered/taxonomy.qzv

! qiime taxa barplot \
  --i-table ../data/qiime2/table-deblur.qza \
  --i-taxonomy ../data/qiime2/filtered/taxonomy.qza \
  --m-metadata-file ../data/metadata/metadata-selection.tsv \
  --o-visualization ../data/qiime2/filtered/taxa-bar-plots.qzv

Saved Visualization to: ../data/qiime2/filtered/taxonomy.qzv
Saved Visualization to: ../data/qiime2/filtered/taxa-bar-plots.qzv



In [20]:
Visualization.load('../data/qiime2/filtered/taxonomy.qzv')

In [3]:
Visualization.load('../data/qiime2/filtered/taxa-bar-plots.qzv')

In [19]:
#first, export your data as a .biom
! qiime tools export \
  --input-path ../data/qiime2/table-deblur.qza \
  --output-path ../data/qiime2/filtered/exported-feature-table

#then export taxonomy info
! qiime tools export \
  --input-path ../data/qiime2/filtered/taxonomy.qza \
  --output-path ../data/qiime2/filtered/exported-feature-table

#finally export the representative sequences
! qiime tools export \
  --input-path ../data/qiime2/rep-seqs-deblur.qza \
  --output-path ../data/qiime2/filtered/exported-feature-table

Exported ../data/qiime2/table-deblur.qza as BIOMV210DirFmt to directory ../data/qiime2/filtered/exported-feature-table
Exported ../data/qiime2/filtered/taxonomy.qza as TSVTaxonomyDirectoryFormat to directory ../data/qiime2/filtered/exported-feature-table
Exported ../data/qiime2/rep-seqs-deblur.qza as DNASequencesDirectoryFormat to directory ../data/qiime2/filtered/exported-feature-table


In [20]:
! biom convert -i ../data/qiime2/filtered/exported-feature-table/feature-table.biom \
    -o ../data/qiime2/filtered/exported-feature-table/feature-table.tsv \
    --to-tsv

In [21]:
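#skip the leading "# Constructed from biom file" comment line so the "#OTU ID" row becomes the header
df_biom = pd.read_csv('../data/qiime2/filtered/exported-feature-table/feature-table.tsv', sep='\t', skiprows=1, index_col=0)
#replace hyphens in sample names with underscores
df_biom.columns = [i.replace('-','_') for i in df_biom.columns.values]
df_biom.index.name = 'asv'
df_biom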

asv,P8_rep4,P5_rep4,P9_rep1,P9_rep5,P9_rep4,P5_rep1,P8_rep2,P8_rep1,P9_rep2,P9_rep3,P8_rep3,P5_rep5,P5_rep2,P5_rep3,P8_rep5
e5400356daabbc5a41f935af70513043,1598.0,630.0,297.0,795.0,1449.0,359.0,2160.0,265.0,220.0,1736.0,1673.0,420.0,714.0,234.0,1850.0
7cbebea20e305a557ffc3dc23bae61ae,1111.0,36.0,127.0,335.0,610.0,24.0,1502.0,202.0,54.0,699.0,1089.0,26.0,43.0,19.0,1216.0
4dc5023fdc00f325e169fced16dca21c,1022.0,93.0,323.0,863.0,2013.0,50.0,1855.0,212.0,254.0,1387.0,1048.0,88.0,108.0,48.0,1028.0
4c076cffb4dc7aaae47cb237d7067066,739.0,942.0,104.0,347.0,814.0,590.0,611.0,113.0,183.0,1069.0,476.0,546.0,832.0,331.0,742.0
3c4c98cf9b1264b89f9ecd0812a0f7d8,632.0,195.0,224.0,571.0,1259.0,113.0,615.0,88.0,157.0,1190.0,530.0,135.0,200.0,76.0,611.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1b5c982a1bb0eb9715fa9be2ed108bf6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,0.0
19f24683ed9a487906f6a50bd09181a4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0
7e2632e260efa3086386801f5b1484ef,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,43.0
af6122ff00ffab21a39e62cbb656e011,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15.0
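
The `skiprows=1` argument in the cell above is needed because `biom convert --to-tsv` writes a comment line before the header row. The sketch below parses a small, made-up table in that layout with the standard library and sums the reads per sample, which is one way to sanity-check a sampling depth. The sample names, feature IDs, and counts are invented for illustration.

```python
import csv
import io

# Hypothetical file contents in the layout produced by `biom convert --to-tsv`.
tsv_text = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\n"
    "feat_a\t10.0\t0.0\n"
    "feat_b\t5.0\t7.0\n"
)

lines = io.StringIO(tsv_text)
next(lines)  # skip the leading comment line, like skiprows=1
reader = csv.reader(lines, delimiter='\t')
header = next(reader)   # ['#OTU ID', 'S1', 'S2']
samples = header[1:]
totals = dict.fromkeys(samples, 0.0)
for row in reader:
    for sample, value in zip(samples, row[1:]):
        totals[sample] += float(value)
print(totals)
```

With the `df_biom` frame from the cell above, the same per-sample totals should be available as `df_biom.sum(axis=0)`.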


In [22]:
df_tax = pd.read_csv('../data/qiime2/filtered/exported-feature-table/taxonomy.tsv', sep='\t', index_col=0)

In [23]:
df_tax

Feature ID,Taxon,Confidence
e5400356daabbc5a41f935af70513043,k__Bacteria; p__Proteobacteria; c__Alphaproteo...,0.776749
7cbebea20e305a557ffc3dc23bae61ae,k__Bacteria; p__Actinobacteria; c__Actinobacte...,0.994210
4dc5023fdc00f325e169fced16dca21c,k__Bacteria; p__Verrucomicrobia; c__[Spartobac...,1.000000
4c076cffb4dc7aaae47cb237d7067066,k__Bacteria; p__Firmicutes; c__Bacilli; o__Bac...,0.999167
3c4c98cf9b1264b89f9ecd0812a0f7d8,k__Bacteria; p__Proteobacteria; c__Alphaproteo...,0.999924
...,...,...
1b5c982a1bb0eb9715fa9be2ed108bf6,k__Bacteria; p__Acidobacteria; c__iii1-8; o__D...,1.000000
19f24683ed9a487906f6a50bd09181a4,k__Bacteria; p__Proteobacteria; c__Alphaproteo...,0.794185
7e2632e260efa3086386801f5b1484ef,k__Bacteria; p__Verrucomicrobia; c__[Spartobac...,0.999990
af6122ff00ffab21a39e62cbb656e011,k__Bacteria; p__Acidobacteria; c__Acidobacteri...,0.999949
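
The `Taxon` column packs the whole lineage into one semicolon-separated string with rank prefixes such as `k__` and `p__`. A minimal sketch for splitting such a string into named ranks is shown below. The helper `split_taxon`, the `RANKS` mapping, and the example lineage are assumptions for illustration, not part of the exported data.

```python
# Greengenes-style rank prefixes mapped to readable rank names.
RANKS = {'k': 'kingdom', 'p': 'phylum', 'c': 'class', 'o': 'order',
         'f': 'family', 'g': 'genus', 's': 'species'}

def split_taxon(taxon):
    """Split a Greengenes-style lineage string into a rank -> name dict."""
    lineage = {}
    for part in taxon.split(';'):
        prefix, sep, name = part.strip().partition('__')
        # Skip unknown prefixes and ranks left empty by the classifier.
        if sep and prefix in RANKS and name:
            lineage[RANKS[prefix]] = name
    return lineage

# Hypothetical lineage with unassigned order and family.
example = 'k__Bacteria; p__Firmicutes; c__Bacilli; o__; f__'
print(split_taxon(example))
```

With the `df_tax` frame loaded above, something like `df_tax['Taxon'].apply(split_taxon)` would give one dict per feature, which can then be expanded into separate rank columns.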
