The goal of this script is to assign taxonomy to all the sequences in the feature table. We get to find out who these features are!

## Import the QIIME 2 Python 3 API

In [3]:
from qiime2 import Visualization
from qiime2 import Artifact

# [Taxonomic Classification](https://docs.qiime2.org/2023.5/tutorials/moving-pictures-usage/#taxonomic-analysis)

In the next sections we’ll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our FeatureData[Sequence] QIIME 2 artifact. We’ll do that using a pre-trained Naive Bayes classifier and the q2-feature-classifier plugin. The Moving Pictures tutorial uses a classifier trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in that analysis (the V4 region, bound by the 515F/806R primer pair); this notebook uses the pre-trained Silva 138 classifiers downloaded below instead. We’ll apply the classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.

Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore, in general, you should follow the instructions in the QIIME 2 tutorial "Training feature classifiers with q2-feature-classifier" to train your own taxonomic classifiers. The QIIME 2 data resources page provides some common classifiers, including Silva-based 16S classifiers, though the QIIME 2 developers note that they may stop providing these in the future in favor of having users train their own classifiers, which will be most relevant to their sequence data.
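To make the idea of a Naive Bayes sequence classifier more concrete, here is a minimal toy sketch in plain Python. It is not the q2-feature-classifier implementation (which wraps scikit-learn and differs in k-mer size, priors, and confidence estimation); the function names, the 3-mer size, and the reference sequences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Return the list of overlapping k-mers in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=3):
    """Count k-mers per taxon from (taxon, sequence) reference pairs."""
    counts = defaultdict(Counter)
    for taxon, seq in references:
        counts[taxon].update(kmers(seq, k))
    vocabulary = {kmer for c in counts.values() for kmer in c}
    return counts, vocabulary


def classify(seq, counts, vocabulary, k=3):
    """Pick the taxon with the highest multinomial log-likelihood.

    Assumes a uniform prior over taxa and add-one smoothing.
    """
    best_taxon, best_score = None, -math.inf
    for taxon, c in counts.items():
        total = sum(c.values()) + len(vocabulary)
        score = sum(math.log((c[kmer] + 1) / total) for kmer in kmers(seq, k))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference sequences (invented, not real 16S data)
references = [
    ("taxon_A", "ACGTACGTACGTACGT"),
    ("taxon_B", "TTGGCCAATTGGCCAA"),
]
counts, vocabulary = train(references)
prediction = classify("ACGTACGA", counts, vocabulary)
print(prediction)
```

Because the classifier only knows the k-mers it saw during training, a reference trimmed to the same amplified region as the reads gives it the most relevant k-mer counts, which is the intuition behind training on your own primers and read length.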

We can learn a lot from the alpha and beta diversity metrics. But to really dig into the data, we need to know what microbes are in each sample 🦠. To do this, we'll classify the reads in QIIME 2 using a Naive Bayes classifier. Several such classifiers are available at https://docs.qiime2.org/2023.9/data-resources/#taxonomy-classifiers-for-use-with-q2-feature-classifier

In [None]:
!pip install wget
import wget

In [78]:
# Specify the directory where you want to save the downloaded file
output_directory = '../data/taxonomy-classifiers/'

# Download the file and save it to the specified directory
wget.download('https://data.qiime2.org/2023.9/common/silva-138-99-nb-classifier.qza', out=output_directory)
wget.download('https://data.qiime2.org/2023.9/common/silva-138-99-515-806-nb-classifier.qza', out=output_directory)

100% [......................................................................] 148294965 / 148294965

'../data/taxonomy-classifiers//silva-138-99-515-806-nb-classifier.qza'
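The classifier files are large (the progress bar above reports roughly 148 MB for one of them), so re-running the download cell is wasteful. The sketch below shows one way to skip files that are already present. It uses the standard library's `urllib` rather than the `wget` package, and `local_path` and `fetch_once` are hypothetical helper names, not part of QIIME 2.

```python
import os
from urllib.request import urlretrieve


def local_path(url, directory):
    """Build the destination path for a URL inside a directory."""
    return os.path.join(directory, os.path.basename(url))


def fetch_once(url, directory):
    """Download url into directory unless the file is already there."""
    os.makedirs(directory, exist_ok=True)
    destination = local_path(url, directory)
    if not os.path.exists(destination):
        urlretrieve(url, destination)
    return destination


classifier_url = 'https://data.qiime2.org/2023.9/common/silva-138-99-515-806-nb-classifier.qza'
expected = local_path(classifier_url, '../data/taxonomy-classifiers/')
print(expected)
```

Calling `fetch_once(classifier_url, output_directory)` would then download the file only on the first run. Building the path with `os.path.join` also avoids the doubled slash seen in the output above.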

In [None]:
# Other potential taxonomy classifiers to get/try:
# https://data.qiime2.org/2023.9/common/silva-138-99-nb-classifier.qza
# https://data.qiime2.org/2023.9/common/silva-138-99-515-806-nb-classifier.qza
# https://data.qiime2.org/classifiers/greengenes/gg_2022_10_backbone_full_length.nb.qza
# https://data.qiime2.org/classifiers/greengenes/gg_2022_10_backbone.v4.nb.qza
# http://ftp.microbio.me/greengenes_release/2022.10/

In [79]:
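# NOTE: the classifier path below does not match the download location used earlier
# (taxonomy-classifier/ vs. taxonomy-classifiers/, and the file name lacks -nb-classifier),
# which is the likely reason this run printed the usage text instead of classifying.
!qiime feature-classifier classify-sklearn \
    --i-reads ../output/dada/representative_sequences.qza \
    --i-classifier ../data/taxonomy-classifier/silva-138-99-515-806.qza \
    --p-n-jobs 2 \
    --o-classification ../output/taxon/taxa.qza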

Usage: qiime feature-classifier classify-sklearn [OPTIONS]

  Classify reads by taxon using a fitted classifier.

Inputs:
  --i-reads ARTIFACT FeatureData[Sequence]
                         The feature data to be classified.         [required]
  --i-classifier ARTIFACT
    TaxonomicClassifier  The taxonomic classifier for classifying the reads.
                                                                    [required]
Parameters:
  --p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
                         Number of reads to process in each batch. If "auto",
                         this parameter is autoscaled to min( number of query
                         sequences / n-jobs, 20000).         [default: 'auto']
  --p-n-jobs INTEGER     The maximum number of concurrently worker processes.
                         If -1 all CPUs are u
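A quick existence check on the input files before launching a long-running command can catch path typos early. This is a minimal sketch; `missing_inputs` is a hypothetical helper, and the paths are the ones used elsewhere in this notebook.

```python
import os


def missing_inputs(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


inputs = [
    '../output/dada/representative_sequences.qza',
    '../data/taxonomy-classifiers/silva-138-99-515-806-nb-classifier.qza',
]
missing = missing_inputs(inputs)
if missing:
    print('Missing input files:', missing)
```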

## Silva 515/806

In [82]:
!qiime feature-classifier classify-sklearn \
    --i-reads ../output/dada/representative_sequences.qza \
    --i-classifier ../data/taxonomy-classifiers/silva-138-99-515-806-nb-classifier.qza \
    --p-n-jobs 2 \
    --output-dir ../output/taxonomy

Saved FeatureData[Taxonomy] to: ../output/taxonomy/classification.qza

In [1]:
!qiime taxa barplot \
    --i-table ../output/dada/table.qza \
    --i-taxonomy ../output/taxonomy/classification.qza \
    --m-metadata-file ../data/sample-metadata-verbose.tsv \
    --o-visualization ../output/taxonomy/taxa_barplot.qzv

Saved Visualization to: ../output/taxonomy/taxa_barplot.qzv

In [4]:
Visualization.load("../output/taxonomy/taxa_barplot.qzv")
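The bar plot groups features by taxonomic level. As a rough illustration of what that grouping involves, the sketch below collapses per-feature counts to a chosen level and converts them to relative abundances. It is plain Python, not the q2-taxa implementation; the helper names, feature IDs, counts, and taxonomy strings are invented, and the `d__`/`p__`/`c__` prefixes assume Silva-style semicolon-delimited labels.

```python
from collections import defaultdict


def taxon_at_level(taxonomy_string, level):
    """Truncate a semicolon-delimited taxonomy string to its first `level` ranks."""
    ranks = [rank.strip() for rank in taxonomy_string.split(';')]
    return '; '.join(ranks[:level])


def collapse(counts, taxonomy, level):
    """Sum per-feature counts for one sample by taxon at the given level."""
    collapsed = defaultdict(int)
    for feature_id, count in counts.items():
        collapsed[taxon_at_level(taxonomy[feature_id], level)] += count
    return dict(collapsed)


def relative_abundance(collapsed):
    """Convert collapsed counts to fractions of the sample total."""
    total = sum(collapsed.values())
    return {taxon: count / total for taxon, count in collapsed.items()}


# Invented example data for a single sample
taxonomy = {
    'feature_1': 'd__Bacteria; p__Firmicutes; c__Bacilli',
    'feature_2': 'd__Bacteria; p__Firmicutes; c__Clostridia',
    'feature_3': 'd__Bacteria; p__Proteobacteria; c__Gammaproteobacteria',
}
sample_counts = {'feature_1': 30, 'feature_2': 50, 'feature_3': 20}

phylum_counts = collapse(sample_counts, taxonomy, level=2)
phylum_fractions = relative_abundance(phylum_counts)
print(phylum_fractions)
```

Level 2 corresponds to phylum here because the domain is counted as level 1.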

# Next Steps
- Filter all the bacteria that showed up in the 'blank' sample out of the treatment samples
- How do I interpret/integrate the mock community?
- Remove chloroplasts
- I want to know which taxa are uniquely present in each treatment, and which ones stay the same
  - ANCOM, via the composition plugin: apply ANalysis of Composition of Microbiomes (ANCOM) to identify features that are differentially abundant across groups
- Use [PICRUSt2](https://library.qiime2.org/plugins/q2-picrust2/13/) for functional analysis
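Several of these steps come down to simple filtering and set logic on the feature table. The sketch below illustrates that logic on an invented toy table in plain Python; it is not a QIIME 2 workflow (within QIIME 2, taxonomy-based filtering is typically done with `qiime taxa filter-table` and its `--p-exclude` option). All helper names, sample IDs, counts, and taxonomy strings are hypothetical, and dropping every feature seen in the blank is a deliberately blunt rule.

```python
def drop_taxa(table, taxonomy, excluded=('chloroplast', 'mitochondria')):
    """Remove features whose taxonomy contains an excluded term (case-insensitive)."""
    return {
        sample: {f: n for f, n in feats.items()
                 if not any(term in taxonomy[f].lower() for term in excluded)}
        for sample, feats in table.items()
    }


def drop_blank_features(table, blank_id):
    """Remove every feature observed in the blank, then drop the blank itself."""
    in_blank = {f for f, n in table[blank_id].items() if n > 0}
    return {
        sample: {f: n for f, n in feats.items() if f not in in_blank}
        for sample, feats in table.items() if sample != blank_id
    }


def shared_and_unique(table):
    """Return features present in all samples, and features unique to each sample."""
    present = {sample: {f for f, n in feats.items() if n > 0}
               for sample, feats in table.items()}
    shared = set.intersection(*present.values())
    unique = {}
    for sample, feats in present.items():
        others = set()
        for other, other_feats in present.items():
            if other != sample:
                others |= other_feats
        unique[sample] = feats - others
    return shared, unique


# Invented toy data: sample -> feature -> count
table = {
    'blank': {'f1': 5, 'f2': 0, 'f3': 0, 'f4': 0},
    'treatment_A': {'f1': 10, 'f2': 40, 'f3': 7, 'f4': 3},
    'treatment_B': {'f1': 8, 'f2': 25, 'f3': 0, 'f4': 12},
}
taxonomy = {
    'f1': 'd__Bacteria; p__Proteobacteria',
    'f2': 'd__Bacteria; p__Firmicutes',
    'f3': 'd__Bacteria; p__Bacteroidota',
    'f4': 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast',
}

cleaned = drop_blank_features(drop_taxa(table, taxonomy), 'blank')
shared, unique = shared_and_unique(cleaned)
print(cleaned, shared, unique)
```

Presence/absence comparisons like this ignore abundance, which is why a differential abundance method such as ANCOM is listed for the treatment comparison.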
