# Assign Taxonomy

The next step is to assign taxonomy. We first train a naive Bayes classifier on a known reference set (SILVA 132), restricted to the same region that our primers target, and then use that classifier to classify our representative sequences. As with the DADA2 pipeline, it is best to run each of these steps in a `tmux` session, since they can be long-running.

## Import SILVA data

Import the reference sequences and the reference taxonomy. The SILVA reference set formatted for QIIME 2 is located [here](https://www.arb-silva.de/download/archive/qiime/). We will import the 99% reference sequences and the majority 7-level taxonomy file.

**Note**: training and classification are performed in the `train-feature-classifier` subdirectory.

In [None]:
%%bash
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path train-feature-classifier/silva132_99.fna \
  --output-path train-feature-classifier/silva132_99.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path train-feature-classifier/majority_taxonomy_7_levels.txt \
  --output-path train-feature-classifier/majority_taxonomy_7_levels.qza
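
Before importing, it can help to confirm that the taxonomy file really has the headerless two-column layout (feature ID, tab, semicolon-separated lineage) that `HeaderlessTSVTaxonomyFormat` expects. The snippet below is a minimal sketch of that check on a single line. The example accession and lineage are made up for illustration, and the `D_0__` style rank prefixes are an assumption about how the SILVA 132 QIIME release labels its ranks.

```python
# Hypothetical example line: the accession and lineage are illustrative only,
# and the D_N__ rank prefixes are an assumption about the SILVA 132 release.
example_line = (
    "AB000001.1.1500\t"
    "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;"
    "D_3__Enterobacteriales;D_4__Enterobacteriaceae;"
    "D_5__Escherichia;D_6__Escherichia coli\n"
)


def parse_taxonomy_line(line):
    """Split one headerless TSV line into (feature_id, list_of_ranks)."""
    feature_id, lineage = line.rstrip("\n").split("\t", 1)
    ranks = [rank.strip() for rank in lineage.split(";")]
    return feature_id, ranks


feature_id, ranks = parse_taxonomy_line(example_line)
print(feature_id, len(ranks))
```

Running the same function over every line of `majority_taxonomy_7_levels.txt` and checking that each lineage has seven ranks is a quick way to catch a wrong file before training.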

## Extract reference reads

Extract the target region from the reference reads, using the primer pair noted in the shared file "Heather Biofilm Data Analysis.doc".

In [None]:
%%bash
qiime feature-classifier extract-reads \
  --i-sequences train-feature-classifier/silva132_99.qza \
  --p-f-primer GCCTACGGGNGGCWGCAG \
  --p-r-primer GGACTACHVGGGTATCTAATCC \
  --o-reads train-feature-classifier/silva132_99_trained_dataset.qza
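
To make the idea behind read extraction concrete, here is a simplified sketch of an in-silico PCR using the same primer pair. It expands the IUPAC degenerate bases into a regular expression, finds the forward primer and the reverse complement of the reverse primer, and returns the region between them. This is not QIIME 2's implementation: the function names are hypothetical, only exact matches are accepted (the real plugin tolerates mismatches), and the test sequence is synthetic.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

F_PRIMER = "GCCTACGGGNGGCWGCAG"
R_PRIMER = "GGACTACHVGGGTATCTAATCC"


def primer_to_regex(primer):
    """Turn a degenerate primer into a regex matching every expansion."""
    return "".join(IUPAC[base] for base in primer)


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_amplicon(seq, f_primer, r_primer):
    """Return the region between the primers (primers excluded), or None."""
    fwd = re.search(primer_to_regex(f_primer), seq)
    if fwd is None:
        return None
    downstream = seq[fwd.end():]
    rev = re.search(primer_to_regex(reverse_complement(r_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Synthetic reference: flank + forward primer site + insert + reverse site + flank.
reference = (
    "TTTT" + "GCCTACGGGAGGCAGCAG" + "ACGTACGTAA"
    + "GGATTAGATACCCTGGTAGTCC" + "AAAA"
)
amplicon = extract_amplicon(reference, F_PRIMER, R_PRIMER)
print(amplicon)
```

Restricting the reference to the amplified region in this way means the classifier is trained on the same stretch of the 16S gene that the representative sequences cover.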

## Train the feature classifier

In [None]:
%%bash
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads train-feature-classifier/silva132_99_trained_dataset.qza \
  --i-reference-taxonomy train-feature-classifier/majority_taxonomy_7_levels.qza \
  --o-classifier train-feature-classifier/classifier.qza
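
For intuition about what the training step produces, the following toy sketch implements a multinomial naive Bayes classifier over k-mer counts in plain Python. It is only an illustration of the technique: the function names, the two made-up reference sequences, the short k-mer length, and the uniform class prior are all assumptions chosen for brevity, and QIIME 2 relies on scikit-learn rather than code like this.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=3):
    """Count k-mers per taxon from (sequence, taxon) pairs."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq, taxon in references:
        for kmer in kmers(seq, k):
            counts[taxon][kmer] += 1
            vocab.add(kmer)
    return counts, vocab


def classify(seq, counts, vocab, k=3, alpha=1.0):
    """Pick the taxon with the highest smoothed log-likelihood.

    Uses Laplace smoothing (alpha) and a uniform prior over taxa.
    """
    best_taxon, best_score = None, -math.inf
    for taxon, kmer_counts in counts.items():
        total = sum(kmer_counts.values()) + alpha * len(vocab)
        score = sum(
            math.log((kmer_counts[kmer] + alpha) / total)
            for kmer in kmers(seq, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Made-up reference reads with made-up labels, for illustration only.
toy_references = [
    ("AAAAAAAAAACCAAAA", "taxon_A"),
    ("GGGGGGGTGGGGGGGG", "taxon_G"),
]
counts, vocab = train(toy_references)
prediction = classify("AAAAACAAAA", counts, vocab)
print(prediction)
```

The trained `classifier.qza` plays the role of `counts` and `vocab` here: it stores what was learned from the reference reads so that new sequences can be scored against each taxon.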

## Classify our representative sequences

In [None]:
%%bash
qiime feature-classifier classify-sklearn \
    --i-classifier train-feature-classifier/classifier.qza \
    --i-reads rep-seqs.qza \
    --p-n-jobs 32 \
    --o-classification taxonomy.qza

## Inspect the resulting taxonomy

In [None]:
%%bash
qiime metadata tabulate \
    --m-input-file taxonomy.qza \
    --o-visualization taxonomy.qzv