# IMPORTANT: Unlike QIIME 1, QIIME 2 uses [view.qiime2.org](https://view.qiime2.org) to visualize graphics and tables

### Useful resources for QIIME 2:
- a) http://caporasolab.us/teaching/courses/2017.01-450/homework_assignments.html
- b) https://docs.qiime2.org/2017.8/tutorials/

# Step 1: Downloading the data

In [1]:
!curl -sL "https://data.qiime2.org/2017.8/tutorials/moving-pictures/sample_metadata.tsv" > "sample-metadata.tsv"

In [3]:
import os 

In [4]:
ls

newqiime.ipynb  qiime_overview_tutorial/     sample-metadata.tsv
oldqiime.ipynb  qiime_overview_tutorial.zip


## Note: .tsv files are tab-separated values files

In [5]:
cat sample-metadata.tsv

#SampleID	BarcodeSequence	LinkerPrimerSequence	BodySite	Year	Month	Day	Subject	ReportedAntibioticUsage	DaysSinceExperimentStart	Description
L1S8	AGCTGACTAGTC	GTGCCAGCMGCCGCGGTAA	gut	2008	10	28	subject-1	Yes	0	subject-1.gut.2008-10-28
L1S57	ACACACTATGGC	GTGCCAGCMGCCGCGGTAA	gut	2009	1	20	subject-1	No	84	subject-1.gut.2009-1-20
L1S76	ACTACGTGTGGT	GTGCCAGCMGCCGCGGTAA	gut	2009	2	17	subject-1	No	112	subject-1.gut.2009-2-17
L1S105	AGTGCGATGCGT	GTGCCAGCMGCCGCGGTAA	gut	2009	3	17	subject-1	No	140	subject-1.gut.2009-3-17
L2S155	ACGATGCGACCA	GTGCCAGCMGCCGCGGTAA	left palm	2009	1	20	subject-1	No	84	subject-1.left-palm.2009-1-20
L2S175	AGCTATCCACGA	GTGCCAGCMGCCGCGGTAA	left palm	2009	2	17	subject-1	No	112	subject-1.left-palm.2009-2-17
L2S204	ATGCAGCTCAGT	GTGCCAGCMGCCGCGGTAA	left palm	2009	3	17	subject-1	No	140	subject-1.left-palm.2009-3-17
L2S222	CACGTGACATGT	GTGCCAGCMGCCGCGGTAA	left palm	2009	4	14	subject-1	No	168	subject-1.left-palm.2009-4-14
L3S242	ACAGTTGCGCGA	GTGCCAGCMGCCGCGGTAA
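Because the metadata is plain tab-separated text, it can also be read with Python's standard `csv` module. The following is a minimal sketch, not part of the original notebook: it uses a small inline excerpt of the table above (only four of the columns, chosen for brevity) so that it runs on its own, and the helper name `read_metadata` is hypothetical.

```python
import csv
import io
from collections import Counter

# Inline excerpt of the table above (tab-separated, subset of columns).
# In practice you would open "sample-metadata.tsv" instead of this string.
SAMPLE_TSV = (
    "#SampleID\tBarcodeSequence\tBodySite\tSubject\n"
    "L1S8\tAGCTGACTAGTC\tgut\tsubject-1\n"
    "L1S57\tACACACTATGGC\tgut\tsubject-1\n"
    "L2S155\tACGATGCGACCA\tleft palm\tsubject-1\n"
)


def read_metadata(handle):
    """Return the rows of a tab-separated metadata file as dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_metadata(io.StringIO(SAMPLE_TSV))

# Count how many samples come from each body site.
site_counts = Counter(row["BodySite"] for row in rows)
print(site_counts)  # Counter({'gut': 2, 'left palm': 1})
```

To run it on the downloaded file, something like `read_metadata(open("sample-metadata.tsv", newline=""))` should work, assuming the file sits in the working directory.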

In [7]:
!mkdir emp-single-end-sequences

In [8]:
!curl -sL "https://data.qiime2.org/2017.8/tutorials/moving-pictures/emp-single-end-sequences/barcodes.fastq.gz" > "emp-single-end-sequences/barcodes.fastq.gz"

In [9]:
!curl -sL "https://data.qiime2.org/2017.8/tutorials/moving-pictures/emp-single-end-sequences/sequences.fastq.gz" > "emp-single-end-sequences/sequences.fastq.gz"

# Step 2: Importing data using qiime tools

In [10]:
!qiime tools import \
  --type EMPSingleEndSequences \
  --input-path emp-single-end-sequences \
  --output-path emp-single-end-sequences.qza

The resulting artifact can be viewed [here](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Femp-single-end-sequences.qza)

In [13]:
ls

emp-single-end-sequences/     oldqiime.ipynb               sample-metadata.tsv
emp-single-end-sequences.qza  qiime_overview_tutorial/
newqiime.ipynb                qiime_overview_tutorial.zip


## Note: a .qza file is a QIIME 2 artifact file
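An artifact file is a zip archive, so its contents can be listed with Python's standard `zipfile` module. The sketch below is illustrative only: to stay runnable without QIIME 2 it builds a stand-in archive in memory, and the directory name `example-uuid` and the file contents are placeholders, not real artifact data. The helper name `list_archive_members` is hypothetical.

```python
import io
import zipfile


def list_archive_members(archive):
    """Return the sorted member names of a zip-format file (path or file object)."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(zf.namelist())


# Stand-in archive built in memory so the example runs without QIIME 2.
# The top-level directory name and the file contents are placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("example-uuid/metadata.yaml", "type: EMPSingleEndSequences\n")
    zf.writestr("example-uuid/data/sequences.fastq.gz", b"")
    zf.writestr("example-uuid/data/barcodes.fastq.gz", b"")
buffer.seek(0)

members = list_archive_members(buffer)
print(members)
```

On a real file the call would be `list_archive_members("emp-single-end-sequences.qza")`.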

# Step 3: Demultiplexing sequences 

Pooling multiple samples increases the efficiency and lowers the cost of DNA sequencing. One approach to multiplexing is to use short DNA indices to uniquely identify each sample. After sequencing, reads must be assigned in silico to the sample of origin, a process referred to as demultiplexing.
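The idea can be illustrated with a toy sketch: look up each read's barcode in a barcode-to-sample table and group the reads by sample. This is not how `qiime demux` is implemented. It assumes exact barcode matches only, the read sequences are made up, and the function name `demultiplex` is hypothetical. The two barcodes are taken from the metadata table in Step 1.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group (barcode, sequence) pairs by sample.

    Reads whose barcode is not in the table are returned separately.
    Only exact matches are considered (an assumption of this sketch).
    """
    per_sample = defaultdict(list)
    unassigned = []
    for barcode, sequence in reads:
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(sequence)
        else:
            per_sample[sample].append(sequence)
    return dict(per_sample), unassigned


# Barcodes from the metadata table; the read sequences are toy data.
barcode_to_sample = {"AGCTGACTAGTC": "L1S8", "ACACACTATGGC": "L1S57"}
reads = [
    ("AGCTGACTAGTC", "TACGGAGG"),
    ("ACACACTATGGC", "TACGTAGG"),
    ("AGCTGACTAGTC", "TACGGAGA"),
    ("NNNNNNNNNNNN", "TACGAAGG"),  # unknown barcode
]

per_sample, unassigned = demultiplex(reads, barcode_to_sample)
print(per_sample)
print(unassigned)
```

Real demultiplexers typically also have to deal with barcode orientation and sequencing errors in the barcode, which this sketch ignores.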

In [14]:
!qiime demux emp-single \
  --i-seqs emp-single-end-sequences.qza \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-category BarcodeSequence \
  --o-per-sample-sequences demux.qza

Saved SampleData[SequencesWithQuality] to: demux.qza


In [16]:
!qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

Saved Visualization to: demux.qzv


[artifact](https://view.qiime2.org/?src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Fdemux.qza)
[visualization](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Fdemux.qzv)

## Note: a .qzv file is a QIIME 2 visualization file; unfortunately, it cannot be viewed with `qiime tools view` on a headless Jupyter server at this time

In [17]:
!qiime tools view demux.qzv

Usage: qiime tools view [OPTIONS] VISUALIZATION_PATH

Error: Visualization viewing is currently not supported in headless environments. You can view Visualizations (and Artifacts) at https://view.qiime2.org, or move the Visualization to an environment with a display and view it with `qiime tools view`.


# Step 4: Feature table construction and quality control

DADA2 is a pipeline for detecting and correcting (where possible) errors in Illumina amplicon sequence data.

In [18]:
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 120 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza

Saved FeatureTable[Frequency] to: table-dada2.qza
Saved FeatureData[Sequence] to: rep-seqs-dada2.qza


In [20]:
!mv rep-seqs-dada2.qza rep-seqs.qza
!mv table-dada2.qza table.qza

[rep-seqs](https://view.qiime2.org/peek/?src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Frep-seqs.qza)
[table](https://view.qiime2.org/peek/?src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Ftable.qza)
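The `--p-trim-left 0` and `--p-trunc-len 120` options passed to `qiime dada2 denoise-single` above control how each read is cut before denoising. A rough sketch of that length handling follows, assuming that reads shorter than the truncation length are discarded and that trimmed reads end up `trunc_len - trim_left` bases long. It does not model the denoising itself, and the function name `trim_and_truncate` is hypothetical.

```python
def trim_and_truncate(read, trim_left=0, trunc_len=120):
    """Cut a read to positions [trim_left, trunc_len).

    Returns None when the read is shorter than trunc_len, on the
    assumption that such reads are discarded rather than kept.
    """
    if len(read) < trunc_len:
        return None
    return read[trim_left:trunc_len]


# Toy reads of length 150, 120 and 90.
reads = ["A" * 150, "C" * 120, "G" * 90]
kept = [r for r in (trim_and_truncate(x) for x in reads) if r is not None]
print([len(r) for r in kept])  # [120, 120]
```

With these settings the 90-base read is dropped and the other two are both 120 bases long, which is why the truncation length is usually picked by looking at the quality plots in `demux.qzv`.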

In [21]:
!qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file sample-metadata.tsv
!qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

Saved Visualization to: table.qzv
Saved Visualization to: rep-seqs.qzv


[rep-seqs](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Frep-seqs.qzv)
[table](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Ftable.qzv)

# Step 5: Generate phylogenetic tree for diversity analysis

In [23]:
# multi-sequence alignment
!qiime alignment mafft \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: aligned-rep-seqs.qza


[result here](https://view.qiime2.org/peek/?src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Faligned-rep-seqs.qza)

In [25]:
# mask (filter out) highly variable positions from the alignment
!qiime alignment mask \
  --i-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: masked-aligned-rep-seqs.qza


[masked result here](https://view.qiime2.org/peek/?src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Fmasked-aligned-rep-seqs.qza)
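To make the masking step more concrete, here is a toy sketch of column filtering on an alignment: a column is kept only if it is conserved enough and not too gappy. The conservation measure (the fraction of sequences sharing the most common character), both threshold values, and the function name `mask_alignment` are assumptions made for illustration; `qiime alignment mask` may compute these quantities differently.

```python
from collections import Counter


def mask_alignment(aligned, min_conservation=0.4, max_gap_frequency=1.0):
    """Drop alignment columns that are too variable or too gappy.

    `aligned` maps sequence IDs to equal-length aligned strings, with '-'
    marking a gap. Both thresholds are illustrative assumptions.
    """
    ids = list(aligned)
    length = len(aligned[ids[0]])
    keep = []
    for col in range(length):
        column = [aligned[i][col] for i in ids]
        gap_frequency = column.count("-") / len(column)
        # Fraction of sequences that share the most common character.
        conservation = Counter(column).most_common(1)[0][1] / len(column)
        if gap_frequency <= max_gap_frequency and conservation >= min_conservation:
            keep.append(col)
    return {i: "".join(aligned[i][c] for c in keep) for i in ids}


# Toy alignment: the third column differs in every sequence.
aligned = {
    "s1": "ACGTA",
    "s2": "ACCTA",
    "s3": "ACTTA",
    "s4": "ACATA",
}
masked = mask_alignment(aligned)
print(masked)
```

In this toy input the third column has a conservation of 0.25, below the 0.4 threshold, so it is removed and every sequence becomes `ACTA`.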

In [27]:
# use FastTree to quickly infer an unrooted phylogenetic tree from the masked alignment
!qiime phylogeny fasttree \
  --i-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza

Saved Phylogeny[Unrooted] to: unrooted-tree.qza


[unrooted tree](https://view.qiime2.org/peek/?src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Funrooted-tree.qza)

In [28]:
# Root the previously constructed tree at the midpoint of the longest tip-to-tip distance in the unrooted tree
!qiime phylogeny midpoint-root \
  --i-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

Saved Phylogeny[Rooted] to: rooted-tree.qza


[rooted tree](https://view.qiime2.org/peek/?src=https%3A%2F%2Fdocs.qiime2.org%2F2017.8%2Fdata%2Ftutorials%2Fmoving-pictures%2Frooted-tree.qza)
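Midpoint rooting can be sketched on a small toy tree: find the two tips that are farthest apart, then walk along the path between them until half of that distance has been covered. The code below only locates the midpoint; it does not rebuild the tree around a new root, and it is not QIIME 2's implementation. The tree, its branch lengths, and all function names are made up for illustration.

```python
def build_tree(edges):
    """Build an undirected adjacency map {node: {neighbor: branch_length}}."""
    tree = {}
    for a, b, length in edges:
        tree.setdefault(a, {})[b] = length
        tree.setdefault(b, {})[a] = length
    return tree


def farthest_tip(tree, start):
    """Return (tip, distance, path) for the tip farthest from `start`."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        # A tip has exactly one neighbor in an unrooted tree.
        if len(tree[node]) == 1 and dist > best[1]:
            best = (node, dist, path)
        for neighbor, length in tree[node].items():
            if neighbor != parent:
                stack.append((neighbor, node, dist + length, path + [neighbor]))
    return best


def midpoint(tree):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (node_a, node_b, offset): the midpoint lies on the edge
    between node_a and node_b, `offset` length units away from node_a.
    """
    any_tip = next(n for n in tree if len(tree[n]) == 1)
    tip_a, _, _ = farthest_tip(tree, any_tip)      # one end of the longest path
    _, total, path = farthest_tip(tree, tip_a)     # other end, length and path
    half = total / 2
    walked = 0.0
    for a, b in zip(path, path[1:]):
        length = tree[a][b]
        if walked + length >= half:
            return a, b, half - walked
        walked += length


# Toy unrooted tree: tips A, B, C, D and internal nodes X, Y.
edges = [("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0),
         ("C", "Y", 4.0), ("D", "Y", 1.0)]
tree = build_tree(edges)
print(midpoint(tree))  # ('C', 'Y', 3.5)
```

Here the longest tip-to-tip path runs from C to B with a total length of 7.0, so the midpoint sits on the C-Y branch, 3.5 units from tip C.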

# Notes about QIIME 2 vs QIIME 1:

QIIME 2 feels much more cohesive as a single, unified program suite: it is very satisfying not to have to run each individual Python script anymore. The visualization software is also very nice. One thing I do miss is the flexibility of QIIME 1; for example, you could modify the visualization output script directly so that it worked on a headless display. The pipeline itself is relatively similar, since QIIME 2 seems to use most of the same scripts as QIIME 1.