# Exporting QIIME2 data for PhyloSeq Analysis

###### Pedro J. Torres 2018

By this point you should have worked through most of the QIIME 2 tutorial. The following files are needed:
1. Mapping File
2. unrooted-tree.qza
3. taxonomy.qza
4. table.qza (this walkthrough exports a filtered version, filtered-table.qza)
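
Before exporting, it can help to confirm that every input file is actually in the working directory. The sketch below is only an illustration: `check_inputs` is a hypothetical helper, not part of QIIME 2, and the file names in the example call are taken from the list above and may differ in your project.

```shell
# Hypothetical helper (not a QIIME 2 command): print each missing file to
# stderr and return a non-zero status if any of the given files is absent.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (assumed file names; adjust to match your own files):
# check_inputs sample-metadata.tsv unrooted-tree.qza taxonomy.qza table.qza
```

A non-zero return status means at least one file needs to be located or regenerated before running the export cells.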



In [1]:
# Export tree
!qiime tools export unrooted-tree.qza \
    --output-dir Phyloseq

In [6]:
# Export taxonomy
!qiime tools export taxonomy.qza \
    --output-dir Phyloseq

In [8]:
# Export table
!qiime tools export filtered-table.qza \
    --output-dir Phyloseq

In [17]:
%%bash
# Our Phyloseq folder should now have the following files
ls Phyloseq

feature-table.biom
sample-metadata.tsv
taxonomy.tsv
tree.nwk


### Next we will modify the exported taxonomy file's header before using it with BIOM software.

In [19]:
%%bash
# This is what our current header looks like in our taxonomy file
head -n 2 Phyloseq/taxonomy.tsv

Feature ID	Taxon	Confidence
401bd8572d676efe34bb69cd84144424	Unassigned	0.9556681222602029


- The first line in our taxonomy file must be changed to:
#OTUID	taxonomy	confidence

In [20]:
%%bash
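# The following command rewrites only the first line (the header) of the file.
# The "1" address restricts each substitution to line 1, so taxon strings further down are left untouched.
sed -e '1s/Feature ID/#OTUID/' -e '1s/Taxon/taxonomy/' -e '1s/Confidence/confidence/' Phyloseq/taxonomy.tsv > Phyloseq/biom-taxonomy.tsv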

In [21]:
%%bash 
head -n 2 Phyloseq/biom-taxonomy.tsv

#OTUID	taxonomy	confidence
401bd8572d676efe34bb69cd84144424	Unassigned	0.9556681222602029
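
The header can also be checked programmatically rather than by eye. The following is a minimal sketch, assuming the header must match the three tab-separated column names shown above exactly; `check_biom_header` is a hypothetical helper name, not part of the BIOM tooling.

```shell
# Hypothetical helper: succeed only if the first line of the given TSV is
# exactly "#OTUID<TAB>taxonomy<TAB>confidence".
check_biom_header() {
    expected=$(printf '#OTUID\ttaxonomy\tconfidence')
    actual=$(head -n 1 "$1")
    [ "$actual" = "$expected" ]
}

# Example call (path taken from the cells above):
# check_biom_header Phyloseq/biom-taxonomy.tsv && echo "header OK"
```

A file saved with Windows line endings would fail this check, since the trailing carriage return becomes part of the first line.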


### Add the taxonomy data to your biom file

In [None]:
%%bash
biom add-metadata \
    -i Phyloseq/feature-table.biom \
    -o Phyloseq/table-with-taxonomyv2.biom \
    --observation-metadata-fp Phyloseq/biom-taxonomy.tsv \
    --sc-separated taxonomy 

In [1]:
%%bash
# Change into our Phyloseq directory
cd Phyloseq
ls

biom-taxonomy.tsv
feature-table.biom
sample-metadata.tsv
table-with-taxonomy.biom
taxonomy.tsv
tree.nwk


## Now that we have the necessary files we will hop onto Phyloseq

In [1]:
# Install and load the R packages needed for the analysis. Packages are collections of R functions, data,
# and compiled code in a well-defined format. Remove the hash sign to download and install the packages.

#source('http://bioconductor.org/biocLite.R')
#biocLite('phyloseq')
library("phyloseq")
packageVersion("phyloseq")

#biocLite("biomformat")
library("biomformat")
packageVersion("biomformat")

#install.packages("ggplot2")
library("ggplot2")
packageVersion("ggplot2")

#install.packages("vegan")
library("vegan")
packageVersion('vegan')

#install.packages("grid")
library("grid")
packageVersion('grid')

#install.packages("magrittr")
library(magrittr)
packageVersion('magrittr')

library(dplyr)
packageVersion('dplyr')

library(plyr)
packageVersion('plyr')

library(broom)
packageVersion('broom')

library('stringr')
packageVersion('stringr')

[1] ‘1.19.1’

[1] ‘1.2.0’

[1] ‘2.2.1’

Loading required package: permute
Loading required package: lattice
This is vegan 2.4-6


[1] ‘2.4.6’

[1] ‘3.4.1’

[1] ‘1.5’


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



[1] ‘0.7.4’

------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

    arrange, count, desc, failwith, id, mutate, rename, summarise,
    summarize



[1] ‘1.8.4’

[1] ‘0.4.3’

[1] ‘1.2.0’

## Load Data into PhyloSeq Object

In [5]:
getwd()

In [7]:
# Import the BIOM table, tree, and metadata
biom_data <- import_biom(BIOMfilename = "table-with-taxonomyv2.biom", 
                         treefilename = "tree.nwk")
mapping_file <- import_qiime_sample_data(mapfilename = "sample-metadata.tsv")

# If the import above fails, try giving the full file path rather than just the file name

“input string 1 is invalid in this locale”

In [8]:
# Merge the OTU and mapping data into a phyloseq object
phylo <- merge_phyloseq(biom_data, mapping_file)
# Name the taxonomic rank columns of the taxonomy table, then check the rank names
colnames(tax_table(phylo)) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
rank_names(phylo)

In [9]:
# Start to explore the data a bit 
#number of samples
print ('Number of Samples in our Biom Table')
nsamples(phylo)
# number of sequence variants
print ('Number of Sequence variants we have.')
ntaxa(phylo)
#summary statistics of sampling depth
print ('Sequencing depth.')
depths <- sample_sums(phylo)
summary(depths)

# The summary below shows one sample with a very low sequencing depth (minimum of 239); we will remove this sample

[1] "Number of Samples in our Biom Table"


[1] "Number of Sequence variants we have."


[1] "Sequencing depth."


   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    239   43602   47714   48348   55020   90198 

# There you go! 
