# The phyloseq package

Phyloseq is a package for organizing and working with microbiome data in R. With phyloseq we can keep all our microbiome amplicon sequence data in a single R object. With functions from the package, most common operations for preparing data for analysis are possible with a few simple commands.

This document is an overview of how phyloseq objects are organized and how they can be accessed and changed.

The paper presenting phyloseq:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0061217

Comprehensive documentation of the phyloseq package:
https://joey711.github.io/phyloseq/

To work with phyloseq objects, we first have to load the package:

In [None]:
library(phyloseq)

# The phyloseq object

Let's load our test dataset, and see how phyloseq is organized. 

In [None]:
load("physeq.RData")

If we print the name of the phyloseq object, we can see what it contains:

In [None]:
phy

The *phy* object contains all our data and associated metadata. This is organized in 5 different sub-objects:
* **otu_table:** Contains a matrix with the abundance of each taxon (ASV) in each sample
* **sample_data:** Contains the metadata for each sample
* **tax_table:** Contains the taxonomic annotation for each taxon (ASV)
* **phy_tree:** Contains a phylogenetic tree
* **refseq:** Contains sequences (16S rRNA gene sequence) for each taxon (ASV)

*Note:* "*phy*" is an arbitrary name; it could be anything else.

Below is a section on each of the objects describing what they contain and how to access them.
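To make the organization concrete, here is a minimal sketch that builds a small phyloseq object from scratch. All names and values (`counts`, `meta`, `taxa`, `toy`, the patients and the taxonomy) are hypothetical toy data, not part of the test dataset; only three of the five components are included.

```r
library(phyloseq)

# Hypothetical toy data: 3 ASVs counted in 2 samples (taxa in rows)
counts <- matrix(c(10, 0, 3, 4, 193, 0), nrow = 3,
                 dimnames = list(c("ASV_1", "ASV_2", "ASV_3"),
                                 c("S_001", "S_002")))

# Hypothetical sample metadata; row names must match the sample names
meta <- data.frame(Patient = c("P1", "P2"), Time = c(0, 1),
                   row.names = c("S_001", "S_002"))

# Hypothetical taxonomy; row names must match the taxa names
taxa <- matrix(c("Firmicutes", "Firmicutes", "Bacteroidota",
                 "Lactobacillus", "Streptococcus", "Bacteroides"),
               nrow = 3,
               dimnames = list(c("ASV_1", "ASV_2", "ASV_3"),
                               c("Phylum", "Genus")))

# Combine the components into a single phyloseq object
toy <- phyloseq(otu_table(counts, taxa_are_rows = TRUE),
                sample_data(meta),
                tax_table(taxa))

toy            # prints a summary of the components
ntaxa(toy)     # number of taxa
nsamples(toy)  # number of samples
```

The same `ntaxa()` and `nsamples()` calls can be used on *phy* to retrieve the numbers shown in its printed summary.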

## otu_table
The otu_table contains the abundance of each OTU/ASV in each sample. From the printout above we can see that it contains data for 1428 taxa and 150 samples. We can access it with the otu_table() function:

In [None]:
otu_table(phy)

Here we can see that ASV_2 was not detected in sample S_001, but that 193 reads from sample S_144 were assigned to ASV_2, and so on.

We can subset specific taxa with the **object**[subset] notation

In [None]:
otu_table(phy)["ASV_3"]

In [None]:
otu_table(phy)[c("ASV_2", "ASV_15")]

Samples are subset in the same way, by placing the sample name after a comma inside the [ ]. (For the sake of this tutorial, we use head() to print only the first 6 rows.)

In [None]:
head(otu_table(phy)[, "S_006"])

These operations can be combined:

In [None]:
otu_table(phy)["ASV_2", c("S_006", "S_144")]
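Sometimes it is convenient to work with the counts as a plain R matrix, for example to use base R functions on them. The sketch below uses a hypothetical toy count matrix (the values are made up), assuming taxa are in rows as in the examples above.

```r
library(phyloseq)

# Hypothetical toy counts: taxa in rows, samples in columns
counts <- matrix(c(0, 5, 12, 193, 7, 0), nrow = 3,
                 dimnames = list(c("ASV_1", "ASV_2", "ASV_3"),
                                 c("S_001", "S_002")))
otu <- otu_table(counts, taxa_are_rows = TRUE)

# Subsetting with [ ] returns another otu_table
sub <- otu["ASV_2", "S_002"]
class(sub)

# Convert to a plain matrix and sum the reads in each sample
mat <- as(otu, "matrix")
totals <- colSums(mat)
totals
```

With the real data, the same conversion would be `as(otu_table(phy), "matrix")`.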

## sample_data
The sample_data object contains metadata for our samples. We can access it with the sample_data() function. (For the sake of this tutorial, we use head() to print only the first 6 rows.)

In [None]:
head(sample_data(phy))

We can subset it in the same way as we did with the otu_table.

In [None]:
sample_data(phy)["S_001",]

In [None]:
sample_data(phy)[c("S_002", "S_150"), c("Patient", "Time")]
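A single metadata variable can also be pulled out as an ordinary vector with the $ notation. This is a minimal sketch on hypothetical toy metadata (the `Patient` and `Time` values are made up); with the real data the equivalent would be `sample_data(phy)$Patient`.

```r
library(phyloseq)

# Hypothetical toy metadata for three samples
meta <- data.frame(Patient = c("P1", "P1", "P2"),
                   Time = c(0, 1, 0),
                   row.names = c("S_001", "S_002", "S_003"))
sam <- sample_data(meta)

# Names of the available metadata variables
colnames(sam)

# Extract one variable as a vector and count samples per patient
patients <- as.character(sam$Patient)
per_patient <- table(patients)
per_patient
```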

## tax_table

The tax_table contains the taxonomic annotations of our taxa/ASVs.
It can optionally also contain other metadata on our taxa/ASVs.
We can access it with the tax_table() function.

Subsetting is done as with the other objects:

In [None]:
tax_table(phy)[c("ASV_1", "ASV_10")]
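Columns of the tax_table correspond to taxonomic ranks, so a rank can be selected after the comma. The sketch below uses a hypothetical toy taxonomy; the rank names `Phylum` and `Genus` are assumptions and may differ from those in *phy* (check with `rank_names(phy)`).

```r
library(phyloseq)

# Hypothetical toy taxonomy with two ranks
taxa <- matrix(c("Firmicutes", "Firmicutes", "Bacteroidota",
                 "Lactobacillus", "Streptococcus", "Bacteroides"),
               nrow = 3,
               dimnames = list(c("ASV_1", "ASV_2", "ASV_3"),
                               c("Phylum", "Genus")))
tax <- tax_table(taxa)

# Which ranks are available?
ranks <- rank_names(tax)
ranks

# Select one rank for all taxa, and one rank for specific taxa
tax[, "Genus"]
genus_asv2 <- as.character(tax["ASV_2", "Genus"])
genus_asv2
```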

## phy_tree

The phy_tree contains our phylogenetic tree, constructed from an alignment of the 16S rRNA gene sequences of our ASVs. We can access it with the phy_tree() function.

In [None]:
phy_tree(phy)

This prints some basic info about our tree. The elements of the tree can be accessed with the $ notation:

In [None]:
# The first 10 labels:
phy_tree(phy)$tip.label[1:10]

and we can plot it (*cex* sets the size of the labels):

In [None]:
plot(phy_tree(phy), cex = 0.5)
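The tree is stored as a "phylo" object from the ape package, so functions from ape can be applied to it. This is a minimal sketch on a hypothetical three-tip tree written in Newick format; with the real data, `phy_tree(phy)` would be used in place of `tree`.

```r
library(ape)

# Hypothetical toy tree with three tips, given as a Newick string
tree <- read.tree(text = "((ASV_1:0.1,ASV_2:0.2):0.05,ASV_3:0.3);")

class(tree)              # "phylo"
n_tips <- Ntip(tree)     # number of tips (taxa) in the tree
n_tips
tree$tip.label           # tip labels, accessed with the $ notation
```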

## refseq
refseq contains the actual DNA sequences of our ASVs (or alternatively the reference sequences of OTUs). We can access it with the refseq() function.

In [None]:
refseq(phy)

Again, we can subset with the [ ] notation:

In [None]:
refseq(phy)["ASV_1"]

To see the entire sequence, convert it to a string ("character" in R jargon):

In [None]:
as.character(refseq(phy)[c("ASV_10", "ASV_2")])
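The sequences are stored as a DNAStringSet from the Biostrings package, so Biostrings functions work on them. The sketch below uses two short, made-up sequences to show how sequence lengths can be retrieved; with the real data, `refseq(phy)` would take the place of `seqs`.

```r
library(Biostrings)

# Hypothetical toy sequences (real ASVs are much longer)
seqs <- DNAStringSet(c(ASV_1 = "ACGT", ASV_2 = "ACGTAC"))

# Length of each sequence
lens <- width(seqs)
lens

# Convert to character strings, keeping the ASV names
chars <- as.character(seqs)
chars
```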