Speedy implementations of phyloseq functions

speedyseq

The goal of speedyseq is to accelerate common operations that are currently very slow in phyloseq: psmelt() (and the plotting functions that use it) and the taxonomic aggregation functions tax_glom() and (hopefully eventually) tip_glom().

The current version of speedyseq reimplements psmelt() to be faster than phyloseq’s version, and includes copies of plot_bar(), plot_heatmap(), and plot_tree() so that when called from speedyseq the faster psmelt() will be used. It also implements a faster version of tax_glom().

Installation

Install from GitHub with devtools:

# install.packages("devtools")
devtools::install_github("mikemc/speedyseq")

Usage

Method 1: Call speedyseq functions explicitly when you want to use speedyseq’s version instead of phyloseq’s:

library(phyloseq)
data(GlobalPatterns)
system.time(
    # Calls phyloseq's psmelt
    df1 <- psmelt(GlobalPatterns) # slow
)
#>    user  system elapsed 
#>  91.843   0.147  92.233
system.time(
    df2 <- speedyseq::psmelt(GlobalPatterns) # fast
)
#>    user  system elapsed 
#>   0.554   0.000   0.563
dplyr::all_equal(df1, df2, ignore_row_order = TRUE)
#> [1] TRUE
detach(package:phyloseq)
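
The equivalence check in Method 1 relies on dplyr::all_equal(), which newer dplyr releases deprecate. As a rough alternative, here is a minimal base R sketch of an order-insensitive comparison. The helper name same_rows() is hypothetical (it is not part of speedyseq or phyloseq), and it assumes both data frames have the same column names and column types; the toy data frames stand in for real psmelt() output.

```r
# Hypothetical helper (not part of speedyseq or phyloseq): TRUE if two data
# frames hold the same rows regardless of row order.
# Assumes identical column names and column types.
same_rows <- function(x, y) {
  if (!setequal(names(x), names(y)) || nrow(x) != nrow(y)) return(FALSE)
  y <- y[, names(x), drop = FALSE]
  # Sort both data frames by all columns so that row order no longer matters
  x <- x[do.call(order, unname(as.list(x))), , drop = FALSE]
  y <- y[do.call(order, unname(as.list(y))), , drop = FALSE]
  # Row names carry the old ordering, so reset them before comparing
  rownames(x) <- NULL
  rownames(y) <- NULL
  isTRUE(all.equal(x, y))
}

# Toy data standing in for psmelt() output
a <- data.frame(OTU = c("t1", "t2", "t3"), Abundance = c(5, 3, 3))
b <- a[c(3, 1, 2), ]  # same rows, shuffled
same_rows(a, b)
#> [1] TRUE
```

Because all.equal() also compares column classes, a factor column in one data frame and a character column in the other would count as a mismatch under this sketch.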

Method 2: Load speedyseq, which will load phyloseq and cause calls to the overlapping function names to go to speedyseq by default:

library(speedyseq)
#> Loading required package: phyloseq
#> 
#> Attaching package: 'speedyseq'
#> The following objects are masked from 'package:phyloseq':
#> 
#>     plot_bar, plot_heatmap, plot_tree, psmelt, tax_glom
data(GlobalPatterns)
system.time(
    ps1 <- phyloseq::tax_glom(GlobalPatterns, "Genus") # slow
)
#>    user  system elapsed 
#>  37.535   0.073  37.688
system.time(
    # Calls speedyseq's tax_glom
    ps2 <- tax_glom(GlobalPatterns, "Genus") # fast
)
#>    user  system elapsed 
#>   0.275   0.000   0.275
all.equal(taxa_names(ps1), taxa_names(ps2))
#> [1] TRUE
all.equal(otu_table(ps1), otu_table(ps2))
#> [1] TRUE
all.equal(tax_table(ps1), tax_table(ps2))
#> [1] TRUE
all.equal(phy_tree(ps1), phy_tree(ps2))
#> [1] TRUE
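
With Method 2, which version of a function gets called depends on masking, so it can be useful to confirm where a function comes from. The following is a small sketch using base R only; the helper name which_pkg() is hypothetical, and the speedyseq part is guarded so that it runs only if the package is installed.

```r
# Hypothetical helper: name of the namespace a (non-primitive) function lives in
which_pkg <- function(f) environmentName(environment(f))

# Sanity check on a function from a base package
which_pkg(stats::sd)
#> [1] "stats"

# Only runs if speedyseq (and therefore phyloseq) is installed
if (requireNamespace("speedyseq", quietly = TRUE)) {
  library(speedyseq)
  # Unqualified name: resolved through the search path after masking
  print(which_pkg(tax_glom))
  # Qualified name: always phyloseq's version
  print(which_pkg(phyloseq::tax_glom))
}
```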

Notes

My aim is for these functions to be drop-in replacements for phyloseq’s versions. See the unit tests in the tests/ folder for the functional-equivalence tests I’ve implemented so far.

The tax_glom() function should produce results identical to phyloseq’s version, though I have only implemented tests using the default NArm and bad_empty arguments.

The psmelt() function in phyloseq drops taxonomy columns that are all NA (such as after performing a tax_glom()) and returns a data frame with extraneous row names. Speedyseq’s psmelt() does not drop columns and does not return row names. Both functions sort rows by the Abundance and OTU columns, but the row order can differ when rows are tied on both variables.

If you have any problems or find any discrepancies from phyloseq’s behavior, please post an issue.

Warning: Like phyloseq, speedyseq’s psmelt() will convert your taxonomy variables to factors if getOption("stringsAsFactors") is TRUE.
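
If downstream code depends on the psmelt() behaviors noted above, the output can be adjusted after the fact. The following is a minimal base R sketch, assuming a toy data frame in place of real psmelt() output (the column values are made up for illustration): it drops columns that are entirely NA and converts factor columns back to character.

```r
# Toy stand-in for psmelt() output after tax_glom(); values are illustrative
df <- data.frame(
  OTU = c("t1", "t2"),
  Abundance = c(10, 4),
  Genus = factor(c("Bacteroides", "Prevotella")),
  Species = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Drop columns that are entirely NA, mirroring what phyloseq's psmelt()
# is described as doing
all_na <- vapply(df, function(col) all(is.na(col)), logical(1))
df_dropped <- df[, !all_na, drop = FALSE]

# Convert any factor columns back to character vectors
is_fac <- vapply(df_dropped, is.factor, logical(1))
df_dropped[is_fac] <- lapply(df_dropped[is_fac], as.character)

names(df_dropped)
#> [1] "OTU"       "Abundance" "Genus"
```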
