A continually expanding collection of microbiome analysis tools

mdozmorov/Microbiome_notes

These notes are not intended to be comprehensive. They cover methods, packages, and tools I am learning and/or would like to explore. The tools are being reordered so that the newest appear on top.

For a comprehensive overview of the subject, consider other bioinformatics resources and collections of links to various resources. Issues with suggestions and pull requests are welcome!

  • awesome-microbes - List of computational resources for analyzing microbial sequencing data. Databases, tools, publications.

Pipelines

  • QIIME 2 - microbiome bioinformatics platform. Plugin architecture supporting latest-generation tools for different technologies and pipelines. Interactive visualization tools. Software Development Kit SDK. API and CLI interfaces, CWL wrapper available. GitHub.

    • Bolyen, Evan, Jai Ram Rideout, Matthew R. Dillon, Nicholas A. Bokulich, Christian C. Abnet, Gabriel A. Al-Ghalith, Harriet Alexander, et al. “Reproducible, Interactive, Scalable and Extensible Microbiome Data Science Using QIIME 2.” Nature Biotechnology, July 24, 2019. https://doi.org/10.1038/s41587-019-0209-9.
  • MicrobiotaProcess - an R package for analysis, visualization and biomarker discovery of microbiome. Input - dada2 or qiime2 processed data. https://github.com/YuLab-SMU/MicrobiotaProcess

  • ATLAS - Three commands to start analysing your metagenome data. Documentation, https://metagenome-atlas.readthedocs.io/en/latest/, GitHub, https://github.com/metagenome-atlas/atlas

    • ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data Silas Kieser, Joseph Brown, Evgeny M Zdobnov, Mirko Trajkovski, Lee Ann McCue bioRxiv 737528; August 2019 doi: https://doi.org/10.1101/737528
  • HiMAP - high-resolution microbial analysis pipeline for 16S data analysis. Wraps many DADA2 functions. Comparison with DADA2, QIIME, detects more species. https://github.com/taolonglab/himap

    • Segota, Igor, and Tao Long. “A High-Resolution Pipeline for 16S-Sequencing Identifies Bacterial Strains in Human Microbiome.” BioRxiv, March 4, 2019. https://doi.org/10.1101/565572.
  • SqueezeMeta - a pipeline for metagenomics/metatranscriptomics for co-assembly (SPAdes, Canu), gene and rRNA prediction (prodigal, RDP classifier), binning, gene abundance estimation, taxonomic annotation (fast LCA). Support for MinION nanopore sequencing data (long, error-prone reads). Table 1 - comparison with other pipelines. https://github.com/jtamames/SqueezeMeta

    • Tamames, Javier, and Fernando Puente-Sánchez. “SqueezeMeta, A Highly Portable, Fully Automatic Metagenomic Analysis Pipeline.” Frontiers in Microbiology 9 (January 24, 2019): 3349. https://doi.org/10.3389/fmicb.2018.03349.
  • HUMAnN2: The HMP Unified Metabolic Analysis Network 2 - functional profiling and pathway reconstruction of metagenomes. Tiered approach: 1) Screening for known species with MetaPhlAn2; 2) mapping against pangenomes; 3) mapping against protein sequences. These mappings can help to assign metabolic and functional annotations. http://huttenhower.sph.harvard.edu/humann2

    • Franzosa, Eric A., Lauren J. McIver, Gholamali Rahnavard, Luke R. Thompson, Melanie Schirmer, George Weingart, Karen Schwarzberg Lipson, et al. “Species-Level Functional Profiling of Metagenomes and Metatranscriptomes.” Nature Methods 15, no. 11 (November 2018): 962–68. https://doi.org/10.1038/s41592-018-0176-y.
  • MetaMap - microbial composition in host's RNA-seq data, a resource and a pipeline. Pipeline, https://www.protocols.io/view/metamap-pipeline-msec6be/metadata, Data and R tutorial, https://github.com/theislab/MetaMap

  • bioBakery - an environment for metagenomics analysis. VM running on Vagrant/VirtualBox, Docker image, Google Cloud and Amazon Machine Image. Homebrew/Linuxbrew installation. AnADAMA2 controls the workflows. Wiki, https://bitbucket.org/biobakery/biobakery/wiki/Home, workflows and tutorials, http://huttenhower.sph.harvard.edu/biobakery_workflows

    • McIver, Lauren J, Galeb Abu-Ali, Eric A Franzosa, Randall Schwager, Xochitl C Morgan, Levi Waldron, Nicola Segata, and Curtis Huttenhower. “BioBakery: A Meta’omic Analysis Environment.” Edited by John Hancock. Bioinformatics 34, no. 7 (April 1, 2018): 1235–37. https://doi.org/10.1093/bioinformatics/btx754.
  • Microbiome Helper - wrapper scripts and tutorials for metagenomics analysis. https://github.com/LangilleLab/microbiome_helper/wiki.

    • Comeau, André M., Gavin M. Douglas, and Morgan G. I. Langille. “Microbiome Helper: A Custom and Streamlined Workflow for Microbiome Research.” Edited by Jonathan Eisen. MSystems 2, no. 1 (February 28, 2017). https://doi.org/10.1128/mSystems.00127-16.
  • F1000_workflow - Microbiome workflow. Uses ribosomal sequence variants (RSVs) instead of OTUs. Data preprocessing from raw reads. DADA2 pipeline, ASV summary tables using RDP (Greengenes and SILVA are available), phylogenetic tree reconstruction (phangorn). phyloseq downstream analysis, from filtering to agglomeration, transformation, various ordination visualizations (from PCoA, DPCoA, rank PCA, to CCA), supervised learning, graph-based visualization and testing, multi-omics analyses. https://github.com/spholmes/F1000_workflow

    • Callahan, Ben J., Kris Sankaran, Julia A. Fukuyama, Paul J. McMurdie, and Susan P. Holmes. “Bioconductor Workflow for Microbiome Data Analysis: From Raw Reads to Community Analyses.” F1000Research 5 (2016): 1492. https://doi.org/10.12688/f1000research.8986.2.
  • Deblur - resolves Illumina sequencing errors and creates sub-operational taxonomic unit (sOTU) clusters. Operates on individual samples. Plugin for QIIME2 exists. Competing methods - DADA2, UNOISE2. Methods in the supplementary text S1. https://github.com/biocore/deblur

    • Amir, Amnon, Daniel McDonald, Jose A. Navas-Molina, Evguenia Kopylova, James T. Morton, Zhenjiang Zech Xu, Eric P. Kightley, et al. “Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns.” Edited by Jack A. Gilbert. MSystems 2, no. 2 (April 25, 2017). https://doi.org/10.1128/mSystems.00191-16.
  • DADA2 - resolves sequencing errors and reconstructs sequences for finer-resolution clustering. Complete pipeline to process paired-end FASTQ files into merged, denoised, chimera-free, error-corrected sample sequences. The error model quantifies the rate $\lambda_{ij}$ at which an amplicon read with sequence $i$ is produced from sample sequence $j$ as a function of sequence composition and quality; read abundances are modeled with a Poisson distribution. The NCBI RefSeq 16S rRNA database (RefSeq) and the Genome Taxonomy Database (GTDB) are both now available for use with dada2's assignTaxonomy function, https://zenodo.org/record/2541239#.XEyoLc9Kjfa. DADA2 page: https://github.com/benjjneb/dada2. A DADA2 workflow for Big Data, https://benjjneb.github.io/dada2/bigdata.html

    • Callahan, Benjamin J., Paul J. McMurdie, Michael J. Rosen, Andrew W. Han, Amy Jo A. Johnson, and Susan P. Holmes. “DADA2: High-Resolution Sample Inference from Illumina Amplicon Data.” Nature Methods 13, no. 7 (2016): 581–83. https://doi.org/10.1038/nmeth.3869.
  • microbial-rnaseq - microbial composition from host's RNA-seq data, https://github.com/FredHutch/microbial-rnaseq
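
As a rough illustration of the DADA2 error model noted in the list above, the sketch below computes a rate $\lambda_{ij}$ as a product of per-base transition probabilities and then a Poisson abundance p-value for a candidate read. This is not DADA2 code. The function names are hypothetical, `toy_transition_prob` derives error probabilities directly from Phred scores (DADA2 learns its error rates from the data), and conditioning the p-value on at least one observed read is an assumption made for this example.

```python
import math


def toy_transition_prob(src, dst, qual):
    """Assumed error table: Phred quality -> P(src base is read as dst)."""
    p_err = 10 ** (-qual / 10)
    return 1.0 - p_err if src == dst else p_err / 3


def error_rate(sample_seq, read_seq, quals):
    """lambda_ij: rate at which read i is produced from sample sequence j."""
    if not (len(sample_seq) == len(read_seq) == len(quals)):
        raise ValueError("sequences and qualities must be aligned, equal length")
    lam = 1.0
    for src, dst, q in zip(sample_seq, read_seq, quals):
        lam *= toy_transition_prob(src, dst, q)
    return lam


def abundance_pvalue(n_sample_reads, lam, observed):
    """P(count >= observed) under Poisson(n * lambda), given count >= 1."""
    mu = n_sample_reads * lam
    # Probability mass below `observed`, i.e. counts 0 .. observed - 1.
    below = sum(math.exp(-mu) * mu ** k / math.factorial(k)
                for k in range(observed))
    return (1.0 - below) / (1.0 - math.exp(-mu))


# One mismatch at the last base, all qualities Q30.
lam = error_rate("ACGT", "ACGA", [30, 30, 30, 30])
# Chance of seeing 5 or more such reads from 1000 reads of the parent sequence.
p = abundance_pvalue(1000, lam, 5)
```

A small p-value means the read is too abundant to be explained by errors from the parent sequence alone, which is the intuition behind treating it as a separate sample sequence.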

Downstream analysis

  • Calour - Heatmap-based visual exploration of microbiome data. Input - sOTU table (Deblur-processed) and phenodata. Normalization, sorting, filtering, interface with annotation databases, machine learning methods from scikit-learn. Python, Jupyter notebooks. http://biocore.github.io/calour/

    • Xu, Z.Z., Amir, A., Sanders, J., Zhu, Q., Morton, J.T., Bletz, M.C., Tripathi, A., Huang, S., McDonald, D., Jiang, L., et al. (2019). Calour: an Interactive, Microbe-Centric Analysis Tool. MSystems 4, e00269-18.
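  • Decontam - an R package to identify and remove contaminant sequences from marker-gene and metagenomic sequencing data. Two approaches: frequency-based (contaminants are relatively more frequent in samples with low DNA concentration) and prevalence-based (contaminants are more prevalent in negative controls than in true samples). Helps to reduce variability due to sequencing center or DNA extraction kit. https://github.com/benjjneb/decontam, Vignette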

    • Davis, Nicole M., Diana M. Proctor, Susan P. Holmes, David A. Relman, and Benjamin J. Callahan. “Simple Statistical Identification and Removal of Contaminant Sequences in Marker-Gene and Metagenomics Data.” Microbiome 6, no. 1 (December 2018). https://doi.org/10.1186/s40168-018-0605-2.
  • microbiome - R package with a rich set of functions for microbiome analysis, visualization, and statistical analysis. Supports phyloseq objects. Leo Lahti, Sudarshan Shetty et al. (2017). Tools for microbiome analysis in R. Version 1.5.23. URL: http://microbiome.github.com/microbiome

  • microbiomeSeq - An R package for microbial community analysis in an environmental context. GitHub, https://github.com/umerijaz/microbiomeSeq, and tutorial, http://userweb.eng.gla.ac.uk/umer.ijaz/projects/microbiomeSeq_Tutorial.html

  • WHAM! - data exploration, clustering, visualization, differential expression analysis (ANOVA-like). Input format uses bioBakery pipeline output. Shiny app, https://ruggleslab.shinyapps.io/wham_v1/, GitHub, https://github.com/ruggleslab/jukebox/tree/master/wham_v1

    • Devlin, Joseph C., Thomas Battaglia, Martin J. Blaser, and Kelly V. Ruggles. “WHAM!: A Web-Based Visualization Suite for User-Defined Analysis of Metagenomic Shotgun Sequencing Data.” BMC Genomics 19, no. 1 (June 25, 2018): 493. https://doi.org/10.1186/s12864-018-4870-z.
  • Metaviz - visual exploratory data analysis of annotated microbiome data. Java/D3 implementation. Imports metagenomeSeq object, works with phyloseq objects. Web interface with 33 demo datasets, http://metaviz.cbcb.umd.edu/, metavizr R package, https://www.bioconductor.org/packages/release/bioc/html/metavizr.html. Docker https://epiviz.github.io/tutorials/metaviz/usingDocker/. GitHub, https://github.com/epiviz/metavizr. Documentation, https://epiviz.github.io/tutorials/metaviz/

    • Wagner, Justin, Florin Chelaru, Jayaram Kancherla, Joseph N Paulson, Alexander Zhang, Victor Felix, Anup Mahurkar, Niklas Elmqvist, and Héctor Corrada Bravo. “Metaviz: Interactive Statistical and Visual Analysis of Metagenomic Data.” Nucleic Acids Research 46, no. 6 (April 6, 2018): 2777–87. https://doi.org/10.1093/nar/gky136.
  • MicrobiomeAnalyst - web tool for microbiome data analysis and visualization. Four modules: the Marker Data Profiling for 16S rRNA analysis; the Shotgun Data Profiling for metagenomics/transcriptomics data; the Taxon Set Enrichment Analysis for taxonomic signature enrichment; the Projection with Public Data for visual exploration in relation to public reference data. Description of common issues (sparsity, sequencing depth, large variance), ways to account for them. Input - txt, csv, or biom files and sample metadata. Filtering, normalization, profiling functional diversity, biomarker identification and classification (LEfSe, random forest). Comparison with other web-based tools. https://www.microbiomeanalyst.ca/

    • Dhariwal, Achal, Jasmine Chong, Salam Habib, Irah L. King, Luis B. Agellon, and Jianguo Xia. “MicrobiomeAnalyst: A Web-Based Tool for Comprehensive Statistical, Visual and Meta-Analysis of Microbiome Data.” Nucleic Acids Research 45, no. W1 (03 2017): W180–88. https://doi.org/10.1093/nar/gkx295.
  • Shiny-phyloseq app. http://joey711.github.io/shiny-phyloseq/

    • McMurdie, P. J., and S. Holmes. “Shiny-Phyloseq: Web Application for Interactive Microbiome Analysis with Provenance Tracking.” Bioinformatics 31, no. 2 (January 15, 2015): 282–83. https://doi.org/10.1093/bioinformatics/btu616.
  • phyloseq R package for import of most OTU clustering data formats, preprocessing (normalization, standardization, subsampling, filtering), visualization (various definitions of distance, dimensionality reduction methods), and comparative analysis of microbiome data. phyloseq-class with four components (otu_table, sample_data, tax_table, phy_tree). Plotting functions using ggplot2 graphics. http://www.bioconductor.org/packages/release/bioc/html/phyloseq.html, http://joey711.github.io/phyloseq/, https://github.com/joey711/phyloseq

    • McMurdie, Paul J., and Susan Holmes. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” Edited by Michael Watson. PLoS ONE 8, no. 4 (April 22, 2013): e61217. https://doi.org/10.1371/journal.pone.0061217.

Integrative analysis

  • MMCA - microbiome and metabolome correlation analysis pipeline. Five correlation methods (Spearman default), unsupervised (CCA, O2PLS) and supervised (PCA, PLS-DA, OPLS-DA, RF) analyses, network (WGCNA) analysis, KEGG enrichment in modules (Tax4Fun2). Report generation. http://mmca.met-bioinformatics.cn/
    • Ni, Yan, Gang Yu, Yongqiong Deng, Xiaojiao Zheng, Tianlu Chen, Junfeng Fu, and Wei Jia. “MMCA: A Web-Based Server for the Microbiome and Metabolome Correlation Analysis.” BioRxiv, January 1, 2019, 678813. https://doi.org/10.1101/678813.

Taxonomy

  • Centrifuge - microbial classification using BWT and FM index. Compression of genome sequences before indexing, then progressive exact matching of k-mers. Memory-efficient and faster than Kraken. https://github.com/infphilo/centrifuge

    • Kim, Daehwan, Li Song, Florian P. Breitwieser, and Steven L. Salzberg. “Centrifuge: Rapid and Sensitive Classification of Metagenomic Sequences.” Genome Research 26, no. 12 (2016): 1721–29. https://doi.org/10.1101/gr.210641.116.
  • Kraken - assigns taxonomic labels to metagenomic DNA sequences. Exact matching of k-mers (31bp) against a database (several database versions are available to fit different memory budgets). Uses its own optimized algorithm for k-mer lookup. https://ccb.jhu.edu/software/kraken2/

    • Wood, Derrick E., and Steven L. Salzberg. “Kraken: Ultrafast Metagenomic Sequence Classification Using Exact Alignments.” Genome Biology 15, no. 3 (March 3, 2014): R46. https://doi.org/10.1186/gb-2014-15-3-r46.
  • KrakenUniq - Extension of the original k-mer-based classification with a HyperLogLog algorithm for assessing the coverage of unique k-mers (cardinality). Better handling of false positives. https://github.com/fbreitwieser/krakenuniq

    • Breitwieser, F. P., D. N. Baker, and S. L. Salzberg. “KrakenUniq: Confident and Fast Metagenomics Classification Using Unique k-Mer Counts.” Genome Biology 19, no. 1 (December 2018). https://doi.org/10.1186/s13059-018-1568-0.
  • metagenomeFeatures - R package for annotating OTUs with Greengenes IDs (v.13.8), RDP and SILVA (in future?). https://bioconductor.org/packages/release/bioc/html/metagenomeFeatures.html

    • Olson, Nathan D, Nidhi Shah, Jayaram Kancherla, Justin Wagner, Joseph N Paulson, and Hector Corrada Bravo. “MetagenomeFeatures: An R Package for Working with 16S RRNA Reference Databases and Marker-Gene Survey Feature Data.” Edited by Janet Kelso. Bioinformatics, March 1, 2019. https://doi.org/10.1093/bioinformatics/btz136.
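
The exact k-mer matching used by Kraken-style classifiers in the list above can be sketched with a toy example. Everything in it is hypothetical: the taxonomy, the k-mer database, and `K = 5` (the notes above cite 31 bp k-mers). Scoring each candidate by the hits along its path to the root follows my reading of the Kraken paper and should be treated as an assumption, not as the tool's implementation.

```python
from collections import Counter

K = 5  # toy value; real databases use much longer k-mers

# Hypothetical taxonomy, child -> parent.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "E.coli": "Bacteria",
    "B.subtilis": "Bacteria",
}

# Hypothetical database: k-mer -> lowest common ancestor (LCA) of the
# genomes that contain it.
KMER_DB = {
    "ACGTA": "E.coli",
    "CGTAC": "E.coli",
    "GTACG": "Bacteria",
    "TTTTT": "B.subtilis",
}


def kmers(seq, k=K):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def lineage(taxon):
    """Path from a taxon up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def classify(read):
    """Return the hit taxon whose root path collects the most k-mer hits."""
    hits = Counter(KMER_DB[km] for km in kmers(read) if km in KMER_DB)
    if not hits:
        return None  # no k-mer matched: read stays unclassified

    def path_score(taxon):
        return sum(hits[t] for t in lineage(taxon))

    # Ties are broken in favor of the deeper (more specific) taxon.
    return max(hits, key=lambda t: (path_score(t), len(lineage(t))))


label = classify("ACGTACG")  # two E.coli k-mers plus one Bacteria k-mer
```

A k-mer shared by several genomes only supports their common ancestor, so a read is assigned to a species only when species-specific k-mers are present.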

Phylogenetics

  • iTOL - display and annotation of phylogenetic trees, https://itol.embl.de/

  • ggtree R package for phylogenetic tree visualization, coloring, and annotation. Support for multiple file formats. https://github.com/GuangchuangYu/ggtree

    • Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Edited by Greg McInerny. Methods in Ecology and Evolution 8, no. 1 (January 2017): 28–36. https://doi.org/10.1111/2041-210X.12628.
  • phyloT - generates phylogenetic trees based on the NCBI taxonomy. Input: NCBI scientific names and more, output: tree in Newick and other formats. Results can be visualized in iTOL, interactive Tree Of Life. https://phylot.biobyte.de/

Differential analysis

  • ALDEx2 R package - a compositional data analysis tool that uses Bayesian methods to infer technical and statistical errors. Works with RNA-seq, microbiome, and other compositional data. Distinguishes between absolute counts and compositional data. Counts are converted to probabilities by Monte Carlo sampling (128 instances by default) from the Dirichlet distribution with a uniform prior. Each instance is then centered log-ratio (clr) transformed: values are divided by the sample's geometric mean and log-transformed. https://bioconductor.org/packages/release/bioc/html/ALDEx2.html

    • Fernandes, Andrew D., Jennifer Ns Reid, Jean M. Macklaim, Thomas A. McMurrough, David R. Edgell, and Gregory B. Gloor. “Unifying the Analysis of High-Throughput Sequencing Datasets: Characterizing RNA-Seq, 16S RRNA Gene Sequencing and Selective Growth Experiments by Compositional Data Analysis.” Microbiome 2 (2014): 15. https://doi.org/10.1186/2049-2618-2-15.
  • metagenomeSeq R package. Differential microbial abundance analysis. New normalization - Cumulative-sum scaling (CSS) - raw counts are divided by the cumulative sum of counts up to a percentile determined using a data-driven approach, e.g., the 75th percentile of each sample’s nonzero count distribution. Zero-inflated Gaussian (ZIG) distribution mixture model that accounts for biases in differential abundance testing resulting from undersampling of the microbial community, https://bioconductor.org/packages/release/bioc/html/metagenomeSeq.html

    • Paulson, Joseph N, O Colin Stine, Héctor Corrada Bravo, and Mihai Pop. “Differential Abundance Analysis for Microbial Marker-Gene Surveys.” Nature Methods 10, no. 12 (September 29, 2013): 1200–1202. https://doi.org/10.1038/nmeth.2658.
  • HMP R package - Hypothesis Testing and Power Calculations for Comparing Metagenomic Samples from HMP. https://cran.r-project.org/web/packages/HMP/index.html. Dirichlet-Multinomial distribution for the analysis of microbial data. Power analysis to detect compositional differences as a function of the number of subjects and the number of reads (Table 2).

    • La Rosa, Patricio S., J. Paul Brooks, Elena Deych, Edward L. Boone, David J. Edwards, Qin Wang, Erica Sodergren, George Weinstock, and William D. Shannon. “Hypothesis Testing and Power Calculations for Taxonomic-Based Human Microbiome Data.” PloS One 7, no. 12 (2012): e52078. https://doi.org/10.1371/journal.pone.0052078.
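
Two of the normalizations described in this section can be sketched in a few lines: cumulative-sum scaling (CSS, metagenomeSeq) and Dirichlet Monte Carlo instances followed by the clr transform (ALDEx2). This is an illustration in plain Python, not the packages' code. The function names are hypothetical, and the fixed 75th percentile, the scaling constant of 1000, and the prior of 0.5 added to each count are assumptions chosen for the example. The data-driven percentile selection is omitted.

```python
import math
import random


def css_normalize(counts, percentile=0.75, scale=1000):
    """Divide counts by the sum of nonzero counts up to a percentile."""
    nonzero = sorted(c for c in counts if c > 0)
    if not nonzero:
        return [0.0] * len(counts)
    # Nearest-rank quantile of the nonzero count distribution.
    rank = max(1, math.ceil(percentile * len(nonzero)))
    threshold = nonzero[rank - 1]
    cum_sum = sum(c for c in nonzero if c <= threshold)
    return [c / cum_sum * scale for c in counts]


def dirichlet_sample(counts, prior=0.5, rng=random):
    """One Dirichlet draw, built from normalized Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def clr(proportions):
    """Centered log-ratio: log of each value over the geometric mean."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def clr_instances(counts, n_instances=128, seed=0):
    """Monte Carlo clr instances for one sample's count vector."""
    rng = random.Random(seed)
    return [clr(dirichlet_sample(counts, rng=rng)) for _ in range(n_instances)]


sample = [0, 1, 2, 3, 100]
css = css_normalize(sample)         # the dominant count does not set the scale
instances = clr_instances(sample)   # 128 clr vectors, each summing to ~0
```

The prior keeps zero counts from producing zero probabilities, so the logarithm in `clr` is always defined.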

Data

Misc

Local files and folders
