Sunbeam: a robust, extensible metagenomic sequencing pipeline

DOI: 10.1186/s40168-019-0658-x

Sunbeam is a pipeline written in Snakemake that simplifies and automates many of the steps in metagenomic sequencing analysis. It uses conda to manage dependencies, so it requires no pre-installed dependencies or admin privileges and can be deployed on most Linux workstations and clusters. To read more, check out our paper in Microbiome.

Sunbeam currently automates the following tasks:

  • Quality control, including adaptor trimming, host read removal, and quality filtering;
  • Taxonomic assignment of reads to databases using Kraken;
  • Assembly of reads into contigs using Megahit;
  • Contig annotation using BLAST[n/p/x];
  • Mapping of reads to target genomes; and
  • ORF prediction using Prodigal.

Sunbeam was designed to be modular and extensible. Some extensions have been built for:

  • IGV for viewing read alignments
  • KrakenHLL, an alternate read classifier
  • Kaiju, a read classifier that uses the Burrows-Wheeler transform (BWT) rather than k-mers
  • Anvi'o, a downstream analysis and visualization platform for 'omics data

More extensions can be found at the extension page:

To get started, see our documentation!

If you use the Sunbeam pipeline in your research, please cite:

EL Clarke, LJ Taylor, C Zhao, A Connell, J Lee, FD Bushman, K Bittinger. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome 7:46 (2019)

See how people are using Sunbeam:


Development version (future 3.0 release; as of April 1, 2020)

  • Support use of .smk file extensions in Sunbeam extensions (in addition to .rules)
  • Support mamba as an alternate package dependency solver at install time, for faster installs
  • New command sunbeam extend to automatically install Sunbeam extensions! Use like sunbeam extend
  • sunbeam init and sunbeam config update now add options for extensions you've installed to your default config file! (#247)
  • Updated the path to the Illumina adapter sequences from hardcoded to templated (fixes #150 and #152)
  • Use the updated kraken2 classifier instead of kraken
  • Update other dependencies (trimmomatic -> 0.39; grabseqs -> 0.6.1; snakemake -> <5.7.0)
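
The first item above means an extension's rules may live in files ending in either .smk or .rules. A minimal Python sketch of how such files could be discovered follows. The function name find_extension_rules and the assumed layout (one subdirectory per extension, rule files at its top level) are hypothetical and are not taken from Sunbeam's source.

```python
from pathlib import Path

# Suffixes an extension rule file may use: the newer .smk and the older .rules.
RULE_SUFFIXES = {".smk", ".rules"}


def find_extension_rules(extensions_dir):
    """Return sorted rule files found one level below extensions_dir.

    Hypothetical helper: assumes each extension is a subdirectory
    that keeps its rule files at the top level.
    """
    root = Path(extensions_dir)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.glob("*/*")
        if path.is_file() and path.suffix in RULE_SUFFIXES
    )
```

Matching on the suffix rather than on a fixed file name lets older extensions that still ship .rules files keep working alongside newer .smk ones.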

v2.1.0 (November 26, 2019)

  • Added a build manifest, which is produced on every integration test run and which users can feed into conda to install the most recent successfully built dependencies
  • Updates to documentation (#169, #230, #231)
  • Fix missing samtools (#224)
  • Integration test updates to schedule weekly builds (#222)
  • Fix issues with old paired-end Illumina adapters (#221)
  • Script updates to use conda commands instead of source commands (#220)
  • Add h5py package explicitly to avoid dependency metadata problem (#219)
  • Add MultiQC to build QC report (#203)
  • Use multithreading for cutadapt in QC (#202)
  • Correct conda channel priority during install (#201)
  • Update documentation to spell out requirements (#199)
  • New megahit failure handling (#194)
  • Enforce sample wildcard constraints in Snakemake rules (#190)
  • Run megahit multithreaded (#189)
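
Wildcard constraints (#190) matter because Snakemake turns each wildcard into a regular expression group that, by default, matches any non-empty string (.+). The Python sketch below imitates that matching with the standard re module to show how an unconstrained sample wildcard can capture more than intended. The file-name pattern and the constraints are hypothetical and may differ from the ones Sunbeam enforces.

```python
import re

# Hypothetical output pattern "{sample}_{rp}.fastq.gz", written as regexes.
# Unconstrained: each wildcard matches any non-empty string (".+").
UNCONSTRAINED = re.compile(r"^(?P<sample>.+)_(?P<rp>.+)\.fastq\.gz$")
# Constrained (assumed): sample may not contain "/", and rp must be 1 or 2.
CONSTRAINED = re.compile(r"^(?P<sample>[^/]+)_(?P<rp>[12])\.fastq\.gz$")


def parse(pattern, filename):
    """Return (sample, rp) if the filename matches the pattern, else None."""
    match = pattern.match(filename)
    return (match["sample"], match["rp"]) if match else None


# The unconstrained pattern accepts a file that is not a raw read pair
# and splits it into a misleading sample name and read-pair label.
print(parse(UNCONSTRAINED, "stool_1_trimmed.fastq.gz"))  # ('stool_1', 'trimmed')
print(parse(CONSTRAINED, "stool_1_trimmed.fastq.gz"))    # None
print(parse(CONSTRAINED, "stool_A_1.fastq.gz"))          # ('stool_A', '1')
```

With the constraint in place, a rule only claims files whose names fit the expected shape, so two rules are less likely to compete for the same output.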

v2.0.2 (August 28, 2019)

  • Add implicit dependencies (samtools and bcftools) to environment file to make them explicit

v2.0.1 (July 24, 2019)

  • Increment Snakemake version requirement for compatibility with recent conda
  • Specify earlier megahit version to ensure compatibility with existing assembly behavior
  • Integration test improvements

v2.0.0 (January 22, 2019)

  • Start a project using resources directly from the SRA using sunbeam init --data_acc [SRA ###]. For more information, see the docs.
  • New extension website:
  • Improved documentation
  • Numerous bugfixes and optimizations

v1.2.1 (May 24, 2018)

  • Minor bugfixes

v1.2.0 (May 2, 2018)

  • Low-complexity reads are now removed by default rather than masked
  • Bug fixes related to single-end sequencing experiments
  • Documentation updates
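
The difference between masking and removing low-complexity reads can be illustrated with a small Python sketch. It scores a read as the number of distinct k-mers divided by read length, which is only an assumed stand-in for what komplexity computes. The k-mer size, the 0.5 threshold, the whole-read masking with Ns, and the function names are all hypothetical.

```python
def kmer_complexity(seq, k=4):
    """Distinct k-mers divided by sequence length (0.0 for reads shorter than k)."""
    if len(seq) < k:
        return 0.0
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(kmers) / len(seq)


def mask_low_complexity(reads, threshold=0.5, k=4):
    """Masking (sketch): keep every read, replacing low-complexity ones with Ns."""
    return [
        seq if kmer_complexity(seq, k) >= threshold else "N" * len(seq)
        for seq in reads
    ]


def remove_low_complexity(reads, threshold=0.5, k=4):
    """Removal (sketch): drop low-complexity reads entirely."""
    return [seq for seq in reads if kmer_complexity(seq, k) >= threshold]


reads = ["AAAAAAAAAAAA", "ACGTTGCAGTCA"]
print(mask_low_complexity(reads))    # ['NNNNNNNNNNNN', 'ACGTTGCAGTCA']
print(remove_low_complexity(reads))  # ['ACGTTGCAGTCA']
```

Under removal, downstream steps never see the uninformative read at all, whereas masking keeps a placeholder record in the output.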

v1.1.0 (April 8, 2018)

  • Reports include number of filtered reads per host, rather than in aggregate
  • Static binary dependency for komplexity for easier deployment
  • Remove max length filter for contigs

v1.0.0 (March 22, 2018)

  • First stable release!
  • Support for single-end sequencing experiments
  • Low-complexity read masking via komplexity
  • Support for extensions
  • Documentation on
  • Better assembler (megahit)
  • Better ORF finder (prodigal)
  • Can remove reads from any number of host/contaminant genomes
  • Semantic versioning checks
  • Integration tests and continuous deployment