jgurtowski edited this page Feb 25, 2013 · 3 revisions

Jnomics is a cloud-scale sequence analysis suite designed to help meet the computational challenges presented by the continuing revolution in massively parallel DNA sequencing technologies. Current worldwide second-generation sequencing capacity exceeds 13 Pbp/year and continues to increase annually by a factor of five. The storage and analysis of such massive volumes of genomic data represent the primary challenge in computational biology today. Jnomics addresses these problems by applying recent innovations in distributed computing to large-scale genomic storage and analysis. It is based on Apache Hadoop, an open-source implementation of Google's MapReduce framework.

Jnomics offers a number of features that allow the rapid development of parallelized genomic analysis pipelines:

  • Minimal configuration: Out of the box, Jnomics provides a number of tools that allow many common genomic tasks – including sorting, merging, filtering, and selection – to be performed as distributed tasks spread across a cluster.
  • File-format agnostic: Jnomics allows users to seamlessly read and write many common formats (SAM, BED, FASTQ), largely eliminating the time-consuming format conversions that add significant overhead to genomics pipelines.
  • Extensibility: The Java-based API makes it easy to add new distributed components.
  • Parallelization of existing tools: Many excellent genomic tools already exist, but very few of them are designed to operate in a distributed environment. Jnomics lets the user distribute the execution of existing tools, easing the transition from serial to distributed analyses. Jnomics currently supports BWA and Novoalign (with more to come!), and other tasks can be added easily using the Java-based Jnomics API.
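
To make the idea of running a filter and a sort as distributed tasks more concrete, here is a minimal single-process sketch of the map/shuffle/reduce pattern that Hadoop applies across a cluster. It is not the Jnomics API and does not use Hadoop: the class `MapReduceSortSketch`, the `Alignment` type, and the `filterAndSort` method are hypothetical names chosen for illustration. The only thing taken from a real specification is the SAM column order (reference name, position, and mapping quality in columns 3, 4, and 5).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of map/shuffle/reduce; not the Jnomics or Hadoop API. */
class MapReduceSortSketch {

    /** Minimal alignment record: reference name, 1-based position, mapping quality. */
    static final class Alignment {
        final String ref;
        final int pos;
        final int mapq;

        Alignment(String ref, int pos, int mapq) {
            this.ref = ref;
            this.pos = pos;
            this.mapq = mapq;
        }

        @Override
        public String toString() {
            return ref + ":" + pos + " (MAPQ " + mapq + ")";
        }
    }

    /** Map step: parse one tab-separated SAM line; header lines (starting with '@') yield null. */
    static Alignment parse(String line) {
        if (line.startsWith("@")) {
            return null;
        }
        String[] fields = line.split("\t");
        // SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, ...
        return new Alignment(fields[2], Integer.parseInt(fields[3]), Integer.parseInt(fields[4]));
    }

    /** Drops alignments below minMapq and returns the rest ordered by reference name, then position. */
    static List<Alignment> filterAndSort(List<String> lines, int minMapq) {
        // Map + shuffle: group surviving records by reference name.
        // On a cluster, each group could be handled by a different reducer.
        // TreeMap orders keys lexicographically, so "chr10" sorts before "chr2".
        Map<String, List<Alignment>> partitions = new TreeMap<>();
        for (String line : lines) {
            Alignment a = parse(line);
            if (a == null || a.mapq < minMapq) {
                continue;
            }
            partitions.computeIfAbsent(a.ref, k -> new ArrayList<>()).add(a);
        }
        // Reduce: sort each partition independently, then concatenate in key order.
        List<Alignment> sorted = new ArrayList<>();
        for (List<Alignment> partition : partitions.values()) {
            partition.sort(Comparator.comparingInt((Alignment a) -> a.pos));
            sorted.addAll(partition);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "@HD\tVN:1.0",
                "read1\t0\tchr2\t50\t60",
                "read2\t0\tchr1\t200\t60",
                "read3\t0\tchr1\t100\t5",
                "read4\t0\tchr1\t10\t30");
        for (Alignment a : filterAndSort(lines, 20)) {
            System.out.println(a);
        }
    }
}
```

Because each partition is sorted independently, the per-partition work is the part that could be spread across machines; only the final concatenation depends on key order.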

Announcements

  • February 2013 - Moved wiki to GitHub

Links

Related Links

  • Bowtie: An ultrafast memory-efficient short read aligner
  • CloudBurst: Highly Sensitive Short Read Mapping with MapReduce
  • Crossbow: Whole Genome Resequencing Analysis in the Clouds

Funding
