
salzman-lab/SPLASH


Updated software package here

Since the initial release of this software package, we have worked with collaborators to develop an improved codebase for this project, which is publicly available on GitHub here.

Introduction

The pipeline is built using Nextflow, a workflow tool for running tasks portably across multiple compute infrastructures. It uses Docker/Singularity containers, which makes installation simple and results highly reproducible. The Nextflow DSL2 implementation uses one container per process, which makes software dependencies much easier to maintain and update. Where possible, these processes have been submitted to and installed from nf-core/modules, making them available to all nf-core pipelines and to everyone in the Nextflow community.

SPLASH

A statistical, reference-free algorithm subsumes myriad problems in genome science and enables novel discovery


Prerequisites

  1. Install Java.
  2. Install Nextflow (>=20.04.0).
  3. Depending on your use case, install Conda, Docker, or Singularity. With the docker or singularity Nextflow profile, the pipeline runs inside the SPLASH Docker container (also available on Docker Hub), which contains all the required dependencies.
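A minimal bash sketch for checking the first two prerequisites before a run. It assumes a `sort` that supports `-V` (GNU coreutils) and that `nextflow -version` prints a line containing `version X.Y.Z`; the helper name `version_ge` is hypothetical and not part of SPLASH or Nextflow.

```shell
#!/usr/bin/env bash
# Sketch only: warn about missing prerequisites; installs nothing.
# Assumption: `sort -V` is available (GNU coreutils version ordering).
# Assumption: `nextflow -version` output contains "version X.Y.Z".

# Hypothetical helper: succeeds when version $1 >= version $2.
version_ge() {
  # The smaller of the two sorts first; if that is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_nextflow="20.04.0"

# Report any missing command-line tools.
for tool in java nextflow; do
  command -v "$tool" >/dev/null 2>&1 || echo "Missing prerequisite: $tool" >&2
done

# Compare the installed Nextflow version against the minimum.
if command -v nextflow >/dev/null 2>&1; then
  nf_version=$(nextflow -version 2>&1 \
    | grep -oE 'version [0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | cut -d' ' -f2)
  if [ -n "$nf_version" ] && ! version_ge "$nf_version" "$min_nextflow"; then
    echo "Nextflow $nf_version is older than $min_nextflow" >&2
  fi
fi
```

The script only prints warnings to stderr; which of Conda, Docker, or Singularity you also need depends on the profile you plan to use.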

Try the pipeline

To test this pipeline, use the command below. The test profile will launch a pipeline run with a small dataset.

How to run with Singularity:

nextflow run salzmanlab/nomad \
    -profile test,singularity \
    -r main \
    -latest

How to run with Docker:

nextflow run salzmanlab/nomad \
    -profile test,docker \
    -r main \
    -latest

How to run with Conda:

nextflow run salzmanlab/nomad \
    -profile test,conda \
    -r main \
    -latest
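The three test commands differ only in the engine named after `test,` in `-profile`. The bash sketch below picks that entry from whichever engine is found on `PATH` and prints the matching command. The helper `pick_profile` and the preference order (singularity, then docker, then conda) are illustrative assumptions, not behavior of the pipeline itself.

```shell
#!/usr/bin/env bash
# Sketch only: choose a test profile from the engines available on PATH.
# Assumption: preference order singularity > docker > conda (arbitrary).

# Hypothetical helper: prints e.g. "test,docker"; fails if no engine is found.
pick_profile() {
  local engine
  for engine in singularity docker conda; do
    if command -v "$engine" >/dev/null 2>&1; then
      printf 'test,%s\n' "$engine"
      return 0
    fi
  done
  return 1  # none of the three engines is on PATH
}

if profile=$(pick_profile); then
  # Print rather than execute, so the command can be reviewed first.
  printf 'nextflow run salzmanlab/nomad -profile %s -r main -latest\n' "$profile"
else
  echo "No singularity, docker, or conda executable found on PATH." >&2
fi
```

Printing instead of executing keeps the choice visible before Nextflow starts pulling the pipeline and containers; copy the printed line to launch the run.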

Usage

Please see this document for descriptions of SPLASH inputs and parameters.

Outputs

Please see this document for descriptions of SPLASH outputs.

Citations

Marek Kokot*, Roozbeh Dehghannasiri*, Tavor Baharav, Julia Salzman, and Sebastian Deorowicz. SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads. bioRxiv (2023).

Kaitlin Chaung*, Tavor Baharav*, George Henderson, Ivan Zheludev, Peter Wang, and Julia Salzman. SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery. Cell (2023).

Tavor Baharav, David Tse, and Julia Salzman. An Interpretable, Finite Sample Valid Alternative to Pearson's X² for Scientific Discovery. bioRxiv (2023).

This pipeline uses code and infrastructure developed and maintained by the nf-core initiative, and reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.