GitHub - salzman-lab/SpliZ: Nextflow implementation of SpliZ

Introduction

salzmanlab/spliz is a bioinformatics best-practice analysis pipeline for calculating the splicing z-score (SpliZ) for single-cell RNA-seq analysis.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Quick Start

Install Nextflow (>=20.04.0) and conda.

Download the environment file.

wget https://raw.githubusercontent.com/salzmanlab/SpliZ/main/environment.yml

Create the conda environment and activate it.

conda env create --name spliz_env --file=environment.yml
conda activate spliz_env

Run the pipeline on the test data set. You may need to modify the executor scope in the config file to suit your compute environment.

nextflow run salzmanlab/spliz \
    -r main \
    -latest \
    -profile small_test_data

Sherlock users should use the sherlock profile:

nextflow run salzmanlab/spliz \
    -r main \
    -latest \
    -profile small_test_data,sherlock

Run the pipeline on your own dataset.
1. Edit your config file with the parameters below. (You can use /small_data/small.config as a template; be sure to include any memory or time parameters.)
2. Run with your config file:
```
nextflow run salzmanlab/spliz \
    -r main \
    -latest \
    -c YOUR_CONFIG_HERE.conf
```
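
As a rough illustration of step 1, the Python sketch below renders a handful of the parameters from the Input Parameters table as a Nextflow `params { ... }` block. The helper `render_params` is hypothetical and not part of the pipeline, and the assumption that every parameter lives in the `params` scope should be checked against the /small_data/small.config template. The values are the example values from the table.

```python
def render_params(params):
    """Render a dict as a Nextflow-style `params { ... }` config block."""
    lines = ["params {"]
    for key, value in params.items():
        if isinstance(value, bool):
            # Check bool before numbers: bool is a subclass of int in Python.
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = '"' + str(value) + '"'
        lines.append(f"    {key} = {rendered}")
    lines.append("}")
    return "\n".join(lines)


# Example values taken from the Input Parameters table.
config_text = render_params({
    "dataname": "Tumor_5",
    "input_file": "tumor_5_with_postprocessing.txt",
    "SICILIAN": True,
    "pin_S": 0.1,
    "pin_z": 0,
    "bounds": 5,
    "libraryType": "10X",
})
print(config_text)
```

The printed text can be saved to a file and passed to `nextflow run` with `-c`; memory, time, and executor settings would still need to be added by hand.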

See usage docs for all of the available options when running the pipeline.

Pipeline Summary

By default, the pipeline currently performs the following:

Calculate the SpliZ scores for:
- Identifying variable splice sites
- Identifying differential splicing between cell types.

Input Parameters

| Argument | Description | Example Usage |
| --- | --- | --- |
| `dataname` | Descriptive name for the SpliZ run | "Tumor_5" |
| `run_analysis` | Whether the pipeline performs splice site identification and differential splicing analysis | `true`, `false` |
| `input_file` | File to be used as SpliZ input | tumor_5_with_postprocessing.txt |
| `SICILIAN` | Whether `input_file` is output from SICILIAN | `true`, `false` |
| `pin_S` | Bound splice site residuals at this quantile (i.e. values in the lower `pin_S` quantile and the upper 1 - `pin_S` quantile will be rounded to the quantile limits) | 0.1 |
| `pin_z` | Bound SpliZ scores at this quantile (i.e. values in the lower `pin_z` quantile and the upper 1 - `pin_z` quantile will be rounded to the quantile limits) | 0 |
| `bounds` | Only include cell/gene pairs that have more than this many junctional reads for the gene | 5 |
| `light` | Only output the minimum number of columns | `true`, `false` |
| `svd_type` | Type of SVD calculation | `normdonor`, `normgene` |
| `n_perms` | Number of permutations | 100 |
| `grouping_level_1` | Metadata column by which the data is initially partitioned | "tissue" |
| `grouping_level_2` | Metadata column by which the partitioned data is grouped | "compartment" |
| `libraryType` | Library preparation method of the input data | `10X`, `SS2` |
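
To make the `pin_S` and `pin_z` descriptions concrete, here is a minimal Python sketch of quantile pinning. It is not the pipeline's own code: the helper names `quantile` and `pin_values`, the linear-interpolation quantile method, and the sample numbers are all assumptions made for illustration.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already-sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def pin_values(values, pin):
    """Clamp every value to the range between the pin and 1 - pin quantiles."""
    ordered = sorted(values)
    lower = quantile(ordered, pin)
    upper = quantile(ordered, 1 - pin)
    return [min(max(v, lower), upper) for v in values]


# Made-up residuals with one extreme value at each end.
residuals = [-10.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 50.0]
pinned = pin_values(residuals, 0.1)
print(pinned)
```

With a pin of 0 (the `pin_z` example above), the limits are the minimum and maximum of the data, so no value changes.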

Optional Parameters for non-SICILIAN Inputs (`SICILIAN` = `false`)

| Argument | Description | Example Usage |
| --- | --- | --- |
| `samplesheet` | If input files are in BAM format, this file specifies the locations of the input BAM files. Samplesheet formatting is specified below. | Tumor_5_samplesheet.csv |
| `annotator_pickle` | Genome-specific annotation file for gene names | hg38_refseq.pkl |
| `exon_pickle` | Genome-specific annotation file for exon boundaries | hg38_refseq_exon_bounds.pkl |
| `splice_pickle` | Genome-specific annotation file for splice sites | hg38_refseq_splices.pkl |
| `gtf` | GTF file used as the reference annotation file for the genome assembly | GRCh38_genomic.gtf |
| `meta` | If input files are in BAM format, this file contains per-cell annotations. This file must contain columns for `grouping_level_1` and `grouping_level_2`. | metadata_tumor_5.tsv |
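
Because the `meta` file must contain the columns named by `grouping_level_1` and `grouping_level_2`, a quick check before launching a run can help. The sketch below is a hypothetical helper, not part of the pipeline. It assumes the metadata file is tab-separated with a header row (suggested by the `.tsv` example name), and the `cell_id` column and cell values are invented for the demonstration.

```python
import csv
import io


def missing_grouping_columns(handle, required):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    return [col for col in required if col not in header]


# In-memory stand-in for a metadata file; replace with open("your_meta.tsv").
example_meta = io.StringIO(
    "cell_id\ttissue\tcompartment\n"
    "AAACCTG\tlung\timmune\n"
)
missing = missing_grouping_columns(example_meta, ["tissue", "compartment"])
print(missing or "all grouping columns present")
```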

Samplesheets

The samplesheet must be in comma-separated value (CSV) format and must not contain a header row. The sampleID must be a unique identifier for each BAM file entry.

For non-SICILIAN samples, samplesheets must have 2 columns: sampleID and path to the BAM file.

Tumor_5_S1,tumor_5_S1_L001.bam
Tumor_5_S2,tumor_5_S2_L002.bam
Tumor_5_S3,tumor_5_S3_L003.bam

For SICILIAN SS2 samples, samplesheets must have 3 columns: sampleID, read 1 BAM file, and read 2 BAM file.

Tumor_5_S1,tumor_5_S1_L001_R1.bam,tumor_5_S1_L001_R2.bam
Tumor_5_S2,tumor_5_S2_L002_R1.bam,tumor_5_S2_L002_R2.bam
Tumor_5_S3,tumor_5_S3_L003_R1.bam,tumor_5_S3_L003_R2.bam
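
The samplesheet rules above (fixed column count, unique sampleID) can be checked with a short script. This is a sketch only: `check_samplesheet` is a hypothetical helper rather than something the pipeline provides, and it cannot detect an accidental header row, which would simply be read as a sample.

```python
import csv
import io


def check_samplesheet(handle, n_columns):
    """Return a list of problems found in a header-less CSV samplesheet."""
    problems = []
    seen = set()
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != n_columns:
            problems.append(
                f"line {line_no}: expected {n_columns} columns, found {len(row)}"
            )
            continue
        sample_id = row[0]
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sampleID {sample_id}")
        seen.add(sample_id)
    return problems


# In-memory copy of the 3-column example; replace with open("samplesheet.csv").
sheet = io.StringIO(
    "Tumor_5_S1,tumor_5_S1_L001_R1.bam,tumor_5_S1_L001_R2.bam\n"
    "Tumor_5_S2,tumor_5_S2_L002_R1.bam,tumor_5_S2_L002_R2.bam\n"
)
problems = check_samplesheet(sheet, 3)
print(problems or "samplesheet looks consistent")
```

Pass `2` as the column count for the 2-column layout.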

Credits

salzmanlab/spliz was originally written by the Salzman Lab.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

This repository contains code to perform the analyses in this paper:

The SpliZ generalizes “Percent Spliced In” to reveal regulated splicing at single-cell resolution

Julia Eve Olivieri*, Roozbeh Dehghannasiri*, Julia Salzman.

Nature Methods. 2022 Mar 3. https://www.nature.com/articles/s41592-022-01400-x

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
