

snakefiles

This repository has Snakefiles for common RNA-seq data analysis workflows. Please feel free to copy them and modify them to suit your needs.

Getting started

If you are new to Snakemake, you might like to start by walking through my tutorial for beginners. Next, have a look at Johannes Köster's introductory slides, tutorial, documentation, and FAQ.

Quick start:

# Copy the files
git clone https://github.com/slowkow/snakefiles.git

# Go to the kallisto directory
cd snakefiles/kallisto

# Run snakemake
snakemake

Data

This repository includes 6 FASTQ files in data/fastq/ to illustrate the usage of each of the RNA-seq workflows.

  • Sample1
    • Sample1.R1.fastq.gz has the first mates of sequenced fragments.
    • Sample1.R2.fastq.gz has the second mates of sequenced fragments.
  • Sample2
    • Sample2.L1.R1.fastq.gz
    • Sample2.L2.R1.fastq.gz
      • The first mate reads (R1) are split across two files (L1 and L2). Some software, such as STAR, requires these reads to be merged into one file.
    • Sample2.L1.R2.fastq.gz
    • Sample2.L2.R2.fastq.gz
      • Likewise, the second mate reads (R2) are also split across two files (L1 and L2). To make matters worse, Sample2.L2.R2.fastq.gz has only 2000 reads, whereas Sample2.L2.R1.fastq.gz has 2500 reads. The Snakefiles in this repository can handle this without any problems.
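Merging lane-split files can be sketched in a few lines of Python. This is only an illustration, not the code used by the Snakefiles in this repository; the function names merge_fastq_gz and count_reads are hypothetical. It relies on the fact that gzip files can be concatenated byte-for-byte into a valid multi-member gzip file.

```python
import gzip
import shutil


def merge_fastq_gz(parts, merged):
    """Concatenate gzip-compressed FASTQ files into one file.

    Gzip streams can be joined byte-for-byte: the result is a valid
    multi-member gzip file, so nothing has to be decompressed.
    """
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(fastq_gz):
    """Count reads in a gzip-compressed FASTQ file (4 lines per read)."""
    with gzip.open(fastq_gz, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

For example, calling merge_fastq_gz with data/fastq/Sample2.L1.R1.fastq.gz and data/fastq/Sample2.L2.R1.fastq.gz as parts would write a single R1 file, and count_reads can then be used to compare the R1 and R2 totals.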

Scripts

  • make_samples.py creates the samples.json file.
  • bsub.py receives job scripts from Snakemake and automatically submits them to an appropriate LSF queue based on job requirements.
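To give a feel for what a sample sheet might contain, here is a minimal sketch that groups FASTQ file names by sample and mate. The layout of the real samples.json written by make_samples.py is not shown in this README, so the JSON structure, the FASTQ_NAME pattern, and the group_fastq function are all assumptions made for illustration.

```python
import json
import re
from collections import defaultdict

# Assumed naming scheme: <sample>[.<lane>].<R1|R2>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>[^.]+)(?:\.(?P<lane>L\d+))?\.(?P<mate>R[12])\.fastq\.gz$"
)


def group_fastq(filenames):
    """Group FASTQ file names as {sample: {mate: [files...]}}."""
    samples = defaultdict(lambda: defaultdict(list))
    for name in sorted(filenames):
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed scheme
        samples[match["sample"]][match["mate"]].append(name)
    # Convert nested defaultdicts to plain dicts so they serialize cleanly.
    return {sample: dict(mates) for sample, mates in samples.items()}


def to_json(filenames):
    """Serialize the grouping as indented JSON text."""
    return json.dumps(group_fastq(filenames), indent=2, sort_keys=True)
```

Grouping by mate keeps the lane-split files of Sample2 together, so a workflow can decide whether to merge them or pass them on as a list.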

RNA-seq workflows

kallisto/

Quantify gene isoform expression in transcripts per million (TPM) with kallisto and collate outputs from multiple samples into one file.
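The collation step can be sketched as follows. This is not the code from the kallisto Snakefile; it is a minimal illustration that assumes each sample has a kallisto abundance.tsv file with target_id and tpm columns, and the function names read_tpm and collate_tpm are hypothetical.

```python
import csv


def read_tpm(abundance_tsv):
    """Read one abundance.tsv file into {target_id: tpm}."""
    with open(abundance_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["target_id"]: float(row["tpm"]) for row in reader}


def collate_tpm(abundance_by_sample, out_tsv):
    """Write one table with a row per transcript and a column per sample.

    abundance_by_sample maps a sample name to the path of its
    abundance.tsv file. Transcripts missing from a sample get 0.0.
    """
    tables = {s: read_tpm(p) for s, p in abundance_by_sample.items()}
    samples = sorted(tables)
    targets = sorted(set().union(*tables.values()))
    with open(out_tsv, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["target_id"] + samples)
        for target in targets:
            writer.writerow([target] + [tables[s].get(target, 0.0) for s in samples])
```

A wide table like this is convenient for downstream comparison across samples, at the cost of dropping the other kallisto columns such as est_counts.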

star_express/

Execute a multi-sample 2-pass STAR alignment, sharing the splice junctions across samples. Count fragments per gene and fragments per splice site. Also produce a BAM file with coordinates relative to transcripts. Quantify transcripts in TPM with eXpress. Collate outputs from multiple samples.

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository.