

PDX Mouse Subtraction Pipeline

Authors: Oliver Hampton
Chase Miller
Liu Xi
Maria Cardenas and Komal S. Rathi
Contact: Komal S. Rathi (
Organization: DBHi, CHOP
Status: Completed
Date: 2019-07-18


The goal of this repo is to make the mouse subtraction pipeline from BCM (Wheeler Lab) reproducible.


  1. Create python3 environment:
conda create --name pdx-subtract-env
conda activate pdx-subtract-env
conda install -c bioconda samtools
conda install -c bioconda htslib
conda install -c bioconda sambamba
conda install -c bioconda picard
conda install -c bioconda cufflinks
conda install -c anaconda java-1.7.0-openjdk-cos6-x86_64 # required by rna-seqc
conda install -c bioconda rna-seqc
conda install -c bioconda htseq
conda install -c bioconda star=2.5.3a
conda install -c bioconda trinity=2.5.1 # required by star-fusion
conda install -c bioconda star-fusion=1.1.0
conda install -c bioconda bwa
conda install -c bioconda alignstats
conda config --add channels
conda install defuse
conda install -c bioconda bamutil
  2. Create python2 environment (STAR-Fusion v1.0.1 is python 2.7 compatible):
conda create --name star-fusion-env python=2.7
source activate star-fusion-env
conda install -c bioconda star-fusion
conda install -c bioconda trinity
conda install -c conda-forge -c bioconda samtools bzip2
conda install -c conda-forge configparser

# install some non-standard perl modules:
perl -MCPAN -e shell
install DB_File
install URI::Escape
install Set::IntervalTree
install Carp::Assert
install JSON::XS
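
Once both environments are set up, it can help to confirm that the expected executables are actually on the PATH of the active environment before starting a long run. The function below is a minimal sketch and not part of the original pipeline; `check_tools` is a hypothetical helper name, and the executable names in the commented example are assumptions about what the conda packages install.

```shell
# Report which of the given executables are missing from PATH.
# Prints one missing name per line and returns non-zero if any are missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions; adjust to your install):
# check_tools samtools bgzip tabix sambamba bwa STAR
```

Run it once inside each environment; an empty output means every listed tool was found.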


# SOAPfuse has to be installed separately as it is not available on conda
tar -xzf SOAPfuse-v1.26.tar.gz
cd SOAPfuse-v1.26

# get SOAPfuse database
cd /mnt/isilon/cbmi/variome/reference/soapfuse_db

# update SOAPfuse config file according to
# add cytoBand file from ucsc and update SOAPfuse config
wget hg19-GRCh37.59/
gunzip cytoBand.txt.gz

# change PA_all_fq_postfix in config file to .fq


# for deFUSE, python 2 is required so use the python2 environment created for STAR-Fusion

# Install via source:

# in the tools directory, download boost
cd tools && wget
tar -zxvf boost_1_68_0.tar.gz
export CPLUS_INCLUDE_PATH=/mnt/isilon/maris_lab/target_nbl_ngs/PPTC-PDX-genomics/mouse_subtraction_pipeline/scripts/dranew-defuse-0f198c242b82/tools/boost_1_68_0
cd tools && make

# download deFUSE reference database
# change perl in to /usr/bin/env perl -d /mnt/isilon/cbmi/variome/reference/defuse_db/hg19/


# get reference files and prepare corresponding index files
gunzip 1000G_phase1.indels.b37.vcf.gz
bgzip 1000G_phase1.indels.b37.vcf
tabix -p vcf 1000G_phase1.indels.b37.vcf.gz

gunzip Mills_and_1000G_gold_standard.indels.b37.vcf.gz
bgzip Mills_and_1000G_gold_standard.indels.b37.vcf
tabix -p vcf Mills_and_1000G_gold_standard.indels.b37.vcf.gz

gunzip dbsnp_138.b37.vcf.gz
bgzip dbsnp_138.b37.vcf
tabix -p vcf dbsnp_138.b37.vcf.gz
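
Each `tabix -p vcf` call writes a `.tbi` index next to its input file, so a quick way to confirm that no VCF was skipped is to list the bgzipped VCFs that still lack one. This is a small sketch, not part of the original pipeline; `missing_tbi` is a hypothetical helper name and the directory argument is whatever location holds your reference VCFs.

```shell
# List bgzipped VCFs in a directory that have no tabix index (.tbi) beside them.
# Usage: missing_tbi [directory]   (defaults to the current directory)
missing_tbi() {
    local dir="${1:-.}" vcf
    for vcf in "$dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue          # the glob matched nothing
        [ -e "$vcf.tbi" ] || echo "$vcf"   # index file is absent
    done
}
```

An empty output means every `*.vcf.gz` in the directory has an index.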

Download reference files:

# run this code to create output directories and download reference data

Prepare reference fasta and gtf:

# Code to prepare reference fasta and gtf (this might be inaccurate because I got the reference files from BCM): bash scripts/
# make sure all reference fasta files are indexed:
samtools faidx <file.fasta|file.fa>

# make sure the fasta reference used by bwa is indexed:
bwa index protein_coding_canonical.T_chr.fa
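
To check the two indexing steps above without rerunning them, you can test for the files they produce: `samtools faidx` writes `<fasta>.fai`, and `bwa index` writes five files with the extensions `.amb`, `.ann`, `.bwt`, `.pac` and `.sa`. The helpers below are a sketch with hypothetical names (`has_faidx`, `has_bwa_index`) and are not part of the original pipeline.

```shell
# Return 0 if the FASTA has a non-empty samtools index (.fai), non-zero otherwise.
has_faidx() {
    [ -s "$1.fai" ]
}

# Return 0 only if all five files written by `bwa index` exist and are non-empty.
has_bwa_index() {
    local ext
    for ext in amb ann bwt pac sa; do
        [ -s "$1.$ext" ] || return 1
    done
}
```

For example, `has_bwa_index protein_coding_canonical.T_chr.fa || bwa index protein_coding_canonical.T_chr.fa` rebuilds the index only when it is incomplete.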

BCM-specific scripts and software:

1. pindel_0.2.5b5_tdonly
2. ERCCPlot.jar

Steps to run the RNA-pipeline:

The RNA pipeline is divided into four steps:

  1. Snakefile_Phase1: Align PDX RNA-seq data to the hybrid genome, split the alignments into human and mouse bams, and create human-specific fastq files.
  2. Snakefile_Phase2: Realign to the human reference, do QC, and run htseq and pindel.
  3. Snakefile_fusions_py2: Run python2-dependent fusion callers such as STAR-Fusion and deFUSE.
  4. Snakefile_soapfuse: Run python3-dependent fusion callers such as SOAPfuse.

Each snakefile has a corresponding bash script to run the pipeline:

# Run phase 1
cd rna-hybrid && bash

# Run phase 2
cd rna-hybrid && bash

# Run python2 based fusion callers
cd rna-hybrid && bash

# Run python3 based fusion callers
cd rna-hybrid && bash
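
Because Phase 2 consumes the human-specific fastq files produced by Phase 1, the phases should run in order and stop as soon as one fails. The wrapper below is a minimal sketch of that idea and is not part of the original repo; `run_phases` is a hypothetical helper, and the script paths are passed as arguments because their names are not given here.

```shell
# Run each script given as an argument with bash, in order.
# Stops at the first failure and returns that script's exit status.
run_phases() {
    local script status
    for script in "$@"; do
        echo "Running $script" >&2
        bash "$script" || {
            status=$?
            echo "Failed: $script (exit $status)" >&2
            return "$status"
        }
    done
}
```

Call it from inside `rna-hybrid` with the phase scripts listed in the order shown above; later scripts are not started if an earlier one exits with a non-zero status.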

Steps to run the DNA-pipeline:

cd dna-pipeline && bash