Merge pull request #66 from maxplanck-ie/HiC
New module : Hi-C
vivekbhr committed Nov 10, 2017
2 parents 2d5a9d7 + 7a975e5 commit 70ca0b3
Showing 27 changed files with 827 additions and 40 deletions.
1 change: 1 addition & 0 deletions HiC
7 changes: 4 additions & 3 deletions README.rst
@@ -12,14 +12,15 @@ MPI-IE Snakemake workflows : snakePipes

snakePipes are our in-house, flexible and powerful workflows built using `snakemake <snakemake.readthedocs.io>`__ that simplify the analysis of NGS data.

Workflows
----------
Workflows available
--------------------

- DNA-mapping
- DNA-mapping (normal and allele-specific)
- ChIP-seq (normal and allele-specific)
- RNA-seq (normal and allele-specific)
- scRNA-seq
- ATAC-seq
- Hi-C

Installation
-------------
40 changes: 29 additions & 11 deletions ChangeLog → docs/ChangeLog.rst
@@ -1,7 +1,14 @@
version 0.0.1 - March 23, 2016 - Fabian Kilpert
Change Log
================

The development history of the workflows is listed below, along with the GitHub IDs of
the people who contributed most (but not all) of the changes.


version 0.0.1 - March 23, 2016 - @kilpert
- initial version

version 0.1.0 - June 15, 2016 - Andreas Richter
version 0.1.0 - June 15, 2016 - @asrichter
- added --fastqc and --bw-binsize parameters to DNA-mapping wrapper script
- additional organisms are now supported by adding new genome.py files
- defined (effective) genome size as (genome length)-(number of 'N's) in genome.py files
@@ -17,17 +24,17 @@ version 0.1.0 - June 15, 2016 - Andreas Richter
- removed --local-cores parameter from all wrapper scripts as there are no local snakemake rules defined
- many small other changes
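
Note on the effective genome size entry in version 0.1.0: the definition given there, (genome length)-(number of 'N's), can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the genome.py files. The function name `effective_genome_size` and the toy sequences are hypothetical, and counting lowercase `n` alongside `N` is an assumption.

```python
def effective_genome_size(sequences):
    """Return (genome length) - (number of N bases), summed over all sequences."""
    total_length = 0
    n_count = 0
    for seq in sequences:
        total_length += len(seq)
        # Assumption: soft-masked lowercase 'n' is counted like 'N'.
        n_count += seq.count("N") + seq.count("n")
    return total_length - n_count


# Hypothetical toy "chromosomes", for illustration only.
chromosomes = ["ACGTNNNNACGT", "NNACGTAC"]
print(effective_genome_size(chromosomes))
```

The two toy sequences hold 20 bases, 6 of which are N, so the result is 14.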

version 0.1.1 - June 17, 2016 - Andreas Richter
version 0.1.1 - June 17, 2016 - @asrichter
- added option to run workflow locally instead of cluster submission
- FASTQ.snakefile replaces FASTQ_symlink.snakefile and FASTQ_downsample.snakefile
- several small changes

version 0.2.0 - June 22, 2016 - Andreas Richter
version 0.2.0 - June 22, 2016 - @asrichter
- added filtering option to filter BAM files for duplication, proper pairs and MAPQ
- added variable 'outdir' to configuration
- many small changes

version 0.3.0 - June 24, 2016 - Andreas Richter
version 0.3.0 - June 24, 2016 - @asrichter
- rewrote ChIP-seq workflow completely including wrapper script ChIP-seq
- added histoneHMM for calling broadly enriched regions
- added MACS2 peak quality controls
@@ -36,15 +43,15 @@ version 0.3.0 - June 24, 2016 - Andreas Richter
- positional instead of required optional command line arguments
- many small changes

version 0.3.1 - June 25, 2016 - Andreas Richter
version 0.3.1 - June 25, 2016 - @asrichter
- run Picard quality control on unfiltered BAM files
- added --gcbias parameter to DNA-mapping wrapper script to run computeGCBias optionally
- replaced --input-dir and --output-dir by --working-dir parameter in ChIP-seq
wrapper script to specify the working directory, which is output directory of
the pipeline and must also contain the DNA-mapping pipeline output files
- bugfixes

version 0.3.2 - June 27, 2016 - Andreas Richter
version 0.3.2 - June 27, 2016 - @asrichter
- added generation of QC reports for all samples to ChIP-seq pipeline
- added consistency check for ChIP-seq pipeline whether all required input files exist for all samples
- added peak count to MACS2 peak quality controls
@@ -53,9 +60,9 @@ version 0.3.2.1 - June 27, 2016 - Andreas Richter
- added documentation to README.md
- moved R library

version 0.4 - 2016 - Fabian Kilpert, Steffen Heyne
version 0.4 - 2016 - @kilpert, @steffenheyne

version 0.5 - 2017 - Steffen Heyne, Fabian Kilpert, Michael Rauer
version 0.5 - 2017 - @steffenheyne, @kilpert, @mirax87
- major cleanup and refactoring of wrappers and code structure (but not rules)
- scRNAseq workflow added
- using yaml config files all over, ie.
@@ -72,7 +79,7 @@ version 0.5 - 2017 - Steffen Heyne, Fabian Kilpert, Michael Rauer
now there is true hierarchy: defaults->configfile->wrapper !
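
Note on the hierarchy mentioned in version 0.5: reading `defaults->configfile->wrapper` as "each later stage overrides the earlier one", the precedence could be sketched as below. This is a minimal sketch, not the actual snakePipes loading code. The function name `merge_config`, the convention that `None` means "not set", and the example values are all assumptions made for illustration.

```python
def merge_config(defaults, configfile, wrapper):
    """Merge three config layers; later layers win over earlier ones."""
    merged = dict(defaults)
    for layer in (configfile, wrapper):
        for key, value in layer.items():
            # Assumption: None means the option was not set in this layer.
            if value is not None:
                merged[key] = value
    return merged


# Hypothetical values, for illustration only.
defaults = {"outdir": ".", "bw_binsize": 25, "local": False}
configfile = {"bw_binsize": 10}
wrapper = {"outdir": "/tmp/run1", "bw_binsize": None}

print(merge_config(defaults, configfile, wrapper))
```

In this example `outdir` comes from the wrapper, `bw_binsize` from the config file (the wrapper left it unset), and `local` from the defaults.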


version 0.6 (a.k.a RattleSnake) - Sept 2017 - Vivek Bhardwaj
version 0.6 (a.k.a Tiger RattleSnake) - Sept 2017 - @vivekbhr
- MAJOR CHANGES:
- Allele-Specific mapping : Allele-specific DNA and RNA-mapping is now possible and both ChIP-Seq and RNA-seq pipeline can handle "allele_mapping" mode.
- Differential binding : Differential binding can be performed using CSAW, both normal and allele-specific.
@@ -90,4 +97,15 @@ version 0.6 (a.k.a RattleSnake) - Sept 2017 - Vivek Bhardwaj
Version 0.6.1

- MINOR CHANGES:
- The allele-specific option is no longer on by default (it was Vivek's fault)
- The allele-specific option is no longer on by default (it was @vivekbhr's fault)

Version 0.6.2

- MINOR CHANGES:
- Explicitly define which snakemake version to use

Version 0.7 (a.k.a Green Mamba) - Nov 2017 - @vivekbhr

- MAJOR CHANGES:
- Read the Docs integration
- New workflow Hi-C, from mapping to TAD calling, using BWA and HiCExplorer
48 changes: 48 additions & 0 deletions docs/content/setting_up.rst
@@ -0,0 +1,48 @@
Setting up the workflows
==========================

To set up snakePipes after a fresh download, you need to do the following.

Set up slurm and snakemake
--------------------------

The pipelines require snakemake in order to work, and slurm in order to submit jobs to the cluster.
If you don't have slurm configured with the cluster, you can skip this and run the pipelines locally using the
`--local` option in the wrappers.

Edit the paths to the required programs
---------------------------------------

The paths to the required programs can be configured under `shared/paths.yaml`. This file lists all
required programs, but not all workflows require all of these programs to be installed, so some of
them can be skipped depending on the workflow used.

.. warning:: Do not edit the yaml keywords corresponding to each required entry.

Configure the organisms
------------------------

For each organism of your choice, create a file called `shared/organisms/<organism>.yaml` and
fill in the paths to the required files next to the corresponding yaml entry.

.. warning:: Do not edit the yaml keywords corresponding to each required entry.

An example for the Drosophila genome dm3 is below.

.. parsed-literal::

    genome_size: 142573017
    genome_fasta: "/data/repository/organisms/dm3_ensembl/genome_fasta/genome.fa"
    genome_index: "/data/repository/organisms/dm3_ensembl/genome_fasta/genome.fa.fai"
    genome_2bit: "/data/repository/organisms/dm3_ensembl/genome_fasta/genome.2bit"
    bowtie2_index: "/data/repository/organisms/dm3_ensembl/BowtieIndex/genome"
    hisat2_index: "/data/repository/organisms/dm3_ensembl/HISAT2Index/genome"
    bwa_index: "/data/repository/organisms/dm3_ensembl/BWAindex/genome.fa"
    known_splicesites: "/data/repository/organisms/dm3_ensembl/ensembl/release-78/HISAT2/splice_sites.txt"
    star_index: "/data/repository/organisms/dm3_ensembl/STARIndex/"
    genes_bed: "/data/repository/organisms/dm3_ensembl/Ensembl/release-78/genes.bed"
    genes_gtf: "/data/repository/organisms/dm3_ensembl/Ensembl/release-78/genes.gtf"
    blacklist_bed:
    ignore_forNorm: "U Uextra X XHet YHet dmel_mitochondrion_genome"

Not all files are required for all workflows, but we recommend keeping all of them ready nevertheless.
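
Note on the organism configuration described in setting_up.rst: a new `shared/organisms/<organism>.yaml` could be checked for missing keywords with a small script like the one below. This is a rough sketch, not part of snakePipes. The key list is copied from the dm3 example, and treating every one of those keys as expected is an assumption. The helper names `parse_flat_yaml` and `missing_keys` are hypothetical, and the parser only handles the flat `key: value` layout shown in the example (a real check would more likely use a YAML library).

```python
# Keys as they appear in the dm3 example; whether each workflow needs
# every one of them is an assumption.
EXPECTED_KEYS = [
    "genome_size", "genome_fasta", "genome_index", "genome_2bit",
    "bowtie2_index", "hisat2_index", "bwa_index", "known_splicesites",
    "star_index", "genes_bed", "genes_gtf", "blacklist_bed", "ignore_forNorm",
]


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines into a dict (no nesting supported)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        entries[key.strip()] = value.strip().strip('"')
    return entries


def missing_keys(text):
    """Return the expected keys that are absent from the config text."""
    entries = parse_flat_yaml(text)
    return [key for key in EXPECTED_KEYS if key not in entries]


# Deliberately incomplete config, for illustration only.
example = 'genome_size: 142573017\nblacklist_bed:\n'
print(missing_keys(example))
```

An empty entry such as `blacklist_bed:` still counts as present, which matches the example above where that key is left without a value.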
9 changes: 9 additions & 0 deletions docs/content/workflows/HiC.rst
@@ -0,0 +1,9 @@
.. _HiC:

HiC
============

.. argparse::
:filename: ../workflows/HiC/HiC
:func: parse_args
:prog: HiC
10 changes: 7 additions & 3 deletions docs/index.rst
@@ -3,8 +3,8 @@ snakePipes

snakePipes are pipelines built using snakemake for the analysis of various sequencing datasets.

The following is the list of pipelines available in snakePipes
---------------------------------------------------------------
Below is the list of pipelines available in snakePipes
-------------------------------------------------------


=============================== ===========================================================================================
@@ -14,6 +14,7 @@ Pipeline Description
:ref:`ChIP-Seq` Use the DNA mapping output and run ChIP/Input normalization and peak calling
:ref:`RNA-Seq` RNA-Seq workflow : From mapping to differential expression using DEseq2
:ref:`scRNA-Seq` Single-cell RNA-Seq workflow : From mapping to differential expression
:ref:`HiC` Hi-C analysis workflow, from mapping to TAD calling
=============================== ===========================================================================================

Quick start
@@ -86,7 +87,7 @@ Further organisms can be supported by adding a genome configuration file `my_org
blacklist_bed: "/SOMEPATH/hs37d5_ensembl/ENCODE/hs37d5_extended_Encode-blacklist.bed"
If no blacklist regions are available for your organism of interest, leave it empty `blacklist_bed: `
.. note:: If no blacklist regions are available for your organism of interest, leave `blacklist_bed:` empty


Contents:
@@ -95,10 +96,13 @@ Contents:
.. toctree::
:maxdepth: 2

content/setting_up.rst
content/workflows/DNA-mapping.rst
content/workflows/ChIP-seq.rst
content/workflows/RNA-seq.rst
content/workflows/scRNA-seq.rst
content/workflows/HiC.rst
ChangeLog.rst

Citation
---------
1 change: 1 addition & 0 deletions shared/organisms/GRCz10.yaml
@@ -4,6 +4,7 @@ genome_index: "/data/repository/organisms/GRCz10_ensembl/genome_fasta/genome.fa.
genome_2bit: "/data/repository/organisms/GRCz10_ensembl/genome_fasta/genome.2bit"
bowtie2_index: "/data/repository/organisms/GRCz10_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/GRCz10_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/GRCz10_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/GRCz10_ensembl/ensembl/release-88/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/GRCz10_ensembl/STARIndex/"
genes_bed: "/data/repository/organisms/GRCz10_ensembl/ensembl/release-88/genes.bed"
3 changes: 2 additions & 1 deletion shared/organisms/SchizoSPombe_ASM294v2.yaml
@@ -4,9 +4,10 @@ genome_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/genome_f
genome_2bit: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/genome_fasta/genome.2bit"
bowtie2_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/ensembl/release-35/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/STARIndex/"
genes_bed: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/Ensembl/release-35/genes.bed"
genes_gtf: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/Ensembl/release-35/genes.gtf"
blacklist_bed:
ignore_forNorm:
ignore_forNorm:
1 change: 1 addition & 0 deletions shared/organisms/dm3.yaml
@@ -4,6 +4,7 @@ genome_index: "/data/repository/organisms/dm3_ensembl/genome_fasta/genome.fa.fai
genome_2bit: "/data/repository/organisms/dm3_ensembl/genome_fasta/genome.2bit"
bowtie2_index: "/data/repository/organisms/dm3_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/dm3_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/dm3_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/dm3_ensembl/ensembl/release-78/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/dm3_ensembl/STARIndex/"
genes_bed: "/data/repository/organisms/dm3_ensembl/Ensembl/release-78/genes.bed"
1 change: 1 addition & 0 deletions shared/organisms/dm6.yaml
@@ -4,6 +4,7 @@ genome_index: "/data/repository/organisms/dm6_ensembl/genome_fasta/genome.fa.fai
genome_2bit: "/data/repository/organisms/dm6_ensembl/genome_fasta/genome.2bit"
bowtie2_index: "/data/repository/organisms/dm6_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/dm6_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/dm6_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/dm6_ensembl/ensembl/release-79/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/dm6_ensembl/STARIndex/"
genes_bed: "/data/repository/organisms/dm6_ensembl/Ensembl/release-79/genes.bed"
1 change: 1 addition & 0 deletions shared/organisms/hs37d5.yaml
@@ -4,6 +4,7 @@ genome_index: "/data/repository/organisms/hs37d5_ensembl/genome_fasta/genome.fa.
genome_2bit: "/data/repository/organisms/hs37d5_ensembl/genome_fasta/genome.2bit"
bowtie2_index: "/data/repository/organisms/hs37d5_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/hs37d5_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/hs37d5_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/hs37d5_ensembl/gencode/release_19/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/hs37d5_ensembl/STARIndex/"
genes_bed: "/data/repository/organisms/hs37d5_ensembl/gencode/release_19/genes.bed"
1 change: 1 addition & 0 deletions shared/organisms/mm10.yaml
@@ -4,6 +4,7 @@ genome_index: "/data/repository/organisms/GRCm38_ensembl/genome_fasta/genome.fa.
genome_2bit: "/data/repository/organisms/GRCm38_ensembl/genome_fasta/genome.2bit"
bowtie2_index: "/data/repository/organisms/GRCm38_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/GRCm38_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/GRCm38_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/GRCm38_ensembl/gencode/m9/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/GRCm38_ensembl/STARIndex/"
genes_bed: "/data/repository/organisms/GRCm38_ensembl/gencode/m9/genes.bed"
1 change: 1 addition & 0 deletions shared/organisms/mm10_gencodeM13.yaml
@@ -4,6 +4,7 @@ genome_index: "/data/repository/organisms/GRCm38_ensembl/genome_fasta/genome.fa.
genome_2bit: "/data/repository/organisms/GRCm38_ensembl/genome_fasta/genome.2bit"
bowtie2_index: "/data/repository/organisms/GRCm38_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/GRCm38_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/GRCm38_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/GRCm38_ensembl/gencode/m13/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/GRCm38_ensembl/STARIndex/"
genes_bed: "/data/repository/organisms/GRCm38_ensembl/gencode/m13/genes.bed"
1 change: 1 addition & 0 deletions shared/organisms/mm9.yaml
@@ -4,6 +4,7 @@ genome_index: "/data/repository/organisms/GRCm37_ensembl/genome_fasta/genome.fa.
genome_2bit: "/data/repository/organisms/GRCm37_ensembl/genome_fasta/genome.2bit"
bowtie2_index: "/data/repository/organisms/GRCm37_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/GRCm37_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/GRCm37_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/GRCm37_ensembl/gencode/m1/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/GRCm37_ensembl/STARIndex/"
genes_bed: "/data/repository/organisms/GRCm37_ensembl/gencode/m1/genes.bed"
10 changes: 7 additions & 3 deletions shared/paths.yaml
@@ -1,6 +1,10 @@
# external tools
# path names must end with "/" and not include the name of the executable
# 'workflow_tools' is set on runtime (see common_functions.py:load_paths())
# paths referring to a local folder
R_libs_path: "/data/manke/repository/scripts/snakemake_workflows/R/x86_64-unknown-linux-gnu-library/3.3/"
histoneHMM_path: "/data/manke/repository/scripts/snakemake_workflows/R/x86_64-redhat-linux-gnu-library/3.3/histoneHMM/bin/"
# paths referring to /package/
bedtools_path: "/package/bedtools2-2.25.0/bin/"
bowtie2_path: "/package/bowtie2-2.2.8/bin/"
cutadapt_path: "/package/cutadapt-1.9.1/bin/"
@@ -9,7 +13,6 @@ fastqc_path: "/package/FastQC-0.11.3/bin/"
feature_counts_path: "/package/subread-1.5.2/bin/"
hisat2_path: "/package/hisat2-2.0.4/"
star_path: "/package/STAR-2.5.3a/bin/"
histoneHMM_path: "/data/manke/repository/scripts/snakemake_workflows/R/x86_64-redhat-linux-gnu-library/3.3/histoneHMM/bin/"
macs2_path: "/package/MACS2-2.1.1.20160309/bin/"
qualimap_path: "/package/qualimap-2.2/"
picard_path: "/package/picard-tools-1.136/"
@@ -20,7 +23,8 @@ sambamba_path: "/package/sambamba-0.6.6/bin/"
tabix_path: "/package/tabix-1.2.1/"
trim_galore_path: "/package/trim_galore_v0.4.4/"
UCSC_tools_path: "/package/UCSCtools/"
R_libs_path: "/data/manke/repository/scripts/snakemake_workflows/R/x86_64-unknown-linux-gnu-library/3.3/"
histoneHMM_path: "/data/manke/repository/scripts/snakemake_workflows/R/x86_64-redhat-linux-gnu-library/3.3/histoneHMM/bin/"
SNPsplit_path: "/package/SNPsplit-0.3.3/bin/"
multiqc_path: "/package/MultiQC-1.2/bin/"
bwa_path: "/package/bwa-0.7.12/bin/"
# paths referring to a conda installation (Temporary : in the future, all paths would be loaded from conda via module load)
hicExplorer_path: "/package/anaconda3/envs/HiCExplorer-1.8/bin/"
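
Note on the comment at the top of shared/paths.yaml: the rule that path names must end with "/" could be checked with a short script along these lines. This is a sketch under stated assumptions, not part of the commit. The function name `paths_missing_trailing_slash` is hypothetical, the parser only handles flat `key: "value"` lines without trailing comments, and the second entry in the sample text is deliberately altered to show a failure.

```python
def paths_missing_trailing_slash(text):
    """Return the keys whose non-empty path value does not end with '/'."""
    bad_keys = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip().strip('"')
        # Empty values are skipped; only filled-in paths are checked.
        if value and not value.endswith("/"):
            bad_keys.append(key.strip())
    return bad_keys


# Sample text: the first line follows the rule, the second one was
# altered on purpose (trailing "/" removed) for illustration.
sample = (
    'bwa_path: "/package/bwa-0.7.12/bin/"\n'
    'multiqc_path: "/package/MultiQC-1.2/bin"\n'
)
print(paths_missing_trailing_slash(sample))
```

The check only covers the trailing slash; it does not detect a path that wrongly includes the executable name.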
2 changes: 1 addition & 1 deletion shared/rules/CSAW.snakefile
@@ -28,7 +28,7 @@ rule CSAW:
shell(
"( export R_LIBS_USER="+R_libs_path+" && "
"cat "+os.path.join(workflow_tools,"CSAW.R")+" | "
""+os.path.join(R_path,"R")+" --vanilla --args "
""+os.path.join(R_path,"R")+" --vanilla --slave --args "
"{input.sample_info} "
"{params.fdr} "
"{params.paired} "