Major DROP update (#113)

* Major DROP update
* removed obsolete file
* changed project folder structure for initialisation
* API change: separated config and sample annotation parsers
* fixed missing root path
* import config differently
* package finding for subpackage
* running pipeline
* added circleci config
* using correct .travis.yml
* removed subworkflow and running pipeline for aberrantExpression; new functions introduced for Script-Rule-Html conversions; analysis script directory renamed; pipeline code copied to different destination
* removed subworkflow for AS and MAE; changed folder structure; using log for tmp snakemake objects
* fixed input flags for analysis scripts
* set submodules to master
* applied branch-specific modifications
* removed submodules.py and refactored path setup
* fixed Readme and config copy errors
* fixed copying helpers functions
* using wbuild with fixed Readme.html functionality and saving wbuild config in DropConfig
* moved exportCounts into separate class
* parsing GENE_COUNT_FILE column in sample annotation
* removed wbuild from conda recipe
* using bioconda version of drop for dependencies, pip install drop, no conda building
* remove bioconda wbuild
* install wbuild with expl pip
* remove wbuild with pip
* get count info from AE class
* some refactoring and saving external counts IDs in separate dictionary
* removed ID renaming in merge script (naming already in counts script); fixed missing input bug in DropConfig
* add external counts to count files
* merge counts and coverage including external counts
* file checks for config and sample annotation reimplemented
* config file getters for submodules
* refactored export counts
* travis run export counts rule
* reapplied bcftools command modification
* Documentation update (#102)
* README update to include Baylor counts
* updated drop installation command
* update install command docs to include conda-forge plus better descriptions

Co-authored-by: Vicente <yepez@in.tum.de>
Co-authored-by: Michaela Müller <mi.mueller@tum.de>
Co-authored-by: Christian Mertes <mertes@in.tum.de>

* updated documentation to include count import
* splicing export counts all columns added
* resolved requested formality errors
* fixed requested MAE changes
* removed scanBamParam
* create missing columns for MAE results
* Subindex (#3)
* first version of subindex implemented
* including readme and dependency graph for subindexes; removed fileRegex key from config (using default of ".*\.(R|md)")
* using different (wb1.8) config file
* separated dependency graph computation from rest of the pipeline
* using latest updated wbuild version
* use conda install for wbuild

Co-authored-by: mumichae <mi.mueller@tum.de>

* Fix version for wbuild (#108)

Co-authored-by: mumichae <mi.mueller@tum.de>

* updated version number in README and drop/cli
* Tests (#4)

Added pytest suite to project. The main things tested are:
* cli: basic drop command line functions
* all config classes
* pipeline runs for all submodules (including checking numbers of entries in output) and count export

## Commits
* first setup of pytest, pipeline runthroughs, no output checking yet
* updated version number in README and drop/cli
* fixed dependencies
* pip uninstall with -y
* changed pytest installation and error catching
* compare pipeline output and add more pipeline tests
* refactored pipeline tests
* fixed demo creation
* refactored pipeline tests again & fixed fixture scoping
* fixed minor issues in tests
* added config tests
* refactored getHtmlFromScript

Co-authored-by: mumichae <mi.mueller@tum.de>

* downloading data to temporary directory
* updated version to 1.0.0

Co-authored-by: Michaela Müller <mi.mueller@tum.de>
Co-authored-by: Vicente <yepez@in.tum.de>
Co-authored-by: Christian Mertes <mertes@in.tum.de>

* updated travis to use correct R and wbuild versions
* use github release badge instead of fixing it

Co-authored-by: Michaela Müller <mi.mueller@tum.de>
Co-authored-by: Vicente <yepez@in.tum.de>
Co-authored-by: Christian Mertes <mertes@in.tum.de>
gagneurlab · Oct 17, 2020 · 5ce58f4
1 parent 8a493cf
commit 5ce58f4

Showing 89 changed files with 1,817 additions and 1,434 deletions.
diff --git a/.gitignore b/.gitignore
@@ -9,6 +9,7 @@ Output/
 .ipynb_checkpoints*
 __pycache__*
 *.egg-info*
+.eggs*
 dist/*
 
 # typical latex tmp files

diff --git a/.travis.yml b/.travis.yml
@@ -17,18 +17,17 @@ install:
   - conda config --set always_yes yes --set changeps1 no
   - conda update -q conda
 
-  - conda config --add channels bioconda
-  - conda config --add channels conda-forge
-
-  # build package with cond
-  - conda install conda-build
-  - conda build conda.recipe --output-folder=$HOME/build
-  - conda config --add channels "file://${HOME}/build"
-
-  # test package
+  # install dependencies
   - source $HOME/miniconda/etc/profile.d/conda.sh
-  - conda create -q -n drop drop_travis
+  - conda create -q -n drop -c conda-forge -c bioconda python=$TRAVIS_PYTHON_VERSION r-base=4.0.2
   - conda activate drop
+  - conda install -c conda-forge -c bioconda drop
+  - conda remove --force drop wbuild
+  - conda install -c conda-forge -c bioconda wbuild=1.7.1
+  - pip install -r requirements_test.txt
+
+  # install drop
+  - pip install . -vv
 
 script: 
   - conda list 
@@ -37,10 +36,7 @@ script:
   - samtools --version
   - bcftools --version
   - drop --version
+  - wbuild --version
   - python --version
-
-  - mkdir drop_demo
-  - cd drop_demo
-  - drop demo
-  - snakemake -n
-  - snakemake --jobs 2 --cores 2
+
+  - pytest -vv -s
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Detection of RNA Outlier Pipeline
 [![Pipeline status](https://travis-ci.org/gagneurlab/drop.svg?branch=master)](https://travis-ci.org/gagneurlab/drop)
-[![Version](https://img.shields.io/badge/Version-0.9.2-green.svg)](https://github.com/gagneurlab/drop/releases/tag/0.9.2)
+[![Version](https://img.shields.io/github/v/release/gagneurlab/drop?include_prereleases)](https://github.com/gagneurlab/drop/releases)
 [![Version](https://readthedocs.org/projects/gagneurlab-drop/badge/?version=latest)](https://gagneurlab-drop.readthedocs.io/en/latest)
 
 The manuscript main file, supplementary figures and table can be found in the manuscript folder or in 
@@ -51,7 +51,8 @@ snakemake aberrantExpression -n
 ```
 
 ## Datasets
-The following publicly-available datasets of gene counts can be used as controls:
+The following publicly-available datasets of gene counts can be used as controls.
+Please cite as instructed for each dataset.
 
 * 119 non-strand specific fibroblasts: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3887451.svg)](https://doi.org/10.5281/zenodo.3887451)
 

diff --git a/conda.recipe/meta.yaml b/conda.recipe/meta.yaml
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -23,7 +23,7 @@
 author = 'Michaela Müller'
 
 # The full version, including alpha/beta/rc tags
-release = '0.9.2'
+release = '1.0.0'
 
 
 # -- General configuration ---------------------------------------------------

diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -101,7 +101,7 @@ Alternatively, DROP can be installed without ``conda``. In this case the followi
 
   * `python <https://www.python.org/>`_ >= 3.6 and `pip <https://pip.pypa.io/en/stable/installing/>`_ >= 19.1
 
-  * `R <https://www.r-project.org/>`_ >= 3.6 and corresponding `bioconductor <https://bioconductor.org/install/>`_ version
+  * `R <https://www.r-project.org/>`_ >= 3.6, <=4.0.2 and corresponding `bioconductor <https://bioconductor.org/install/>`_ version
 
 * Commandline tools:
 

diff --git a/docs/source/prepare.rst b/docs/source/prepare.rst
@@ -39,15 +39,13 @@ Parameter            Type        Description
 ===================  ==========  =======================================================================================================================================  ======
 projectTitle         character   Title of the project to be displayed on the rendered HTML output                                                                         ``Project 1``
 htmlOutputPath       character   Full path of the folder where the HTML files are rendered                                                                                ``/data/project1/htmlOutput``
-indexWithFolderName  boolean     variable needed for wBuild, do not edit it                                                                                               ``true``
-fileRegex            character   variable needed for wBuild, do not edit it                                                                                               ``.*\.R``
+indexWithFolderName  boolean     If true, the basename of the project directory will be used as prefix for the index.html file                                            ``true``
 genomeAssembly       character   Either hg19 or hg38, depending on the genome assembly used for mapping                                                                   ``/data/project1``
 sampleAnnotation     character   Full path of the sample annotation table                                                                                                 ``/data/project1/sample_annotation.tsv``
 root                 character   Full path of the folder where the subdirectories processed_data and processed_results will be created containing DROP's output files.    ``/data/project1``
 geneAnnotation       dictionary  A key-value list of the annotation name (key) and the full path to the GTF file (value). More than one annotation file can be provided.  ``anno1: /path/to/gtf1.gtf``
 
                                                                                                                                                                           ``anno2: /path/to/gtf2.gtf``
-scanBamParam         character   Either null or the path to an Rds file containing a scanBamParam object. Refer to the advanced options below.                            ``/path/to/scanBamParam.Rds``
 tools                dictionary  A key-value list of different commands (key) and the command (value) to run them                                                         ``gatkCmd: gatk``
 
                                                                                                                                                                           ``bcftoolsCmd: bcftools``
@@ -126,10 +124,19 @@ qcGroups               list       Same as “groups”, but for the VCF-BAM matc
 Creating the sample annotation table
 ------------------------------------
 
-For details on how to generate the sample annotation, please refer to the DROP manuscript. 
-Here we provide some examples on how to deal with certain situations. For simplicity, we
-do not include the other compulsory columns ``PAIRED_END``, ``COUNT_MODE``,
-``COUNT_OVERLAPS`` and ``STRAND``.
+For a detailed explanation of the columns of the sample annotation, please refer to
+the DROP manuscript. 
+Inside the sample annotation, each row corresponds to a unique pair of RNA and DNA
+samples derived from the same individual. An RNA assay can belong to one or more DNA
+assays, and vice-versa. If so, they must be specified in different rows. The required
+columns are ``RNA_ID``, ``RNA_BAM_FILE`` and ``DROP_GROUP``, plus other module-specific
+ones (see DROP manuscript). In case external counts are included, add a new row for each
+sample from those files (or a subset if not all samples are needed).
+
+The sample annotation file should be saved in the tab-separated values (tsv) format. The 
+column order does not matter. Also, it does not matter where it is stored, as the path is 
+specified in the config file. Here we provide some examples on how to deal with certain
+situations. For simplicity, we do not include all possible columns in the examples.
 
 Example of RNA replicates 
 ++++++++++++++++++++++++++++++++++
@@ -144,22 +151,41 @@ S10R_M  S10G    MUSCLE      /path/to/S10R_M.BAM  /path/to/S10G.vcf.gz
 Example of DNA replicates 
 ++++++++++++++++++++++++++++++++++
 
-======  ======  ==========  ===================  ==
-RNA_ID  DNA_ID  DROP_GROUP  RNA_BAM_FILE         DNA_VCF_FILE
-======  ======  ==========  ===================  ==
-S20R    S20E    WES         /path/to/S20R.BAM    /path/to/S20E.vcf.gz
-S20R    S20G    WGS         /path/to/S20R.BAM    /path/to/S20G.vcf.gz
-======  ======  ==========  ===================  ==
+======  ======  ==========  =================  ==
+RNA_ID  DNA_ID  DROP_GROUP  RNA_BAM_FILE       DNA_VCF_FILE
+======  ======  ==========  =================  ==
+S20R    S20E    WES         /path/to/S20R.BAM  /path/to/S20E.vcf.gz
+S20R    S20G    WGS         /path/to/S20R.BAM  /path/to/S20G.vcf.gz
+======  ======  ==========  =================  ==
 
 Example of a multi-sample vcf file
 ++++++++++++++++++++++++++++++++++
 
-======  ======  ==========  ===================  ==
-RNA_ID  DNA_ID  DROP_GROUP  RNA_BAM_FILE         DNA_VCF_FILE
-======  ======  ==========  ===================  ==
-S10R    S10G    WGS         /path/to/S10R.BAM    /path/to/multi_sample.vcf.gz
-S20R    S20G    WGS         /path/to/S20R.BAM    /path/to/multi_sample.vcf.gz
-======  ======  ==========  ===================  ==
+======  ======  ==========  =================  ==
+RNA_ID  DNA_ID  DROP_GROUP  RNA_BAM_FILE       DNA_VCF_FILE
+======  ======  ==========  =================  ==
+S10R    S10G    WGS         /path/to/S10R.BAM  /path/to/multi_sample.vcf.gz
+S20R    S20G    WGS         /path/to/S20R.BAM  /path/to/multi_sample.vcf.gz
+======  ======  ==========  =================  ==
+
+External count matrices
++++++++++++++++++++++++
+
+In case counts from external matrices are to be integrated into the analysis,
+the file must be specified in the GENE_COUNTS_FILE column. A new row must be
+added for each sample from the count matrix that should be included in the 
+analysis. An RNA_BAM_FILE must not be specified. The DROP_GROUP of the local
+and external samples that are to be analyzed together must be the same.
+Similarly, the GENE_ANNOTATION of the external counts and the key of the `geneAnnotation`
+parameter from the config file must match.
+
+======  ======  ==========  =================  ==
+RNA_ID  DNA_ID  DROP_GROUP  RNA_BAM_FILE       GENE_COUNTS_FILE
+======  ======  ==========  =================  ==
+S10R    S10G    BLOOD       /path/to/S10R.BAM  
+EXT-1R          BLOOD                          /path/to/externalCounts.tsv.gz
+EXT-2R          BLOOD                          /path/to/externalCounts.tsv.gz
+======  ======  ==========  =================  ==
 
 
 Advanced options
@@ -183,12 +209,4 @@ We recommend the search space to be at most N/3 for the aberrant expression,
 and N/6 for the aberrant splicing case. Nevertheless, the user can specify the 
 denominator with the parameter ``maxTestedDimensionProportion``.
 
-In order to influence which fields of the BAM files are imported, the user can 
-provide a ``scanBamParam`` object. This will affect how the files are counted in 
-the aberrant expression and splicing modules. Refer to the function's 
-`documentation <https://www.rdocumentation.org/packages/Rsamtools/versions/1.24.0/topics/ScanBamParam>`_ for details.
-
-
-
-
 
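Note on the docs/source/prepare.rst changes above: the documented rules for the sample annotation (tab-separated file; required columns ``RNA_ID``, ``RNA_BAM_FILE`` and ``DROP_GROUP``; external count rows set ``GENE_COUNTS_FILE`` and leave ``RNA_BAM_FILE`` empty) can be sketched as a small standalone check. This is only an illustration and not the checker reimplemented in this commit: the names `check_sample_annotation`, `to_tsv` and `EXAMPLE_ROWS` are hypothetical, and treating a row with neither file as an error is an assumption that the documentation does not state.

```python
import csv
import io

# Required columns as listed in the updated prepare.rst.
REQUIRED_COLUMNS = ("RNA_ID", "RNA_BAM_FILE", "DROP_GROUP")


def to_tsv(rows):
    """Join a list of rows (lists of strings) into tab-separated text."""
    return "\n".join("\t".join(row) for row in rows) + "\n"


def check_sample_annotation(tsv_text):
    """Return a list of problems found in a sample annotation table.

    Hypothetical helper for illustration; not part of DROP.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in REQUIRED_COLUMNS
                if col not in header]
    if problems:
        return problems

    for row in reader:
        where = f"line {reader.line_num} ({row.get('RNA_ID') or '?'})"
        rna_id = (row.get("RNA_ID") or "").strip()
        group = (row.get("DROP_GROUP") or "").strip()
        bam = (row.get("RNA_BAM_FILE") or "").strip()
        counts = (row.get("GENE_COUNTS_FILE") or "").strip()

        if not rna_id:
            problems.append(f"{where}: RNA_ID is empty")
        if not group:
            problems.append(f"{where}: DROP_GROUP is empty")
        if counts and bam:
            # Documented rule: external count rows must not set a BAM file.
            problems.append(
                f"{where}: GENE_COUNTS_FILE is set, so RNA_BAM_FILE must be empty")
        if not counts and not bam:
            # Assumption (not in the docs): a row needs one of the two files.
            problems.append(
                f"{where}: neither RNA_BAM_FILE nor GENE_COUNTS_FILE is set")
    return problems


# Rows mirroring the "External count matrices" example table.
EXAMPLE_ROWS = [
    ["RNA_ID", "DNA_ID", "DROP_GROUP", "RNA_BAM_FILE", "GENE_COUNTS_FILE"],
    ["S10R", "S10G", "BLOOD", "/path/to/S10R.BAM", ""],
    ["EXT-1R", "", "BLOOD", "", "/path/to/externalCounts.tsv.gz"],
    ["EXT-2R", "", "BLOOD", "", "/path/to/externalCounts.tsv.gz"],
]

print(check_sample_annotation(to_tsv(EXAMPLE_ROWS)))  # prints []
```

The sketch does not cover the remaining documented constraints, such as matching ``GENE_ANNOTATION`` against the keys of `geneAnnotation` in the config file, because those require the config as a second input.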
diff --git a/drop/GeneticDiagnosis_Demo.R b/drop/GeneticDiagnosis_Demo.R
diff --git a/drop/__init__.py b/drop/__init__.py
@@ -1,21 +1,3 @@
-from .setupDrop import setupDrop as drop
-from .configHelper import ConfigHelper as config
-from .submodules import *
-
-def init():
-    wbuild.cli.init()
-    # compy our template
-
-def update():
-    wbuild.cli.update()
-
-if __name__ == '__main__':
-    import sys
-    import wbuild
-
-    arg = sys.args[1]
-    if arg == 'init':
-        init()
-    elif arg == 'update':
-        update()
-
+from .setupDrop import *
+from . import config
+from . import utils